Renaming columns in a pandas dataframe is a very basic operation in data wrangling. In this article I am going to cover 9 different tactics for renaming columns using the pandas library. Some of these may be unknown to many aspiring data scientists.
Following are the tactics which I am going to cover in the article.
- Renaming specific columns with rename method
- Using axis parameter with pandas rename method
- Rename pandas columns using set_axis method
- Assign list of columns to .columns attribute of dataframe
- Renaming all columns with a lambda function
- Adding prefix and suffix to the column
- Rename columns using regular expressions
- Rename MultiIndex columns in Pandas
- Converting MultiIndex columns to single level
Before we move ahead, let’s import required libraries.
```python
import pandas as pd
import numpy as np
import re
```
Rename columns in pandas using a map
First, prepare a map (a dictionary) with the old column names as keys and the new column names as values. Then pass this map to the pandas rename method through its columns parameter.
```python
## Creating dataframe
col_list = ['a', 'b', 'c', 'd', 'e']
df = pd.DataFrame(np.random.rand(10, 5), columns=col_list)

## Prepare a column map
col_map = {'a': 'Feature 1', 'b': 'Feature 2', 'c': 'Feature 3',
           'd': 'Feature 4', 'e': 'Feature 5'}

## Renaming all columns
df = df.rename(columns=col_map)

## Rename only specific columns
col_map = {'Feature 5': 'Target'}
df = df.rename(columns=col_map)

## Use inplace parameter to save changes in the same dataframe
## (a no-op here, because 'Feature 5' was already renamed above)
df.rename(columns=col_map, inplace=True)
```
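One thing worth knowing about the map approach is that keys which match no column are skipped silently. The following is a small sketch of that behaviour; the column name 'z' is made up for illustration, and the errors='raise' option assumes pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])

# A key that matches no existing column is ignored by default
unchanged = df.rename(columns={'z': 'Feature 26'})
print(list(unchanged.columns))  # ['a', 'b']

# errors='raise' turns the silent no-op into a KeyError
try:
    df.rename(columns={'z': 'Feature 26'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Using errors='raise' can help catch typos in the map when you expect every key to exist.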
Rename columns using axis parameter
From pandas release 0.21 onward, you can set the axis parameter to 1 or 'columns' to rename pandas columns. We don’t have to use the columns parameter at all.
```python
## Creating dataframe
col_list = ['a', 'b', 'c', 'd', 'e']
df = pd.DataFrame(np.random.rand(10, 5), columns=col_list)

## Prepare column map/dictionary with old and new names
col_map = {'a': 'Feature 1', 'b': 'Feature 2', 'c': 'Feature 3',
           'd': 'Feature 4', 'e': 'Feature 5'}

renamed_df = df.rename(col_map, axis=1)
## OR
renamed_df = df.rename(col_map, axis='columns')
## Set inplace=True to update the same dataframe
```
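To see why the axis value matters, here is a minimal sketch in which the row labels and the column names deliberately share the same values. The tiny dataframe is made up for illustration.

```python
import pandas as pd

# Row labels and column names are both 'a' and 'b' on purpose
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['a', 'b'])
col_map = {'a': 'Feature 1'}

# axis='columns' (or 1) applies the map to column names
by_columns = df.rename(col_map, axis='columns')
print(list(by_columns.columns))  # ['Feature 1', 'b']

# axis='index' (or 0) applies the same map to row labels instead
by_index = df.rename(col_map, axis='index')
print(list(by_index.index))      # ['Feature 1', 'b']
```

If the axis is left out, the positional map is applied to the row labels, so remember to pass axis=1 or axis='columns' when the goal is to rename columns.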
Assign list of columns to .columns attribute of dataframe
We can simply assign a list of new column names to df.columns. Consider the following points while using this method.
- You must pass a list with the same length as the total number of columns in your dataframe
- The new column names must be in the same order as your existing columns
- It is convenient when you want to do some cleanup or make minor changes to your existing column names
```python
## Creating dataframe
col_list = ['f1', 'f2', 'f3', 'f4', 'f5']
df = pd.DataFrame(np.random.rand(10, 5), columns=col_list)

## Simple assignment
new_cols = ['feature 1', 'feature 2', 'feature 3', 'feature 4', 'feature 5']
df.columns = new_cols
print(df.columns)
## Output: Index(['feature 1', 'feature 2', 'feature 3', 'feature 4', 'feature 5'], dtype='object')

## Using a list comprehension
df.columns = [c.replace(' ', '_') for c in df.columns]
print(df.columns)
## Output: Index(['feature_1', 'feature_2', 'feature_3', 'feature_4', 'feature_5'], dtype='object')

## Using map function
df.columns = list(map(lambda x: x.title(), df.columns))
print(df.columns)
## Output: Index(['Feature_1', 'Feature_2', 'Feature_3', 'Feature_4', 'Feature_5'], dtype='object')
```
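The length requirement mentioned in the points above is enforced by pandas itself. A minimal sketch of what happens when the list is too short (the three-column dataframe is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 3), columns=['f1', 'f2', 'f3'])

# Assigning two names to three columns raises a ValueError
try:
    df.columns = ['feature 1', 'feature 2']
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message is not None)  # True
print(list(df.columns))           # ['f1', 'f2', 'f3']
```

The order requirement, on the other hand, is not checked at all, so a list in the wrong order silently mislabels the data.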
Rename pandas columns using set_axis method
The set_axis method is a bit tricky for renaming columns in pandas. As in the example above, you can only use this method when you want to rename all columns. It returns a new dataframe, so assign the result back; the inplace parameter was deprecated in pandas 1.5 and removed in pandas 2.0, so it is left out here.
```python
## Creating dataframe
col_list = ['a', 'b', 'c', 'd', 'e']
df = pd.DataFrame(np.random.rand(10, 5), columns=col_list)

## Preparing list of renamed columns
new_col_list = ['Feature 1', 'Feature 2', 'Feature 3', 'Feature 4', 'Feature 5']

## Rename columns (make sure to use axis='columns')
df = df.set_axis(new_col_list, axis='columns')
```
In the following example, we derive the new names from the existing column names.
```python
## Creating dataframe
col_list = ['f1', 'f2', 'f3', 'f4', 'f5']
df = pd.DataFrame(np.random.rand(10, 5), columns=col_list)

## Convert column names from f* to Feature_*
## i.e. f1 -> Feature_1, f2 -> Feature_2 and so on
## We are using map function to map existing columns
new_columns = list(map(lambda x: 'Feature_' + x[1:], df.columns))
## new_columns = ['Feature_1', 'Feature_2', 'Feature_3', 'Feature_4', 'Feature_5']

df = df.set_axis(new_columns, axis='columns')
```
Why should one use the set_axis method to rename pandas columns?
Primarily, the set_axis() method is convenient with chained modifications. Let’s understand this concept with the following example.
In pandas we can chain transformations in the following way.
```python
output_df = df.transformation1().transformation2().transformation3()
```
Now what if someone wants to rename the dataframe columns after transformation2()? We have to store the result of transformation2() in a temporary dataframe, rename its columns, and then execute transformation3().
```python
temp_df = df.transformation1().transformation2()
temp_df.columns = new_col_list
output_df = temp_df.transformation3()
```
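The set_axis() method allows us to continue the transformation chain as follows.
```python
# transformation1/2/3 are placeholders for real DataFrame methods
output_df = (
    df.transformation1()
      .transformation2()
      .set_axis(new_col_list, axis='columns')
      .transformation3()
)
```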
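To make the chaining idea concrete, here is a small runnable sketch. The sales data and the choice of groupby, reset_index, and sort_values as the three transformations are assumptions made purely for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south'],
    'units': [10, 20, 30, 40],
})

summary = (
    sales.groupby('region')['units'].sum()   # transformation 1: aggregate
         .reset_index()                      # transformation 2: back to columns
         .set_axis(['Region', 'Total Units'], axis='columns')  # rename mid-chain
         .sort_values('Total Units', ascending=False)          # transformation 3
)

print(list(summary.columns))    # ['Region', 'Total Units']
print(list(summary['Region']))  # ['south', 'north']
```

Note that the last step already refers to the new column name, which is only possible because the rename happens inside the chain.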
Renaming all columns with a lambda function
To update all columns with some more complicated logic, we can pass a lambda function directly to the rename method.
```python
## Creating dataframe
df = pd.DataFrame(np.random.rand(10, 5), columns=['f1', 'f2', 'f3', 'f4', 'f5'])

##### Ordinary Method #####
# new_columns = list(map(lambda x: 'Feature_' + x[1:], df.columns))
# df.columns = new_columns

##### Using Lambda Function #####
df.rename(columns=lambda x: 'Feature_' + x[1:], inplace=True)
print(df.columns)
## Output: Index(['Feature_1', 'Feature_2', 'Feature_3', 'Feature_4', 'Feature_5'], dtype='object')
```
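The rename method accepts any callable, not only a lambda. As a sketch, a named helper or a built-in string method can be passed in the same way; the helper to_snake_case and the messy column names below are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 3),
                  columns=[' First Name', 'Last Name ', 'Age'])

def to_snake_case(name):
    """Hypothetical helper: trim, lowercase, and replace spaces."""
    return name.strip().lower().replace(' ', '_')

# A named function is easier to reuse and test than an inline lambda
cleaned = df.rename(columns=to_snake_case)
print(list(cleaned.columns))  # ['first_name', 'last_name', 'age']

# Built-in string methods work as well
upper = cleaned.rename(columns=str.upper)
print(list(upper.columns))    # ['FIRST_NAME', 'LAST_NAME', 'AGE']
```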
Adding prefix and suffix to the column
The pandas dataframe has two separate methods, add_prefix and add_suffix, for adding a prefix or a suffix to all columns.
```python
## Creating dataframe
df = pd.DataFrame(np.random.rand(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

## Adding prefix
df = df.add_prefix('prev_')
print(df.columns)
## Output: Index(['prev_a', 'prev_b', 'prev_c', 'prev_d', 'prev_e'], dtype='object')

## Creating dataframe
df = pd.DataFrame(np.random.rand(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

## Adding suffix
df = df.add_suffix('_old')
print(df.columns)
## Output: Index(['a_old', 'b_old', 'c_old', 'd_old', 'e_old'], dtype='object')
```
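Both methods touch every column. If only some columns need a prefix, one possible approach (a sketch, not the only way) is to build a map with a dictionary comprehension and pass it to rename. The 'id' column and the to_prefix list are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 3), columns=['id', 'a', 'b'])

# Only these columns should receive the prefix
to_prefix = ['a', 'b']
df = df.rename(columns={c: 'prev_' + c for c in to_prefix})

print(list(df.columns))  # ['id', 'prev_a', 'prev_b']
```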
Rename columns using regular expressions
Consider a sample dataframe in which each column name is a date string in YYYY-MM-DD format, and we want to convert all column names to DD-MM-YYYY format. This conversion can easily be done using regex in Python.
```python
## Input DataFrame with columns in YYYY-MM-DD format
df = pd.DataFrame({'2020-01-01': [1, 2], '2020-02-01': [3, 4], '2020-03-01': [5, 6]})
print(df.columns)
## Output: Index(['2020-01-01', '2020-02-01', '2020-03-01'], dtype='object')

## Output DataFrame with columns in DD-MM-YYYY format
df.rename(columns=lambda x: re.sub(r'(\d+)-(\d+)-(\d+)', r'\3-\2-\1', x), inplace=True)
print(df.columns)
## Output: Index(['01-01-2020', '01-02-2020', '01-03-2020'], dtype='object')
```
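As an alternative sketch, the same conversion can be written with the vectorized string methods on df.columns, without importing re. This assumes a pandas version where str.replace accepts the regex argument; passing regex=True explicitly avoids relying on the default.

```python
import pandas as pd

df = pd.DataFrame({'2020-01-01': [1, 2], '2020-02-01': [3, 4]})

# Same pattern and backreferences as the re.sub version
df.columns = df.columns.str.replace(
    r'(\d+)-(\d+)-(\d+)', r'\3-\2-\1', regex=True
)

print(list(df.columns))  # ['01-01-2020', '01-02-2020']
```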
Rename MultiIndex columns in Pandas
To rename MultiIndex columns in pandas level by level, the set_levels method of DataFrame.columns can be used.
```python
# Preparing MultiIndex DataFrame for Car Sales
column_arrays = [['BMW', 'BMW', 'BMW', 'Audi', 'Audi', 'Audi'],
                 ['X7', 'X5', 'X3', 'Q7', 'Q5', 'Q3']]
tuples = list(zip(*column_arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['Company', 'Model'])
car_sales_df = pd.DataFrame(np.random.randint(20000, 100000, (3, 6)),
                            index=['2019', '2018', '2017'],
                            columns=index)
car_sales_df
```
Company | BMW | BMW | BMW | Audi | Audi | Audi |
---|---|---|---|---|---|---|
Model | X7 | X5 | X3 | Q7 | Q5 | Q3 |
2019 | 51362 | 84444 | 33119 | 29081 | 61722 | 96810 |
2018 | 26947 | 93564 | 34155 | 51099 | 85700 | 56759 |
2017 | 70147 | 59594 | 91484 | 78085 | 67305 | 95420 |
```python
# Renaming Level 0 Column Index
level_0_list = list(map(lambda x: 'Manufacturer: ' + x, car_sales_df.columns.levels[0]))
# set_levels returns a new MultiIndex, so assign it back
# (the inplace argument was removed in pandas 2.0)
car_sales_df.columns = car_sales_df.columns.set_levels(level_0_list, level=0)

# Renaming Level 1 Column Index
level_1_list = list(map(lambda x: 'Model: ' + x, car_sales_df.columns.levels[1]))
car_sales_df.columns = car_sales_df.columns.set_levels(level_1_list, level=1)

car_sales_df
```
Company | Manufacturer: BMW | Manufacturer: BMW | Manufacturer: BMW | Manufacturer: Audi | Manufacturer: Audi | Manufacturer: Audi |
---|---|---|---|---|---|---|
Model | Model: X7 | Model: X5 | Model: X3 | Model: Q7 | Model: Q5 | Model: Q3 |
2019 | 81447 | 86712 | 72330 | 63879 | 46017 | 31607 |
2018 | 36484 | 75569 | 65724 | 44438 | 60058 | 63423 |
2017 | 71488 | 83257 | 21159 | 38778 | 30962 | 78200 |
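Another option, sketched here on a smaller made-up dataframe, is the level parameter of the rename method. It applies a function or a map to a single level of the MultiIndex and returns a new dataframe, so it also fits into a method chain. The label 'X7 (SUV)' is invented for the example.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [('BMW', 'X7'), ('BMW', 'X5'), ('Audi', 'Q7')],
    names=['Company', 'Model'],
)
car_sales_df = pd.DataFrame(np.random.randint(20000, 100000, (2, 3)),
                            index=['2019', '2018'], columns=columns)

# A function applied to level 0 only
renamed = car_sales_df.rename(columns=lambda x: 'Manufacturer: ' + x, level=0)

# A map applied to one level, selected by its name
renamed = renamed.rename(columns={'X7': 'X7 (SUV)'}, level='Model')

print(list(renamed.columns))
# [('Manufacturer: BMW', 'X7 (SUV)'), ('Manufacturer: BMW', 'X5'), ('Manufacturer: Audi', 'Q7')]
```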
Converting MultiIndex columns to single level
```python
# Preparing MultiIndex DataFrame for Car Sales
column_arrays = [['BMW', 'BMW', 'BMW', 'Audi', 'Audi', 'Audi'],
                 ['X7', 'X5', 'X3', 'Q7', 'Q5', 'Q3']]
tuples = list(zip(*column_arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['Company', 'Model'])
car_sales_df = pd.DataFrame(np.random.randint(20000, 100000, (3, 6)),
                            index=['2019', '2018', '2017'],
                            columns=index)

# Converting with map function: join both levels with a space
car_sales_df.columns = list(map(lambda x: x[0] + ' ' + x[1], car_sales_df.columns))
car_sales_df
```
 | BMW X7 | BMW X5 | BMW X3 | Audi Q7 | Audi Q5 | Audi Q3 |
---|---|---|---|---|---|---|
2019 | 75763 | 50204 | 60707 | 48030 | 97776 | 42762 |
2018 | 51684 | 84571 | 38951 | 60971 | 44349 | 64299 |
2017 | 95356 | 49022 | 89497 | 44420 | 40216 | 54111 |
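Two small variations on the same idea, shown as a sketch on a smaller made-up dataframe: a list comprehension with str.join works for any number of levels, and to_flat_index keeps each column name as a tuple when no string joining is wanted. The underscore separator is an arbitrary choice.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [('BMW', 'X7'), ('BMW', 'X5'), ('Audi', 'Q7')],
    names=['Company', 'Model'],
)
car_sales_df = pd.DataFrame(np.random.randint(20000, 100000, (2, 3)),
                            index=['2019', '2018'], columns=columns)

# Join all levels of each column tuple with an underscore
flat_df = car_sales_df.copy()
flat_df.columns = ['_'.join(col) for col in car_sales_df.columns]
print(list(flat_df.columns))  # ['BMW_X7', 'BMW_X5', 'Audi_Q7']

# Single-level index whose labels are the original tuples
tuple_index = car_sales_df.columns.to_flat_index()
print(list(tuple_index))      # [('BMW', 'X7'), ('BMW', 'X5'), ('Audi', 'Q7')]
```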
So we have covered the different tactics for renaming columns in a pandas dataframe. I hope these will help you in your journey from beginner to expert in data wrangling.
Stay tuned for more awesome posts..!! Happy Learning..!!!