In this post, we will mainly focus on all features related to sort pandas dataframe. Pandas is a highly used library in python for data analysis. Mainly because of its enriched set of functionalities.
At the end of this post you will learn,
- Sorting pandas dataframe based on indexes
- Ascending and Descending Sorting on a single column
- Sort pandas dataframe with multiple columns
- Sorting dataframe by placing missing values first
- Pandas dataframe sorting with multi level indexes and columns
So, let’s explore more about pandas sort functionality.
Import libraries
1 2 | import numpy as np import pandas as pd |
Create pandas dataframe from lists
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | host_names = ['Maarten','Fronk & Lieve','Elke','Francis','Kristof', 'Liesbet','Chloé','Klara','Maarten','Kristina'] listing_count = [1, 7, 1, 2, 1, 1, 1, 1, 1, 1] review_count = [25, 1, 4, 59, 3, 89, 3, 2, 7, 14] accommodates = [2, 2, 3, 2, 2, 4, 2, 5, 4, 2] index=['2014-04-01', '2012-02-27', '2013-05-11', '2012-04-17', '2012-10-13', '2011-01-29', '2014-03-15', '2014-05-22', '2012-07-24', '2014-02-17'] data = {'host_name': host_names, 'listing_count': listing_count, 'review_count': review_count,'accommodates': accommodates} host_df = pd.DataFrame(data, index=index) host_df |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2014-04-01 | 2 | Maarten | 1 | 25 |
2012-02-27 | 2 | Fronk & Lieve | 7 | 1 |
2013-05-11 | 3 | Elke | 1 | 4 |
2012-04-17 | 2 | Francis | 2 | 59 |
2012-10-13 | 2 | Kristof | 1 | 3 |
2011-01-29 | 4 | Liesbet | 1 | 89 |
2014-03-15 | 2 | Chloé | 1 | 3 |
2014-05-22 | 5 | Klara | 1 | 2 |
2012-07-24 | 4 | Maarten | 1 | 7 |
2014-02-17 | 2 | Kristina | 1 | 14 |
You can apply two type of sorting on any dataframe in pandas,
- Sorting on Index (using pandas sort_index function)
- Sorting on Columns (using pandas sort_values function)
Earlier pandas library was supporting sorted function But it function is deprecated in recent releases of pandas.
Sorting pandas dataframe based on indexes
We can use pandas sort_index function to sort above dataframe based index.
1 2 | # defaule is ascending=True to sort in ascending order host_df.sort_index(ascending=False) |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2014-05-22 | 5 | Klara | 1 | 2 |
2014-04-01 | 2 | Maarten | 1 | 25 |
2014-03-15 | 2 | Chloé | 1 | 3 |
2014-02-17 | 2 | Kristina | 1 | 14 |
2013-05-11 | 3 | Elke | 1 | 4 |
2012-10-13 | 2 | Kristof | 1 | 3 |
2012-07-24 | 4 | Maarten | 1 | 7 |
2012-04-17 | 2 | Francis | 2 | 59 |
2012-02-27 | 2 | Fronk & Lieve | 7 | 1 |
2011-01-29 | 4 | Liesbet | 1 | 89 |
Ascending and Descending Sorting on a single column
For sorting dataframe based on the values of a single column, we can specifying the column name as an argument in pandas sort_values() function.
1 2 3 | # Default sorting ascending sorted_df = host_df.sort_values('host_name') sorted_df.head() |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2014-03-15 | 2 | Chloé | 1 | 3 |
2013-05-11 | 3 | Elke | 1 | 4 |
2012-04-17 | 2 | Francis | 2 | 59 |
2012-02-27 | 2 | Fronk & Lieve | 7 | 1 |
2014-05-22 | 5 | Klara | 1 | 2 |
For reverse sorting (sorting in descending order) we can set ascending=False in sort_values() method.
1 2 | sorted_df = host_df.sort_values('host_name', ascending=False) sorted_df.head() |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2014-03-15 | 2 | Chloé | 1 | 3 |
2013-05-11 | 3 | Elke | 1 | 4 |
2012-04-17 | 2 | Francis | 2 | 59 |
2012-02-27 | 2 | Fronk & Lieve | 7 | 1 |
2014-05-22 | 5 | Klara | 1 | 2 |
Sort pandas dataframe with multiple columns
With pandas sort functionality you can also sort multiple columns along with different sorting orders. For that, we have to pass list of columns to be sorted with argument by=[]. Additionally, in the same order we can also pass a list of boolean to argument ascending=[] specifying sorting order.
- While using by and ascending together, length of both list must be same
- Specify True for ascending and False for descending order sorting
- Sorting sequence will be from columns left to right from the list
Let’s sort our dataframe by lowest accommodates and higher review count.
1 2 3 | columns = ['accommodates', 'review_count'] order = [False, True] host_df.sort_values(by=columns, ascending=order) |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2014-05-22 | 5 | Klara | 1 | 2 |
2012-07-24 | 4 | Maarten | 1 | 7 |
2011-01-29 | 4 | Liesbet | 1 | 89 |
2013-05-11 | 3 | Elke | 1 | 4 |
2012-02-27 | 2 | Fronk & Lieve | 7 | 1 |
2012-10-13 | 2 | Kristof | 1 | 3 |
2014-03-15 | 2 | Chloé | 1 | 3 |
2014-02-17 | 2 | Kristina | 1 | 14 |
2014-04-01 | 2 | Maarten | 1 | 25 |
2012-04-17 | 2 | Francis | 2 | 59 |
Sorting dataframe by placing missing values first
Let’s create the same dataframe with missing values in review count.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | host_names = ['Maarten','Fronk & Lieve','Elke','Francis','Kristof', 'Liesbet','Chloé','Klara','Maarten','Kristina'] listing_count = [1, 7, 1, 2, 1, 1, 1, 1, 1, 1] review_count = [25, np.nan, 4, 59, 3, 89, 3, 2, 7, np.nan] accommodates = [2, 2, 3, 2, 2, 4, 2, 5, 4, 2] index=['2014-04-01', '2012-02-27', '2013-05-11', '2012-04-17', '2012-10-13', '2011-01-29', '2014-03-15', '2014-05-22', '2012-07-24', '2014-02-17'] data = {'host_name': host_names, 'listing_count': listing_count, 'review_count': review_count,'accommodates': accommodates} host_df = pd.DataFrame(data, index=index) host_df |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2014-04-01 | 2 | Maarten | 1 | 25.0 |
2012-02-27 | 2 | Fronk & Lieve | 7 | NaN |
2013-05-11 | 3 | Elke | 1 | 4.0 |
2012-04-17 | 2 | Francis | 2 | 59.0 |
2012-10-13 | 2 | Kristof | 1 | 3.0 |
2011-01-29 | 4 | Liesbet | 1 | 89.0 |
2014-03-15 | 2 | Chloé | 1 | 3.0 |
2014-05-22 | 5 | Klara | 1 | 2.0 |
2012-07-24 | 4 | Maarten | 1 | 7.0 |
2014-02-17 | 2 | Kristina | 1 | NaN |
In any sorting order, NaN values will always be placed at last. To change that default behavior we can specify na_position=’first’ as an argument.
1 2 3 4 | # Default behavior # NaN values will be in last # In any sorting order host_df.sort_values(by=['review_count'], na_position='first') |
accommodates | host_name | listing_count | review_count | |
---|---|---|---|---|
2012-02-27 | 2 | Fronk & Lieve | 7 | NaN |
2014-02-17 | 2 | Kristina | 1 | NaN |
2014-05-22 | 5 | Klara | 1 | 2.0 |
2012-10-13 | 2 | Kristof | 1 | 3.0 |
2014-03-15 | 2 | Chloé | 1 | 3.0 |
2013-05-11 | 3 | Elke | 1 | 4.0 |
2012-07-24 | 4 | Maarten | 1 | 7.0 |
2014-04-01 | 2 | Maarten | 1 | 25.0 |
2012-04-17 | 2 | Francis | 2 | 59.0 |
2011-01-29 | 4 | Liesbet | 1 | 89.0 |
Pandas dataframe sorting with multi level indexes and columns
Let’s create dataframe with multi level indexes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | host_names = ['Maarten','Fronk & Lieve','Elke','Francis','Kristof', 'Liesbet','Chloé','Klara','Maarten','Kristina'] listing_count = [1, 7, 1, 2, 1, 1, 1, 1, 1, 1] review_count = [25, np.nan, 4, 59, 3, 89, 3, 2, 7, np.nan] accommodates = [2, 2, 3, 2, 2, 4, 2, 5, 4, 2] year=[2014, 2012, 2013, 2012, 2012, 2011, 2014, 2014, 2012, 2014] data = {'joining_year': year, 'listing_count': listing_count, 'review_count': review_count,'accommodates': accommodates, 'host_names' : host_names} host_df = pd.DataFrame(data) # Setting two level indexes host_df.set_index(host_df.host_names, inplace=True) host_df.set_index(host_df.joining_year, append=True, inplace=True) # removing index columns from dataframe del host_df['host_names'] del host_df['joining_year'] host_df |
accommodates | listing_count | review_count | ||
---|---|---|---|---|
host_names | joining_year | |||
Maarten | 2014 | 2 | 1 | 25.0 |
Fronk & Lieve | 2012 | 2 | 7 | NaN |
Elke | 2013 | 3 | 1 | 4.0 |
Francis | 2012 | 2 | 2 | 59.0 |
Kristof | 2012 | 2 | 1 | 3.0 |
Liesbet | 2011 | 4 | 1 | 89.0 |
Chloé | 2014 | 2 | 1 | 3.0 |
Klara | 2014 | 5 | 1 | 2.0 |
Maarten | 2012 | 4 | 1 | 7.0 |
Kristina | 2014 | 2 | 1 | NaN |
In above dataframe we have two level index.
- At level 0 index is host name
- At level 1 index is year of joining
Now lets sort dataframe decending host names indexe with ascending year of joining index.
1 2 3 4 | # Sorting # Level 0 (Host Name) Descending # Level 1 (Year of joining) Ascending host_df.sort_index(level=[0, 1], ascending=[False, True]) |
accommodates | listing_count | review_count | ||
---|---|---|---|---|
host_names | joining_year | |||
Maarten | 2012 | 4 | 1 | 7.0 |
2014 | 2 | 1 | 25.0 | |
Liesbet | 2011 | 4 | 1 | 89.0 |
Kristof | 2012 | 2 | 1 | 3.0 |
Kristina | 2014 | 2 | 1 | NaN |
Klara | 2014 | 5 | 1 | 2.0 |
Fronk & Lieve | 2012 | 2 | 7 | NaN |
Francis | 2012 | 2 | 2 | 59.0 |
Elke | 2013 | 3 | 1 | 4.0 |
Chloé | 2014 | 2 | 1 | 3.0 |
In above dataframe you can see that there were two hosts named ‘Maarten’ joined in years 2012 and 2014.
Hope you njoyed the post. Stay tuned for such more posts on pandas.!!!!!
Leave a Reply