How to drop columns and rows in a pandas dataframe

This post describes different ways of dropping columns or rows from a pandas dataframe. While performing any data analysis task, you often need to remove certain columns or entire rows which are not relevant. So let’s learn how to remove columns or rows using the pandas drop function.

Here I have taken a CSV file of Airbnb hosts, mainly because Airbnb data is generally well understood by most people. You can download this data from this link.

By the end of this post, you will have learned:

  • Pandas drop columns using column name array
  • Removing all columns with NaN Values
  • Removing all rows with NaN Values
  • Pandas drop rows by index
  • Dropping rows based on index range
  • Removing top x rows from dataframe
  • Removing bottom x rows from dataframe

So let’s get started.

Import Necessary Libraries
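
The import cell did not survive in this copy of the post. A minimal sketch, assuming pandas is the only library the examples here need:

```python
# Assumption: pandas is the only library required for the examples in this post.
import pandas as pd

# Print the installed version, since some behaviour differs between releases
print(pd.__version__)
```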

Create pandas dataframe from AirBnB Hosts CSV file

Here we read the CSV file into a dataframe using the pandas.read_csv() method.
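
The exact call is not shown in this copy, so here is a rough sketch of what reading the file might look like. To keep it runnable without the download, it reads three rows copied from the tables in this post out of an in-memory string. The file name `hosts.csv` mentioned in the comment is hypothetical, and `parse_dates` is an assumption based on host_since appearing as a date in the summary.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: three rows copied from this post.
# With the real data you would pass a path instead, e.g.
# pd.read_csv("hosts.csv", ...) where "hosts.csv" is a hypothetical name.
csv_text = io.StringIO(
    "host_name,host_since,host_location,host_response_time,host_listings_count\n"
    'Maarten,2014-04-01,"Antwerp, Flanders, Belgium",within a few hours,1\n'
    'Fronk & Lieve,2012-02-27,"Antwerpen, Vlaams Gewest, Belgium",within an hour,7\n'
    'Elke,2013-05-11,"Alveringem, Flanders, Belgium",a few days or more,1\n'
)

# parse_dates converts host_since from text into a datetime column
hosts = pd.read_csv(csv_text, parse_dates=["host_since"])

print(hosts.shape)
print(hosts.dtypes)
```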

Summary of Data

As our dataframe has a large number of columns, we transpose the summary dataframe to make it easier to read. We will learn more about the pandas describe() method in another post.
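
A transposed summary of this kind can be produced roughly as follows. This is a sketch on four rows taken from this post, and `include="all"` is an assumption based on the summary covering both text and numeric columns. Note that the set of summary rows (for example first and last for dates) depends on the pandas version.

```python
import pandas as pd

# Four rows copied from the hosts data shown in this post
hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke", "Francis"],
    "host_response_time": ["within a few hours", "within an hour",
                           "a few days or more", "within an hour"],
    "host_listings_count": [1, 7, 1, 2],
})

# include="all" summarises text and numeric columns in one table;
# .T transposes it so that each dataframe column becomes one row.
summary = hosts.describe(include="all").T

print(summary)
```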

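| | count | unique | top | freq | first | last | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| host_name | 547 | 375 | Maarten | 9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| host_since | 547 | 398 | 2012-02-27 00:00:00 | 6 | 2009-10-05 00:00:00 | 2015-12-07 00:00:00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| host_location | 546 | 61 | Antwerp, Flanders, Belgium | 314 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| host_response_time | 521 | 4 | within an hour | 240 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| country | 547 | 1 | Belgium | 547 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| host_listings_count | 547 | NaN | NaN | NaN | NaN | NaN | 1.97989 | 3.33721 | 1 | 1 | 1 | 2 | 51 |
| host_total_listings_count | 547 | NaN | NaN | NaN | NaN | NaN | 1.97989 | 3.33721 | 1 | 1 | 1 | 2 | 51 |
| property_type | 547 | 13 | Apartment | 393 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| room_type | 547 | 3 | Entire home/apt | 366 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| accommodates | 547 | NaN | NaN | NaN | NaN | NaN | 3.00914 | 1.60867 | 1 | 2 | 2 | 4 | 16 |
| bathrooms | 545 | NaN | NaN | NaN | NaN | NaN | 1.09908 | 0.426919 | 0 | 1 | 1 | 1 | 8 |
| bedrooms | 547 | NaN | NaN | NaN | NaN | NaN | 1.20475 | 0.658931 | 0 | 1 | 1 | 1 | 5 |
| beds | 547 | NaN | NaN | NaN | NaN | NaN | 1.79525 | 1.32672 | 1 | 1 | 1 | 2 | 16 |
| bed_type | 547 | 3 | Real Bed | 533 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| amenities | 547 | 515 | {} | 8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| square_feet | 33 | NaN | NaN | NaN | NaN | NaN | 498.091 | 524.259 | 0 | 0 | 431 | 861 | 2153 |
| number_of_reviews | 547 | NaN | NaN | NaN | NaN | NaN | 23.1974 | 31.7685 | 1 | 5 | 11 | 25 | 236 |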

Pandas drop columns using column name array

To remove certain columns from a dataframe, we can use the pandas drop function. To remove one or more columns, simply pass a list of column names together with axis=1 (or use the columns keyword), since drop works on row labels by default.
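
Here is a small sketch of dropping columns by name. The host names, locations and listing counts come from this post; the per-row host_total_listings_count values are assumed for illustration.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke"],
    "host_location": ["Antwerp, Flanders, Belgium",
                      "Antwerpen, Vlaams Gewest, Belgium",
                      "Alveringem, Flanders, Belgium"],
    "host_listings_count": [1, 7, 1],
    "host_total_listings_count": [1, 7, 1],  # assumed values
})

# axis=1 tells drop() that the labels are column names.
# drop() returns a new dataframe; hosts itself is left unchanged.
trimmed = hosts.drop(["host_location", "host_total_listings_count"], axis=1)

# Equivalent spelling using the columns keyword
same = hosts.drop(columns=["host_location", "host_total_listings_count"])

print(trimmed.columns.tolist())
```

Because drop returns a copy by default, assign the result back (hosts = hosts.drop(...)) if you want to keep working with the reduced dataframe.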

Removing all columns with NaN Values

To remove all columns with NaN values, we can simply use the pandas dropna function. With axis=1, the function removes every column in which at least one row value is NaN.

On the hosts dataframe, dropna removes the 4 columns which have one or more NaN values: host_location, host_response_time, bathrooms and square_feet (the columns whose count in the summary is below 547).
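
A minimal sketch of this on three rows from this post. The square_feet values are made up for illustration, since the per-row values are not shown.

```python
import numpy as np
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Francis", "Katrien", "Alexandra"],
    "host_response_time": ["within an hour", np.nan, "within a few hours"],
    "square_feet": [np.nan, np.nan, 431.0],  # illustrative values
    "host_listings_count": [2, 1, 1],
})

# axis=1: drop every column that contains at least one NaN
no_nan_cols = hosts.dropna(axis=1)

print(no_nan_cols.columns.tolist())
```

If you only want to drop columns that are entirely empty, dropna also accepts how="all".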

Removing all rows with NaN Values

Similarly, the pandas dropna function can remove all rows in which any column contains a NaN value. With axis=0 (the default), the function removes every row in which at least one column value is NaN.

Looking at the shape of the output dataframe, only 26 rows remain, namely the rows without any null values.
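
The row-wise version can be sketched like this, again on three rows from this post with illustrative square_feet values. The `subset` variant is an extra that the original text does not mention.

```python
import numpy as np
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Francis", "Katrien", "Alexandra"],
    "host_response_time": ["within an hour", np.nan, "within a few hours"],
    "square_feet": [np.nan, np.nan, 431.0],  # illustrative values
    "host_listings_count": [2, 1, 1],
})

# axis=0: drop every row that contains at least one NaN
complete_rows = hosts.dropna(axis=0)

# subset= restricts the NaN check to the listed columns only
responded = hosts.dropna(subset=["host_response_time"])

print(complete_rows.shape)
print(responded.shape)
```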

Pandas drop rows by index

Firstly, let’s take a few columns from the hosts dataframe and look at them.
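
Selecting a handful of columns might look like the sketch below. The variable name hosts_subset is hypothetical, and the sample has only three rows, so head(10) simply returns all of them.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke"],
    "host_since": ["2014-04-01", "2012-02-27", "2013-05-11"],
    "country": ["Belgium", "Belgium", "Belgium"],
    "host_listings_count": [1, 7, 1],
})

# Passing a list of names inside [] selects those columns, in that order
hosts_subset = hosts[["host_name", "host_since", "host_listings_count"]]

# head(10) shows at most the first 10 rows
print(hosts_subset.head(10))
```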

| | host_name | host_since | host_location | host_response_time | host_listings_count |
|---|---|---|---|---|---|
| 0 | Maarten | 2014-04-01 | Antwerp, Flanders, Belgium | within a few hours | 1 |
| 1 | Fronk & Lieve | 2012-02-27 | Antwerpen, Vlaams Gewest, Belgium | within an hour | 7 |
| 2 | Elke | 2013-05-11 | Alveringem, Flanders, Belgium | a few days or more | 1 |
| 3 | Francis | 2012-04-17 | Antwerpen, Flemish Region, Belgium | within an hour | 2 |
| 4 | Kristof | 2012-10-13 | Antwerpen, Flanders, Belgium | within an hour | 1 |
| 5 | Liesbet | 2011-01-29 | Antwerp, Flemish Region, Belgium | within an hour | 1 |
| 6 | Chloé | 2014-03-15 | Antwerp, Flanders, Belgium | within a day | 1 |
| 7 | Klara | 2014-05-22 | Antwerp, Flanders, Belgium | within an hour | 1 |
| 8 | Maarten | 2012-07-24 | Antwerp, Flanders, Belgium | within a few hours | 1 |
| 9 | Kristina | 2014-02-17 | Antwerp, Flanders, Belgium | within an hour | 1 |

Now we can use the pandas drop function to remove a few rows. To remove one or more rows from a dataframe, we pass a list of the index labels of the rows to be removed. The argument axis=0 specifies that drop is being applied to rows.
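
The original call is missing from this copy. Since the output that follows starts at index 4, the sketch below assumes that labels 0 to 3 were dropped. It uses the first six hosts from the table above.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke",
                  "Francis", "Kristof", "Liesbet"],
    "host_listings_count": [1, 7, 1, 2, 1, 1],
})

# Assumption: the labels 0-3 are inferred from the output, not from the post.
# axis=0 (the default) means the labels refer to the row index.
remaining = hosts.drop([0, 1, 2, 3], axis=0)

print(remaining)
```

Note that the surviving rows keep their original labels (4, 5, ...). Call reset_index(drop=True) afterwards if you want the index to start again from 0.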

| | host_name | host_since | host_location | host_response_time | host_listings_count |
|---|---|---|---|---|---|
| 4 | Kristof | 2012-10-13 | Antwerpen, Flanders, Belgium | within an hour | 1 |
| 5 | Liesbet | 2011-01-29 | Antwerp, Flemish Region, Belgium | within an hour | 1 |
| 6 | Chloé | 2014-03-15 | Antwerp, Flanders, Belgium | within a day | 1 |
| 7 | Klara | 2014-05-22 | Antwerp, Flanders, Belgium | within an hour | 1 |
| 8 | Maarten | 2012-07-24 | Antwerp, Flanders, Belgium | within a few hours | 1 |
| 9 | Kristina | 2014-02-17 | Antwerp, Flanders, Belgium | within an hour | 1 |
| 10 | Fred | 2014-01-24 | Antwerp, Flanders, Belgium | within an hour | 2 |
| 11 | Francis | 2012-04-17 | Antwerpen, Flemish Region, Belgium | within an hour | 2 |
| 12 | Katrien | 2011-03-09 | Antwerpen, Flemish Region, Belgium | NaN | 1 |
| 13 | Alexandra | 2012-05-05 | Antwerp/London | within a few hours | 1 |

Dropping rows based on index range
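
The code for this section is missing from this copy. One common way to drop a range of rows, offered here as a sketch rather than the author's exact approach, is to slice the index and pass the slice to drop. The positions 2 to 4 are chosen arbitrarily for illustration.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke",
                  "Francis", "Kristof", "Liesbet"],
    "host_listings_count": [1, 7, 1, 2, 1, 1],
})

# hosts.index[2:5] picks the labels at positions 2, 3 and 4
# (the end of the slice is exclusive), and drop() removes those rows.
remaining = hosts.drop(hosts.index[2:5])

print(remaining)
```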

Removing top x rows from dataframe
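
The code for this step is also missing here. The output that follows starts at index 5, so the sketch below assumes x = 5 and shows two equivalent ways to remove the first x rows.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke", "Francis",
                  "Kristof", "Liesbet", "Chloé"],
    "host_listings_count": [1, 7, 1, 2, 1, 1, 1],
})

x = 5  # assumption: inferred from the output, which starts at index 5

# Option 1: drop the labels of the first x rows
without_top = hosts.drop(hosts.index[:x])

# Option 2: keep everything from position x onwards
also_without_top = hosts.iloc[x:]

print(without_top)
```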

| | host_name | host_since | host_location | host_response_time | host_listings_count |
|---|---|---|---|---|---|
| 5 | Liesbet | 2011-01-29 | Antwerp, Flemish Region, Belgium | within an hour | 1 |
| 6 | Chloé | 2014-03-15 | Antwerp, Flanders, Belgium | within a day | 1 |
| 7 | Klara | 2014-05-22 | Antwerp, Flanders, Belgium | within an hour | 1 |
| 8 | Maarten | 2012-07-24 | Antwerp, Flanders, Belgium | within a few hours | 1 |
| 9 | Kristina | 2014-02-17 | Antwerp, Flanders, Belgium | within an hour | 1 |
| 10 | Fred | 2014-01-24 | Antwerp, Flanders, Belgium | within an hour | 2 |
| 11 | Francis | 2012-04-17 | Antwerpen, Flemish Region, Belgium | within an hour | 2 |
| 12 | Katrien | 2011-03-09 | Antwerpen, Flemish Region, Belgium | NaN | 1 |
| 13 | Alexandra | 2012-05-05 | Antwerp/London | within a few hours | 1 |
| 14 | Koosje | 2013-01-08 | Antwerpen, Flanders, Belgium | within a few hours | 1 |

Removing bottom x rows from dataframe
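
The code for this last step is missing from this copy as well. A sketch of one way to do it, assuming x = 2 for illustration, is to take the labels of the last x rows with tail and pass them to drop.

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["Maarten", "Fronk & Lieve", "Elke",
                  "Francis", "Kristof", "Liesbet"],
    "host_listings_count": [1, 7, 1, 2, 1, 1],
})

x = 2  # illustrative value

# tail(x) returns the last x rows; dropping their labels removes them.
without_bottom = hosts.drop(hosts.tail(x).index)

print(without_bottom)
```

This form also behaves sensibly for x = 0, where nothing is dropped. The shorter hosts.iloc[:-x] would return an empty dataframe in that case.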

So finally, we have learned how to use the pandas drop and dropna functions to remove columns and rows.

In case you have any queries regarding this post, reach out to me. You can ask your questions in the comments section below and stay tuned…!!

 
