Pandas Read CSV file | Loading CSV with pandas read_csv

In most data science and data analysis work, one of the first steps is reading a CSV file. The pandas read_csv function is the usual way to load a CSV file into a pandas dataframe. In this post we’ll explore the various options of the pandas read_csv function.


The pandas read_csv function has various options that help us take care of things like formatting and handling null values. Let’s explore those options step by step.

In my earlier post, we discussed various ways to create dataframes from lists and dictionaries. Assuming our data sources are CSV files, the following are the ways to read those files and create a pandas dataframe.

Load CSV data using default parameters

Let’s say we have some sample CSV files in our /data/ directory. With default parameters, read_csv treats the first line of the file as the column headers.
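Here is a minimal sketch of the default call. The file name in the comment is hypothetical, and the inline text is an assumption reconstructed from the output shown in this section, so the example runs without any file on disk.

```python
import io

import pandas as pd

# Stand-in for a file such as /data/sample.csv (hypothetical name).
# The contents are an assumption, rebuilt from the output in this post.
csv_text = """COUNTRY,Q1,Q2,Q3,Q4
US,10.0,9.5,7.6,12.0
UK,11.2,3.8,6.9,9.0
India,9.6,7.3,8.3,11.0
Singapore,9.0,5.6,6.9,10.0
"""

# Default parameters: comma separator, first line used as column headers.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, you would pass its path as the first argument instead of the StringIO object.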

   COUNTRY    Q1    Q2   Q3   Q4
0  US         10.0  9.5  7.6  12.0
1  UK         11.2  3.8  6.9  9.0
2  India      9.6   7.3  8.3  11.0
3  Singapore  9.0   5.6  6.9  10.0

If we want the first line of the file to be treated as a data row rather than as column headers, we can specify the header=None option. Pandas then numbers the columns starting from 0.
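A small sketch of header=None, assuming a version of the sample file that has no header line (the inline data below is my reconstruction from the output in this section, not the original file).

```python
import io

import pandas as pd

# Assumed contents of a headerless sample file.
csv_text = """US,10.0,9.5,7.6,12.0
UK,11.2,3.8,6.9,9.0
India,9.6,7.3,8.3,11.0
Singapore,9.0,5.6,6.9,10.0
"""

# header=None keeps the first line as data and labels columns 0, 1, 2, ...
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
```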

   0          1     2    3    4
0  US         10.0  9.5  7.6  12.0
1  UK         11.2  3.8  6.9  9.0
2  India      9.6   7.3  8.3  11.0
3  Singapore  9.0   5.6  6.9  10.0

Specify column names while loading CSV files

If the CSV file has no header row, we can pass a list of column names to the names parameter.

With this option, the pandas read_csv method treats the first line of the CSV as a data row rather than as column headers.
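As a rough illustration, the names parameter can be used like this. The inline data is again an assumed headerless file, and the column names are taken from the output in this post.

```python
import io

import pandas as pd

# Assumed contents of a CSV file without a header line.
csv_text = """US,10.0,9.5,7.6,12.0
UK,11.2,3.8,6.9,9.0
India,9.6,7.3,8.3,11.0
Singapore,9.0,5.6,6.9,10.0
"""

# Supplying names means no line of the file is consumed as a header.
column_names = ["COUNTRY", "Q1", "Q2", "Q3", "Q4"]
df = pd.read_csv(io.StringIO(csv_text), names=column_names)
print(df)
```

If the file did have its own header line, you would also pass header=0 so that line is replaced rather than read as data.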

   COUNTRY    Q1    Q2   Q3   Q4
0  US         10.0  9.5  7.6  12.0
1  UK         11.2  3.8  6.9  9.0
2  India      9.6   7.3  8.3  11.0
3  Singapore  9.0   5.6  6.9  10.0

Setting a column as an index with pandas read_csv
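The index_col option of read_csv lets us use one of the columns as the dataframe index instead of the default 0, 1, 2, ... numbering. A minimal sketch, assuming the same sample data as before (reconstructed inline, not the original file):

```python
import io

import pandas as pd

# Assumed contents of the sample file used earlier in this post.
csv_text = """COUNTRY,Q1,Q2,Q3,Q4
US,10.0,9.5,7.6,12.0
UK,11.2,3.8,6.9,9.0
India,9.6,7.3,8.3,11.0
Singapore,9.0,5.6,6.9,10.0
"""

# index_col accepts a column name or a column position (here, 0 would also work).
df = pd.read_csv(io.StringIO(csv_text), index_col="COUNTRY")
print(df)

# Rows can now be looked up by country.
print(df.loc["India", "Q3"])
```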

           Q1    Q2   Q3   Q4
COUNTRY
US         10.0  9.5  7.6  12.0
UK         11.2  3.8  6.9  9.0
India      9.6   7.3  8.3  11.0
Singapore  9.0   5.6  6.9  10.0

Using a different separator while loading CSV file

Often we end up with CSV files that use some other separator, such as ‘;’ or ‘|’ or another special character. The pandas read_csv method provides a delimiter option (an alias for sep) to specify the separator. Let’s say we have CO2 emission data separated by ‘;’. We can load our CSV file in the following way.
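A short sketch of loading semicolon-separated data. The inline text is an assumption reconstructed from the output in this section; the original file name is not given in the post.

```python
import io

import pandas as pd

# Assumed contents of the semicolon-separated emission file.
csv_text = """Industry;Share
Agriculture & mining;10.5
Manufacturing and construction;17.0
Energy and water supply and waste treatment;25.9
Households;22.2
Services;12.4
Transport sector;12.0
"""

# delimiter=";" tells pandas to split fields on semicolons instead of commas.
df = pd.read_csv(io.StringIO(csv_text), delimiter=";")
print(df)
```

Passing sep=";" instead would behave the same way.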

   Industry                                     Share
0  Agriculture & mining                         10.5
1  Manufacturing and construction               17.0
2  Energy and water supply and waste treatment  25.9
3  Households                                   22.2
4  Services                                     12.4
5  Transport sector                             12.0

Parsing dates while loading CSV file in pandas

Sometimes a CSV file contains date fields. When we load it with default options, the dtype of the date columns remains object. If we want our date columns to be parsed as dates, we can use the parse_dates option of the read_csv method.

Let’s understand this using a TSV file, jira-issues.tsv.
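Here is a sketch of loading the file with default date handling and checking the dtypes. The rows are an assumption rebuilt from the output later in this section, and sep="\t" is used because the file is tab-separated.

```python
import io

import pandas as pd

# Assumed contents of jira-issues.tsv, rebuilt from the output in this post.
rows = [
    ["ISSUE", "PRIORITY", "STATUS", "OPEN_DATE", "CLOSE_DATE"],
    ["PD-1023", "High", "Open", "2018-01-03", ""],
    ["PD-1162", "High", "Closed", "2018-02-05", "2018-02-07"],
    ["PD-1231", "Medium", "Closed", "2018-02-27", "2018-03-02"],
    ["PD-1345", "Low", "Open", "2018-03-12", ""],
]
tsv_text = "\n".join("\t".join(row) for row in rows)

# Without parse_dates, the date columns are loaded as plain text.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df.dtypes)
```

Depending on your pandas version, text columns may be reported with a string dtype rather than object, but either way they are not datetimes yet.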

ISSUE         object
PRIORITY      object
STATUS        object
OPEN_DATE     object
CLOSE_DATE    object
dtype: object

In the result above, the dtype of the OPEN_DATE and CLOSE_DATE columns is object.

Now let’s load it with the parse_dates option.
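A minimal sketch of the same load with parse_dates, which takes a list of the columns to convert. As before, the inline rows are an assumed stand-in for jira-issues.tsv.

```python
import io

import pandas as pd

# Assumed contents of jira-issues.tsv, rebuilt from the output in this post.
rows = [
    ["ISSUE", "PRIORITY", "STATUS", "OPEN_DATE", "CLOSE_DATE"],
    ["PD-1023", "High", "Open", "2018-01-03", ""],
    ["PD-1162", "High", "Closed", "2018-02-05", "2018-02-07"],
    ["PD-1231", "Medium", "Closed", "2018-02-27", "2018-03-02"],
    ["PD-1345", "Low", "Open", "2018-03-12", ""],
]
tsv_text = "\n".join("\t".join(row) for row in rows)

# parse_dates converts the listed columns to datetimes while loading.
df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    parse_dates=["OPEN_DATE", "CLOSE_DATE"],
)
print(df.dtypes)
print(df)
```

Empty date fields come through as NaT, the datetime equivalent of NaN.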

ISSUE                 object
PRIORITY              object
STATUS                object
OPEN_DATE     datetime64[ns]
CLOSE_DATE    datetime64[ns]
dtype: object

   ISSUE    PRIORITY  STATUS  OPEN_DATE   CLOSE_DATE
0  PD-1023  High      Open    2018-01-03  NaT
1  PD-1162  High      Closed  2018-02-05  2018-02-07
2  PD-1231  Medium    Closed  2018-02-27  2018-03-02
3  PD-1345  Low       Open    2018-03-12  NaT

Read empty values as blank string instead of NaN

If our CSV data source has empty values in it and we want them loaded as blank strings instead of NaN, we can use the keep_default_na option. Let’s have another look at our previous dataframe.
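A sketch of keep_default_na=False on the same assumed stand-in for jira-issues.tsv. I have left out parse_dates here so that the empty CLOSE_DATE fields stay as plain blank strings.

```python
import io

import pandas as pd

# Assumed contents of jira-issues.tsv, rebuilt from the output in this post.
rows = [
    ["ISSUE", "PRIORITY", "STATUS", "OPEN_DATE", "CLOSE_DATE"],
    ["PD-1023", "High", "Open", "2018-01-03", ""],
    ["PD-1162", "High", "Closed", "2018-02-05", "2018-02-07"],
    ["PD-1231", "Medium", "Closed", "2018-02-27", "2018-03-02"],
    ["PD-1345", "Low", "Open", "2018-03-12", ""],
]
tsv_text = "\n".join("\t".join(row) for row in rows)

# keep_default_na=False stops pandas from turning empty fields into NaN.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", keep_default_na=False)
print(df)
```

Keep in mind that this switches off all of the default NaN markers (such as "NA" and "null"), not only empty fields.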

   ISSUE    PRIORITY  STATUS  OPEN_DATE   CLOSE_DATE
0  PD-1023  High      Open    2018-01-03
1  PD-1162  High      Closed  2018-02-05  2018-02-07
2  PD-1231  Medium    Closed  2018-02-27  2018-03-02
3  PD-1345  Low       Open    2018-03-12

Skip rows while reading csv file

Let’s say we need to skip the first two rows while reading a CSV file in pandas. The skiprows option does exactly this. Note that the count includes the header line, so the first line that remains becomes the new header.
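A minimal sketch of skiprows, assuming the same semicolon-separated emission data as earlier (reconstructed inline). The second call is an extra illustration that is not from the original example: it passes a list of line numbers so the header is kept.

```python
import io

import pandas as pd

# Assumed contents of the semicolon-separated emission file.
csv_text = """Industry;Share
Agriculture & mining;10.5
Manufacturing and construction;17.0
Energy and water supply and waste treatment;25.9
Households;22.2
Services;12.4
Transport sector;12.0
"""

# skiprows=2 drops the first two lines of the file, header included,
# so the "Manufacturing and construction" line is read as the header.
df = pd.read_csv(io.StringIO(csv_text), delimiter=";", skiprows=2)
print(df)

# A list skips specific 0-indexed lines; line 0 (the header) is kept here.
df_keep_header = pd.read_csv(
    io.StringIO(csv_text), delimiter=";", skiprows=[1, 2]
)
print(df_keep_header)
```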

   Manufacturing and construction               17.0
0  Energy and water supply and waste treatment  25.9
1  Households                                   22.2
2  Services                                     12.4
3  Transport sector                             12.0

That’s it for now. I hope you enjoyed this post. I’ll cover some more topics on pandas dataframes in my upcoming posts.

Till then, stay tuned!

 
