Pandas Time Series Data Manipulation

Time series manipulation with pandas is a must-have skill for any data analyst or engineer. A large share of the structured data they work with is time-stamped, and the pandas library for Python provides powerful APIs for handling it. So let's learn the basics of data wrangling with the pandas time series APIs.

For this walkthrough, we use temperature data for Mountain View, CA, with timestamps from 2018-10-25 to 2018-10-31. You can download it from this link.

By the end of this post, you will know how to:

  • Create a pandas time series dataframe
  • Load a time series CSV file
  • Get summary statistics
  • Select data from a pandas time series dataframe
  • Slice a pandas time series dataframe
  • Compute aggregations on a pandas time series dataframe
  • Plot your time series data

Creating pandas time series dataframe

We can create a time series manually with a pandas Series or DataFrame indexed by timestamps. Let's generate random hourly data points; the sample output below starts at 2018-11-01.

                     temperature
2018-11-01 00:00:00           15
2018-11-01 01:00:00           10
2018-11-01 02:00:00           17
2018-11-01 03:00:00           16
2018-11-01 04:00:00           13
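
Output like the table above could come from something along these lines. This is a minimal sketch, not the original code: the one-week length, the seed, and the 10 to 19 value range are assumptions picked for illustration, so the numbers you get will differ from the ones shown.

```python
import numpy as np
import pandas as pd

# One week of hourly timestamps; the start date matches the sample output.
index = pd.date_range(start="2018-11-01", periods=24 * 7, freq="h")

# Seeded generator so repeated runs give the same values. The 10-19 range
# is an arbitrary choice for this sketch.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    {"temperature": rng.integers(10, 20, size=len(index))},
    index=index,
)

print(df.head())
```

Passing the timestamps as the index is what makes this a time series dataframe: pandas stores them in a DatetimeIndex, which enables the date-based selection and resampling used later in the post.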

Load time series CSV file

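Another way is to read the time series from a CSV file. Older code used the Series.from_csv method for this, but it was deprecated in pandas 0.21 and removed in pandas 1.0, so avoid it in new code.

Instead, read the file into a dataframe with pandas.read_csv, parse the datetime column as dates, and set it as the index.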

                     temperature
datetime
2018-10-25 03:40:00         15.0
2018-10-25 04:04:00         15.0
2018-10-25 04:40:00         15.0
2018-10-25 05:40:00         14.0
2018-10-25 06:04:00         14.0
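
Here is a small sketch of that read_csv call. The file itself is not reproduced in this post, so the example reads the five rows shown above from an in-memory string, and the two-column layout (datetime, temperature) is an assumption based on that output.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: the rows shown above, in an assumed
# two-column CSV layout.
csv_text = """datetime,temperature
2018-10-25 03:40:00,15.0
2018-10-25 04:04:00,15.0
2018-10-25 04:40:00,15.0
2018-10-25 05:40:00,14.0
2018-10-25 06:04:00,14.0
"""

# parse_dates converts the column to datetime values and index_col makes it
# the index. For a real file, pass its path instead of the StringIO buffer.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["datetime"],
    index_col="datetime",
)

print(df.head())
print(type(df.index).__name__)
```

The second print is a quick check that the index really is a DatetimeIndex. If it comes back as a plain Index of strings, the date parsing did not happen and the date-based selection later in the post will not work.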

Calculate summary statistics

We can get summary statistics with the describe() method, which is available on both DataFrame and Series.
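
As a quick illustration, the sketch below calls describe() on a tiny dataframe built from the first five readings of our dataset. The variable names are just for this example.

```python
import pandas as pd

# First five readings from the dataset, rebuilt by hand for a
# self-contained example.
index = pd.DatetimeIndex(
    [
        "2018-10-25 03:40:00",
        "2018-10-25 04:04:00",
        "2018-10-25 04:40:00",
        "2018-10-25 05:40:00",
        "2018-10-25 06:04:00",
    ],
    name="datetime",
)
df = pd.DataFrame({"temperature": [15.0, 15.0, 15.0, 14.0, 14.0]}, index=index)

# On a DataFrame, describe() returns one column of statistics per numeric column.
summary_df = df.describe()

# On a Series, it returns the same statistics as a Series.
summary_series = df["temperature"].describe()

print(summary_df)
print(summary_series)
```

Both results contain the count, mean, standard deviation, minimum, quartiles, and maximum.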

Data Selection from pandas time series dataframe

Suppose we have a weather dataframe whose datetime index is named 'datetime'. We can select the data for any year, month, or day by passing a partial date string to .loc.

                     temperature
datetime
2018-10-31 00:40:00         12.8
2018-10-31 01:00:00         12.2
2018-10-31 01:40:00         12.2
2018-10-31 02:00:00         12.0
2018-10-31 02:40:00         12.0
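
A selection like the one above can be sketched as follows. The dataframe here is a six-row sample rebuilt from readings shown in this post, not the full dataset, and the variable names are only for illustration.

```python
import pandas as pd

# Six readings from the dataset: three on 30 Oct and three on 31 Oct.
index = pd.DatetimeIndex(
    [
        "2018-10-30 16:47:00",
        "2018-10-30 16:56:00",
        "2018-10-30 17:59:00",
        "2018-10-31 00:40:00",
        "2018-10-31 01:00:00",
        "2018-10-31 01:40:00",
    ],
    name="datetime",
)
weather = pd.DataFrame(
    {"temperature": [25.0, 25.6, 23.3, 12.8, 12.2, 12.2]}, index=index
)

# A partial date string selects every row that falls inside that period.
one_day = weather.loc["2018-10-31"]   # the 3 rows from 31 Oct
one_month = weather.loc["2018-10"]    # all 6 rows (October 2018)
one_year = weather.loc["2018"]        # all 6 rows (2018)

print(one_day)
```

Using .loc rather than plain square brackets is the safer choice here, since recent pandas versions no longer accept a date string in `weather["2018-10-31"]` for row selection.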

Data slicing with time series dataframe

Get temperature data from 4 PM to 6 PM on 30 Oct 2018:

                     temperature
datetime
2018-10-30 16:47:00         25.0
2018-10-30 16:56:00         25.6
2018-10-30 17:59:00         23.3
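
One way to get this result is a .loc slice between two timestamp strings. The sketch below uses a six-row sample rebuilt from readings in this post. Note that it assumes the index is sorted, which slicing by label needs in order to behave predictably.

```python
import pandas as pd

# Six readings from the dataset, in chronological order.
index = pd.DatetimeIndex(
    [
        "2018-10-30 16:47:00",
        "2018-10-30 16:56:00",
        "2018-10-30 17:59:00",
        "2018-10-31 00:40:00",
        "2018-10-31 01:00:00",
        "2018-10-31 01:40:00",
    ],
    name="datetime",
)
weather = pd.DataFrame(
    {"temperature": [25.0, 25.6, 23.3, 12.8, 12.2, 12.2]}, index=index
)

# Slice between two timestamps; the slice keeps rows at or between the bounds.
afternoon = weather.loc["2018-10-30 16:00":"2018-10-30 18:00"]

print(afternoon)
```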

Get temperature data from 28 Oct 2018 to 28 Oct 2018. Both endpoints of a date-string slice are inclusive, so this returns every reading from that day:

                     temperature
datetime
2018-10-28 00:40:00         17.0
2018-10-28 01:08:00         17.0
2018-10-28 01:40:00         17.0
2018-10-28 02:00:00         17.0
2018-10-28 02:40:00         17.0
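
Slicing by whole dates works the same way as slicing by timestamps. The sketch below uses a five-row sample rebuilt from readings in this post, and the second slice is an extra illustration that is not in the original output.

```python
import pandas as pd

# Five readings from the dataset: one on 25 Oct, three on 28 Oct, one on 31 Oct.
index = pd.DatetimeIndex(
    [
        "2018-10-25 03:40:00",
        "2018-10-28 00:40:00",
        "2018-10-28 01:08:00",
        "2018-10-28 01:40:00",
        "2018-10-31 00:40:00",
    ],
    name="datetime",
)
weather = pd.DataFrame(
    {"temperature": [15.0, 17.0, 17.0, 17.0, 12.8]}, index=index
)

# Same start and end date: the end date covers the whole day, so we get
# every reading from 28 Oct.
single_day = weather.loc["2018-10-28":"2018-10-28"]

# A wider range, from the start of 25 Oct through the end of 28 Oct.
four_days = weather.loc["2018-10-25":"2018-10-28"]

print(single_day)
print(four_days)
```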

Aggregation on time series dataframe

You can always resample a time series dataframe to a coarser granularity and aggregate the values. Here we resample our existing dataframe to the daily level and compute the minimum, maximum, and average temperature for each day.

     datetime   min   max        avg
0  2018-10-25  14.0  27.0  18.411765
1  2018-10-26  13.0  28.0  18.216216
2  2018-10-27  13.0  28.0  18.550000
3  2018-10-28  16.0  24.0  18.767442
4  2018-10-29  13.0  23.3  16.819048
5  2018-10-30  11.7  25.6  16.397619
6  2018-10-31  10.6  26.7  17.402500
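
A table in this shape can be built with resample() followed by agg(). The sketch below is one possible way to do it, not necessarily how the table above was produced, and the five readings are made-up values so that the daily results are easy to check by hand.

```python
import pandas as pd

# Made-up readings for illustration: three on 25 Oct, two on 26 Oct.
index = pd.DatetimeIndex(
    [
        "2018-10-25 03:40:00",
        "2018-10-25 14:00:00",
        "2018-10-25 21:00:00",
        "2018-10-26 05:00:00",
        "2018-10-26 15:00:00",
    ],
    name="datetime",
)
weather = pd.DataFrame(
    {"temperature": [14.0, 27.0, 19.0, 13.0, 28.0]}, index=index
)

daily = (
    weather["temperature"]
    .resample("D")                     # group the readings into calendar days
    .agg(["min", "max", "mean"])       # one column per aggregate
    .rename(columns={"mean": "avg"})   # match the column name used above
    .reset_index()                     # turn the datetime index into a column
)

print(daily)
```

Swapping "D" for another frequency string, such as "W" for weekly, changes the granularity without touching the rest of the chain.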

Plot your time series data

During data exploration, plotting the time series is a critical step for checking trend and seasonality. The following example plots two days of temperature data.

[Figure: pandas time series plot]
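
A plot like this can be drawn with the plot() method that pandas provides on Series and DataFrame, which uses matplotlib under the hood. The sketch below is an assumption-heavy stand-in: it plots synthetic hourly values with a daily cycle rather than the real readings, and the title and axis labels are placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs anywhere; drop this in a notebook

import numpy as np
import pandas as pd

# Synthetic stand-in data: 48 hourly values following a smooth daily cycle.
index = pd.date_range("2018-10-30", periods=48, freq="h", name="datetime")
hours = index.hour.to_numpy()
temperature = 18 + 6 * np.sin((hours - 9) * 2 * np.pi / 24)
weather = pd.DataFrame({"temperature": temperature}, index=index)

# Select the two days to plot, then draw a line chart against the datetime index.
two_days = weather.loc["2018-10-30":"2018-10-31"]
ax = two_days["temperature"].plot(figsize=(10, 4), title="Temperature, 30-31 Oct 2018")
ax.set_xlabel("datetime")
ax.set_ylabel("temperature")

fig = ax.get_figure()
fig.tight_layout()
# In an interactive session, call matplotlib.pyplot.show() to display the figure.
```

Because the index is a DatetimeIndex, pandas formats the x-axis ticks as dates and times automatically.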

In this post, we learned how to work with time series data using the pandas library.

If you have any questions about time series data analysis in Python, reach out to me or ask in the comments section below. Stay tuned!
