Pandas Series: Basic Understanding | First Step Towards Data Analysis

A pandas series is a one-dimensional NumPy array with labels. A pandas series can hold data of any datatype (e.g. integer, string, float, datetime). The labels of this array are called the index, and they can also be of any datatype.

In this post we will look at the details of the pandas series and how multiple such series form a dataframe.

Before that, let's start with the NumPy array. In its simplest form, a NumPy array is a 1D array which can hold data of any datatype, with all elements sharing that one datatype.
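As a quick illustration, here is a minimal sketch of a 1D NumPy array. The variable name `arr` and the values are made up for this example.

```python
import numpy as np

# Build a 1D array from a Python list; NumPy picks one dtype for all elements
arr = np.array([10, 20, 30, 40])

print(arr.ndim)   # number of dimensions
print(arr.shape)  # length along each dimension
print(arr[0])     # elements are reached by position; there are no labels
```

Note that the only way to reach an element here is by its position, which is the gap the pandas series fills.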

How is a pandas series different from a list or a NumPy array?

A pandas series can be pictured as a dictionary where each key is an index label and each value is one of the array elements. So we can retrieve a value by its numeric index (the default) and also by non-numeric labels such as dates or strings (if they are specified as the index).

Hence we can say that,

Dictionary + NumPy Array = Pandas Series
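The idea above can be sketched roughly as follows. The labels `"a"`, `"b"`, `"c"` and the values are hypothetical and only serve to show both behaviours side by side.

```python
import pandas as pd

# Hypothetical data: labels act like dictionary keys, values live in an array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Dictionary-like: look a value up by its label
print(s["b"])

# Array-like: look a value up by its position
print(s.iloc[1])

# Array-like: arithmetic is applied to every element at once
doubled = s * 2
print(doubled["c"])
```

Label lookup behaves like a dictionary, while `.iloc` and element-wise arithmetic behave like a NumPy array.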

Creating pandas series

Create a series with lists

Let’s create a simple series by taking a list as input.
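A minimal sketch of this step could look like the following. The list values are illustrative and not taken from the original post.

```python
import pandas as pd

# A plain Python list of integers (illustrative values)
data = [1, 2, 3, 4, 5]

# With no index given, pandas assigns default integer labels 0..n-1
s = pd.Series(data)
print(s)

print(list(s.index))               # the default labels
print(len(s.index) == len(data))   # index and data have the same length
print(s.dtype)                     # an integer dtype, typically int64
```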

In the above example,

  • The pandas series has default indexing, which is the same as array indexing (0, 1, 2, ...)
  • The length of the index is the same as the length of the data
  • The datatype (dtype) of the series is int64 because we used a list of integers

If we want to create a series with custom labels in pandas, we simply need to pass another list of labels in the same order as the values.
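As a sketch, assuming three made-up labels, this could be written as below. The second series shows what happens when a single value is passed instead of a list.

```python
import pandas as pd

labels = ["a", "b", "c"]  # hypothetical labels

# Values and labels are paired by position
s = pd.Series([10, 20, 30], index=labels)
print(s)

# A single (scalar) value is repeated for every label
repeated = pd.Series(5, index=labels)
print(repeated)
```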

Here we can see that,

  • The pandas series uses the labels as its index
  • If a single value is given, it is repeated for each label

Create a series using numpy array

Similarly, we can use a NumPy array in place of the list from the above example.
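A small sketch of the same idea with a NumPy array as input. The values and labels are again illustrative.

```python
import numpy as np
import pandas as pd

# A 1D NumPy array of floats (illustrative values)
arr = np.array([1.5, 2.5, 3.5])

# The array is passed exactly where the list was passed before
s = pd.Series(arr, index=["x", "y", "z"])
print(s)
print(s.dtype)  # the dtype is carried over from the array
```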

Create a series using dictionary

Generally, indexes and values are related to each other. Rather than passing them separately, we can simply combine them in a dictionary and create a series from it.
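For instance, with a hypothetical dictionary of subject marks, a sketch might look like this. The dictionary keys become the index and the dictionary values become the data.

```python
import pandas as pd

# Hypothetical mapping of label -> value
marks = {"maths": 90, "physics": 85, "chemistry": 78}

# Keys become the index, values become the data
s = pd.Series(marks)
print(s)
print(s["physics"])
```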

Using indexes with pandas series

Let’s create two series:

  • Automotive Series: the most valuable brands within the automotive sector worldwide as of 2018, by brand value (in billion U.S. dollars)
  • Expected Growth Series: the expected growth of brand value in 2019 (in billion U.S. dollars)
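The two series could be set up roughly as below. The numbers are made-up placeholders, not the real brand values or forecasts, and "Brand A" and "Brand B" are hypothetical names standing in for brands the text does not list. The variable names are also assumptions for this sketch.

```python
import pandas as pd

# Placeholder numbers, NOT the real 2018 brand values (billion U.S. dollars)
automotive = pd.Series(
    {"Brand A": 30.0, "Brand B": 25.0, "Maruti Suzuki": 6.0, "Tesla": 9.0},
    name="brand_value_2018",
)

# Placeholder numbers; no forecast is included for Maruti Suzuki or Tesla
expected_growth = pd.Series(
    {"Brand A": 2.0, "Brand B": 1.5},
    name="expected_growth_2019",
)

print(automotive)
print(expected_growth)
```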

 

Now we can perform a math operation on the above two series and derive a new series. So let’s derive a new series for the potential brand value as of 2019 by adding the two series.
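A minimal sketch of this addition is shown below. It rebuilds both series with made-up placeholder numbers (not real brand values), and "Brand A" and "Brand B" are hypothetical names, so only the behaviour matters here, not the figures.

```python
import pandas as pd

# Placeholder numbers, NOT real data (billion U.S. dollars)
automotive = pd.Series(
    {"Brand A": 30.0, "Brand B": 25.0, "Maruti Suzuki": 6.0, "Tesla": 9.0}
)
# No forecast entries for Maruti Suzuki or Tesla
expected_growth = pd.Series({"Brand A": 2.0, "Brand B": 1.5})

# Values are added label by label, not position by position
potential_2019 = automotive + expected_growth
print(potential_2019)

# Labels present in only one of the two series end up as NaN
print(potential_2019.isna())
```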

We can see in the above result that the new pandas series is the sum of the two series, matched label by label on their indexes.

For some of the indexes (namely Maruti Suzuki and Tesla), the forecast was not available in the Expected Growth Series. So in the final series we have a NaN value corresponding to those indexes.

I hope this post has helped you get started with a basic understanding of the pandas series. Let me know your feedback, and stay tuned for more posts on data analysis using pandas.

 
