Data analysts often use the pandas describe method to get a high-level summary of a dataframe. It is a quick way to understand the distribution of the data in each column.
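As a minimal sketch of what that summary looks like, the snippet below calls describe on a small made-up dataframe (the column names and values are illustrative assumptions, not data from the post).

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "city": ["Pune", "Oslo", "Pune", "Lima"],
})

# By default only numeric columns are summarised
# (count, mean, std, min, quartiles, max).
summary = df.describe()

# include="all" also summarises non-numeric columns
# (count, unique, top, freq).
full_summary = df.describe(include="all")

print(summary)
print(full_summary)
```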
In this post, we will focus on the features for sorting a pandas dataframe. Pandas is one of the most widely used Python libraries for data analysis, mainly because of its rich set of functionality.
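For a rough idea of the sorting features involved, here is a small sketch using sort_values and sort_index on an invented dataframe (the names, departments and salaries are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "dept": ["x", "y", "x", "y"],
    "salary": [50, 70, 60, 40],
})

# Sort by one column, highest salary first.
by_salary = df.sort_values("salary", ascending=False)

# Sort by several columns, each with its own direction.
by_dept_salary = df.sort_values(["dept", "salary"], ascending=[True, False])

# Restore the original row order by sorting on the index.
by_index = by_salary.sort_index()

print(by_dept_salary)
```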
A pandas series is a one-dimensional array with labels, typically backed by a numpy array. A series can hold data of any datatype (e.g. integer, string, float, datetime, etc.). The labels are called the index, and they can also be of any datatype.
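To make the idea of a labeled array concrete, the following sketch builds two series with different index types; the values and labels are made up for illustration.

```python
import pandas as pd

# A series with string labels as its index.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s["b"]          # look up by label
by_position = s.iloc[0]    # look up by integer position
values = s.to_numpy()      # the underlying values as a numpy array

# The index can itself hold other datatypes, for example datetimes.
dates = pd.Series(
    ["x", "y"],
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)
second_day = dates.loc[pd.Timestamp("2021-01-02")]

print(by_label, by_position, values, second_day)
```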
This post describes different ways of dropping columns or rows from a pandas dataframe. While performing any data analysis task, you often need to remove certain columns or entire rows that are not relevant. So let’s learn how to remove columns or rows using the pandas drop function.
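A minimal sketch of both cases, assuming a small invented dataframe with a throwaway column named tmp:

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "tmp": [0, 0, 0],
})

# Drop a column by name; drop returns a new dataframe by default.
no_tmp = df.drop(columns=["tmp"])

# Drop a row by its index label.
no_first_row = df.drop(index=[0])

print(no_tmp)
print(no_first_row)
```

Because drop returns a new object, the original df still has all three columns and rows afterwards.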
Pandas time series data manipulation is a must-have skill for any data analyst or engineer. More than 70% of the world’s structured data is time series data, and the pandas library in Python provides powerful functions and APIs for manipulating it. So let’s learn the basics of data wrangling using the pandas time series APIs.
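As one possible starting point, the sketch below builds a daily series with date_range, slices it by date strings, and resamples it into two-day buckets. The dates and values are invented for illustration.

```python
import pandas as pd

# Six consecutive days, made up for illustration.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Slice by date strings; both endpoints are included.
first_three_days = ts.loc["2021-01-01":"2021-01-03"]

# Downsample into 2-day buckets and sum each bucket.
two_day_sum = ts.resample("2D").sum()

print(first_three_days)
print(two_day_sum)
```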
In most data science or data analysis work, the first step is to read a CSV file. The pandas read_csv function is the usual way to load a CSV file into a dataframe. In this post we’ll explore various options of the pandas read_csv function.
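To give a flavour of those options, here is a small sketch that reads semicolon-separated text from an in-memory buffer instead of a file on disk. The CSV content and column names are assumptions made up for this example.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real call would usually pass a file path.
csv_text = "id;name;joined\n1;Ann;2021-01-05\n2;Bob;2021-02-10\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                 # non-default delimiter
    parse_dates=["joined"],  # parse this column as datetimes
    index_col="id",          # use the id column as the index
)

print(df)
print(df.dtypes)
```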
The combineByKey RDD transformation is very similar to a combiner in Hadoop MapReduce programming. In this post, we’ll discuss a Spark combineByKey example in depth and try to understand the importance of this function in detail.
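The snippet below is not Spark code. It is a plain-Python model of the three functions that combineByKey takes (create a combiner, merge a value into it, merge two combiners), assuming partitions are represented as simple lists. The helper name combine_by_key is hypothetical and exists only for this illustration.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Plain-Python model of combineByKey semantics (hypothetical helper)."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value in.
                acc[key] = merge_value(acc[key], value)
            else:
                # First time the key appears in this partition.
                acc[key] = create_combiner(value)
        per_partition.append(acc)

    # Merge the per-partition combiners, as happens after the shuffle.
    result = {}
    for acc in per_partition:
        for key, comb in acc.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result


# Two made-up partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 6)]]

# Track (sum, count) per key so that averages can be derived afterwards.
sums_counts = combine_by_key(
    partitions,
    lambda v: (v, 1),
    lambda c, v: (c[0] + v, c[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)
averages = {k: s / n for k, (s, n) in sums_counts.items()}
print(averages)
```

Like a MapReduce combiner, the first two functions do their work inside each partition, so only the compact (sum, count) pairs need to be merged across partitions.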
Transposing a numpy array is extremely simple using the np.transpose function. Fundamentally, transposing a numpy array only makes sense when the array has two or more dimensions.
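A short sketch of the point about dimensions, using small made-up arrays: a 2-D array swaps its axes, a 1-D array comes back unchanged, and for higher dimensions an explicit axis order can be passed.

```python
import numpy as np

# 2-D case: rows and columns are swapped.
a = np.arange(6).reshape(2, 3)
t = np.transpose(a)

# 1-D case: transposing has no effect on the shape.
v = np.arange(3)
v_t = np.transpose(v)

# 3-D case: choose the new axis order explicitly.
cube = np.zeros((2, 3, 4))
swapped = np.transpose(cube, (1, 0, 2))

print(t.shape, v_t.shape, swapped.shape)
```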
In this post, we’ll learn to create a pandas dataframe from Python lists and dictionary objects. Creating a pandas dataframe is a fairly simple and basic step in data analysis. There are also other ways to create a dataframe (e.g. from CSV or Excel files, or from database queries), but we’ll cover those in other posts.
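As a minimal sketch, the three constructions below build the same dataframe from a list of lists, a dictionary of lists, and a list of dictionaries. The names and ages are invented for illustration.

```python
import pandas as pd

# From a list of rows, with explicit column names.
rows = [["Ann", 31], ["Bob", 27]]
df_from_lists = pd.DataFrame(rows, columns=["name", "age"])

# From a dictionary that maps column names to lists of values.
df_from_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 27]})

# From a list of dictionaries, one per row.
records = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 27}]
df_from_records = pd.DataFrame(records)

print(df_from_dict)
```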
In Python, reshaping a numpy array is often necessary when creating a matrix or tensor from vectors. To reshape a one-dimensional numpy array into n dimensions, one can use the np.reshape() function. Let’s check out some simple examples.
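One such example, sketched with a made-up vector of twelve elements, turns the vector into a matrix and then into a 3-D tensor:

```python
import numpy as np

v = np.arange(12)  # one-dimensional vector with 12 elements

# Vector to 3x4 matrix; the total number of elements must match.
m = np.reshape(v, (3, 4))

# Vector to 3-D tensor; -1 lets numpy infer the last dimension.
t = v.reshape(2, 3, -1)

print(m)
print(t.shape)
```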