Pandas – Series and Dataframes

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Pandas - Series

Data Structures
‘Pandas’ is a very powerful package in Python. It introduces two data structures (Series and DataFrames) to Python, and you will see why it is so widely used once you start working with it. One clarification before we continue: you will see a lot of pictures of pandas, the animal, on the internet when people talk about the Pandas library. The name actually comes from ‘panel data’ (Pan + Da), a term for multidimensional structured data sets.

Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). You may think of it as a list or a single column in a table. Each item in the Series is assigned an index label. By default, the labels run from 0 to N, where N is the length of the Series minus one.
This is a basic example of a Pandas Series:

my_series = pd.Series([6, 'LearnDataInsight', 3.14, -1789710578, 'testing testing!'])

As you can see, there is ‘pd’ in front of ‘Series’. This is because we imported Pandas as ‘pd’ (import pandas as pd). If we had imported Pandas as ‘xyz’, we would write ‘xyz’ in front of Series instead.
Print the variable my_series to see its values together with their default index labels:
print(my_series)
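
To make the default labelling concrete, here is a minimal sketch that inspects the index of the series above. It assumes nothing beyond the my_series example already shown; the comments note what each line is expected to print.

```python
import pandas as pd

my_series = pd.Series([6, 'LearnDataInsight', 3.14, -1789710578, 'testing testing!'])

# The default index labels run from 0 to len(my_series) - 1.
print(list(my_series.index))  # [0, 1, 2, 3, 4]

# Because the values have mixed types, pandas stores them with the generic 'object' dtype.
print(my_series.dtype)        # object

# An item is looked up by its index label.
print(my_series[1])           # LearnDataInsight
```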

Let’s give the series a custom index:
my_new_series = pd.Series([6, 'LearnDataInsight', 3.14, -1789710578, 'testing testing!'], index=['a', 'b', 'c', 'f', 'm'])
Let’s print out the new series with its index:
print(my_new_series)
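
Once a series has its own labels, items can be fetched either by label or by position. The sketch below is one way to do that, using the standard .loc (label-based) and .iloc (position-based) accessors on the series defined above.

```python
import pandas as pd

my_new_series = pd.Series(
    [6, 'LearnDataInsight', 3.14, -1789710578, 'testing testing!'],
    index=['a', 'b', 'c', 'f', 'm'],
)

# .loc selects by index label.
print(my_new_series.loc['b'])    # LearnDataInsight

# .iloc selects by integer position, regardless of the labels.
print(my_new_series.iloc[2])     # 3.14

# Passing a list of labels returns a smaller Series with just those items.
subset = my_new_series.loc[['a', 'm']]
print(list(subset.index))        # ['a', 'm']
```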

Pandas - Series - try it out



import pandas as pd
my_series = pd.Series([6, 'LearnDataInsight', 3.14, -1789710578, 'testing testing!'])
print('my_series, without index')
print(my_series)
my_new_series = pd.Series([6, 'LearnDataInsight', 3.14, -1789710578, 'testing testing!'], index=['a', 'b', 'c', 'f', 'm'])
print('======================')
print('my_new_series, with index')
print(my_new_series)

 

Pandas - DataFrame

A DataFrame is a tabular data structure made up of rows and columns, akin to a spreadsheet or database table. You can also think of a DataFrame as a group of Series objects that share an index: each Series is one column, identified by its column name, and the shared index labels the rows.
Let’s create a dataframe, called ‘df’, with the default index:
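
The "group of Series" view can be sketched directly by building a DataFrame from Series objects. The names apples, pears and fruit, and the values in them, are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Two hypothetical Series that use the same row labels.
apples = pd.Series([3, 5, 2], index=['mon', 'tue', 'wed'])
pears = pd.Series([1, 0, 4], index=['mon', 'tue', 'wed'])

# Each dictionary key becomes a column name; the shared labels become the row index.
fruit = pd.DataFrame({'apples': apples, 'pears': pears})
print(fruit)

print(list(fruit.index))    # ['mon', 'tue', 'wed']
print(list(fruit.columns))  # ['apples', 'pears']
```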

import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
First we import the library. We then make a list called ‘data’ and pass it to pd.DataFrame, storing the result in the variable ‘df’. Check the output of df by using print(df).
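
As a quick check of what pandas fills in when no labels are supplied, the following sketch inspects the row index, column labels and shape of the df created above.

```python
import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)

# Rows get the default labels 0 to 4, and the single unnamed column gets the label 0.
print(list(df.index))    # [0, 1, 2, 3, 4]
print(list(df.columns))  # [0]

# shape is (number of rows, number of columns).
print(df.shape)          # (5, 1)
```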
Let’s give the dataframe an index. BTW, you will see “df” used a lot as you go along. It is fairly standard to give a variable this name to show that it holds a dataframe. Try to stick to it 🙂
df_new = pd.DataFrame({'name_columns': [1, 2, 3, 4]}, index=['rank1', 'rank2', 'rank3', 'rank4'])
Now the column has a name and the dataframe has an index.
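
With a column name and an index in place, values can be addressed by those labels. The sketch below shows that selecting a column returns a Series carrying the same index. The variable name column is only illustrative.

```python
import pandas as pd

df_new = pd.DataFrame({'name_columns': [1, 2, 3, 4]},
                      index=['rank1', 'rank2', 'rank3', 'rank4'])

# Selecting a single column by name returns a Series that keeps the row index.
column = df_new['name_columns']
print(isinstance(column, pd.Series))         # True
print(list(column.index))                    # ['rank1', 'rank2', 'rank3', 'rank4']

# A single cell can be reached with a row label and a column name.
print(df_new.loc['rank3', 'name_columns'])   # 3
```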

Pandas - DataFrame - try it out



import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)
print('=======================')
df_new = pd.DataFrame({'name_columns': [1, 2, 3, 4]}, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df_new)
