Pandas – Concatenate

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags:

Introduction

In this ‘how-to’, we will show several ways of combining data using Pandas. If you have some experience with SQL, this should be very simple. If not, that is OK. Just try to follow the examples and try it out in the “Try it out”- section.
We prepared two datasets and show how to ‘concatenate’ them.

Dataset

We have two datasets, df1 and df2, and we want to combine them into one dataset. These are two subset of a large loans dataset that we have been using earlier. They contain the same type of data.

Check out the length of each data set with the len() and explore them further with head() method.



import pandas as pd

df1 =pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD01.csv")

df2 =pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD02.csv")





print (df1.head())

print (df2.head())

Simple concatenation of two dataframes

Now that you are familiar with both data frames let’s make them into one dataframe. We will combine both datasets row-wise.

print (len(df1)): Check how many rows the first dataset has.
print (df1.head()): Show the first five rows of the first dataset.
print (len(df2)): Check how many rows the second dataset has.
print (df2.tail()): Show the last five rows of the second dataset.
df3 = pd.concat([df1, df2]): Concatenate both datasets.
print (df3.head()): Show the first five rows of the third dataset. This should be the same as the the first five rows of the first dataset.
print (df3.tail()) Show the last five rows of the third dataset. The last five rows of the second dataset.
print (len(df3)): Show the number of rows of the third dataset.
We could have written this as.
df3 = pd.concat([df1, df2], axis=0), which have given the same result. Try it out!

As you might have noticed, the index shows some duplication. Let’ set the column “Project ID” as the index. df3.set_index('Project ID').



import pandas as pd

df1 =pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD01.csv")

df2 =pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD02.csv")

len(df1)

len(df2)





# Check how many rows the first dataset has

print (len(df1))

# Explore the first dataset

print (df1.head())

# Check how many rows the second dataset has

print (len(df2))

# Explore the first dataset

print (df2.tail())

# Concatenate both dataframes

df3 = pd.concat([df1, df2])

# check the first five rows

print (df3.head())

# check the last five rows

print (df3.tail())

# How many rows does the new dataset have

print (len(df3))

# Give the new dataset a more understandable index

df3.set_index('Project ID', inplace=True)

df3

Try it out

In this section you can try out some of the code yourself.
We have prepared two datasets.

a dataset with Project ID, the Country and the Borrower: df_country_borrower
a dataset with Project ID, Project name and interest rate: df_name_interest

After exploring them (tip: use head() only as they can be a bit large) try to do the following excercises.
Try to

Concatenate both datesets column wise.
Remove the duplicate column Project ID.



import pandas as pd

df1 =pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD06.csv")

df_country_borrower = df1[['Project ID', 'Country', 'Borrower']].head(15)

df_name_interest = df1[['Project ID', 'Project Name', 'Interest Rate']].head(15)

print (df_country_borrower ) print (df_name_interest)

Getting Started – Variables and Data types

Rishi Sapra

Python – Basics

Rishi Sapra

Pandas – Series and Dataframes

Rishi Sapra

Leave a comment Cancel reply

You must be logged in to post a comment.

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.

Necessary

Always Enabled

Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.