Pandas – sorting

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags:

Pandas - sorting

Sorting by rows

Pandas has good sorting capabilities. When datasets are very large you want to sort it even by multiple columns.
To do this Pandas, there are the following ways:

sort_values(): sort by one or more columns.
sort_index(): to sort by row index.

Both of these functions have the following options

sorting the data frame in specific order (ascending or descending)
sorting in place, sorting with missing values
what to do with missing values.

We will be working with the dataset ‘world_loans’ to give some examples.

world_loans.sort_values(['Country']) – sorts the data frame alphabetically by Country ascending, which is the default.
world_loans.sort_values(['Country'], ascending=False)– sorts the data frame alphabetically by Country, descending.

In our dataset we have multiple columns. Check the columns in the dataset: world_loans.columns. Suppose we want to sort by the columns ‘Country’ (descending) and ‘Original Amount’ (ascending).

world_loans.sort_values(['Country', 'Original Amount'], ascending=[False, True])

– the dataset is now first ordered by country and within each country the original loan amount is ordered.

If we want to make a change to the dataset permanently we use ‘inplace = True’. Like so:
world_loans.sort_values(['Country'], ascending=False, inplace = True) – this changes the dataset.

Missing values
Sometimes your dataset has values that are not available. What do you want to do with that? Pandas give you the option to choose whether you want the missing values, NaN, at the beginning or in the end of the column.
na_position : [{‘first’, ‘last’}, default ‘last’]: ‘first’ puts ‘NaNs’ at the beginning while ‘last’ puts NaNs at the end.
An example
world_loans.sort_values(['Country', na_position : ‘first’])– This will sort the dataset by the ‘Country’ column and will show all the rows that do not have country name defined first.



import pandas as pd

world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD06.csv")

world_loans = world_loans[['Project ID','Country', 'Guarantor','Original Amount','Type']]

world_loans = world_loans.set_index('Project ID')

world_loans['Original Amount'].astype(int)





sort_by_country =world_loans.sort_values(['Country'])

print ('Sort the dataset by Country and display the first 10 rows.')

print (sort_by_country[['Country', 'Guarantor', 'Type']].head(10))

print('')

print('')

sort_by_country_asc= world_loans.sort_values(['Country'], ascending=False)

print ('Sort the dataset in descending order by Country and display the first 10 rows.')

print (sort_by_country_asc.head(10))

print('')

print('')

sort_by_two_column_criteria = world_loans.sort_values(['Country', 'Original Amount'], ascending=[False, True])

print ('Sort the dataset in descending order by Country and ascending order by original amount. Then, display the first 10 rows.')

print(sort_by_two_column_criteria[['Country', 'Guarantor', 'Original Amount']].head(10))

Try Yourself

Try out some code yourself.



import pandas as pd

world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD06.csv")

world_loans = world_loans[['Project ID','Status', 'Borrower','Repaid to IBRD','Type']]

world_loans = world_loans.set_index('Project ID')





#check out the new dataframe

world_loans.head()

#Sort the world_loans dataset by 'Borrower' AND 'Status'. Give this dataframe a new name called 'status'

Getting Started – Variables and Data types

Rishi Sapra

Python – Basics

Rishi Sapra

Pandas – Series and Dataframes

Rishi Sapra

Leave a comment Cancel reply

You must be logged in to post a comment.

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.

Necessary

Always Enabled

Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Functional

Performance

Analytics

Others

Pandas - sorting

Try Yourself

Related Articles

Getting Started – Variables and Data types

Python – Basics

Pandas – Series and Dataframes

Leave a comment Cancel reply