Pandas – Filter with one criteria

team-member-1

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags:

Filter with one criteria

In this how-to we will discuss the filter capabilities in Pandas. Filtering in Pandas can be performed in a number of ways depending on several factors such as the size of the data, the granularity of the data the number of criteria specified and many more.

Let’s check out the dataset first: world_loans.head(). These are loans issued by the IBRD.

  • Dominica_loans = world_loans[world_loans['Country'] == 'Dominica'] -Display all loans issued to Dominica. The output should be in the format of an overview. First we creat a new variable and then set the criteria.
    This can then be used to get the total loan amount.
    Dominica_loans['Original Amount'].sum()
  • not_repaid = world_loans[world_loans['Repaid to IBRD'] > 0] – Which project loans have not been Repaid back to IBRD.
  • letter_B = world_loans[world_loans['Country'].str.startswith('B')] – Filter all the countries whose name start with the letter ‘B’.
  • date = world_loans[world_loans['First Repayment Date']> '1952-12-31']– Show all the loans with a first repayment data after the 31 December 1952.


import pandas as pd
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD06.csv")
world_loans = world_loans.set_index('Project ID')
world_loans['Original Amount'].astype(int)
world_loans['First Repayment Date'] = pd.to_datetime(world_loans['First Repayment Date'])


# What are the loans that the country of Dominica has oustanding?
Dominica_loans = world_loans[world_loans['Country'] == 'Dominica']
print (Dominica_loans[['Country', 'Original Amount']])
print ('Total loans add up to USD' + str(Dominica_loans['Original Amount'].sum()))
# Which countries have not repaid their loans to IBRD?
not_repaid = world_loans[world_loans['Repaid to IBRD'] == 0]
print (not_repaid[['Country', 'Original Amount']])
print('Total number of unpaid loans:')
print (len(not_repaid[['Country', 'Original Amount']]))
# Filter all countries whose name start with the letter 'B'.
letter_B = world_loans[world_loans['Country'].str.startswith('B')]
print('Countries whose name start with letter_B')
print(letter_B[['Country', 'Original Amount']])
# Show all loans with a repayment date after the 31st of December 2015
date = world_loans[world_loans['First Repayment Date']> '2005-12-31']
print('Loans with first repayment date after 31st December 2015')
print(date[['Country', 'Original Amount','First Repayment Date']])


Try yourself

Try out some code yourself.


import pandas as pd
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD06.csv")
world_loans = world_loans.set_index('Project ID')
world_loans['Original Amount'].astype(int)
world_loans['First Repayment Date'] = pd.to_datetime(world_loans['First Repayment Date'])


# What are the loans that the country of Belgium has oustanding?
Belgium_loans =
# Which countries have a loan higher than USD10 million to IBRD?
10_million =
# Filter all countries whose name start with the letter 'M'.
letter_M
# Show all loans with the first repayment date before the 31st of December 1990
repayment_1990 =



Leave a comment