Pandas – selecting with .loc

team-member-1

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Tags:

Pandas - selecting with .loc

Often when have large sets of data, we do not need all the data to make calculations. To select only a subset of a dataset Pandas has some very good functions.
In this How-To we will use a shortened dataset of the WorldBank.
To recap, a DataFrame is composed of three different components, the index, columns, and the data. The data is also known as the values.
In this case the index represents the sequence of values on the far left-hand side of the DataFrame.
Each individual value of the index is called a label. Sometimes the index is referred to as the row labels.
The dataset we have been using so far did not have row labels as such. So far, the row labels were assigned by the numbers automatically by Pandas.
Now we have converted the dataset in such a way that the column ‘Project ID’ is used as labels for the row; here is the code: world_loans.set_index('Project ID').
The labels of the columns remain the same.

To select in Pandas using labels, we use the ‘.loc’ method.

  • world_loans.loc['P037451'] – shows all the data for that Project ID (P037451).
  • world_loans.loc['P037451','Country'] – gives the country corresponding to the Project ID.
  • world_loans.loc[['P037451', 'P037470']] – shows all the data for two Project id’s. Notice that they have be in double brackets.
  • world_loans.loc[['P037451', 'P037470'], 'Country'] – shows the country for both Project ID’s.
  • world_loans.loc[['P037451', 'P037470'], ['Country', 'Amount']] – shows the country and Amount for both Project ID’s. Notice again both the double brackets.


import pandas as pd
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
world_loans = world_loans.set_index('Project ID')
world_countries = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD_countires.csv")

 


# explore the data set
print (world_countries.head())
print ("Print all details for project id 'P037451' ")
print ("world_loans.loc['P037451']")
print ("What is the country that corresponds with project ID 'P037362'")
print (world_loans.loc['P037362','Country'])
print ("Print all the details of project with the following ID: P037451, P037470")
print (world_loans.loc[['P037451', 'P037470']] )
print ("What are the names of the countries that correspond with project ID P037451 and P037470.")
print(world_loans.loc[['P037451', 'P037470'], 'Country'])
print ("Print the names and the amount of the projects with the following ID: P037451, P037470.")
print (world_loans.loc[['P037451', 'P037470'], ['Country', 'Amount']])


Try yourself


import pandas as pd
world_countries = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD_countires.csv")
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
world_loans = world_loans.set_index('Project ID')


# explore the data set
print (world_loans.head())
# Print all details for project id 'P037383'
# What is the country that corresponds with project ID 'P037451
# Print all the details of project with the following ID: P001018, P082309")
# What are the names of the countries that correspond with project ID P125032 and P157715.
# Print the names and the loan amount of the projects with the following ID: P077680, P109333.



Leave a comment