Pandas – Selecting with Series and Dataframes

Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).

Pandas - Selecting with Series

We created a Pandas series called ‘countries’. This is the first column of a larger dataframe, which we will work with more extensively in other ‘How-to’s. You can inspect it by typing countries.head() in the console.
We want to select only the third country, Luxembourg, and put it into a variable, third_country:

  • third_country = countries[2] selects the third, not the second, row, because the index starts at zero and the third row is labeled “2”. This returns a single value; the slice countries[2:3] returns the same row as a one-row series.
  • This is how you select the second, third and fourth countries:
    second_third_fourth_country = countries[1:4]. Since the index starts at zero, these countries are indexed at “1”, “2” and “3” respectively; the end of the slice, “4”, is excluded.
  • This code selects all countries from the first through the seventh:
    first_seven = countries[:7] selects the first 7 rows, rows 0-6 (the row labeled ‘6’ in the index is the 7th row).
  • This code selects all countries except the last 15:
    not_last_15 = countries[:-15] selects all rows excluding the last 15 rows.
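
The selections above can also be tried offline. The following is a minimal sketch that assumes a small stand-in series: the country names and their order are made up for illustration and are not taken from the real data set. It only relies on the series having the default index 0, 1, 2, ...

```python
import pandas as pd

# Hypothetical stand-in for 'countries' (made-up order, default index 0..5)
countries = pd.Series(
    ["Austria", "Belgium", "Luxembourg", "Denmark", "Finland", "France"]
)

# A single integer returns the value stored at index label 2
third_value = countries[2]

# A slice returns a series, even when it holds only one row
third_row = countries[2:3]

# Slices stop before the end position: rows 1, 2 and 3
second_third_fourth = countries[1:4]

# Omitting the start means "from the first row"
first_four = countries[:4]

# A negative end position drops rows from the end: all but the last two
not_last_two = countries[:-2]

print(third_value)
print(third_row)
print(second_third_fourth)
print(first_four)
print(not_last_two)
```

Note the difference between the first two selections: countries[2] gives back a plain string, while countries[2:3] gives back a series that still carries the index label 2.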

Try yourself


import pandas as pd
world_countries = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD_countires.csv")
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
countries = world_countries.Country.head(20)



# explore the data set
print (countries.head())
print ('===========')
third_country = countries[2:3]
print ('The third country is:')
print (third_country)
print ('===========')
second_third_fourth_country = countries[1:4]
print ('The second, third and fourth country are:')
print (second_third_fourth_country)
print ('===========')
first_seven = countries[:7]
print ('The first seven countries are:')
print (first_seven)
print ('===========')
not_last_15 = countries[:-15]
print ('All countries except the last 15 countries are:')
print (not_last_15)
print ('Since the total number of countries is 20, you will only see the first five countries.')
print ('===========')



Pandas - Selecting with Dataframes

How do we go about making selections within a DataFrame? In the DataFrame world_loans we have a table with more than one column,
namely “Project ID”, “Country”, “Status”, “Interest Rate” and “Amount”. Let’s make some selections.

  • world_loans[2:3] selects the third, not the second, row: the slice starts at index 2 and stops before index 3 (the fourth row).
    This results in selecting only one row.
  • world_loans[1:5] selects the second, third, fourth and fifth rows (index 1 to 4).
  • world_loans[:7] selects the first 7 rows, rows 0-6 (the row labeled ‘6’ in the index is the 7th row).
  • world_loans[:-15] selects all rows excluding the last 15 rows.
  • world_loans['Country'] selects all rows of the column ‘Country’.
  • world_loans['Country'].head() selects the first five rows in the column ‘Country’. Check which index labels they have.
  • world_loans['Country'][2] selects the name of the country with the row index label ‘2’.
  • world_loans['Country'][6:9] selects the names of the countries with the row index labels ‘6’, ‘7’ and ‘8’.
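
As a rough illustration of the DataFrame selections listed above, here is a sketch that assumes a tiny stand-in table. The column names come from the text, but every value in it is invented, so the results say nothing about the real world_loans data.

```python
import pandas as pd

# Hypothetical stand-in for world_loans: 10 made-up rows, default index 0..9
world_loans = pd.DataFrame({
    "Project ID": [f"P{n:03d}" for n in range(10)],
    "Country": ["Austria", "Belgium", "Luxembourg", "Denmark", "Finland",
                "France", "Greece", "Iceland", "Ireland", "Italy"],
    "Status": ["Repaid", "Repaid", "Active", "Repaid", "Active",
               "Repaid", "Active", "Repaid", "Active", "Repaid"],
    "Interest Rate": [4.25, 3.5, 4.0, 4.25, 3.0, 3.5, 4.0, 4.25, 3.0, 3.5],
    "Amount": [100, 250, 50, 75, 300, 120, 90, 60, 210, 180],
})

# Row slices return a DataFrame with every column
third_row = world_loans[2:3]      # one row, index label 2
rows_1_to_4 = world_loans[1:5]    # four rows, index labels 1 to 4
not_last_3 = world_loans[:-3]     # everything except the last three rows

# A column name returns that column as a series
country_column = world_loans["Country"]

# Selecting the column first and the row second narrows it down further
country_at_2 = world_loans["Country"][2]      # a single value
countries_6_to_8 = world_loans["Country"][6:9]  # a series with labels 6, 7, 8

print(third_row)
print(rows_1_to_4)
print(country_at_2)
print(countries_6_to_8)
```

With square brackets, an integer slice picks rows while a string picks a column. If that feels ambiguous, world_loans.iloc[2:3] is an explicitly positional way to write the first selection.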
