Pandas – Selecting with Series and Dataframes


Rishi Sapra

Technical Community leader, speaker, trainer and evangelist specialising in Power BI and Azure. Formally recognised by Microsoft as a Most Valuable Professional (MVP), Fast Track Recognised Solution Architect (FTRSA) and Microsoft Certified Trainer (MCT).


Pandas - Selecting with Series

We have created a pandas Series called ‘countries’. It is the first column of a larger DataFrame that we will work with more extensively in other ‘How-to’s. You can inspect it by typing countries.head() in the console.
We want to select only the third country, Luxembourg, and put it into a variable called third_country:

  • third_country = countries[2] selects the third row, not the second: the index starts at zero, so the third row is indexed at “2”. This returns a single value. The slice countries[2:3], which the code further down uses, selects the same row but returns it as a one-row series.
  • This is how you select the second, third and fourth country:
    second_third_fourth_country = countries[1:4]. Since the index starts at zero, these countries are indexed at “1”, “2” and “3” respectively. The end position of a slice (“4” here) is not included.
  • This code selects all countries from the first up to and including the seventh:
    first_seven = countries[:7] selects the first 7 rows: rows 0-6 (the row labeled ‘6’ in the index is the 7th row).
  • This code selects all countries except for the last 15:
    not_last_15 = countries[:-15] selects all rows excluding the last 15 rows.
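
The selections above can also be tried without downloading anything. The following is a minimal sketch that assumes a small, made-up series of 20 placeholder names (not the real data set) with the default index running from 0 to 19:

```python
import pandas as pd

# Hypothetical stand-in for the 'countries' series: 20 placeholder names
# with the default integer index 0..19 (not the real data).
countries = pd.Series([f"Country {i + 1}" for i in range(20)], name="Country")

# A single integer label returns one value (a plain string here).
third_country = countries[2]

# A slice returns a Series, even when it contains only one row.
third_country_as_series = countries[2:3]

# Slices stop before the end position: this picks positions 1, 2 and 3.
second_third_fourth_country = countries[1:4]

# The first seven rows: positions 0 to 6.
first_seven = countries[:7]

# Everything except the last 15 rows; with 20 rows, that leaves 5.
not_last_15 = countries[:-15]

print(third_country)
print(second_third_fourth_country)
print(len(first_seven), len(not_last_15))
```

With the default index, row labels and row positions coincide, which is why countries[2] and countries[2:3] point at the same row here.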

Try it yourself


import pandas as pd
world_countries = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD_countires.csv")
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
countries = world_countries.Country.head(20)



# explore the data set
print(countries.head())
print('===========')
third_country = countries[2:3]
print('The third country is:')
print(third_country)
print('===========')
second_third_fourth_country = countries[1:4]
print('The second, third and fourth country are:')
print(second_third_fourth_country)
print('===========')
first_seven = countries[:7]
print('The first seven countries are:')
print(first_seven)
print('===========')
not_last_15 = countries[:-15]
print('All countries except the last 15 countries are:')
print(not_last_15)
print('since the total number of countries is 20, you will only see the first five countries.')
print('===========')



Pandas - Selecting with Dataframes

How do we go about making selections within a DataFrame? The DataFrame world_loans is a table with more than one column,
namely “Project ID”, “Country”, “Status”, “Interest Rate” and “Amount”. Let’s make some selections.

  • world_loans[2:3] selects the third row, not the second. The slice starts at position 2 and stops before position 3, so the fourth row is excluded.
    This results in selecting only one row.
  • world_loans[1:5] selects the second, third, fourth and fifth rows (positions 1 to 4).
  • world_loans[:7] selects the first 7 rows: rows 0-6 (the row labeled ‘6’ in the index is the 7th row).
  • world_loans[:-15] selects all rows excluding the last 15 rows.
  • world_loans['Country'] selects all rows of the column ‘Country’.
  • world_loans['Country'].head() selects the first five rows in the column ‘Country’. Check which index labels they have.
  • world_loans['Country'][2] selects the name of the country with the row index label ‘2’.
  • world_loans['Country'][6:9] selects the names of the countries at positions 6, 7 and 8. With the default index these are also the rows labeled ‘6’, ‘7’ and ‘8’.
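
To see these DataFrame selections in action, here is a minimal sketch. It assumes a made-up table of 20 rows that reuses the column names mentioned above; the values are placeholders, not the real loan data. The .iloc lines at the end show what appears to be the explicit, position-based way of writing the same row slices.

```python
import pandas as pd

# Hypothetical stand-in for 'world_loans': 20 made-up rows using the column
# names from the text (placeholder values, not real loan data).
n = 20
world_loans = pd.DataFrame({
    "Project ID": [f"P{i:03d}" for i in range(n)],
    "Country": [f"Country {i + 1}" for i in range(n)],
    "Status": ["Active"] * n,
    "Interest Rate": [1.0 + 0.1 * i for i in range(n)],
    "Amount": [1000 * (i + 1) for i in range(n)],
})

# Row slices return a DataFrame with all columns.
one_row = world_loans[2:3]        # third row only
four_rows = world_loans[1:5]      # positions 1 to 4
all_but_last_15 = world_loans[:-15]

# A column name returns a Series; it can then be indexed like 'countries'.
country_column = world_loans["Country"]
country_at_2 = world_loans["Country"][2]        # value at index label 2
countries_6_to_8 = world_loans["Country"][6:9]  # positions 6, 7 and 8

# Explicit position-based equivalents of the row slices.
same_one_row = world_loans.iloc[2:3]
same_four_rows = world_loans.iloc[1:5]

print(one_row)
print(country_at_2)
print(countries_6_to_8)
```

Writing the slice through .iloc makes it explicit that rows are being picked by position, which may help once the index no longer starts at zero.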
