Pandas – Selecting with Series and Dataframes
Pandas - Selecting with Series
We created a Pandas series called ‘countries’. This is the first column of a larger dataframe, with which we will be working more extensively in other ‘How-to’s. You might want to check it by typing countries.head()
in the console.
We want to select only the third country, Luxembourg, and put it into a variable called third_country:
third_country = countries[2]
selects the third, not the second, row, which is indexed at “2”. The result is a single value (the country name) rather than a series.
- This is how you select the second, third and fourth country:
second_third_fourth_country = countries[1:4]
selects the rows indexed at “1”, “2” and “3”: the index starts at zero, and the end of the slice, “4”, is excluded.
- This code selects all countries from the first until the seventh:
first_seven = countries[:7]
selects the first 7 rows: rows 0-6 (the row labeled ‘6’ in the index is the 7th row).
- This code selects all countries except for the last 15:
not_last_15 = countries[:-15]
selects all rows excluding the last 15 rows.
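The difference between a single index and a one-element slice can be sketched as follows. The country list here is a hypothetical stand-in for the real 'countries' series, and the variable names single, one_row and by_position are made up for illustration; .iloc is the explicit positional accessor in pandas.

```python
import pandas as pd

# Hypothetical stand-in for the 'countries' series (not the real data set).
countries = pd.Series(
    ["Austria", "Belgium", "Luxembourg", "Denmark", "Finland"],
    name="Country",
)

single = countries[2]            # one index label -> a plain string
one_row = countries[2:3]         # a slice -> a Series holding one row
by_position = countries.iloc[2]  # explicit position-based lookup

print(type(single).__name__)
print(type(one_row).__name__)
print(len(one_row))
```

Both forms reach the third country, but only the slice keeps the index label alongside the value, which can be handy when the result is printed or combined with other rows.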
Try it yourself
import pandas as pd
world_countries = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD_countires.csv")
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
countries = world_countries.Country.head(20)
# explore the data set
print (countries.head())
print ('===========')
third_country = countries[2:3]
print ('The third country is:')
print (third_country)
print ('===========')
second_third_fourth_country = countries[1:4]
print ('The second, third and fourth country are:')
print (second_third_fourth_country)
print ('===========')
first_seven = countries[:7]
print ('The first seven countries are:')
print (first_seven)
print ('===========')
not_last_15 = countries[:-15]
print ('All countries except the last 15 countries are:')
print (not_last_15)
print ('Since the total number of countries is 20, you will only see the first five countries.')
print ('===========')
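In the examples above the series keeps its default index, so a row's position and its index label are the same number. A minimal sketch of what happens once they differ, using a made-up list of countries and the hypothetical variable name tail:

```python
import pandas as pd

# Hypothetical data; the real 'countries' series comes from the CSV file.
countries = pd.Series(
    ["Austria", "Belgium", "Luxembourg", "Denmark", "Finland", "France"]
)

# A slice keeps the original index labels of the rows it selects.
tail = countries[3:]
print(tail.index.tolist())

# Position 0 of the slice is the row that still carries label 3.
first_by_position = tail.iloc[0]
first_by_label = tail.loc[3]
print(first_by_position, first_by_label)
```

Using .iloc for positions and .loc for labels makes the intent explicit, which may help once a series has been sliced, filtered or sorted.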
Pandas - Selecting with Dataframes
How do we make selections within a DataFrame? The DataFrame world_loans is a table with more than one column, namely “Project ID”, “Country”, “Status”, “Interest Rate” and “Amount”. Let’s make some selections.
world_loans[2:3]
selects the third, not the second, row: the slice starts at index “2” and stops before index “3”, so the fourth row is excluded. This results in selecting only one row.
world_loans[1:5]
selects the second, third, fourth and fifth rows (indexed at “1” to “4”).
world_loans[:7]
selects the first 7 rows: rows 0-6 (the row labeled ‘6’ in the index is the 7th row).
world_loans[:-15]
selects all rows excluding the last 15 rows.
world_loans['Country']
selects all rows of the column ‘Country’.
world_loans['Country'].head()
selects the first five rows of the column ‘Country’. Check which index labels they have.
world_loans['Country'][2]
selects the name of the country with the row index label ‘2’.
world_loans['Country'][6:9]
selects the names of the countries with the row index labels ‘6’, ‘7’ and ‘8’.
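The DataFrame selections above can be tried without downloading anything. The sketch below uses a small, made-up table in place of world_loans (only two of its columns, with invented values) and shows the .iloc and .loc forms next to the bracket forms; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of world_loans; all values are invented.
world_loans = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    "Amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

rows_1_to_4 = world_loans[1:5]        # row slice: index 1, 2, 3 and 4
same_rows = world_loans.iloc[1:5]     # explicit positional equivalent

one_country = world_loans["Country"][2]       # column first, then label 2
same_country = world_loans.loc[2, "Country"]  # row label and column in one step

three_countries = world_loans["Country"][6:9]  # rows 6, 7 and 8 of the column

print(rows_1_to_4)
print(one_country)
print(three_countries.tolist())
```

The bracket forms are shorter, while .loc and .iloc state explicitly whether a label or a position is meant.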