Pandas – Selecting with integers: .iloc
Pandas - Selecting with .iloc
Often when have large sets of data, we do not need all the data to make calculations. To select only a subset of a dataset Pandas has some very good functions.
In the following How-to we will use a shortened dataset of the WorldBank.
To recap, a DataFrame is composed of three different components, the index,
columns, and the data. The data is also known as the values. In this case the index represents the sequence of values on the far left-hand side of the DataFrame.
Each individual value of the index is called a label. Sometimes the index is referred to as the row labels.
In the example above, the row labels are not very interesting and are just the integers beginning from 0 up to n-1, where n is the number of rows in the table.
This means that in a dataset with 50 rows the row labels will be from 0 to 49.
We will start first by selecting using ‘iloc’. This is an indexer using integer location. We use example with our dataset, world_loans.
the convention is as follows:
[name of the DataFrame].iloc[row index/indices, column number(s)]
Input
- The input for the row/column index/indices can a be list of integers: [2,5,6]
- Slice of rows: [4:8]
- Single values: 3
Selecting rows
-
world_loans.iloc[2:3]
selects the third, not second, row with Project ID “P037451”. This is indexed at “2”. This results in selecting only one row. -
world_loans.iloc[1:4]
selects the second, third and fourth row. It is indexed at “1”, “2”, and “3”. -
world_loans.iloc[:7]
selects the first 7 rows: row 0-6 (including rows labeled ‘6’ in the index, which is the 7th row). -
world_loans.iloc[:-15]
selects all rows excluding the last 15 rows. -
world_loans.iloc[[1,6,8]]
selects all rows with the index labels 1, 6 and 8. Notice that for selecting multiple rows double brackets have to used.
Selecting columns
How about making selecting within a DataFrame? In the DataFrame world_loans we have a table with more than one column.
Namely: “Project ID”, “Country”, “Status”, “Interest Rate” and “Amount”. Let’s do some selection.
-
world_loans.iloc[:,1:2]
selects the column labeled “Country” in the dataset. -
world_loans.iloc[:,1:5]
selects all rows and the second, third, fourth and fifth column of the dataset.
Selecting rows and columns
-
world_loans.iloc[[1,4,7],[0,2,4]]
selects the row 1, 4, 7 and column labeled “Project ID”, “Status” and “Amount”. -
world_loans.iloc[:,:-2]
selects all rows and all columns excluding the last 2 columns.
import pandas as pd
world_countries = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
world_loans = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
print ("Print the first five rows")
print (world_loans.head())
print (" ")
print ("Print Luxembourg only")
print (world_loans.iloc[2:3])
print (" ")
print ("Print France untill Belgium on row 4")
print (world_loans.iloc[0:4])
print (" ")
print ("Print the first 7 rows")
print (world_loans.iloc[:7])
print (" ")
print ("Print all rows excluding the last 15 rows")
print(world_loans.iloc[:-15])
print ("Print rows with index label 1, 6 and 8")
print(world_loans.iloc[[1,6,8]])
### Columns
print (" ")
print ("selects the column indexed at '1' (Country) in the dataset")
print (world_loans.iloc[:,1:2])
print (" ")
print ("Print all rows and the second, third, fourth and fifth column")
print (world_loans.iloc[:,1:5])
print (" ")
print ("Print the row at index 1, 4, 7 and column labeled Project ID, Status and Amount")
print(world_loans.iloc[[1,4,7],[0,2,4]])
print ("Print all rows and all columns excluding the last 2 columns")
print(world_loans.iloc[:,:-2])
Try Yourself
Try out some some code yourself the suggested exercises in the IDE. Feel free to come up with your own exercises.
import pandas as pd
world_countries = pd.read_csv("hhttps://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
world_loans = pd.read_csv("hhttps://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
# explore the dataset 'world_countries'.
# Print the first 20 rows of 'world_countries'
# print the first the following rows: the 3rd, 7th 9th and 56th.
# print last 20 rows, using iloc
# print the first 12 rows
# print all rows excluding the last 70 rows"
# explore the dataset 'world_loans'. No. rows and columns.
# print the first two columns
# print the second and fourth column and the fifth and sixth row.