Pandas – Dropping entries
Often, when dealing with large datasets, you want to remove some of the data, either rows or columns.
This can be for different reasons:
- Having a subset makes it easier to see the rest of the data.
- Some of the rows just do not make sense.
- You just want to exclude certain rows from calculations for a particular reason.
In the console below, we load the IBRD loan dataset. The last five rows are displayed using the tail() function.
To drop a single row, pass its index label to drop():
world.drop(4)
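Note that drop() returns a new DataFrame by default and leaves the original untouched. The following is a minimal sketch of that behaviour. It uses a small made-up DataFrame called demo (the names and values are assumptions for illustration, not taken from the IBRD dataset).

```python
import pandas as pd

# Hypothetical stand-in data; the real IBRD dataset is not used here.
demo = pd.DataFrame({
    "Country": ["India", "Brazil", "Kenya", "Peru", "Chile"],
    "Amount": [100, 200, 300, 400, 500],
})

# drop() returns a new DataFrame; 'demo' itself is unchanged.
without_row_4 = demo.drop(4)
print(len(demo))           # demo still has all 5 rows
print(len(without_row_4))  # the returned copy has 4 rows

# To keep the change, assign the result back to the same name.
demo = demo.drop(4)
print(len(demo))
```

Assigning the result back is one way to keep the change. Passing inplace=True, as the console does, is another.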
In the console below, we first remove one row, then we remove two rows. The number of rows goes down from 25 to 24 and finally to 22. Press the ‘run’ button and see how the code works.
import pandas as pd
world = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
world = world[['Project ID','Country','Interest Rate','Amount']].head(25)
print(world.tail())
# How many records do we start with?
print(f"number of rows: {len(world)}")
# Remove only one row (the row with index label 1)
world.drop(1, inplace=True)
print(f"number of rows: {len(world)}")
# Remove multiple rows by passing a list of index labels
world.drop([20, 21], inplace=True)
print(world.tail())
# How many records are there now?
print(f"number of rows: {len(world)}")
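One detail worth keeping in mind is that drop() works on index labels, not on row positions, and the remaining labels are not renumbered afterwards. The sketch below illustrates this on a small made-up DataFrame (demo and its values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical stand-in data with the default index labels 0..4.
demo = pd.DataFrame({"Amount": [10, 20, 30, 40, 50]})

# Remove the row labelled 1; the other labels keep their values.
demo = demo.drop(1)
print(demo.index.tolist())        # [0, 2, 3, 4]

# Label 1 no longer exists, so dropping it again would raise a KeyError
# unless errors="ignore" is passed.
demo = demo.drop(1, errors="ignore")

# reset_index(drop=True) renumbers the rows from 0 if that is preferred.
renumbered = demo.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

This is why the console can still drop the labels 20 and 21 after the row labelled 1 has been removed.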
Try it yourself
The console below loads the dataset as world. Explore it.
- How many rows does it have? What is the overall shape?
- Remove the rows that show loans issued to ‘Myanmar’ and ‘Norway’.
- How many rows are left?
import pandas as pd
world = pd.read_csv("https://raw.githubusercontent.com/naveen1973/data-analysis-and-visualization-using-python/master/IBRD11.csv")
world = world[['Project ID','Country','Amount']].head(10)
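As a hint for the second task, rows can be selected by a column value instead of by index label. The sketch below shows two possible approaches on a made-up DataFrame. The rows in demo are assumptions for illustration, and the real dataset may contain different countries and index labels.

```python
import pandas as pd

# Hypothetical stand-in rows; the actual dataset contents may differ.
demo = pd.DataFrame({
    "Country": ["India", "Myanmar", "Brazil", "Norway", "Kenya"],
    "Amount": [100, 200, 300, 400, 500],
})

# Boolean mask: True for rows whose Country is in the list.
mask = demo["Country"].isin(["Myanmar", "Norway"])

# Option 1: look up the index labels of the matching rows and drop them.
dropped = demo.drop(demo[mask].index)

# Option 2: keep only the rows where the mask is False.
filtered = demo[~mask]

print(dropped.shape)
print(filtered.shape)
```

Both options should leave the same rows. The first stays closer to the drop() calls used earlier, while the second avoids looking up index labels.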