The Pandas DataFrame has several re-indexing, selection, and label-manipulation methods. When applied to a DataFrame, these methods select or relabel its elements and return the result.
This is Part 10 of the DataFrame methods series:
- Part 1 focuses on the DataFrame methods abs(), all(), any(), clip(), corr(), and corrwith().
- Part 2 focuses on the DataFrame methods count(), cov(), cummax(), cummin(), cumprod(), and cumsum().
- Part 3 focuses on the DataFrame methods describe(), diff(), eval(), and kurtosis().
- Part 4 focuses on the DataFrame methods mad(), min(), max(), mean(), median(), and mode().
- Part 5 focuses on the DataFrame methods pct_change(), quantile(), rank(), round(), prod(), and product().
- Part 6 focuses on the DataFrame methods add_prefix(), add_suffix(), and align().
- Part 7 focuses on the DataFrame methods at_time(), between_time(), drop(), drop_duplicates(), and duplicated().
- Part 8 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail().
- Part 9 focuses on the DataFrame methods equals(), filter(), first(), last(), head(), and tail().
- Part 10 focuses on the DataFrame methods reset_index(), sample(), set_axis(), set_index(), take(), and truncate().
Getting Started
Required Starter Code
import pandas as pd
import numpy as np
Before any data manipulation can occur, two libraries need to be installed.
- The pandas library enables access to/from a DataFrame.
- The numpy library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
To install these libraries, navigate to an IDE terminal. At the command prompt ($), execute the code below. For the terminal used in this example, the command prompt is a dollar sign ($). Your terminal prompt may be different.
$ pip install pandas
Hit the <Enter>
key on the keyboard to start the installation process.
$ pip install numpy
Hit the <Enter>
key on the keyboard to start the installation process.
If the installations were successful, a message displays in the terminal indicating the same.
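A quick way to confirm both installations from Python itself is to import the libraries and print their versions. This is a minimal sketch; the version numbers printed depend on your environment.

```python
import numpy as np
import pandas as pd

# Each library exposes its installed version as a string.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```

If either import fails with a ModuleNotFoundError, the corresponding pip command did not install into the interpreter you are running.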
DataFrame reset_index()
The reset_index() method replaces the DataFrame's index with the default integer index (0, 1, 2, ...). Unless drop=True is passed, the old index is kept as a new column.
The syntax for this method is as follows:
DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill="")
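Parameter | Description |
---|---|
level | An integer, string, tuple, or list-like naming the index levels to remove. By default, all levels are removed. |
drop | If True, the old index is discarded instead of being inserted as a column. By default, False. |
inplace | If True, the DataFrame is modified in place and None is returned. By default, False. |
col_level | If the columns have multiple levels, this parameter determines the level into which the old index labels are inserted. By default, the first level (0). |
col_fill | If the columns have multiple levels, this parameter determines how the other levels are named. By default, an empty string. |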
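The effect of drop and level is easier to see on a small frame. The sketch below uses hypothetical data (the Era column is invented for illustration) and compares the default behavior, drop=True, and resetting a single level of a two-level index.

```python
import pandas as pd

# Hypothetical data: composers as a named index.
df = pd.DataFrame(
    {"Born": [1810, 1811, 1732]},
    index=pd.Index(["Chopin", "Liszt", "Haydn"], name="Composer"),
)

# drop=False (the default): the old index becomes a regular column.
kept = df.reset_index()
print(kept.columns.tolist())     # ['Composer', 'Born']

# drop=True: the old index is discarded.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Born']

# With a two-level index, level= moves only the named level back to a column.
multi = df.assign(Era=["Romantic", "Romantic", "Classical"]).set_index("Era", append=True)
only_era = multi.reset_index(level="Era")
print(only_era.index.name)       # Composer
print(only_era.columns.tolist()) # ['Era', 'Born']
```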
For this example, we have three (3) Classical Composers with some details about their lives. They are assigned levels based on the difficulty of their compositions.
Code – reset_index():
data = {'Composer': ['Chopin', 'Liszt', 'Haydn'], 'Born': [1810, 1811, 1732], 'Country': ['France', 'Austria', 'Austria']}
index = ['Level-1', 'Level-2', 'Level-3']
df = pd.DataFrame(data, index)
print(df)
df.reset_index(inplace=True, drop=True)
print(df)
- Line [1] creates a dictionary of lists and saves it to data.
- Line [2] sets index labels for the Composers and saves them to the variable index. A list is used because a set does not preserve the order of the labels.
- Line [3] creates a DataFrame and assigns it to df.
- Line [4] outputs the result to the terminal.
- Line [5] resets the DataFrame index (reset_index()) to the default integer index and discards the old labels (drop=True).
- Line [6] outputs the result to the terminal.
Output:
df
| | Composer | Born | Country |
---|---|---|---|
Level-1 | Chopin | 1810 | France |
Level-2 | Liszt | 1811 | Austria |
Level-3 | Haydn | 1732 | Austria |
result
| | Composer | Born | Country |
---|---|---|---|
0 | Chopin | 1810 | France |
1 | Liszt | 1811 | Austria |
2 | Haydn | 1732 | Austria |
Another way to accomplish the above task is to use the concat() function with ignore_index=True.
Code – concat():
data = {'Composer': ['Chopin', 'Liszt', 'Haydn'], 'Born': [1810, 1811, 1732], 'Country': ['France', 'Austria', 'Austria']}
index = ['Level-1', 'Level-2', 'Level-3']
df = pd.DataFrame(data, index)
print(df)
df1 = pd.concat([df], ignore_index=True)
print(df1)
- Line [1] creates a dictionary of lists and saves it to data.
- Line [2] sets index labels for the Composers and saves them to the variable index.
- Line [3] creates a DataFrame and assigns it to df.
- Line [4] outputs the result to the terminal.
- Line [5] builds a new DataFrame (concat()) with the default integer index and saves it to df1. The original df is unchanged.
- Line [6] outputs the new DataFrame (df1) to the terminal.
Output:
df
| | Composer | Born | Country |
---|---|---|---|
Level-1 | Chopin | 1810 | France |
Level-2 | Liszt | 1811 | Austria |
Level-3 | Haydn | 1732 | Austria |
result
| | Composer | Born | Country |
---|---|---|---|
0 | Chopin | 1810 | France |
1 | Liszt | 1811 | Austria |
2 | Haydn | 1732 | Austria |
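The ignore_index option is most useful when several frames are stacked, since their labels would otherwise be carried over as-is. The following sketch, using two small hypothetical frames, suggests that concat() with ignore_index=True gives the same result as concatenating first and then calling reset_index(drop=True).

```python
import pandas as pd

# Two hypothetical frames with their own string labels.
a = pd.DataFrame({"Composer": ["Chopin", "Liszt"]}, index=["Level-1", "Level-2"])
b = pd.DataFrame({"Composer": ["Haydn"]}, index=["Level-3"])

# Option 1: let concat() renumber the rows while stacking.
stacked = pd.concat([a, b], ignore_index=True)

# Option 2: stack first, then reset the index and discard the old labels.
same = pd.concat([a, b]).reset_index(drop=True)

print(stacked.index.tolist())  # [0, 1, 2]
print(stacked.equals(same))    # True
```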
DataFrame sample()
The sample() method returns a random sample of rows or columns (depending on the selected axis) from a DataFrame/Series.
The syntax for this method is as follows:
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Parameter | Description |
---|---|
n | The number of items to return from the selected axis. By default, one (1) when frac is not set. Cannot be combined with frac. |
frac | The fraction of items to return from the selected axis. Cannot be combined with n. If frac is greater than one (1), replace must be True. |
replace | If True, the same row can be sampled more than once. If False, each row is sampled at most once. By default, False. |
weights | If None, every item has an equal probability of being selected. If a Series, it is aligned with the object on the index: index values in the weights that are not found in the sampled object are ignored, and items of the sampled object that are missing from the weights get a weight of zero (0). If called on a DataFrame, a column name is accepted when the selected axis is zero (0). |
random_state | A seed (an integer or array-like) or a NumPy random-number generator object. Passing the same seed reproduces the same sample. |
axis | If zero (0) or index, sample rows. If one (1) or columns, sample columns. By default, None, which means rows for a DataFrame. |
ignore_index | If True, the result is relabeled 0, 1, 2, etc. By default, False. |
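The parameters above interact in a few ways that a short example may make clearer. This is a sketch on hypothetical stand-in data (the Weight column is invented here), not on finxters.csv.

```python
import pandas as pd

# Hypothetical stand-in data; the Weight column is invented for this sketch.
df = pd.DataFrame({
    "FID": range(10),
    "Weight": [0] * 7 + [1] * 3,  # only the last three rows can be drawn
})

# frac: half of the 10 rows.
half = df.sample(frac=0.5, random_state=1)
print(len(half))  # 5

# frac > 1 needs replace=True, so rows may repeat.
double = df.sample(frac=2, replace=True, random_state=1)
print(len(double))  # 20

# Without replace=True, upsampling is rejected with a ValueError.
try:
    df.sample(frac=2)
    upsampling_rejected = False
except ValueError:
    upsampling_rejected = True
print(upsampling_rejected)  # True

# weights as a column name: rows with weight 0 are never selected.
weighted = df.sample(n=3, weights="Weight", random_state=1)
print(sorted(weighted.index.tolist()))  # [7, 8, 9]

# axis="columns" samples columns instead of rows.
one_column = df.sample(n=1, axis="columns", random_state=1)
print(one_column.shape)  # (10, 1)
```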
For these examples, the finxters.csv file is read into a DataFrame.
Code – Example 1:
df = pd.read_csv('finxters.csv')
result = df['First_Name'].sample(n=3, random_state=1)
print(result)
- Line [1] reads in the comma-separated CSV file and saves it to df.
- Line [2] retrieves three (3) random 'First_Name' values and saves them to the result variable.
- Line [3] outputs the result to the terminal.
Output:
27 | Victoria |
35 | Diana |
40 | Owen |
Name: First_Name, dtype: object |
In this example, np.random.randint() generates an array of random integers, which is then passed to sample() as the seed (random_state).
Code – Example 2:
df = pd.read_csv('finxters.csv')
nums = np.random.randint(df['FID'], size=50)
result = df['FID'].sample(n=3, random_state=nums)
print(result)
- Line [1] reads in the comma-separated CSV file and saves it to df.
- Line [2] generates one random integer per 'FID' value (np.random.randint()), each between zero (0) and that value. The size argument must be compatible with the number of rows in the column, which this code assumes is 50.
- Line [3] retrieves three (3) random 'FID' values, using the integers from Line [2] as the seed, and saves them to the result variable. Because the seed changes on every run, so does the sample.
- Line [4] outputs the result to the terminal.
Output:
34 | 3002381 |
15 | 3002244 |
17 | 3002260 |
Name: FID, dtype: int64 |
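When a repeatable sample is wanted instead, a fixed integer seed is the simpler choice. The sketch below uses a hypothetical FID column built with NumPy rather than finxters.csv, and also shows that a shared np.random.RandomState object advances between calls.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the FID column of finxters.csv.
df = pd.DataFrame({"FID": np.arange(3002000, 3002050)})

# The same integer seed reproduces the same sample.
first = df["FID"].sample(n=3, random_state=42)
second = df["FID"].sample(n=3, random_state=42)
print(first.equals(second))  # True

# A RandomState object keeps its internal state between calls,
# so successive samples drawn from it are generally different.
rs = np.random.RandomState(42)
third = df["FID"].sample(n=3, random_state=rs)
fourth = df["FID"].sample(n=3, random_state=rs)
print(third.index.tolist(), fourth.index.tolist())
```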
DataFrame set_axis()
The set_axis() method assigns new labels to the selected axis (the row index or the column names).
The syntax for this method is as follows:
DataFrame.set_axis(labels, axis=0, inplace=False)
Parameter | Description |
---|---|
labels | A list or list-like object containing the new labels. Its length must match the length of the selected axis. |
axis | If zero (0) or index, relabel the rows. If one (1) or columns, relabel the columns. By default, 0. |
inplace | If False, a relabeled copy is returned and the original is left unchanged. If True, the original is modified. By default, False. This parameter was deprecated in pandas 1.5 and removed in 2.0. |
For these examples, new labels are assigned to the selected axis.
In this example, we relabel the row index.
Code – Example 1:
df = pd.DataFrame({'Micah': [123, 120, 144], 'Paula': [129, 125, 90], 'Chloe': [101, 95, 124]})
print(df)
result = df.set_axis(['Day-1', 'Day-2', 'Day-3'], axis="index")
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] outputs the DataFrame (df) to the terminal.
- Line [3] assigns the new row labels and saves the relabeled DataFrame to the result variable.
- Line [4] outputs the result to the terminal.
Output:
df
| | Micah | Paula | Chloe |
---|---|---|---|
0 | 123 | 129 | 101 |
1 | 120 | 125 | 95 |
2 | 144 | 90 | 124 |
result
| | Micah | Paula | Chloe |
---|---|---|---|
Day-1 | 123 | 129 | 101 |
Day-2 | 120 | 125 | 95 |
Day-3 | 144 | 90 | 124 |
In this example, we relabel the columns.
Code – Example 2:
df = pd.DataFrame({'Micah': [123, 120, 144], 'Paula': [129, 125, 90], 'Chloe': [101, 95, 124]})
print(df)
result = df.set_axis(['Micah M', 'Paula D', 'Chloe J'], axis="columns")
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] outputs the DataFrame (df) to the terminal.
- Line [3] assigns the new column labels and saves the relabeled DataFrame to the result variable.
- Line [4] outputs the result to the terminal.
Output:
df
| | Micah | Paula | Chloe |
---|---|---|---|
0 | 123 | 129 | 101 |
1 | 120 | 125 | 95 |
2 | 144 | 90 | 124 |
result
| | Micah M | Paula D | Chloe J |
---|---|---|---|
0 | 123 | 129 | 101 |
1 | 120 | 125 | 95 |
2 | 144 | 90 | 124 |
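Two properties of set_axis() are worth checking directly: it returns a new object by default, and the number of labels must match the axis length. A minimal sketch, with shortened hypothetical labels:

```python
import pandas as pd

df = pd.DataFrame({"Micah": [123, 120, 144], "Paula": [129, 125, 90]})

# set_axis() returns a relabeled object; df keeps its original labels.
renamed = df.set_axis(["M", "P"], axis="columns")
print(df.columns.tolist())       # ['Micah', 'Paula']
print(renamed.columns.tolist())  # ['M', 'P']

# One label for two columns: pandas raises a ValueError.
try:
    df.set_axis(["only-one"], axis="columns")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```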
DataFrame set_index()
The set_index() method sets the DataFrame index using one or more existing columns, or arrays of the correct length.
The syntax for this method is as follows:
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameter | Description |
---|---|
keys | A single column label, an array, or a list of labels/arrays. Arrays must be the same length as the DataFrame. |
drop | If True, the columns used as the new index are removed from the DataFrame. By default, True. |
append | If True, append the columns to the existing index. If False, replace the existing index. By default, False. |
inplace | If True, the original DataFrame is updated. If False, a new DataFrame is returned. By default, False. |
verify_integrity | If True, the new index is checked for duplicate values. By default, False, which is faster. |
For this example, the DataFrame lists the top Salesperson for each of four (4) months, and the Salesperson column becomes the index.
df = pd.DataFrame({'Salesperson': ['Greg', 'Fred', 'Helen', 'Tim'], 'Month': ['Jan', 'Feb', 'Mar', 'Apr'], 'Sold': [165, 156, 196, 124]})
result = df.set_index('Salesperson')
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] sets the index to 'Salesperson' and saves the new DataFrame to the result variable.
- Line [3] outputs the result to the terminal.
Output:
| | Month | Sold |
---|---|---|
Salesperson | | |
Greg | Jan | 165 |
Fred | Feb | 156 |
Helen | Mar | 196 |
Tim | Apr | 124 |
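The remaining parameters can be explored on a variation of the same data. In this sketch the data is changed (an assumption for illustration) so that Greg appears twice, which lets verify_integrity reject the duplicate.

```python
import pandas as pd

# Variation of the sales data with a repeated Salesperson (hypothetical).
df = pd.DataFrame({
    "Salesperson": ["Greg", "Fred", "Helen", "Greg"],
    "Month": ["Jan", "Feb", "Mar", "Apr"],
    "Sold": [165, 156, 196, 124],
})

# A list of keys builds a two-level index.
multi = df.set_index(["Salesperson", "Month"])
print(list(multi.index.names))  # ['Salesperson', 'Month']
print(multi.columns.tolist())   # ['Sold']

# drop=False keeps the key column in the DataFrame as well.
kept = df.set_index("Month", drop=False)
print(kept.columns.tolist())    # ['Salesperson', 'Month', 'Sold']

# verify_integrity=True raises a ValueError because 'Greg' is duplicated.
try:
    df.set_index("Salesperson", verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)  # True
```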
DataFrame take()
The take() method returns the elements at the given positions along the selected axis. Indexing is by the actual position of the element, not by its label.
🛑 Note: The is_copy parameter has been deprecated (since version 1.0.0). The method itself is still available.
The syntax for this method is as follows:
DataFrame.take(indices, axis=0, is_copy=None, **kwargs)
Parameter | Description |
---|---|
indices | List (array) of integer positions to take. |
axis | If zero (0) or index, take rows. If one (1) or columns, take columns. By default, 0. |
is_copy | Deprecated since pandas v1.0. The take() method always returns a copy. |
**kwargs | Accepted for compatibility with numpy.take(). These arguments have no effect on the output. |
For this example, the finxters.csv file is read into a DataFrame.
df = pd.read_csv('finxters.csv')
result = df.take([30, 31], axis=0)
print(result)
- Line [1] reads in the comma-separated CSV file and saves it to df.
- Line [2] takes the rows at positions 30 and 31 (counting from zero, so the 31st and 32nd rows) and saves them to the result variable.
- Line [3] outputs the result to the terminal.
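Because take() works on positions rather than labels, it behaves like iloc with a list of integers. The sketch below uses a small hypothetical frame with string labels to make the difference visible.

```python
import pandas as pd

# Hypothetical data with string labels, so positions and labels differ.
df = pd.DataFrame(
    {"FID": [101, 102, 103, 104], "First_Name": ["Ann", "Bob", "Cy", "Di"]},
    index=["a", "b", "c", "d"],
)

# Positions 0 and 2 correspond to labels 'a' and 'c'.
rows = df.take([0, 2])
print(rows.index.tolist())           # ['a', 'c']
print(rows.equals(df.iloc[[0, 2]]))  # True

# Negative positions count from the end.
last = df.take([-1])
print(last.index.tolist())           # ['d']

# axis=1 takes columns by position.
cols = df.take([1], axis=1)
print(cols.columns.tolist())         # ['First_Name']
```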
DataFrame truncate()
The truncate() method removes the rows (or columns) of a DataFrame/Series before and after the selected index values. Both boundary values are kept in the result.
The syntax for this method is as follows:
DataFrame.truncate(before=None, after=None, axis=None, copy=True)
Parameters:
Parameter | Description |
---|---|
before | Truncate (remove) all rows before this index value. The data type can be a date, string, or integer. |
after | Truncate (remove) all rows after this index value. The data type can be a date, string, or integer. |
axis | If zero (0) or index, truncate rows. If one (1) or columns, truncate columns. By default, rows (0). |
copy | If True, a copy of the truncated DataFrame/Series is returned. By default, True. |
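For this example, we have a DataFrame containing a message.
df = pd.DataFrame({'C': ['f', 'i', 'n', 'x', 't', 'e', 'r'], 'O': ['p', 'u', 'z', 'z', 'l', 'e', 's'], 'D': ['a', 'w', 'e', 's', 'o', 'm', 'e'], 'E': ['w', 'a', 'y', '-', 't', 'o', '-'], 'R': ['l', 'e', 'r', 'n', '!', '!', '!']}, index=[1, 2, 3, 4, 5, 6, 7])
print(df)
result = df.truncate(before=2, after=4)
print(result)
- Line [1] creates a DataFrame from a dictionary of lists and saves it to df.
- Line [2] outputs the result to the terminal.
- Line [3] truncates the DataFrame, keeping only the rows with index values 2 through 4 (inclusive), and saves the output to the result variable.
- Line [4] outputs the result to the terminal.
Output:
df
| | C | O | D | E | R |
---|---|---|---|---|---|
1 | f | p | a | w | l |
2 | i | u | w | a | e |
3 | n | z | e | y | r |
4 | x | z | s | - | n |
5 | t | l | o | t | ! |
6 | e | e | m | o | ! |
7 | r | s | e | - | ! |
result
| | C | O | D | E | R |
---|---|---|---|---|---|
2 | i | u | w | a | e |
3 | n | z | e | y | r |
4 | x | z | s | - | n |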
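The before and after values can also be dates when the index holds dates. This sketch, on hypothetical daily sales figures, truncates a date-indexed frame and shows that truncate() refuses an index that is not sorted.

```python
import pandas as pd

# Hypothetical daily data with a date index.
dates = pd.date_range("2022-01-01", periods=6, freq="D")
df = pd.DataFrame({"Sold": [5, 3, 8, 2, 7, 4]}, index=dates)

# Date strings work as boundaries; both ends are kept.
window = df.truncate(before="2022-01-02", after="2022-01-04")
print(window["Sold"].tolist())  # [3, 8, 2]

# An index that is neither increasing nor decreasing is rejected.
unsorted = df.iloc[[2, 0, 1]]
try:
    unsorted.truncate(before="2022-01-02")
    unsorted_rejected = False
except ValueError:
    unsorted_rejected = True
print(unsorted_rejected)  # True
```

Sorting first with sort_index() avoids the error when the data arrives out of order.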
