Pandas Tutorial
Pandas provides several ways to select data. This section walks through the essentials, using a simple DataFrame for illustration:
import pandas as pd

data = {
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)

# Select one column
a_series = df['A']

# Select multiple columns
subset = df[['A', 'C']]
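As a small sketch of the difference between the two forms of column selection: a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one name. The variable names here are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

one_col_series = df['A']     # single label -> Series
one_col_frame = df[['A']]    # list with one label -> DataFrame

print(type(one_col_series).__name__)  # Series
print(type(one_col_frame).__name__)   # DataFrame
print(one_col_frame.shape)            # (4, 1)
```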
.loc and .iloc
.loc is label-based indexing, and .iloc is integer-location-based indexing.
.loc:
# Select a single row by label
row_1 = df.loc[1]

# Select multiple rows
rows_1_and_2 = df.loc[[1, 2]]

# Select rows and specific columns
# (label slices include the end label, so 0:2 covers rows 0, 1 and 2)
subset = df.loc[0:2, ['A', 'C']]
.iloc:
# Select a single row by integer location
row_1 = df.iloc[1]

# Select multiple rows (the end position is excluded, so 1:3 covers positions 1 and 2)
rows_1_and_2 = df.iloc[1:3]

# Select rows and specific columns by index
subset = df.iloc[0:2, 0:2]
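With the default integer index, labels and positions coincide, which can hide the difference between the two accessors. The following sketch reuses the same data but assumes string row labels ('w' to 'z', chosen only for illustration) to show that .loc slices include the end label while .iloc slices exclude the end position.

```python
import pandas as pd

# Same values as the tutorial DataFrame, with hypothetical string row labels
df_labeled = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]},
    index=['w', 'x', 'y', 'z'],
)

# .loc slices by label and includes the end label: rows w, x, y
by_label = df_labeled.loc['w':'y', ['A', 'C']]

# .iloc slices by position and excludes the end position: positions 0 and 1
by_position = df_labeled.iloc[0:2, [0, 2]]

print(by_label.shape)     # (3, 2)
print(by_position.shape)  # (2, 2)
```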
Often, you want to select rows based on some conditions:
# Rows where 'A' is greater than 2
condition = df['A'] > 2
subset = df[condition]

# Combining conditions
condition = (df['A'] > 2) & (df['B'] < 8)
subset = df[condition]
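Conditions can also be combined with | (or) and inverted with ~ (not). A minimal sketch on the same data follows; each comparison is wrapped in parentheses because & and | bind more tightly than comparison operators in Python. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# OR: rows where 'A' equals 1 or 'B' equals 8
either = df[(df['A'] == 1) | (df['B'] == 8)]

# NOT: rows where 'A' is not greater than 2
not_gt_2 = df[~(df['A'] > 2)]

# Filter rows and pick columns in one step by passing the mask to .loc
a_and_c = df.loc[df['A'] > 2, ['A', 'C']]
```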
isin Method
Useful for filtering data based on a list of values:
# Rows where 'A' is either 1 or 4
condition = df['A'].isin([1, 4])
subset = df[condition]
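Because isin returns a boolean Series, it can be negated or combined with other conditions like any other mask. A short sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Rows where 'A' is neither 1 nor 4
excluded = df[~df['A'].isin([1, 4])]

# Rows where 'A' is 1 or 4 and 'C' is greater than 9
combined = df[df['A'].isin([1, 4]) & (df['C'] > 9)]
```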
query Method
You can also use the query method to select data:
subset = df.query('A > 2 & B < 8')
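Query strings can refer to Python variables by prefixing them with @. The sketch below assumes two hypothetical local variables, threshold and allowed, which are not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

threshold = 2          # hypothetical local variable
allowed = [5, 6, 7]    # hypothetical list of accepted 'B' values

# @name looks up the Python variable; 'and' and 'in' are valid inside the query string
subset = df.query('A > @threshold and B in @allowed')
```

This keeps the filter values out of the string itself, which is convenient when they are computed elsewhere in the program.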
By passing both a row and a column to .loc or .iloc, you can select a single data point:
# Selecting the value in the first row of column 'A'
value = df.loc[0, 'A']

# Using iloc
value = df.iloc[0, 0]
You can also use indexing to set values:
# Set the value in the first row of 'A' to 100
df.loc[0, 'A'] = 100
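The same pattern extends to setting many values at once by passing a boolean mask to .loc. This is a sketch on the tutorial data; the choice of 0 as the replacement value is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Set 'B' to 0 wherever 'A' is greater than 2
df.loc[df['A'] > 2, 'B'] = 0

# Note: chained indexing such as df[df['A'] > 2]['B'] = 0 may assign to a
# temporary copy and leave df unchanged, so a single .loc call is preferred.
```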
at and iat
For selecting or setting a single value:
# Using 'at'
value = df.at[0, 'A']

# Using 'iat'
value = df.iat[0, 0]
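The example above only reads values; setting works the same way. A minimal sketch, with arbitrary replacement values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Set by label: row label 0, column 'A'
df.at[0, 'A'] = 100

# Set by position: second row, second column (column 'B')
df.iat[1, 1] = -1
```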
Pandas offers a multitude of methods for indexing and selecting data, catering to almost any data manipulation task you might encounter. Familiarizing yourself with these methods will make your data analysis journey in Python more enjoyable and efficient!
Basics of loc and iloc in Pandas:
loc is label-based indexing, and iloc is integer-based indexing.
loc: df.loc[:, 'Column_Name']
Selecting columns by name in Pandas DataFrame:
Use square brackets or the loc method.
df['Column_Name']
Indexing and selecting rows in Pandas based on conditions:
df[df['Column_Name'] > 5]
Slicing and subsetting data with Pandas:
Use loc or iloc for slicing.
df.loc[2:5, ['Column1', 'Column2']]
Hierarchical indexing in Pandas MultiIndex DataFrames:
Hierarchical indexes are represented by pd.MultiIndex.
multi_index_df = df.set_index(['Index1', 'Index2'])
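To show how selection works once a MultiIndex is in place, here is a sketch under assumed data: the column names Index1 and Index2 come from the snippet above, while the Value column and all the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the Index1/Index2 names come from the snippet above
df = pd.DataFrame({
    'Index1': ['a', 'a', 'b', 'b'],
    'Index2': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40],
})
multi_index_df = df.set_index(['Index1', 'Index2'])

# All rows under outer label 'a' (result is indexed by Index2)
outer_a = multi_index_df.loc['a']

# A single cell: a tuple addresses both index levels
single = multi_index_df.loc[('b', 2), 'Value']

# Cross-section on the inner level: rows where Index2 == 1
inner_1 = multi_index_df.xs(1, level='Index2')
```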
Indexing and selecting data in time-series with Pandas:
Use pd.to_datetime for time conversion.
time_series_df = df.set_index(pd.to_datetime(df['Date_Column']))
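Once the index is a DatetimeIndex, .loc accepts date strings, including partial ones such as a year and month. The sketch below assumes made-up dates and a hypothetical Value column; only Date_Column comes from the snippet above.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'Date_Column': ['2023-01-15', '2023-01-20', '2023-02-10', '2023-03-05'],
    'Value': [1, 2, 3, 4],
})

# Use the parsed dates as the index, then drop the original string column
time_series_df = (
    df.set_index(pd.to_datetime(df['Date_Column']))
      .drop(columns='Date_Column')
)

# Partial string indexing: every row in January 2023
january = time_series_df.loc['2023-01']

# Date range slice: both endpoints are included
late_jan_to_feb = time_series_df.loc['2023-01-20':'2023-02-28']
```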
Combining boolean indexing with Pandas selection:
df[(df['Column1'] > 5) & (df['Column2'] < 10)]
Advanced techniques for data selection with Pandas:
Use the isin(), query(), and filter() methods.
df.query('Column_Name == "Value"')
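Of the three methods named here, filter() selects by column (or index) name rather than by cell values. A brief sketch, assuming hypothetical column names:

```python
import pandas as pd

# Hypothetical columns for illustration
df = pd.DataFrame({'price_usd': [1, 2], 'price_eur': [3, 4], 'qty': [5, 6]})

# Exact column names
by_items = df.filter(items=['qty'])

# Column names containing a substring
by_like = df.filter(like='price')

# Column names matching a regular expression
by_regex = df.filter(regex='_eur$')
```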
Indexing and selecting data by position in Pandas:
Use iloc for integer-based indexing.
df.iloc[2:5, 0:3]
Using at and iat for fast scalar selection in Pandas:
Use at for label-based scalar selection, and iat for integer-based.
value = df.at[2, 'Column_Name']
Handling missing data during indexing and selection:
Use dropna() or fillna() to handle missing values.
df.dropna(subset=['Column_Name'])
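As a sketch of how these calls interact with selection, the example below assumes a small DataFrame with one missing value in Column_Name; the Other column and all values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in 'Column_Name'
df = pd.DataFrame({
    'Column_Name': [1.0, np.nan, 3.0],
    'Other': ['x', 'y', 'z'],
})

# Drop rows where 'Column_Name' is missing
dropped = df.dropna(subset=['Column_Name'])

# Replace missing values in 'Column_Name' with 0.0 (other columns untouched)
filled = df.fillna({'Column_Name': 0.0})

# Select only the rows where 'Column_Name' is missing
missing_rows = df[df['Column_Name'].isna()]

# Comparisons against NaN evaluate to False, so this mask skips the missing row
positive = df[df['Column_Name'] > 0]
```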
Code examples for indexing and selecting data with Pandas in Python: