Pandas Tutorial
Iterating over rows and columns of a DataFrame is a common task in data analysis. However, it's crucial to note that vectorized operations (which apply operations to entire columns or datasets) are much faster than iterating row-by-row or column-by-column. Still, there are cases where iterating is beneficial or necessary.
iterrows()
iterrows() returns an iterator that yields each row as an (index, Series) pair.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['p', 'q', 'r']
})

for index, row in df.iterrows():
    print(index, row['A'], row['B'])
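Because each row comes back as a single Series, values from columns with different dtypes are converted to one common dtype. The sketch below illustrates this with a hypothetical frame (`df_mixed` and its column names are assumptions for illustration, not part of the tutorial's data):

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df_mixed = pd.DataFrame({"count": [1, 2], "price": [1.5, 2.5]})

for index, row in df_mixed.iterrows():
    # Each row is a Series with one dtype, so the integer "count" is upcast.
    print(index, row["count"], type(row["count"]).__name__)

# Keep the first row around to inspect its dtype.
first_row = next(df_mixed.iterrows())[1]
print(first_row.dtype)
```

If the original dtypes matter, itertuples() or column-wise access is usually the safer choice.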
itertuples()
itertuples() returns an iterator that yields each row as a named tuple.

for row in df.itertuples():
    print(row.Index, row.A, row.B)
itertuples() is generally faster than iterrows() and is recommended for most cases where you do need to loop over rows.
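itertuples() also accepts `index` and `name` parameters. As a minimal sketch, reusing the tutorial's small frame, passing `index=False` and `name=None` yields plain tuples without the index field:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["p", "q", "r"]})

# index=False drops the index field; name=None yields regular tuples
# instead of named tuples.
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows)
```

Plain tuples are convenient when the rows are only being passed on to other code, for example as records for a bulk insert.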
You can iterate over columns using the items() method:

for column_name, content in df.items():
    print('Column Name:', column_name)
    print('Content:', content.tolist())
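Column-wise iteration is handy for building per-column summaries. The following is a small sketch, again on the tutorial's frame, that collects the number of distinct values in each column (the `unique_counts` dictionary is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["p", "q", "r"]})

# Map each column name to the number of distinct values it holds.
unique_counts = {}
for column_name, content in df.items():
    unique_counts[column_name] = content.nunique()

print(unique_counts)
```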
apply()
You can use the apply() method to apply a function along an axis of a DataFrame (either rows or columns).

# Define a function to apply
def example_function(x):
    return x * 2

# Apply the function to each column
df.apply(example_function)
To apply a function to each row, set axis=1:

df.apply(lambda row: row['A'] + 2, axis=1)
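To make the difference between the two axes concrete, here is a hedged sketch on a hypothetical all-numeric frame (`numeric_df` and column `C` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical numeric frame used only for this illustration.
numeric_df = pd.DataFrame({"A": [1, 2, 3], "C": [10, 20, 40]})

# Default axis=0: the function receives each column as a Series.
column_ranges = numeric_df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_totals = numeric_df.apply(lambda row: row["A"] + row["C"], axis=1)

print(column_ranges)
print(row_totals)
```

The first call produces one value per column, the second one value per row.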
applymap()
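applymap() applies a function to each element of the DataFrame. Since pandas 2.1 it is deprecated in favor of DataFrame.map(), which is called the same way.

# On pandas 2.1 or later, prefer df.map(...) with the same function
df.applymap(lambda x: str(x) + "_modified")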
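If the same code has to run on pandas versions both before and after 2.1, where DataFrame.map() was introduced as the replacement for applymap(), one possible approach is to check for the newer method at runtime. This is a sketch, not the only way to handle the transition; `add_suffix` is a hypothetical helper:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["p", "q", "r"]})


def add_suffix(value):
    """Hypothetical element-wise helper: stringify and append a suffix."""
    return f"{value}_modified"


# DataFrame.map exists on pandas 2.1+; fall back to applymap otherwise.
if hasattr(pd.DataFrame, "map"):
    modified = df.map(add_suffix)
else:
    modified = df.applymap(add_suffix)

print(modified)
```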
Before diving into iteration, always consider vectorized operations, which are usually faster and more concise. For example, to add 2 to each element in column 'A', instead of iterating over each row, you can simply do:
df['A'] = df['A'] + 2
When an operation cannot easily be vectorized, apply() can help.

In summary, while iteration over rows or columns is possible and sometimes necessary in pandas, always check whether a vectorized solution is available, as it is typically faster and more efficient.
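Conditional logic is a common reason to reach for row-wise code, yet it can often be vectorized as well. The sketch below compares a row-wise apply() with `numpy.where`; the "high"/"low" labels and the threshold are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["p", "q", "r"]})

# Row-wise version: calls a Python function once per row.
labels_apply = df.apply(lambda row: "high" if row["A"] > 1 else "low", axis=1)

# Vectorized version: one comparison and one selection over the whole column.
labels_vectorized = pd.Series(
    np.where(df["A"] > 1, "high", "low"), index=df.index
)

print(labels_apply.tolist())
print(labels_vectorized.tolist())
```

Both produce the same labels; on large frames the vectorized form usually avoids most of the per-row Python overhead.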
Using iterrows method for row-wise iteration in Pandas:
iterrows returns an iterator that yields index and row data.

for index, row in df.iterrows():
    print(f"Index: {index}, Data: {row['Column_Name']}")
Iterating over columns and applying functions in Pandas DataFrame:
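Use items for column-wise iteration (iteritems was removed in pandas 2.0).

# Assumes every column is numeric, since mean() is called on each one
for column, data in df.items():
    print(f"Column: {column}, Data: {data.mean()}")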
Looping through rows and columns with apply method in Pandas:
Use apply with axis=1 to compute a value from each row.

df['New_Column'] = df.apply(lambda row: row['Column1'] + row['Column2'], axis=1)
Vectorized operations vs. iteration in Pandas:
df['New_Column'] = df['Column1'] + df['Column2']
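A quick way to convince yourself that the two styles agree is to compute both and compare. The frame below is hypothetical sample data that reuses the placeholder column names from the snippets:

```python
import pandas as pd

# Hypothetical sample data using the placeholder column names.
sample = pd.DataFrame({"Column1": [1, 2, 3], "Column2": [10, 20, 30]})

# Row-wise: a Python function is called once per row.
row_wise = sample.apply(lambda row: row["Column1"] + row["Column2"], axis=1)

# Vectorized: a single column-level addition.
vectorized = sample["Column1"] + sample["Column2"]

print(row_wise.tolist())
print(vectorized.tolist())
```

Since the results match, the vectorized form can replace the apply() call directly.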
Row-wise and column-wise iteration using itertuples in Pandas:
itertuples provides a more efficient way for row-wise iteration.

for row in df.itertuples():
    print(f"Index: {row.Index}, Data: {row.Column_Name}")
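Attribute access such as `row.Column_Name` only works when the column name is a valid Python identifier. Otherwise itertuples() renames the field to a positional name. A small sketch with a hypothetical column called "my col":

```python
import pandas as pd

# "my col" contains a space, so it cannot be used as an attribute name.
df_names = pd.DataFrame({"my col": [1, 2], "ok": [3, 4]})

first = next(df_names.itertuples())

# The invalid name is replaced by its position in the tuple.
print(first._fields)
print(first._1, first.ok)
```

Renaming such columns before iterating keeps the loop body readable.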
Handling missing data during DataFrame iteration in Pandas:
Use dropna() or fillna() to handle missing values before iterating.

for index, row in df.dropna().iterrows():
    print(f"Index: {index}, Data: {row['Column_Name']}")
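The choice between dropping and filling changes which rows the loop sees. Here is a minimal sketch on hypothetical data (`scores`, `player`, and `score` are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score.
scores = pd.DataFrame(
    {"player": ["a", "b", "c"], "score": [1.0, np.nan, 3.0]}
)

# dropna: rows with a missing score are skipped entirely.
kept = [row["player"] for _, row in scores.dropna(subset=["score"]).iterrows()]

# fillna: every row is visited, with the missing score replaced by 0.0.
filled_total = sum(
    row["score"] for _, row in scores.fillna({"score": 0.0}).iterrows()
)

print(kept, filled_total)
```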
Iterating over grouped data with Pandas DataFrame:
Use groupby to group the data, then iterate over the (group name, group DataFrame) pairs.

for group_name, group_data in df.groupby('Group_Column'):
    print(f"Group Name: {group_name}, Group Data: {group_data}")
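Each `group_data` is an ordinary DataFrame, so any per-group computation can go in the loop body. The sketch below sums a hypothetical `Value` column per group (the `sales` frame is made up for illustration):

```python
import pandas as pd

# Hypothetical data using the placeholder grouping column name.
sales = pd.DataFrame(
    {"Group_Column": ["x", "y", "x"], "Value": [1, 2, 3]}
)

totals = {}
for group_name, group_data in sales.groupby("Group_Column"):
    # group_data holds only the rows that belong to group_name.
    totals[group_name] = int(group_data["Value"].sum())

print(totals)
```

For a plain aggregation like this, `sales.groupby("Group_Column")["Value"].sum()` gives the same totals without an explicit loop.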
Combining iteration with other Pandas operations:
for index, row in df[df['Column1'] > 5].iterrows():
    print(f"Index: {index}, Data: {row['Column_Name']}")
Selective iteration based on conditions in Pandas DataFrame:
for index, row in df[df['Column1'] > 5].iterrows():
    print(f"Index: {index}, Data: {row['Column_Name']}")
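One detail worth knowing when filtering before iterating: the filtered frame keeps the original index labels. A short sketch with hypothetical values for the placeholder columns:

```python
import pandas as pd

# Hypothetical data using the placeholder column names.
data = pd.DataFrame(
    {"Column1": [3, 7, 9], "Column_Name": ["a", "b", "c"]}
)

filtered = data[data["Column1"] > 5]

# The labels are those of the original frame, not 0..n-1.
labels = [index for index, _ in filtered.iterrows()]

# reset_index(drop=True) renumbers the rows if fresh labels are wanted.
renumbered = [
    index for index, _ in filtered.reset_index(drop=True).iterrows()
]

print(labels, renumbered)
```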
Using enumerate for iteration over DataFrame rows and columns:
Use enumerate to get a positional counter alongside each row during iteration. The counter starts at 0 and is independent of the DataFrame's index labels.

for idx, row in enumerate(df.itertuples()):
    print(f"Position: {idx}, Data: {row.Column_Name}")
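The distinction between the enumerate counter and the index label becomes visible as soon as the index is not 0, 1, 2, and so on. A sketch with a hypothetical frame indexed by 10 and 20:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from row positions.
df_labeled = pd.DataFrame({"Column_Name": ["a", "b"]}, index=[10, 20])

# Collect (position, index label, value) for every row.
pairs = [
    (position, row.Index, row.Column_Name)
    for position, row in enumerate(df_labeled.itertuples())
]

print(pairs)
```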
Code examples for iterating over rows and columns in a Pandas DataFrame in Python: