A DataFrame is a two-dimensional labeled data structure in pandas, similar to a spreadsheet, SQL table, or a dictionary of Series objects. Let's go through a step-by-step tutorial on using pandas DataFrames.
Install pandas:
```shell
pip install pandas
```
Import pandas:
```python
import pandas as pd
```
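There are various ways to create a DataFrame.

From a dictionary of lists (each key becomes a column):
```python
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']}
df = pd.DataFrame(data)
print(df)
```
From a list of lists (each inner list is a row), supplying the column names:
```python
data = [[1, 4, 'p'], [2, 5, 'q'], [3, 6, 'r']]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)
```
From a CSV file:
```python
# Assuming data.csv holds the same A, B, C data as above
df = pd.read_csv('data.csv')
print(df)
```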
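Because data.csv is only assumed to exist, the sketch below simulates the file in memory with io.StringIO so it runs anywhere. It checks that the three construction routes above produce the same table; the variable names are illustrative.

```python
import io

import pandas as pd

# Build the same table from a dict of columns and from a list of rows
from_dict = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']})
from_rows = pd.DataFrame([[1, 4, 'p'], [2, 5, 'q'], [3, 6, 'r']], columns=['A', 'B', 'C'])

# Stand-in for data.csv: the same rows as CSV text, read from memory
csv_text = "A,B,C\n1,4,p\n2,5,q\n3,6,r\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# Compare contents (column -> {row label -> value}) across the three routes
print(from_dict.to_dict() == from_rows.to_dict() == from_csv.to_dict())
```

Comparing with to_dict() checks the values and labels while staying independent of small dtype differences between pandas versions.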
To access a column:
```python
print(df['A'])
```
Rows can be accessed with the iloc[] (integer position) and loc[] (index label) indexers:
```python
# Access the first row by integer position
print(df.iloc[0])
# Access a row by its index label; with the default 0, 1, 2 index, label 0 is also the first row
print(df.loc[0])
```
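The difference between the two indexers only becomes visible with a non-default index. A minimal sketch, assuming a hypothetical index of 'x', 'y', 'z':

```python
import pandas as pd

# Same kind of data, but with string labels instead of the default 0, 1, 2
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['x', 'y', 'z'])

by_position = df.iloc[0]  # first row, selected by integer position
by_label = df.loc['x']    # the same row, selected by its index label

# There is no label 0 in this index, so loc[0] raises KeyError
try:
    df.loc[0]
    loc_zero_failed = False
except KeyError:
    loc_zero_failed = True

print(by_position.equals(by_label), loc_zero_failed)
```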
Add a column:
```python
df['D'] = [10, 11, 12]
print(df)
```
Delete a column:
```python
df = df.drop(columns=['D'])
print(df)
```
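Two details behind the snippets above: drop() returns a new DataFrame rather than changing the original (hence the reassignment), and a new column needs one value per row. A small sketch; the names trimmed and length_error_raised are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df['D'] = [10, 11, 12]            # assignment adds the column to df itself

trimmed = df.drop(columns=['D'])  # drop() returns a new DataFrame; df still has D

# A list shorter than the number of rows is rejected with ValueError
try:
    df['E'] = [1, 2]
    length_error_raised = False
except ValueError:
    length_error_raised = True

print(list(df.columns), list(trimmed.columns), length_error_raised)
```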
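Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns:
```python
print(df.describe())
```
Transpose, swapping rows and columns:
```python
print(df.T)
```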
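To see what describe() and T return rather than just printing them, the sketch below inspects the results directly. Note that the text column C is left out of describe() by default:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']})

stats = df.describe()        # numeric columns only by default
print(list(stats.columns))   # C is excluded
print(stats.loc['mean', 'A'], stats.loc['max', 'B'])

flipped = df.T               # former column labels become the row index
print(flipped.loc['C', 2])   # value from column C, original row 2
```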
Sort by values:
```python
df = df.sort_values(by='B')
print(df)
```
Filter rows with a boolean condition:
```python
filtered_df = df[df['A'] > 1]
print(filtered_df)
```
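Sorting and filtering extend naturally: ascending=False reverses the order, several conditions combine with & (and) or | (or), each wrapped in parentheses, and isin() matches against a list. A short sketch using the same A, B, C data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']})

# Descending sort; sort_values returns a new DataFrame
desc = df.sort_values(by='B', ascending=False)

# Both conditions must hold; parentheses are required around each one
both = df[(df['A'] > 1) & (df['B'] < 6)]

# Keep rows whose C value appears in the given list
picked = df[df['C'].isin(['p', 'r'])]

print(desc['B'].tolist(), both['C'].tolist(), picked['A'].tolist())
```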
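Handling missing data. Let's assume a DataFrame with missing values:
```python
data = {'A': [1, 2, None], 'B': [4, None, 6], 'C': ['p', 'q', 'r']}
df = pd.DataFrame(data)
```
Drop every row that contains a missing value:
```python
df_dropped = df.dropna()
print(df_dropped)
```
Or keep all rows and fill the missing values with a constant:
```python
df_filled = df.fillna(value=0)
print(df_filled)
```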
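Before choosing between dropping and filling, it often helps to count the gaps per column. fillna() also accepts a dict, so each column can get its own replacement; filling A with its mean here is just an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6], 'C': ['p', 'q', 'r']})

# Number of missing values in each column
missing_counts = df.isna().sum().to_dict()
print(missing_counts)

# Per-column replacement values: mean for A, 0 for B
filled = df.fillna(value={'A': df['A'].mean(), 'B': 0})
print(filled['A'].tolist(), filled['B'].tolist())
```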
Group data using groupby():
```python
grouped = df.groupby('C')
print(grouped.mean())
```
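In the data above every value of C is unique, so each group holds a single row. The sketch below uses hypothetical data with repeated keys to show actual aggregation, including how missing values are skipped:

```python
import pandas as pd

# Hypothetical data: each key in C appears twice
df = pd.DataFrame({'C': ['p', 'q', 'p', 'q'],
                   'A': [1, 2, 3, 4],
                   'B': [10.0, None, 30.0, 40.0]})

grouped = df.groupby('C')

means = grouped.mean()                      # NaN is ignored within each group
totals = grouped['A'].agg(['sum', 'count'])  # several aggregations at once

print(means)
print(totals)
```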
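Merge two DataFrames. Given:
```python
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})
```
An inner merge keeps only keys present in both DataFrames (here C and D). Because both inputs have a value column, the result names them value_x and value_y:
```python
merged = pd.merge(df1, df2, on='key', how='inner')
print(merged)
```
An outer merge keeps every key from either side and fills the gaps with NaN:
```python
merged = pd.merge(df1, df2, on='key', how='outer')
print(merged)
```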
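Two further merge options are often useful: how='left' keeps every row of the first DataFrame, and indicator=True adds a _merge column recording where each row came from. The suffixes chosen here are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

# Keep all rows of df1; custom suffixes replace the default _x / _y
left = pd.merge(df1, df2, on='key', how='left',
                suffixes=('_left', '_right'), indicator=True)

print(left)
print(left['_merge'].astype(str).tolist())
```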
Create DataFrame in Python Pandas:
Use the pd.DataFrame() constructor to create a DataFrame from various data structures.
```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
```
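Pandas DataFrame indexing and selection:
Use loc[] and iloc[] for label-based and integer-based indexing, respectively. A loc[] slice includes its end label, while an iloc[] slice excludes its end position.
```python
# Label-based indexing: rows labelled 1 and 2, columns Name and Age
selected_data = df.loc[1:2, ['Name', 'Age']]
# Integer-based indexing: rows at positions 0 and 1, first two columns
selected_data = df.iloc[0:2, 0:2]
```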
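The inclusive versus exclusive slicing rule is easiest to see with the same-looking slice passed to both indexers. A minimal sketch on the Name/Age/City data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# loc: labels 0 through 1 inclusive, columns Name through Age inclusive
by_label = df.loc[0:1, 'Name':'Age']
# iloc: position 0 only (end excluded), columns at positions 0 and 1
by_position = df.iloc[0:1, 0:2]

print(by_label.shape, by_position.shape)
```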
DataFrame operations in Pandas:
Arithmetic on a column is applied element-wise, so this adds 2 to every age:
```python
df['Age'] = df['Age'] + 2
```
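The same element-wise behaviour lets you derive new columns from existing ones without a loop. A small sketch; the column names Age_in_months and Over_25 are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

df['Age'] = df['Age'] + 2                # scalar added to every element
df['Age_in_months'] = df['Age'] * 12     # new column computed from Age
df['Over_25'] = df['Age'] > 25           # comparison yields a boolean column

print(df)
```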
Merge and concatenate DataFrames in Pandas:
Use pd.concat() and pd.merge() for concatenation and merging of DataFrames, respectively. Here ignore_index=True renumbers the stacked rows from 0:
```python
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result_df = pd.concat([df1, df2], ignore_index=True)
```
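To see what ignore_index changes, and how axis=1 places frames side by side instead of stacking them, compare the resulting indexes and shapes:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

stacked = pd.concat([df1, df2])                       # keeps labels 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # labels 0, 1, 2, 3
side_by_side = pd.concat([df1, df2], axis=1)           # columns joined, rows aligned on index

print(stacked.index.tolist(), renumbered.index.tolist(), side_by_side.shape)
```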
Filtering and selecting rows in Pandas DataFrame:
Index the DataFrame with a boolean condition to keep only the matching rows:
```python
filtered_df = df[df['Age'] > 25]
```
GroupBy in Pandas DataFrame:
Use the groupby() function for grouping and aggregating data. This computes the mean age per city:
```python
grouped_data = df.groupby('City')['Age'].mean()
```
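With one person per city, the mean above is just each person's age. The sketch below uses hypothetical data with repeated cities and named aggregation (agg with name=(column, function) pairs) to produce several labelled summary columns at once:

```python
import pandas as pd

# Hypothetical data: two people per city
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
                   'Age': [25, 30, 22, 35],
                   'City': ['New York', 'San Francisco', 'New York', 'San Francisco']})

# Output column name = (input column, aggregation function)
summary = df.groupby('City').agg(avg_age=('Age', 'mean'), people=('Name', 'count'))
print(summary)
```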
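Reshaping and pivoting in Pandas DataFrame:
Use pivot() and melt() to reshape data between wide and long formats. For example, pivot() can turn each city into its own column:
```python
pivoted_df = df.pivot(index='Name', columns='City', values='Age')
```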
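melt() is mentioned but not shown. A minimal sketch of a round trip: pivot to wide form, then melt the city columns back into rows and drop the NaN cells that pivot created for missing Name/City combinations:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Wide: one row per Name, one column per City, NaN where there is no match
wide = df.pivot(index='Name', columns='City', values='Age')

# Long: City columns become rows again; drop the NaN fillers
long = (wide.reset_index()
            .melt(id_vars='Name', var_name='City', value_name='Age')
            .dropna(subset=['Age']))

print(wide.shape, len(long))
```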
Handling missing data in Pandas DataFrame:
Use dropna(), fillna(), or interpolate() to handle missing values. For example, dropna() removes every row that contains a missing value:
```python
cleaned_df = df.dropna()
```
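The three methods treat gaps differently. On a small illustrative Series, interpolate() estimates each gap from its neighbours (linear by default), fillna() writes a fixed value, and dropna() removes the gaps:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0, None, 5.0])

interpolated = s.interpolate().tolist()  # gaps filled linearly between neighbours
filled = s.fillna(0).tolist()            # gaps replaced by a constant
dropped = s.dropna().tolist()            # gaps removed entirely

print(interpolated, filled, dropped)
```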
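Visualization with Pandas DataFrame:
Pandas plotting is built on Matplotlib, so it must be installed (pip install matplotlib):
```python
import matplotlib.pyplot as plt

df.plot(kind='bar', x='Name', y='Age')
plt.show()  # opens the plot window when running as a script
```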