Python | Pandas DataFrame

A DataFrame is a two-dimensional labeled data structure in pandas, similar to a spreadsheet, SQL table, or a dictionary of Series objects. Let's go through a step-by-step tutorial on using pandas DataFrames.

1. Setup:

Install pandas:

pip install pandas

2. Import necessary libraries:

import pandas as pd

3. Creating a DataFrame:

There are various ways to create a DataFrame:

a. From a Dictionary:

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': ['p', 'q', 'r']
}
df = pd.DataFrame(data)
print(df)

b. From a List of Lists:

data = [[1, 4, 'p'], [2, 5, 'q'], [3, 6, 'r']]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

c. From a CSV File:

# Assumes data.csv exists in the working directory and holds the data from the previous examples
df = pd.read_csv('data.csv')
print(df)
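
The read_csv() call above only works if the file already exists. As a minimal sketch, the following writes the earlier table to a temporary file and reads it back. The temporary directory and the variable names original, csv_path and loaded are illustrative assumptions, not part of the tutorial.

```python
import os
import tempfile

import pandas as pd

# Rebuild the small table used in the earlier examples.
original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']})

# Write it to a temporary directory (an assumption, chosen so nothing is overwritten).
csv_path = os.path.join(tempfile.mkdtemp(), 'data.csv')
original.to_csv(csv_path, index=False)  # index=False keeps the row index out of the file

# Read the file back into a new DataFrame.
loaded = pd.read_csv(csv_path)
print(loaded)
```

Without index=False, to_csv() also writes the row index, which read_csv() would then load as an extra unnamed column.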

4. Accessing Data:

a. Columns:

To access a column:

print(df['A'])

b. Rows:

Rows can be accessed with the iloc[] and loc[] indexers. iloc[] selects by integer position, while loc[] selects by index label:

# Access the first row using integer-location based indexing
print(df.iloc[0])

# Access by index label (with the default integer index, label 0 is also the first row)
print(df.loc[0])
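
With the default integer index, iloc[0] and loc[0] return the same row, so the difference is easy to miss. The sketch below uses a hypothetical string index ('x', 'y', 'z') to make it visible. The name df_labeled is an assumption for illustration.

```python
import pandas as pd

# Hypothetical row labels, chosen so that positions and labels differ.
df_labeled = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [4, 5, 6]},
    index=['x', 'y', 'z'],
)

first_by_position = df_labeled.iloc[0]  # row at position 0
first_by_label = df_labeled.loc['x']    # row labeled 'x'

# Slices behave differently: iloc excludes the stop position,
# loc includes the stop label.
by_position = df_labeled.iloc[0:2]      # rows 'x' and 'y'
by_label = df_labeled.loc['x':'y']      # rows 'x' and 'y'

print(first_by_position)
print(by_label)
```

Here df_labeled.loc[0] would raise a KeyError, because 0 is a position and not one of the labels.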

5. Adding and Deleting Columns:

a. Adding:

df['D'] = [10, 11, 12]
print(df)

b. Deleting:

df = df.drop(columns=['D'])
print(df)
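
A new column is often computed from existing ones rather than typed in by hand. The following sketch shows one way to do that with assign(), which returns a new DataFrame instead of modifying the original. The names with_sum and without_b are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# assign() returns a copy with the extra column; df itself is unchanged.
with_sum = df.assign(D=df['A'] + df['B'])

# drop() likewise returns a new DataFrame, so the result must be kept.
without_b = with_sum.drop(columns=['B'])

print(with_sum)
print(without_b)
```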

6. Basic Operations:

a. Statistics:

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
print(df.describe())

b. Transpose:

print(df.T)

c. Sorting:

By values:

df = df.sort_values(by='B')
print(df)
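
Sorting by values keeps each row's original index label, which can be surprising afterwards. As a small sketch with made-up values in column B, the example below sorts in descending order and then restores the original row order with sort_index().

```python
import pandas as pd

# Values in B are deliberately out of order for this illustration.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 4, 5]})

# Sort by B, largest first; the index labels travel with their rows.
by_b_desc = df.sort_values(by='B', ascending=False)

# Sort by index label to recover the original row order.
restored = by_b_desc.sort_index()

print(by_b_desc)
print(restored)
```

If the old labels are not needed, by_b_desc.reset_index(drop=True) renumbers the rows from 0 instead.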

7. Filtering:

filtered_df = df[df['A'] > 1]
print(filtered_df)
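
Filters can combine several conditions. The sketch below assumes the same three-row table as before and uses & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The names both and either are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']})

# Rows where A > 1 AND B < 6.
both = df[(df['A'] > 1) & (df['B'] < 6)]

# Rows where A == 1 OR C is one of the listed values.
either = df[(df['A'] == 1) | df['C'].isin(['r'])]

print(both)
print(either)
```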

8. Handling Missing Data:

Let's assume a DataFrame with missing values:

data = {
    'A': [1, 2, None],
    'B': [4, None, 6],
    'C': ['p', 'q', 'r']
}
df = pd.DataFrame(data)

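a. Drop rows with NaN values:

# Store the result under a new name so the original df keeps its NaN values
dropped = df.dropna()
print(dropped)

b. Fill NaN values:

# Replace every NaN in the original df with 0
filled = df.fillna(value=0)
print(filled)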
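
Before dropping or filling, it usually helps to see how many values are missing, and a single fill value is not always right for every column. The following is a minimal sketch on the same data. Filling A with its mean and B with 0 is an arbitrary choice made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None],
    'B': [4, None, 6],
    'C': ['p', 'q', 'r'],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Fill each column with its own value: the mean of A, and 0 for B.
filled = df.fillna({'A': df['A'].mean(), 'B': 0})

print(missing_per_column)
print(filled)
```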

9. Grouping:

Using groupby():

grouped = df.groupby('C')
print(grouped.mean())
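
In the data above every value in C is unique, so each group holds a single row and the aggregation is hard to see. The sketch below uses a hypothetical table with repeated keys and also shows named aggregation through agg(). The output column names A_sum and B_max are assumptions.

```python
import pandas as pd

# Hypothetical data in which each key in C appears twice.
df = pd.DataFrame({
    'C': ['p', 'q', 'p', 'q'],
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
})

# Mean of every numeric column within each group.
means = df.groupby('C').mean()

# A different aggregation per column, with explicit output names.
summary = df.groupby('C').agg(A_sum=('A', 'sum'), B_max=('B', 'max'))

print(means)
print(summary)
```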

10. Merging/Joining DataFrames:

Given two DataFrames:

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

a. Inner Join:

merged = pd.merge(df1, df2, on='key', how='inner')
print(merged)

b. Outer Join:

merged = pd.merge(df1, df2, on='key', how='outer')
print(merged)
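
Both DataFrames have a column named value, so the merged result holds two of them, which pandas renames value_x and value_y by default. As a sketch, the suffixes parameter gives them clearer names, and indicator=True adds a _merge column recording where each row came from. The suffix strings are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['C', 'D', 'E', 'F'], 'value': [3, 4, 5, 6]})

# Inner join: only keys present in both frames, with explicit suffixes.
inner = pd.merge(df1, df2, on='key', how='inner', suffixes=('_left', '_right'))

# Outer join: all keys; _merge is 'left_only', 'right_only' or 'both'.
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(inner)
print(outer)
```

In the outer join, keys found in only one DataFrame get NaN in the other side's value column.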

Common DataFrame tasks at a glance:

  1. Create DataFrame in Python Pandas:

    • Use the pd.DataFrame() constructor to create a DataFrame from various data structures.
    import pandas as pd
    
    data = {'Name': ['Alice', 'Bob', 'Charlie'],
            'Age': [25, 30, 22],
            'City': ['New York', 'San Francisco', 'Los Angeles']}
    
    df = pd.DataFrame(data)
    
  2. Pandas DataFrame indexing and selection:

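    • Use the loc[] and iloc[] indexers for label-based and integer-position-based selection, respectively.
    # Label-based indexing: the slice 1:2 includes both end labels (rows 1 and 2)
    selected_data = df.loc[1:2, ['Name', 'Age']]
    
    # Integer-based indexing: the slice 0:2 excludes the stop position (rows 0 and 1)
    selected_data = df.iloc[0:2, 0:2]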
    
  3. DataFrame operations in Pandas:

    • Arithmetic on a DataFrame column, such as addition or subtraction, is applied element-wise to every row.
    df['Age'] = df['Age'] + 2
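
A few more element-wise operations are sketched below on the same Name and Age data. The new column names AgeInMonths, YearsTo65 and NameLength are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

# Multiply every value in a column by a scalar.
df['AgeInMonths'] = df['Age'] * 12

# Subtract a column from a scalar.
df['YearsTo65'] = 65 - df['Age']

# String columns expose element-wise methods through the .str accessor.
df['NameLength'] = df['Name'].str.len()

print(df)
```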
    
  4. Merge and concatenate DataFrames in Pandas:

    • Use pd.concat() to stack DataFrames on top of each other and pd.merge() to join them on key columns.
    df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
    df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
    
    result_df = pd.concat([df1, df2], ignore_index=True)
    
  5. Filtering and selecting rows in Pandas DataFrame:

    • Use boolean indexing to filter rows based on conditions.
    filtered_df = df[df['Age'] > 25]
    
  6. GroupBy in Pandas DataFrame:

    • Use the groupby() method to group rows and aggregate each group.
    grouped_data = df.groupby('City')['Age'].mean()
    
  7. Reshaping and pivoting in Pandas DataFrame:

    • Reshape data using functions like pivot() and melt().
    pivoted_df = df.pivot(index='Name', columns='City', values='Age')
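
melt() goes the other way from pivot(): it turns columns into rows. The following sketch melts the Name, Age and City table into a long format and then pivots it back. The column names Field and Value are assumptions chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Wide to long: one row per (Name, Field) pair.
long_df = df.melt(
    id_vars='Name',
    value_vars=['Age', 'City'],
    var_name='Field',
    value_name='Value',
)

# Long to wide: one row per Name, one column per Field.
wide_df = long_df.pivot(index='Name', columns='Field', values='Value')

print(long_df)
print(wide_df)
```

pivot() raises an error when an index and column pair appears more than once. pivot_table() with an aggregation function is the usual alternative in that case.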
    
  8. Handling missing data in Pandas DataFrame:

    • Use methods like dropna(), fillna(), or interpolate() to handle missing values.
    cleaned_df = df.dropna()
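
The DataFrame in this list has no missing values, so dropna() leaves it unchanged. As a sketch of interpolate(), the example below uses a hypothetical column of readings with gaps. By default the gaps are filled linearly between neighboring values.

```python
import pandas as pd

# Hypothetical measurements with two missing entries.
readings = pd.DataFrame({'reading': [1.0, None, 3.0, None, 5.0]})

# Linear interpolation between the surrounding known values.
interpolated = readings['reading'].interpolate()

print(interpolated)
```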
    
  9. Visualization with Pandas DataFrame:

    • Visualize data with the built-in plot() method, which relies on Matplotlib being installed, or with libraries such as Seaborn.
    df.plot(kind='bar', x='Name', y='Age')