Box plot visualization with Pandas and Seaborn

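Box plots (also called box-and-whisker plots) are well suited to visualizing the distribution of data and spotting outliers. The box spans the first quartile (Q1) to the third quartile (Q3), with a line at the median (Q2). The whiskers extend to the minimum and maximum or, under the default settings in Matplotlib and Seaborn, to the most extreme points within 1.5 × IQR of the box, with anything beyond that drawn as individual outlier points.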
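
As a rough illustration of what the box and whiskers encode, the sketch below computes the quartiles and the whisker limits by hand. It assumes the common 1.5 × IQR whisker rule and uses made-up sample values; the variable names are illustrative only.

```python
import pandas as pd

# Made-up sample values (not from a real dataset); 60 is a deliberate outlier
values = pd.Series([8, 12, 13, 15, 15, 19, 20, 22, 60])

# Quartiles with pandas' default linear interpolation
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Fences under the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers.tolist()}")
```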

Below is a step-by-step tutorial on creating box plots using Pandas and Seaborn:

Step 1: Import Necessary Libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Step 2: Create or Load Your Data

For the sake of the tutorial, let's create a sample dataframe.

# Create a sample dataframe
df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Values': [12, 15, 13, 8, 24, 15, 20, 22, 19]
})

Step 3: Box Plot with Pandas

Pandas has a built-in method for box plotting:

df.boxplot(by='Group', column=['Values'])
plt.title('Box plot grouped by Group')
plt.suptitle('')  # Remove the automatic 'Boxplot grouped by Group' super-title
plt.show()

Step 4: Box Plot with Seaborn

Seaborn draws the same plot with its own default styling, taking the column names for the x and y axes directly:

sns.boxplot(x='Group', y='Values', data=df)
plt.title('Box plot grouped by Group')
plt.show()

Additional Customizations

  • Horizontal Boxplot:
sns.boxplot(x='Values', y='Group', data=df)
plt.title('Horizontal box plot')
plt.show()
  • Add Swarmplot (display each data point):
sns.boxplot(x='Group', y='Values', data=df)
sns.swarmplot(x='Group', y='Values', data=df, color=".25")
plt.title('Box plot with Swarmplot')
plt.show()
  • Change Colors:
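palette = {"A": "r", "B": "g", "C": "b"}
# Note: seaborn 0.13+ deprecates palette without hue; on those versions,
# also pass hue='Group' and legend=False for the same effect
sns.boxplot(x='Group', y='Values', data=df, palette=palette)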
plt.title('Colored Box plot by Group')
plt.show()
  • Displaying Multiple Box Plots (useful for comparing distributions across multiple variables):
# Extend the dataframe
df['Values2'] = df['Values'] + 5

sns.boxplot(data=df[['Values', 'Values2']])
plt.title('Multiple Box Plots')
plt.show()
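
Seaborn accepts either wide-form data, as in the last item, or long-form data with one row per observation. The following is a minimal sketch of the reshaping step with DataFrame.melt, assuming the Values and Values2 columns from above; the names Variable, Value, and df_long are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Values': [12, 15, 13, 8, 24, 15, 20, 22, 19],
})
df['Values2'] = df['Values'] + 5

# Reshape wide -> long: one row per (Group, Variable) observation
df_long = df.melt(id_vars='Group', value_vars=['Values', 'Values2'],
                  var_name='Variable', value_name='Value')

print(df_long.head())
print(df_long.shape)
```

With the long-form frame, sns.boxplot(x='Group', y='Value', hue='Variable', data=df_long) would draw the two variables side by side within each group.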

Box plots are especially helpful when comparing distributions across groups or variables. They can help quickly spot anomalies, outliers, or patterns that might not be immediately apparent from raw data. Seaborn's customizability further aids in creating visually appealing and informative plots.
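
The outliers a grouped box plot draws can also be listed numerically. The sketch below applies the 1.5 × IQR rule within each group; it assumes that rule (the usual default for whiskers) and uses made-up data, and the is_outlier column name is hypothetical.

```python
import pandas as pd

# Made-up data; the 95 in group B is a deliberate outlier
df = pd.DataFrame({
    'Group': ['A'] * 5 + ['B'] * 5,
    'Values': [10, 12, 13, 14, 16, 20, 21, 22, 23, 95],
})

# Per-group quartiles, mapped back onto each row by its group label
grouped = df.groupby('Group')['Values']
q1 = df['Group'].map(grouped.quantile(0.25))
q3 = df['Group'].map(grouped.quantile(0.75))
iqr = q3 - q1

# Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for their own group
df['is_outlier'] = (df['Values'] < q1 - 1.5 * iqr) | (df['Values'] > q3 + 1.5 * iqr)

print(df[df['is_outlier']])
```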

  1. Pandas and Seaborn box plot examples:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Value': [10, 15, 20, 25, 15, 30]})
    
    # Create a box plot using Seaborn
    sns.boxplot(x='Category', y='Value', data=df)
    plt.show()
    
  2. Box plot customization with Seaborn and Pandas:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Value': [10, 15, 20, 25, 15, 30]})
    
    # Create a customized box plot using Seaborn
    sns.boxplot(x='Category', y='Value', data=df, color='skyblue', width=0.5)
    plt.title('Customized Box Plot')
    plt.show()
    
  3. Visualizing distribution of data with box plots in Pandas:

    import pandas as pd
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Value': [10, 15, 20, 25, 15, 30]})
    
    # Visualize data distribution with box plots in Pandas
    df.boxplot(by='Category', column='Value')
    plt.title('Box Plot of Value by Category')
    plt.suptitle('')  # Remove the automatic 'Boxplot grouped by Category' super-title
    plt.show()
    
  4. Using box plots for outlier detection with Seaborn:

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Load the diamonds example dataset (downloaded on first use, so this needs network access)
    df = sns.load_dataset('diamonds')
    
    # Create a box plot for outlier detection
    sns.boxplot(x='cut', y='price', data=df)
    plt.title('Box Plot for Outlier Detection')
    plt.show()
    
  5. Grouped box plots in Pandas and Seaborn:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Value': [10, 15, 20, 25, 15, 30],
                       'Group': ['X', 'Y', 'X', 'Y', 'X', 'Y']})
    
    # Create grouped box plots using Seaborn
    sns.boxplot(x='Category', y='Value', hue='Group', data=df)
    plt.title('Grouped Box Plots')
    plt.show()
    
  6. Comparing multiple box plots with Pandas and Seaborn:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Value1': [10, 15, 20, 25, 15, 30],
                       'Value2': [5, 10, 15, 20, 10, 25]})
    
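    # Reshape to long form so both value columns can share one plot
    # (drawing two separate box plots on the same axes would overlap them)
    df_long = df.melt(id_vars='Category', value_vars=['Value1', 'Value2'],
                      var_name='Variable', value_name='Value')
    
    # Compare the two variables side by side within each category
    sns.boxplot(x='Category', y='Value', hue='Variable', data=df_long)
    plt.title('Multiple Box Plots Comparison')
    plt.show()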
    
  7. Combining box plots with other Seaborn visualizations:

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = sns.load_dataset('tips')
    
    # Combine box plots with swarm plots using Seaborn
    sns.boxplot(x='day', y='total_bill', data=df)
    sns.swarmplot(x='day', y='total_bill', data=df, color='black')
    plt.title('Box Plot with Swarm Plot')
    plt.show()
    
  8. Pandas DataFrame preparation for box plot analysis:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame preparation
    df = pd.DataFrame({'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
                       'Value': [10, 15, 20, 25, 15, 30],
                       'Group': ['X', 'Y', 'X', 'Y', 'X', 'Y']})
    
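    # Pivot to wide form: one column per Group. Rows keep the original index,
    # so each row holds a value for one group and NaN for the other.
    df_pivot = df.pivot(columns='Group', values='Value')
    
    # Plot one box per pivoted column (missing values are ignored)
    sns.boxplot(data=df_pivot)
    plt.title('Box Plot of Value by Group')
    plt.show()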
    
  9. Advanced box plot techniques with Seaborn and Pandas:

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = sns.load_dataset('tips')
    
    # Advanced box plot techniques with Seaborn
    sns.boxplot(x='day', y='total_bill', hue='sex', data=df, notch=True, palette='Set2')
    plt.title('Advanced Box Plot Techniques')
    plt.show()
    
  10. Adding annotations to box plots in Seaborn:

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrame
    df = sns.load_dataset('tips')
    
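    # Add an annotation; xy and xytext are in data coordinates, where x counts
    # box positions from 0 (x=2 is the third box). The coordinates here are
    # illustrative and should be adjusted to point at an actual outlier.
    ax = sns.boxplot(x='day', y='total_bill', data=df)
    ax.annotate('Outlier', xy=(2, 45), xytext=(3, 50),
                arrowprops=dict(facecolor='red', shrink=0.05))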
    plt.title('Box Plot with Annotation')
    plt.show()
    
  11. Styling and theming box plots in Seaborn and Pandas:

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Set Seaborn style and color palette (sns.set is an older alias of set_theme)
    sns.set_theme(style='whitegrid', palette='pastel')
    
    # Sample DataFrame
    df = sns.load_dataset('tips')
    
    # Create a styled box plot using Seaborn
    sns.boxplot(x='day', y='total_bill', data=df)
    plt.title('Styled Box Plot')
    plt.show()
    
  12. Side-by-side box plots for multiple datasets with Pandas and Seaborn:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    
    # Sample DataFrames
    df1 = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                        'Value': [10, 15, 20, 25]})
    df2 = pd.DataFrame({'Category': ['C', 'C', 'D', 'D'],
                        'Value': [15, 30, 10, 25]})
    
    # Concatenate DataFrames for side-by-side box plots;
    # ignore_index avoids duplicate index labels in the result
    df_combined = pd.concat([df1, df2], ignore_index=True)
    
    # Create side-by-side box plots using Seaborn
    sns.boxplot(x='Category', y='Value', data=df_combined)
    plt.title('Side-by-Side Box Plots')
    plt.show()
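
When the datasets share categories, concatenation alone loses track of which rows came from which source. One possible approach is to tag each frame before stacking them. In this sketch the Dataset column name and the 'first'/'second' labels are hypothetical, and df2 is made-up sample data that reuses the categories of df1.

```python
import pandas as pd

df1 = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                    'Value': [10, 15, 20, 25]})
# Assumption: a second dataset covering the same categories as df1
df2 = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                    'Value': [15, 30, 10, 25]})

# Tag each frame with its origin, then stack the rows
df_combined = pd.concat(
    [df1.assign(Dataset='first'), df2.assign(Dataset='second')],
    ignore_index=True,
)

print(df_combined)
```

Passing hue='Dataset' to sns.boxplot(x='Category', y='Value', data=df_combined) would then place the two datasets side by side within each category.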