Pandas Built-in Data Visualization

Pandas is primarily a data manipulation library, but it also offers built-in capabilities for basic data visualization. Its plotting methods are built on Matplotlib, so you can customize the results with ordinary Matplotlib calls. This tutorial walks through pandas' built-in data visualization tools:

1. Setup:

First, make sure the required libraries are installed:

pip install pandas matplotlib

2. Import necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

3. Sample Data:

Let's create a sample DataFrame to visualize:

# Creating a sample DataFrame
df = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randint(1, 10, 100),
    'C': np.random.choice(['X', 'Y', 'Z'], 100)
})

4. Basic Visualizations:

a. Line Plot:

By default, plot() draws a line plot of numeric data, with the index on the x-axis:

df['A'].plot()
plt.title('Line Plot')
plt.show()
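
The same call works on a whole DataFrame, where each numeric column becomes its own line. The sketch below is illustrative only: the `readings` frame, its column names, and its values are assumptions made up for the example, and the Agg backend is selected so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import numpy as np
import pandas as pd

# Hypothetical readings indexed by date (names and values are illustrative)
rng = np.random.default_rng(0)
readings = pd.DataFrame(
    {"temp": rng.normal(20, 2, 30), "humidity": rng.normal(50, 5, 30)},
    index=pd.date_range("2024-01-01", periods=30, freq="D"),
)

# One line per numeric column; the DatetimeIndex supplies the x-axis
ax = readings.plot(title="Line Plot (one line per column)")
ax.set_ylabel("Reading")

line_count = len(ax.get_lines())  # number of lines drawn on the Axes
```

In a script or notebook you would normally finish with plt.show() instead of inspecting the Axes.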

b. Histogram:

Histograms show the distribution of a dataset:

df['A'].plot(kind='hist', edgecolor='black')
plt.title('Histogram')
plt.show()
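
The number of bins strongly affects how a histogram reads. As a small sketch (the sample data and the choice of 20 bins are assumptions for illustration), the `bins` argument can be passed straight through plot():

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import numpy as np
import pandas as pd

# Illustrative sample: 500 draws from a standard normal distribution
rng = np.random.default_rng(1)
values = pd.Series(rng.normal(size=500), name="A")

# bins controls how many equal-width intervals the value range is split into
ax = values.plot(kind="hist", bins=20, edgecolor="black", title="Histogram (20 bins)")
ax.set_xlabel("A")

bar_count = len(ax.patches)  # one rectangle is drawn per bin
```

Fewer bins smooth the shape, while more bins expose detail and also noise, so it is worth trying a few values.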

c. Bar Plot:

Bar plots suit categorical data. Here value_counts() tallies how often each category occurs before plotting:

df['C'].value_counts().plot(kind='bar')
plt.title('Bar Plot')
plt.show()
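
Bar plots are not limited to counts. The following sketch, which uses a small hand-written DataFrame as an assumption, aggregates a numeric column per category with groupby() and plots the result:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import pandas as pd

# Small illustrative dataset: a numeric column B and a categorical column C
df = pd.DataFrame({
    "B": [1, 3, 5, 7, 9, 2],
    "C": ["X", "Y", "X", "Z", "Y", "Z"],
})

# Aggregate first, then plot: one bar per category
means = df.groupby("C")["B"].mean()
ax = means.plot(kind="bar", rot=0, title="Mean of B per category")
ax.set_ylabel("Mean of B")
```

The rot=0 argument keeps the category labels horizontal, which is usually easier to read for short labels.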

d. Box Plot:

Box plots visualize the distribution of data and can show outliers:

df[['A', 'B']].plot(kind='box')
plt.title('Box Plot')
plt.show()
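
To see what "outlier" means here: with Matplotlib's default settings, the whiskers reach at most 1.5 times the interquartile range (IQR) beyond the box, and points outside that range are drawn individually. The sketch below reproduces that rule by hand on made-up data, assuming the default whisker setting is in effect:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import pandas as pd

# Illustrative data with one obvious outlier
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], name="value")

# The box spans the first to third quartile
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Default whisker limits: 1.5 * IQR beyond each edge of the box
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]

ax = s.plot(kind="box", title="Box Plot with one outlier")
```

Here only the value 50 falls outside the limits, so it is the single point drawn beyond the whiskers.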

e. Scatter Plot:

Scatter plots are used to see the relationship between two series:

df.plot(kind='scatter', x='A', y='B')
plt.title('Scatter Plot')
plt.show()
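
A scatter plot can also encode a third variable as color. In this sketch the `score` column is a hypothetical addition invented for the example; passing its name as `c` colors each point by that column's value and adds a colorbar:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "A": rng.normal(size=50),
    "B": rng.integers(1, 10, size=50),
    "score": rng.uniform(0, 1, size=50),  # hypothetical third variable
})

# c names the column used for color; colormap picks the color scale
ax = df.plot(kind="scatter", x="A", y="B", c="score", colormap="viridis",
             title="Scatter Plot colored by score")

point_count = len(ax.collections[0].get_offsets())  # number of plotted points
```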

f. Area Plot:

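Area plots are commonly used for time-series data. Pandas stacks areas by default, which requires every column to be entirely non-negative or entirely non-positive. The cumulative sum of A can change sign, so pass stacked=False:

df['A'].cumsum().plot(kind='area', stacked=False, alpha=0.5)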
plt.title('Area Plot')
plt.show()
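
Stacking is what distinguishes an area plot from a line plot when several columns are involved: with kind='area', pandas stacks the columns on top of each other by default, so the upper edge shows the running total. A minimal sketch, assuming two made-up non-negative columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import pandas as pd

# Illustrative non-negative data (stacking requires a consistent sign per column)
sales = pd.DataFrame(
    {"online": [3, 4, 6, 8], "store": [5, 5, 4, 3]},
    index=[1, 2, 3, 4],  # e.g. quarter number
)

# Areas are stacked by default, so the top edge equals online + store
ax = sales.plot(kind="area", alpha=0.5, title="Stacked Area Plot")
ax.set_xlabel("Quarter")

totals = sales.sum(axis=1)  # height of the stacked area at each x position
```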

5. Customizing the Plots:

Because pandas visualizations are built on Matplotlib, you can use Matplotlib functions to customize them. For instance, you can add a title, change x and y axis labels, or adjust the figure size.

df['A'].plot(figsize=(10,5))
plt.title('Customized Plot')
plt.xlabel('Index')
plt.ylabel('Value')
plt.show()
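
Another common pattern is to create the figure with Matplotlib first and hand each Axes to pandas through the `ax` argument. This is only a sketch; the layout, colors, and titles are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({"A": rng.normal(size=100)})

# Create the figure and axes with Matplotlib, then let pandas draw into them
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

df["A"].plot(ax=ax_line, color="tab:blue")
ax_line.set(title="A over index", xlabel="Index", ylabel="Value")

df["A"].plot(kind="hist", bins=15, ax=ax_hist, color="tab:orange", edgecolor="black")
ax_hist.set(title="Distribution of A", xlabel="Value")

fig.tight_layout()
```

Working with the Axes objects directly avoids relying on Matplotlib's notion of the "current" plot, which matters once a figure has more than one panel.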

6. Using Subplots:

For DataFrames, subplots=True draws each numeric column in its own subplot. Non-numeric columns such as C are skipped:

df.plot(subplots=True, layout=(2,1), figsize=(10, 5))
plt.tight_layout()
plt.show()
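
With subplots=True, plot() returns an array of Axes rather than a single one, which makes per-panel adjustments possible. The sketch below rebuilds a DataFrame shaped like the sample data and labels each panel; sharex=True is an optional extra chosen for the example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "A": rng.normal(size=100),
    "B": rng.integers(1, 10, size=100),
    "C": rng.choice(["X", "Y", "Z"], size=100),  # non-numeric, not plotted
})

# One subplot per numeric column, stacked vertically and sharing the x-axis
axes = df.plot(subplots=True, layout=(2, 1), sharex=True, figsize=(10, 5))

# axes has the shape given by layout; label each panel with its column name
numeric_cols = df.select_dtypes("number").columns
for ax, col in zip(axes.ravel(), numeric_cols):
    ax.set_ylabel(col)
```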

Summary:

Pandas provides an easy-to-use interface for basic visualizations, which makes it convenient for preliminary data exploration. Because every pandas plot is drawn with Matplotlib, you can refine it with Matplotlib calls, or pass the same DataFrame to a library such as Seaborn when you need more advanced charts.

  1. Basic charts and plots with Pandas:

    • Create basic plots such as line, bar, scatter, and histogram.
    • Example:
      import pandas as pd
      import matplotlib.pyplot as plt
      
      # Create a DataFrame
      data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
      df = pd.DataFrame(data)
      
      # Plot a line chart
      df.plot(x='A', y='B', kind='line')
      plt.show()
      
  2. Customizing Pandas plots with Matplotlib:

    • Enhance and customize plots using Matplotlib functions.
    • Example:
      import pandas as pd
      import matplotlib.pyplot as plt
      
      # Create a DataFrame
      data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
      df = pd.DataFrame(data)
      
      # Customize a bar chart
      ax = df.plot(kind='bar', x='A', y='B')
      ax.set(title='Customized Bar Chart', xlabel='X-axis', ylabel='Y-axis')
      plt.show()
      
  3. Pandas plotting for exploratory data analysis (EDA):

    • Visualize data distributions, trends, and relationships.
    • Example:
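      import pandas as pd
      import matplotlib.pyplot as plt
      
      # Load a dataset ('data.csv' is a placeholder file name)
      df = pd.read_csv('data.csv')
      
      # EDA with a scatter matrix: pairwise scatter plots of the numeric
      # columns, with a histogram of each column on the diagonal
      pd.plotting.scatter_matrix(df, diagonal='hist')
      plt.show()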
      
  4. Using different plot types in Pandas:

    • Explore various plot types: bar, line, scatter, box, etc.
    • Example:
      # Scatter plot
      df.plot.scatter(x='A', y='B')
      
  5. Time series visualization with Pandas:

    • Visualize time series data using line plots.
    • Example:
      import pandas as pd
      import matplotlib.pyplot as plt
      
      # Load a time series, using the parsed 'Date' column as the index
      df = pd.read_csv('time_series_data.csv', parse_dates=True, index_col='Date')
      
      # Plot time series data
      df.plot(kind='line')
      plt.show()
      
  6. Advanced plotting options with Pandas:

    • Utilize advanced options like subplots, secondary y-axis, etc.
    • Example:
      # Subplots
      df.plot(subplots=True, layout=(2, 1))
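
The secondary y-axis mentioned above helps when two columns have very different scales. A minimal sketch, assuming a made-up DataFrame with hypothetical `revenue` and `margin_pct` columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line when plotting interactively

import pandas as pd

# Illustrative data: two columns on very different scales
df = pd.DataFrame(
    {"revenue": [100, 120, 150, 170], "margin_pct": [12.0, 11.5, 13.0, 12.5]},
    index=[2020, 2021, 2022, 2023],
)

# Columns listed in secondary_y are drawn against a second y-axis on the right
ax = df.plot(secondary_y=["margin_pct"], title="Revenue and margin")
ax.set_ylabel("revenue")
ax.right_ax.set_ylabel("margin_pct")  # right_ax is the secondary Axes
```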
      
  7. Combining Pandas plotting with Seaborn and Matplotlib:

    • Integrate Pandas with Seaborn and Matplotlib for enhanced features.
    • Example:
      import seaborn as sns
      
      # Scatter plot with Seaborn styling
      sns.scatterplot(data=df, x='A', y='B')
      
  8. Styling and theming Pandas plots:

    • Apply styles and themes to enhance plot aesthetics.
    • Example:
      import matplotlib.pyplot as plt
      
      # Apply a style (named 'seaborn-darkgrid' before Matplotlib 3.6)
      plt.style.use('seaborn-v0_8-darkgrid')
      
  9. Code examples for Pandas built-in data visualization in Python: