Pandas Tutorial
Visualization
Pandas, while primarily a data manipulation library, also offers built-in capabilities for basic data visualization. Its plotting methods use Matplotlib by default, so you can customize the resulting figures with Matplotlib functions when needed. Here's a tutorial on pandas' built-in data visualization tools:
Firstly, ensure you have the required libraries installed:
pip install pandas matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Let's create a sample DataFrame to visualize:
# Creating a sample DataFrame
df = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randint(1, 10, 100),
    'C': np.random.choice(['X', 'Y', 'Z'], 100)
})
By default, plot() creates a line plot for numeric data:
df['A'].plot()
plt.title('Line Plot')
plt.show()
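Called on a whole DataFrame rather than a single column, plot() draws one line per numeric column and adds a legend; the text column 'C' is skipped. A minimal sketch, rebuilding the sample data with a seeded generator (an addition here, so the output is reproducible):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same shape of data as the tutorial's df, but seeded for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.standard_normal(100),
    'B': rng.integers(1, 10, 100),
    'C': rng.choice(['X', 'Y', 'Z'], 100),
})

# One line per numeric column ('A' and 'B'); non-numeric 'C' is ignored
ax = df.plot()
ax.set_title('Line Plot of All Numeric Columns')
plt.show()
```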
Histograms show the distribution of a dataset:
df['A'].plot(kind='hist', edgecolor='black')
plt.title('Histogram')
plt.show()
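The shape of a histogram depends heavily on the number of bins. pandas uses 10 bins unless you pass bins=. A short sketch, assuming a hypothetical seeded Series of 500 normal values:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 500 draws from a standard normal distribution
rng = np.random.default_rng(0)
values = pd.Series(rng.standard_normal(500), name='A')

# bins sets the number of equal-width bars (pandas' default is 10)
ax = values.plot(kind='hist', bins=20, edgecolor='black')
ax.set_title('Histogram with 20 bins')
ax.set_xlabel('Value')
plt.show()
```

Every value falls into exactly one bar, so the bar heights add up to the number of observations.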
Bar plots can be used for categorical data:
df['C'].value_counts().plot(kind='bar')
plt.title('Bar Plot')
plt.show()
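For category counts, a horizontal bar chart (kind='barh') often reads better, especially with long labels. Because barh draws the first row at the bottom, sorting the counts in ascending order puts the most frequent category on top. A sketch with a small hypothetical Series:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical categorical data
colors = pd.Series(['X', 'Y', 'X', 'Z', 'X', 'Y'])

# value_counts() lists the most frequent value first; sort_values() flips
# that to ascending so the longest bar ends up at the top of the chart
counts = colors.value_counts().sort_values()
ax = counts.plot(kind='barh')
ax.set_title('Horizontal Bar Plot')
ax.set_xlabel('Count')
plt.show()
```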
Box plots visualize the distribution of data and can show outliers:
df[['A', 'B']].plot(kind='box')
plt.title('Box Plot')
plt.show()
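Box plots are also handy for comparing one numeric column across categories. DataFrame.boxplot accepts by= to draw one box per group on the same Axes. A minimal sketch with hypothetical values of 'A' tagged by a category 'C':

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: three groups of three values each
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 5.0, 6.0],
    'C': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
})

# One box per distinct value of 'C', all on a single Axes
ax = df.boxplot(column='A', by='C')
ax.set_ylabel('A')
plt.show()
```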
Scatter plots show the relationship between two numeric columns:
df.plot(kind='scatter', x='A', y='B')
plt.title('Scatter Plot')
plt.show()
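A third numeric column can be encoded as point color by passing its name to c=, with colormap= choosing the palette. A sketch, assuming a hypothetical 'score' column:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data; 'score' is used only to color the points
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 4, 5, 4, 6],
    'score': [10, 20, 30, 40, 50],
})

# c= names the column that sets each point's color; because it is a
# numeric column, pandas also draws a color bar
ax = df.plot.scatter(x='A', y='B', c='score', colormap='viridis')
ax.set_title('Scatter Plot Colored by score')
plt.show()
```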
Area plots are commonly used for time-series data:
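# Area plots stack by default, and stacking requires each column to be all
# positive or all negative; the cumulative sum of random values can cross
# zero, so stacking is turned off here
df['A'].cumsum().plot(kind='area', alpha=0.5, stacked=False)
plt.title('Area Plot')
plt.show()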
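With several all-positive columns, area plots stack each column on top of the previous ones, so the top edge shows the running total. A sketch with hypothetical regional sales figures:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical, all-positive data (a requirement for stacked areas)
sales = pd.DataFrame({'north': [3, 4, 5, 6], 'south': [1, 2, 2, 3]})

# Stacked by default: the upper edge of 'south' sits at north + south
ax = sales.plot(kind='area', alpha=0.5)
ax.set_title('Stacked Area Plot')
plt.show()
```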
Because pandas visualizations are built on Matplotlib, you can use Matplotlib functions to customize them. For instance, you can add a title, change x and y axis labels, or adjust the figure size.
df['A'].plot(figsize=(10, 5))
plt.title('Customized Plot')
plt.xlabel('Index')
plt.ylabel('Value')
plt.show()
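Every pandas plotting method accepts an ax= argument, so you can create the Matplotlib Figure and Axes yourself, draw several series onto the same Axes, and customize through the Axes object rather than the plt functions. A sketch with hypothetical data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data
df = pd.DataFrame({'A': [1, 3, 2, 4], 'B': [2, 2, 3, 1]})

# Create the Figure and Axes first, then pass the Axes to each plot call
fig, ax = plt.subplots(figsize=(10, 5))
df['A'].plot(ax=ax, label='A')
df['B'].plot(ax=ax, label='B', linestyle='--')

# Customize through the Axes object
ax.set(title='Customized Plot', xlabel='Index', ylabel='Value')
ax.legend()
plt.show()
```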
For DataFrames, subplots=True can be used to plot data from different columns in separate subplots:
df.plot(subplots=True, layout=(2, 1), figsize=(10, 5))
plt.tight_layout()
plt.show()
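The layout tuple controls the grid of subplots, and passing a list to title= gives each subplot its own title. When layout is given, the returned NumPy array of Axes has that same shape. A sketch with hypothetical data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data
df = pd.DataFrame({'A': [1, 3, 2, 5], 'B': [10, 8, 9, 7]})

# Side-by-side subplots, one title per column; per-subplot legends are
# switched off because the titles already name the columns
axes = df.plot(subplots=True, layout=(1, 2), figsize=(10, 4),
               title=['Column A', 'Column B'], legend=False)
plt.tight_layout()
plt.show()
```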
Pandas provides an easy-to-use interface for basic visualizations, making it convenient for preliminary data exploration. For more advanced visualizations or customizations, you can customize the Matplotlib Axes that pandas plotting methods return, or pass pandas objects directly to libraries like Matplotlib or Seaborn.
Basic charts and plots with Pandas:
import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)

# Plot a line chart
df.plot(x='A', y='B', kind='line')
plt.show()
Customizing Pandas plots with Matplotlib:
import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)

# Customize a bar chart
ax = df.plot(kind='bar', x='A', y='B')
ax.set(title='Customized Bar Chart', xlabel='X-axis', ylabel='Y-axis')
plt.show()
Pandas plotting for exploratory data analysis (EDA):
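import pandas as pd
import matplotlib.pyplot as plt

# Load a dataset
df = pd.read_csv('data.csv')

# EDA with pair plots: a scatter plot for every pair of numeric columns,
# with a histogram of each column on the diagonal
pd.plotting.scatter_matrix(df, diagonal='hist', figsize=(8, 8))
plt.show()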
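Since 'data.csv' is a placeholder, here is a self-contained sketch that substitutes a hypothetical numeric dataset with one correlated and one independent column, so the difference shows up in the matrix. scatter_matrix returns an n-by-n NumPy array of Axes, one per column pair:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataset standing in for 'data.csv'
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
df = pd.DataFrame({
    'x': x,
    'y': 2 * x + rng.standard_normal(200),  # correlated with x
    'z': rng.uniform(0, 1, 200),            # independent of x
})

# 3 numeric columns -> 3x3 grid; histograms on the diagonal
axes = pd.plotting.scatter_matrix(df, diagonal='hist', figsize=(8, 8), alpha=0.5)
plt.show()
```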
Using different plot types in Pandas:
# Scatter plot
df.plot.scatter(x='A', y='B')
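Each kind= string has a matching method on the df.plot accessor, so df.plot(kind='bar') and df.plot.bar() draw the same chart. A sketch that places four of them in one Matplotlib grid, using hypothetical data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [4, 1, 3, 2]})

# A 2x2 grid of Axes, one accessor method per cell
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
df.plot.line(ax=axes[0, 0], title='line')
df.plot.bar(ax=axes[0, 1], title='bar')
df.plot.hist(ax=axes[1, 0], alpha=0.5, title='hist')
df.plot.scatter(x='A', y='B', ax=axes[1, 1], title='scatter')
plt.tight_layout()
plt.show()
```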
Time series visualization with Pandas:
import pandas as pd
import matplotlib.pyplot as plt

# Load a time series, parsing the 'Date' column into a DatetimeIndex
df = pd.read_csv('time_series_data.csv', parse_dates=True, index_col='Date')

# Plot time series data
df.plot(kind='line')
plt.show()
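A common time-series pattern is to plot raw values next to a smoothed version. This sketch replaces 'time_series_data.csv' with a hypothetical daily series that repeats a weekly cycle, so a 7-day rolling mean flattens it to a constant:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily data with a repeating 7-day pattern (0, 1, ..., 6)
idx = pd.date_range('2024-01-01', periods=60, freq='D')
daily = pd.Series((np.arange(60) % 7).astype(float), index=idx)

# The 7-day rolling mean is NaN until the window holds 7 values
ts = pd.DataFrame({'daily': daily, '7-day mean': daily.rolling(window=7).mean()})

# With a DatetimeIndex, pandas formats the x-axis as dates
ax = ts.plot(title='Daily values and 7-day rolling mean')
ax.set_ylabel('Value')
plt.show()
```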
Advanced plotting options with Pandas:
# Subplots
df.plot(subplots=True, layout=(2, 1))
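Another option for columns on very different scales is a secondary y-axis. Passing column names to secondary_y draws them against a second axis on the right, which pandas exposes as ax.right_ax. A sketch with hypothetical 'temperature' and 'sales' columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data on very different scales
df = pd.DataFrame({'temperature': [20, 22, 25, 24],
                   'sales': [200, 340, 500, 450]})

# 'sales' goes on the right-hand axis; 'temperature' stays on the left
ax = df.plot(secondary_y=['sales'], title='Temperature vs. Sales')
ax.set_ylabel('temperature')
ax.right_ax.set_ylabel('sales')
plt.show()
```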
Combining Pandas plotting with Seaborn and Matplotlib:
import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plot with Seaborn styling
sns.scatterplot(data=df, x='A', y='B')
plt.show()
Styling and theming Pandas plots:
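import matplotlib.pyplot as plt

# Apply a style globally. Matplotlib 3.6+ renamed the seaborn styles to
# 'seaborn-v0_8-*'; older versions use 'seaborn-darkgrid' instead
plt.style.use('seaborn-v0_8-darkgrid')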
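plt.style.use changes settings for every later plot. To style just one figure, plt.style.context applies a style inside a with-block and restores the previous settings afterwards. A sketch using the built-in 'ggplot' style and hypothetical data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'A': [1, 3, 2, 5], 'B': [4, 1, 3, 2]})

default_facecolor = plt.rcParams['axes.facecolor']

# The style applies only to plots made inside the with-block
with plt.style.context('ggplot'):
    ax = df.plot(title='ggplot style (temporary)')
    styled_facecolor = plt.rcParams['axes.facecolor']

# Outside the block the earlier settings are back in effect
restored_facecolor = plt.rcParams['axes.facecolor']
plt.show()
```

plt.style.available lists the style names installed with your Matplotlib version.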
Code examples for Pandas built-in data visualization in Python: