Pandas Tutorial
A Pandas DataFrame is a two-dimensional labeled data structure whose columns can hold different types. This step-by-step tutorial shows how to create one. Start by importing pandas:
import pandas as pd
There are several ways to create a DataFrame.

From a dictionary of lists, where each key becomes a column:

data = {
    'Name': ['John', 'Anna', 'Ella'],
    'Age': [28, 22, 33],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
print(df)

From a list of dictionaries, where each dictionary becomes a row:

data_list = [
    {'Name': 'John', 'Age': 28, 'City': 'New York'},
    {'Name': 'Anna', 'Age': 22, 'City': 'Paris'},
    {'Name': 'Ella', 'Age': 33, 'City': 'London'}
]
df = pd.DataFrame(data_list)
print(df)

From a list of lists, with the column names passed separately:

data_list_of_lists = [
    ['John', 28, 'New York'],
    ['Anna', 22, 'Paris'],
    ['Ella', 33, 'London']
]
df = pd.DataFrame(data_list_of_lists, columns=['Name', 'Age', 'City'])
print(df)
Once you've created your DataFrame, here are some basic operations you might find useful:
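View the first rows (five by default):
print(df.head())
View the last rows (five by default):
print(df.tail())
Get a summary of the index, column types, and non-null counts. Note that info() prints its report directly and returns None, so it should not be wrapped in print():
df.info()
Get summary statistics for the numeric columns, such as the mean, standard deviation, minimum, and 25th percentile:
print(df.describe())
Select a single column:
print(df['Name'])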
You can also use one of the columns as the index. For example, setting 'Name' as the index:
df.set_index('Name', inplace=True)
print(df)
If you want to reset the index back to the default integer index (this turns 'Name' back into a regular column):
df.reset_index(inplace=True)
print(df)
These are the basics of creating and manipulating a Pandas DataFrame. The Pandas library is extremely powerful and offers many advanced functionalities like groupby, merge, pivot, etc. But the above steps should give you a good starting point.
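As a small taste of one of those features, here is a minimal groupby sketch. The 'Team' column and its values are invented for illustration and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical data: the 'Team' column is made up for this illustration
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Ella'],
    'Age': [28, 22, 33],
    'Team': ['A', 'B', 'A'],
})

# Average age per team; the result is a Series indexed by 'Team'
mean_age = df.groupby('Team')['Age'].mean()
print(mean_age)
```

Selecting the 'Age' column before calling mean() keeps the aggregation limited to numeric data, which avoids errors from trying to average the text columns.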
Pandas DataFrame creation from dictionaries in Python:
import pandas as pd

# Create a Pandas DataFrame from a dictionary
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
Reading external data and initializing Pandas DataFrame:
import pandas as pd

# Read external data (e.g., CSV) into a Pandas DataFrame
df = pd.read_csv('your_file.csv')
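Since 'your_file.csv' is only a placeholder, the snippet above needs a real file to run. A self-contained sketch, assuming a small inline CSV text (its contents are invented for illustration), can pass an in-memory buffer to read_csv instead of a path:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = "Name,Age,City\nJohn,28,New York\nAnna,22,Paris\n"

# read_csv accepts a path or any file-like object, such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```

The first line of the text is taken as the header row, and the 'Age' column is parsed as an integer type.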
Generating a Pandas DataFrame from NumPy arrays:
import pandas as pd
import numpy as np

# Create a NumPy array
data = np.array([[1, 'A'], [2, 'B'], [3, 'C']])

# Create a Pandas DataFrame from the NumPy array
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
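One detail worth knowing about the example above: a NumPy array has a single dtype, so mixing integers and strings in one array converts every element to a string. The following sketch (variable names are illustrative) contrasts a mixed array with a purely numeric one:

```python
import numpy as np
import pandas as pd

# Mixed ints and strings: NumPy coerces everything to strings
mixed = np.array([[1, 'A'], [2, 'B']])
df_mixed = pd.DataFrame(mixed, columns=['Column1', 'Column2'])
# Column1 holds the strings '1' and '2', not integers
print(df_mixed['Column1'].tolist())

# Numeric-only array: the columns keep an integer dtype
numeric = np.array([[1, 10], [2, 20]])
df_numeric = pd.DataFrame(numeric, columns=['Column1', 'Column2'])
print(df_numeric.dtypes)
```

When columns need different types, building the DataFrame from a dictionary of lists is usually the simpler route.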
Creating a DataFrame with specified index and columns in Pandas:
import pandas as pd

# Create a Pandas DataFrame with specified index and columns
df = pd.DataFrame(
    [[1, 'A'], [2, 'B'], [3, 'C']],
    index=['Row1', 'Row2', 'Row3'],
    columns=['Column1', 'Column2']
)
Pandas DataFrame initialization from CSV file:
import pandas as pd

# Initialize a Pandas DataFrame from a CSV file
df = pd.read_csv('your_file.csv')
Concatenating and merging to form a Pandas DataFrame:
import pandas as pd

# Concatenate DataFrames vertically
df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['Z', 'W']})
result_concat = pd.concat([df1, df2])

# Merge DataFrames based on a common column
df3 = pd.DataFrame({'A': [1, 2], 'C': ['P', 'Q']})
result_merge = pd.merge(result_concat, df3, on='A')
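pd.merge performs an inner join by default, so in the example above only the rows whose 'A' value appears in both frames (1 and 2) survive. A sketch of keeping every left-hand row with how='left', reusing the same frames (the names stacked, inner, and left are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['Z', 'W']})
df3 = pd.DataFrame({'A': [1, 2], 'C': ['P', 'Q']})

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1
stacked = pd.concat([df1, df2], ignore_index=True)

# Inner join (the default): only A values present in both frames
inner = pd.merge(stacked, df3, on='A')

# Left join: all rows of 'stacked'; unmatched rows get NaN in column 'C'
left = pd.merge(stacked, df3, on='A', how='left')

print(len(inner), len(left))
```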
Using Pandas to create DataFrame from JSON data:
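import pandas as pd
from io import StringIO

# Create a Pandas DataFrame from JSON data
json_data = '{"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]}'

# Wrap the string in StringIO: passing a literal JSON string
# directly to read_json is deprecated in recent pandas versions
df = pd.read_json(StringIO(json_data))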
Appending data to an existing Pandas DataFrame:
import pandas as pd

# Create an initial Pandas DataFrame
df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})

# Append new data to the existing DataFrame.
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
df2 = pd.DataFrame({'A': [3], 'B': ['Z']})
result_append = pd.concat([df1, df2], ignore_index=True)
Creating Pandas DataFrame with custom index and column names:
import pandas as pd

# Create a Pandas DataFrame with custom index and column names
data = {'Value1': [1, 2, 3], 'Value2': ['A', 'B', 'C']}
df = pd.DataFrame(
    data,
    index=['Row1', 'Row2', 'Row3'],
    columns=['Value1', 'Value2']
)
Reshaping and pivoting data to create a DataFrame in Pandas:
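import pandas as pd

# Reshape and pivot data to create a Pandas DataFrame.
# The 'Metric' column is an illustrative addition: pivot needs one column
# for the new column labels and another for the cell values.
data = {
    'Category': ['A', 'A', 'B', 'B'],
    'Metric': ['X', 'Y', 'X', 'Y'],
    'Value': [10, 15, 20, 25]
}
df = pd.DataFrame(data)

# One row per Category, one column per Metric, cells filled from Value
df_pivoted = df.pivot(index='Category', columns='Metric', values='Value')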
Initialization of a Pandas DataFrame with datetime index:
import pandas as pd

# Initialize a Pandas DataFrame with a datetime index:
# three consecutive days starting on 2022-01-01
date_index = pd.date_range(start='2022-01-01', periods=3, freq='D')
df = pd.DataFrame({'Value': [1, 2, 3]}, index=date_index)
Filling missing data while creating a DataFrame in Pandas:
import pandas as pd

# Create a Pandas DataFrame that contains missing values
data = {'Value1': [1, 2, None, 4], 'Value2': ['A', 'B', 'C', None]}
df = pd.DataFrame(data)

# Fill the gaps with a different replacement value per column
df_filled = df.fillna({'Value1': 0, 'Value2': 'Unknown'})
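Before filling, it can help to see how many values are actually missing. A minimal sketch using the same data, where missing_before and missing_after are illustrative names:

```python
import pandas as pd

data = {'Value1': [1, 2, None, 4], 'Value2': ['A', 'B', 'C', None]}
df = pd.DataFrame(data)

# Count missing values per column before filling
missing_before = df.isna().sum()
print(missing_before)

# Fill with a per-column replacement, then confirm nothing is missing
df_filled = df.fillna({'Value1': 0, 'Value2': 'Unknown'})
missing_after = df_filled.isna().sum()
print(missing_after)
```

fillna returns a new DataFrame here, so the original df still contains its missing values.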
Combining multiple DataFrames to create a new one in Pandas:
import pandas as pd

# Combine multiple DataFrames side by side (axis=1), aligning rows on the index
df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})
df2 = pd.DataFrame({'C': ['Z', 'W']})
result_combined = pd.concat([df1, df2], axis=1)