Creating a Pandas DataFrame

A Pandas DataFrame is a 2D labeled data structure with columns that can be of different types. Here's a step-by-step tutorial to create a Pandas DataFrame:

Step 1: Import Necessary Libraries

import pandas as pd

Step 2: Creating a DataFrame

There are several ways to create a DataFrame:

2.1 From a Dictionary of Arrays/Lists

data = {
    'Name': ['John', 'Anna', 'Ella'],
    'Age': [28, 22, 33],
    'City': ['New York', 'Paris', 'London']
}

df = pd.DataFrame(data)
print(df)

2.2 From a List of Dictionaries

data_list = [
    {'Name': 'John', 'Age': 28, 'City': 'New York'},
    {'Name': 'Anna', 'Age': 22, 'City': 'Paris'},
    {'Name': 'Ella', 'Age': 33, 'City': 'London'}
]

df = pd.DataFrame(data_list)
print(df)
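
When the dictionaries in the list do not all share the same keys, pandas still builds the DataFrame and marks the gaps as missing. The sketch below uses made-up records (`records` and `df_partial` are illustrative names, not part of the tutorial data) to show this:

```python
import pandas as pd

# Hypothetical records: the second one has no 'City' key
records = [
    {'Name': 'John', 'Age': 28, 'City': 'New York'},
    {'Name': 'Anna', 'Age': 22},
]

df_partial = pd.DataFrame(records)

# The columns are the union of all keys; the missing 'City' value is stored as NaN
print(df_partial)
print(df_partial['City'].isna().tolist())
```

The second print should flag only Anna's row as missing a city.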

2.3 From a List of Lists (with columns specified)

data_list_of_lists = [
    ['John', 28, 'New York'],
    ['Anna', 22, 'Paris'],
    ['Ella', 33, 'London']
]

df = pd.DataFrame(data_list_of_lists, columns=['Name', 'Age', 'City'])
print(df)

Step 3: Basic DataFrame Operations

Once you've created your DataFrame, here are some basic operations you might find useful:

3.1 Display the First Few Rows

print(df.head())

3.2 Display the Last Few Rows

print(df.tail())

3.3 Get Information about the DataFrame

# info() prints its summary directly and returns None, so no print() is needed
df.info()

3.4 Describe Numerical Columns

This reports summary statistics for each numeric column: count, mean, standard deviation, minimum, the 25th, 50th and 75th percentiles, and maximum.

print(df.describe())
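
As a rough illustration of what describe() returns, the sketch below rebuilds the example DataFrame and reads individual statistics out of the result. The variable name `stats` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Ella'],
    'Age': [28, 22, 33],
    'City': ['New York', 'Paris', 'London']
})

# By default only numeric columns are summarized, so only 'Age' appears
stats = df.describe()
print(stats.loc[['mean', 'min', 'max'], 'Age'])

# include='all' also summarizes the text columns (count, unique, top, freq)
print(df.describe(include='all'))
```

The result of describe() is itself a DataFrame, so the usual selection tools such as .loc work on it.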

3.5 Access a Specific Column

print(df['Name'])
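
Selecting with a single label returns a Series, while selecting with a list of labels returns a DataFrame. A minimal sketch, assuming the same example data (`names` and `subset` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Ella'],
    'Age': [28, 22, 33],
    'City': ['New York', 'Paris', 'London']
})

names = df['Name']            # single label -> Series
subset = df[['Name', 'Age']]  # list of labels -> DataFrame

print(type(names).__name__)
print(type(subset).__name__)
```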

3.6 Set an Index

For example, setting 'Name' as the index:

df.set_index('Name', inplace=True)
print(df)

3.7 Reset an Index

If you want to reset the index:

df.reset_index(inplace=True)
print(df)
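
Both methods can also be used without inplace=True, in which case they return a new DataFrame and leave the original untouched. The sketch below shows that variant; `indexed` and `restored` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Ella'],
    'Age': [28, 22, 33],
    'City': ['New York', 'Paris', 'London']
})

# set_index returns a new DataFrame; df itself is unchanged
indexed = df.set_index('Name')

# With 'Name' as the index, rows can be looked up by name
age = indexed.loc['Anna', 'Age']
print(age)

# reset_index moves 'Name' back into a regular column
restored = indexed.reset_index()
print(restored)
```

Assigning the result to a new variable makes it easier to keep the original and the re-indexed versions side by side.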

These are the basics of creating and inspecting a Pandas DataFrame. Pandas also offers more advanced operations such as groupby, merge and pivot, but the steps above are a good starting point.

Pandas DataFrame creation from dictionaries in Python:

import pandas as pd

# Create a Pandas DataFrame from a dictionary
data = {'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)

Reading external data and initializing Pandas DataFrame:

import pandas as pd

# Read external data (e.g., CSV) into Pandas DataFrame
df = pd.read_csv('your_file.csv')
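
To try read_csv without a file on disk, the CSV text can be wrapped in io.StringIO, which read_csv accepts in place of a path. This is only a sketch; the CSV content in `csv_text` is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'your_file.csv'
csv_text = "Name,Age,City\nJohn,28,New York\nAnna,22,Paris\n"

# read_csv accepts any file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df['Age'].tolist())
```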

Generating a Pandas DataFrame from NumPy arrays:

import pandas as pd
import numpy as np

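# Create a NumPy array; dtype=object keeps the numbers as Python integers
# (without it, NumPy converts every element of a mixed array to a string)
data = np.array([[1, 'A'], [2, 'B'], [3, 'C']], dtype=object)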

# Create Pandas DataFrame from NumPy arrays
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
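
A NumPy array holds a single element type, which affects the column types of the resulting DataFrame. The sketch below contrasts a purely numeric array with a mixed one; the names `values`, `df_num` and `mixed` are illustrative.

```python
import numpy as np
import pandas as pd

# A purely numeric array keeps its numeric dtype in the DataFrame
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
df_num = pd.DataFrame(values, columns=['X', 'Y'])
print(df_num.dtypes)

# Mixing numbers and text makes NumPy store every element as a string
mixed = np.array([[1, 'A'], [2, 'B']])
print(mixed.dtype.kind)  # 'U' means a Unicode string array
```

For data that mixes numbers and text, a dictionary of lists is usually the simpler input, because each column then keeps its own type.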

Creating a DataFrame with specified index and columns in Pandas:

import pandas as pd

# Create Pandas DataFrame with specified index and columns
df = pd.DataFrame([[1, 'A'], [2, 'B'], [3, 'C']], index=['Row1', 'Row2', 'Row3'], columns=['Column1', 'Column2'])

Concatenating and merging to form a Pandas DataFrame:

import pandas as pd

# Concatenate DataFrames vertically
df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['Z', 'W']})
result_concat = pd.concat([df1, df2])

# Merge DataFrames based on a common column
df3 = pd.DataFrame({'A': [1, 2], 'C': ['P', 'Q']})
result_merge = pd.merge(result_concat, df3, on='A')
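
By default merge performs an inner join, so rows without a match in both DataFrames are dropped. The sketch below compares that with a left join using the `how` parameter; `left`, `right`, `inner` and `left_join` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})
df2 = pd.DataFrame({'A': [3, 4], 'B': ['Z', 'W']})

# ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1, 0, 1
left = pd.concat([df1, df2], ignore_index=True)
right = pd.DataFrame({'A': [1, 2], 'C': ['P', 'Q']})

# Inner join (the default): only rows whose 'A' appears in both inputs
inner = pd.merge(left, right, on='A')

# Left join: every row of 'left' is kept; unmatched rows get NaN in 'C'
left_join = pd.merge(left, right, on='A', how='left')

print(inner)
print(left_join)
```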

Using Pandas to create DataFrame from JSON data:

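import pandas as pd
from io import StringIO

# Create Pandas DataFrame from JSON data
json_data = '{"Column1": [1, 2, 3], "Column2": ["A", "B", "C"]}'
# Wrap the string in StringIO: passing a literal JSON string directly
# is deprecated in recent pandas versions
df = pd.read_json(StringIO(json_data))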

Appending data to an existing Pandas DataFrame:

import pandas as pd

# Create an initial Pandas DataFrame
df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})

# Append new data to the existing DataFrame
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
df2 = pd.DataFrame({'A': [3], 'B': ['Z']})
result_append = pd.concat([df1, df2], ignore_index=True)

Creating Pandas DataFrame with custom index and column names:

import pandas as pd

# Create Pandas DataFrame with custom index and column names
data = {'Value1': [1, 2, 3], 'Value2': ['A', 'B', 'C']}
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3'], columns=['Value1', 'Value2'])

Reshaping and pivoting data to create a DataFrame in Pandas:

import pandas as pd

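# Reshape long-format data into a wide table with pivot
# ('Item' is added here so each Category/Item pair identifies exactly one Value)
data = {'Category': ['A', 'A', 'B', 'B'],
        'Item': ['x', 'y', 'x', 'y'],
        'Value': [10, 15, 20, 25]}
df = pd.DataFrame(data)
df_pivoted = df.pivot(index='Category', columns='Item', values='Value')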
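
When several rows share the same index label, as the two 'A' rows and two 'B' rows do here, pivot_table can aggregate them into one row per label. A minimal sketch, assuming a sum is the aggregation wanted (`totals` is an illustrative name):

```python
import pandas as pd

data = {'Category': ['A', 'A', 'B', 'B'],
        'Value': [10, 15, 20, 25]}
df = pd.DataFrame(data)

# pivot_table groups the rows by 'Category' and sums 'Value' within each group
totals = df.pivot_table(index='Category', values='Value', aggfunc='sum')

print(totals)
```

Other aggregations such as 'mean' or 'max' can be passed to aggfunc in the same way.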

Initialization of a Pandas DataFrame with datetime index:

import pandas as pd

# Initialize Pandas DataFrame with datetime index
date_index = pd.date_range(start='2022-01-01', periods=3, freq='D')
df = pd.DataFrame({'Value': [1, 2, 3]}, index=date_index)
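
One benefit of a datetime index is that rows can be selected with date strings. The sketch below slices the same three-day DataFrame by date; `subset` is an illustrative name.

```python
import pandas as pd

date_index = pd.date_range(start='2022-01-01', periods=3, freq='D')
df = pd.DataFrame({'Value': [1, 2, 3]}, index=date_index)

# Date strings work as slice bounds with .loc; both endpoints are included
subset = df.loc['2022-01-02':'2022-01-03']

print(subset)
```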

Filling missing data while creating a DataFrame in Pandas:

import pandas as pd

# Create a Pandas DataFrame with missing values filled
data = {'Value1': [1, 2, None, 4], 'Value2': ['A', 'B', 'C', None]}
df = pd.DataFrame(data)
df_filled = df.fillna({'Value1': 0, 'Value2': 'Unknown'})
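
Before filling, it can help to check how many values are actually missing in each column. A short sketch using isna() on the same data (`missing` and `remaining` are illustrative names):

```python
import pandas as pd

data = {'Value1': [1, 2, None, 4], 'Value2': ['A', 'B', 'C', None]}
df = pd.DataFrame(data)

# Count missing values per column before filling
missing = df.isna().sum()
print(missing)

# Fill each column with its own default, then confirm nothing is left missing
df_filled = df.fillna({'Value1': 0, 'Value2': 'Unknown'})
remaining = int(df_filled.isna().sum().sum())
print(remaining)
```

fillna returns a new DataFrame, so df still contains the original missing values.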

Combining multiple DataFrames to create a new one in Pandas:

import pandas as pd

# Combine multiple DataFrames to create a new one
df1 = pd.DataFrame({'A': [1, 2], 'B': ['X', 'Y']})
df2 = pd.DataFrame({'C': ['Z', 'W']})
result_combined = pd.concat([df1, df2], axis=1)
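
With axis=1, concat lines rows up by index label rather than by position. The example above works cleanly because both DataFrames use the default index 0, 1. The sketch below, with deliberately different indexes chosen for illustration, shows what happens when the labels do not match:

```python
import pandas as pd

# Hypothetical inputs whose indexes only partly overlap
df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'C': ['Z', 'W']}, index=[1, 2])

# Rows are matched on index labels; labels present in only one input get NaN
aligned = pd.concat([df1, df2], axis=1)
print(aligned)

# Resetting both indexes first pairs the rows by position instead
by_position = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(by_position)
```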