Python | Working with date and time using Pandas

Dates and times appear in most real-world datasets, and Pandas has built-in types and functions for parsing, indexing, and transforming them. This tutorial walks through the most common operations:

1. Set Up Environment and Libraries:

import pandas as pd

2. Creation:

a. Creating a Datetime Series:

dates = pd.Series(pd.date_range('2023-01-01', periods=5, freq='D'))
print(dates)

b. From Strings to Datetime:

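date_strings = ["2023-01-01", "2023-01-02", "2023-01-03"]
# pd.to_datetime on a list returns a DatetimeIndex; wrap it in pd.Series to get a Series
date_series = pd.Series(pd.to_datetime(date_strings))
print(date_series)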
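
Real-world date strings are often ambiguous or partly invalid. The following is a minimal sketch, assuming a hypothetical day-first input with one bad entry, of how the format and errors parameters of pd.to_datetime can be used in that case:

```python
import pandas as pd

# Hypothetical raw input: day-first strings plus one invalid entry
raw = pd.Series(["01/02/2023", "15/03/2023", "not a date"])

# An explicit format removes day/month ambiguity; errors="coerce"
# turns unparseable values into NaT instead of raising an exception
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
```

Rows that come back as NaT can then be inspected or dropped with the usual isna() and dropna() methods.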

3. Accessing Date Properties:

# Create a DataFrame with dates
df = pd.DataFrame({
    'dates': pd.date_range('2023-01-01', periods=5, freq='D')
})

df['year'] = df['dates'].dt.year
df['month'] = df['dates'].dt.month
df['day'] = df['dates'].dt.day
df['weekday'] = df['dates'].dt.weekday  # Monday=0, Sunday=6
print(df)
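
The .dt accessor also exposes methods, not only numeric properties. As a small sketch (the variable names here are illustrative), day names, formatted strings, and a weekend flag can be derived like this:

```python
import pandas as pd

dates = pd.Series(pd.date_range("2023-01-01", periods=3, freq="D"))

names = dates.dt.day_name()             # weekday names as strings
labels = dates.dt.strftime("%Y/%m/%d")  # custom string formatting
is_weekend = dates.dt.weekday >= 5      # Monday=0 ... Sunday=6

print(pd.DataFrame({"date": dates, "name": names,
                    "label": labels, "weekend": is_weekend}))
```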

4. Date Offsets:

a. Adding/Subtracting Days:

df['3_days_later'] = df['dates'] + pd.DateOffset(days=3)
print(df)

b. Using Timedelta:

df['1_week_before'] = df['dates'] - pd.Timedelta(weeks=1)
print(df)
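
For whole days, DateOffset and Timedelta give the same result, but they differ for calendar units such as months. A short sketch of the difference, using an end-of-month date chosen for illustration:

```python
import pandas as pd

end_of_jan = pd.Timestamp("2023-01-31")

# DateOffset is calendar-aware: one month later is clipped to the
# last valid day of February
plus_one_month = end_of_jan + pd.DateOffset(months=1)

# Timedelta is a fixed duration: exactly 30 * 24 hours later
plus_30_days = end_of_jan + pd.Timedelta(days=30)

print(plus_one_month)  # 2023-02-28 00:00:00
print(plus_30_days)    # 2023-03-02 00:00:00
```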

5. Filtering by Date:

mask = (df['dates'] >= '2023-01-03') & (df['dates'] <= '2023-01-05')
filtered_df = df[mask]
print(filtered_df)
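
The same kind of range filter can be written more compactly with Series.between, which includes both endpoints by default. A minimal sketch, rebuilding a small DataFrame so the example runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "dates": pd.date_range("2023-01-01", periods=5, freq="D")
})

# between() is inclusive on both ends unless told otherwise
in_range = df[df["dates"].between("2023-01-03", "2023-01-05")]
print(in_range)
```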

6. Setting Date as Index:

df.set_index('dates', inplace=True)
print(df)
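
One benefit of a DatetimeIndex is that rows can be selected with date strings directly. The sketch below, using an illustrative Series that spans two months, shows partial-string selection and label slicing:

```python
import pandas as pd

idx = pd.date_range("2023-01-30", periods=5, freq="D")
s = pd.Series(range(5), index=idx)

# A partial string selects every row in that month
january = s.loc["2023-01"]

# Label slices on a DatetimeIndex include both endpoints
window = s.loc["2023-02-01":"2023-02-02"]

print(january)
print(window)
```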

7. Date Resampling: If you have time series data, you can resample it to different frequencies.

# Sample time series data
# Lowercase 'h' is used because the uppercase 'H' alias is deprecated in recent Pandas versions
date_rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='12h')
ts = pd.DataFrame(date_rng, columns=['date'])
ts['data'] = range(len(date_rng))

# Set date as index
ts.set_index('date', inplace=True)

# Resample to daily frequency
daily_mean = ts.resample('D').mean()
print(daily_mean)
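
Resampling is not limited to a single statistic. As a sketch, assuming a small 12-hourly series built for illustration, several aggregations can be computed per day in one call with agg:

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=6, freq="12h")
ts = pd.DataFrame({"data": range(6)}, index=idx)

# Each daily bin holds two 12-hourly rows; compute three summaries at once
daily = ts["data"].resample("D").agg(["mean", "min", "max"])
print(daily)
```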

8. Date Shifting:

# Shift values down by one row (one 12-hour step on this index, not one day)
ts['lagged'] = ts['data'].shift(1)
print(ts)
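
Note that shift(1) moves values by one row, whatever the spacing of the index. To shift by a calendar amount instead, shift accepts a freq argument, which moves the index and leaves the values in place. A minimal sketch contrasting the two on an illustrative 12-hourly series:

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=4, freq="12h")
s = pd.Series([0, 1, 2, 3], index=idx)

# Row-based shift: values move down one row, first value becomes NaN
by_row = s.shift(1)

# Frequency-based shift: the index moves forward one day, values unchanged
by_day = s.shift(freq="D")

print(by_row)
print(by_day)
```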

9. Periods and Period Arithmetic:

# Define a period
period = pd.Period('2023-01')
print(period)  # Represents whole of January 2023

# Arithmetic operations
print(period + 1)  # This will represent February 2023
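
A Period represents a span of time, so it has a start and an end, and datetime columns can be converted to periods for grouping by month. A short sketch with illustrative dates:

```python
import pandas as pd

period = pd.Period("2023-01", freq="M")

# Boundaries of the span the period covers
start, end = period.start_time, period.end_time
print(start, end)

# Convert timestamps to the monthly period they fall in
dates = pd.Series(pd.to_datetime(["2023-01-15", "2023-02-03"]))
months = dates.dt.to_period("M")
print(months)
```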

10. Time Zones:

# Convert naive DatetimeIndex to timezone-aware DatetimeIndex
ts = ts.tz_localize('UTC')

# Convert to another timezone
ts = ts.tz_convert('America/New_York')
print(ts)
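
It can help to remember that tz_localize attaches a zone to naive times, while tz_convert changes how the same instant is displayed. A minimal sketch with a single timestamp chosen for illustration:

```python
import pandas as pd

naive = pd.Timestamp("2023-01-01 12:00")

utc = naive.tz_localize("UTC")           # attach a zone; clock time unchanged
ny = utc.tz_convert("America/New_York")  # same instant, different wall clock

print(utc)
print(ny)
print(utc == ny)  # True: both represent the same moment

# Passing None drops the zone and keeps the local wall-clock time
local_naive = ny.tz_localize(None)
```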

This covers the basics of date and time handling in Pandas. For more complex operations, consult the Pandas time series documentation.

  1. Working with datetime in Pandas:

    • Description: Perform basic datetime operations in Pandas, such as creating datetime objects.
    • Code:
      import pandas as pd
      
      # Create a datetime object (a pandas Timestamp)
      date_time = pd.to_datetime('2022-01-01 12:30:45')
      
  2. Handling dates in Pandas DataFrame:

    • Description: Handle dates in a Pandas DataFrame, including creating a DataFrame with datetime columns.
    • Code:
      import pandas as pd
      
      # Create DataFrame with datetime columns
      df = pd.DataFrame({'date': pd.date_range('2022-01-01', periods=5), 'value': [10, 20, 30, 40, 50]})
      
  3. Time series analysis with Pandas:

    • Description: Perform time series analysis using Pandas on a DataFrame with datetime index.
    • Code:
      import pandas as pd
      
      # Create time series DataFrame
      df = pd.DataFrame({'value': [10, 20, 30]}, index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
      
      # Compute a rolling mean over 2 rows (the first result is NaN)
      rolling_mean = df['value'].rolling(window=2).mean()
      
  4. Datetime indexing and selection in Pandas:

    • Description: Use datetime indexing and selection techniques in Pandas to filter data.
    • Code:
      import pandas as pd
      
      # Create DataFrame with datetime index
      df = pd.DataFrame({'value': [10, 20, 30]}, index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
      
      # Select data for a specific date
      selected_data = df.loc['2022-01-02']
      
  5. Resampling time series data in Pandas:

    • Description: Resample time series data using the .resample() method in Pandas.
    • Code:
      import pandas as pd
      
      # Create time series DataFrame
      df = pd.DataFrame({'value': [10, 20, 30]}, index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
      
      # Resample to weekly frequency
      weekly_data = df.resample('W').sum()
      
  6. Time-based operations in Pandas:

    • Description: Perform time-based operations on a Pandas DataFrame, such as shifting data based on time.
    • Code:
      import pandas as pd
      
      # Create time series DataFrame
      df = pd.DataFrame({'value': [10, 20, 30]}, index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
      
      # Shift values down by one row (one day on this daily index)
      df['shifted_value'] = df['value'].shift(1)
      
  7. Convert string to datetime with Pandas:

    • Description: Convert a string to datetime using pd.to_datetime() in Pandas.
    • Code:
      import pandas as pd
      
      # Convert string to datetime
      date_string = '2022-01-01'
      datetime_object = pd.to_datetime(date_string)
      
  8. Dealing with time zones in Pandas:

    • Description: Handle time zones in Pandas by using the .tz_localize() and .tz_convert() methods.
    • Code:
      import pandas as pd
      
      # Create datetime with time zone
      datetime_with_tz = pd.to_datetime('2022-01-01 12:30:45').tz_localize('UTC')
      
      # Convert time zone
      datetime_converted = datetime_with_tz.tz_convert('America/New_York')
      
  9. Pandas timedelta for date and time manipulation:

    • Description: Use pd.Timedelta for date and time manipulation, such as adding or subtracting time intervals.
    • Code:
      import pandas as pd
      
      # Create a Timedelta object
      time_delta = pd.Timedelta(days=5, hours=3, minutes=30)
      
      # Add Timedelta to datetime
      new_datetime = pd.to_datetime('2022-01-01') + time_delta
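
Timedelta objects also arise from subtracting one timestamp from another. As a small sketch, using two illustrative timestamps, the elapsed time can be inspected or converted to a plain number:

```python
import pandas as pd

start = pd.Timestamp("2022-01-01 08:00")
end = pd.Timestamp("2022-01-03 09:30")

# Subtracting timestamps yields a Timedelta
elapsed = end - start
print(elapsed)

# Convert the duration to numbers for further arithmetic
seconds = elapsed.total_seconds()
hours = elapsed / pd.Timedelta(hours=1)
print(seconds, hours)
```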