Pandas Tutorial
Working with Date and Time
Date and time are crucial data types that you'll often encounter, and Pandas offers robust support for working with them. Here's a concise tutorial on handling date and time using Pandas:
1. Set Up Environment and Libraries:
import pandas as pd
2. Creation:
a. Creating a Datetime Series:
dates = pd.Series(pd.date_range('2023-01-01', periods=5, freq='D'))
print(dates)
b. From Strings to Datetime:
date_strings = ["2023-01-01", "2023-01-02", "2023-01-03"]
# A list input returns a DatetimeIndex; wrap it in pd.Series(...) if a Series is needed
date_series = pd.to_datetime(date_strings)
print(date_series)
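Real-world strings are often ambiguous or partly invalid. The following is a minimal sketch of how an explicit format and the errors="coerce" option of pd.to_datetime might be used in that case; the raw values are hypothetical and not part of the original example.

```python
import pandas as pd

# Hypothetical raw input: day-first strings plus one invalid entry.
raw = ["01/02/2023", "15/03/2023", "not a date"]

# An explicit format removes the day/month ambiguity; errors="coerce"
# turns values that cannot be parsed into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")
print(parsed)
```

Missing values can then be located with parsed.isna() before any further processing.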
3. Accessing Date Properties:
# Create a DataFrame with dates
df = pd.DataFrame({
    'dates': pd.date_range('2023-01-01', periods=5, freq='D')
})
df['year'] = df['dates'].dt.year
df['month'] = df['dates'].dt.month
df['day'] = df['dates'].dt.day
df['weekday'] = df['dates'].dt.weekday
print(df)
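The .dt accessor exposes more than the numeric fields used above. As a small sketch, the example below shows a few other accessors on a separate, illustrative Series; note that dt.weekday numbers Monday as 0 and Sunday as 6.

```python
import pandas as pd

# Illustrative dates spanning a month boundary.
s = pd.Series(pd.date_range("2023-01-30", periods=3, freq="D"))

names = s.dt.day_name()        # weekday names instead of numbers
quarter = s.dt.quarter         # calendar quarter, 1 to 4
month_end = s.dt.is_month_end  # True only on the last day of a month

print(pd.DataFrame({"date": s, "name": names,
                    "quarter": quarter, "month_end": month_end}))
```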
4. Date Offsets:
a. Adding/Subtracting Days:
df['3_days_later'] = df['dates'] + pd.DateOffset(days=3)
print(df)
b. Using Timedelta:
df['1_week_before'] = df['dates'] - pd.Timedelta(weeks=1)
print(df)
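For whole days and weeks the two approaches give the same result, but they differ for calendar units. The sketch below, using an illustrative start date, shows that pd.DateOffset follows the calendar while pd.Timedelta is a fixed duration.

```python
import pandas as pd

start = pd.Timestamp("2023-01-31")

# DateOffset is calendar-aware: one month after 31 January is the
# last valid day of February.
plus_month = start + pd.DateOffset(months=1)

# Timedelta is a fixed span of time: exactly 30 days later.
plus_30_days = start + pd.Timedelta(days=30)

print(plus_month)    # 2023-02-28 00:00:00
print(plus_30_days)  # 2023-03-02 00:00:00
```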
5. Filtering by Date:
mask = (df['dates'] >= '2023-01-03') & (df['dates'] <= '2023-01-05')
filtered_df = df[mask]
print(filtered_df)
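The same date range can be selected in other ways. The following sketch rebuilds a small DataFrame (the value column is a hypothetical addition) and shows Series.between and label slicing on a DatetimeIndex, both of which include the two end dates.

```python
import pandas as pd

df = pd.DataFrame({
    "dates": pd.date_range("2023-01-01", periods=5, freq="D"),
    "value": range(5),  # hypothetical data column for illustration
})

# between() is inclusive on both ends by default.
in_range = df[df["dates"].between("2023-01-03", "2023-01-05")]

# With the dates as the index, string slices select by label.
indexed = df.set_index("dates")
sliced = indexed.loc["2023-01-03":"2023-01-05"]

print(in_range)
print(sliced)
```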
6. Setting Date as Index:
df.set_index('dates', inplace=True)
print(df)
7. Date Resampling: If you have time series data, you can resample it to different frequencies.
# Sample time series data at 12-hour intervals
# (lowercase 'h' replaces the 'H' alias, which recent pandas versions deprecate)
date_rng = pd.date_range(start='2023-01-01', end='2023-01-10', freq='12h')
ts = pd.DataFrame(date_rng, columns=['date'])
ts['data'] = range(len(date_rng))
# Set date as index
ts.set_index('date', inplace=True)
# Resample to daily frequency
daily_mean = ts.resample('D').mean()
print(daily_mean)
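A resampled object is not limited to a single statistic. As a minimal sketch on a shorter, illustrative series, several aggregations can be computed per day in one call with .agg().

```python
import pandas as pd

# Six illustrative readings taken every 12 hours.
idx = pd.date_range("2023-01-01", periods=6, freq="12h")
ts = pd.DataFrame({"data": range(6)}, index=idx)

# One row per day, one column per aggregation.
daily = ts["data"].resample("D").agg(["mean", "min", "max"])
print(daily)
```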
8. Date Shifting:
# Shift data down by one row (one 12-hour period in this series, not one day)
ts['lagged'] = ts['data'].shift(1)
print(ts)
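Because shift() counts rows by default, the size of the lag depends on the spacing of the index. The sketch below uses an illustrative 12-hour series to contrast shifting by rows with shifting the index by a calendar frequency through the freq argument.

```python
import pandas as pd

idx = pd.date_range("2023-01-01", periods=4, freq="12h")
s = pd.Series(range(4), index=idx)

by_row = s.shift(1)          # values move down one row; first value becomes NaN
one_day = s.shift(2)         # two 12-hour rows amount to a one-day lag
by_freq = s.shift(freq="D")  # index moves forward one day; values are unchanged

print(by_row)
print(one_day)
print(by_freq)
```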
9. Periods and Period Arithmetic:
# Define a period
period = pd.Period('2023-01')
print(period)  # Represents the whole of January 2023
# Arithmetic operations
print(period + 1)  # Represents February 2023
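A period covers a span of time rather than a single instant, which the sketch below tries to make concrete. It shows the boundaries of a monthly period and one way to convert timestamps to periods with dt.to_period; the sample timestamps are illustrative.

```python
import pandas as pd

period = pd.Period("2023-01", freq="M")

# The span covered by the period.
print(period.start_time)  # first instant of January 2023
print(period.end_time)    # last instant of January 2023

# Convert timestamps to the monthly period that contains them.
stamps = pd.Series(pd.to_datetime(["2023-01-15", "2023-02-03"]))
months = stamps.dt.to_period("M")
print(months)
```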
10. Time Zones:
# Convert naive DatetimeIndex to timezone-aware DatetimeIndex
ts = ts.tz_localize('UTC')
# Convert to another timezone
ts = ts.tz_convert('US/Eastern')
print(ts)
These are just the basics of date and time functionalities in Pandas. The library offers a comprehensive set of tools and utilities for more advanced operations and handling. It's useful to consult the Pandas documentation or other references when working with complex date and time operations.
Working with datetime in Pandas:
import pandas as pd
# Create a datetime object
date_time = pd.to_datetime('2022-01-01 12:30:45')
Handling dates in Pandas DataFrame:
import pandas as pd
# Create DataFrame with a datetime column
df = pd.DataFrame({
    'date': pd.date_range('2022-01-01', periods=5),
    'value': [10, 20, 30, 40, 50]
})
Time series analysis with Pandas:
import pandas as pd
# Create time series DataFrame
df = pd.DataFrame({'value': [10, 20, 30]},
                  index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Compute a rolling mean over a window of 2 rows
rolling_mean = df['value'].rolling(window=2).mean()
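An integer window counts rows, so the first result is NaN until two rows are available. As a sketch of an alternative, rolling() also accepts a time offset such as "2D" when the index is a DatetimeIndex, in which case the window covers a span of time and partial windows are allowed by default.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]},
                  index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]))

# Row-based window: needs 2 rows, so the first result is NaN.
rolling_rows = df["value"].rolling(window=2).mean()

# Time-based window: all rows within the 2 days up to and including each timestamp.
rolling_time = df["value"].rolling("2D").mean()

print(rolling_rows)
print(rolling_time)
```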
Datetime indexing and selection in Pandas:
import pandas as pd
# Create DataFrame with datetime index
df = pd.DataFrame({'value': [10, 20, 30]},
                  index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Select data for a specific date
selected_data = df.loc['2022-01-02']
Resampling time series data in Pandas:
Resample time series data to a different frequency with the .resample() method.
import pandas as pd
# Create time series DataFrame
df = pd.DataFrame({'value': [10, 20, 30]},
                  index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Resample to weekly frequency
weekly_data = df.resample('W').sum()
Time-based operations in Pandas:
import pandas as pd
# Create time series DataFrame
df = pd.DataFrame({'value': [10, 20, 30]},
                  index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Shift values down by one row (one day at this daily spacing)
df['shifted_value'] = df['value'].shift(1)
Convert string to datetime with Pandas:
Convert strings to datetime values with pd.to_datetime().
import pandas as pd
# Convert string to datetime
date_string = '2022-01-01'
datetime_object = pd.to_datetime(date_string)
Dealing with time zones in Pandas:
Handle time zones with the .tz_localize() and .tz_convert() methods.
import pandas as pd
# Create datetime with time zone
datetime_with_tz = pd.to_datetime('2022-01-01 12:30:45').tz_localize('UTC')
# Convert time zone
datetime_converted = datetime_with_tz.tz_convert('America/New_York')
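Converting between time zones changes how an instant is displayed, not the instant itself. The sketch below illustrates this with the same timestamp; it assumes time zone data for America/New_York is available in the environment, which is normally the case with a standard pandas installation.

```python
import pandas as pd

utc_time = pd.Timestamp("2022-01-01 12:30:45", tz="UTC")
ny_time = utc_time.tz_convert("America/New_York")

print(ny_time)              # wall-clock time in New York, 5 hours behind UTC in January
print(utc_time == ny_time)  # True: both refer to the same instant
```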
Pandas timedelta for date and time manipulation:
Use pd.Timedelta for date and time manipulation, such as adding or subtracting time intervals.
import pandas as pd
# Create a Timedelta object
time_delta = pd.Timedelta(days=5, hours=3, minutes=30)
# Add Timedelta to datetime
new_datetime = pd.to_datetime('2022-01-01') + time_delta
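A Timedelta also arises naturally as the difference between two datetimes. The following sketch, with illustrative start and end values, shows how the elapsed time might be inspected or expressed in a chosen unit.

```python
import pandas as pd

start = pd.Timestamp("2022-01-01")
end = pd.Timestamp("2022-01-06 03:30")

# Subtracting two Timestamps yields a Timedelta.
elapsed = end - start
print(elapsed)                  # 5 days 03:30:00
print(elapsed.days)             # whole days only: 5
print(elapsed.total_seconds())  # full duration in seconds

# Dividing by another Timedelta expresses the duration in that unit.
hours = elapsed / pd.Timedelta(hours=1)
print(hours)                    # 123.5
```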