Pandas Tutorial
In this tutorial, we'll cover the fundamentals of working with timestamps in pandas.
A Timestamp represents a single point in time. In pandas, it is the counterpart of Python's native datetime and is interchangeable with it in most cases, but it is based on the more efficient numpy.datetime64 data type.
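As a quick check of that relationship, the following sketch (variable names are illustrative) shows that a Timestamp is a subclass of datetime and can be converted to either the plain Python or the NumPy representation.

```python
from datetime import datetime

import numpy as np
import pandas as pd

ts = pd.Timestamp("2023-08-31 12:45:30")

# Timestamp subclasses datetime, so it works where a datetime is expected
print(isinstance(ts, datetime))

# Convert to the plain Python and NumPy representations
py_dt = ts.to_pydatetime()
np_dt = ts.to_datetime64()
print(type(py_dt).__name__, type(np_dt).__name__)
```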
Make sure you have pandas installed:
pip install pandas
import pandas as pd
You can create a timestamp using pd.Timestamp:
ts = pd.Timestamp('2023-08-31')
print(ts)
Timestamps can be created from various string formats:
ts1 = pd.Timestamp('2023-08-31 12:45:30')
ts2 = pd.Timestamp('2023/08/31')
ts3 = pd.Timestamp('31/08/2023')
ts4 = pd.Timestamp('2023, 31 August')
print(ts1, ts2, ts3, ts4, sep="\n")
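Slash-separated strings can be ambiguous when both leading numbers are valid months. A minimal sketch, assuming the input is known to be day-first, compares the default month-first reading with an explicit format passed to pd.to_datetime:

```python
import pandas as pd

# With no other hints, an ambiguous string is read month-first
month_first = pd.Timestamp("01/02/2023")

# An explicit format removes the ambiguity (here: day/month/year)
day_first = pd.to_datetime("01/02/2023", format="%d/%m/%Y")

print(month_first.date(), day_first.date())
```

Passing format is usually the safer choice when the layout of the incoming strings is known in advance.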
To get the current date and time:
current = pd.Timestamp.now()
print(current)
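If a timezone-aware value is needed, now() also accepts a tz argument. The sketch below is one possible variant; normalize() is used here only to illustrate dropping the time of day.

```python
import pandas as pd

# Timezone-aware "now" in UTC
now_utc = pd.Timestamp.now(tz="UTC")

# normalize() resets the time of day to midnight, keeping date and timezone
today_utc = now_utc.normalize()

print(now_utc)
print(today_utc)
```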
By default, Timestamp objects are timezone-naive. To localize a timestamp:
ts_tz = pd.Timestamp('2023-08-31 12:45:30').tz_localize('Asia/Tokyo')
print(ts_tz)
To convert to another timezone:
ts_tz_ny = ts_tz.tz_convert('America/New_York')
print(ts_tz_ny)
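The difference between the two methods can be sketched as follows: tz_localize attaches a timezone to a naive timestamp, while tz_convert re-expresses an already aware timestamp without changing the instant it refers to. The variable names are illustrative.

```python
import pandas as pd

tokyo = pd.Timestamp("2023-08-31 12:45:30").tz_localize("Asia/Tokyo")
new_york = tokyo.tz_convert("America/New_York")

# Same instant, different wall-clock time
print(tokyo == new_york)
print(new_york)

# tz_convert needs a timezone-aware input; a naive timestamp raises TypeError
try:
    pd.Timestamp("2023-08-31 12:45:30").tz_convert("America/New_York")
    naive_convert_failed = False
except TypeError:
    naive_convert_failed = True
print(naive_convert_failed)
```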
Once you have a Timestamp, there are numerous attributes and methods you can access:
Attributes:
print(ts.year)
print(ts.month)
print(ts.day)
print(ts.hour)
Date-related methods:
print(ts.to_period('D'))  # Convert to a period (in this case, daily frequency)
print(ts.weekday())  # Returns day of the week (Monday=0, Sunday=6)
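A few more calendar-related attributes are often useful alongside these. The sketch below uses the same date as the examples above.

```python
import pandas as pd

ts = pd.Timestamp("2023-08-31")

print(ts.day_name())    # name of the weekday
print(ts.dayofyear)     # ordinal day within the year
print(ts.quarter)       # calendar quarter (1 to 4)
print(ts.is_month_end)  # whether this is the last day of its month
```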
You can add or subtract time from a timestamp using date offsets:
week_later = ts + pd.DateOffset(weeks=1)
print(week_later)
two_days_prior = ts - pd.DateOffset(days=2)
print(two_days_prior)
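Date offsets follow the calendar, whereas a pd.Timedelta is a fixed duration. A small sketch of the difference, with an illustrative start date at the end of January:

```python
import pandas as pd

ts = pd.Timestamp("2023-01-31")

# Calendar-aware: one month later, clipped to the last valid day of February
calendar_month = ts + pd.DateOffset(months=1)

# Fixed duration: exactly 30 days later
thirty_days = ts + pd.Timedelta(days=30)

# Anchored offset: roll forward to the end of the current month
month_end = pd.Timestamp("2023-08-15") + pd.offsets.MonthEnd(1)

print(calendar_month, thirty_days, month_end, sep="\n")
```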
Timestamps can be part of DataFrame and Series objects, which allows for more complex operations like time-based indexing and time series analysis:
dates = pd.date_range('20230101', periods=6)
df = pd.DataFrame({'date': dates, 'value': range(6)})
print(df)
# Time-based indexing
print(df[df['date'] > '2023-01-03'])
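For a datetime column, the .dt accessor exposes the Timestamp attributes for the whole column at once. A minimal sketch on the same data (the weekday column and the weekend filter are illustrative additions):

```python
import pandas as pd

dates = pd.date_range("20230101", periods=6)
df = pd.DataFrame({"date": dates, "value": range(6)})

# Derive a column from each timestamp
df["weekday"] = df["date"].dt.day_name()

# Filter on a component: Saturday=5, Sunday=6
weekend = df[df["date"].dt.dayofweek >= 5]

print(df)
print(weekend)
```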
Pandas provides the Timestamp class as a powerful tool for handling date and time data, offering numerous built-in methods and attributes for common operations. It's the foundation for much of pandas' time series functionality, making it essential for anyone working with time-based data in Python.
Create Timestamp in Pandas DataFrame:
Use the pd.to_datetime() function to create a Pandas DataFrame with Timestamps.
import pandas as pd
# Create DataFrame with Timestamps
df = pd.DataFrame({'timestamp': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03'])})
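In practice the dates often arrive as a string column in an existing DataFrame. The sketch below, using hypothetical data, converts such a column in place and checks the column type before and after.

```python
import pandas as pd

# Hypothetical raw data: dates arrive as plain strings
df = pd.DataFrame({
    "timestamp": ["2022-01-01", "2022-01-02", "2022-01-03"],
    "value": [10, 20, 30],
})
was_datetime = pd.api.types.is_datetime64_any_dtype(df["timestamp"])

# Replace the string column with parsed timestamps
df["timestamp"] = pd.to_datetime(df["timestamp"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df["timestamp"])

print(was_datetime, is_datetime)
```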
Working with Timestamps in Pandas:
Access the individual components of pd.Timestamp objects through their attributes.
import pandas as pd
# Create Timestamp
timestamp = pd.Timestamp('2022-01-01 12:30:45')
# Access components
year = timestamp.year
month = timestamp.month
day = timestamp.day
hour = timestamp.hour
minute = timestamp.minute
second = timestamp.second
Convert string to Timestamp in Pandas:
Use pd.to_datetime() to convert a string to a Pandas Timestamp.
import pandas as pd
# Convert string to Timestamp
timestamp = pd.to_datetime('2022-01-01 12:30:45')
Indexing and selecting by Timestamp in Pandas:
import pandas as pd
# Create DataFrame with Timestamps
df = pd.DataFrame({'value': [10, 20, 30]}, index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Select data based on Timestamp (both endpoints are included)
selected_data = df.loc['2022-01-02':'2022-01-03']
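With a datetime index, .loc also accepts partial date strings, so a whole month can be selected without spelling out its endpoints. A minimal sketch with illustrative data spanning two months:

```python
import pandas as pd

index = pd.to_datetime(["2022-01-30", "2022-01-31", "2022-02-01", "2022-02-02"])
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# Partial string: every row in January 2022
january = df.loc["2022-01"]

# Open-ended slice: everything from 1 February onward
from_feb = df.loc["2022-02-01":]

print(january)
print(from_feb)
```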
Pandas to_datetime for Timestamp conversion:
Use pd.to_datetime() for converting various formats to Pandas Timestamps.
import pandas as pd
# Convert different formats to Timestamp
timestamp_1 = pd.to_datetime('2022-01-01')
timestamp_2 = pd.to_datetime('2022-01-01 12:30:45')
# Seconds since the Unix epoch
timestamp_3 = pd.to_datetime(1641000000, unit='s')
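Real input often contains values that cannot be parsed. One way to handle that, sketched here with made-up data, is errors="coerce", which turns unparseable entries into NaT instead of raising an exception.

```python
import pandas as pd

raw = ["2022-01-01", "not a date", "2022-01-03"]

# Unparseable entries become NaT rather than raising an exception
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna().tolist())
```

The NaT entries can then be dropped or inspected like any other missing value.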
Resampling time series data with Pandas Timestamp:
Use the .resample() method to resample time series data based on Timestamps.
import pandas as pd
# Create DataFrame with Timestamps
df = pd.DataFrame({'value': [10, 20, 30]}, index=pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Resample to daily frequency
resampled_df = df.resample('D').sum()
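Resampling is easier to see when the source data is finer than the target frequency. The sketch below assumes illustrative readings spaced 12 hours apart and aggregates them into daily sums.

```python
import pandas as pd

# Six readings spaced 12 hours apart (illustrative data)
index = pd.date_range("2022-01-01", periods=6, freq=pd.Timedelta(hours=12))
df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=index)

# Two readings fall on each calendar day, so each daily bin sums a pair
daily = df.resample("D").sum()

print(daily)
```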
Manipulating Timestamps in Pandas Series:
import pandas as pd
# Create Series with Timestamps
series = pd.Series(pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03']))
# Add 1 day to Timestamps
series += pd.Timedelta(days=1)
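Other element-wise operations work the same way. As a sketch with illustrative dates, diff() gives the gap between consecutive timestamps and .dt.normalize() strips the time of day:

```python
import pandas as pd

series = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"]))

# Gap between each timestamp and the previous one (first entry is NaT)
gaps = series.diff()

# Shift by six hours, then reset every value back to midnight
day_start = (series + pd.Timedelta(hours=6)).dt.normalize()

print(gaps)
print(day_start)
```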
Handling time zones with Pandas Timestamp:
Use the .tz_localize() and .tz_convert() methods to handle time zones in Pandas Timestamps.
import pandas as pd
# Create Timestamp with time zone
timestamp_with_tz = pd.Timestamp('2022-01-01 12:30:45', tz='UTC')
# Convert time zone
timestamp_converted = timestamp_with_tz.tz_convert('America/New_York')
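The same two methods are available for whole columns through the .dt accessor. A minimal sketch, assuming the naive values are known to be in UTC:

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2022-01-01 12:00", "2022-07-01 12:00"]))

# Declare the naive values as UTC, then view them in New York time
utc = naive.dt.tz_localize("UTC")
new_york = utc.dt.tz_convert("America/New_York")

# The UTC offset differs between winter and summer (daylight saving time)
print(new_york)
```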
Timestamp arithmetic in Pandas:
import pandas as pd
# Create Timestamps
timestamp_1 = pd.Timestamp('2022-01-01 12:30:45')
timestamp_2 = pd.Timestamp('2022-01-02 15:00:00')
# Calculate time difference
time_difference = timestamp_2 - timestamp_1
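Subtracting two timestamps yields a pd.Timedelta. The sketch below, reusing the same two values under illustrative names, shows a few ways to read the result.

```python
import pandas as pd

start = pd.Timestamp("2022-01-01 12:30:45")
end = pd.Timestamp("2022-01-02 15:00:00")
delta = end - start

print(delta)                          # the full difference
print(delta.days)                     # whole days only
print(delta.total_seconds())          # entire span in seconds
print(delta / pd.Timedelta(hours=1))  # entire span in hours
```

Note that days holds only the whole-day part, while total_seconds() covers the entire span.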