Pandas Tutorial
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). It is similar to a column in a spreadsheet, a named vector in R, or a list, dict, or one-dimensional array in Python.
Here's a step-by-step tutorial to create a Pandas Series:
import pandas as pd
# Create a Series from a list
s1 = pd.Series([1, 2, 3, 4, 5])
print(s1)
This will create a Series with a default index (0 to N-1).
# Create a Series with a custom index
s2 = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s2)
When a Series is created from a dictionary, the keys become the index.
s3 = pd.Series({'a': 1, 'b': 2, 'c': 3})
print(s3)
You can retrieve values by position (with .iloc) or by label.
# Using position
print(s1.iloc[2])
# Using label
print(s2['c'])
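The difference between position and label matters most when the index itself holds integers. The following is a minimal sketch, assuming a hypothetical Series `s_int` (not one of the tutorial's examples) whose labels are 10, 20 and 30: `.loc` always selects by label and `.iloc` always selects by position.

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions
s_int = pd.Series([100, 200, 300], index=[10, 20, 30])

by_label = s_int.loc[20]      # element whose label is 20
by_position = s_int.iloc[0]   # first element, whatever its label

print(by_label)
print(by_position)
```

Being explicit with `.loc` and `.iloc` avoids having to remember how plain square brackets interpret an integer for a given index.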
s4 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
# Addition (element-wise, matched by index label)
print(s2 + s4)
# Scalar multiplication
print(s2 * 3)
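Arithmetic between two Series aligns values by index label rather than by position. The sketch below uses two hypothetical Series, `left` and `right`, whose labels only partly overlap; a label present in just one of them produces NaN in the result, and `Series.add` with `fill_value` is one way to treat the missing side as zero instead.

```python
import pandas as pd

# Hypothetical Series with partly overlapping labels
left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels 'a' and 'd' exist on one side only, so their sums are NaN
total = left + right
print(total)

# Treat a missing value on either side as 0 before adding
filled_total = left.add(right, fill_value=0)
print(filled_total)
```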
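# Boolean filtering: keep only the values greater than 3
print(s1[s1 > 3])
# Membership tests check the index labels, not the values
print('b' in s2)  # True
print('f' in s2)  # False
# Underlying values and index
print(s2.values)
print(s2.index)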
Pandas uses NaN (not a number) to indicate missing values.
s5 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': None})
print(s5)
You can check for missing values using:
print(s5.isnull())
And you can fill or drop missing values using fillna() and dropna() respectively.
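As a short illustration of those two methods, the sketch below rebuilds the `s5` Series from above and applies each one. The fill value of 0 is an arbitrary choice for the example; both calls return a new Series and leave `s5` unchanged.

```python
import pandas as pd

s5 = pd.Series({'a': 1, 'b': 2, 'c': 3, 'd': None})

# Replace every missing value with 0 (arbitrary fill value for this sketch)
filled = s5.fillna(0)
print(filled)

# Remove the entries whose value is missing
dropped = s5.dropna()
print(dropped)
```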
The Pandas Series is a foundational data structure in the Pandas library, and understanding how to manipulate it is crucial before diving into more complex operations with DataFrames.
Initialization of Pandas Series with NumPy arrays:
import pandas as pd
import numpy as np
# Create a Pandas Series from a NumPy array
data = np.array([1, 2, 3])
series = pd.Series(data)
Series creation from dictionaries in Python using Pandas:
import pandas as pd
# Create a Pandas Series from a dictionary
data = {'A': 1, 'B': 2, 'C': 3}
series = pd.Series(data)
Reading external data and initializing Pandas Series:
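import pandas as pd
# Read external data (e.g., CSV) into a Pandas Series.
# read_csv no longer accepts squeeze=True (removed in pandas 2.0);
# squeeze the resulting single-column DataFrame instead.
series = pd.read_csv('your_file.csv').squeeze('columns')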
Generating a Pandas Series from a CSV file:
import pandas as pd
# Generate a Pandas Series from a single-column CSV file
# (squeeze=True was removed from read_csv in pandas 2.0)
series = pd.read_csv('your_file.csv').squeeze('columns')
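Because `your_file.csv` is only a placeholder, the pattern can be tried without a file on disk. The sketch below substitutes an in-memory buffer with made-up contents; the column name `value` and the numbers are assumptions for illustration only.

```python
import io
import pandas as pd

# Made-up single-column CSV text standing in for a real file
csv_text = "value\n10\n20\n30\n"

df = pd.read_csv(io.StringIO(csv_text))

# A one-column DataFrame squeezes down to a Series named after that column
series = df.squeeze("columns")

print(type(series).__name__)
print(series)
```

If the CSV has more than one column, `squeeze("columns")` returns the DataFrame unchanged, so selecting a single column by name is the more predictable route in that case.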
Using Pandas to create Series from JSON data:
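import pandas as pd
from io import StringIO
# Create a Pandas Series from JSON data
json_data = '{"A": 1, "B": 2, "C": 3}'
# Wrap the text in StringIO; passing a literal JSON string is deprecated in recent pandas
series = pd.read_json(StringIO(json_data), typ='series')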
Concatenating and merging to form a Pandas Series:
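import pandas as pd
# Concatenate two Series into a new, longer Series
series1 = pd.Series([1, 2, 3], name='left')
series2 = pd.Series([4, 5, 6], name='right')
result_concat = pd.concat([series1, series2])
# pd.merge requires named Series and returns a DataFrame, not a Series;
# here the two are joined on their index
result_merge = pd.merge(series1, series2, left_index=True, right_index=True, how='outer')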
Appending data to an existing Pandas Series:
import pandas as pd
# Append new data to an existing Pandas Series.
# Series.append was removed in pandas 2.0; pd.concat is the replacement.
series1 = pd.Series([1, 2, 3])
series2 = pd.Series([4, 5, 6])
result_append = pd.concat([series1, series2], ignore_index=True)
Creating a Series with a specified index in Pandas:
import pandas as pd
# Create a Pandas Series with a specified index
data = [1, 2, 3]
index = ['A', 'B', 'C']
series = pd.Series(data, index=index)
Reshaping and transforming data into a Pandas Series:
import pandas as pd
# Reshape and transform data into a Pandas Series
data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)
# stack() pivots the columns into an inner index level
series = df.stack()
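The Series returned by `stack()` carries a two-level index: the original row label paired with the original column name. A brief sketch of what that looks like in practice, reusing the same data (the variable names `stacked` and `restored` are chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
stacked = df.stack()

# The index now has two levels: (row label, column name)
print(stacked.index.nlevels)

# Select one value with a (row, column) tuple
print(stacked.loc[(0, 'B')])

# unstack() reverses the operation and gives back a DataFrame
restored = stacked.unstack()
print(restored)
```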
Initialization of a Pandas Series with datetime index:
import pandas as pd
# Initialize a Pandas Series with a datetime index (three consecutive days)
date_index = pd.date_range(start='2022-01-01', periods=3, freq='D')
series = pd.Series([1, 2, 3], index=date_index)
Filling missing data while creating a Series in Pandas:
import pandas as pd
# Create a Pandas Series, then fill its missing values with 0
data = [1, 2, None, 4]
series = pd.Series(data)
series_filled = series.fillna(0)
Combining multiple Series to create a new one in Pandas:
import pandas as pd
# Combine multiple Series to create a new one
series1 = pd.Series([1, 2, 3])
series2 = pd.Series([4, 5, 6])
# ignore_index=True renumbers the result 0..N-1
result_combined = pd.concat([series1, series2], ignore_index=True)