Pandas Tutorial

Pandas GroupBy

The groupby method in pandas splits a DataFrame into groups based on the values in one or more columns (or other keys). It's particularly useful for aggregating data, computing summary statistics per group, and restructuring data in various ways.

Here's a step-by-step tutorial on using groupby in pandas:

1. Setup:

Ensure you have the required libraries:

pip install pandas

2. Import the necessary libraries:

import pandas as pd

3. Create a Sample DataFrame:

For this tutorial, let's create a sample DataFrame:

data = {
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000]
}

df = pd.DataFrame(data)

4. Basic Grouping:

To group data by the 'Department' column:

grouped = df.groupby('Department')

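This creates a DataFrameGroupBy object. No aggregation has been computed yet: pandas only records how the rows should be split, and the actual work happens when you call a method such as mean(), agg(), filter(), or transform().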
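Even before any aggregation runs, the GroupBy object can report how the rows were split. A short sketch on the tutorial's sample data, using the standard ngroups, groups and size() members:

```python
import pandas as pd

# Same sample data as in step 3
df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000],
})
grouped = df.groupby('Department')

# Number of distinct groups
n_groups = grouped.ngroups
print(n_groups)  # 3

# Dict-like mapping: group key -> index labels of the rows in that group
print(grouped.groups)

# Row count per group, as a Series indexed by the group key
sizes = grouped.size()
print(sizes)
```

Group keys are sorted by default (Finance, HR, IT), which is why the later outputs in this tutorial list departments alphabetically.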

5. Aggregate Data:

Once grouped, we can aggregate data in various ways:

a. Compute the mean of each group:

mean_salaries = grouped['Salary'].mean()
print(mean_salaries)

Output:

Department
Finance    65000.0
HR         61000.0
IT         56500.0
Name: Salary, dtype: float64

b. Compute multiple aggregations:

aggregations = grouped['Salary'].agg(['mean', 'sum', 'max', 'min'])
print(aggregations)
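The list passed to agg() can also mix your own functions with the built-in names. In this sketch, salary_range is a hypothetical helper written for illustration; pandas names the resulting column after the function:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000],
})
grouped = df.groupby('Department')

# Hypothetical helper: spread between the highest and lowest salary in a group
def salary_range(s):
    return s.max() - s.min()

# Strings select built-in aggregations; the callable's column is named
# after the function ('salary_range')
stats = grouped['Salary'].agg(['mean', 'count', salary_range])
print(stats)
```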

c. Applying different aggregations to different columns:

result = grouped.agg({
    'Salary': ['mean', 'sum'],
    'Employee': 'count'
})
print(result)

6. Iterating Over Groups:

You can iterate over each group in a GroupBy object:

for department, group_data in grouped:
    print(department)
    print(group_data, '\n')
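When you only need one group, get_group() returns it directly, and the same iteration can build a dictionary of per-group DataFrames. A minimal sketch (it_rows and groups_by_dept are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000],
})
grouped = df.groupby('Department')

# Fetch a single group without looping; rows keep their original index labels
it_rows = grouped.get_group('IT')
print(it_rows)

# Collect every group into a plain dict keyed by department
groups_by_dept = {dept: frame for dept, frame in grouped}
print(list(groups_by_dept))
```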

7. Filtering Groups:

Use filter() to keep only the groups that meet a condition. It returns the original rows belonging to those groups, not one row per group:

# Filter departments with average salary greater than 60000
filtered = grouped.filter(lambda x: x['Salary'].mean() > 60000)
print(filtered)
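The same selection can be written as a boolean mask with transform('mean'), which broadcasts each group's mean back onto that group's rows so it lines up with the original DataFrame. This sketch checks that both approaches select the same employees:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000],
})
grouped = df.groupby('Department')

# filter() keeps or drops whole groups
filtered = grouped.filter(lambda x: x['Salary'].mean() > 60000)

# Equivalent boolean mask: one group mean per original row
mask = grouped['Salary'].transform('mean') > 60000
masked = df[mask]

print(filtered)
print(masked)
```

The mask version is handy when you want to combine the group condition with other row-level conditions using & or |.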

8. Transforming Groups:

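You can transform the values in each group. transform() returns a result aligned to the original rows (same index), one value per row:

# Deduct 5000 from each salary in the 'IT' department.
# Select the 'Salary' column first: in a SeriesGroupBy transform, each group is
# passed as a Series whose .name attribute is the group key (e.g. 'IT').
deducted_salary = grouped['Salary'].transform(lambda x: x - 5000 if x.name == 'IT' else x)
print(deducted_salary)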
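A more typical use of transform() is comparing each row with a statistic of its group. This sketch adds two columns, Dept_Avg and Diff_From_Avg (names chosen here for illustration, not part of the original data), holding each department's mean salary and each employee's deviation from it:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000],
})
grouped = df.groupby('Department')

# transform('mean') returns the group mean for every original row,
# so the result aligns with df and can be stored as a new column
df['Dept_Avg'] = grouped['Salary'].transform('mean')

# Each employee's salary relative to their department average
df['Diff_From_Avg'] = df['Salary'] - df['Dept_Avg']
print(df)
```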

9. Multi-level Grouping:

You can group by multiple columns:

df['Experience'] = ['Senior', 'Junior', 'Senior', 'Junior', 'Senior']
grouped_multi = df.groupby(['Department', 'Experience'])

# Compute mean salary
mean_salaries_multi = grouped_multi['Salary'].mean()
print(mean_salaries_multi)
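Grouping by two columns gives a Series with a two-level index. You can look up a single (Department, Experience) pair with .loc, or call unstack() to turn the inner level into columns; combinations with no employees show up as NaN. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['IT', 'HR', 'Finance', 'IT', 'HR'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [55000, 60000, 65000, 58000, 62000],
    'Experience': ['Senior', 'Junior', 'Senior', 'Junior', 'Senior'],
})
mean_salaries_multi = df.groupby(['Department', 'Experience'])['Salary'].mean()

# Look up one cell of the two-level index with a tuple key
print(mean_salaries_multi.loc[('IT', 'Junior')])

# Move the inner level (Experience) into columns; Finance has no Junior rows,
# so that cell becomes NaN
table = mean_salaries_multi.unstack()
print(table)
```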

Summary:

The groupby method in pandas follows a split-apply-combine pattern: it splits rows into groups, applies an aggregation, transformation, or filter to each group, and combines the results. It plays a central role in exploratory data analysis and preprocessing, helping generate insights and prepare data for further analysis or visualization.

  1. GroupBy in Pandas with examples:

    • The groupby() function in Pandas is used for grouping rows based on specified columns.
    • Example:
      import pandas as pd
      
      # Create a DataFrame
      df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 15, 20, 25]})
      
      # GroupBy 'Category'
      grouped = df.groupby('Category')
      
  2. Aggregate functions in Pandas GroupBy:

    • After grouping, aggregate functions like sum(), mean(), count(), etc., can be applied.
    • Example:
      # Aggregate using sum
      sum_values = grouped['Value'].sum()
      
  3. Pandas GroupBy multiple columns:

    • You can group by multiple columns by passing a list of column names to the groupby() function.
    • Example:
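      # GroupBy multiple columns ('AnotherColumn' is a placeholder key column added here for illustration)
      df['AnotherColumn'] = ['x', 'y', 'x', 'y']
      grouped_multiple = df.groupby(['Category', 'AnotherColumn'])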
      
  4. GroupBy and sum in Pandas:

    • Use the sum() function to get the sum of values within each group.
    • Example:
      # GroupBy and sum
      sum_values = grouped['Value'].sum()
      
  5. Pandas GroupBy count unique values:

    • The nunique() function counts the number of unique values within each group.
    • Example:
      # GroupBy and count unique values
      unique_counts = grouped['Value'].nunique()
      
  6. How to reset index after GroupBy in Pandas:

    • After a GroupBy aggregation, the group keys become the index of the result. Use reset_index() to turn them back into regular DataFrame columns (alternatively, pass as_index=False to groupby()).
    • Example:
      # Reset index after GroupBy
      result = sum_values.reset_index()
      
  7. GroupBy and apply function in Pandas:

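    • Apply custom functions using the apply() method after grouping.
    • Example:
      # Example custom function: range (max - min) of the values in each group
      def custom_function(values):
          return values.max() - values.min()
      
      # GroupBy and apply custom function
      result = grouped['Value'].apply(custom_function)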
      
  8. Pandas GroupBy mean and median:

    • Compute the mean and median of each group with the mean() and median() methods.
    • Example:
      # GroupBy and mean
      mean_values = grouped['Value'].mean()
      
      # GroupBy and median
      median_values = grouped['Value'].median()
      
  9. GroupBy and filter in Pandas:

    • Use filter() to filter groups based on specified conditions.
    • Example:
      # GroupBy and filter
      filtered_groups = grouped.filter(lambda x: x['Value'].sum() > 30)
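The examples in this list can be run together on the small Category/Value frame from item 1. This sketch also shows as_index=False, which keeps the group key as a regular column and gives the same values as calling reset_index() afterwards (flat_a and flat_b are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 15, 20, 25]})
grouped = df.groupby('Category')

sum_values = grouped['Value'].sum()          # A -> 30, B -> 40
unique_counts = grouped['Value'].nunique()   # 2 distinct values in each group

# Two ways to get the group key back as a regular column
flat_a = sum_values.reset_index()
flat_b = df.groupby('Category', as_index=False)['Value'].sum()

# Only B's total (40) exceeds 30; A's total is exactly 30, so A is dropped
filtered_groups = grouped.filter(lambda x: x['Value'].sum() > 30)
print(filtered_groups)
```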