Boolean Indexing in Pandas

Boolean indexing lets you filter a DataFrame or Series by a condition or a set of conditions. The condition is evaluated element by element and produces a mask of True/False values; indexing with that mask keeps only the rows where it is True. This tutorial walks through the common patterns.

1. Setup:

First, let's set up the environment and create a sample DataFrame:

import pandas as pd

# Sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['p', 'q', 'r', 's', 't']
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

2. Basic Boolean Indexing:

2.1 Single Condition:

Filter rows where values in column 'A' are greater than 3:

filtered_df = df[df['A'] > 3]
print("\nFiltered DataFrame (A > 3):")
print(filtered_df)
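
It can help to look at the intermediate mask on its own. The following is a minimal sketch that rebuilds the sample DataFrame so it runs standalone; the variable name `mask` is an illustrative choice, not part of the steps above.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['p', 'q', 'r', 's', 't']
})

# The comparison returns a boolean Series that shares df's index
mask = df['A'] > 3
print(mask.tolist())  # [False, False, False, True, True]
print(mask.dtype)     # bool

# Indexing with the mask keeps only the rows where it is True
filtered_df = df[mask]
print(filtered_df.index.tolist())  # [3, 4]
```

Note that the surviving rows keep their original index labels (3 and 4); the result is not renumbered from 0.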

2.2 Multiple Conditions:

Filter rows where values in column 'A' are greater than 2 and values in column 'B' are less than 40:

filtered_df = df[(df['A'] > 2) & (df['B'] < 40)]
print("\nFiltered DataFrame (A > 2 and B < 40):")
print(filtered_df)

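Note: When combining conditions, use the element-wise operators & (and), | (or), and ~ (not) rather than Python's and, or, and not keywords, which raise a ValueError on a Series. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and <.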
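
The section above only shows `&`. As a rough sketch of the other two operators, and of what happens when the `and` keyword is used instead, the snippet below reuses the same sample data. The names `either`, `not_gt2`, and `raised` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['p', 'q', 'r', 's', 't']
})

# OR: rows where A < 2 or B > 40
either = df[(df['A'] < 2) | (df['B'] > 40)]
print(either['A'].tolist())  # [1, 5]

# NOT: rows where A is not greater than 2
not_gt2 = df[~(df['A'] > 2)]
print(not_gt2['A'].tolist())  # [1, 2]

# The `and` keyword asks for the truth value of a whole Series,
# which pandas treats as ambiguous and rejects with a ValueError
try:
    df[(df['A'] > 2) and (df['B'] < 40)]
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```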

3. Using isin():

To keep only the rows whose value in a column matches any item in a list, use isin():

values = ['p', 's']
filtered_df = df[df['C'].isin(values)]
print("\nFiltered DataFrame (C in ['p', 's']):")
print(filtered_df)

4. Using ~ for Negation:

To select rows where column 'C' is NOT in the list of values:

values = ['p', 's']
filtered_df = df[~df['C'].isin(values)]
print("\nFiltered DataFrame (C not in ['p', 's']):")
print(filtered_df)

5. Combining Boolean Indexing with Other Operations:

You can combine boolean indexing with other DataFrame operations:

5.1 Count rows that meet a condition:

count = df[df['A'] > 2].shape[0]
print(f"\nNumber of rows where A > 2: {count}")

5.2 Calculate mean of a column based on a condition:

mean_val = df[df['A'] > 2]['B'].mean()
print(f"\nMean of column 'B' where A > 2: {mean_val}")
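
As an alternative to filtering first and then measuring the result, the mask itself can be aggregated. This is a sketch of that idea under the same sample data; the names `mask`, `share`, and `mean_b` are illustrative. It relies on True counting as 1 and False as 0 when a boolean Series is summed.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['p', 'q', 'r', 's', 't']
})

mask = df['A'] > 2

# Summing the mask counts the matching rows without building a filtered copy
count = int(mask.sum())
print(count)  # 3

# The mean of the mask is the fraction of rows that match
share = float(mask.mean())
print(share)  # 0.6

# .loc selects the rows and the column in a single indexing step
mean_b = df.loc[mask, 'B'].mean()
print(mean_b)  # 40.0
```

Using `.loc[mask, 'B']` instead of `df[mask]['B']` gives the same value here; the single-step form is usually preferred because it also works safely when assigning values.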

Full Code:

Combining all the steps, you'll get:

import pandas as pd

# Sample DataFrame
data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': ['p', 'q', 'r', 's', 't']
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Boolean Indexing
print("\nFiltered DataFrame (A > 3):")
print(df[df['A'] > 3])

print("\nFiltered DataFrame (A > 2 and B < 40):")
print(df[(df['A'] > 2) & (df['B'] < 40)])

print("\nFiltered DataFrame (C in ['p', 's']):")
print(df[df['C'].isin(['p', 's'])])

print("\nFiltered DataFrame (C not in ['p', 's']):")
print(df[~df['C'].isin(['p', 's'])])

print(f"\nNumber of rows where A > 2: {df[df['A'] > 2].shape[0]}")
print(f"\nMean of column 'B' where A > 2: {df[df['A'] > 2]['B'].mean()}")

This tutorial offers a foundational understanding of boolean indexing in pandas. It's a versatile tool that can be combined with other functions and methods for more complex data manipulations.
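
All of the examples so far filter a DataFrame, but the same pattern applies to a Series. Here is a minimal sketch with a made-up Series `s`; it also shows `where()`, which is one way to combine a mask with another method when the original shape should be preserved.

```python
import pandas as pd

# Hypothetical sample Series with string labels
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Boolean indexing on a Series drops the non-matching elements
selected = s[s > 25]
print(selected.index.tolist())  # ['c', 'd', 'e']
print(selected.tolist())        # [30, 40, 50]

# where() keeps every position and puts NaN where the condition is False
kept_shape = s.where(s > 25)
print(len(kept_shape))              # 5
print(int(kept_shape.isna().sum())) # 2
```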

  1. Python Pandas boolean indexing examples:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Boolean mask for filtering
    mask = df['A'] > 2
    
    # Apply boolean mask to filter data
    filtered_data = df[mask]
    
  2. Filtering data with boolean conditions in Pandas:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Filter data using boolean conditions
    filtered_data = df[df['A'] > 2]
    
  3. Indexing and selecting data with boolean arrays in Pandas:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Create a boolean array
    bool_array = [True, False, True, False, True]
    
    # Select data using boolean array
    selected_data = df[bool_array]
    
  4. Applying multiple boolean conditions to Pandas DataFrame:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Multiple boolean conditions
    condition1 = df['A'] > 2
    condition2 = df['B'] == 'X'
    
    # Combine conditions using logical operators
    combined_condition = condition1 & condition2
    
    # Apply combined condition to filter data
    filtered_data = df[combined_condition]
    
  5. Creating boolean masks for advanced data selection in Pandas:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Create boolean masks
    mask1 = df['A'] > 2
    mask2 = df['B'] == 'X'
    
    # Combine masks using logical operators
    combined_mask = mask1 & mask2
    
    # Apply combined mask to filter data
    filtered_data = df[combined_mask]
    
  6. Combining boolean indexing with other Pandas operations:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Boolean mask for filtering
    mask = df['A'] > 2
    
    # Overwrite column 'B' with 'Z' in the rows selected by the mask
    df.loc[mask, 'B'] = 'Z'
    
  7. Boolean indexing for missing data handling in Pandas:

    import pandas as pd
    
    # Sample DataFrame with missing values
    df = pd.DataFrame({'A': [1, 2, 3, None, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Boolean mask for missing values
    mask = df['A'].isna()
    
    # Replace missing values based on the boolean mask
    df.loc[mask, 'A'] = 0
    
  8. Using boolean indexing with categorical data in Pandas:

    import pandas as pd
    
    # Sample DataFrame with categorical column
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    df['B'] = df['B'].astype('category')
    
    # Boolean mask for categorical values
    mask = df['B'] == 'X'
    
    # Apply boolean mask to filter data
    filtered_data = df[mask]
    
  9. Efficient boolean indexing techniques in Pandas:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # query() expresses the same filter as a string; it is often more readable,
    # but it is not guaranteed to be faster than a plain boolean mask
    filtered_data = df.query('A > 2 and B == "X"')
    
  10. Pandas boolean indexing vs. traditional indexing:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Traditional indexing: select rows by position, regardless of their values
    traditional_data = df.iloc[2:]
    
    # Boolean indexing: select rows by a condition on their values
    boolean_data = df[df['A'] > 2]
    
  11. Applying boolean indexing to time-series data in Pandas:

    import pandas as pd
    
    # Sample DataFrame with time-series data
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']},
                      index=pd.date_range('2022-01-01', periods=5, freq='D'))
    
    # Boolean mask for time-based filtering (rows strictly after 2022-01-03)
    mask = df.index > pd.Timestamp('2022-01-03')
    
    # Apply boolean mask to filter time-series data
    filtered_data = df[mask]
    
  12. Code examples for effective boolean indexing in Pandas:

    import pandas as pd
    
    # Sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})
    
    # Boolean mask for filtering
    mask = (df['A'] > 2) & (df['B'] == 'X')
    
    # Apply boolean mask to filter data
    filtered_data = df[mask]
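
One detail worth knowing when masks meet missing data, as in the missing-data example in the list above: a comparison against NaN evaluates to False, so negating a mask silently pulls the NaN rows back in. The sketch below illustrates this with the same sample data as that example; the names `gt2` and `le2` are illustrative.

```python
import pandas as pd

# Same data as the missing-data example: None becomes NaN in a float column
df = pd.DataFrame({'A': [1, 2, 3, None, 5], 'B': ['X', 'Y', 'X', 'Y', 'X']})

# NaN > 2 is False, so the row with the missing value is excluded
gt2 = df['A'] > 2
print(gt2.tolist())     # [False, False, True, False, True]

# Negating the mask therefore includes the NaN row
print((~gt2).tolist())  # [True, True, False, True, False]

# Combine with notna() to keep only real values that are not greater than 2
le2 = df['A'].notna() & ~gt2
print(df.loc[le2, 'A'].tolist())  # [1.0, 2.0]
```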