Create Subsets of a Data frame - subset() Function in R

Subsetting data frames is a common task in data analysis. The subset() function in R provides an easy way to extract parts of a data frame based on conditions. In this tutorial, we'll walk through how to use the subset() function to create subsets of a data frame.

1. Basic Usage:

The subset() function takes the following main arguments:

  • x: the data frame you want to subset.
  • subset: a logical expression written in terms of the column names; rows for which it evaluates to TRUE are kept.
  • select: the columns to keep in the result (or to drop, when prefixed with -).

Let's use the built-in mtcars data frame to illustrate:

# mtcars ships with R; calling data() is optional but makes the dependency explicit
data(mtcars)

# Subset cars with mpg > 20
subset_mpg <- subset(mtcars, subset = mpg > 20)
head(subset_mpg)
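
For comparison, the same filter can be written with bracket indexing. This is a minimal sketch that assumes only the built-in mtcars data; the names by_subset and by_bracket are illustrative and do not appear elsewhere in this tutorial.

```r
# Filter with subset(): column names are looked up inside the data frame
by_subset <- subset(mtcars, mpg > 20)

# Equivalent bracket indexing: a logical row index, all columns kept
by_bracket <- mtcars[mtcars$mpg > 20, ]

# Check that both approaches return the same rows and columns
all.equal(by_subset, by_bracket)
```

The subset() form is shorter because mpg does not need the mtcars$ prefix, which is the main convenience the function offers.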

2. Multiple Conditions:

You can use logical operators like & (and) and | (or) to combine multiple conditions:

# Subset cars with mpg > 20 and hp < 100 (high mileage, low horsepower)
subset_mpg_hp <- subset(mtcars, subset = (mpg > 20) & (hp < 100))
head(subset_mpg_hp)
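
The example above only uses &. The sketch below shows | and a mixed condition on the same mtcars data; the thresholds and result names are arbitrary choices for illustration. Note that row filters need the element-wise operators & and |, since && and || are meant for single TRUE/FALSE values.

```r
# Keep cars that are either very economical OR very powerful
subset_or <- subset(mtcars, mpg > 30 | hp > 200)

# Mix & and |; parentheses make the grouping explicit
subset_mixed <- subset(mtcars, (cyl == 4 | cyl == 6) & am == 1)

# %in% is a compact alternative to chaining == with |
subset_in <- subset(mtcars, cyl %in% c(4, 6) & am == 1)

head(subset_or)
```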

3. Selecting Specific Columns:

Use the select argument to specify the columns you want:

# Subset cars with mpg > 20 and select only mpg and hp columns
subset_mpg_select <- subset(mtcars, subset = mpg > 20, select = c(mpg, hp))
head(subset_mpg_select)

4. Excluding Columns:

You can exclude specific columns using the - sign:

# Subset cars with mpg > 20 and exclude the mpg and hp columns
subset_mpg_exclude <- subset(mtcars, subset = mpg > 20, select = -c(mpg, hp))
head(subset_mpg_exclude)
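
Inside select, each column name stands for its position in the data frame, which is why c() and - work there. The same mechanism allows ranges with the : operator. A short sketch on mtcars, with illustrative result names:

```r
# Select a contiguous range of columns by name: mpg, cyl and disp
subset_range <- subset(mtcars, mpg > 20, select = mpg:disp)

# Drop the same range of columns instead
subset_drop_range <- subset(mtcars, mpg > 20, select = -(mpg:disp))

names(subset_range)       # "mpg" "cyl" "disp"
names(subset_drop_range)
```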

5. Subsetting with Character Columns:

Let's say you have a data frame with character columns, and you want to subset based on string values:

# Sample data frame
df <- data.frame(ID = 1:5, Label = c("A", "B", "A", "C", "B"))

# Subset rows with Label "A"
subset_label <- subset(df, Label == "A")
print(subset_label)
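
When a column used in the condition contains missing values, subset() and bracket indexing behave differently: subset() treats an NA condition as FALSE, while [ returns a row of NAs for it. The sketch below uses a hypothetical data frame df_na that is not part of the example above.

```r
# Hypothetical data frame with one missing label
df_na <- data.frame(ID = 1:4, Label = c("A", NA, "A", "B"))

# subset() treats an NA condition as FALSE, so the NA row is dropped
with_subset <- subset(df_na, Label == "A")

# Bracket indexing returns an all-NA row for each NA in the condition
with_bracket <- df_na[df_na$Label == "A", ]

# Wrapping the condition in which() drops the NA rows as well
with_which <- df_na[which(df_na$Label == "A"), ]

nrow(with_subset)   # 2
nrow(with_bracket)  # 3
nrow(with_which)    # 2
```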

Important Note:

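While subset() is convenient for interactive use and quick data exploration, it is generally not recommended inside functions or other programmatic code. It relies on non-standard evaluation: names in the condition are looked up among the data frame's columns first and only then in the calling environment, so a column that shares its name with a variable silently takes precedence. For programming, it is safer to subset data frames with the [, [[, or $ operators.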
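
The scoping issue can be reproduced with a small, deliberately contrived sketch. Both helper functions below are hypothetical, and the threshold argument is named cyl on purpose so that it clashes with a column of mtcars.

```r
# Hypothetical helper built on subset(): intended to keep cars with mpg
# above the threshold passed in the argument `cyl`
filter_with_subset <- function(data, cyl) {
  # `cyl` resolves to the COLUMN mtcars$cyl, not to the argument
  subset(data, mpg > cyl)
}

# Same intent with bracket indexing: `cyl` is the function argument here
filter_with_bracket <- function(data, cyl) {
  data[data$mpg > cyl, , drop = FALSE]
}

# Every car has mpg greater than its cylinder count, so nothing is filtered
nrow(filter_with_subset(mtcars, 20))   # 32

# Only the cars with mpg > 20 are returned
nrow(filter_with_bracket(mtcars, 20))
```

No error or warning is raised in the first case, which is what makes this kind of bug hard to spot.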

This tutorial gives you a solid foundation to subset data frames in R using the subset() function. Adjust the conditions and column selections as per your needs to get the desired subsets.

  1. R subset function example:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Use subset() function to create a subset based on a condition
    subset_example <- subset(data_frame, Age > 25)
    
  2. Create subsets of data in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Create subsets of data based on conditions
    subset_age_above_25 <- subset(data_frame, Age > 25)
    subset_score_above_90 <- subset(data_frame, Score > 90)
    
  3. Filter data frame with subset() in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Use subset() to keep only the rows that satisfy both conditions.
    # With this sample data no row has Age > 25 and Score > 90, so the result has zero rows.
    filtered_data <- subset(data_frame, Age > 25 & Score > 90)
    
  4. Subset data frame by conditions in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
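    # Bracket-indexing equivalent of subset(data_frame, Age > 25 & Score > 90).
    # With this sample data no row meets both conditions, so the result has zero rows.
    subset_condition <- data_frame[data_frame$Age > 25 & data_frame$Score > 90, ]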
    
  5. Using subset() to select columns in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Use subset() function to select specific columns
    subset_columns <- subset(data_frame, select = c(Name, Score))
    
  6. R subset rows based on criteria:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Select rows 1 and 3 by position (Alice and Charlie)
    subset_rows <- data_frame[c(1, 3), ]
    
  7. Subset data frame by multiple conditions in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Combine several conditions with & inside the row index.
    # With this sample data no row has Age > 25 and Score > 90, so the result has zero rows.
    subset_multiple_conditions <- data_frame[data_frame$Age > 25 & data_frame$Score > 90, ]
    
  8. Subset data frame by column values in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95)
    )
    
    # Subset the data frame based on specific column values
    subset_column_values <- subset(data_frame, Name %in% c("Alice", "Charlie"))
    
  9. Subset data frame by variable types in R:

    # Create a data frame
    data_frame <- data.frame(
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22),
      Score = c(90, 85, 95),
      Grade = c("A", "B", "A")
    )
    
    # Keep only the numeric columns (Age and Score) by passing a logical mask to select
    subset_numeric_variables <- subset(data_frame, select = sapply(data_frame, is.numeric))
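
The same selection by type can be sketched with bracket indexing, which avoids the non-standard evaluation of subset(). Using vapply() instead of sapply() is a choice made here, not something the example above requires; the names is_num, numeric_cols and other_cols are illustrative.

```r
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95),
  Grade = c("A", "B", "A")
)

# vapply() guarantees exactly one logical value per column
is_num <- vapply(data_frame, is.numeric, logical(1))

# Keep the numeric columns; drop = FALSE always returns a data frame
numeric_cols <- data_frame[, is_num, drop = FALSE]

# Negate the mask to keep the non-numeric columns instead
other_cols <- data_frame[, !is_num, drop = FALSE]

names(numeric_cols)  # "Age" "Score"
names(other_cols)    # "Name" "Grade"
```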