Subsetting data frames is a common task in data analysis. The subset() function in R provides an easy way to extract parts of a data frame based on conditions. In this tutorial, we'll walk through how to use the subset() function to create subsets of a data frame.
The subset() function takes the following main arguments:
x: the data frame you want to subset.
subset: the condition(s) based on which rows will be selected.
select: the columns you want to include in the subset.
Let's use the built-in mtcars data frame to illustrate:
# Load the mtcars data frame
data(mtcars)

# Subset cars with mpg > 20
subset_mpg <- subset(mtcars, subset = mpg > 20)
head(subset_mpg)
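One detail of row selection is easy to miss: rows where the condition evaluates to NA are dropped by subset(), whereas bracket indexing returns them as rows filled with NA. A minimal sketch, using a small made-up data frame (df_na and the other names here are illustrative, not part of the tutorial's data):

```r
# Hypothetical data frame with a missing value in the filter column
df_na <- data.frame(id = 1:4, mpg = c(25, NA, 18, 30))

# subset() treats NA in the condition as FALSE, so row 2 is dropped
kept_subset <- subset(df_na, mpg > 20)

# Bracket indexing keeps the NA position as a row filled with NA
kept_bracket <- df_na[df_na$mpg > 20, ]

nrow(kept_subset)   # 2
nrow(kept_bracket)  # 3
```

If the data may contain missing values, this difference is worth checking before swapping one form for the other.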
You can use logical operators like & (and) and | (or) to combine multiple conditions:
# Subset cars with mpg > 20 and hp < 100 (high mileage, low horsepower)
subset_mpg_hp <- subset(mtcars, subset = (mpg > 20) & (hp < 100))
head(subset_mpg_hp)
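The example above uses &. For completeness, here is a sketch of the | case on the same mtcars data; the thresholds and the name subset_mpg_or_hp are chosen only for illustration:

```r
data(mtcars)

# Keep cars that either get more than 30 mpg or have more than 250 hp
subset_mpg_or_hp <- subset(mtcars, subset = (mpg > 30) | (hp > 250))

# Show just the two columns used in the condition
subset_mpg_or_hp[, c("mpg", "hp")]
```

Wrapping each comparison in parentheses is optional here, but it keeps the intent clear when conditions get longer.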
Use the select argument to specify the columns you want:
# Subset cars with mpg > 20 and select only mpg and hp columns
subset_mpg_select <- subset(mtcars, subset = mpg > 20, select = c(mpg, hp))
head(subset_mpg_select)
You can exclude specific columns using the - sign:
# Subset cars with mpg > 20 and exclude the mpg and hp columns
subset_mpg_exclude <- subset(mtcars, subset = mpg > 20, select = -c(mpg, hp))
head(subset_mpg_exclude)
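The select argument also accepts a range of adjacent columns written with the : operator, because column names are treated as column positions inside the select expression. A short sketch on mtcars (subset_range is an illustrative name):

```r
data(mtcars)

# Select the contiguous block of columns from mpg through hp
subset_range <- subset(mtcars, subset = mpg > 20, select = mpg:hp)
names(subset_range)  # "mpg" "cyl" "disp" "hp"
```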
Let's say you have a data frame with character columns, and you want to subset based on string values:
# Sample data frame
df <- data.frame(ID = 1:5, Label = c("A", "B", "A", "C", "B"))

# Subset rows with Label "A"
subset_label <- subset(df, Label == "A")
print(subset_label)
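While subset() is great for interactive use and quick data exploration, it's generally not recommended for programming or use within functions. Its subset and select arguments use non-standard evaluation: names are looked up among the data frame's columns first and only then in the calling environment, so a variable that shares its name with a column is silently shadowed. For programming, it's safer to use the [, [[, or $ operators to subset data frames.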
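To make the scoping concern concrete, here is a minimal sketch of one way it can go wrong. Both helper functions are hypothetical and written only for this illustration; the function argument is deliberately given the same name as a column of mtcars:

```r
data(mtcars)

# Hypothetical helper: intended to keep rows whose cyl equals the argument.
# Inside subset(), both sides of `cyl == cyl` resolve to the column,
# so the comparison is TRUE for every row.
filter_cyl_subset <- function(df, cyl) {
  subset(df, cyl == cyl)
}

# Bracket-based version: df$cyl is the column, cyl is the function argument
filter_cyl_bracket <- function(df, cyl) {
  df[df$cyl == cyl, ]
}

nrow(filter_cyl_subset(mtcars, 4))   # 32: every row passes
nrow(filter_cyl_bracket(mtcars, 4))  # 11: only the 4-cylinder cars
```

The subset() version raises no error or warning, which is what makes this kind of bug hard to spot.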
This tutorial gives you a solid foundation to subset data frames in R using the subset() function. Adjust the conditions and column selections as per your needs to get the desired subsets.
R subset function example:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Use subset() function to create a subset based on a condition
subset_example <- subset(data_frame, Age > 25)
Create subsets of data in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Create subsets of data based on conditions
subset_age_above_25 <- subset(data_frame, Age > 25)
subset_score_above_90 <- subset(data_frame, Score > 90)
Filter data frame with subset() in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Use subset() function to filter the data frame based on a condition
# Note: with this sample data no row has both Age > 25 and Score > 90,
# so the result is a data frame with zero rows
filtered_data <- subset(data_frame, Age > 25 & Score > 90)
Subset data frame by conditions in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Subset the data frame based on conditions, using bracket indexing
# Note: with this sample data no row meets both conditions,
# so the result has zero rows
subset_condition <- data_frame[data_frame$Age > 25 & data_frame$Score > 90, ]
Using subset() to select columns in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Use subset() function to select specific columns
subset_columns <- subset(data_frame, select = c(Name, Score))
R subset rows based on criteria:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Subset the data frame by row index: keep rows 1 and 3
subset_rows <- data_frame[c(1, 3), ]
Subset data frame by multiple conditions in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Subset the data frame based on multiple conditions
# Note: with this sample data no row meets both conditions,
# so the result has zero rows
subset_multiple_conditions <- data_frame[data_frame$Age > 25 & data_frame$Score > 90, ]
Subset data frame by column values in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95)
)

# Subset the data frame based on specific column values
subset_column_values <- subset(data_frame, Name %in% c("Alice", "Charlie"))
Subset data frame by variable types in R:
# Create a data frame
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95),
  Grade = c("A", "B", "A")
)

# Subset the data frame based on variable types
subset_numeric_variables <- subset(data_frame, select = sapply(data_frame, is.numeric))
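The same selection by column type can be written with bracket indexing, the form suggested earlier for code that runs inside functions. A sketch reusing the data frame from the example above (is_num and numeric_columns are illustrative names):

```r
data_frame <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  Score = c(90, 85, 95),
  Grade = c("A", "B", "A")
)

# Logical vector marking which columns are numeric;
# vapply() guarantees one logical value per column
is_num <- vapply(data_frame, is.numeric, logical(1))

# Keep only those columns; drop = FALSE keeps the result a data frame
# even if a single column is selected
numeric_columns <- data_frame[, is_num, drop = FALSE]
names(numeric_columns)  # "Age" "Score"
```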