Factors are R's data structure for categorical variables. A factor stores each value as an integer code together with the set of distinct categories (levels) that the codes refer to. Factors can be ordered or unordered, and they play a central role in statistical modeling because many modeling functions in R treat factors as categorical predictors.
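In this tutorial, we'll cover creating factors, viewing and modifying their levels, converting factors to other types, and working with ordered factors.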
To create a factor in R, use the factor() function:
    data <- c("apple", "banana", "apple", "cherry", "banana")
    fruit_factor <- factor(data)
    print(fruit_factor)
This will print:
    [1] apple  banana apple  cherry banana
    Levels: apple banana cherry
To view the levels of a factor:
    levels(fruit_factor)
    # [1] "apple"  "banana" "cherry"
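Internally, a factor is a vector of integer codes plus a levels attribute. The following minimal sketch inspects both parts with base R functions; the variable names are illustrative only and not part of the tutorial's own code.

```r
# Illustrative data; the names below are assumptions made for this sketch
fruit <- factor(c("apple", "banana", "apple", "cherry", "banana"))

fruit_levels <- levels(fruit)      # distinct categories, sorted by default
fruit_count  <- nlevels(fruit)     # number of levels
fruit_codes  <- as.integer(fruit)  # integer code of each element (its position in the levels)
fruit_table  <- table(fruit)       # how often each level occurs

print(fruit_levels)
print(fruit_count)
print(fruit_codes)
print(fruit_table)
```

Each code in fruit_codes is an index into fruit_levels, which is why the two pieces together are enough to reconstruct the original values.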
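To rename the levels of a factor, assign a new character vector to levels(); the new names replace the old ones by position. (To change the order of the levels without renaming them, rebuild the factor with factor(x, levels = ...) instead.)
    levels(fruit_factor) <- c("a_fruit", "b_fruit", "c_fruit")
    print(fruit_factor)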
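Renaming and reordering are easy to confuse, so here is a small sketch contrasting the two. The vector x and the short labels "H", "L", "M" are hypothetical and chosen only for illustration.

```r
# Hypothetical data; default levels are sorted: high, low, medium
x <- factor(c("low", "high", "medium", "low"))

# Reordering: rebuild the factor with an explicit level order; the values stay the same
reordered <- factor(x, levels = c("low", "medium", "high"))

# Renaming: assign new names by position; every element's label changes
renamed <- x
levels(renamed) <- c("H", "L", "M")

print(levels(reordered))
print(as.character(reordered))
print(as.character(renamed))
```

Assigning a reordered vector of the existing names to levels() would silently relabel the data, which is why reordering goes through factor() here.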
You can also combine levels by assigning the same name to several of them:
    data <- c("low", "medium", "low", "high", "medium")
    factor_data <- factor(data, levels = c("low", "medium", "high"))
    # Give levels 2 and 3 the same name; R merges them into a single level
    levels(factor_data)[2:3] <- "medium/high"
    print(factor_data)
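As an alternative sketch, levels() also accepts a named list in which each name is a new level and each value lists the old levels it should absorb. The variable name ratings is an assumption made for this example.

```r
# Hypothetical data for illustration
ratings <- factor(c("low", "medium", "low", "high", "medium"),
                  levels = c("low", "medium", "high"))

# Named list: name = new level, value = old level(s) merged into it
levels(ratings) <- list(low = "low", "medium/high" = c("medium", "high"))

print(ratings)
```

The list form states the mapping explicitly, so it does not depend on the position of each level.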
To convert a factor back to a character vector:
    char_data <- as.character(fruit_factor)
    print(char_data)
And to convert it to a numeric vector:
    num_data <- as.numeric(fruit_factor)
    print(num_data)
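Note: as.numeric() returns the factor's internal integer codes, not the original values. The first level is represented as 1, the second as 2, and so on, so the factor above yields 1 2 1 3 2. If the labels themselves are numbers, use as.numeric(as.character(x)) to recover them.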
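The difference between codes and values matters most when the labels look like numbers. A minimal sketch, using a hypothetical factor of years:

```r
# Hypothetical factor whose labels are numeric strings
years <- factor(c("2020", "2018", "2020", "2019"))

codes      <- as.numeric(years)                 # integer codes, not the years
values     <- as.numeric(as.character(years))   # the original numbers
values_alt <- as.numeric(levels(years))[years]  # same result, converting each level only once

print(codes)
print(values)
print(values_alt)
```

Here codes holds level positions (2018 is level 1, 2019 is level 2, 2020 is level 3), while values and values_alt hold the actual years.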
Factors can be ordered, which means that the levels have an inherent order:
    sizes <- c("medium", "small", "large", "medium")
    size_factor <- factor(sizes, ordered = TRUE, levels = c("small", "medium", "large"))
    print(size_factor)
This will indicate that the factor is ordered when you print it:
    [1] medium small  large  medium
    Levels: small < medium < large
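Because the levels of an ordered factor have a defined rank, comparison operators, max(), and sort() all respect that rank. A short sketch, with the variable name shirts chosen only for illustration:

```r
# Hypothetical ordered factor
shirts <- factor(c("medium", "small", "large", "medium"),
                 levels = c("small", "medium", "large"), ordered = TRUE)

at_least_medium <- shirts >= "medium"  # element-wise comparison against a level
largest <- max(shirts)                 # highest-ranked level present
sorted  <- sort(shirts)                # sorted by level order, not alphabetically

print(at_least_medium)
print(largest)
print(sorted)
```

The same comparisons on an unordered factor return NA with a warning, since no ranking is defined for its levels.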
Factors in R are essential for handling categorical data. They provide a clear representation of categories and, where needed, their order, which matters when performing statistical analyses or modeling. Before applying techniques that treat categorical variables specially, such as many of the modeling functions in the stats package, make sure your categorical data is stored as factors.
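To illustrate how modeling functions use factors, the sketch below passes an unordered factor to model.matrix() from the stats package, which expands it into indicator columns. It assumes R's default contrast settings; the variable name size is hypothetical.

```r
# Hypothetical unordered factor; "small" is the first (reference) level
size <- factor(c("small", "medium", "large", "medium"),
               levels = c("small", "medium", "large"))

# With default treatment contrasts, each non-reference level gets a 0/1 column
design <- model.matrix(~ size)

print(colnames(design))
print(design)
```

The reference level has no column of its own; it is represented by the intercept, which is why the choice of first level affects how model coefficients are interpreted.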
Creating and manipulating factors in R:
    # Creating a factor
    gender <- factor(c("Male", "Female", "Male", "Female"))
    # Manipulating factors
    levels(gender)
R factor levels and labels:
    # Creating a factor with custom levels and labels
    education <- factor(c("High School", "College", "High School", "Graduate"),
                        levels = c("High School", "College", "Graduate"),
                        labels = c("HS", "Col", "Grad"))
Converting character vectors to factors in R:
    # Converting character vector to factor
    colors <- c("Red", "Green", "Blue", "Red", "Green")
    factor_colors <- factor(colors)
R factor vs. character data type:
    # Character vector
    char_vector <- c("A", "B", "C", "A", "B")
    # Factor
    factor_vector <- factor(char_vector)
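The practical difference between the two types shows up in storage and in what values they accept. A minimal sketch, with hypothetical variable names:

```r
# Hypothetical vectors for comparison
char_vec <- c("A", "B", "C", "A", "B")
fact_vec <- factor(char_vec)

storage_char <- typeof(char_vec)  # strings are stored directly
storage_fact <- typeof(fact_vec)  # factors are stored as integer codes

# A character vector accepts any new string
char_vec[1] <- "D"

# A factor only accepts existing levels; anything else becomes NA (with a warning)
suppressWarnings(fact_vec[1] <- "D")

print(storage_char)
print(storage_fact)
print(char_vec)
print(fact_vec)
```

To store a new category in a factor, add it to the levels first, for example with levels(fact_vec) <- c(levels(fact_vec), "D").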
Working with ordered factors in R:
    # Creating an ordered factor
    size <- factor(c("Small", "Medium", "Large"), ordered = TRUE,
                   levels = c("Small", "Medium", "Large"))
    # Comparing ordered factors
    size[1] > size[2]
R code for recoding and releveling factors:
    # Recoding factor levels (Female becomes F, Male becomes M)
    recoded_gender <- factor(gender, levels = c("Female", "Male"), labels = c("F", "M"))
    # Releveling factors: make "Male" the first (reference) level
    relevel_gender <- relevel(gender, ref = "Male")
Visualizing factors in R plots:
    # Bar plot of factor frequencies
    barplot(table(gender))
Handling missing values in factors in R:
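    # A factor with a missing value; NA is not a level by default
    missing_values <- factor(c("A", "B", NA, "A", "B"))
    # Remove the NA entries
    complete_values <- missing_values[!is.na(missing_values)]
    # droplevels() removes unused levels; it does not remove NA values
    complete_values <- droplevels(complete_values)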
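When missing values should be counted rather than dropped, base R offers two options: table() with useNA, and addNA(), which turns NA into a level of its own. A short sketch with a hypothetical variable name:

```r
# Hypothetical factor containing one missing value
answers <- factor(c("A", "B", NA, "A", "B"))

n_missing  <- sum(is.na(answers))             # number of missing entries
counts     <- table(answers, useNA = "ifany") # frequency table including NA
answers_na <- addNA(answers)                  # NA becomes an explicit level

print(n_missing)
print(counts)
print(levels(answers_na))
```

Making NA an explicit level can be useful when missingness itself should appear as a category in summaries or plots.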