Factors in R

Factors are R's data structure for categorical variables. A factor stores each value as an integer code, together with the set of distinct categories (levels) that those codes refer to. Factors can be ordered or unordered, and they play a central role in statistical modeling in R, since many modeling functions expect categorical data to be stored as factors.

In this tutorial, we'll cover:

  1. Creating Factors
  2. Accessing Levels
  3. Modifying Levels
  4. Converting Between Factors and Other Data Types
  5. Ordered Factors

1. Creating Factors

To create a factor in R, use the factor() function:

data <- c("apple", "banana", "apple", "cherry", "banana")
fruit_factor <- factor(data)
print(fruit_factor)

This will print:

[1] apple  banana apple  cherry banana
Levels: apple banana cherry
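To make the storage model concrete, the following is a minimal sketch that looks inside a factor. The variable names (fruit, codes, lv) are chosen here for illustration and mirror the fruit data used above.

```r
# Illustrative data, mirroring the fruit vector used above
fruit <- factor(c("apple", "banana", "apple", "cherry", "banana"))

codes <- as.integer(fruit)   # one integer code per element
lv <- levels(fruit)          # distinct categories, sorted by default

print(codes)       # 1 2 1 3 2
print(lv[codes])   # indexing the levels by the codes rebuilds the original strings
```

Each element is stored as a position in the levels vector, which is why converting a factor to a number later returns these codes rather than the labels.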

2. Accessing Levels

To view the levels of a factor:

levels(fruit_factor)
# [1] "apple"  "banana" "cherry"
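Two related base R helpers are often useful alongside levels(): nlevels() counts the levels and table() counts the observations per level. A short sketch, using an illustrative copy of the fruit data:

```r
# Illustrative data, mirroring the fruit vector used above
fruit <- factor(c("apple", "banana", "apple", "cherry", "banana"))

n <- nlevels(fruit)      # number of distinct levels: 3
counts <- table(fruit)   # observations per level: apple 2, banana 2, cherry 1

print(n)
print(counts)
```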

3. Modifying Levels

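Sometimes you may want to rename the levels of a factor. Assigning to levels() replaces the level names position by position, so every element that had the first level gets the first new name, and so on. (To change the order of the levels without renaming them, rebuild the factor with factor() and an explicit levels argument instead.)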

levels(fruit_factor) <- c("a_fruit", "b_fruit", "c_fruit")
print(fruit_factor)
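Renaming and reordering are easy to confuse, so here is a small sketch of the difference. The ratings data is a hypothetical example introduced only for this illustration.

```r
# Hypothetical data; default levels are alphabetical: "high" "low" "medium"
ratings <- factor(c("low", "high", "medium", "low"))

# Reorder: pass the desired level order to factor(); the data stay the same
reordered <- factor(ratings, levels = c("low", "medium", "high"))

# Rename: assigning to levels() relabels by position, so the data change
renamed <- ratings
levels(renamed) <- c("low", "medium", "high")

print(as.character(reordered))   # "low" "high" "medium" "low"
print(as.character(renamed))     # "medium" "low" "high" "medium"
```

In the second case the old first level ("high") was relabeled "low", which silently changes what each observation means. Use factor(x, levels = ...) when only the order should change.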

You can also combine levels by giving several of them the same name:

data <- c("low", "medium", "low", "high", "medium")
factor_data <- factor(data, levels = c("low", "medium", "high"))
# Give levels 2 and 3 the same name; R merges levels with duplicated names
levels(factor_data)[2:3] <- "medium/high"
print(factor_data)
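An alternative way to merge levels is to assign a named list to levels(), where each name is a new level and each element lists the old levels it absorbs. The sketch below uses a hypothetical ratings vector with the same values as the example above.

```r
# Hypothetical data with the same values as the example above
ratings <- factor(c("low", "medium", "low", "high", "medium"),
                  levels = c("low", "medium", "high"))

# Each list name is a new level; its value names the old levels to merge
levels(ratings) <- list(low = "low", "medium/high" = c("medium", "high"))

print(levels(ratings))         # "low" "medium/high"
print(as.character(ratings))   # "low" "medium/high" "low" "medium/high" "medium/high"
```

The list form states the mapping explicitly, so it does not depend on the positions of the existing levels.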

4. Converting Between Factors and Other Data Types

To convert a factor back to a character vector:

char_data <- as.character(fruit_factor)
print(char_data)

And to get the underlying integer codes as a numeric vector:

num_data <- as.numeric(fruit_factor)
print(num_data)

Note: The result contains the integer codes of the levels, not the original values. The first level is represented as 1, the second as 2, and so on.
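This matters most when the labels themselves look like numbers. The following sketch, using a hypothetical years factor, shows the codes you get from as.numeric() and two common ways to recover the numeric labels instead.

```r
# Hypothetical factor whose labels are numeric-looking strings
years <- factor(c("2019", "2021", "2019", "2020"))

codes <- as.numeric(years)                    # level codes: 1 3 1 2
values <- as.numeric(as.character(years))     # labels: 2019 2021 2019 2020
values_alt <- as.numeric(levels(years))[years] # same result, converts each level once

print(codes)
print(values)
```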

5. Ordered Factors

Factors can be ordered, which means that the levels have an inherent order:

sizes <- c("medium", "small", "large", "medium")
size_factor <- factor(sizes, ordered = TRUE, levels = c("small", "medium", "large"))
print(size_factor)

When printed, an ordered factor shows the ordering with < between the levels:

[1] medium small  large  medium
Levels: small < medium < large
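Once a factor is ordered, comparison operators and functions such as max() and sort() follow the level order rather than alphabetical order. A brief sketch, repeating the sizes data from above so that it runs on its own:

```r
sizes <- c("medium", "small", "large", "medium")
size_factor <- factor(sizes, ordered = TRUE, levels = c("small", "medium", "large"))

at_least_medium <- size_factor >= "medium"   # TRUE FALSE TRUE TRUE
largest <- max(size_factor)                  # large
sorted <- sort(size_factor)                  # small medium medium large

print(at_least_medium)
print(largest)
print(sorted)
```

The same comparisons on an unordered factor produce NA with a warning, since its levels have no defined order.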

Conclusion

Factors are R's tool for handling categorical data. They record the set of categories and, optionally, their order, which matters when performing statistical analyses or modeling. When preparing data, check that categorical variables are stored as factors for techniques that rely on them, such as many of the modeling functions in the stats package.

  1. Creating and manipulating factors in R:

    # Creating a factor
    gender <- factor(c("Male", "Female", "Male", "Female"))
    
    # Inspecting the levels (sorted alphabetically by default)
    levels(gender)  # "Female" "Male"
    
  2. R factor levels and labels:

    # Creating a factor with custom levels and labels
    education <- factor(c("High School", "College", "High School", "Graduate"),
                       levels = c("High School", "College", "Graduate"),
                       labels = c("HS", "Col", "Grad"))
    
  3. Converting character vectors to factors in R:

    # Converting character vector to factor
    colors <- c("Red", "Green", "Blue", "Red", "Green")
    factor_colors <- factor(colors)
    
  4. R factor vs. character data type:

    # Character vector
    char_vector <- c("A", "B", "C", "A", "B")
    
    # Factor
    factor_vector <- factor(char_vector)
    
  5. Working with ordered factors in R:

    # Creating an ordered factor
    size <- factor(c("Small", "Medium", "Large"), ordered = TRUE, levels = c("Small", "Medium", "Large"))
    
    # Comparing ordered factors
    size[1] > size[2]  # FALSE, because "Small" ranks below "Medium"
    
  6. R code for recoding and releveling factors:

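    # Recoding factor levels (renaming them through labels)
    recoded_gender <- factor(gender, levels = c("Female", "Male"), labels = c("F", "M"))
    
    # Releveling: make "Male" the reference (first) level
    # ("Female" is already first by default, so ref = "Female" would change nothing)
    relevel_gender <- relevel(gender, ref = "Male")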
    
  7. Visualizing factors in R plots:

    # Bar plot of factor frequencies
    barplot(table(gender))
    
  8. Handling missing values in factors in R:

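    # NA is kept as a missing value; by default it is not a level
    missing_values <- factor(c("A", "B", NA, "A", "B"))
    
    # Remove the NA entries, then drop any levels that are no longer used
    # (droplevels() alone only removes unused levels, not NA values)
    missing_values <- droplevels(missing_values[!is.na(missing_values)])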
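
Because NA is not a level by default, missing entries are easy to overlook in summaries. The sketch below, using a hypothetical responses factor, shows how to count them and how addNA() turns NA into an explicit level when that is wanted.

```r
# Hypothetical data with one missing entry
responses <- factor(c("A", "B", NA, "A", "B"))

n_missing <- sum(is.na(responses))             # 1
lv_default <- levels(responses)                # "A" "B" (NA is not a level)
counts <- table(responses, useNA = "ifany")    # adds an NA column when NAs exist

with_na_level <- addNA(responses)              # NA becomes a third level
print(counts)
print(levels(with_na_level))                   # "A" "B" NA
```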