tidyr Package in R

The tidyr package is part of the tidyverse and is used to tidy datasets. It provides functions for reshaping data between wide and long layouts, splitting and combining columns, handling missing values and nesting data, all aimed at making data follow the tidy data principles:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

This tutorial will guide you through some of the main functions of tidyr.

1. Installation and Loading:

First, install and load the package. Loading tidyr also makes the pipe operator %>%, used in the examples below, available.

install.packages("tidyr")
library(tidyr)

2. pivot_longer():

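Converts data from a "wide" format (one column per measurement) to a "long" format (one row per measurement). The selected column names go into the names_to column and their values into the values_to column.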

data <- data.frame(
  name = c("John", "Jane"),
  test1 = c(85, 88),
  test2 = c(92, 90)
)

data_long <- data %>% pivot_longer(cols = c(test1, test2), names_to = "test", values_to = "score")
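pivot_longer() can also clean up the new name column while creating it. The sketch below reuses the data frame above and assumes you want the test number stored as an integer: names_prefix strips the "test" prefix from each column name, and names_transform converts what remains.

```r
library(tidyr)

data <- data.frame(
  name = c("John", "Jane"),
  test1 = c(85, 88),
  test2 = c(92, 90)
)

# Strip the "test" prefix from the column names and store the rest as integers
data_long <- pivot_longer(
  data,
  cols = c(test1, test2),
  names_to = "test",
  names_prefix = "test",
  names_transform = list(test = as.integer),
  values_to = "score"
)
data_long
```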

3. pivot_wider():

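Converts data from a "long" format to a "wide" format: each value of the names_from column becomes a new column, filled from the values_from column. Applied to data_long, it restores the original layout of data.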

data_wide <- data_long %>% pivot_wider(names_from = test, values_from = score)
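When a name/value combination is absent from the long data, pivot_wider() fills the gap with NA. The values_fill argument supplies a different default. In the hypothetical long_scores data below, Jane has no test2 row.

```r
library(tidyr)

# Hypothetical long data: Jane has no test2 row
long_scores <- data.frame(
  name = c("John", "John", "Jane"),
  test = c("test1", "test2", "test1"),
  score = c(85, 92, 88)
)

# Missing combinations get 0 instead of NA
wide_scores <- pivot_wider(long_scores, names_from = test,
                           values_from = score, values_fill = 0)
wide_scores
```

Whether 0 or NA is the right fill depends on whether a missing row means "zero" or "unknown" in your data.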

4. separate():

Splits one character column into multiple columns at a separator.

data <- data.frame(name_age = c("John_25", "Jane_30"))
data_sep <- data %>% separate(name_age, into = c("name", "age"), sep = "_")
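By default separate() returns character columns, so age above holds the string "25" rather than a number. Setting convert = TRUE runs type.convert() on the new columns. In tidyr 1.3.0 and later, separate() is superseded by separate_wider_delim() and related functions, but it remains available.

```r
library(tidyr)

data <- data.frame(name_age = c("John_25", "Jane_30"))

# convert = TRUE turns the "25" / "30" strings into integers
data_sep <- separate(data, name_age, into = c("name", "age"),
                     sep = "_", convert = TRUE)
str(data_sep)
```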

5. unite():

Combines multiple columns into a single column, joining their values with a separator. It reverses separate().

data_unite <- data_sep %>% unite("name_age", name, age, sep = "_")
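By default unite() pastes a missing value in as the literal text "NA". With na.rm = TRUE, missing values are dropped before the columns are joined. The name parts below are made-up data.

```r
library(tidyr)

people <- data.frame(
  first = c("John", "Jane"),
  middle = c("A", NA),
  last = c("Smith", "Doe")
)

# Without na.rm = TRUE, Jane's name would become "Jane NA Doe"
people_united <- unite(people, "full_name", first, middle, last,
                       sep = " ", na.rm = TRUE)
people_united
```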

6. drop_na():

Removes rows that contain missing values (NA) in the specified columns, or in any column if none are given.

data <- data.frame(name = c("John", "Jane", "Jim"), age = c(25, NA, 30))
data_clean <- data %>% drop_na(age)
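The choice of columns matters: with no columns named, drop_na() drops a row if any of its values is missing. The hypothetical data below adds a fourth row with a missing name to show the difference.

```r
library(tidyr)

data <- data.frame(
  name = c("John", "Jane", "Jim", NA),
  age = c(25, NA, 30, 40)
)

only_age <- drop_na(data, age)  # checks age only: drops Jane
any_col <- drop_na(data)        # checks every column: drops Jane and the unnamed row
```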

7. fill():

Replaces missing values in a column with the previous non-missing value (the default, .direction = "down") or the following one (.direction = "up").

data <- data.frame(group = c("A", "A", "B", "B"), value = c(1, NA, 3, NA))
data_fill <- data %>% fill(value)
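The call above ignores the group column, so a value can leak from one group into the next. Grouping with dplyr's group_by() first keeps the filling inside each group, and .direction = "downup" fills downwards and then upwards. The data below is made up to show the effect, and the sketch assumes dplyr is installed.

```r
library(tidyr)
library(dplyr)

data <- data.frame(group = c("A", "A", "B", "B"), value = c(1, NA, NA, 4))

# Ungrouped: group B's first row inherits group A's value
leaky <- fill(data, value)

# Grouped: each group is filled on its own, first down, then up
per_group <- data %>%
  group_by(group) %>%
  fill(value, .direction = "downup") %>%
  ungroup()
```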

8. nest() and unnest():

nest() collapses the selected columns into a list-column of data frames, leaving one row per combination of the remaining columns, while unnest() does the reverse.

data <- data.frame(group = c("A", "B"), val1 = 1:2, val2 = 3:4)
data_nested <- data %>% nest(data = c(val1, val2))

data_unnested <- data_nested %>% unnest(data)
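Nesting is useful because each group's rows can then be processed as a unit. In the sketch below (hypothetical data), nest() leaves one row per group, and sapply() computes a per-group total from the data list-column.

```r
library(tidyr)

df <- data.frame(group = c("A", "A", "B"), x = c(1, 2, 10))

# Nest x by group: one row per group, data is a list-column of data frames
nested <- nest(df, data = x)

# Work on each group's data frame separately
nested$total <- sapply(nested$data, function(d) sum(d$x))
nested
```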

9. complete():

Expands a dataset to include all combinations of the specified columns, adding rows with NA for combinations that were missing.

data <- data.frame(group = c("A", "B"), value = 1:2)
data_complete <- data %>% complete(group, value = 1:3)
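complete() fills the new rows with NA by default; its fill argument supplies a replacement value per column. This matters for counts, where a missing combination usually means zero. The sales data below is hypothetical.

```r
library(tidyr)

sales <- data.frame(
  store = c("A", "A", "B"),
  month = c(1, 2, 1),
  n = c(5, 3, 7)
)

# Add the missing store B / month 2 row with n = 0 instead of NA
sales_full <- complete(sales, store, month, fill = list(n = 0))
sales_full
```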

Conclusion:

The tidyr package offers a versatile set of functions that make data tidying simpler. Combined with other tidyverse packages such as dplyr, it becomes a powerful tool for data manipulation in R. Real-world data is often messy, and mastering tidyr can save a significant amount of time during data cleaning.

  1. Data Tidying with tidyr in R:

    • tidyr is a package that helps tidy messy data by reshaping and restructuring it for easier analysis.
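    # Example: Tidying data with tidyr
    # original_data is a placeholder data frame with an ID column;
    # gather() stacks every other column into Variable/Value pairs
    library(tidyr)
    tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)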
    
  2. R tidyr vs reshape2 Comparison:

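    • tidyr and reshape2 are both used for data reshaping, but tidyr provides a more consistent syntax. reshape2 is retired, and its melt and dcast correspond roughly to tidyr's pivot_longer and pivot_wider.
    # Example: the same wide-to-long reshape in tidyr and reshape2
    library(tidyr)
    library(reshape2)
    tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)  # tidyr
    melted_data <- melt(original_data, id.vars = "ID",
                        variable.name = "Variable", value.name = "Value")  # reshape2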
    
  3. Pivoting and Gathering Data with tidyr in R:

    • Use pivot_longer to gather columns into key-value pairs.
    # Example: Pivot longer with tidyr
    gathered_data <- pivot_longer(original_data, cols = starts_with("Var"), names_to = "Variable", values_to = "Value")
    
  4. Spreading Data with tidyr in R:

    • Use pivot_wider to spread key-value pairs back into columns.
    # Example: Pivot wider with tidyr
    spread_data <- pivot_wider(gathered_data, names_from = "Variable", values_from = "Value")
    
  5. Separating and Uniting Columns with tidyr:

    • separate and unite functions help split and combine columns, respectively.
    # Example: Separating and uniting columns
    separated_data <- separate(original_data, col = "Combined", into = c("Var1", "Var2"), sep = "_")
    united_data <- unite(separated_data, col = "Combined", Var1, Var2, sep = "_")
    
  6. Handling Missing Values with tidyr in R:

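    • drop_na removes rows with missing values, while fill replaces each missing value with the nearest non-missing value above it (or below it, with .direction = "up").
    # Example: Handling missing values
    cleaned_data <- drop_na(original_data)      # drops rows with an NA in any column
    filled_data <- fill(original_data, Value)   # fills NAs in Value downwards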
    
  7. Reshaping Data Frames with tidyr:

    • gather and spread are tidyr's original reshaping functions. They still work but are superseded: new code should use pivot_longer and pivot_wider instead.
    # Example: Reshaping data frames
    tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)
    wide_data <- spread(tidy_data, key = "Variable", value = "Value")
    
  8. Wide to Long Format Conversion in R tidyr:

    • Long format puts each measurement in its own row, which makes the data easier to group, summarise and plot.
    # Example: Wide to long format conversion
    long_data <- pivot_longer(wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
    
  9. Working with Nested Data Frames in tidyr:

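    • unnest expands a list-column of data frames, such as one created by nest, back into ordinary rows and columns.
    # Example: Working with nested data frames
    # nested_data is a placeholder with a list-column named data;
    # tidyr 1.0.0 and later expect the column(s) to unnest in cols
    unnested_data <- unnest(nested_data, cols = data)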
    
  10. Tidying Messy Data in R:

    • Tidy messy data using functions like separate, spread, and gather.
    # Example: Tidying messy data
    tidy_data <- separate(original_data, col = "Combined", into = c("Var1", "Var2"), sep = "_") %>%
                  gather(key = "Variable", value = "Value", -ID)
    
  11. R tidyr and dplyr Integration:

    • tidyr and dplyr work well together: tidyr reshapes the data, dplyr filters, transforms and summarises it, and both chain into a single %>% pipeline.
    # Example: tidyr and dplyr integration
    library(dplyr)
    tidy_data <- original_data %>%
                 gather(key = "Variable", value = "Value", -ID) %>%
                 filter(Value > 0) %>%
                 spread(key = "Variable", value = "Value")
    
  12. Pivot Longer and Pivot Wider in tidyr:

    • pivot_longer and pivot_wider are essential for converting data between long and wide formats.
    # Example: Pivot longer and pivot wider
    long_data <- pivot_longer(original_data, cols = -ID, names_to = "Variable", values_to = "Value")
    wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")
    
  13. Tidying Factors and Character Columns in R:

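    • mutate and recode come from dplyr rather than tidyr, but they are often used alongside it to tidy factor and character columns.
    # Example: Tidying factors and character columns
    library(dplyr)
    # Rename the labels "A" and "B" in the Category column
    tidy_factors <- original_data %>%
                    mutate(Category = recode(Category, "A" = "CategoryA", "B" = "CategoryB"))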
    
  14. R gather and spread Functions with tidyr:

    • gather collapses columns into key-value pairs and spread expands key-value pairs back into columns, so each reverses the other.
    # Example: gather and spread with tidyr
    gathered_data <- gather(original_data, key = "Variable", value = "Value", -ID)
    spread_data <- spread(gathered_data, key = "Variable", value = "Value")
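The snippets in this list use original_data as a placeholder. To try the same functions on real data, tidyr ships small example datasets of tuberculosis cases: table4a is wide (one column per year), table2 is long with cases and population stored in one count column, and table3 packs both numbers into one rate string. The sketch below tidies each one and assumes dplyr is installed for the final step.

```r
library(tidyr)
library(dplyr)

# Wide -> long: the year columns `1999` and `2000` become rows
cases_long <- table4a %>%
  pivot_longer(cols = c(`1999`, `2000`), names_to = "year", values_to = "cases")

# Long -> wide: the type column holds variable names (cases, population)
tb_wide <- table2 %>%
  pivot_wider(names_from = type, values_from = count)

# One column holding two values: split "745/19987071" into two numbers
tb_split <- table3 %>%
  separate(rate, into = c("cases", "population"), sep = "/", convert = TRUE)

# dplyr takes over once the data is tidy: cases per 10,000 people
tb_rates <- tb_split %>%
  mutate(rate_per_10k = cases / population * 10000)
tb_rates
```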