Data Reshaping in R

Data reshaping in R means changing the layout of a data set without changing its contents, most often converting between wide format (one column per measurement) and long format (one row per measurement), along with related structural manipulations. The tidyr package, part of the tidyverse collection, provides functions for these tasks. This tutorial walks through data reshaping techniques in R using tidyr.

1. Installing and Loading Required Packages:

install.packages("tidyverse")  # only needed the first time
library(tidyverse)             # loads tidyr, dplyr, tibble, and the pipe (%>%)

2. Creating a Sample Data Frame:

df <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test1 = c(90, 85, 88),
  Test2 = c(92, 87, 78)
)
print(df)

3. Converting Data from Wide to Long Format:

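This is often called "melting" or "gathering" data. The gather() function shown here still works, but it has been superseded by pivot_longer(), covered in section 8.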

df_long <- df %>%
  gather(key = "Test", value = "Score", -Subject)

print(df_long)

4. Converting Data from Long to Wide Format:

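This is often called "casting" or "spreading" data. The spread() function shown here still works, but it has been superseded by pivot_wider(), covered in section 8.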

df_wide <- df_long %>%
  spread(key = "Test", value = "Score")

print(df_wide)
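
As a quick sanity check, a wide-to-long-to-wide round trip should recover the original values. The following is a minimal sketch on the same sample data (rebuilt here so the block runs on its own); the helper names `a`, `b`, and `roundtrip_ok` are illustrative only.

```r
library(tidyr)

df <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test1 = c(90, 85, 88),
  Test2 = c(92, 87, 78)
)

df_long <- df %>% gather(key = "Test", value = "Score", -Subject)
df_wide <- df_long %>% spread(key = "Test", value = "Score")

# Row order may differ after the round trip, so sort both by Subject first
a <- df[order(df$Subject), ]
b <- df_wide[order(df_wide$Subject), ]

# Compare column by column
roundtrip_ok <- identical(a$Subject, b$Subject) &&
  isTRUE(all.equal(a$Test1, b$Test1)) &&
  isTRUE(all.equal(a$Test2, b$Test2))
print(roundtrip_ok)
```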

5. Separating a Column into Multiple Columns:

Sometimes a single column holds two values joined by a delimiter, such as a test name and a score, and you want to split them into separate columns.

df_sep <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test_Score = c("Test1_90", "Test1_85", "Test1_88")
)

df_separated <- df_sep %>%
  separate(col = Test_Score, into = c("Test", "Score"), sep = "_")

print(df_separated)
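
Note that separate() returns the new columns as character vectors by default, so Score above is text rather than a number. A minimal sketch of one way to get a numeric column, using the convert argument of separate() on the same sample data (the names `df_separated_num` and `mean_score` are illustrative):

```r
library(tidyr)

df_sep <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test_Score = c("Test1_90", "Test1_85", "Test1_88")
)

# convert = TRUE asks separate() to guess a suitable type for each new column
df_separated_num <- df_sep %>%
  separate(col = Test_Score, into = c("Test", "Score"), sep = "_", convert = TRUE)

# Score is now numeric, so arithmetic works directly
mean_score <- mean(df_separated_num$Score)
print(mean_score)
```

An alternative is to leave the column as character and convert it explicitly afterwards, for example with as.numeric(), which makes the type change more visible in the code.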

6. Combining Multiple Columns into One:

unite() is the inverse of separate(): it pastes several columns together into one.

df_unite <- df_separated %>%
  unite(col = "Test_Score", Test, Score, sep = "_")

print(df_unite)

7. Nesting and Unnesting:

a. Nesting:

You can create nested data frames where one column is a list of data frames.

df_nested <- df %>%
  group_by(Subject) %>%
  nest()

print(df_nested)

b. Unnesting:

unnest() reverses the nesting operation by expanding the list-column back into regular rows and columns.

df_unnested <- df_nested %>%
  unnest(cols = data)

print(df_unnested)
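
Nesting is most useful when each group's data frame is processed as a unit. The sketch below assumes a long-format version of the sample scores (rebuilt here so it runs on its own), nests by Subject, and computes a per-subject mean from each nested data frame. The names `scores`, `nested`, `flat`, and `mean_score` are illustrative choices, not part of tidyr.

```r
library(tidyr)
library(dplyr)

scores <- tibble(
  Subject = c("John", "John", "Jane", "Jane", "Doe", "Doe"),
  Test = c("Test1", "Test2", "Test1", "Test2", "Test1", "Test2"),
  Score = c(90, 92, 85, 87, 88, 78)
)

# One row per subject; `data` is a list-column of tibbles
nested <- scores %>%
  group_by(Subject) %>%
  nest() %>%
  ungroup()

# Apply a function to each nested tibble and store one number per subject
nested <- nested %>%
  mutate(mean_score = vapply(data, function(d) mean(d$Score), numeric(1)))

# Unnesting keeps the per-subject summary alongside the original rows
flat <- nested %>% unnest(cols = data)
print(flat)
```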

8. Pivot Longer and Pivot Wider:

Since version 1.0.0, tidyr provides pivot_longer() and pivot_wider(), which supersede gather() and spread() and make reshaping more intuitive and flexible. They are the recommended choice for new code.

a. Pivot Longer:

Equivalent to gather(), but with more flexibility.

df_longer <- df %>%
  pivot_longer(cols = starts_with("Test"), names_to = "Test", values_to = "Score")

print(df_longer)
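
As one example of that added flexibility, pivot_longer() can strip a common prefix from the column names and convert the remainder to a number in the same call. This is a minimal sketch on the same sample data, assuming tidyr 1.1.0 or later for the names_transform argument; the `TestNumber` column name is an illustrative choice.

```r
library(tidyr)

df <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test1 = c(90, 85, 88),
  Test2 = c(92, 87, 78)
)

df_numbered <- df %>%
  pivot_longer(
    cols = starts_with("Test"),
    names_to = "TestNumber",
    names_prefix = "Test",                            # drop the leading "Test"
    names_transform = list(TestNumber = as.integer),  # turn "1", "2" into integers
    values_to = "Score"
  )

print(df_numbered)
```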

b. Pivot Wider:

Equivalent to spread(), but with more features.

df_wider <- df_longer %>%
  pivot_wider(names_from = "Test", values_from = "Score")

print(df_wider)
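
One such feature is values_fill, which supplies a value for combinations that are missing from the long data. The sketch below assumes a hypothetical long table in which Doe has no Test2 score. The fill value of 0 is purely illustrative, and for real scores NA is often the more honest choice.

```r
library(tidyr)

df_incomplete <- tibble(
  Subject = c("John", "John", "Jane", "Jane", "Doe"),
  Test = c("Test1", "Test2", "Test1", "Test2", "Test1"),
  Score = c(90, 92, 85, 87, 88)
)

# Without values_fill, the missing Subject/Test combination becomes NA
wide_na <- df_incomplete %>%
  pivot_wider(names_from = Test, values_from = Score)

# With values_fill, the missing cell gets an explicit placeholder instead
wide_filled <- df_incomplete %>%
  pivot_wider(names_from = Test, values_from = Score,
              values_fill = list(Score = 0))

print(wide_na)
print(wide_filled)
```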

Conclusion:

Data reshaping is a foundational skill for anyone working with data in R. The tidyr functions covered here (pivot_longer(), pivot_wider(), separate(), unite(), nest(), and unnest()) handle most reshaping tasks, while gather() and spread() remain available for older code. With these techniques you can put your data into the layout that a given analysis or plot expects.

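The short snippets below summarize common reshaping tasks. In them, original_wide_data is a placeholder for a wide data frame with an ID column, and original_wide_ts_data is a placeholder for a wide time-series data frame with a Time column.

  1. Wide to long format in R: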

    • Convert data from wide to long format using functions like gather or pivot_longer.
    # Using tidyr gather function for wide to long
    library(tidyr)
    long_data <- gather(original_wide_data, key = "Variable", value = "Value", -ID)
    
  2. Melting and casting data in R:

    • Melting converts wide data to long format, while casting transforms it back to wide. The reshape2 package is retired, so prefer tidyr for new code.
    # Using reshape2 for melting and casting
    library(reshape2)
    melted_data <- melt(original_wide_data, id.vars = "ID")
    casted_data <- dcast(melted_data, ID ~ variable, value.var = "value")
    
  3. Reshape data frame in R:

    • The older reshape package, the predecessor of reshape2, offers melt() and cast(). Note that cast() takes the value column through its value argument, not value.var.
    # Using the reshape package for reshaping
    library(reshape)
    reshaped_data <- melt(original_wide_data, id.vars = "ID")
    casted_data <- cast(reshaped_data, ID ~ variable, value = "value")
    
  4. Tidyr package for data reshaping in R:

    • The tidyr package provides functions like gather, spread, pivot_longer, and pivot_wider.
    # Using tidyr for data reshaping
    library(tidyr)
    long_data <- pivot_longer(original_wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
    wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")
    
  5. R gather and spread functions:

    • gather and spread functions in tidyr are useful for reshaping data.
    # Using tidyr gather and spread functions
    library(tidyr)
    long_data <- gather(original_wide_data, key = "Variable", value = "Value", -ID)
    wide_data <- spread(long_data, key = "Variable", value = "Value")
    
  6. Reshaping time-series data in R:

    • Time-series data may need reshaping for analysis or visualization purposes.
    # Reshaping time-series data
    library(tidyr)
    long_ts_data <- pivot_longer(original_wide_ts_data, cols = -Time, names_to = "Variable", values_to = "Value")
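
For a concrete picture, the sketch below builds a small hypothetical wide time series (one column per series, here `temp` and `humidity`, with invented values) and reshapes it so that each row is one time point of one series, the layout that plotting tools such as ggplot2 typically expect.

```r
library(tidyr)

# Hypothetical data: three daily readings of two series
original_wide_ts_data <- tibble(
  Time = as.Date("2024-01-01") + 0:2,
  temp = c(3.1, 4.0, 2.5),
  humidity = c(80, 75, 82)
)

# Every column except Time becomes a (Variable, Value) pair
long_ts_data <- pivot_longer(
  original_wide_ts_data,
  cols = -Time,
  names_to = "Variable",
  values_to = "Value"
)

print(long_ts_data)
```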
    
  7. R pivot_longer and pivot_wider examples:

    • pivot_longer and pivot_wider functions in tidyr provide flexibility in reshaping.
    # Using tidyr pivot_longer and pivot_wider
    library(tidyr)
    long_data <- pivot_longer(original_wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
    wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")