Data reshaping in R involves transforming data from wide format to long format or vice versa, as well as other structural manipulations. The tidyverse collection, specifically tidyr, provides functions for such tasks. This tutorial will walk you through data reshaping techniques in R using tidyr.
install.packages("tidyverse")
library(tidyverse)

df <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test1 = c(90, 85, 88),
  Test2 = c(92, 87, 78)
)
print(df)
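Converting data from wide to long format is called "melting" or "gathering" data. Here every column except Subject is stacked into a pair of key and value columns.
df_long <- df %>%
  gather(key = "Test", value = "Score", -Subject)
print(df_long)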
Converting data from long back to wide format is often called "casting" or "spreading" data.
df_wide <- df_long %>%
  spread(key = "Test", value = "Score")
print(df_wide)
Suppose a single column holds combined values that you want to split into separate columns. The separate function does this by splitting on a delimiter.
df_sep <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test_Score = c("Test1_90", "Test1_85", "Test1_88")
)
df_separated <- df_sep %>%
  separate(col = Test_Score, into = c("Test", "Score"), sep = "_")
print(df_separated)
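One detail worth noting: the separate call above leaves Score as a character column, because the pieces come from splitting a string. A minimal sketch of one way to get a numeric column directly, using the convert argument of separate (the name df_typed is only illustrative):

```r
library(tidyr)
library(tibble)

# Same sample data as in the separate example
df_sep <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test_Score = c("Test1_90", "Test1_85", "Test1_88")
)

# convert = TRUE runs type conversion on the new columns,
# so "90" becomes a number while "Test1" stays character
df_typed <- separate(
  df_sep,
  col = Test_Score,
  into = c("Test", "Score"),
  sep = "_",
  convert = TRUE
)

str(df_typed)
```

Without convert = TRUE, a call such as mean(df_separated$Score) would return NA with a warning, since the scores are still text.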
The unite function is the opposite of separate: it pastes several columns together into one.
df_unite <- df_separated %>%
  unite(col = "Test_Score", Test, Score, sep = "_")
print(df_unite)
a. Nesting:
You can create nested data frames where one column is a list of data frames.
df_nested <- df %>%
  group_by(Subject) %>%
  nest()
print(df_nested)
b. Unnesting:
This can reverse the nesting operation.
df_unnested <- df_nested %>%
  unnest(cols = data)
print(df_unnested)
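To illustrate why a list column of data frames can be useful, here is a small sketch that nests long-format scores per subject and then computes one summary value per nested data frame. The data set scores_long and the column mean_score are hypothetical names chosen for this example, and it uses the nest(data = ...) form rather than group_by() followed by nest().

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical long-format scores, two tests per subject
scores_long <- tibble(
  Subject = rep(c("John", "Jane", "Doe"), each = 2),
  Test = rep(c("Test1", "Test2"), times = 3),
  Score = c(90, 92, 85, 87, 88, 78)
)

# Pack the Test and Score columns into one data frame per subject,
# then apply a function to each nested data frame
scores_nested <- scores_long %>%
  nest(data = c(Test, Score)) %>%
  mutate(mean_score = map_dbl(data, ~ mean(.x$Score)))

print(scores_nested)
```

Each row of scores_nested holds one subject, its own small data frame, and the mean computed from it.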
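tidyr 1.0.0 introduced two functions that make the process of reshaping data more intuitive and flexible: pivot_longer() and pivot_wider(). They supersede gather() and spread(), which remain available but are no longer under active development.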
a. Pivot Longer:
Equivalent to gather(), but with more flexibility.
df_longer <- df %>%
  pivot_longer(cols = starts_with("Test"), names_to = "Test", values_to = "Score")
print(df_longer)
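As one example of the extra flexibility, pivot_longer() can clean up the column names while it reshapes. The sketch below strips the "Test" prefix and converts what is left to an integer. It assumes tidyr 1.1.0 or later for names_transform, and df_numbered is an illustrative name.

```r
library(tidyr)
library(tibble)

# Same sample data as earlier in the tutorial
df <- tibble(
  Subject = c("John", "Jane", "Doe"),
  Test1 = c(90, 85, 88),
  Test2 = c(92, 87, 78)
)

df_numbered <- pivot_longer(
  df,
  cols = starts_with("Test"),
  names_to = "Test",
  names_prefix = "Test",                      # drop the leading "Test" from each name
  names_transform = list(Test = as.integer),  # turn "1" and "2" into integers
  values_to = "Score"
)

print(df_numbered)
```

With gather(), the same result would need a separate step to rewrite the key column after reshaping.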
b. Pivot Wider:
Equivalent to spread(), but with more features.
df_wider <- df_longer %>%
  pivot_wider(names_from = "Test", values_from = "Score")
print(df_wider)
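One of those extra features is control over missing combinations. The following sketch uses a hypothetical long data set in which one subject has no Test2 score, and compares the default result with one that uses values_fill. Filling with 0 is only for illustration; whether a missing score should become 0 or stay NA depends on the analysis.

```r
library(tidyr)
library(tibble)

# Hypothetical long data: Doe has no Test2 row
scores_long <- tibble(
  Subject = c("John", "John", "Jane", "Jane", "Doe"),
  Test = c("Test1", "Test2", "Test1", "Test2", "Test1"),
  Score = c(90, 92, 85, 87, 88)
)

# Default: the missing Subject/Test combination becomes NA
wide_na <- pivot_wider(scores_long, names_from = Test, values_from = Score)

# values_fill supplies a replacement for missing combinations
wide_filled <- pivot_wider(
  scores_long,
  names_from = Test,
  values_from = Score,
  values_fill = list(Score = 0)
)

print(wide_na)
print(wide_filled)
```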
Data reshaping is a foundational skill for anyone working with data in R. With functions from tidyr and the broader tidyverse, this task becomes intuitive and efficient. By mastering these techniques, you can prepare your data for various analytical procedures, making your analyses clearer and more insightful.
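Wide to long format in R:
Use gather or pivot_longer to convert wide data to long format.
# Using tidyr gather function for wide to long
# (original_wide_data stands for any wide data frame with an ID column)
library(tidyr)
long_data <- gather(original_wide_data, key = "Variable", value = "Value", -ID)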
Melting and casting data in R:
# Using reshape2 for melting and casting
library(reshape2)
melted_data <- melt(original_wide_data, id.vars = "ID")
casted_data <- dcast(melted_data, ID ~ variable, value.var = "value")
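The snippet above relies on a placeholder, original_wide_data. As a concrete sketch, the same two calls can be run on a small made-up data frame (the columns Height and Weight are invented for this example) to see what melt() produces and to confirm that dcast() restores the wide layout. It assumes the reshape2 package is installed.

```r
library(reshape2)

# Hypothetical wide data: one row per ID, one column per measurement
wide <- data.frame(
  ID = 1:3,
  Height = c(170, 165, 180),
  Weight = c(65, 58, 82)
)

# melt() stacks the measurement columns into `variable` and `value`
melted <- melt(wide, id.vars = "ID")
print(melted)

# dcast() spreads `variable` back out into one column per measurement
casted <- dcast(melted, ID ~ variable, value.var = "value")
print(casted)
```

The names variable and value are the defaults that melt() assigns, which is why the dcast() formula and value.var refer to them.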
Reshape data frame in R:
Use the older reshape package for reshaping data frames.
# Using reshape package for reshaping
library(reshape)
reshaped_data <- melt(original_wide_data, id.vars = "ID")
# reshape::cast() names this argument `value`; `value.var` is the reshape2::dcast() spelling
casted_data <- cast(reshaped_data, ID ~ variable, value = "value")
Tidyr package for data reshaping in R:
The tidyr package provides functions like gather, spread, pivot_longer, and pivot_wider.
# Using tidyr for data reshaping
library(tidyr)
long_data <- pivot_longer(original_wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")
R gather and spread functions:
The gather and spread functions in tidyr are useful for reshaping data.
# Using tidyr gather and spread functions
library(tidyr)
long_data <- gather(original_wide_data, key = "Variable", value = "Value", -ID)
wide_data <- spread(long_data, key = "Variable", value = "Value")
Reshaping time-series data in R:
# Reshaping time-series data
library(tidyr)
long_ts_data <- pivot_longer(original_wide_ts_data, cols = -Time, names_to = "Variable", values_to = "Value")
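To make the time-series case concrete, here is a minimal sketch with an invented wide table holding one Time column and two series. The names wide_ts, SeriesA, and SeriesB are hypothetical stand-ins for original_wide_ts_data and its columns.

```r
library(tidyr)

# Hypothetical wide time series: one row per date, one column per series
wide_ts <- data.frame(
  Time = as.Date("2024-01-01") + 0:2,
  SeriesA = c(1.0, 1.5, 2.0),
  SeriesB = c(10, 9, 8)
)

# Keep Time as the identifier and stack every other column
long_ts <- pivot_longer(
  wide_ts,
  cols = -Time,
  names_to = "Variable",
  values_to = "Value"
)

print(long_ts)
```

The long layout has one row per date and series, which is the shape that grouped summaries and most plotting functions expect.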
R pivot_longer and pivot_wider examples:
The pivot_longer and pivot_wider functions in tidyr provide flexibility in reshaping.
# Using tidyr pivot_longer and pivot_wider
library(tidyr)
long_data <- pivot_longer(original_wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")