The tidyr package is part of the tidyverse in R and is used for tidying datasets. It provides a suite of functions for reshaping data, changing its layout, and making it follow the tidy data principles: each variable is a column, each observation is a row, and each value is a cell.
This tutorial walks through the main functions of tidyr.
First, install and load the package:
install.packages("tidyr")
library(tidyr)
pivot_longer(): Converts data from a "wide" format to a "long" format.
data <- data.frame(
  name = c("John", "Jane"),
  test1 = c(85, 88),
  test2 = c(92, 90)
)
data_long <- data %>%
  pivot_longer(cols = c(test1, test2), names_to = "test", values_to = "score")
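When the wide columns share a common prefix, the prefix can be stripped while pivoting. The following is a minimal sketch, assuming a hypothetical `scores` data frame that mirrors the example above; `starts_with()` selects the columns and `names_prefix` removes the leading "test" from the new names.

```r
library(tidyr)

# Hypothetical data for illustration only
scores <- data.frame(
  name  = c("John", "Jane"),
  test1 = c(85, 88),
  test2 = c(92, 90)
)

# Select every column whose name starts with "test" and drop that
# prefix, so the "test" column holds "1" and "2" (as character)
scores_long <- pivot_longer(
  scores,
  cols = starts_with("test"),
  names_to = "test",
  names_prefix = "test",
  values_to = "score"
)

print(scores_long)
```

Each input row contributes one output row per selected column, so two people with two tests give four rows.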
pivot_wider(): Converts data from a "long" format to a "wide" format.
data_wide <- data_long %>%
  pivot_wider(names_from = test, values_from = score)
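Long data does not always contain every combination, and widening it then leaves gaps. The sketch below uses a hypothetical `long` data frame in which Jane has no test2 score, and shows how the `values_fill` argument of `pivot_wider()` can supply a default instead of NA. Whether 0 is a sensible default depends on the data; it is used here only for illustration.

```r
library(tidyr)

# Hypothetical long data: Jane is missing a test2 score
long <- data.frame(
  name  = c("John", "John", "Jane"),
  test  = c("test1", "test2", "test1"),
  score = c(85, 92, 88)
)

# Default behaviour: the missing combination becomes NA
wide <- pivot_wider(long, names_from = test, values_from = score)

# values_fill replaces those implicit gaps with a chosen value
wide_filled <- pivot_wider(
  long,
  names_from = test,
  values_from = score,
  values_fill = 0
)

print(wide)
print(wide_filled)
```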
separate(): Separates one column into multiple columns.
data <- data.frame(name_age = c("John_25", "Jane_30"))
data_sep <- data %>%
  separate(name_age, into = c("name", "age"), sep = "_")
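By default, the pieces produced by `separate()` are character columns, so an age such as "25" is still text. A short sketch, assuming a hypothetical `people` data frame like the one above, shows the `convert = TRUE` argument, which asks tidyr to run type conversion on the new columns.

```r
library(tidyr)

# Hypothetical data for illustration only
people <- data.frame(name_age = c("John_25", "Jane_30"))

# convert = TRUE turns "25" and "30" into integers;
# name stays character because it cannot be parsed as a number
people_sep <- separate(
  people,
  name_age,
  into = c("name", "age"),
  sep = "_",
  convert = TRUE
)

str(people_sep)
```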
unite(): Combines multiple columns into a single column.
data_unite <- data_sep %>%
  unite("name_age", name, age, sep = "_")
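`unite()` drops the input columns by default. If the originals are still needed, the `remove = FALSE` argument keeps them alongside the combined column. The sketch below assumes a hypothetical `people` data frame with `name` and `age` columns.

```r
library(tidyr)

# Hypothetical data for illustration only
people <- data.frame(
  name = c("John", "Jane"),
  age  = c(25, 30)
)

# remove = FALSE keeps name and age next to the new name_age column
people_united <- unite(people, "name_age", name, age, sep = "_", remove = FALSE)

print(people_united)
```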
drop_na(): Removes rows with missing values in the specified columns, or in any column when none are specified.
data <- data.frame(name = c("John", "Jane", "Jim"), age = c(25, NA, 30))
data_clean <- data %>%
  drop_na(age)
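The difference between naming columns and naming none is easiest to see side by side. This is a minimal sketch with a hypothetical `people` data frame that has a missing value in each column.

```r
library(tidyr)

# Hypothetical data: one NA in name, one NA in age
people <- data.frame(
  name = c("John", NA, "Jim"),
  age  = c(25, 30, NA)
)

# No columns given: a row is dropped if ANY column is NA
complete_rows <- drop_na(people)

# Only age is checked, so the row with a missing name is kept
known_age <- drop_na(people, age)

print(complete_rows)
print(known_age)
```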
fill(): Fills missing values in a column with the previous non-missing value (the default) or the following one (.direction = "up").
data <- data.frame(group = c("A", "A", "B", "B"), value = c(1, NA, 3, NA))
data_fill <- data %>%
  fill(value)
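On an ungrouped data frame, `fill()` carries values across group boundaries, which is often not what is wanted. The sketch below assumes dplyr is also installed and uses a hypothetical `readings` data frame to compare an ungrouped fill with fills done inside `group_by()`.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: group B starts with a missing value
readings <- data.frame(
  group = c("A", "A", "B", "B"),
  value = c(1, NA, NA, 4)
)

# Ungrouped: the value 1 from group A leaks into group B
filled_all <- fill(readings, value)

# Grouped: filling stops at group boundaries, so B's first value stays NA
filled_by_group <- readings %>%
  group_by(group) %>%
  fill(value) %>%
  ungroup()

# "downup" fills down first, then up, within each group
filled_both_ways <- readings %>%
  group_by(group) %>%
  fill(value, .direction = "downup") %>%
  ungroup()

print(filled_all)
print(filled_by_group)
print(filled_both_ways)
```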
nest() and unnest(): nest() collapses selected columns into a nested data frame, while unnest() does the reverse.
data <- data.frame(group = c("A", "B"), val1 = 1:2, val2 = 3:4)
data_nested <- data %>%
  nest(data = c(val1, val2))
data_unnested <- data_nested %>%
  unnest(data)
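Nesting is more visible when a group has several rows. The following sketch, with a hypothetical `measurements` data frame, shows that the nested result has one row per group and that the `data` column is a list holding one small data frame per group.

```r
library(tidyr)

# Hypothetical data: group A has two rows, group B has one
measurements <- data.frame(
  group = c("A", "A", "B"),
  val1  = 1:3,
  val2  = 4:6
)

# One row per group; val1 and val2 move into the list-column "data"
nested <- nest(measurements, data = c(val1, val2))

# Number of rows stored for each group
rows_per_group <- vapply(nested$data, nrow, integer(1))

# unnest() expands the list-column back into regular rows
round_trip <- unnest(nested, cols = data)

print(nested)
print(rows_per_group)
print(round_trip)
```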
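complete(): Expands a dataset to include all combinations of the specified columns, filling the remaining columns with NA for combinations that were not in the data.
data <- data.frame(group = c("A", "B"), value = 1:2)
# value = 1:3 supplies the full set of values to cross with group
data_complete <- data %>%
  complete(group, value = 1:3)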
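The effect of `complete()` is clearer when the data has a column that is not part of the combination. This sketch assumes a hypothetical `sales` data frame in which group B has no 2021 row, and also shows the `fill` argument for replacing the resulting NA with a chosen default.

```r
library(tidyr)

# Hypothetical data: there is no row for group B in 2021
sales <- data.frame(
  group = c("A", "A", "B"),
  year  = c(2020, 2021, 2020),
  value = c(10, 12, 7)
)

# Adds the missing B/2021 row; its value is NA
sales_complete <- complete(sales, group, year)

# fill takes a named list of replacement values per column
sales_zero <- complete(sales, group, year, fill = list(value = 0))

print(sales_complete)
print(sales_zero)
```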
The tidyr package offers a versatile suite of functions that make data tidying simpler. Combined with other tidyverse packages such as dplyr, it becomes a powerful tool for data manipulation in R. Real-world data is often messy, and mastering tidyr can save a significant amount of time during data cleaning.
Data Tidying with tidyr in R:
tidyr is a package that helps tidy messy data by reshaping and restructuring it for easier analysis.
# Example: Tidying data with tidyr
# original_data is assumed to be a data frame with an ID column
library(tidyr)
tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)
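R tidyr vs reshape2 Comparison:
tidyr and reshape2 are both used for data reshaping. reshape2 is retired and receives only maintenance updates; tidyr is its recommended successor and provides a more consistent syntax.
# Example: tidyr vs reshape2
library(tidyr)
library(reshape2)
# tidyr: gather every column except ID into key-value pairs
tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)
# reshape2: the comparable reshape with melt()
melted_data <- melt(original_data, id.vars = "ID", variable.name = "Variable", value.name = "Value")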
Pivoting and Gathering Data with tidyr in R:
Use pivot_longer to gather columns into key-value pairs.
# Example: Pivot longer with tidyr
gathered_data <- pivot_longer(original_data, cols = starts_with("Var"), names_to = "Variable", values_to = "Value")
Spreading Data with tidyr in R:
Use pivot_wider to spread key-value pairs back into columns.
# Example: Pivot wider with tidyr
spread_data <- pivot_wider(gathered_data, names_from = "Variable", values_from = "Value")
Separating and Uniting Columns with tidyr:
The separate and unite functions split and combine columns, respectively.
# Example: Separating and uniting columns
separated_data <- separate(original_data, col = "Combined", into = c("Var1", "Var2"), sep = "_")
united_data <- unite(separated_data, col = "Combined", Var1, Var2, sep = "_")
Handling Missing Values with tidyr in R:
drop_na removes rows with missing values, while fill replaces them with the nearest non-missing value (the one above, by default).
# Example: Handling missing values
cleaned_data <- drop_na(original_data)
filled_data <- fill(original_data, Value)
Reshaping Data Frames with tidyr:
gather and spread are the original tidyr functions for reshaping data frames. They still work but are superseded by pivot_longer and pivot_wider, which are recommended for new code.
# Example: Reshaping data frames
tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)
wide_data <- spread(tidy_data, key = "Variable", value = "Value")
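To see how the older and newer functions relate, the sketch below runs `gather()` and `pivot_longer()` on the same input. The `original_data` defined here is hypothetical, since the snippets above do not define it. Both calls produce the same rows, only in a different order, so the results are sorted before comparing.

```r
library(tidyr)

# Hypothetical stand-in for original_data
original_data <- data.frame(
  ID   = 1:2,
  Var1 = c(10, 20),
  Var2 = c(30, 40)
)

# Older interface: output is ordered column by column
with_gather <- gather(original_data, key = "Variable", value = "Value", -ID)

# Newer interface: output is ordered row by row
with_pivot <- pivot_longer(original_data, cols = -ID,
                           names_to = "Variable", values_to = "Value")

# Sort both by ID and Variable so they can be compared directly
gather_sorted <- with_gather[order(with_gather$ID, with_gather$Variable), ]
pivot_sorted  <- with_pivot[order(with_pivot$ID, with_pivot$Variable), ]

print(gather_sorted)
print(pivot_sorted)
```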
Wide to Long Format Conversion in R tidyr:
# Example: Wide to long format conversion
long_data <- pivot_longer(wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
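Working with Nested Data Frames in tidyr:
Use unnest to expand a list-column of nested data frames back into regular rows and columns.
# Example: Working with nested data frames
# nested_data is assumed to have a list-column named data; current tidyr expects cols to be given
unnested_data <- unnest(nested_data, cols = c(data))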
Tidying Messy Data in R:
Functions such as separate, spread, and gather can be chained to tidy messy data.
# Example: Tidying messy data
tidy_data <- separate(original_data, col = "Combined", into = c("Var1", "Var2"), sep = "_") %>%
  gather(key = "Variable", value = "Value", -ID)
R tidyr and dplyr Integration:
tidyr and dplyr work well together for comprehensive data manipulation.
# Example: tidyr and dplyr integration
library(dplyr)
tidy_data <- original_data %>%
  gather(key = "Variable", value = "Value", -ID) %>%
  filter(Value > 0) %>%
  spread(key = "Variable", value = "Value")
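A common pattern is to reshape with tidyr and then summarise with dplyr. The following is a minimal sketch of that pattern, assuming a hypothetical `original_data` with an `ID` column and two measurement columns; it uses `pivot_longer()` rather than `gather()`, which is a substitution made here for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for original_data
original_data <- data.frame(
  ID   = 1:3,
  Var1 = c(5, -2, 7),
  Var2 = c(0, 4, 9)
)

summary_by_variable <- original_data %>%
  # tidyr: one row per ID/Variable pair
  pivot_longer(cols = -ID, names_to = "Variable", values_to = "Value") %>%
  # dplyr: keep positive values only
  filter(Value > 0) %>%
  # dplyr: mean and count per variable
  group_by(Variable) %>%
  summarise(mean_value = mean(Value), n = n())

print(summary_by_variable)
```

After filtering, Var1 keeps 5 and 7 and Var2 keeps 4 and 9, so the means are 6 and 6.5.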
Pivot Longer and Pivot Wider in tidyr:
pivot_longer and pivot_wider are essential for converting data between long and wide formats.
# Example: Pivot longer and pivot wider
long_data <- pivot_longer(original_data, cols = -ID, names_to = "Variable", values_to = "Value")
wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")
Tidying Factors and Character Columns in R:
Use mutate and recode, both from dplyr, to tidy factors and character columns.
# Example: Tidying factors and character columns
library(dplyr)
tidy_factors <- original_data %>%
  mutate(Category = recode(Category, "A" = "CategoryA", "B" = "CategoryB"))
R gather and spread Functions with tidyr:
gather collapses columns into key-value pairs, and spread reverses the operation. Both remain available for existing code, although pivot_longer and pivot_wider are preferred.
# Example: gather and spread with tidyr
gathered_data <- gather(original_data, key = "Variable", value = "Value", -ID)
spread_data <- spread(gathered_data, key = "Variable", value = "Value")