tidyr Package in R

The tidyr package is a part of the tidyverse in R and is used for tidying up datasets. It provides a suite of functions that help in reshaping data structures, changing the layout, and ensuring data follows the tidy data principles:

Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.

This tutorial will guide you through some of the main functions of tidyr.

1. Installation and Loading:

Install the package once, then load it in each R session:

install.packages("tidyr")
library(tidyr)

2. `pivot_longer()`:

Converts data from a "wide" format to a "long" format.

data <- data.frame(
  name = c("John", "Jane"),
  test1 = c(85, 88),
  test2 = c(92, 90)
)

data_long <- data %>% pivot_longer(cols = c(test1, test2), names_to = "test", values_to = "score")
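
As a variation on the example above, the following sketch selects the test columns by prefix and cleans up the resulting names in the same call. The `scores` data frame is illustrative, and `names_transform` assumes a reasonably recent tidyr (1.1.0 or later).

```r
library(tidyr)

# Same illustrative scores as above
scores <- data.frame(
  name = c("John", "Jane"),
  test1 = c(85, 88),
  test2 = c(92, 90)
)

# Select columns by prefix, strip the "test" prefix from the names,
# and store the remaining test number as an integer
scores_long <- pivot_longer(
  scores,
  cols = starts_with("test"),
  names_to = "test",
  names_prefix = "test",
  names_transform = list(test = as.integer),
  values_to = "score"
)

print(scores_long)
```

The result has one row per person and test, so two people with two tests each give four rows.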

3. `pivot_wider()`:

Converts data from a "long" format to a "wide" format.

data_wide <- data_long %>% pivot_wider(names_from = test, values_from = score)
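
When the long data lacks some name and test combinations, widening leaves gaps. The sketch below uses a hypothetical `scores_long` data frame in which Jane has no second test, and shows how `values_fill` could supply a replacement value instead of `NA`.

```r
library(tidyr)

# Hypothetical long data: Jane has no score for test2
scores_long <- data.frame(
  name = c("John", "John", "Jane"),
  test = c("test1", "test2", "test1"),
  score = c(85, 92, 88)
)

# Default behaviour: the missing combination becomes NA
wide_na <- pivot_wider(scores_long, names_from = test, values_from = score)

# values_fill supplies an explicit value for missing combinations
wide_zero <- pivot_wider(
  scores_long,
  names_from = test,
  values_from = score,
  values_fill = 0
)

print(wide_na)
print(wide_zero)
```

Whether 0 is a sensible fill value depends on the data; for test scores, keeping `NA` is often the more honest choice.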

4. `separate()`:

Separates one column into multiple columns.

data <- data.frame(name_age = c("John_25", "Jane_30"))
data_sep <- data %>% separate(name_age, into = c("name", "age"), sep = "_")
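
By default the new columns produced by separate() are character vectors, so `age` above is text. A minimal sketch, using an illustrative `people` data frame, of how `convert = TRUE` could be used to get a numeric age directly. Recent tidyr releases mark separate() as superseded in favor of separate_wider_delim(), though it continues to work.

```r
library(tidyr)

people <- data.frame(name_age = c("John_25", "Jane_30"))

# convert = TRUE runs type conversion on the new columns,
# so "25" and "30" become integers while the names stay character
people_sep <- separate(
  people,
  name_age,
  into = c("name", "age"),
  sep = "_",
  convert = TRUE
)

str(people_sep)
```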

5. `unite()`:

Combines multiple columns into a single column.

data_unite <- data_sep %>% unite("name_age", name, age, sep = "_")
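
Two arguments of unite() are worth knowing when the inputs contain missing values or are still needed afterwards. The sketch below uses an illustrative `people` data frame with one missing age.

```r
library(tidyr)

people <- data.frame(
  name = c("John", "Jane"),
  age = c(25, NA)
)

# Default: the input columns are dropped and NA is pasted in as the text "NA"
united_default <- unite(people, "name_age", name, age, sep = "_")

# remove = FALSE keeps the input columns; na.rm = TRUE skips missing values
united_clean <- unite(
  people, "name_age", name, age,
  sep = "_", remove = FALSE, na.rm = TRUE
)

print(united_default)
print(united_clean)
```

With the defaults, Jane's combined value is the string "Jane_NA"; with `na.rm = TRUE` it is just "Jane".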

6. `drop_na()`:

Removes rows with missing values in the specified columns. If no columns are given, every column is checked.

data <- data.frame(name = c("John", "Jane", "Jim"), age = c(25, NA, 30))
data_clean <- data %>% drop_na(age)
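
The difference between naming a column and naming none can be sketched with an illustrative `people` data frame that has a missing value in each column.

```r
library(tidyr)

people <- data.frame(
  name = c("John", "Jane", NA),
  age = c(25, NA, 30)
)

# Only rows with a missing age are dropped; the row with a missing name stays
only_age <- drop_na(people, age)

# With no columns named, any row containing an NA is dropped
all_cols <- drop_na(people)

print(only_age)
print(all_cols)
```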

7. `fill()`:

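Fills missing values in a column with the previous or following non-missing value. The default direction is downward (each NA takes the last value above it); the `.direction` argument changes this.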

data <- data.frame(group = c("A", "A", "B", "B"), value = c(1, NA, 3, NA))
data_fill <- data %>% fill(value)
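
fill() does not look at other columns, so a plain downward fill can carry a value from one group into the next. The sketch below assumes dplyr is installed and uses an illustrative `readings` data frame to show a grouped fill that stays within each group.

```r
library(tidyr)
library(dplyr)

readings <- data.frame(
  group = c("A", "A", "B", "B"),
  value = c(1, NA, NA, 3)
)

# Ungrouped: the first row of group B inherits the value from group A
filled_down <- fill(readings, value)

# Grouped: fill down, then up, without crossing group boundaries
filled_grouped <- readings %>%
  group_by(group) %>%
  fill(value, .direction = "downup") %>%
  ungroup()

print(filled_down)
print(filled_grouped)
```

Here the ungrouped fill gives 1, 1, 1, 3, while the grouped fill gives 1, 1, 3, 3.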

8. `nest()` and `unnest()`:

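nest() collapses the selected columns into a list-column of data frames, producing one row for each combination of the remaining columns, while unnest() expands such a list-column back into regular rows and columns.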

data <- data.frame(group = c("A", "B"), val1 = 1:2, val2 = 3:4)
data_nested <- data %>% nest(data = c(val1, val2))

data_unnested <- data_nested %>% unnest(data)
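
Nesting is easier to see when a group has more than one row. The following sketch uses an illustrative `measurements` data frame and checks how many rows end up inside each nested data frame before restoring the original shape.

```r
library(tidyr)

measurements <- data.frame(
  group = c("A", "A", "B"),
  val1 = 1:3,
  val2 = 4:6
)

# One row per group; each element of `data` is a small data frame
nested <- nest(measurements, data = c(val1, val2))

# Number of rows held inside each nested data frame
sizes <- vapply(nested$data, nrow, integer(1))

# unnest() restores the original three rows
restored <- unnest(nested, cols = data)

print(nested)
print(sizes)
print(restored)
```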

9. `complete()`:

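Expands a dataset to include every combination of the specified columns, adding rows for combinations that do not appear in the data. Supplying values for a column (as with value = 1:3 below) also includes levels that are absent from the data.

data <- data.frame(group = c("A", "B"), value = 1:2)
# value = 1:3 crosses each group with the values 1 to 3, including the unobserved 3.
# nesting() is the opposite helper: it keeps only combinations already present in the data.
data_complete <- data %>% complete(group, value = 1:3)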
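
The effect of complete() is clearer when the data has a measurement column that must be filled for the newly added rows. A minimal sketch with a hypothetical `sales` data frame, in which store B has no entry for month 2:

```r
library(tidyr)

sales <- data.frame(
  store = c("A", "A", "B"),
  month = c(1, 2, 1),
  units = c(10, 12, 7)
)

# Adds the missing store B / month 2 row, with units set to NA
completed <- complete(sales, store, month)

# fill supplies a value for the added rows instead of NA
completed_zero <- complete(sales, store, month, fill = list(units = 0))

print(completed)
print(completed_zero)
```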

Conclusion:

The tidyr package offers a versatile suite of functions that make data tidying simpler. Combined with other tidyverse packages such as dplyr, it becomes a powerful tool for data manipulation in R. Real-world data is often messy, and mastering tidyr can save a significant amount of time during data cleaning.

Data Tidying with tidyr in R:
- tidyr is a package that helps tidy messy data by reshaping and restructuring it for easier analysis.
```
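# Example: Tidying data with tidyr
# original_data is a placeholder for a wide data frame with an ID column.
# gather() still works but is superseded by pivot_longer().
library(tidyr)
tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)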
```

R tidyr vs reshape2 Comparison:

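tidyr and reshape2 both reshape data, but reshape2 is retired and its README recommends tidyr, which provides a more consistent syntax built around data frames.

# Example: tidyr vs reshape2
# original_data is a placeholder for a wide data frame with an ID column
library(tidyr)
library(reshape2)
tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)
# The reshape2 equivalent of the call above
melted_data <- melt(original_data, id.vars = "ID", variable.name = "Variable", value.name = "Value")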

Pivoting and Gathering Data with tidyr in R:

Use pivot_longer to gather columns into key-value pairs.

# Example: Pivot longer with tidyr
gathered_data <- pivot_longer(original_data, cols = starts_with("Var"), names_to = "Variable", values_to = "Value")

Spreading Data with tidyr in R:

Use pivot_wider to spread key-value pairs back into columns.

# Example: Pivot wider with tidyr
spread_data <- pivot_wider(gathered_data, names_from = "Variable", values_from = "Value")

Separating and Uniting Columns with tidyr:

The separate and unite functions split one column into several and combine several columns into one, respectively.

# Example: Separating and uniting columns
separated_data <- separate(original_data, col = "Combined", into = c("Var1", "Var2"), sep = "_")
united_data <- unite(separated_data, col = "Combined", Var1, Var2, sep = "_")

Handling Missing Values with tidyr in R:

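drop_na removes rows that contain missing values, while fill replaces each missing value with the nearest non-missing value in the same column (the previous one by default).

# Example: Handling missing values
cleaned_data <- drop_na(original_data)      # drops rows with an NA in any column
filled_data <- fill(original_data, Value)   # carries the last non-missing Value downward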

Reshaping Data Frames with tidyr:

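gather and spread are the original tidyr functions for reshaping data frames. They still work, but the package documentation marks them as superseded by pivot_longer and pivot_wider, which are recommended for new code.

# Example: Reshaping data frames
# original_data is a placeholder for a wide data frame with an ID column
tidy_data <- gather(original_data, key = "Variable", value = "Value", -ID)
wide_data <- spread(tidy_data, key = "Variable", value = "Value")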
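
To make the relationship between the older and newer functions concrete, the sketch below runs gather() and pivot_longer() on the same small data frame. Here `original_data` is defined with made-up values purely for illustration.

```r
library(tidyr)

# Illustrative wide data; the values are made up
original_data <- data.frame(
  ID = 1:2,
  Var1 = c(10, 20),
  Var2 = c(30, 40)
)

# Superseded interface
old_long <- gather(original_data, key = "Variable", value = "Value", -ID)

# Current interface
new_long <- pivot_longer(
  original_data,
  cols = -ID,
  names_to = "Variable",
  values_to = "Value"
)

# pivot_wider() reverses pivot_longer(), as spread() reverses gather()
new_wide <- pivot_wider(new_long, names_from = Variable, values_from = Value)

print(old_long)
print(new_long)
```

Both long results contain the same four rows, but in a different order: gather() stacks the data column by column, while pivot_longer() works through it row by row.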

Wide to Long Format Conversion in R tidyr:

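Convert data from wide to long format so that each row holds one ID, one variable name, and one value. Grouped summaries and plotting functions are usually easier to apply to data in this shape.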

# Example: Wide to long format conversion
long_data <- pivot_longer(wide_data, cols = -ID, names_to = "Variable", values_to = "Value")

Working with Nested Data Frames in tidyr:
- Use unnest to expand a column of nested data frames back into regular rows and columns.
```
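# Example: Working with nested data frames
# nested_data is a placeholder; `data` is assumed to be the name of its list-column.
# Current tidyr versions warn if cols is omitted.
unnested_data <- unnest(nested_data, cols = data)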
```

Tidying Messy Data in R:

Tidy messy data using functions like separate, spread, and gather.

# Example: Tidying messy data
tidy_data <- separate(original_data, col = "Combined", into = c("Var1", "Var2"), sep = "_") %>%
              gather(key = "Variable", value = "Value", -ID)

R tidyr and dplyr Integration:

tidyr and dplyr work well together for comprehensive data manipulation.

# Example: tidyr and dplyr integration
library(dplyr)
tidy_data <- original_data %>%
             gather(key = "Variable", value = "Value", -ID) %>%
             filter(Value > 0) %>%
             spread(key = "Variable", value = "Value")
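
The same kind of pipeline can be written with the pivot functions and followed by a grouped summary. This is a sketch under stated assumptions: dplyr 1.0.0 or later is installed, and `original_data` is defined here with made-up values.

```r
library(tidyr)
library(dplyr)

# Illustrative wide data; the values are made up
original_data <- data.frame(
  ID = 1:3,
  Var1 = c(5, -1, 2),
  Var2 = c(0, 4, 3)
)

summary_by_variable <- original_data %>%
  # tidyr: reshape to one row per ID and variable
  pivot_longer(cols = -ID, names_to = "Variable", values_to = "Value") %>%
  # dplyr: keep positive values, then summarise each variable
  filter(Value > 0) %>%
  group_by(Variable) %>%
  summarise(mean_value = mean(Value), n = n(), .groups = "drop")

print(summary_by_variable)
```

Reshaping to long format first means the filter and the summary are written once, rather than once per column.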

Pivot Longer and Pivot Wider in tidyr:

pivot_longer and pivot_wider are essential for converting data between long and wide formats.

# Example: Pivot longer and pivot wider
long_data <- pivot_longer(original_data, cols = -ID, names_to = "Variable", values_to = "Value")
wide_data <- pivot_wider(long_data, names_from = "Variable", values_from = "Value")

Tidying Factors and Character Columns in R:

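Recoding values is a dplyr task rather than a tidyr one: use dplyr's mutate and recode to tidy factors and character columns.

# Example: Tidying factors and character columns
# mutate() and recode() come from dplyr, so it must be loaded
library(dplyr)
tidy_factors <- original_data %>%
                mutate(Category = recode(Category, "A" = "CategoryA", "B" = "CategoryB"))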

R gather and spread Functions with tidyr:

spread is the inverse of gather: spreading gathered data on the same key and value columns restores the wide layout.

# Example: gather and spread with tidyr
gathered_data <- gather(original_data, key = "Variable", value = "Value", -ID)
spread_data <- spread(gathered_data, key = "Variable", value = "Value")
