Data Wrangling - Data Transformation in R

Data wrangling, a term often used interchangeably with data transformation, is the process of cleaning, structuring, and enriching raw data so that it is suitable for analysis. The tidyverse collection of R packages, and dplyr in particular, provides a small, consistent set of functions for this work.

1. Installing and Loading Required Packages:

# Install once per R installation
install.packages("tidyverse")
# Load at the start of each session
library(tidyverse)

2. Basic Data Manipulation with `dplyr`:

Let's consider the mpg dataset from the ggplot2 package:

data(mpg)
head(mpg)

a. Selecting Columns with `select()`:

select(mpg, manufacturer, model, hwy)
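
select() also accepts helper functions and can rename columns while selecting. The following is a minimal sketch on the same mpg data; the column name make is chosen only for this illustration and does not appear in the original data.

```r
library(dplyr)

mpg <- ggplot2::mpg  # the mpg data set ships with ggplot2

# Columns whose names start with "m": manufacturer and model
m_cols <- select(mpg, starts_with("m"))

# Every column except class
no_class <- select(mpg, -class)

# Only the numeric columns
numeric_cols <- select(mpg, where(is.numeric))

# Rename while selecting: manufacturer becomes make (illustrative name)
renamed <- select(mpg, make = manufacturer, model, hwy)
```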

b. Filtering Rows with `filter()`:

filter(mpg, cyl == 4 & hwy > 30)
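
Conditions passed to filter() can be combined in several ways. As a brief sketch on the mpg data: separating conditions with commas is equivalent to joining them with &, while | keeps rows that satisfy either condition and %in% matches against a set of values. The variable names below are illustrative.

```r
library(dplyr)

mpg <- ggplot2::mpg  # the mpg data set ships with ggplot2

# Comma-separated conditions are combined with AND, like &
with_and   <- filter(mpg, cyl == 4 & hwy > 30)
with_comma <- filter(mpg, cyl == 4, hwy > 30)

# OR: keep rows that satisfy at least one condition
either <- filter(mpg, cyl == 4 | hwy > 30)

# Match against a set of values
two_makes <- filter(mpg, manufacturer %in% c("honda", "toyota"))
```

Note that filter() drops rows for which a condition evaluates to NA, so missing values never pass a comparison such as hwy > 30.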

c. Arranging Rows with `arrange()`:

Sort data by the hwy column:

arrange(mpg, hwy)

For descending order:

arrange(mpg, desc(hwy))

d. Creating or Modifying Columns with `mutate()`:

Create a new column holding the ratio of city (cty) to highway (hwy) miles per gallon:

mutate(mpg, mpg_ratio = cty / hwy)
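
Two details of mutate() are easy to miss: it returns a new data frame rather than changing the input, and a column created in one argument can be used by later arguments in the same call. A small sketch with made-up values (the cars tibble and the ratio_pct column are assumptions for illustration):

```r
library(dplyr)

# Hypothetical data, not taken from mpg
cars <- tibble(model = c("a", "b"), cty = c(18, 21), hwy = c(27, 30))

cars_with_ratio <- cars %>%
  mutate(
    ratio     = cty / hwy,              # new column
    ratio_pct = round(100 * ratio, 1)   # uses the column defined just above
  )

# The input is unchanged; assign the result to keep the new columns
names(cars)
names(cars_with_ratio)
```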

e. Summarizing Data with `summarise()`:

Calculate the mean highway miles per gallon:

summarise(mpg, mean_hwy = mean(hwy, na.rm = TRUE))

f. Grouped Operations with `group_by()`:

Calculate the mean highway miles per gallon for each number of cylinders:

mpg %>%
  group_by(cyl) %>%
  summarise(mean_hwy = mean(hwy, na.rm = TRUE))
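
A grouped summarise() can compute several statistics at once. The sketch below uses a small made-up tibble (trips is a hypothetical name) so the per-group results are easy to check by hand; n() counts rows in each group, and .groups = "drop" returns an ungrouped result.

```r
library(dplyr)

# Hypothetical data with one missing highway value
trips <- tibble(
  cyl = c(4, 4, 6, 6, 6),
  hwy = c(30, 34, 26, NA, 24)
)

by_cyl <- trips %>%
  group_by(cyl) %>%
  summarise(
    n        = n(),                      # rows per group, including the NA row
    mean_hwy = mean(hwy, na.rm = TRUE),  # NA values are ignored
    max_hwy  = max(hwy, na.rm = TRUE),
    .groups  = "drop"
  )

by_cyl
```

For this data the 4-cylinder group has 2 rows with a mean of 32, and the 6-cylinder group has 3 rows with a mean of 25.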

3. Joining Data:

Join operations combine data from multiple datasets. Let's consider two example data frames:

df1 <- tibble(id = 1:3, name = c("Alice", "Bob", "Charlie"))
df2 <- tibble(id = c(2, 3, 4), score = c(90, 85, 70))

a. Inner Join:

inner_join(df1, df2, by = "id")

b. Left Join:

left_join(df1, df2, by = "id")
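
The join types differ in which rows they keep. The sketch below rebuilds the two example data frames and also shows full_join() and anti_join(), which are not covered above but belong to the same dplyr family.

```r
library(dplyr)

df1 <- tibble(id = 1:3, name = c("Alice", "Bob", "Charlie"))
df2 <- tibble(id = c(2, 3, 4), score = c(90, 85, 70))

# Only ids present in both tables: 2 and 3
inner <- inner_join(df1, df2, by = "id")

# Every row of df1; Alice (id 1) has no match, so her score is NA
left <- left_join(df1, df2, by = "id")

# Every id from either table: 1, 2, 3, 4
full <- full_join(df1, df2, by = "id")

# Rows of df1 with no match in df2: Alice only
unmatched <- anti_join(df1, df2, by = "id")
```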

4. Chaining Operations:

You can chain multiple operations using the pipe (%>%) operator:

mpg %>%
  filter(cyl == 4) %>%
  mutate(mpg_ratio = cty / hwy) %>%
  select(manufacturer, model, mpg_ratio) %>%
  arrange(desc(mpg_ratio))
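
The pipe passes the result on its left as the first argument of the function on its right, so x %>% f(y) is the same as f(x, y). As a small sketch, the two forms below should produce the same result; the nested form has to be read from the inside out, which is why the piped form is usually preferred.

```r
library(dplyr)

mpg <- ggplot2::mpg  # the mpg data set ships with ggplot2

# Piped: read top to bottom
piped <- mpg %>%
  filter(cyl == 4) %>%
  select(model, hwy) %>%
  arrange(desc(hwy))

# Nested: the same calls, read from the innermost outwards
nested <- arrange(select(filter(mpg, cyl == 4), model, hwy), desc(hwy))

isTRUE(all.equal(piped, nested))
```

R 4.1 and later also provide a native pipe, |>, which can be used in the same way for simple chains like this one.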

5. Working with String Data:

The stringr package in the tidyverse offers string manipulation functions:

library(stringr)

# Convert to uppercase
str_to_upper(c("a", "b", "c"))

# Replace characters in a string
str_replace_all("abcabc", "a", "z")
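
A few more stringr functions come up often when cleaning text columns. This short sketch contrasts str_replace() with str_replace_all() and shows pattern detection and whitespace cleanup; the input strings are made up for illustration.

```r
library(stringr)

# str_replace() changes only the first match, str_replace_all() every match
first_only <- str_replace("abcabc", "a", "z")      # "zbcabc"
all_matches <- str_replace_all("abcabc", "a", "z") # "zbczbc"

# Test each element for a pattern
has_an <- str_detect(c("apple", "banana"), "an")   # FALSE TRUE

# Trim the ends and collapse repeated inner spaces
tidy_name <- str_squish("  Alice   Smith ")        # "Alice Smith"
```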

6. Working with Date Data:

The lubridate package makes working with dates easier:

library(lubridate)

# Parse dates
ymd("20230101")

# Extract components
day(ymd("20230515"))
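
lubridate names its parsers after the order of the date components, and the resulting Date objects support arithmetic. A minimal sketch with example dates chosen for illustration:

```r
library(lubridate)

# The parser name gives the component order: ymd, mdy, dmy
d1 <- ymd("2023-05-15")
d2 <- mdy("05/15/2023")
d3 <- dmy("15.05.2023")
d1 == d2 && d2 == d3   # all three are the same date

# Extract components
year(d1)    # 2023
month(d1)   # 5
day(d1)     # 15

# Date arithmetic
next_day <- ymd("2023-01-31") + days(1)         # 2023-02-01
days_elapsed <- as.numeric(d1 - ymd("2023-01-01"))  # 134

# %m+% adds months and rolls back to the last valid day of the month
month_later <- ymd("2023-01-31") %m+% months(1) # 2023-02-28
```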

Conclusion:

This tutorial provides an overview of essential data wrangling and transformation techniques in R using the tidyverse packages. Mastering these operations is vital for efficient and robust data analysis in R. Always consult the package documentation and vignettes for more in-depth knowledge and advanced techniques.

Data transformation techniques in R:

Data transformation involves modifying, aggregating, or reformatting data.

# Using dplyr for data transformation
# (original_data and NumericColumn are placeholders for your own data)
library(dplyr)
transformed_data <- original_data %>%
  mutate(TransformedColumn = log(NumericColumn))  # natural logarithm
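
To make the snippet above runnable, the sketch below supplies a small made-up original_data. The values and the Log10Column name are assumptions for illustration. Note that log() is the natural logarithm; log10() gives base 10.

```r
library(dplyr)

# Hypothetical input data
original_data <- tibble(ID = 1:3, NumericColumn = c(1, 10, 100))

transformed_data <- original_data %>%
  mutate(
    TransformedColumn = log(NumericColumn),   # natural log: 0, 2.30, 4.61 (rounded)
    Log10Column       = log10(NumericColumn)  # base 10: 0, 1, 2
  )

transformed_data
```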

Tidying data in R:

Tidying data means arranging it so that each variable is a column, each observation is a row, and each value has its own cell.

# Using tidyr for tidying data
# (pivot_longer() replaces the superseded gather())
library(tidyr)
tidy_data <- original_data %>%
  pivot_longer(cols = -ID, names_to = "Key", values_to = "Value")

Reshaping data in R:

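Reshaping changes the layout of a data set, typically between wide and long form. In tidyr, pivot_longer() and pivot_wider() are the current functions for this; the older gather() and spread() still work but are superseded.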

# Using tidyr for reshaping data
library(tidyr)
long_data <- pivot_longer(original_wide_data, cols = -ID, names_to = "Variable", values_to = "Value")
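
pivot_longer() and pivot_wider() are inverses of each other. The sketch below uses a small made-up original_wide_data (the height and weight columns are assumptions for illustration) to show the round trip from wide to long and back.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per ID, one column per measurement
original_wide_data <- tibble(
  ID     = 1:2,
  height = c(170, 180),
  weight = c(65, 80)
)

# Wide to long: one row per ID and measurement (4 rows)
long_data <- pivot_longer(
  original_wide_data,
  cols      = -ID,
  names_to  = "Variable",
  values_to = "Value"
)

# Long back to wide: recovers the original layout
wide_again <- pivot_wider(long_data, names_from = Variable, values_from = Value)
```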

Data cleaning and preprocessing in R:

Data cleaning involves handling missing values and outliers and otherwise ensuring data quality.

# Using dplyr for data cleaning
library(dplyr)
cleaned_data <- original_data %>%
  filter(!is.na(NumericColumn)) %>%
  arrange(ID)
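
The steps named above can be sketched on a small made-up data set. The data, the outlier threshold of 100, and the choice of median imputation are all assumptions for illustration; appropriate rules depend on the data at hand.

```r
library(dplyr)

# Hypothetical data: one duplicate row, one missing value, one extreme value
original_data <- tibble(
  ID            = c(3, 1, 2, 2, 4),
  NumericColumn = c(10, NA, 12, 12, 500)
)

# Option 1: drop duplicates, missing values, and values above a threshold
cleaned_data <- original_data %>%
  distinct() %>%                     # remove exact duplicate rows
  filter(!is.na(NumericColumn)) %>%  # remove rows with missing values
  filter(NumericColumn <= 100) %>%   # illustrative outlier rule
  arrange(ID)

# Option 2: keep every row and fill missing values with the median
imputed_data <- original_data %>%
  distinct() %>%
  mutate(NumericColumn = coalesce(NumericColumn,
                                  median(NumericColumn, na.rm = TRUE)))
```

Here cleaned_data keeps only IDs 2 and 3, while imputed_data keeps four rows and assigns the median, 12, to ID 1.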

Manipulating data frames in R:

dplyr verbs can be combined in one pipeline to select columns, filter rows, and derive new columns.

# Using dplyr for data frame manipulation
library(dplyr)
manipulated_data <- original_data %>%
  select(Column1, Column2) %>%
  filter(Column1 > 5) %>%
  mutate(NewColumn = Column2 * 2)
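
The order of the steps matters: after select(), later steps can only see the columns that were kept. The sketch below runs the pipeline on a small made-up original_data (values and Column3 are assumptions for illustration).

```r
library(dplyr)

# Hypothetical input data
original_data <- tibble(
  Column1 = c(3, 6, 9),
  Column2 = c(1, 2, 3),
  Column3 = c("x", "y", "z")
)

manipulated_data <- original_data %>%
  select(Column1, Column2) %>%     # Column3 is no longer available below
  filter(Column1 > 5) %>%          # keeps the rows with Column1 of 6 and 9
  mutate(NewColumn = Column2 * 2)  # 4 and 6

manipulated_data
```

To filter on a column that is not needed in the output, place filter() before select().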

Data wrangling with dplyr and tidyr:

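dplyr and tidyr complement each other: tidyr reshapes the data, and dplyr filters and transforms it.

# Using dplyr and tidyr for data wrangling
library(dplyr)
library(tidyr)
wrangled_data <- original_data %>%
  # Reshape to long form (replaces the superseded gather())
  pivot_longer(cols = -ID, names_to = "Key", values_to = "Value") %>%
  filter(!is.na(Value)) %>%
  # Reshape back to wide form (replaces the superseded spread())
  pivot_wider(names_from = Key, values_from = Value)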
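
A common reason to combine the two packages is to reshape to long form, drop missing values, and then summarise per group. The sketch below assumes a made-up original_data with one score column per subject; the column names and the per_id result are illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with missing scores
original_data <- tibble(
  ID   = 1:3,
  math = c(90, NA, 70),
  art  = c(80, 60, NA)
)

per_id <- original_data %>%
  pivot_longer(
    cols           = -ID,
    names_to       = "Key",
    values_to      = "Value",
    values_drop_na = TRUE          # drop missing values while reshaping
  ) %>%
  group_by(ID) %>%
  summarise(
    n_values   = n(),              # how many non-missing scores each ID has
    mean_value = mean(Value),
    .groups    = "drop"
  )

per_id
```

For this data, ID 1 has two scores with a mean of 85, while IDs 2 and 3 each have a single score (60 and 70).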
