dplyr Package in R

dplyr is one of the most popular R packages for data manipulation. Developed by Hadley Wickham, it provides a coherent set of "verbs": functions that each perform one common data manipulation task. This tutorial introduces the core functionality of dplyr.

1. Install and Load the `dplyr` package:

If you haven't already installed it, do so with:

install.packages("dplyr")

Load the package:

library(dplyr)
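In scripts it is common to install dplyr only when it is missing, so that re-running the script does not reinstall it. The sketch below assumes the CRAN mirror https://cloud.r-project.org. Any mirror works, and non-interactive sessions need one to be set explicitly.

```r
# Install dplyr only if it is not already available, then attach it.
# The repos URL is an assumption: substitute your preferred CRAN mirror.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
library(dplyr)

# Show the installed version, which is useful when reading version-specific docs
packageVersion("dplyr")
```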

2. Basic Verbs in `dplyr`:

The main verbs in dplyr are:

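select(): Choose variables (columns) from a dataset.
filter(): Keep the rows that satisfy one or more conditions.
arrange(): Reorder rows by the values of one or more columns.
mutate(): Add new columns or transform existing ones.
summarise(): Collapse many rows into summary values (for example, a mean or a count).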

3. Working with the `dplyr` Verbs:

a. `select()`:

Choose specific columns from a dataset.

data(mtcars)
select(mtcars, mpg, hp)
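select() also accepts column ranges, exclusions, and helper functions such as starts_with(), which dplyr re-exports from the tidyselect package. The sketch below shows three of these forms on mtcars.

```r
library(dplyr)

# A contiguous range of columns, from mpg through hp
range_cols <- select(mtcars, mpg:hp)

# Every column except mpg and hp
without_cols <- select(mtcars, -mpg, -hp)

# Columns whose names start with "d"
d_cols <- select(mtcars, starts_with("d"))

names(range_cols)  # mpg, cyl, disp, hp
names(d_cols)      # disp, drat
```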

b. `filter()`:

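Keep the rows that meet a condition. Conditions separated by commas are combined with a logical AND, so the example below keeps cars with mpg above 20 and hp below 100.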

filter(mtcars, mpg > 20, hp < 100)
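For OR logic, use `|` inside a single condition, and use `%in%` to match against a set of values. Note that filter() drops rows where a condition evaluates to NA. The small data frame `with_na` below is a made-up example that illustrates this.

```r
library(dplyr)

# 4- or 6-cylinder cars that have either high mpg OR high horsepower
subset_cars <- filter(mtcars, cyl %in% c(4, 6), mpg > 25 | hp > 150)

# Rows where the condition is NA are dropped, not kept
with_na <- data.frame(x = c(1, NA, 3))
kept <- filter(with_na, x > 1)
kept  # only the row with x = 3 remains
```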

c. `arrange()`:

Sort the data based on a column. Use desc() for descending order.

arrange(mtcars, mpg)          # Ascending order
arrange(mtcars, desc(mpg))    # Descending order
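arrange() also takes several columns. Later columns are used to break ties in earlier ones. The sketch below sorts by cylinder count and then by mpg, highest first, within each cylinder group.

```r
library(dplyr)

# Sort by cyl (ascending), then by mpg (descending) within each cyl value
sorted <- arrange(mtcars, cyl, desc(mpg))

head(select(sorted, cyl, mpg))
```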

d. `mutate()`:

Create a new column or modify an existing one.

mutate(mtcars, efficiency = mpg/hp)
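A single mutate() call can create several columns. Later arguments can refer to columns created earlier in the same call, and reusing an existing name overwrites that column. In the sketch below, `efficiency_pct` is an illustrative name. The `wt` conversion relies on mtcars storing weight in units of 1000 lbs.

```r
library(dplyr)

cars2 <- mutate(
  mtcars,
  efficiency = mpg / hp,              # new column
  efficiency_pct = efficiency * 100,  # uses the column created just above
  wt = wt * 1000                      # overwrites wt: 1000 lbs -> lbs
)

head(select(cars2, mpg, hp, efficiency, efficiency_pct, wt))
```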

e. `summarise()`:

Collapse the data into a single row of summary statistics.

summarise(mtcars, avg_mpg = mean(mpg), max_hp = max(hp))
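summarise() accepts any function that reduces a column to a single value. dplyr's n() counts the rows. The sketch below combines a row count with a few base R statistics.

```r
library(dplyr)

overview <- summarise(
  mtcars,
  n = n(),              # number of rows
  avg_mpg = mean(mpg),
  sd_mpg = sd(mpg),
  min_wt = min(wt)
)

overview  # a single row with one column per summary
```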

4. Chaining (`%>%` operator):

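dplyr re-exports the pipe operator %>% from the magrittr package. The pipe passes the result on its left as the first argument of the function on its right, so several operations can be chained in a readable sequence.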

mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, hp) %>%
  arrange(desc(hp))

This code keeps the rows where mpg is greater than 20, selects the mpg and hp columns, and sorts the result by hp in descending order.
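The pipe is only a rewriting of nested calls: `x %>% f(y)` becomes `f(x, y)`. The sketch below checks that the piped chain, the equivalent nested call, and R's native pipe `|>` produce identical results. The native pipe requires R 4.1 or later, so that part is an assumption about your R version.

```r
library(dplyr)

# Piped version: reads top to bottom
piped <- mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, hp) %>%
  arrange(desc(hp))

# Equivalent nested version: reads inside out
nested <- arrange(select(filter(mtcars, mpg > 20), mpg, hp), desc(hp))

# Native pipe (R >= 4.1) produces the same result
native <- mtcars |>
  filter(mpg > 20) |>
  select(mpg, hp) |>
  arrange(desc(hp))

identical(piped, nested)
```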

5. Working with Groups (`group_by()`):

Grouping is one of dplyr's most useful features. group_by() splits the data into groups, and the verbs that follow, such as summarise(), operate on each group separately.

mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))

This groups the mtcars dataset by the cyl column and calculates the average mpg for each cylinder count (4, 6, and 8).
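Groups also work with more than one summary and with mutate(). In a grouped mutate(), functions like mean() are computed per group. The sketch below uses `.groups = "drop"` (available since dplyr 1.0.0) and ungroup() to return ungrouped results. The column names `mpg_vs_group`, `n`, and so on are illustrative.

```r
library(dplyr)

# Several summaries per cylinder group; .groups = "drop" ungroups the result
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n = n(),
    avg_mpg = mean(mpg),
    max_hp = max(hp),
    .groups = "drop"
  )

# Grouped mutate: each car's mpg relative to its own cylinder group's mean
relative <- mtcars %>%
  group_by(cyl) %>%
  mutate(mpg_vs_group = mpg - mean(mpg)) %>%
  ungroup()

by_cyl
```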

6. Joining Data:

dplyr also supports various types of joins such as inner_join(), left_join(), right_join(), and full_join(). They work similarly to SQL joins.

df1 <- data.frame(id = 1:3, name = c("A", "B", "C"))
df2 <- data.frame(id = 2:4, score = c(85, 90, 78))

inner_join(df1, df2, by = "id")
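Applying the other join types to the same `df1` and `df2` shows how they differ in which rows they keep. semi_join() and anti_join() are filtering joins. They keep or drop rows of the first table based on whether a match exists, and they never add columns from the second table.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, name = c("A", "B", "C"))
df2 <- data.frame(id = 2:4, score = c(85, 90, 78))

lj <- left_join(df1, df2, by = "id")   # all rows of df1; score is NA for id 1
rj <- right_join(df1, df2, by = "id")  # all rows of df2; name is NA for id 4
fj <- full_join(df1, df2, by = "id")   # every id from both tables: 1 to 4
sj <- semi_join(df1, df2, by = "id")   # df1 rows with a match (ids 2, 3)
aj <- anti_join(df1, df2, by = "id")   # df1 rows without a match (id 1)

fj
```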

Conclusion:

dplyr simplifies data manipulation in R, making code readable and efficient. This tutorial covered the basics. dplyr offers much more functionality, which you can explore in its documentation and vignettes.

dplyr package in R:
- Description: The dplyr package is a powerful tool for data manipulation in R, providing a set of functions that simplify and streamline common data manipulation tasks.
- Code:
```
# Install and load the dplyr package
install.packages("dplyr")
library(dplyr)
```

Data manipulation with dplyr:

Description: Use dplyr functions to manipulate data, making tasks like filtering, summarizing, and arranging more intuitive and readable.

Code:

# Sample data frame
data <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Filter data using dplyr
filtered_data <- data %>% filter(Age > 25)

Filtering data with dplyr in R:
- Description: Use the filter() function in dplyr to subset data based on specified conditions.
- Code:
```
# Filter data for individuals older than 25
filtered_data <- data %>% filter(Age > 25)
```
Grouping and summarizing with dplyr:
- Description: Employ the group_by() and summarize() functions to group data by one or more variables and calculate summary statistics.
- Code:
```
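# Group data into age bands and calculate the mean age and count per band
# (grouping by Age itself would make each mean equal to that group's Age)
summarized_data <- data %>%
  group_by(AgeGroup = ifelse(Age >= 25, "25 and over", "Under 25")) %>%
  summarize(mean_age = mean(Age), count = n())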
```
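For the common case of counting rows per group, dplyr provides count() as a shorthand for group_by() followed by summarise(n = n()). Its `sort = TRUE` argument orders the result by count, largest first. The sketch below compares both forms on mtcars.

```r
library(dplyr)

# Shorthand: number of cars per cylinder count
counts <- count(mtcars, cyl)

# The same counts written out with group_by() + summarise()
counts_long <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(), .groups = "drop")

# Largest groups first
sorted_counts <- count(mtcars, cyl, sort = TRUE)
```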

Joining tables with dplyr in R:

Description: Use left_join(), right_join(), inner_join(), or other join functions in dplyr to combine tables based on common columns.

Code:

# Sample data frames to join
df1 <- data.frame(ID = c(1, 2), Value1 = c(10, 20))
df2 <- data.frame(ID = c(2, 3), Value2 = c(30, 40))

# Left join based on ID
joined_data <- left_join(df1, df2, by = "ID")
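When the key column has a different name in each table, `by` can map one name to the other with a named vector. In the sketch below, `df3` and its `Key` and `Value3` columns are hypothetical. They stand in for a second table that names its identifier differently.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2), Value1 = c(10, 20))
# Hypothetical table whose key column is called "Key" instead of "ID"
df3 <- data.frame(Key = c(2, 3), Value3 = c(50, 60))

# Match df1$ID against df3$Key; the result keeps the name "ID"
joined <- left_join(df1, df3, by = c("ID" = "Key"))
joined
```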

Mutating variables with dplyr:
- Description: Use the mutate() function to create or modify variables (columns) in a data frame.
- Code:
```
# Add a new variable 'IsAdult' based on Age
mutated_data <- data %>% mutate(IsAdult = ifelse(Age >= 18, "Yes", "No"))
```
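For more than two outcomes, nesting ifelse() calls quickly becomes hard to read. dplyr's case_when() evaluates its conditions in order and uses the first one that is TRUE. The sketch below recreates the sample data. The band labels and cut-offs in `AgeBand` are illustrative assumptions.

```r
library(dplyr)

data <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Conditions are checked top to bottom; TRUE ~ acts as the fallback
categorised <- data %>%
  mutate(AgeBand = case_when(
    Age < 18 ~ "Minor",
    Age < 25 ~ "Young adult",
    TRUE ~ "Adult"
  ))

categorised
```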

Arranging and selecting columns with dplyr in R:

Description: Use arrange() to sort rows based on one or more columns, and select() to choose specific columns.

Code:

# Arrange data by Age in descending order
arranged_data <- data %>% arrange(desc(Age))

# Select only 'Name' and 'Age' columns
selected_data <- data %>% select(Name, Age)
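select() can also reorder columns with the everything() helper, rename() changes a column name using `new_name = old_name`, and arrange() and select() combine naturally in one chain. The sketch below recreates the sample data. The new name `FullName` is illustrative.

```r
library(dplyr)

data <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22)
)

# Move Name to the front and keep all other columns after it
reordered <- data %>% select(Name, everything())

# Rename a column (new_name = old_name)
renamed <- data %>% rename(FullName = Name)

# Sort by Age (oldest first) and keep only Name and Age
oldest_first <- data %>% arrange(desc(Age)) %>% select(Name, Age)
oldest_first
```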
