Data frames are one of the primary data structures in R, ideal for storing tabular data. They're similar to matrices but can hold columns of different types (numeric, character, factor, etc.). Here's a tutorial covering basic operations you can perform on data frames in R:
Create a data frame with the data.frame() function:
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)
print(df)
You can access a column in a data frame using the $ operator or the [[ operator. Both return the column as a vector:
print(df$Name)
print(df[["Age"]])
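The difference between extracting a column and subsetting a data frame is easy to miss. As a minimal sketch, assuming a small sample data frame like the one above: [[ returns the column itself as a vector, while single brackets return a one-column data frame.

```r
# Hypothetical sample data, mirroring the tutorial's df
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23)
)

age_vector <- df[["Age"]]  # the column itself, as a numeric vector
age_frame <- df["Age"]     # still a data frame, with one column

print(class(age_vector))
print(class(age_frame))
```

The vector form is what functions such as mean() expect. The data frame form is useful when later code relies on column names.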
To access rows, index with [rows, ] and leave the column position empty:
# Access the first row
print(df[1, ])
# Access the first and third rows
print(df[c(1, 3), ])
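Row and column positions can be combined in the same brackets. The sketch below assumes the same three-person sample data. It picks a single cell, picks a block of rows and named columns, and uses drop = FALSE to keep a single column as a data frame.

```r
# Hypothetical sample data, mirroring the tutorial's df
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)

# A single cell: row 2, column "Age"
bob_age <- df[2, "Age"]

# Rows 1 to 2, restricted to two named columns
name_score <- df[1:2, c("Name", "Score")]

# One column kept as a data frame instead of being simplified to a vector
age_only <- df[, "Age", drop = FALSE]

print(bob_age)
print(name_score)
print(age_only)
```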
To add a column, assign a vector with one value per row to a new column name:
df$City <- c("London", "Paris", "Berlin")
print(df)
To add a row, build a one-row data frame with the same column names and append it with the rbind() function:
new_row <- data.frame(Name = "David", Age = 28, Score = 88, City = "Madrid")
df <- rbind(df, new_row)
print(df)
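rbind() matches data frame columns by name, which matters when the new row is built separately. This sketch uses a hypothetical two-column data frame to show that a differently ordered row is still accepted, while a row with a different column name is rejected with an error. The error is caught here with tryCatch().

```r
# Hypothetical sample data
df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Columns in a different order are matched by name, not by position
reordered_row <- data.frame(Age = 28, Name = "David")
df_ok <- rbind(df, reordered_row)
print(df_ok)

# A row with a different column name ("Years" instead of "Age") is rejected
bad_row <- data.frame(Name = "Eve", Years = 31)
outcome <- tryCatch(
  rbind(df, bad_row),
  error = function(e) "rbind failed: column names differ"
)
print(outcome)
```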
To remove a column, assign NULL to it:
df$City <- NULL  # remove the City column
print(df)
To remove a row, use a negative row index:
df <- df[-2, ]  # remove the second row
print(df)
Filter rows based on some condition:
filtered_df <- df[df$Age > 24, ]
print(filtered_df)
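Conditions can be combined with & and |, and missing values need some care. In the sketch below, using hypothetical data with one missing Age, a plain logical index produces an all-NA row for the missing value. Wrapping the condition in which(), or using subset(), drops that row instead.

```r
# Hypothetical sample data with one missing Age
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),
  Age = c(25, 30, 23, NA),
  Score = c(85, 90, 82, 95)
)

# Plain logical indexing: the NA condition yields a row of NAs
naive <- df[df$Age > 24 & df$Score >= 85, ]

# which() keeps only positions where the condition is TRUE
safe <- df[which(df$Age > 24 & df$Score >= 85), ]

# subset() also treats NA conditions as FALSE
via_subset <- subset(df, Age > 24 & Score >= 85)

print(naive)
print(safe)
print(via_subset)
```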
To sort rows, use the order() function:
sorted_df <- df[order(df$Age), ]  # sort by Age in ascending order
print(sorted_df)
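order() also handles descending sorts and tie-breaking on several columns. A short sketch with hypothetical data, where the Team column is an assumption added for illustration:

```r
# Hypothetical sample data; Team is not part of the tutorial's df
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Team = c("A", "B", "A", "B"),
  Score = c(85, 90, 82, 88)
)

# Descending order on one column
by_score_desc <- df[order(df$Score, decreasing = TRUE), ]

# Ascending by Team, then descending by Score within each team
# (negating a numeric column reverses its sort direction)
by_team_then_score <- df[order(df$Team, -df$Score), ]

print(by_score_desc)
print(by_team_then_score)
```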
To get a summary of every column (quartiles and mean for numeric columns; character columns report only their length and type):
summary(df)
To get the number of rows and columns:
num_rows <- nrow(df)
num_cols <- ncol(df)
print(paste("Number of rows:", num_rows))
print(paste("Number of columns:", num_cols))
To list the column names and the class of each column:
print(colnames(df))
print(sapply(df, class))
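Base R has a few more one-line inspection helpers. This sketch assumes the same kind of sample data and shows dim(), str(), and head().

```r
# Hypothetical sample data, mirroring the tutorial's df
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)

shape <- dim(df)          # rows and columns in one vector
print(shape)

str(df)                   # compact overview: each column's type and first values

first_two <- head(df, 2)  # first two rows
print(first_two)
```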
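You can use the apply() function to apply a function over rows (MARGIN = 1) or columns (MARGIN = 2). apply() converts the data frame to a matrix first, so select the numeric columns before calling it. For example, to get the mean of each numeric column:
# drop = FALSE keeps a data frame even if only one numeric column is selected
print(apply(df[, sapply(df, is.numeric), drop = FALSE], 2, mean))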
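Because apply() goes through a matrix, column-wise summaries of a data frame are often written with colMeans() or sapply() instead. A minimal sketch, assuming the same sample data. Both alternatives work on the columns directly and give the same result.

```r
# Hypothetical sample data, mirroring the tutorial's df
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)

# Logical vector marking which columns are numeric
numeric_cols <- sapply(df, is.numeric)

# colMeans() works on the numeric columns as a block
means_colmeans <- colMeans(df[numeric_cols])

# sapply() visits each column in turn, with no matrix conversion
means_sapply <- sapply(df[numeric_cols], mean)

print(means_colmeans)
print(means_sapply)
```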
Join two data frames on a common column with the merge() function. By default, only rows whose key appears in both data frames are kept:
df2 <- data.frame(Name = c("Alice", "Charlie", "David"), Grade = c("A", "B", "C"))
merged_df <- merge(df, df2, by = "Name")
print(merged_df)
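merge() can also keep unmatched rows through its all, all.x, and all.y arguments. The sketch below uses two small hypothetical data frames to compare an inner join, a left join, and a full join. Unmatched cells are filled with NA.

```r
# Hypothetical sample data: Bob has no grade, David has no age
people <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 23))
grades <- data.frame(Name = c("Alice", "Charlie", "David"), Grade = c("A", "B", "C"))

inner <- merge(people, grades, by = "Name")                # keys present in both
left <- merge(people, grades, by = "Name", all.x = TRUE)   # every row of people
full <- merge(people, grades, by = "Name", all = TRUE)     # every row of either

print(inner)
print(left)
print(full)
```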
These are just some of the basic operations you can perform on data frames in R. With its rich ecosystem of packages and large community, R provides many more advanced tools for data frame manipulation, especially in packages like dplyr and tidyr.
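The snippets below use placeholder names (original_data, df1, df2, common_column, Group, Type, Value). Substitute your own data frames and column names.
Subset, filter, and select in R DataFrame:
# Subset DataFrame based on a condition
subset_data <- original_data[original_data$Age > 25, ]
# Filter rows and select columns using dplyr
library(dplyr)
filtered_data <- original_data %>%
  filter(Age > 25) %>%
  select(Name, Age)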
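For comparison, the filter-then-select pattern can be written in base R without dplyr. A runnable sketch, assuming a hypothetical original_data with Name, Age, and Score columns:

```r
# Hypothetical stand-in for original_data
original_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 23),
  Score = c(85, 90, 82)
)

# subset() filters rows and selects columns in one call
subset_data <- subset(original_data, Age > 25, select = c(Name, Age))

# The same result with bracket indexing
bracket_data <- original_data[original_data$Age > 25, c("Name", "Age")]

print(subset_data)
print(bracket_data)
```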
Joining DataFrames in R:
# Join DataFrames using merge
merged_data <- merge(df1, df2, by = "common_column")
# Join DataFrames using dplyr
library(dplyr)
joined_data <- inner_join(df1, df2, by = "common_column")
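The key column does not always have the same name in both data frames. merge() accepts by.x and by.y for that case. In this sketch the data frames and the id and student_id column names are hypothetical:

```r
# Hypothetical data frames whose key columns are named differently
students <- data.frame(id = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))
grades <- data.frame(student_id = c(1, 3, 4), Grade = c("A", "B", "C"))

# by.x names the key in the first data frame, by.y the key in the second;
# the result keeps the first data frame's key name
joined <- merge(students, grades, by.x = "id", by.y = "student_id")

print(joined)
```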
Grouping and aggregation in R DataFrame:
# Group and aggregate using base R
grouped_data <- aggregate(Score ~ Group, data = original_data, FUN = mean)
# Group and aggregate using dplyr
library(dplyr)
grouped_data_dplyr <- original_data %>%
  group_by(Group) %>%
  summarise(mean_score = mean(Score))
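To make the base R version concrete, here is a runnable sketch with a hypothetical original_data. It computes the per-group mean and count with aggregate(), and the same means as a named vector with tapply().

```r
# Hypothetical stand-in for original_data
original_data <- data.frame(
  Group = c("A", "A", "B", "B", "B"),
  Score = c(80, 90, 70, 75, 95)
)

# One row per group, with the mean Score
mean_by_group <- aggregate(Score ~ Group, data = original_data, FUN = mean)

# Any summary function works; length gives the group size
count_by_group <- aggregate(Score ~ Group, data = original_data, FUN = length)

# tapply() returns the group means indexed by group name
mean_vector <- tapply(original_data$Score, original_data$Group, mean)

print(mean_by_group)
print(count_by_group)
print(mean_vector)
```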
Sorting and ordering DataFrame in R:
# Sort DataFrame based on a column using base R
sorted_data <- original_data[order(original_data$Age), ]
# Sort DataFrame using dplyr
library(dplyr)
sorted_data_dplyr <- original_data %>% arrange(Age)
Reshaping DataFrames in R:
# Reshape DataFrame from long to wide using tidyr
# (spread() still works, but tidyr has superseded it with pivot_wider())
library(tidyr)
reshaped_data <- spread(original_data, key = Type, value = Value)
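The tidyr documentation recommends pivot_wider() over spread() for new code. A minimal sketch of the same long-to-wide reshape, assuming tidyr 1.0.0 or later is installed. The sample data and its ID, Type, and Value columns are hypothetical.

```r
library(tidyr)

# Hypothetical long-format data: one row per (ID, Type) measurement
long_data <- data.frame(
  ID = c(1, 1, 2, 2),
  Type = c("height", "weight", "height", "weight"),
  Value = c(170, 65, 180, 80)
)

# Each distinct Type becomes a column, filled from Value
wide_data <- pivot_wider(long_data, names_from = Type, values_from = Value)

print(wide_data)
```

The reverse operation, wide to long, is pivot_longer().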
Handling missing values in R DataFrame:
# Remove rows with missing values using base R
cleaned_data <- original_data[complete.cases(original_data), ]
# Remove rows with missing values using drop_na(), which comes from tidyr
library(dplyr)  # for the %>% pipe
library(tidyr)  # for drop_na()
cleaned_data_tidyr <- original_data %>% drop_na()
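Before dropping rows it usually helps to see where the missing values are. The sketch below uses a hypothetical original_data to count NAs per column, fill one column with its mean, and then drop the rows that are still incomplete. Mean imputation is only one simple option, used here for illustration.

```r
# Hypothetical stand-in for original_data
original_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, NA, 23),
  Score = c(85, 90, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(original_data))
print(na_counts)

# Replace missing Age values with the mean of the observed ages
imputed <- original_data
imputed$Age[is.na(imputed$Age)] <- mean(imputed$Age, na.rm = TRUE)

# Drop rows that still contain a missing value
cleaned <- na.omit(imputed)
print(cleaned)
```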