Data frames are a fundamental data structure in R for storing tabular data. They are similar to tables in databases or Excel spreadsheets. This tutorial will walk you through the basics of working with data frames in R.
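Under the hood, a data frame is a list of equal-length vectors, one vector per column, which is why each column holds a single type. The short sketch below checks this with base R only. The scores data frame is made up for illustration.

```r
# A small, made-up data frame used only for illustration
scores <- data.frame(
  Name  = c("Ann", "Ben"),
  Score = c(88, 92)
)

is.list(scores)      # TRUE: the data frame is stored as a list
length(scores)       # 2: one list element per column
lengths(scores)      # Name 2, Score 2: every column has nrow(scores) values
class(scores$Score)  # "numeric": each column is an ordinary vector
```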
The data.frame() function
students <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(20, 21, 22),
  Major = c("Biology", "Math", "History"),
  stringsAsFactors = FALSE  # Keep strings as character (already the default since R 4.0.0)
)
You can also build a data frame from existing vectors of equal length:
names <- c("Alice", "Bob", "Charlie")
ages <- c(20, 21, 22)
majors <- c("Biology", "Math", "History")
students <- data.frame(Name = names, Age = ages, Major = majors, stringsAsFactors = FALSE)
You can access columns using the $ operator or double square brackets. Both return the column as a vector.
students$Name
students[["Name"]]
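The difference between single and double square brackets is easy to miss. A minimal sketch using a trimmed-down copy of the students data: [[ ]] and $ return the column vector, single brackets return a one-column data frame, and only [[ ]] accepts a column name stored in a variable.

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age  = c(20, 21, 22)
)

# [[ ]] (like $) returns the column itself, as a vector
name_vec <- students[["Name"]]
is.vector(name_vec)     # TRUE

# Single brackets return a data frame containing just that column
name_df <- students["Name"]
is.data.frame(name_df)  # TRUE

# [[ ]] evaluates a variable holding the column name
col <- "Age"
students[[col]]         # 20 21 22

# $ does not: it looks for a column literally called "col" and finds none
students$col            # NULL
```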
Use the row index to access specific rows. Leaving the position after the comma empty selects all columns.
students[1, ]    # First row
students[1:2, ]  # First two rows
Combine a row index and a column name to access a single value:
students[1, "Name"]  # Name of the first student
Columns can be added, modified, and removed by assignment:
students$GPA <- c(3.5, 3.8, 3.6)  # Add a new GPA column
students$Age <- students$Age + 1  # Increment every age by 1
students$GPA <- NULL              # Remove the GPA column
head() and tail()
Show the first or last rows of a data frame (n defaults to 6).
head(students, n = 2)  # Displays the first two rows
tail(students, n = 2)  # Displays the last two rows
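Both functions also accept a negative n, which drops rows instead of keeping them. A small sketch with the same three students:

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age  = c(20, 21, 22)
)

# head() with n = -1 keeps everything except the last row
all_but_last <- head(students, n = -1)   # Alice, Bob

# tail() with n = -1 keeps everything except the first row
all_but_first <- tail(students, n = -1)  # Bob, Charlie
```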
str()
Displays the internal structure of an R object. For a data frame it shows the number of rows and columns, plus each column's type and first few values.
str(students)
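summary()
Provides summary statistics for each column of a data frame: minimum, quartiles, mean, and maximum for numeric columns, and length and class for character columns.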
summary(students)
dim
, nrow
, and ncol
dim(students) # Dimensions of the data frame (rows, columns) nrow(students) # Number of rows ncol(students) # Number of columns
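Since dim() returns a plain (rows, columns) vector, it can be indexed like any other vector. A short sketch confirming how the three functions relate, and that adding a column changes only ncol():

```r
students <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Age   = c(20, 21, 22),
  Major = c("Biology", "Math", "History")
)

dims <- dim(students)       # 3 3: a length-2 integer vector (rows, columns)
dims[1] == nrow(students)   # TRUE
dims[2] == ncol(students)   # TRUE

# Adding a column increases ncol(); nrow() stays at 3
students$GPA <- c(3.5, 3.8, 3.6)
ncol(students)              # 4
```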
rownames()
rownames(students)  # Get or set row names
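Row names can also be assigned, which lets you index rows by label instead of position. A sketch, assuming the labels are unique (R refuses duplicate row names):

```r
students <- data.frame(
  Name  = c("Alice", "Bob", "Charlie"),
  Major = c("Biology", "Math", "History")
)

rownames(students)           # "1" "2" "3": the automatic default

# Use the unique names as row labels, then index rows by label
rownames(students) <- students$Name
students["Bob", "Major"]     # "Math"
```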
Filtering rows with a logical condition:
older_students <- students[students$Age > 20, ]  # Students older than 20
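Base R's subset() function expresses the same filter with bare column names, and it treats a missing (NA) condition as FALSE, whereas bracket indexing returns a row full of NA. A sketch with a made-up fourth student whose age is unknown:

```r
students <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "Dana"),
  Age  = c(20, 21, 22, NA)  # made-up data: Dana's age is missing
)

# Bracket indexing: the NA comparison produces an all-NA row
older <- students[students$Age > 20, ]
nrow(older)      # 3

# subset() drops rows where the condition is NA
older_too <- subset(students, Age > 20)
nrow(older_too)  # 2

# select = keeps only the listed columns
older_names <- subset(students, Age > 20, select = Name)
```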
Data frames are one of the main reasons R is such a powerful tool for data analysis. They provide a flexible and intuitive structure for handling and analyzing structured data, and the more you work with them, the more efficient your data operations in R will become. For advanced data frame operations, consider exploring packages like dplyr and tidyr.
Creating and initializing data frames in R:
# Creating a data frame
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Alice", "Bob", "Eva", "Mike"),
  Age = c(25, 30, 22, 28, 35)
)
R code for subsetting and indexing data frames:
# Subsetting rows based on a condition
subset_data <- my_data[my_data$Age > 25, ]
# Indexing a column
age_column <- my_data$Age
Manipulating columns and rows in R data frames:
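# Adding a new column
my_data$Salary <- c(50000, 60000, 45000, 55000, 70000)
# Removing a column: store the result in a new object so that my_data
# keeps its Age column, which the examples below still use
my_data_no_age <- my_data[, -3]  # Copy without the third column (Age)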
Aggregating and summarizing data frames in R:
# Aggregating data by Age and calculating the mean of Salary and ID
summarised_data <- aggregate(cbind(Salary, ID) ~ Age, data = my_data, FUN = mean)
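Because every Age in my_data is unique, the aggregation above leaves each value unchanged. A sketch with made-up data in which groups repeat shows what aggregate() actually computes. The staff data frame and its Dept column are hypothetical.

```r
# Hypothetical data: two Sales rows and three IT rows
staff <- data.frame(
  Dept   = c("Sales", "Sales", "IT", "IT", "IT"),
  Salary = c(50000, 60000, 70000, 80000, 90000)
)

# One output row per department, holding the mean salary of that group
by_dept <- aggregate(Salary ~ Dept, data = staff, FUN = mean)
by_dept  # IT 80000, Sales 55000
```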
Joining and merging data frames in R:
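# Creating two data frames
df1 <- data.frame(ID = 1:3, Value = c("A", "B", "C"))
df2 <- data.frame(ID = 2:4, Score = c(10, 15, 20))
# Inner join (base R): keeps only IDs present in both (2 and 3)
merge(df1, df2, by = "ID")
# dplyr equivalent: library(dplyr); inner_join(df1, df2, by = "ID")
# Full outer join: keeps every ID, filling gaps with NA
merge(df1, df2, by = "ID", all = TRUE)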
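merge() covers left and right joins as well, through its all.x and all.y arguments. A minimal sketch reusing df1 and df2:

```r
df1 <- data.frame(ID = 1:3, Value = c("A", "B", "C"))
df2 <- data.frame(ID = 2:4, Score = c(10, 15, 20))

# Left join: every row of df1; Score is NA where df2 has no match (ID 1)
left <- merge(df1, df2, by = "ID", all.x = TRUE)

# Right join: every row of df2; Value is NA where df1 has no match (ID 4)
right <- merge(df1, df2, by = "ID", all.y = TRUE)
```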
R data frame operations and functions:
# Applying a function to each row (MARGIN = 1): 10% of Salary plus Age
my_data$Bonus <- apply(my_data[, c("Salary", "Age")], 1,
                       function(x) x[["Salary"]] * 0.1 + x[["Age"]])
# Row-wise sums of Salary and Age
row_sums <- apply(my_data[, c("Salary", "Age")], 1, sum)
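apply() first converts the selected columns to a matrix, so it works best when they are all numeric. For simple arithmetic, operating on whole columns at once (vectorization) gives the same result with less code, and rowSums() replaces apply(..., 1, sum). A sketch with a reduced copy of my_data:

```r
my_data <- data.frame(
  Age    = c(25, 30, 22, 28, 35),
  Salary = c(50000, 60000, 45000, 55000, 70000)
)
cols <- my_data[, c("Salary", "Age")]

# apply(): the function runs once per row of the numeric matrix
bonus_apply <- apply(cols, 1, function(x) x[["Salary"]] * 0.1 + x[["Age"]])

# Vectorized: the same arithmetic on entire columns
bonus_vec <- my_data$Salary * 0.1 + my_data$Age

# Built-in row sums, equivalent to apply(cols, 1, sum)
totals <- rowSums(cols)
```

Both versions return the same values; apply() additionally labels its result with the row names.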
Handling missing values in R data frames:
# Creating a data frame with missing values
my_data_missing <- data.frame(
  ID = 1:5,
  Name = c("John", NA, "Bob", "Eva", "Mike"),
  Age = c(25, 30, NA, 28, 35)
)
# Removing rows with any missing value
my_data_clean <- na.omit(my_data_missing)
# Imputing missing ages with the mean of the observed ages
my_data_missing$Age[is.na(my_data_missing$Age)] <- mean(my_data_missing$Age, na.rm = TRUE)
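Before removing or imputing, it helps to see where the gaps are. A sketch using base R's is.na() and complete.cases() on the same data:

```r
my_data_missing <- data.frame(
  ID   = 1:5,
  Name = c("John", NA, "Bob", "Eva", "Mike"),
  Age  = c(25, 30, NA, 28, 35)
)

# Count of missing values per column: ID 0, Name 1, Age 1
na_counts <- colSums(is.na(my_data_missing))

# TRUE for rows without any missing value: TRUE FALSE FALSE TRUE TRUE
complete <- complete.cases(my_data_missing)

# Keeps the same rows as na.omit() (IDs 1, 4, and 5)
clean <- my_data_missing[complete, ]
```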
Reshaping and transforming data frames in R:
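library(tidyr)  # spread() comes from the tidyr package
# Reshaping: one column per Age value, filled with Salary
reshaped_data <- spread(my_data, key = Age, value = Salary)
# spread() is superseded; the current tidyr equivalent is
# pivot_wider(my_data, names_from = Age, values_from = Salary)
# Transposing: t() returns a matrix, not a data frame
transposed_data <- t(my_data)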
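Because a matrix holds a single type, transposing a data frame that mixes numeric and character columns converts every value to character. A sketch with a made-up people data frame:

```r
# Made-up data mixing a numeric and a character column
people <- data.frame(
  ID   = c(1, 2, 3),
  Name = c("John", "Alice", "Bob")
)

transposed <- t(people)
is.matrix(transposed)    # TRUE
dim(transposed)          # 2 3: the columns became rows
typeof(transposed)       # "character": the numeric IDs are now text

# With only numeric columns, the transposed matrix stays numeric
typeof(t(people["ID"]))  # "double"
```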
Time series analysis with data frames in R:
Packages such as zoo or xts are commonly used for time series data.
# Creating a time series data frame
time_series_data <- data.frame(
  Date = seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "days"),
  Value = c(10, 15, 20, 25, 30)
)
# Converting to an xts time series object indexed by the Date column
library(xts)
time_series_xts <- xts(time_series_data$Value, order.by = time_series_data$Date)
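Even before converting to xts, a data frame with a Date column supports simple time series calculations in base R. A sketch that adds a day-over-day change column with diff(). The Change column name is an illustrative choice.

```r
time_series_data <- data.frame(
  Date  = seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "day"),
  Value = c(10, 15, 20, 25, 30)
)

# diff() returns one value fewer than its input, so pad the first day with NA
time_series_data$Change <- c(NA, diff(time_series_data$Value))

# Date columns support arithmetic directly: the data covers 4 days of change
span_days <- as.numeric(max(time_series_data$Date) - min(time_series_data$Date))
```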