Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables. In R, linear regression models are typically fitted using the lm() function, which stands for "linear model."
In this tutorial, we will cover simple linear regression, multiple linear regression, model diagnostics, and making predictions.
Simple linear regression models a quantitative response as a straight-line function of a single quantitative predictor.
Suppose you have data on car speeds and stopping distances and want to see how speed affects distance.
# Example data
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)

# Fit the model
model <- lm(distance ~ speed)

# Summarize the model
summary(model)
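As a check on what lm() computes, the sketch below reproduces the least-squares estimates for a single predictor from the textbook formulas slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). It reuses the speed and distance values above; the names slope and intercept are illustrative only.

```r
# Same ten speed/distance pairs as in the example above
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)

# Closed-form least-squares estimates for one predictor
slope <- cov(speed, distance) / var(speed)
intercept <- mean(distance) - slope * mean(speed)

# Compare with the coefficients reported by lm()
model <- lm(distance ~ speed)
print(c(intercept = intercept, slope = slope))
print(coef(model))
```

The two printed vectors should agree up to floating-point rounding.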
If you have more than one independent variable, you can use multiple linear regression.
Imagine you want to predict a person's weight based on their height and age:
# Example data
height <- c(152, 171, 164, 175, 178)
age <- c(45, 26, 30, 34, 50)
weight <- c(68, 55, 60, 72, 80)

# Fit the model
model_multi <- lm(weight ~ height + age)

# Summarize the model
summary(model_multi)
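With several predictors, least squares solves the normal equations (X'X)b = X'y, where X is the design matrix (a column of 1s for the intercept plus one column per predictor). The sketch below, assuming the height, age, and weight data above, solves them directly and compares the result with lm(). Note that lm() itself uses a QR decomposition, which is numerically more stable, so this is for illustration only.

```r
height <- c(152, 171, 164, 175, 178)
age <- c(45, 26, 30, 34, 50)
weight <- c(68, 55, 60, 72, 80)

model_multi <- lm(weight ~ height + age)

# Design matrix: intercept column of 1s, then height and age
X <- model.matrix(model_multi)

# Solve (X'X) b = X'y; crossprod(X) is t(X) %*% X
b <- solve(crossprod(X), crossprod(X, weight))
print(cbind(normal_equations = drop(b), lm = coef(model_multi)))
```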
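After fitting a regression model, check whether its assumptions hold: a linear relationship, residuals with constant variance, and approximately normally distributed residuals.
To check linearity and constant variance, plot the residuals against the fitted values; the points should scatter randomly around the horizontal line at zero, with no curve or funnel shape: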
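# Residuals vs. fitted values, with a dashed reference line at zero
plot(fitted(model), residuals(model), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)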
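You should also check the residuals for approximate normality; in the Q-Q plot, the points should lie close to the reference line:
# Histogram of the residuals
hist(residuals(model))

# Normal Q-Q plot with a reference line
qqnorm(residuals(model))
qqline(residuals(model))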
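Residuals are simply observed minus fitted values, and when the model includes an intercept they sum to zero up to rounding. The sketch below checks both properties for the speed/distance model and adds shapiro.test() from the stats package as a formal normality test. With only ten observations the test has little power, so it is best read alongside the plots rather than instead of them.

```r
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)
model <- lm(distance ~ speed)

# Residuals are observed minus fitted values
res <- distance - fitted(model)
print(all.equal(unname(res), unname(residuals(model))))

# With an intercept, the residuals sum to (numerically) zero
print(sum(residuals(model)))

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(residuals(model))
print(sw)
```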
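After fitting your model, you can use predict() to make predictions for new data. The new data frame's column names must match the predictor names used in the model: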
# New speeds to predict stopping distance for
newdata <- data.frame(speed = c(12, 13, 14))
predictions <- predict(model, newdata)
print(predictions)
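A point prediction is just intercept + slope * speed, and predict() can also return an interval. The sketch below recomputes the fitted values by hand and requests 95% prediction intervals with interval = "prediction", which bound a single new observation; interval = "confidence" would instead bound the mean response and is narrower.

```r
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)
model <- lm(distance ~ speed)
newdata <- data.frame(speed = c(12, 13, 14))

# Manual point predictions from the coefficients
b <- coef(model)
manual <- b[1] + b[2] * newdata$speed

# Matrix with columns fit, lwr, upr
pred <- predict(model, newdata, interval = "prediction", level = 0.95)
print(pred)
```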
This tutorial provides a basic overview of regression analysis in R. The lm() function is the foundational tool for this, and summary() reports the estimated coefficients with their standard errors and t-tests, along with the residual standard error, R-squared, and the overall F-test. Beyond the basics, regression analysis covers much more, including categorical predictors, interaction terms, and nonlinear relationships. The car, MASS, and ggplot2 packages are some of the many resources available in R to extend and refine your regression analyses.
Linear regression in R programming:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 4, 5)

# Linear regression model
lm_model <- lm(y ~ x)

# Summary of the model
summary(lm_model)
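The "Multiple R-squared" line in summary() is the share of the variation in y explained by the model, 1 - SSE/SST. A short sketch computing it by hand for the same five points:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 4, 5)
lm_model <- lm(y ~ x)

# Sum of squared errors and total sum of squares
sse <- sum(residuals(lm_model)^2)
sst <- sum((y - mean(y))^2)
r2 <- 1 - sse / sst

print(c(manual = r2, summary = summary(lm_model)$r.squared))
```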
Multiple regression in R with the lm() function:
# Sample data with multiple predictors
# (x2 must not be an exact linear function of x1, or lm() returns NA for it)
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(2, 1, 4, 3, 6)
y <- c(3, 4, 5, 5, 6)

# Multiple regression model
multiple_lm_model <- lm(y ~ x1 + x2)

# Summary of the model
summary(multiple_lm_model)
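When one predictor is an exact linear function of another (for example x2 = x1 + 1), lm() cannot separate their effects: it drops the redundant column and reports NA for its coefficient. The sketch below reproduces this with a deliberately collinear, hypothetical predictor named x2_collinear and uses alias() from the stats package to show the dependency.

```r
x1 <- c(1, 2, 3, 4, 5)
x2_collinear <- x1 + 1   # exact linear function of x1
y <- c(3, 4, 5, 5, 6)

fit <- lm(y ~ x1 + x2_collinear)

# The aliased coefficient is NA and the model rank is 2, not 3
print(coef(fit))
print(fit$rank)

# alias() reports which terms are linear combinations of others
print(alias(fit))
```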
Logistic regression in R:
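# Sample data for logistic regression
# (the two outcome classes overlap in x; if they were perfectly separated,
# glm() would warn that fitted probabilities of 0 or 1 occurred and the
# estimates would diverge)
x <- c(1, 2, 3, 4, 5)
y_binary <- c(0, 1, 0, 1, 1)

# Logistic regression model
logistic_model <- glm(y_binary ~ x, family = binomial)

# Summary of the model
summary(logistic_model)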
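A logistic model is linear on the log-odds scale. predict() returns log-odds with type = "link" and probabilities with type = "response"; the latter is the inverse logit, plogis(eta) = 1 / (1 + exp(-eta)). The sketch uses its own small, non-separable 0/1 outcome (the vector y_binary is illustrative) and also shows exp(slope) as the odds ratio for a one-unit increase in x.

```r
x <- c(1, 2, 3, 4, 5)
y_binary <- c(0, 1, 0, 1, 1)
logistic_model <- glm(y_binary ~ x, family = binomial)

# Log-odds and probabilities for the observed x values
eta <- predict(logistic_model, type = "link")
p <- predict(logistic_model, type = "response")
print(cbind(log_odds = eta, probability = p))

# Odds ratio for a one-unit increase in x
print(exp(coef(logistic_model)[["x"]]))
```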
R code for regression diagnostics:
# Residuals vs. Fitted plot
plot(lm_model, which = 1)

# Normal Q-Q plot
plot(lm_model, which = 2)

# Scale-Location plot
plot(lm_model, which = 3)
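Calling plot() on an lm object without which draws four diagnostics by default (residuals vs. fitted, Q-Q, scale-location, and residuals vs. leverage), which fit on one page with par(mfrow = c(2, 2)). Influence can also be inspected numerically with cooks.distance(). The 4 / n cutoff below is a common rule of thumb, not a formal test.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 4, 5)
lm_model <- lm(y ~ x)

# Four default diagnostic plots on a 2 x 2 grid, then restore settings
old_par <- par(mfrow = c(2, 2))
plot(lm_model)
par(old_par)

# Cook's distance for each observation (also shown by which = 4)
cd <- cooks.distance(lm_model)
print(cd)

# Observations above the 4 / n rule-of-thumb cutoff
print(which(cd > 4 / length(cd)))
```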
Using ggplot2 for regression plots in R:
# Install ggplot2 once, then load it in each session
install.packages("ggplot2")
library(ggplot2)

# Scatter plot with the least-squares line (se = FALSE hides the confidence band)
ggplot(data.frame(x, y), aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
Stepwise regression in R:
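# MASS ships with R as a recommended package, so loading it is usually enough
library(MASS)

# Refit on the stored model frame so stepAIC()'s internal refits use the
# same data even if x or y have since been redefined
base_model <- lm(y ~ x, data = lm_model$model)

# Stepwise regression, adding and dropping terms based on AIC
stepwise_model <- stepAIC(base_model, direction = "both")

# Summary of the model
summary(stepwise_model)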
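Stepwise selection is more informative with several candidate predictors. The sketch below uses R's built-in mtcars data instead of the five-point example; the four starting predictors are an arbitrary choice for illustration. Because stepAIC() only moves to a model with lower AIC, the selected model's AIC cannot exceed that of the starting model.

```r
library(MASS)   # provides stepAIC()

# Starting model with four candidate predictors (arbitrary choice)
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# trace = 0 suppresses the step-by-step log
step_model <- stepAIC(full_model, direction = "both", trace = 0)

print(formula(step_model))
print(c(full = AIC(full_model), selected = AIC(step_model)))
```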
Ridge and Lasso regression in R:
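# Install glmnet once, then load it in each session
install.packages("glmnet")
library(glmnet)

# Predictor matrix without the intercept column (glmnet fits its own intercept;
# it also needs at least two predictors) and the matching response
X <- model.matrix(multiple_lm_model)[, -1]
y_multi <- model.response(model.frame(multiple_lm_model))

# Ridge regression (alpha = 0)
ridge_model <- glmnet(X, y_multi, alpha = 0)

# Lasso regression (alpha = 1)
lasso_model <- glmnet(X, y_multi, alpha = 1)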
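Ridge regression adds a penalty lambda * sum(b^2) to the least-squares loss, which gives the closed form b = (X'X + lambda I)^-1 X'y on standardized predictors and a centered response. The sketch below implements that textbook formula in base R on the built-in mtcars data; the helper ridge_closed_form is hypothetical, and glmnet scales its lambda differently, so the two lambda values are not interchangeable. The lasso has no closed form, which is why glmnet fits it iteratively.

```r
# Hypothetical helper: textbook ridge estimate (intercept not penalized)
ridge_closed_form <- function(X, y, lambda) {
  Xs <- scale(X)        # center and scale each predictor
  yc <- y - mean(y)     # center the response
  p <- ncol(Xs)
  drop(solve(crossprod(Xs) + lambda * diag(p), crossprod(Xs, yc)))
}

X <- as.matrix(mtcars[, c("wt", "hp", "qsec")])
y <- mtcars$mpg

# lambda = 0 reduces to ordinary least squares; larger lambda shrinks
b0 <- ridge_closed_form(X, y, lambda = 0)
b10 <- ridge_closed_form(X, y, lambda = 10)
print(rbind(lambda_0 = b0, lambda_10 = b10))
```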
Time series regression in R:
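# Sample time series data: one observation per day
time <- seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "day")
y <- c(3, 4, 5, 5, 6)

# Time series (trend) regression; lm() uses the Date's underlying day count,
# so the slope is the change in y per day. This simple model ignores any
# autocorrelation between consecutive observations.
time_series_model <- lm(y ~ time)

# Summary of the model
summary(time_series_model)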
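Because a Date is stored as days since 1970-01-01, regressing on it gives the same slope as regressing on a simple day index; only the intercept changes, since it refers to day 0 of a different origin. A sketch, with the names dates, y_ts, and day_index chosen for illustration:

```r
dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "day")
y_ts <- c(3, 4, 5, 5, 6)

# Regress on the Date directly
date_model <- lm(y_ts ~ dates)

# Regress on a day index 0, 1, 2, 3, 4 counted from the first date
day_index <- as.numeric(dates - dates[1])
index_model <- lm(y_ts ~ day_index)

print(coef(date_model))
print(coef(index_model))
```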
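Comparing regression models in R (AIC values are only comparable between models fitted to the same response data; lower is better):
# Fit a reduced model on the same data as the multiple regression model
reduced_model <- lm(y ~ x1, data = multiple_lm_model$model)

# Compare models using AIC
AIC(reduced_model, multiple_lm_model)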
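AIC is defined as -2 * log-likelihood + 2 * k, where k counts the estimated parameters; for a linear model that is the coefficients plus the residual variance. The sketch below computes it by hand from logLik() and compares it with AIC(). The helper aic_by_hand and the x2 values are illustrative, not from the original example.

```r
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(2, 1, 4, 3, 6)   # illustrative values, not collinear with x1
y <- c(3, 4, 5, 5, 6)
full_model <- lm(y ~ x1 + x2)
reduced_model <- lm(y ~ x1)

# Hypothetical helper: AIC from the log-likelihood and its parameter count
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

print(c(full = aic_by_hand(full_model), reduced = aic_by_hand(reduced_model)))
print(AIC(full_model, reduced_model))
```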