Regression Analysis in R

Regression analysis is a powerful statistical method for examining the relationship between a response variable and one or more predictor variables. In R, linear regression models are typically fitted with the lm() function, short for "linear model."

In this tutorial, we will cover:

  1. Simple Linear Regression
  2. Multiple Linear Regression
  3. Model Diagnostics
  4. Making Predictions

1. Simple Linear Regression

Simple linear regression examines the relationship between two quantitative variables.

Suppose you have data on car speeds and stopping distances (the values below are the first ten rows of R's built-in cars dataset) and want to see how speed affects stopping distance.

# Example data
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)

# Fit the model
model <- lm(distance ~ speed)

# Summarize the model
summary(model)
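Beyond summary(), individual pieces of the fit can be extracted directly; a minimal sketch using the same data:

```r
# Same toy data as above
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)
model <- lm(distance ~ speed)

# Just the intercept and slope
coef(model)

# 95% confidence intervals for both coefficients
confint(model)

# R-squared, extracted from the summary object
summary(model)$r.squared
```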

2. Multiple Linear Regression

If you have more than one independent variable, you can use multiple linear regression.

Imagine you want to predict a person's weight based on their height and age:

# Example data
height <- c(152, 171, 164, 175, 178)
age <- c(45, 26, 30, 34, 50)
weight <- c(68, 55, 60, 72, 80)

# Fit the model
model_multi <- lm(weight ~ height + age)

# Summarize the model
summary(model_multi)
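In practice the variables usually live in a data frame, passed to lm() via its data argument; a sketch of that more idiomatic style, using the same numbers:

```r
# Same data as above, collected into a data frame
people <- data.frame(
  height = c(152, 171, 164, 175, 178),
  age    = c(45, 26, 30, 34, 50),
  weight = c(68, 55, 60, 72, 80)
)

# Fit against columns of the data frame
model_multi <- lm(weight ~ height + age, data = people)

# One intercept plus one slope per predictor
coef(model_multi)
```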

3. Model Diagnostics

After fitting a regression model, it's important to diagnose its suitability and check the assumptions.

  • Residuals: Differences between observed and predicted values.
  • Fitted values: Predicted values using the model.

You can plot the residuals against fitted values:

# Residuals vs fitted values; points should scatter randomly around zero
plot(model$fitted.values, model$residuals)
abline(h = 0, lty = 2)  # dashed reference line at zero

You should also check for normality of residuals:

# Histogram of residuals; roughly bell-shaped if normal
hist(model$residuals)

# Q-Q plot; points should fall close to the reference line
qqnorm(model$residuals)
qqline(model$residuals)
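As a shortcut, calling plot() on an lm object draws the standard diagnostic plots in one go; a minimal sketch:

```r
# Refit the simple model so this snippet stands alone
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)
model <- lm(distance ~ speed)

# 2x2 grid: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))
```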

4. Making Predictions

After fitting your model, you can use it to make predictions:

# New speeds at which to predict stopping distance
newdata <- data.frame(speed = c(12, 13, 14))
predictions <- predict(model, newdata)
print(predictions)
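predict() can also return interval estimates around each prediction; a minimal sketch with the same model and new data:

```r
# Refit the simple model so this snippet stands alone
speed <- c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11)
distance <- c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17)
model <- lm(distance ~ speed)
newdata <- data.frame(speed = c(12, 13, 14))

# Confidence interval for the mean stopping distance at each speed
predict(model, newdata, interval = "confidence")

# Prediction interval for a single new observation (always wider)
predict(model, newdata, interval = "prediction")
```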

Conclusion

This tutorial provides a basic overview of regression analysis in R. The lm() function is the foundational tool, and the output of summary() offers insight into the fitted model. Beyond the basics lies a vast landscape: categorical predictors, interaction terms, nonlinear relationships, and much more. The car, MASS, and ggplot2 packages are among the many R resources for extending and refining regression analyses.
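Categorical predictors and interaction terms fit naturally into lm()'s formula syntax; a minimal sketch with made-up data (dose, group, and resp are hypothetical names invented for illustration):

```r
# Hypothetical data: a numeric predictor and a two-level factor
trial <- data.frame(
  dose  = c(1, 2, 3, 1, 2, 3),
  group = factor(c("A", "A", "A", "B", "B", "B")),
  resp  = c(2.1, 3.9, 6.2, 1.0, 2.2, 2.9)
)

# dose * group expands to dose + group + dose:group, so the model
# fits a separate intercept and slope for each group
fit <- lm(resp ~ dose * group, data = trial)
summary(fit)
```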

  1. Linear regression in R programming:

    # Sample data
    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 3, 4, 4, 5)
    
    # Linear regression model
    lm_model <- lm(y ~ x)
    
    # Summary of the model
    summary(lm_model)
    
  2. Multiple regression in R with lm function:

    # Sample data with multiple predictors
    x1 <- c(1, 2, 3, 4, 5)
    x2 <- c(2, 3, 4, 5, 6)
    y <- c(3, 4, 5, 5, 6)
    
    # Multiple regression model
    multiple_lm_model <- lm(y ~ x1 + x2)
    
    # Summary of the model
    summary(multiple_lm_model)
    
  3. Logistic regression in R:

    # Sample data for logistic regression
    x <- c(1, 2, 3, 4, 5)
    y <- c(0, 0, 1, 1, 1)
    
    # Logistic regression model
    logistic_model <- glm(y ~ x, family = binomial)
    
    # Summary of the model
    summary(logistic_model)
    
  4. R code for regression diagnostics:

    # Residuals vs. Fitted plot
    plot(lm_model, which = 1)
    
    # Normal Q-Q plot
    plot(lm_model, which = 2)
    
    # Scale-location plot
    plot(lm_model, which = 3)
    
  5. Using ggplot2 for regression plots in R:

    # Install and load ggplot2 package
    install.packages("ggplot2")
    library(ggplot2)
    
    # Scatter plot with regression line
    ggplot(data.frame(x, y), aes(x, y)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE)
    
  6. Stepwise regression in R:

    Stepwise regression iteratively adds or removes predictors based on a statistical criterion (AIC here).

    # Install and load the MASS package
    install.packages("MASS")
    library(MASS)
    
    # Stepwise regression, considering both adding and dropping terms
    stepwise_model <- stepAIC(lm_model, direction = "both")
    
    # Summary of the model
    summary(stepwise_model)
    
  7. Ridge and Lasso regression in R:

    # Install and load glmnet package
    install.packages("glmnet")
    library(glmnet)
    
    # glmnet takes a numeric predictor matrix, not a formula;
    # reuse the two predictors and response from example 2
    x_mat <- cbind(x1, x2)
    y <- c(3, 4, 5, 5, 6)
    
    # Ridge regression (alpha = 0)
    ridge_model <- glmnet(x_mat, y, alpha = 0)
    
    # Lasso regression (alpha = 1)
    lasso_model <- glmnet(x_mat, y, alpha = 1)
    
  8. Time series regression in R:

    # Sample time series data
    time <- seq(as.Date("2022-01-01"), as.Date("2022-01-05"), by = "days")
    y <- c(3, 4, 5, 5, 6)
    
    # Time series regression (Date predictors are treated as
    # numeric days since 1970-01-01)
    time_series_model <- lm(y ~ time)
    
    # Summary of the model
    summary(time_series_model)
    
  9. Comparing regression models in R:

    # Compare models using AIC (lower is better); note that AIC values
    # are only comparable between models fitted to the same response data
    AIC(lm_model, multiple_lm_model, logistic_model)
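For the logistic model in example 3, predicted probabilities come from predict() with type = "response"; a self-contained sketch (note this toy data is perfectly separated, so R warns that fitted probabilities of 0 or 1 occurred):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(0, 0, 1, 1, 1)

# Toy data is perfectly separated, so expect a convergence warning
logistic_model <- glm(y ~ x, family = binomial)

# Predicted probability that y = 1 at new x values
predict(logistic_model, data.frame(x = c(2.5, 4.5)), type = "response")
```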