Bootstrapping is a resampling technique that estimates the sampling distribution of a statistic (such as the mean or variance) by repeatedly resampling the observed data with replacement. It provides a way to quantify the uncertainty of a statistic computed from the data.
Let's walk through a basic bootstrapping tutorial in R:
1. Bootstrapping the Mean
Suppose you have a small dataset and you want to estimate the uncertainty (confidence interval) of the mean.
# Sample data
data <- c(10, 20, 30, 40, 50)

# Number of bootstrap samples
n_bootstrap <- 1000

# Store bootstrap means here
bootstrap_means <- numeric(n_bootstrap)

# Bootstrapping
set.seed(123)  # For reproducibility
for (i in 1:n_bootstrap) {
  sample_data <- sample(data, replace = TRUE, size = length(data))
  bootstrap_means[i] <- mean(sample_data)
}

# Getting the 95% confidence interval
quantile(bootstrap_means, c(0.025, 0.975))
2. Bootstrapping Regression Models
Suppose you have a simple linear regression and you want to bootstrap the coefficients to get their confidence intervals.
# Sample data
set.seed(123)
x <- rnorm(100)
y <- 3*x + rnorm(100)

# Bootstrapping
n_bootstrap <- 1000
coefficients <- matrix(0, n_bootstrap, 2)  # 2 for intercept and slope
for (i in 1:n_bootstrap) {
  idx <- sample(seq_along(y), replace = TRUE)
  bootstrap_sample <- lm(y[idx] ~ x[idx])
  coefficients[i, ] <- coef(bootstrap_sample)
}

# 95% CI for Intercept and Slope
apply(coefficients, 2, quantile, c(0.025, 0.975))
3. Using the boot Package
For more complex bootstrapping tasks, the boot package can be very helpful.
install.packages("boot")
library(boot)

# Bootstrapping mean using boot package
statistic_function <- function(data, indices) {
  return(mean(data[indices]))
}

boot_obj <- boot(data = data, statistic = statistic_function, R = 1000)
boot_obj

# For confidence intervals
boot.ci(boot_obj, type = "perc")  # Percentile method
This package can handle many complex bootstrapping tasks, making the procedure easier and more efficient.
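As a sketch of what the package adds beyond the mean, the same pattern can bootstrap a different statistic and report several interval types from one set of replicates. The data vector `values` and the helper `median_stat` below are illustrative names, not part of the tutorial; `boot` normally ships with R as a recommended package.

```r
library(boot)

set.seed(123)
values <- c(10, 20, 30, 40, 50, 60, 70, 80)  # illustrative data (assumption)

# The statistic receives the data and a vector of resampled indices
median_stat <- function(d, i) median(d[i])

boot_median <- boot(data = values, statistic = median_stat, R = 2000)

# Bootstrap estimate of the standard error of the median
se_median <- sd(boot_median$t[, 1])

# Normal, basic, and percentile intervals from the same replicates
ci <- boot.ci(boot_median, conf = 0.95, type = c("norm", "basic", "perc"))
print(ci)
```

The replicates are stored in `boot_median$t` and the statistic on the original data in `boot_median$t0`, so further summaries can be computed without rerunning the resampling.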
4. Visualizing Bootstrap Distributions
Using the hist() function, you can visualize the distribution of bootstrap statistics.
hist(bootstrap_means, main = "Bootstrap Distribution of Mean", xlab = "Mean", col = "lightblue", border = "black")
This visual representation helps understand the variability of the statistic and the shape of its distribution.
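One way to make the variability easier to read is to mark the sample mean and the percentile limits on the histogram. The sketch below regenerates the bootstrap means from section 1 so that it runs on its own; the colors and legend text are arbitrary choices.

```r
set.seed(123)
data <- c(10, 20, 30, 40, 50)
bootstrap_means <- replicate(1000, mean(sample(data, replace = TRUE)))

# 95% percentile limits of the bootstrap distribution
ci <- quantile(bootstrap_means, c(0.025, 0.975))

hist(bootstrap_means,
     main = "Bootstrap Distribution of Mean",
     xlab = "Mean", col = "lightblue", border = "black")

# Solid line: mean of the original sample; dashed lines: percentile limits
abline(v = mean(data), col = "red", lwd = 2)
abline(v = ci, col = "darkblue", lty = 2)
legend("topright",
       legend = c("Sample mean", "95% percentile limits"),
       col = c("red", "darkblue"), lty = c(1, 2), lwd = c(2, 1))
```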
Bootstrapping is a versatile and powerful technique, especially when the sample size is small, or when the theoretical distribution of a statistic is complex or unknown. However, remember that bootstrapping provides an empirical distribution and relies heavily on the assumption that the sample represents the population well.
Bootstrapping in R Example:
# Generate sample data
set.seed(123)
data <- rnorm(100)

# Perform bootstrapping
bootstrapped_means <- replicate(1000, mean(sample(data, replace = TRUE)))
How to Perform Bootstrap Resampling in R:
# Bootstrap resampling
set.seed(123)
bootstrapped_sample <- sample(data, replace = TRUE)
Bootstrapped Confidence Intervals in R:
# Calculate bootstrapped confidence intervals
boot_ci <- quantile(bootstrapped_means, c(0.025, 0.975))
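The same replicates also yield a standard error and a bias estimate, which can be checked against the textbook formula for the standard error of a mean. This is a minimal sketch; the names `boot_se`, `boot_bias`, and `formula_se` are introduced here for illustration.

```r
set.seed(123)
data <- rnorm(100)
bootstrapped_means <- replicate(1000, mean(sample(data, replace = TRUE)))

# Standard deviation of the replicates estimates the standard error
boot_se <- sd(bootstrapped_means)

# Difference between the average replicate and the original estimate
boot_bias <- mean(bootstrapped_means) - mean(data)

# Analytical standard error of the mean, for comparison
formula_se <- sd(data) / sqrt(length(data))

boot_ci <- quantile(bootstrapped_means, c(0.025, 0.975))
c(boot_se = boot_se, formula_se = formula_se, boot_bias = boot_bias)
```

For the mean, the two standard errors should be close; the bootstrap becomes more useful for statistics that have no simple formula.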
Bootstrap Sampling Techniques in R:
Simple Bootstrap:
# Simple bootstrap
boot_sample <- sample(data, replace = TRUE)
Stratified Bootstrap:
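# Stratified bootstrap: resample within each stratum
library(boot)
strata <- rep(1:2, each = 50)  # stratum label for each of the 100 observations
mean_stat <- function(d, i) mean(d[i])  # boot() passes the data and the resampled indices
strat_boot <- boot(data, statistic = mean_stat, R = 1000, strata = strata)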
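To show what stratified resampling does, here is a base R sketch that draws indices with replacement separately inside each stratum, so every resample keeps the original stratum sizes. The helper `stratified_indices` and the simulated `values` are hypothetical, introduced only for this illustration.

```r
set.seed(123)
values <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))  # illustrative data (assumption)
strata <- rep(1:2, each = 50)

# Resample positions within each stratum; sample.int() avoids the
# surprising behavior of sample() when a stratum has a single element
stratified_indices <- function(strata) {
  groups <- split(seq_along(strata), strata)
  unlist(lapply(groups, function(i) i[sample.int(length(i), replace = TRUE)]),
         use.names = FALSE)
}

strat_means <- replicate(1000, mean(values[stratified_indices(strata)]))
quantile(strat_means, c(0.025, 0.975))
```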
Bootstrapping for Hypothesis Testing in R:
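# Bootstrap test for a difference in group means
# Assumes df is a data frame with a numeric column `value` and a column `group` taking the values "A" and "B"
a <- df$value[df$group == "A"]
b <- df$value[df$group == "B"]
observed_diff <- mean(a) - mean(b)

# Resample both groups from the pooled values, which mimics the null hypothesis of no difference
pooled <- c(a, b)
bootstrap_diff <- replicate(1000, {
  group_A <- sample(pooled, length(a), replace = TRUE)
  group_B <- sample(pooled, length(b), replace = TRUE)
  mean(group_A) - mean(group_B)
})

# Two-sided p-value: share of null differences at least as extreme as the observed one
p_value <- mean(abs(bootstrap_diff) >= abs(observed_diff))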
Resampling Methods and Bootstrapping in R:
# Jackknife resampling (leave-one-out)
library(bootstrap)
jackknife_result <- jackknife(data, mean)
jackknife_result$jack.se  # jackknife estimate of the standard error
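The jackknife can also be written out in base R, which makes the leave-one-out idea explicit. The sketch below uses the usual jackknife formulas for standard error and bias; the variable names are illustrative.

```r
set.seed(123)
x <- rnorm(100)
n <- length(x)

# Recompute the statistic with each observation left out in turn
jack_values <- vapply(seq_len(n), function(i) mean(x[-i]), numeric(1))

# Jackknife standard error and bias estimates
jack_se <- sqrt((n - 1) / n * sum((jack_values - mean(jack_values))^2))
jack_bias <- (n - 1) * (mean(jack_values) - mean(x))

c(jack_se = jack_se, formula_se = sd(x) / sqrt(n), jack_bias = jack_bias)
```

For the sample mean, the jackknife standard error reproduces sd(x) / sqrt(n) and the bias estimate is zero up to rounding, which makes this a convenient sanity check.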
Bootstrapping Regression Models in R:
# Bootstrapping regression models
# my_data is assumed to be a data frame with columns `response` and `predictor`
library(boot)
boot_model <- boot(data = my_data, statistic = function(data, indices) {
  sampled_data <- data[indices, ]  # resample whole rows (cases)
  lm_result <- lm(response ~ predictor, data = sampled_data)
  return(coef(lm_result))
}, R = 1000)
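A runnable version needs a concrete data frame. The sketch below simulates `my_data` with a true slope of 3 (an assumption made only for illustration) and then uses the `index` argument of `boot.ci()` to pick the slope out of the returned coefficient vector.

```r
library(boot)

set.seed(123)
# Simulated stand-in for my_data (assumption): response = 3 * predictor + noise
my_data <- data.frame(predictor = rnorm(100))
my_data$response <- 3 * my_data$predictor + rnorm(100)

# Refit the model on resampled rows and return both coefficients
coef_stat <- function(data, indices) {
  coef(lm(response ~ predictor, data = data[indices, ]))
}

boot_model <- boot(data = my_data, statistic = coef_stat, R = 1000)

# index = 2 selects the second coefficient (the slope)
slope_ci <- boot.ci(boot_model, type = "perc", index = 2)
slope_ci$percent[4:5]  # lower and upper limits
```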
Bootstrapping Time Series Data in R:
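# Bootstrapping time series data with a block bootstrap
# l = 10 is an illustrative block length; choose it to suit the dependence in the series
library(boot)
boot_time_series <- tsboot(data, statistic = mean, R = 1000, l = 10, sim = "fixed")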
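Time series need block resampling because ordinary resampling destroys the dependence between neighboring observations. The sketch below contrasts the two on a simulated AR(1) series; the series, the AR coefficient 0.6, and the block length 10 are all assumptions chosen for illustration.

```r
library(boot)

set.seed(123)
# Simulated autocorrelated series (assumption): AR(1) with coefficient 0.6
series <- arima.sim(model = list(ar = 0.6), n = 200)

# Fixed-length block bootstrap: resamples blocks of 10 consecutive values
block_boot <- tsboot(series, statistic = mean, R = 1000, l = 10, sim = "fixed")
block_se <- sd(block_boot$t[, 1])

# Ordinary bootstrap that ignores the time ordering
iid_means <- replicate(1000, mean(sample(series, replace = TRUE)))
iid_se <- sd(iid_means)

c(block_se = block_se, iid_se = iid_se)
```

With positively autocorrelated data, the ordinary bootstrap would typically understate the standard error of the mean, so the block version is expected to give the larger value.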
Comparing Bootstrap Methods in R:
Percentile Bootstrap:
# Percentile bootstrap
boot_percentile_ci <- quantile(bootstrapped_means, c(0.025, 0.975))
BCa (Bias-Corrected and Accelerated) Bootstrap:
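# BCa bootstrap
library(boot)
mean_stat <- function(d, i) mean(d[i])  # boot() needs a statistic of (data, indices)
boot_obj <- boot(data, statistic = mean_stat, R = 1000)
boot_bca_ci <- boot.ci(boot_obj, conf = 0.95, type = "bca")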
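The difference between the two methods shows up mainly on skewed data. As a sketch, the example below requests both interval types from one `boot` object built on an exponential sample; the sample `skewed` and its size are assumptions made for illustration.

```r
library(boot)

set.seed(123)
skewed <- rexp(50, rate = 1)  # right-skewed illustrative sample (assumption)

mean_stat <- function(d, i) mean(d[i])
boot_obj <- boot(skewed, statistic = mean_stat, R = 2000)

# Both interval types are computed from the same replicates
ci <- boot.ci(boot_obj, conf = 0.95, type = c("perc", "bca"))

ci$percent[4:5]  # percentile limits
ci$bca[4:5]      # bias-corrected and accelerated limits
```

On right-skewed samples the BCa limits usually sit somewhat to the right of the percentile limits, reflecting the bias and skewness adjustments.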
Bootstrapping vs Monte Carlo Simulation in R:
Bootstrapping:
# Bootstrapping example: resample the observed data
set.seed(123)
data <- rnorm(100)
bootstrapped_means <- replicate(1000, mean(sample(data, replace = TRUE)))
Monte Carlo Simulation:
# Monte Carlo simulation example: draw fresh samples from a known model
set.seed(123)
simulated_means <- replicate(1000, mean(rnorm(100)))
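A side-by-side sketch of the distinction, assuming the Monte Carlo variant draws a fresh sample from a known model (standard normal here) on every repetition, while the bootstrap only reuses the one observed sample:

```r
set.seed(123)
n <- 100
data <- rnorm(n)

# Bootstrap: resample the single observed sample
bootstrapped_means <- replicate(1000, mean(sample(data, replace = TRUE)))

# Monte Carlo: generate a new sample from the assumed model each time
simulated_means <- replicate(1000, mean(rnorm(n)))

# Centers: the bootstrap tracks the sample mean, the simulation the model mean (0)
c(bootstrap_center = mean(bootstrapped_means),
  simulation_center = mean(simulated_means),
  sample_mean = mean(data))

# Spreads: both approximate the standard error of the mean
c(bootstrap_sd = sd(bootstrapped_means), simulation_sd = sd(simulated_means))
```

The bootstrap needs no model for the data but inherits any quirks of the observed sample; the simulation is only as good as the model it draws from.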