Statistics in R

R is built for statistical analysis and visualization. This tutorial covers the basics of statistics in R:

  1. Descriptive Statistics
  2. Inferential Statistics
  3. Correlation and Regression

Let's dive in.

1. Descriptive Statistics

Descriptive statistics summarize the main features of a dataset, such as its center and its spread.

1.1 Measures of Central Tendency

  • Mean:
data <- c(1, 2, 3, 4, 5)
mean(data)
  • Median:
median(data)
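  • Mode (R has no built-in function for the statistical mode; the built-in mode() returns an object's storage mode instead. You can write one like this):
getmode <- function(v) {
  uniqv <- unique(v)                            # distinct values, in order of first appearance
  uniqv[which.max(tabulate(match(v, uniqv)))]   # value with the highest count; ties go to the first one
}
getmode(data)   # every value in data occurs once, so this returns the first value, 1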
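Because getmode() returns only the first most-frequent value, data with ties can hide a second mode. The sketch below returns every value that shares the highest count. The helper name get_all_modes() and both sample vectors are made up for illustration and are not part of the tutorial.

```r
# Hypothetical helper: return all values tied for the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # occurrences of each distinct value
  uniqv[counts == max(counts)]
}

scores <- c(2, 3, 3, 5, 7, 3, 5)   # illustrative data: 3 appears most often
tied   <- c(1, 1, 2, 2, 3)         # illustrative data: 1 and 2 are tied

single_mode <- get_all_modes(scores)
both_modes  <- get_all_modes(tied)

print(single_mode)   # 3
print(both_modes)    # 1 2
```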

1.2 Measures of Dispersion

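  • Variance (the sample variance, with an n - 1 denominator):
var(data)
  • Standard Deviation (the square root of the sample variance):
sd(data)
  • Range (range() returns the minimum and maximum; wrap it in diff() to get the spread as a single number):
range(data)
diff(range(data))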
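To make the definitions concrete, the following sketch recomputes the sample variance by hand and compares it with var(). It also adds the interquartile range via IQR(), which the list above does not cover. The variable names are illustrative.

```r
data <- c(1, 2, 3, 4, 5)
n <- length(data)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((data - mean(data))^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

spread    <- diff(range(data))  # max minus min as a single number
iqr_value <- IQR(data)          # distance between the 1st and 3rd quartiles

print(c(manual = manual_var, builtin = var(data)))
print(c(manual = manual_sd, builtin = sd(data)))
print(spread)
print(iqr_value)
```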

1.3 Summary of Data

To get a quick summary of data:

summary(data)
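The result of summary() is a named vector, so individual statistics can be pulled out by name. As a small sketch, the code below extracts the mean and median and computes the quartiles directly with quantile(). The variable names are illustrative.

```r
data <- c(1, 2, 3, 4, 5)

stats <- summary(data)
mean_from_summary   <- stats[["Mean"]]
median_from_summary <- stats[["Median"]]

# quantile() returns the requested percentiles; these match the quartiles in summary()
quartiles <- quantile(data, probs = c(0.25, 0.5, 0.75))

print(names(stats))
print(quartiles)
```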

2. Inferential Statistics

Inferential statistics use a sample to draw conclusions about the population it came from, typically through hypothesis tests.

2.1 T-test

To compare the means of two independent groups (by default, t.test() runs Welch's t-test, which does not assume equal variances):

group1 <- c(1, 2, 3, 4, 5)
group2 <- c(5, 6, 7, 8, 9)
t.test(group1, group2)
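The object returned by t.test() can be stored and queried instead of only printed. The sketch below pulls out the test statistic, p-value, and confidence interval, and compares the default Welch test with the pooled-variance version. The 0.05 threshold is a conventional choice assumed here, not something the tutorial specifies.

```r
group1 <- c(1, 2, 3, 4, 5)
group2 <- c(5, 6, 7, 8, 9)

result <- t.test(group1, group2)        # Welch's t-test (default)

t_stat <- unname(result$statistic)      # t value
p_val  <- result$p.value                # two-sided p-value
ci     <- result$conf.int               # 95% CI for the difference in means

significant <- p_val < 0.05             # assumed significance level

# Classic Student's t-test, which assumes equal variances in both groups
pooled <- t.test(group1, group2, var.equal = TRUE)

print(t_stat)
print(p_val)
print(ci)
print(significant)
```

Both groups here have the same size and the same variance, so the Welch and pooled tests give the same t statistic. They generally differ when the group variances differ.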

2.2 ANOVA

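To compare the means of three or more groups (this reuses group1 and group2 from the t-test example):

group3 <- c(10, 11, 12, 13, 14)
# Stack the groups into one data frame: a column of values and a factor of group labels
anova_data <- data.frame(value = c(group1, group2, group3), group = factor(rep(1:3, each = 5)))
anova_results <- aov(value ~ group, data = anova_data)
summary(anova_results)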
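ANOVA only indicates whether at least one group mean differs. As a sketch of a common follow-up, the code below extracts the F statistic and p-value from the ANOVA table and then runs TukeyHSD() to compare each pair of groups. The data frame name scores and the labels g1, g2, and g3 are illustrative.

```r
group1 <- c(1, 2, 3, 4, 5)
group2 <- c(5, 6, 7, 8, 9)
group3 <- c(10, 11, 12, 13, 14)

scores <- data.frame(
  value = c(group1, group2, group3),
  group = factor(rep(c("g1", "g2", "g3"), each = 5))  # illustrative labels
)

fit <- aov(value ~ group, data = scores)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Pairwise comparisons with Tukey's honest significant difference
tukey <- TukeyHSD(fit)

print(f_value)
print(p_value)
print(tukey)
```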

2.3 Chi-Squared Test

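To test whether two categorical variables are independent, pass a contingency table of counts (for a 2 x 2 table, chisq.test() applies Yates' continuity correction by default):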

table1 <- matrix(c(10, 20, 30, 40), ncol = 2)
chisq.test(table1)
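The test compares the observed counts with the counts expected under independence. The sketch below shows how to inspect those expected counts and how to switch off the continuity correction. The row and column labels are hypothetical and were added only to make the table readable.

```r
table1 <- matrix(
  c(10, 20, 30, 40), ncol = 2,
  dimnames = list(group = c("A", "B"), outcome = c("yes", "no"))  # hypothetical labels
)

result <- chisq.test(table1)   # Yates' correction applied (2 x 2 table)
expected <- result$expected    # counts expected if rows and columns were independent

# Same test without the continuity correction
uncorrected <- chisq.test(table1, correct = FALSE)

print(table1)
print(expected)
print(result$p.value)
print(uncorrected$statistic)
```

Each expected count is the row total times the column total, divided by the grand total.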

3. Correlation and Regression

3.1 Correlation

To measure the strength and direction of the linear relationship between two variables (cor() returns Pearson's coefficient by default):

x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
cor(x, y)
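As a sketch of what cor() computes, the code below rebuilds Pearson's coefficient from its definition, then shows the rank-based Spearman alternative and a significance test with cor.test(). The data are made up for illustration and deliberately not perfectly linear.

```r
x <- c(2, 3, 5, 7, 8)        # illustrative data
y <- c(10, 12, 15, 20, 22)

# Pearson's r: covariation of x and y, scaled by the spread of each
manual_r <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

pearson_r    <- cor(x, y)
spearman_rho <- cor(x, y, method = "spearman")  # based on ranks, not raw values

# Tests the null hypothesis that the true correlation is zero
test_result <- cor.test(x, y)

print(manual_r)
print(pearson_r)
print(spearman_rho)
print(test_result$p.value)
```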

3.2 Linear Regression

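To model y as a linear function of x (this reuses x and y from the correlation example; because those points lie exactly on a line, summary() warns that the fit is essentially perfect):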

model <- lm(y ~ x)
summary(model)
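For a single predictor, the least-squares coefficients have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the two means. The sketch below checks that against lm(). The variable names slope and intercept are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)

# Closed-form least-squares estimates for simple linear regression
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

model <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(model))   # should agree with the hand-computed values
```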

Conclusion

This tutorial introduced some common statistical analyses in R. Base R and its package ecosystem provide functions for more advanced statistics, so explore further based on your specific needs.

  1. Descriptive statistics in R programming:

    • Descriptive statistics summarize and describe the main features of a dataset.
    # Example of descriptive statistics
    data <- c(10, 15, 12, 8, 20)
    mean_value <- mean(data)
    median_value <- median(data)
    sd_value <- sd(data)
    summary_stats <- summary(data)
    
  2. Inferential statistics using R:

    • Inferential statistics make predictions or inferences about a population based on a sample.
    # Example of inferential statistics (t-test)
    group1 <- c(25, 30, 35, 40, 45)
    group2 <- c(20, 22, 25, 28, 30)
    
    t_test_result <- t.test(group1, group2)
    
  3. Statistical tests in R (t-test, ANOVA, chi-square, etc.):

    • R provides functions for common statistical tests, including t.test(), aov(), and chisq.test().
    # Example of ANOVA
    factor_levels <- rep(c("A", "B", "C"), each = 10)
    response_variable <- rnorm(30)   # simulated data, so results vary from run to run

    anova_result <- aov(response_variable ~ factor(factor_levels))
    summary(anova_result)            # print the ANOVA table
    
  4. Correlation and regression analysis in R:

    • Correlation measures the strength and direction of a linear relationship, while regression predicts one variable based on another.
    # Example of correlation and regression
    x <- c(2, 3, 5, 7, 8)
    y <- c(10, 12, 15, 20, 22)
    
    correlation_coefficient <- cor(x, y)
    regression_model <- lm(y ~ x)
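Once a model is fitted, it can be used to inspect the fit and to predict new values. The sketch below continues the example above. The new x values 4 and 10 are arbitrary choices for illustration.

```r
x <- c(2, 3, 5, 7, 8)
y <- c(10, 12, 15, 20, 22)

regression_model <- lm(y ~ x)

coefs     <- coef(regression_model)                 # intercept and slope
r_squared <- summary(regression_model)$r.squared    # share of variance explained

# predict() needs a data frame whose column name matches the predictor
new_points <- data.frame(x = c(4, 10))              # arbitrary illustrative values
predicted  <- predict(regression_model, newdata = new_points)

print(coefs)
print(r_squared)
print(predicted)
```

Note that x = 10 lies outside the observed range of x, so that prediction is an extrapolation and should be treated with more caution.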
    
  5. Probability distributions in R:

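    • For each distribution, R provides a family of functions with the prefixes d (density), p (cumulative probability), q (quantile), and r (random generation), for example dnorm(), pnorm(), qnorm(), and rnorm().
    # Example of probability distribution (normal distribution)
    data <- rnorm(1000, mean = 0, sd = 1)   # draw 1,000 random values from the standard normal distribution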
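The example above only draws random values. As a sketch of the other function families, the code below evaluates the density, cumulative probability, and quantile functions of the standard normal distribution, and adds one binomial example. The seed and variable names are arbitrary choices.

```r
density_at_zero <- dnorm(0)        # height of the standard normal curve at 0
prob_below      <- pnorm(1.96)     # P(Z <= 1.96), roughly 0.975
upper_cutoff    <- qnorm(0.975)    # value with 97.5% of the distribution below it

set.seed(42)                       # arbitrary seed, for reproducible draws
samples <- rnorm(1000, mean = 0, sd = 1)

# Binomial: probability of exactly 3 successes in 10 trials with p = 0.5
prob_three <- dbinom(3, size = 10, prob = 0.5)

print(density_at_zero)
print(prob_below)
print(upper_cutoff)
print(mean(samples))
print(prob_three)
```

pnorm() and qnorm() are inverses of each other, so pnorm(qnorm(0.975)) returns 0.975.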
    
  6. Data visualization for statistical analysis in R:

    • Plots such as boxplots and histograms make it easier to compare distributions and spot outliers.
    # Example of data visualization (boxplot); uses group1 and group2 from the t-test example above
    boxplot(group1, group2, names = c("Group 1", "Group 2"), col = c("blue", "green"))
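It can help to see the numbers a boxplot is drawn from. In this sketch, passing plot = FALSE makes boxplot() return its computed statistics instead of drawing anything. The variable name box_stats is illustrative.

```r
group1 <- c(25, 30, 35, 40, 45)
group2 <- c(20, 22, 25, 28, 30)

# plot = FALSE returns the statistics without opening a graphics device
box_stats <- boxplot(group1, group2,
                     names = c("Group 1", "Group 2"),
                     plot = FALSE)

# One column per group; rows are lower whisker, lower hinge, median,
# upper hinge, upper whisker
print(box_stats$stats)
print(box_stats$names)
print(box_stats$out)    # points drawn individually as outliers (none here)
```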
    
  7. Multivariate statistics in R:

    • Multivariate statistics analyze relationships among multiple variables.
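    # Example of multivariate analysis (principal component analysis)
    data_matrix <- matrix(rnorm(100), ncol = 5)   # 20 simulated observations of 5 variables
    pca_result <- princomp(data_matrix)
    summary(pca_result)                           # variance explained by each component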
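The sketch below shows how to read a PCA result: how much variance each component explains, and where the scores and loadings are stored. It uses prcomp() rather than princomp(); this is a swap made for the sketch, since the princomp() help page itself describes the SVD approach in prcomp() as the preferred calculation. The seed and the choice to scale the variables are assumptions.

```r
set.seed(123)                                  # arbitrary seed, for reproducibility
data_matrix <- matrix(rnorm(100), ncol = 5)    # 20 observations, 5 variables

# Center and scale each variable before extracting components
pca <- prcomp(data_matrix, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumulative    <- cumsum(var_explained)

print(round(var_explained, 3))
print(round(cumulative, 3))
print(head(pca$x[, 1:2]))   # scores: observations projected onto PC1 and PC2
print(pca$rotation)         # loadings: weight of each variable in each component
```

Because the data here are pure noise, no component is expected to dominate. With real, correlated variables the first few components usually explain most of the variance.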
    
  8. Time series analysis and forecasting in R:

    • R offers tools for analyzing time series data and making forecasts.
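    # Example of time series analysis (ARIMA model)
    time_series_data <- ts(rnorm(100), start = 1)   # simulated series of 100 observations
    # order = c(p, d, q): 1 autoregressive term, 1 order of differencing, 1 moving-average term
    arima_model <- arima(time_series_data, order = c(1, 1, 1))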
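The example above fits a model but stops before forecasting. As a sketch, the code below fits the same ARIMA(1, 1, 1) specification and forecasts five steps ahead with predict(). The series is simulated with arima.sim() so that it actually has ARIMA structure. The seed, the coefficients 0.5 and 0.3, and the five-step horizon are arbitrary assumptions.

```r
set.seed(1)   # arbitrary seed, for reproducibility

# Simulate a series from an ARIMA(1, 1, 1) process (illustrative coefficients)
sim_series <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3),
                        n = 100)

fit <- arima(sim_series, order = c(1, 1, 1))

# predict() on an arima fit returns point forecasts and their standard errors
fc <- predict(fit, n.ahead = 5)

# Approximate 95% prediction interval around each forecast
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(coef(fit))
print(fc$pred)
print(cbind(lower, upper))
```

The standard errors grow with the forecast horizon, so the interval widens for later steps.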