Central Tendency in R

A measure of central tendency is a single value that represents the center of an entire distribution. The most common measures are the mean, median, and mode. R has built-in functions for the first two, and the mode takes only a few lines of custom code. Let's explore these measures in more detail.

1. Mean

The mean, often called the average, is the sum of all values divided by the number of values.

Function: mean()

Example:

data <- c(4, 5, 6, 6, 7)
avg <- mean(data)
print(avg)  # Output: 5.6
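As a quick check of the definition, the sketch below computes the same average by hand with sum() and length(). The name manual_avg is only illustrative and is not part of the original example.

```r
data <- c(4, 5, 6, 6, 7)

# Sum of all values divided by the number of values
manual_avg <- sum(data) / length(data)
print(manual_avg)  # Output: 5.6

# all.equal() compares with a small numeric tolerance, which is safer than == for doubles
print(isTRUE(all.equal(manual_avg, mean(data))))  # Output: TRUE
```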

2. Median

The median is the middle value of a dataset when it's arranged in order. If there is an even number of data points, the median is the average of the two middle values.

Function: median()

Example:

data <- c(4, 5, 6, 6, 7)
med <- median(data)
print(med)  # Output: 6
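The example above has an odd number of values. For the even case described in the definition, here is a small sketch using an illustrative four-element vector (even_data is a made-up name):

```r
even_data <- c(4, 5, 6, 7)

# With four values, the two middle values are 5 and 6, so the median is their average
med_even <- median(even_data)
print(med_even)  # Output: 5.5

# median() sorts internally, so the order of the input does not matter
print(median(c(7, 4, 6, 5)))  # Output: 5.5
```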

3. Mode

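The mode is the value that appears most frequently in a dataset. R does not have a built-in function for the statistical mode. The base function mode() exists, but it returns the storage mode of an object (for example "numeric"), not its most frequent value. You can, however, write a simple custom function to calculate it.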

Example:

mode_function <- function(x) {
  uniqv <- unique(x)
  uniqv[which.max(tabulate(match(x, uniqv)))]
}

data <- c(4, 5, 6, 6, 7)
mode_value <- mode_function(data)
print(mode_value)  # Output: 6

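This mode_function finds the unique values in a dataset, counts how often each one occurs, and returns the one that appears most frequently. If several values are tied for the highest count, it returns only the one that occurs first in the data.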
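Two caveats are worth seeing in action: base R's own mode() does something unrelated, and the custom function resolves ties by position. The sketch below repeats mode_function so it runs on its own; the vector tied is a hypothetical example.

```r
mode_function <- function(x) {
  uniqv <- unique(x)
  uniqv[which.max(tabulate(match(x, uniqv)))]
}

# Base R's mode() describes how an object is stored, not its most frequent value
print(mode(c(4, 5, 6, 6, 7)))  # Output: "numeric"

# 2 and 1 both occur twice; which.max() picks the first maximum,
# so the value that appears first in the data is returned
tied <- c(2, 2, 1, 1, 3)
print(mode_function(tied))  # Output: 2
```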

Additional Information:

For a quick overview of a numeric vector, the summary() function reports the minimum, first quartile, median, mean, third quartile, and maximum in a single call:

data <- c(4, 5, 6, 6, 7)
summary(data)

Output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    4.0     5.0     6.0     5.6     6.0     7.0 
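The result of summary() can also be used programmatically rather than just printed. A minimal sketch, assuming the default summary of a numeric vector, where the statistics are stored under the names shown in the header row:

```r
data <- c(4, 5, 6, 6, 7)
s <- summary(data)

# The result is a named numeric object, so statistics can be extracted by name
print(names(s))
print(s[["Mean"]])    # Output: 5.6
print(s[["Median"]])  # Output: 6
```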

Handling NA values:

By default, mean() and median() return NA if the data contain any NA values. To exclude NA values from the calculation, pass the argument na.rm = TRUE.

data_with_na <- c(4, 5, 6, NA, 7)
mean(data_with_na, na.rm = TRUE)  # Output: 5.5
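To make the default behaviour concrete, the sketch below shows the NA result without na.rm and counts the missing values with is.na() before dropping them. It reuses the data_with_na vector from above.

```r
data_with_na <- c(4, 5, 6, NA, 7)

# Without na.rm, a single NA propagates to the result
print(mean(data_with_na))    # Output: NA
print(median(data_with_na))  # Output: NA

# Count the missing values before deciding to drop them
print(sum(is.na(data_with_na)))  # Output: 1

# With na.rm = TRUE the median is computed from 4, 5, 6, 7
print(median(data_with_na, na.rm = TRUE))  # Output: 5.5
```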

Summary:

Understanding and measuring central tendency is essential in statistics and data analysis. R provides easy-to-use functions to calculate the mean and median, and with a simple custom function, you can also compute the mode.

  1. R Central Tendency Measures Example:

    # Sample data (fixed values, so no random seed is needed)
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    
    # Calculate mean, median, mode, weighted mean, and quantiles
    mean_value <- mean(data)
    median_value <- median(data)
    # names() recovers the most frequent value; indexing the table alone returns its count
    mode_value <- as.numeric(names(which.max(table(data))))
    # All weights are equal here, so this matches mean(data)
    weighted_mean <- weighted.mean(data, w = rep(1, length(data)))
    quantiles <- quantile(data, c(0.25, 0.5, 0.75))
    
  2. Calculating Mean in R:

    # Calculate mean in R
    data <- c(1, 2, 3, 4, 5)
    mean_value <- mean(data)
    
  3. Median Calculation in R:

    # Calculate median in R
    data <- c(1, 2, 3, 4, 5)
    median_value <- median(data)
    
  4. Mode Function in R:

    # Calculate mode using a custom function
    # (returns every value tied for the highest count; assumes numeric data)
    mode_function <- function(x) {
      tbl <- table(x)
      mode_value <- as.numeric(names(tbl)[tbl == max(tbl)])
      return(mode_value)
    }
    
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    mode_value <- mode_function(data)
    
  5. Weighted Mean in R:

    # Calculate weighted mean in R (equivalent to weighted.mean(values, weights))
    values <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    weighted_mean <- sum(values * weights) / sum(weights)
    
  6. Using Quantiles for Central Tendency in R:

    # Calculate quantiles in R
    data <- c(1, 2, 3, 4, 5)
    quantiles <- quantile(data, c(0.25, 0.5, 0.75))
    
  7. Robust Measures of Central Tendency in R:

    # Calculate robust measures (median and trimmed mean)
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    median_value <- median(data)
    # trim = 0.2 drops 20% of the observations (rounded down) from each end before averaging
    trimmed_mean <- mean(data, trim = 0.2)
    
  8. Comparing Mean, Median, and Mode in R:

    # Compare mean, median, and mode
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    mean_value <- mean(data)
    median_value <- median(data)
    mode_function <- function(x) {
      tbl <- table(x)
      mode_value <- as.numeric(names(tbl)[tbl == max(tbl)])
      return(mode_value)
    }
    mode_value <- mode_function(data)
    
  9. Handling Missing Values in Central Tendency Measures in R:

    # Handle missing values
    data <- c(1, 2, NA, 4, 5)
    mean_value <- mean(data, na.rm = TRUE)
    median_value <- median(data, na.rm = TRUE)
    
  10. Visualizing Central Tendency with Boxplots in R:

    # Visualize central tendency with boxplots (the thick line inside the box marks the median)
    set.seed(123)
    data <- rnorm(100)
    boxplot(data, main = "Boxplot of Data", ylab = "Values")
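To see why item 7 calls the median and trimmed mean robust, the sketch below adds one extreme value to the sample data used in the list and compares how each measure reacts. The outlier value 100 and the name with_outlier are hypothetical, chosen only for illustration.

```r
data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
with_outlier <- c(data, 100)  # hypothetical extreme value added for illustration

# The mean is pulled strongly toward the outlier
print(mean(data))          # Output: 3
print(mean(with_outlier))  # Output: 12.7

# The median and the 20% trimmed mean barely move
print(median(with_outlier))            # Output: 3
print(mean(with_outlier, trim = 0.2))  # Output: 3.166667
```

With ten values, trim = 0.2 discards the two smallest and two largest observations, so the outlier never enters the average.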