A measure of central tendency is a single value that represents the center of an entire distribution. The most common measures are the mean, median, and mode, and R can compute each of them. Let's explore these measures in more detail.
The mean, often called the average, is the sum of all values divided by the number of values.
Function: mean()
    data <- c(4, 5, 6, 6, 7)
    avg <- mean(data)
    print(avg)  # Output: 5.6
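The definition above can be checked by hand. The following is a minimal sketch that computes the sum divided by the count and compares it with mean(); the variable name manual_avg is introduced here for illustration only.

```r
# Compute the mean directly from its definition: sum of values / number of values
data <- c(4, 5, 6, 6, 7)
manual_avg <- sum(data) / length(data)
print(manual_avg)  # 5.6

# all.equal() compares with a small numerical tolerance, which is safer than ==
# when comparing floating-point results
print(isTRUE(all.equal(manual_avg, mean(data))))  # TRUE
```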
The median is the middle value of a dataset when it's arranged in order. If there is an even number of data points, the median is the average of the two middle values.
Function: median()
    data <- c(4, 5, 6, 6, 7)
    med <- median(data)
    print(med)  # Output: 6
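The example above has an odd number of values. As a small sketch of the even case, the code below averages the two middle values by hand and compares the result with median(); even_data and manual_median are names made up for this illustration.

```r
# Even number of data points: the median is the average of the two middle values
even_data <- c(4, 5, 6, 7)
sorted <- sort(even_data)
n <- length(sorted)

# For even n, the middle values sit at positions n/2 and n/2 + 1
manual_median <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2
print(manual_median)      # 5.5
print(median(even_data))  # 5.5
```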
The mode is the value that appears most frequently in a dataset. R has no built-in function for the statistical mode; the base function mode() returns an object's storage type instead. However, you can write a simple custom function to calculate it.
    mode_function <- function(x) {
      uniqv <- unique(x)
      uniqv[which.max(tabulate(match(x, uniqv)))]
    }

    data <- c(4, 5, 6, 6, 7)
    mode_value <- mode_function(data)
    print(mode_value)  # Output: 6
This mode_function finds the unique values in a dataset and returns the one that appears most frequently. If several values are tied for the highest count, it returns the one that appears first in the data.
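Because which.max() picks only the first maximum, the function above returns a single value even when several values are equally frequent. The sketch below shows that behavior next to a variant that returns every tied value. Note that all_modes is a hypothetical helper name introduced here for illustration; it is not a base R function.

```r
# Mode function from the text: returns the first most frequent value
mode_function <- function(x) {
  uniqv <- unique(x)
  uniqv[which.max(tabulate(match(x, uniqv)))]
}

# Hypothetical variant: return every value that shares the highest count
all_modes <- function(x) {
  uniqv <- unique(x)
  counts <- tabulate(match(x, uniqv))
  uniqv[counts == max(counts)]
}

tied <- c(1, 1, 2, 2, 3)
print(mode_function(tied))  # 1
print(all_modes(tied))      # 1 2

# The same approach works on character data
fruit <- c("apple", "banana", "banana")
print(all_modes(fruit))     # "banana"
```

Returning all tied values avoids silently hiding a bimodal dataset, at the cost of a result whose length can vary.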
For more extensive data exploration, the summary() function provides a quick overview of central tendencies and other statistical properties:
    data <- c(4, 5, 6, 6, 7)
    summary(data)

Output:

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       4.00    5.00    6.00    5.60    6.00    7.00
NA values: By default, functions like mean() and median() return NA if there are any NA values in the dataset. To exclude NA values from the calculation, use the argument na.rm = TRUE.
    data_with_na <- c(4, 5, 6, NA, 7)
    mean(data_with_na, na.rm = TRUE)  # Output: 5.5
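To make the default behavior concrete, the short sketch below shows the NA result without na.rm, counts how many values are missing, and applies the same argument to median(). Counting missing values with sum(is.na(...)) is one common way to check how much data na.rm = TRUE will drop.

```r
data_with_na <- c(4, 5, 6, NA, 7)

# Without na.rm, a single NA makes the result NA
print(mean(data_with_na))                  # NA

# Count the missing values before deciding to drop them
print(sum(is.na(data_with_na)))            # 1

# na.rm = TRUE works the same way for median()
print(median(data_with_na, na.rm = TRUE))  # 5.5
```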
Understanding and measuring central tendency is essential in statistics and data analysis. R provides easy-to-use functions to calculate the mean and median, and with a simple custom function, you can also compute the mode.
R Central Tendency Measures Example:
    # Sample data (fixed values, so no random seed is needed)
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

    # Calculate mean, median, mode, weighted mean, and quantiles
    mean_value <- mean(data)
    median_value <- median(data)
    # names() gives the most frequent value; indexing the table alone returns its count
    mode_value <- as.numeric(names(which.max(table(data))))
    # Equal weights, so this matches mean(data)
    weighted_mean <- weighted.mean(data, w = rep(1, length(data)))
    quantiles <- quantile(data, c(0.25, 0.5, 0.75))
Calculating Mean in R:
    # Calculate mean in R
    data <- c(1, 2, 3, 4, 5)
    mean_value <- mean(data)
Median Calculation in R:
    # Calculate median in R
    data <- c(1, 2, 3, 4, 5)
    median_value <- median(data)
Mode Function in R:
    # Calculate mode using a custom function
    # Returns every value tied for the highest count, so the result may have length > 1
    mode_function <- function(x) {
      tbl <- table(x)
      mode_value <- as.numeric(names(tbl)[tbl == max(tbl)])
      return(mode_value)
    }

    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    mode_value <- mode_function(data)
Weighted Mean in R:
    # Calculate weighted mean in R
    values <- c(1, 2, 3, 4, 5)
    weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)
    weighted_mean <- sum(values * weights) / sum(weights)
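Base R also ships weighted.mean() in the stats package, which should give the same result as the manual formula. The sketch below compares the two on the same values and weights; the variable names manual and builtin are chosen here for illustration.

```r
values <- c(1, 2, 3, 4, 5)
weights <- c(0.1, 0.2, 0.3, 0.2, 0.2)

# Manual formula: sum of value * weight, divided by the sum of the weights
manual <- sum(values * weights) / sum(weights)

# Built-in equivalent from the stats package
builtin <- weighted.mean(values, w = weights)
print(builtin)                             # 3.2
print(isTRUE(all.equal(manual, builtin)))  # TRUE

# The weights do not need to sum to 1; rescaling them leaves the result unchanged
print(isTRUE(all.equal(weighted.mean(values, w = weights * 10), builtin)))  # TRUE
```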
Using Quantiles for Central Tendency in R:
    # Calculate quantiles in R
    data <- c(1, 2, 3, 4, 5)
    quantiles <- quantile(data, c(0.25, 0.5, 0.75))
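The link between quantiles and central tendency is that the 0.5 quantile is the median, while the 0.25 and 0.75 quantiles bound the middle half of the data. The sketch below reads individual quantiles by name and uses IQR() to get the width of that middle half. It assumes quantile()'s default interpolation method.

```r
data <- c(1, 2, 3, 4, 5)
quantiles <- quantile(data, c(0.25, 0.5, 0.75))

# quantile() returns a named vector; [[ ]] extracts a single value without its name
print(quantiles[["25%"]])  # 2
print(quantiles[["50%"]])  # 3
print(quantiles[["75%"]])  # 4

# The 50% quantile is the median
print(quantiles[["50%"]] == median(data))  # TRUE

# IQR() is the distance between the 75% and 25% quantiles
print(IQR(data))  # 2
```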
Robust Measures of Central Tendency in R:
    # Calculate robust measures (median and trimmed mean)
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    median_value <- median(data)
    # trim = 0.2 drops the lowest 20% and highest 20% of observations before averaging
    trimmed_mean <- mean(data, trim = 0.2)
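These measures are called robust because extreme values affect them far less than they affect the ordinary mean. As a rough illustration, the sketch below appends a single made-up outlier (100) to the same data and compares the three measures; the outlier and the names clean and with_outlier are assumptions added for this example.

```r
clean <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
with_outlier <- c(clean, 100)  # hypothetical extreme value added for illustration

# The ordinary mean is pulled strongly toward the outlier
print(mean(clean))                     # 3
print(mean(with_outlier))              # 12.7

# The median and the trimmed mean stay close to the bulk of the data
print(median(with_outlier))            # 3
print(mean(with_outlier, trim = 0.2))  # 3.166667
```

With 10 values and trim = 0.2, two observations are dropped from each end, so the outlier never enters the average.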
Comparing Mean, Median, and Mode in R:
    # Compare mean, median, and mode
    data <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
    mean_value <- mean(data)
    median_value <- median(data)

    mode_function <- function(x) {
      tbl <- table(x)
      mode_value <- as.numeric(names(tbl)[tbl == max(tbl)])
      return(mode_value)
    }
    mode_value <- mode_function(data)
Handling Missing Values in Central Tendency Measures in R:
    # Handle missing values
    data <- c(1, 2, NA, 4, 5)
    mean_value <- mean(data, na.rm = TRUE)
    median_value <- median(data, na.rm = TRUE)
Visualizing Central Tendency with Boxplots in R:
    # Visualize central tendency with boxplots
    set.seed(123)
    data <- rnorm(100)
    boxplot(data, main = "Boxplot of Data", ylab = "Values")
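A boxplot draws the median as the line inside the box but does not show the mean. The sketch below is one possible way to connect the plot back to the numbers: it captures the statistics that boxplot() returns invisibly, checks that the middle one matches median(), and marks the mean on the same plot. The red diamond marker is an arbitrary styling choice.

```r
set.seed(123)
data <- rnorm(100)

# boxplot() invisibly returns the statistics it draws
bp <- boxplot(data, main = "Boxplot of Data", ylab = "Values")

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
box_median <- bp$stats[3, 1]
print(isTRUE(all.equal(box_median, median(data))))  # TRUE

# Mark the mean on the same plot so it can be compared with the median line
points(1, mean(data), pch = 18, col = "red")
```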