Descriptive analysis helps in summarizing and understanding the main features of a dataset, often through visual means or summary statistics. In this tutorial, we'll explore some basic descriptive analysis techniques in R.
You don't need any special packages for basic descriptive analysis, just base R.
For demonstration purposes, we'll use the built-in mtcars dataset:
data(mtcars)
head(mtcars)
Measures of central tendency (mean, median, and mode) provide a central value for the data distribution.
mean(mtcars$mpg)
median(mtcars$mpg)
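# Base R has no built-in function for the statistical mode, so define one
getmode <- function(v) {
  uniqv <- unique(v)
  # Count occurrences of each unique value and return the most frequent one
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(mtcars$gear)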
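getmode() above returns only the first value it encounters with the highest count. If a variable has several values tied for most frequent, a variant along these lines could return all of them. The name get_modes is hypothetical, and because it reads the values from the names of table(), the result is a character vector.

```r
# Hypothetical helper: return every value tied for the highest frequency.
get_modes <- function(v) {
  counts <- table(v)                    # frequency of each distinct value
  names(counts)[counts == max(counts)]  # keep all values with the top count
}

get_modes(c(1, 2, 2, 3, 3))   # both 2 and 3 occur twice
get_modes(mtcars$gear)
```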
Measures of dispersion (variance, standard deviation, range, and interquartile range) describe the spread or variability of the data.
var(mtcars$mpg)
sd(mtcars$mpg)
range(mtcars$mpg)
IQR(mtcars$mpg)
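To make clear what these functions compute, the sketch below reproduces each one by hand. It relies on var() and sd() using the sample (n - 1) denominator and on IQR() being the difference between the default 75th and 25th percentiles from quantile(). The variable names are illustrative only.

```r
x <- mtcars$mpg
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

# Range as returned by range(): the minimum and the maximum
manual_range <- c(min(x), max(x))

# Interquartile range: third quartile minus first quartile
manual_iqr <- unname(quantile(x, 0.75) - quantile(x, 0.25))

all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
all.equal(manual_iqr, IQR(x))
```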
The summary() function provides a statistical summary of all variables:
summary(mtcars)
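summary() does not report the standard deviation. When a custom set of statistics is wanted for several columns, one possible approach is to apply a small function to each column with sapply(). The choice of columns and statistics here is just an illustration.

```r
# Pick a few numeric columns (an arbitrary choice for this sketch)
cols <- mtcars[, c("mpg", "hp", "wt")]

# For each column, compute a named vector of statistics;
# sapply() binds the results into a matrix (statistics x columns)
stats <- sapply(cols, function(col) {
  c(mean = mean(col), sd = sd(col), median = median(col))
})

print(round(stats, 2))
```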
A histogram shows the distribution of a numerical variable.
hist(mtcars$mpg, main="Histogram of MPG", xlab="MPG", border="blue", col="lightgreen", breaks=10)
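The numbers behind a histogram can be inspected without drawing it. As a rough sketch, calling hist() with plot = FALSE returns the bin boundaries and counts. Note that a single number passed to breaks is treated as a suggestion, so the actual number of bins may differ from 10.

```r
# Compute the histogram without plotting it
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks   # bin boundaries actually chosen by R
h$counts   # number of observations falling in each bin

# Every observation lands in exactly one bin
sum(h$counts) == length(mtcars$mpg)
```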
A boxplot displays the distribution and spread of a numerical variable.
boxplot(mtcars$mpg, main="Boxplot of MPG", ylab="MPG", col="lightblue")
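To read off the values a boxplot draws, boxplot.stats() (from the grDevices package, which is loaded by default) can be used. The sketch below extracts the five numbers behind the box and whiskers and any points that would be drawn as outliers.

```r
bs <- boxplot.stats(mtcars$mpg)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
bs$stats

# Observations lying beyond the whiskers (empty if there are none)
bs$out
```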
A bar plot is useful for categorical data.
barplot(table(mtcars$cyl), main="Bar Plot of Cylinder Counts", xlab="Number of Cylinders", ylab="Frequency", col="lightpink", border="red")
A scatterplot shows the relationship between two numerical variables.
plot(mtcars$hp, mtcars$mpg, main="Scatterplot of HP vs. MPG", xlab="Horsepower", ylab="MPG", pch=19, col="blue")
To measure how strongly two variables are linearly related, compute their correlation coefficient (cor() uses the Pearson method by default):
cor(mtcars$hp, mtcars$mpg)
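cor() also accepts a data frame of numeric columns and returns a correlation matrix, and its method argument selects a rank-based coefficient when the relationship may be monotonic rather than strictly linear. A minimal sketch, with an arbitrary choice of columns:

```r
# Pairwise Pearson correlations among a few numeric columns
vars <- mtcars[, c("mpg", "hp", "wt")]
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Spearman rank correlation for the same pair used above
spearman <- cor(mtcars$hp, mtcars$mpg, method = "spearman")
print(spearman)
```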
Descriptive analysis is the first step in understanding your data before moving on to more complex analyses or hypothesis testing. R provides a variety of functions and visualization tools to conduct a thorough descriptive analysis of your dataset. Always ensure that your interpretations align with the type of data and the nature of your research question.
Summary statistics in R:
# Summary statistics in R
data <- c(23, 45, 12, 67, 89, 34, 56, 78)
summary_stats <- summary(data)
print(summary_stats)
Descriptive analytics in R:
# Descriptive analytics in R
data <- data.frame(
  Age = c(25, 30, 22, 35, 28),
  Salary = c(50000, 60000, 45000, 70000, 55000)
)
summary_stats <- summary(data)
print(summary_stats)
R summary() function examples:
Use the summary() function to get a summary of a dataset.
# Using summary() function in R
data <- data.frame(
  Height = c(160, 170, 155, 180, 165),
  Weight = c(55, 70, 50, 85, 62)
)
data_summary <- summary(data)
print(data_summary)
Measures of central tendency in R:
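# Measures of central tendency in R
data <- c(23, 45, 12, 67, 89, 34, 56, 78)
mean_value <- mean(data)
median_value <- median(data)
# Take the most frequent value (the table name), not its count.
# Every value here occurs once, so which.max() returns the first tied entry.
freq <- table(data)
mode_value <- names(freq)[which.max(freq)]
print(paste("Mean:", mean_value))
print(paste("Median:", median_value))
print(paste("Mode:", mode_value))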
R descriptive analysis of data frames:
# Descriptive analysis of data frames in R
data <- data.frame(
  Age = c(25, 30, 22, 35, 28),
  Salary = c(50000, 60000, 45000, 70000, 55000)
)
summary_stats <- summary(data)
print(summary_stats)
Box plots and whisker plots in R:
# Box plots and whisker plots in R
set.seed(123)  # fix the random seed so the simulated groups are reproducible
data <- data.frame(
  Group = rep(c("A", "B", "C"), each = 50),
  Values = c(rnorm(50), rnorm(50, mean = 2), rnorm(50, mean = 4))
)
boxplot(Values ~ Group, data = data, main = "Box Plot")
Histograms and frequency distributions in R:
# Histograms and frequency distributions in R
data <- c(23, 45, 12, 67, 89, 34, 56, 78, 67, 45, 56)
hist(data, main = "Histogram", xlab = "Values", col = "lightblue", border = "black")
Descriptive analysis of categorical data in R:
# Descriptive analysis of categorical data in R
data <- c("Category1", "Category2", "Category1", "Category3", "Category2")
table_summary <- table(data)
print(table_summary)
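Raw counts are often easier to interpret as shares of the total. As a small sketch using the same illustrative vector, prop.table() converts a frequency table into proportions:

```r
data <- c("Category1", "Category2", "Category1", "Category3", "Category2")

counts <- table(data)              # frequency of each category
proportions <- prop.table(counts)  # each count divided by the total

# Express the proportions as percentages, rounded to one decimal place
print(round(100 * proportions, 1))
```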