Stratified boxplots, also known as grouped or side-by-side boxplots, let you compare the distribution of a numerical variable across the levels of a categorical variable. In R, you can create them with base R's boxplot() function or with the ggplot2 package. This tutorial walks you through both:
First, let's set up some sample data and load necessary libraries:
# Generate sample data: three groups of 50 normally distributed values
set.seed(123)
group1 <- rnorm(50, mean=50, sd=10)
group2 <- rnorm(50, mean=60, sd=15)
group3 <- rnorm(50, mean=55, sd=12)
data <- data.frame(
  Value = c(group1, group2, group3),
  Group = factor(rep(1:3, each=50))
)
# Load necessary libraries
library(ggplot2)
Using base R's boxplot() function:
boxplot(Value ~ Group, data = data, main="Stratified Boxplot", xlab="Group", ylab="Value")
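Each box is drawn from Tukey's five-number summary of its group: the box spans the lower and upper hinges, the bar inside marks the median, and the whiskers reach toward the most extreme non-outlying values. As a sketch, assuming the same simulated data as above, fivenum() can compute those summaries per group so you can see the numbers behind each box:

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# fivenum() returns Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum
five_by_group <- tapply(data$Value, data$Group, fivenum)
print(five_by_group)

# The middle value of each summary is the group median drawn inside each box
medians <- sapply(five_by_group, function(s) s[3])
print(medians)
```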
Using ggplot2:
ggplot(data, aes(x=Group, y=Value)) +
  geom_boxplot() +
  labs(title="Stratified Boxplot", x="Group", y="Value")
To give each group its own color in base R, pass a vector of colors to col:
boxplot(Value ~ Group, data = data, main="Stratified Boxplot", xlab="Group", ylab="Value", col=c("red", "green", "blue"))
Using ggplot2:
ggplot(data, aes(x=Group, y=Value, fill=Group)) +
  geom_boxplot() +
  labs(title="Stratified Boxplot", x="Group", y="Value") +
  scale_fill_manual(values=c("red", "green", "blue"))
Sometimes it's useful to overlay individual data points on the boxplot:
Using ggplot2:
ggplot(data, aes(x=Group, y=Value, fill=Group)) +
  geom_boxplot(alpha=0.7, outlier.shape=NA) +  # hide box outliers; geom_jitter() already draws every point
  geom_jitter(width=0.2) +
  labs(title="Stratified Boxplot with Data Points", x="Group", y="Value") +
  scale_fill_manual(values=c("red", "green", "blue"))
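The same overlay can be sketched in base R by drawing the boxplot first and then adding jittered points with stripchart(add = TRUE). The jitter amount of 0.2, the 70% opacity and hiding the boxplot's own outlier points are illustrative choices, not requirements:

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)
group_colors <- c("red", "green", "blue")

# Draw the boxes first. outline = FALSE hides the boxplot's own outlier
# points because stripchart() below draws every observation anyway;
# ylim is set explicitly so those extreme points are not clipped.
boxplot(Value ~ Group, data = data,
        col = adjustcolor(group_colors, alpha.f = 0.7),
        outline = FALSE, ylim = range(data$Value),
        main = "Stratified Boxplot with Data Points",
        xlab = "Group", ylab = "Value")

# Overlay the observations, jittered horizontally by up to 0.2
stripchart(Value ~ Group, data = data, vertical = TRUE,
           method = "jitter", jitter = 0.2, pch = 16, add = TRUE)
```

Both functions place the groups at x positions 1, 2 and 3 by default, which is why the points line up with the boxes.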
You might prefer horizontal boxplots in some cases:
Using ggplot2 (map the grouping variable to y; ggplot2 3.3.0 and later then draw the boxes horizontally):
ggplot(data, aes(y=Group, x=Value, fill=Group)) +
  geom_boxplot() +
  labs(title="Horizontal Stratified Boxplot", y="Group", x="Value") +
  scale_fill_manual(values=c("red", "green", "blue"))
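In base R, the equivalent is the horizontal argument of boxplot(). A minimal sketch reusing the simulated data; axis labels are left to their defaults, which the formula interface derives from the variable names:

```r
# Recreate the sample data so this block runs on its own
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# horizontal = TRUE draws one horizontal box per group, with Value on the x-axis
bp <- boxplot(Value ~ Group, data = data, horizontal = TRUE,
              col = c("red", "green", "blue"),
              main = "Horizontal Stratified Boxplot")
```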
Base R's boxplot() function provides quick and simple boxplots, but for more intricate customizations, ggplot2 is more flexible.
When adding individual data points with geom_jitter(), the width argument controls how far each point may be randomly shifted horizontally away from its group's position.
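To make the width argument concrete: a width of 0.2 moves each point sideways by a random amount of at most 0.2 in either direction, and since discrete groups sit 1 unit apart, points never drift into a neighbouring group. The sketch below illustrates the idea with base R's jitter(), whose amount argument adds uniform noise in [-amount, amount]; it is an illustration, not ggplot2's internal code. Note that geom_jitter() also jitters vertically by default; geom_point(position = position_jitter(width = 0.2, height = 0, seed = 1)) is one way to switch that off and make the layout reproducible.

```r
# Group positions on the x-axis: groups 1, 2, 3 with 50 points each
x <- rep(1:3, each = 50)

set.seed(42)
# amount = 0.2 adds uniform noise in [-0.2, 0.2] to every position
x_jittered <- jitter(x, amount = 0.2)

# Largest horizontal shift actually applied (never more than 0.2)
max(abs(x_jittered - x))

# Every point is still closer to its own group than to any other group
all(round(x_jittered) == x)
```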
Stratified boxplots provide a clear way to visualize and compare distributions across different groups. Whether you use base R or ggplot2, the key is to ensure your visualizations effectively communicate the underlying data patterns.
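Creating Grouped Boxplots in R:
Use the boxplot() function with a formula of the form value ~ group to create grouped boxplots. In the examples below, my_data stands for your own data frame.
# Example with two groups
boxplot(value ~ group, data=my_data)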
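The examples in this section assume a data frame my_data with columns such as value, group, category and factor_variable. To try them as written, a small hypothetical my_data can be simulated as below; the column contents are made up purely for illustration:

```r
# Hypothetical stand-in for my_data (all values are simulated)
set.seed(1)
n <- 90
my_data <- data.frame(
  value = rnorm(n, mean = 50, sd = 10),
  group = factor(rep(c("A", "B"), length.out = n)),          # two groups
  category = factor(rep(c("x", "y", "z"), each = n / 3)),    # three categories
  factor_variable = factor(sample(c("low", "high"), n, replace = TRUE))
)

# Example with two groups
boxplot(value ~ group, data = my_data)
```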
R Boxplot by Group or Category:
boxplot(value ~ category, data=my_data)
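The formula interface amounts to splitting the values by category and drawing one box per piece. As a sketch, with a small simulated my_data as an assumption, passing the list produced by split() yields the same box statistics and labels:

```r
# Hypothetical stand-in for my_data
set.seed(1)
my_data <- data.frame(
  value = rnorm(60, mean = 50, sd = 10),
  category = factor(rep(c("x", "y", "z"), each = 20))
)

# Formula interface: one box per level of category
by_formula <- boxplot(value ~ category, data = my_data, plot = FALSE)

# List interface: split() builds a named list with one numeric vector per level
by_list <- boxplot(split(my_data$value, my_data$category), plot = FALSE)

all.equal(by_formula$stats, by_list$stats)
by_formula$names
```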
Stratified Boxplot with ggplot2 in R:
Use ggplot2 for more customization in stratified boxplots.
library(ggplot2)
ggplot(my_data, aes(x=category, y=value)) + geom_boxplot()
Adding Colors to Stratified Boxplots in R:
ggplot(my_data, aes(x=category, y=value, fill=category)) + geom_boxplot()
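Customizing Boxplot Appearance in R:
Customize the plot with arguments such as notch, outline, and boxwex. boxwex scales the width of every box; width, by contrast, expects a vector with one relative width per box.
boxplot(value ~ group, data=my_data, notch=TRUE, outline=FALSE, boxwex=0.5)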
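Multiple Boxplots on One Graph in R:
add=TRUE draws onto the current plot instead of starting a new one, so a plot must already exist. Here my_data2 is a hypothetical second data frame with the same columns:
boxplot(value ~ group, data=my_data)
boxplot(value ~ group, data=my_data2, add=TRUE, col="lightblue", boxwex=0.3)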
R Boxplot by Factor Levels:
boxplot(value ~ factor_variable, data=my_data)
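Boxes are drawn in the order of the factor's levels, and factor() sorts character values alphabetically by default. A sketch, using a hypothetical factor_variable with the values low, medium and high, of fixing the order explicitly with the levels argument:

```r
# Hypothetical stand-in for my_data
set.seed(1)
my_data <- data.frame(
  value = rnorm(90, mean = 50, sd = 10),
  factor_variable = rep(c("low", "medium", "high"), each = 30)
)

# Default: levels sorted alphabetically -> high, low, medium
my_data$factor_variable <- factor(my_data$factor_variable)
levels(my_data$factor_variable)

# Explicit order: low, medium, high
my_data$factor_variable <- factor(my_data$factor_variable,
                                  levels = c("low", "medium", "high"))
bp <- boxplot(value ~ factor_variable, data = my_data)
bp$names
```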
Grouped Boxplot with Notched Boxes in R:
boxplot(value ~ group, data=my_data, notch=TRUE)
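Each notch extends to median ± 1.58 × IQR / √n, where IQR is the hinge spread (upper hinge minus lower hinge) and n is the group size, as documented for boxplot.stats(). The R documentation describes non-overlapping notches as strong evidence that two medians differ. The sketch below recomputes the notch limits by hand for one simulated group and compares them with the conf component that boxplot.stats() returns:

```r
set.seed(123)
x <- rnorm(50, mean = 50, sd = 10)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
hinge_spread <- five[4] - five[2]

# Notch limits computed by hand
notch_by_hand <- five[3] + c(-1.58, 1.58) * hinge_spread / sqrt(length(x))

# Notch limits as computed by R
notch_r <- boxplot.stats(x)$conf

all.equal(notch_by_hand, notch_r)
```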
Interactive Stratified Boxplot in R:
Use plotly for dynamic exploration.
library(plotly)
plot_ly(my_data, x=~category, y=~value, type="box")
Comparing Distributions with Stratified Boxplots in R:
ggplot(my_data, aes(x=category, y=value, color=category)) + geom_boxplot()
Combining Boxplots and Scatter Plots in R:
ggplot(my_data, aes(x=category, y=value)) + geom_boxplot() + geom_jitter()
R Boxplot Outliers by Group:
boxplot(value ~ group, data=my_data, outline=TRUE)
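With outline=TRUE (the default), each group's observations lying more than 1.5 times the hinge spread beyond either hinge are drawn as individual points; 1.5 is the default of the coef argument of boxplot.stats(). A sketch verifying that rule on a small made-up vector:

```r
# Made-up data: 1 to 20 plus one extreme value
x <- c(1:20, 60)

five <- fivenum(x)               # 1, 6, 11, 16, 60
spread <- five[4] - five[2]      # hinge spread = 10
lower_fence <- five[2] - 1.5 * spread
upper_fence <- five[4] + 1.5 * spread

# Points outside the fences, found by hand
by_hand <- x[x < lower_fence | x > upper_fence]

# Points R flags as outliers
by_r <- boxplot.stats(x)$out

by_hand
by_r
```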
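Boxplot Summary Statistics in R:
boxplot() invisibly returns the statistics it draws, so calling summary() on that result only lists the components of the returned list. Read its stats matrix instead: one column per group, with rows for the lower whisker, lower hinge, median, upper hinge, and upper whisker (plot=FALSE skips drawing). For conventional summary() output, apply summary() within each group.
summary_boxplot <- boxplot(value ~ group, data=my_data, plot=FALSE)
summary_boxplot$stats
tapply(my_data$value, my_data$group, summary)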
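The list returned by boxplot() also has an out component (the outlying values) and a group component (the index of the box each outlier belongs to), which together give a table of outliers by group. A sketch with made-up data in which one extreme value is planted in each group:

```r
# Hypothetical stand-in for my_data: an extreme value is appended to each group
set.seed(1)
my_data <- data.frame(
  value = c(rnorm(30, mean = 50, sd = 10), 120,
            rnorm(30, mean = 60, sd = 10), -10),
  group = factor(rep(c("A", "B"), each = 31))
)

summary_boxplot <- boxplot(value ~ group, data = my_data, plot = FALSE)

# Map each outlier's box index back to its group name
outliers <- data.frame(
  group = summary_boxplot$names[summary_boxplot$group],
  value = summary_boxplot$out
)
outliers
```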
Handling Missing Data in Stratified Boxplots in R:
Remove rows with missing values using na.omit() before plotting:
ggplot(na.omit(my_data), aes(x=category, y=value, fill=category)) + geom_boxplot()
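One caveat: na.omit() removes every row that has a missing value in any column, including columns the plot never uses. A sketch with a tiny made-up data frame (the note column is hypothetical) showing the difference, and the alternative of keeping only the plotted columns first:

```r
# Made-up data: row 2 lacks a value, row 3 lacks only an unused note
my_data <- data.frame(
  category = factor(c("a", "a", "b", "b", "b")),
  value = c(1.2, NA, 3.4, 2.8, 3.1),
  note = c("ok", "ok", NA, "ok", "ok")   # a column the plot does not use
)

# na.omit() on the whole data frame also drops row 3 because of its missing note
nrow(na.omit(my_data))

# Restricting to the plotted columns first keeps that row
plot_data <- na.omit(my_data[, c("category", "value")])
nrow(plot_data)
```

plot_data can then be passed to ggplot() in place of na.omit(my_data).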