Boxplots (or box-and-whisker plots) are a graphical representation of the distribution of data. They show the median, quartiles, and potential outliers for a dataset.
Let's walk through the basics of creating and customizing boxplots in R:
1. Basic Boxplot
Given a dataset:
data <- rnorm(100)
boxplot(data, main="Basic Boxplot", ylab="Values")
This will give you a simple boxplot of the data.
2. Multiple Boxplots
If you have multiple groups:
group1 <- rnorm(100, mean=5)
group2 <- rnorm(100, mean=7)
group3 <- rnorm(100, mean=6)
data_list <- list(Group1=group1, Group2=group2, Group3=group3)
boxplot(data_list, main="Multiple Boxplots", ylab="Values")
3. Coloring the Boxplot
You can customize the colors:
boxplot(data_list, main="Colored Boxplots", ylab="Values", col=c("red", "blue", "green"))
4. Horizontal Boxplot
Change the orientation using the horizontal argument:
boxplot(data, horizontal=TRUE, main="Horizontal Boxplot")
5. Notch
A notch can be added to the boxplot to give a rough indication of the significance of the differences between medians:
boxplot(data_list, notch=TRUE, main="Notched Boxplots", ylab="Values")
If two boxplots have notches that do not overlap, this is 'strong evidence' that their medians differ.
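To make the notch comparison concrete, here is a minimal sketch that reads the notch limits from boxplot.stats() and checks whether two groups' notches overlap. The R documentation for boxplot.stats() describes the notch as extending ±1.58 × IQR / sqrt(n) around the median, with the IQR measured between the hinges. The groups A and B and their means are made up for illustration.

```r
# Illustrative data; the group names and means are assumptions for this sketch
set.seed(42)
groups <- list(A = rnorm(100, mean = 5), B = rnorm(100, mean = 7))

# Notch limits per group: row 1 is the lower limit, row 2 the upper limit
notches <- sapply(groups, function(x) boxplot.stats(x)$conf)

# Reproduce group A's notch by hand from its hinges and median
s <- boxplot.stats(groups$A)
manual_A <- s$stats[3] + c(-1.58, 1.58) * (s$stats[4] - s$stats[2]) / sqrt(s$n)

# Two intervals overlap when each one starts before the other ends
notches_overlap <- notches[1, "A"] <= notches[2, "B"] &&
  notches[1, "B"] <= notches[2, "A"]

print(notches)
print(notches_overlap)
```

Checking the limits numerically can help when the notches look close on the plot and it is hard to judge the overlap by eye.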
6. Plotting Without Outliers
Outliers are typically plotted as individual points outside the whiskers. To suppress them:
boxplot(data, outline=FALSE, main="Boxplot without Outliers")
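Note that outline=FALSE only hides the points; it does not remove them from the data. As a rough sketch of which points count as outliers under the default settings (range = 1.5), the example below flags values lying more than 1.5 times the hinge spread beyond the hinges and compares the result with boxplot.stats(). The vector x and its two planted extreme values are assumptions made for illustration.

```r
# Illustrative data: 100 normal draws plus two planted extreme values
set.seed(1)
x <- c(rnorm(100), 6, -5)

# Hinges are the 2nd and 4th values of Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]
fence <- 1.5 * diff(hinges)   # 1.5 times the distance between the hinges

# A value is an outlier if it lies beyond either fence
is_out <- x < hinges[1] - fence | x > hinges[2] + fence
manual_out <- x[is_out]

# The same points as reported by R's own helper
builtin_out <- boxplot.stats(x)$out

print(manual_out)
print(builtin_out)
```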
7. Getting Boxplot Statistics
If you want to extract the boxplot statistics:
stats <- boxplot.stats(data)
# Five-number summary: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(stats$stats)
# Values lying beyond the whiskers (the outliers)
print(stats$out)
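For several groups at once, one option is to call boxplot() with plot = FALSE, which returns the statistics without drawing anything. The sketch below arranges them as a small table with one row per group. The column labels are names chosen here for readability, not something R provides, and the two groups are made up for illustration.

```r
# Illustrative groups; names and means are assumptions for this sketch
set.seed(123)
data_list <- list(Group1 = rnorm(100, mean = 5),
                  Group2 = rnorm(100, mean = 7))

# plot = FALSE returns the statistics without drawing the plot
bp <- boxplot(data_list, plot = FALSE)

# bp$stats has one column per group; transpose to get one row per group
summary_table <- t(bp$stats)
dimnames(summary_table) <- list(
  bp$names,
  c("lower_whisker", "lower_hinge", "median", "upper_hinge", "upper_whisker")
)

print(summary_table)
print(bp$n)   # number of observations in each group
```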
8. Adding Points to a Boxplot
Sometimes it's helpful to overlay the actual data points:
boxplot(data_list, main="Boxplot with Overlaid Points", ylab="Values")
stripchart(data_list, vertical=TRUE, method="jitter", add=TRUE, pch=21, bg="blue")
9. Customizing the Appearance
You can customize further by passing graphical parameters:
boxplot(data, col="lightblue", border="black", whisklty=2, staplelty=1, main="Customized Boxplot", ylab="Values")
Where:
whisklty: line type for the whiskers.
staplelty: line type for the boxplot staple ends.
10. Combining with Other Plots
You can combine a boxplot with other types of plots. For instance, adding a density plot:
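par(mar=c(5, 4, 4, 4) + 0.1)  # widen the right margin for the density axis
boxplot(data, horizontal=TRUE, main="Boxplot with Density", xlab="Values")
par(new=TRUE)
# Reuse the data range on the x-axis so the curve lines up with the boxplot
plot(density(data), xlim=range(data), col="red", lty=2, lwd=2, axes=FALSE, ann=FALSE)
axis(4)
mtext("Density", side=4, line=2)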
Boxplots are versatile and provide a compact view of the distribution of data, making them a crucial tool for exploratory data analysis.
R Boxplot Example:
# Create a simple boxplot
set.seed(123)
data <- rnorm(100)
boxplot(data)
How to Create Boxplots in R:
# Create boxplots for multiple groups
set.seed(123)
group1 <- rnorm(50, mean = 10, sd = 2)
group2 <- rnorm(50, mean = 15, sd = 3)
boxplot(group1, group2, names = c("Group 1", "Group 2"))
Customizing Boxplots in ggplot2 in R:
# Create a boxplot using ggplot2
library(ggplot2)
set.seed(123)
data <- data.frame(value = rnorm(100), group = rep(c("A", "B"), each = 50))
ggplot(data, aes(x = group, y = value)) +
  geom_boxplot() +
  labs(title = "Boxplot Example", x = "Group", y = "Value")
Adding Colors to Boxplots in R:
# Add colors to boxplots
set.seed(123)
data <- data.frame(value = rnorm(100), group = rep(c("A", "B"), each = 50))
boxplot(value ~ group, data = data, col = c("lightblue", "lightgreen"))
Side-by-Side Boxplots in R:
# Side-by-side boxplots
set.seed(123)
group1 <- rnorm(50, mean = 10, sd = 2)
group2 <- rnorm(50, mean = 15, sd = 3)
boxplot(group1, group2, names = c("Group 1", "Group 2"), col = c("lightblue", "lightgreen"))
Notched Boxplots in R:
# Notched boxplot
set.seed(123)
data <- rnorm(100)
boxplot(data, notch = TRUE)
Outlier Detection in Boxplots Using R:
# Outlier detection in boxplots
set.seed(123)
data <- rnorm(100)
# outline = TRUE (the default) draws outliers as individual points
boxplot(data, outline = TRUE)
Grouped Boxplots in R:
# Grouped boxplots
set.seed(123)
group <- rep(c("A", "B"), each = 50)
value <- rnorm(100)
boxplot(value ~ group)
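The formula interface also accepts more than one grouping variable, which gives one box per combination of levels. The sketch below uses ToothGrowth, a dataset that ships with R, chosen here only for illustration. The titles and colors are arbitrary.

```r
# ToothGrowth ships with R: len (response), supp (2 levels), dose (3 values)
bp <- boxplot(len ~ supp + dose, data = ToothGrowth,
              col = c("lightblue", "lightgreen"),  # recycled across the boxes
              main = "Tooth Length by Supplement and Dose",
              xlab = "Supplement.Dose", ylab = "Length")

# One box per supp/dose combination, labeled like "OJ.0.5"
print(bp$names)
print(bp$n)   # observations behind each box
```

Because the first variable in the formula varies fastest, the two colors alternate between the supplement types within each dose.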
Comparing Boxplots in R:
# Compare boxplots
set.seed(123)
group1 <- rnorm(50, mean = 10, sd = 2)
group2 <- rnorm(50, mean = 15, sd = 3)
boxplot(group1, group2, names = c("Group 1", "Group 2"), col = c("lightblue", "lightgreen"))