Stratified Boxplot in R

Stratified boxplots, also known as grouped or side-by-side boxplots, let you compare the distribution of a numerical variable across the levels of a categorical variable. In R, you can create them with base R's boxplot() function or with the ggplot2 package. This tutorial walks through both approaches:

1. Basic Setup

First, let's set up some sample data and load necessary libraries:

# Generating sample data
set.seed(123)
group1 <- rnorm(50, mean=50, sd=10)
group2 <- rnorm(50, mean=60, sd=15)
group3 <- rnorm(50, mean=55, sd=12)

data <- data.frame(
  Value = c(group1, group2, group3),
  Group = factor(rep(1:3, each=50))
)

# Load necessary libraries
library(ggplot2)
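
Before plotting, it can help to confirm that the sample data has the expected shape. The following is a minimal sketch that rebuilds the same data frame and summarizes it per group. The name group_summary is introduced here for illustration only and is not part of the original setup.

```r
# Rebuild the sample data exactly as in the setup step
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# Structure: a numeric Value column and a 3-level factor Group
str(data)

# Number of observations in each group
table(data$Group)

# Per-group mean and standard deviation (group_summary is an illustrative name)
group_summary <- aggregate(Value ~ Group, data = data,
                           FUN = function(v) c(mean = mean(v), sd = sd(v)))
print(group_summary)
```

The sample means and standard deviations should sit near the values passed to rnorm(), which gives a rough idea of what the boxplots will show.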

2. Basic Stratified Boxplot

Using base R's boxplot() function:

boxplot(Value ~ Group, data = data, main="Stratified Boxplot", xlab="Group", ylab="Value")

Using ggplot2:

ggplot(data, aes(x=Group, y=Value)) + 
  geom_boxplot() + 
  labs(title="Stratified Boxplot", x="Group", y="Value")
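
To see the numbers behind each box, the statistics can be computed directly. This is a sketch using base R's boxplot.stats(), which returns the five values that boxplot() draws. The name box_stats is illustrative. Note that ggplot2 computes the box edges as quartiles rather than hinges, so its boxes may differ slightly from these values.

```r
# Rebuild the sample data from the setup step
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# One column per group, one row per statistic (box_stats is an illustrative name)
box_stats <- sapply(split(data$Value, data$Group),
                    function(v) boxplot.stats(v)$stats)
rownames(box_stats) <- c("lower whisker", "lower hinge", "median",
                         "upper hinge", "upper whisker")
print(round(box_stats, 2))
```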

3. Customizing Appearance

Using base R:

boxplot(Value ~ Group, data = data, main="Stratified Boxplot", xlab="Group", ylab="Value",
        col=c("red", "green", "blue"))

Using ggplot2:

ggplot(data, aes(x=Group, y=Value, fill=Group)) + 
  geom_boxplot() + 
  labs(title="Stratified Boxplot", x="Group", y="Value") + 
  scale_fill_manual(values=c("red", "green", "blue"))
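
The ggplot2 example above matches colors to groups by position, which can silently change if the factor levels are reordered. A sketch of a more explicit variant, assuming ggplot2 is installed, passes a named vector to scale_fill_manual() so that each color is tied to a factor level. The names group_colors and p are illustrative.

```r
library(ggplot2)

# Rebuild the sample data from the setup step
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# Names must match the factor levels of Group ("1", "2", "3")
group_colors <- c("1" = "red", "2" = "green", "3" = "blue")

p <- ggplot(data, aes(x = Group, y = Value, fill = Group)) +
  geom_boxplot() +
  labs(title = "Stratified Boxplot", x = "Group", y = "Value") +
  scale_fill_manual(values = group_colors)
print(p)
```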

4. Adding Data Points

Sometimes it's useful to overlay individual data points on the boxplot:

Using ggplot2:

ggplot(data, aes(x=Group, y=Value, fill=Group)) + 
  geom_boxplot(alpha=0.7) + 
  geom_jitter(width=0.2) + 
  labs(title="Stratified Boxplot with Data Points", x="Group", y="Value") + 
  scale_fill_manual(values=c("red", "green", "blue"))
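
A base R counterpart is also possible. The sketch below is one way to do it, not the only one: it draws the boxes without outlier symbols, so that no point is plotted twice, and then overlays every observation with stripchart(). The variable bp is an illustrative name.

```r
# Rebuild the sample data from the setup step
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# outline = FALSE hides outlier symbols; ylim is set explicitly because
# the default axis range then ignores points beyond the whiskers
bp <- boxplot(Value ~ Group, data = data, outline = FALSE,
              ylim = range(data$Value),
              main = "Stratified Boxplot with Data Points",
              xlab = "Group", ylab = "Value",
              col = c("red", "green", "blue"))

# Overlay all observations with a small amount of horizontal jitter
stripchart(Value ~ Group, data = data, vertical = TRUE,
           method = "jitter", jitter = 0.1, pch = 16, add = TRUE)
```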

5. Horizontal Boxplots

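Horizontal boxplots can be easier to read, for example when group labels are long. Mapping the categorical variable to y, as below, requires ggplot2 3.3.0 or later; on older versions, keep the vertical mapping and add coord_flip():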

Using ggplot2:

ggplot(data, aes(y=Group, x=Value, fill=Group)) + 
  geom_boxplot() + 
  labs(title="Horizontal Stratified Boxplot", y="Group", x="Value") + 
  scale_fill_manual(values=c("red", "green", "blue"))
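
In base R, the same orientation can be sketched with the horizontal argument of boxplot(). Since the value axis is now horizontal, xlab labels the values and ylab labels the groups. The variable bp_h is an illustrative name.

```r
# Rebuild the sample data from the setup step
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

# horizontal = TRUE puts the groups on the vertical axis;
# las = 1 keeps the axis labels upright
bp_h <- boxplot(Value ~ Group, data = data, horizontal = TRUE,
                main = "Horizontal Stratified Boxplot",
                xlab = "Value", ylab = "Group",
                col = c("red", "green", "blue"), las = 1)
```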

6. Tips

  • Base R's boxplot() function provides quick and simple boxplots, but for more intricate customizations, ggplot2 is more flexible.

  • When adding individual data points with geom_jitter(), the width argument can control the spread of the jitter.
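
As a sketch of the jitter tip, assuming ggplot2 is installed: the example below uses position_jitter() so that the horizontal spread, the vertical spread, and the random seed are all explicit. Setting height = 0 keeps the plotted values exact, and outlier.shape = NA stops the boxplot layer from drawing outliers that the point layer already shows. The name p_jitter is illustrative.

```r
library(ggplot2)

# Rebuild the sample data from the setup step
set.seed(123)
data <- data.frame(
  Value = c(rnorm(50, mean = 50, sd = 10),
            rnorm(50, mean = 60, sd = 15),
            rnorm(50, mean = 55, sd = 12)),
  Group = factor(rep(1:3, each = 50))
)

p_jitter <- ggplot(data, aes(x = Group, y = Value, fill = Group)) +
  # Hide the boxplot's own outlier points to avoid drawing them twice
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  # Jitter horizontally only; the seed makes the layout reproducible
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 1)) +
  labs(title = "Stratified Boxplot with Data Points", x = "Group", y = "Value") +
  scale_fill_manual(values = c("red", "green", "blue"))
print(p_jitter)
```

Try a larger width, such as 0.3, to see the points spread further across each box.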

Conclusion

Stratified boxplots provide a clear way to visualize and compare distributions across different groups. Whether you use base R or ggplot2, the key is to ensure your visualizations effectively communicate the underlying data patterns.

  1. Creating Grouped Boxplots in R:

    • Use the boxplot() function to create grouped boxplots.
    # Example with two groups
    boxplot(value ~ group, data=my_data)
    
  2. R Boxplot by Group or Category:

    • Group data by a categorical variable in a boxplot.
    boxplot(value ~ category, data=my_data)
    
  3. Stratified Boxplot with ggplot2 in R:

    • Use ggplot2 for more customization in stratified boxplots.
    library(ggplot2)
    ggplot(my_data, aes(x=category, y=value)) + geom_boxplot()
    
  4. Adding Colors to Stratified Boxplots in R:

    • Enhance visual appeal by adding colors to boxplots.
    ggplot(my_data, aes(x=category, y=value, fill=category)) + geom_boxplot()
    
  5. Customizing Boxplot Appearance in R:

    • Customize appearance using parameters like notch, outline, and boxwex (a scale factor for the box width).
    boxplot(value ~ group, data=my_data, notch=TRUE, outline=FALSE, boxwex=0.5)
    
  6. Multiple Boxplots on One Graph in R:

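    • Compare multiple sets of groups on a single graph. add=TRUE draws onto an existing plot, so create the base plot first.
    boxplot(value ~ group, data=my_data)  # base plot
    boxplot(value ~ group, data=other_data, add=TRUE, col="lightblue")  # other_data: hypothetical second data frame with the same groups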
    
  7. R Boxplot by Factor Levels:

    • Create boxplots based on factor levels.
    boxplot(value ~ factor_variable, data=my_data)
    
  8. Grouped Boxplot with Notched Boxes in R:

    • Add notches to show an approximate confidence interval around each median; notches that do not overlap suggest that the medians differ.
    boxplot(value ~ group, data=my_data, notch=TRUE)
    
  9. Interactive Stratified Boxplot in R:

    • Use interactive plotting libraries like plotly for dynamic exploration.
    library(plotly)
    plot_ly(my_data, x=~category, y=~value, type="box")
    
  10. Comparing Distributions with Stratified Boxplots in R:

    • Compare distributions across different categories.
    ggplot(my_data, aes(x=category, y=value, color=category)) + geom_boxplot()
    
  11. Combining Boxplots and Scatter Plots in R:

    • Combine boxplots with scatter plots for a comprehensive view.
    ggplot(my_data, aes(x=category, y=value)) + geom_boxplot() + geom_jitter()
    
  12. R Boxplot Outliers by Group:

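    • Outliers are drawn by default (outline=TRUE). The object returned by boxplot() lists them in out, with their group index in group.
    bp <- boxplot(value ~ group, data=my_data, outline=TRUE)
    data.frame(group=bp$names[bp$group], value=bp$out)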
    
  13. Boxplot Summary Statistics in R:

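    • boxplot() invisibly returns the statistics it draws. The stats component holds the lower whisker, lower hinge, median, upper hinge, and upper whisker for each group.
    summary_boxplot <- boxplot(value ~ group, data=my_data)
    summary_boxplot$stats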
    
  14. Handling Missing Data in Stratified Boxplots in R:

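    • Address missing data using functions like na.omit(). Because na.omit() drops any row with a missing value in any column, select the plotted columns first.
    ggplot(na.omit(my_data[, c("category", "value")]), aes(x=category, y=value, fill=category)) + geom_boxplot()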
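
The snippets above assume a data frame named my_data that is never defined. The following is a minimal, hypothetical stand-in so they can be run as written: the column names follow the snippets, while the values, group labels, and the two injected missing values are made up for illustration. It also shows what the object returned by boxplot() contains.

```r
# Hypothetical sample data with the columns the snippets refer to
set.seed(42)
my_data <- data.frame(
  value    = c(rnorm(40, mean = 10, sd = 2), rnorm(40, mean = 14, sd = 3)),
  group    = factor(rep(c("control", "treated"), each = 40)),
  category = factor(rep(c("A", "B", "C", "D"), times = 20))
)
my_data$factor_variable <- my_data$group

# Inject two missing values, one per group, to exercise NA handling
my_data$value[c(5, 50)] <- NA

# boxplot() skips the NA values and returns the drawn statistics invisibly
bp <- boxplot(value ~ group, data = my_data, boxwex = 0.5)

bp$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$n      # number of non-missing observations per group
bp$out    # outlier values, if any
bp$group  # group index of each outlier
```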