A Pareto chart is a type of chart that contains both bars and a line graph, where individual values are represented in descending order by bars, and the cumulative total is represented by the line. It's named after Vilfredo Pareto and is commonly used in quality control to highlight the most significant factors in a set of data.
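In this tutorial, we'll walk through the Pareto Principle, a sample dataset, the data preparation steps, and plotting the chart with ggplot2.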
The Pareto Principle, also known as the 80/20 rule, states that roughly 80% of outcomes result from about 20% of causes. In quality control, this often means that a small number of causes account for a large share of the defects.
For this example, suppose we have a dataset that shows the number of defects attributed to various causes in a manufacturing process.
# Sample data
defects <- data.frame(
  Cause = c("A", "B", "C", "D", "E"),
  Frequency = c(80, 60, 40, 30, 10)
)
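Before plotting, we need to sort the causes by frequency in descending order and compute the cumulative sum and cumulative percentage: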
# Sorting the data
defects <- defects[order(-defects$Frequency), ]

# Compute cumulative sum and percentage
defects$CumulativeSum <- cumsum(defects$Frequency)
defects$CumulativePercent <- defects$CumulativeSum / sum(defects$Frequency) * 100
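With the cumulative percentage computed, the 80/20 idea can be checked directly against the data. The following is a minimal sketch, assuming an 80% cutoff; the names `threshold`, `n_vital`, and `vital_few` are illustrative and do not come from the original tutorial.

```r
# Rebuild the sample data so this block runs on its own
defects <- data.frame(
  Cause = c("A", "B", "C", "D", "E"),
  Frequency = c(80, 60, 40, 30, 10)
)
defects <- defects[order(-defects$Frequency), ]
defects$CumulativePercent <- cumsum(defects$Frequency) / sum(defects$Frequency) * 100

# Assumed cutoff for the "vital few"; 80 follows the 80/20 rule
threshold <- 80

# Index of the first cause at which the cumulative share reaches the cutoff
n_vital <- which(defects$CumulativePercent >= threshold)[1]

# All causes up to and including that one
vital_few <- defects$Cause[seq_len(n_vital)]
print(vital_few)
```

For this data, causes A, B, and C together account for 180 of the 220 defects, or about 82%.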
Using the ggplot2 package, we can now create the Pareto chart.
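# Install ggplot2 once if it is not already available
# install.packages("ggplot2")
library(ggplot2)

# Order the bars by descending frequency (note the minus sign in reorder)
ggplot(defects, aes(x = reorder(Cause, -Frequency), y = Frequency)) +
  geom_bar(stat = "identity") +
  # Draw the cumulative sum on the frequency scale;
  # the secondary axis converts it to a percentage of the total
  geom_line(aes(y = CumulativeSum), group = 1, colour = "blue") +
  geom_point(aes(y = CumulativeSum), group = 1, colour = "blue") +
  scale_y_continuous(
    name = "Frequency",
    sec.axis = sec_axis(~ . / sum(defects$Frequency) * 100,
                        name = "Cumulative Percentage")
  ) +
  labs(title = "Pareto Chart", x = "Cause") +
  theme_minimal()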
This script will generate a Pareto chart with bars showing the frequency of each cause and a line showing the cumulative percentage.
The Pareto chart is a powerful tool for visualizing and understanding the most significant factors or causes in a dataset. By identifying and focusing on these primary causes, organizations can efficiently tackle problems and improve processes.
Creating Pareto charts with ggplot2 in R:
# Example using ggplot2
library(ggplot2)

# The frequencies are already in descending order and sum to 100,
# so each count can also be read as a percentage of the total
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Frequency = c(30, 25, 20, 15, 10))

ggplot(data, aes(x = Category, y = Frequency)) +
  geom_bar(stat = "identity", fill = "blue") +
  geom_line(aes(y = cumsum(Frequency) / sum(Frequency) * 100, group = 1), color = "red") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Pareto Chart", x = "Category", y = "Frequency")
R code for Pareto analysis:
# Example Pareto analysis
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Frequency = c(30, 25, 20, 15, 10))
sorted_data <- data[order(-data$Frequency), ]
cumulative_percentage <- cumsum(sorted_data$Frequency) / sum(sorted_data$Frequency) * 100
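The sorting and cumulative-percentage steps can be bundled into one reusable function. This is a sketch only: `pareto_table` and its `freq_col` argument are hypothetical names introduced here, and the input is deliberately unsorted to show the reordering.

```r
# Hypothetical helper: sorts rows by descending frequency and
# adds cumulative sum and cumulative percentage columns
pareto_table <- function(df, freq_col = "Frequency") {
  stopifnot(is.data.frame(df),
            freq_col %in% names(df),
            is.numeric(df[[freq_col]]))
  sorted <- df[order(-df[[freq_col]]), , drop = FALSE]
  sorted$CumulativeSum <- cumsum(sorted[[freq_col]])
  sorted$CumulativePercent <- sorted$CumulativeSum / sum(sorted[[freq_col]]) * 100
  rownames(sorted) <- NULL  # reset row names after reordering
  sorted
}

# Unsorted illustrative input
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Frequency = c(20, 10, 30, 15, 25))
result <- pareto_table(data)
print(result)
```

The returned data frame can be passed straight to a plotting call, which keeps the preparation logic separate from the chart code.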
Using base R graphics for Pareto charts:
# Example using base R
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Frequency = c(30, 25, 20, 15, 10))
barplot(data$Frequency, names.arg = data$Category, col = "blue", main = "Pareto Chart")
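The base R call above draws only the bars. One way to add the cumulative line and a percentage axis with base graphics is sketched below; the names `bar_mids`, `cumulative`, `total`, and `pct_ticks` are illustrative, and the margin setting is an assumption that may need tuning for other devices.

```r
data <- data.frame(Category = c("A", "B", "C", "D", "E"),
                   Frequency = c(30, 25, 20, 15, 10))

# Sort in descending order and compute the running total
data <- data[order(-data$Frequency), ]
cumulative <- cumsum(data$Frequency)
total <- sum(data$Frequency)

# Leave room on the right for the percentage axis
old_par <- par(mar = c(5, 4, 4, 4) + 0.1)

# barplot() returns the x positions of the bar midpoints
bar_mids <- barplot(data$Frequency, names.arg = data$Category, col = "blue",
                    ylim = c(0, total), ylab = "Frequency", main = "Pareto Chart")

# Draw the cumulative frequency on the same scale as the bars
lines(bar_mids, cumulative, type = "b", col = "red", pch = 19)

# Right-hand axis: tick positions are counts, labels are percentages
pct_ticks <- seq(0, 100, by = 20)
axis(side = 4, at = pct_ticks / 100 * total, labels = paste0(pct_ticks, "%"))
mtext("Cumulative Percentage", side = 4, line = 3)

# Restore the previous graphics settings
par(old_par)
```

Extending `ylim` to the total is what lets the bars and the cumulative line share one scale.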
Adding annotations to Pareto charts in R:
# Example with annotation
ggplot(data, aes(x = Category, y = Frequency)) +
  geom_bar(stat = "identity", fill = "blue") +
  geom_line(aes(y = cumsum(Frequency) / sum(Frequency) * 100, group = 1), color = "red") +
  annotate("text", x = "C", y = 25, label = "Critical Point", color = "green") +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(title = "Annotated Pareto Chart", x = "Category", y = "Frequency")
Grouping and categorizing data for Pareto analysis in R:
# Example with grouped data
grouped_data <- data.frame(Group = rep(c("Group1", "Group2"), each = 5),
                           Category = c("A", "B", "C", "D", "E"),
                           Frequency = c(15, 20, 25, 30, 35))

ggplot(grouped_data, aes(x = Category, y = Frequency, fill = Group)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Grouped Pareto Chart", x = "Category", y = "Frequency")
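The grouped bar chart above does not include cumulative values. A possible way to compute a cumulative percentage within each group, using only base R, is sketched below. The recycling of `Category` and `Frequency` is written out explicitly here, and `grouped_sorted` is an illustrative name.

```r
grouped_data <- data.frame(
  Group = rep(c("Group1", "Group2"), each = 5),
  Category = rep(c("A", "B", "C", "D", "E"), times = 2),
  Frequency = rep(c(15, 20, 25, 30, 35), times = 2)
)

# Sort by group, then by descending frequency within each group
grouped_sorted <- grouped_data[order(grouped_data$Group, -grouped_data$Frequency), ]

# ave() applies the function within each group and returns
# a vector aligned with the rows of the data frame
grouped_sorted$CumulativePercent <- ave(
  grouped_sorted$Frequency,
  grouped_sorted$Group,
  FUN = function(f) cumsum(f) / sum(f) * 100
)

print(grouped_sorted)
```

Each group's cumulative percentage restarts at its largest category and ends at 100, so the groups can be compared on the same scale.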
Conditional formatting in Pareto charts with R:
# Example with conditional formatting
ggplot(data, aes(x = Category, y = Frequency,
                 fill = ifelse(Frequency > 25, "High", "Low"))) +
  geom_bar(stat = "identity") +
  # Give the legend a readable title instead of the raw ifelse() expression
  labs(title = "Conditional Formatting in Pareto Chart",
       x = "Category", y = "Frequency", fill = "Level")
Dynamic Pareto charts in Shiny apps with R:
# Example Shiny app with a dynamic Pareto chart
library(shiny)

ui <- fluidPage(
  selectInput("group_var", "Select Grouping Variable",
              choices = c("None", "Group1", "Group2")),
  plotOutput("pareto_chart")
)

server <- function(input, output) {
  output$pareto_chart <- renderPlot({
    # Dynamic Pareto chart based on input$group_var
    # ...
  })
}

shinyApp(ui, server)
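The skeleton above leaves the plotting logic open. One possible way to fill it in is sketched below, under several assumptions: the sample data, the helper `pareto_data`, and the choice to treat "None" as "all groups combined" are all hypothetical and not taken from the original tutorial.

```r
library(shiny)
library(ggplot2)

# Hypothetical sample data: two groups sharing the same categories
grouped_data <- data.frame(
  Group = rep(c("Group1", "Group2"), each = 5),
  Category = rep(c("A", "B", "C", "D", "E"), times = 2),
  Frequency = c(15, 20, 25, 30, 35, 40, 10, 5, 25, 20)
)

# Hypothetical helper: filter by group, total per category,
# sort in descending order, and add the cumulative sum
pareto_data <- function(df, group = "None") {
  if (group != "None") df <- df[df$Group == group, ]
  agg <- aggregate(Frequency ~ Category, data = df, FUN = sum)
  agg <- agg[order(-agg$Frequency), ]
  agg$CumulativeSum <- cumsum(agg$Frequency)
  agg
}

ui <- fluidPage(
  selectInput("group_var", "Select Grouping Variable",
              choices = c("None", "Group1", "Group2")),
  plotOutput("pareto_chart")
)

server <- function(input, output) {
  output$pareto_chart <- renderPlot({
    d <- pareto_data(grouped_data, input$group_var)
    total <- sum(d$Frequency)
    ggplot(d, aes(x = reorder(Category, -Frequency), y = Frequency)) +
      geom_col() +
      geom_line(aes(y = CumulativeSum), group = 1, colour = "blue") +
      geom_point(aes(y = CumulativeSum), colour = "blue") +
      scale_y_continuous(
        sec.axis = sec_axis(~ . / total * 100, name = "Cumulative Percentage")
      ) +
      labs(title = "Pareto Chart", x = "Category")
  })
}

# Launch the app only when running interactively
if (interactive()) shinyApp(ui, server)
```

Keeping the data preparation in a plain function means it can be checked outside of Shiny, while `renderPlot` only handles the reactive input and the drawing.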