Histograms in R

A histogram visualizes the distribution of a numeric variable by grouping its values into bins and drawing one bar per bin. In base R, the hist() function creates histograms. The sections below cover creating and customizing them:

1. Basic Histogram:

Using the built-in mtcars dataset, let's visualize the distribution of miles per gallon (mpg):

data(mtcars)

# Basic histogram
hist(mtcars$mpg, main="Histogram of Miles Per Gallon", xlab="Miles Per Gallon")

2. Customize Bins:

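The number and width of bins can drastically change the appearance of a histogram. A single number passed to breaks is treated as a suggestion, so R may pick a slightly different number of bins to get round breakpoints:

# Suggest the number of bins
hist(mtcars$mpg, breaks=15, col="skyblue", border="white")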
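
To check how many bins were actually used, or to fix the bin edges exactly, the object returned by hist() can be inspected. The following is a minimal sketch: plot=FALSE computes the histogram without drawing it, and the 2.5 mpg bin width and the name edges are arbitrary choices for illustration.

```r
data(mtcars)

# A single number is only a suggestion; hist() rounds to "pretty" breakpoints
h_suggested <- hist(mtcars$mpg, breaks = 15, plot = FALSE)
length(h_suggested$counts)   # number of bins actually used

# An explicit vector of breakpoints is used exactly as given.
# It must span the full range of the data (10.4 to 33.9 here).
edges <- seq(10, 35, by = 2.5)
h_exact <- hist(mtcars$mpg, breaks = edges, plot = FALSE)
length(h_exact$counts)       # one fewer than the number of breakpoints

# Draw the histogram with the exact breakpoints
hist(mtcars$mpg, breaks = edges, col = "skyblue", border = "white")
```

Passing a vector is the closest base R equivalent to setting a fixed bin width.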

3. Adding Density Lines:

Overlay the histogram with a density plot:

hist(mtcars$mpg, freq=FALSE, col="skyblue", border="white")
lines(density(mtcars$mpg), col="red", lwd=2)

In the above code, freq=FALSE ensures the histogram displays densities rather than frequencies, making it suitable for overlaying with a density plot.
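
The density scale can be checked numerically. On this scale each bar's area (height times bin width) is the fraction of observations in that bin, so the areas sum to 1, which is the same scale density() uses. A short sketch, using plot=FALSE to compute the histogram without drawing it:

```r
data(mtcars)

h <- hist(mtcars$mpg, plot = FALSE)  # compute only, draw nothing

# Bar area = density * bin width; on the density scale the areas sum to 1
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)

# The same relationship expressed from the raw counts
all.equal(h$density, h$counts / (sum(h$counts) * diff(h$breaks)))
```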

4. Customizing Axes and Labels:

You can also customize axes, labels, and titles:

hist(mtcars$mpg, breaks=12, col="lightgreen", border="white",
     xlab="Miles Per Gallon", ylab="Frequency", main="Customized Histogram")

5. Adding Gridlines:

Add horizontal gridlines to make the bar heights easier to read. Here nx=NA suppresses vertical lines and ny=NULL places horizontal lines at the y-axis tick marks:

hist(mtcars$mpg, breaks=12, col="lightgray", border="white")
grid(nx=NA, ny=NULL, col="darkgray", lty="dotted")

6. Customizing Plot Limits:

Define your own x and y limits:

hist(mtcars$mpg, breaks=12, col="lightblue", xlim=c(10, 35), ylim=c(0,10))

7. Adding Text:

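Display frequencies on top of each bar. hist() returns the bin midpoints (mids) and counts, which give the label positions. Extending ylim leaves room for the label above the tallest bar:

h <- hist(mtcars$mpg, breaks=12, col="pink", border="white", ylim=c(0, 10))
# pos=3 places each label just above its (x, y) coordinate
text(h$mids, h$counts, labels=h$counts, pos=3)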
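
The same components support other kinds of labels. As a sketch, the example below labels each bar with its share of the observations instead of the raw count. The names pct and bar_labels are illustrative, and the histogram is computed first with plot=FALSE so that the y-axis limit can be derived from the counts.

```r
data(mtcars)

h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

# Percentage of observations in each bin, rounded for display
pct <- round(100 * h$counts / sum(h$counts))
bar_labels <- ifelse(h$counts > 0, paste0(pct, "%"), "")  # leave empty bins unlabeled

# Draw the precomputed histogram with headroom for the labels
plot(h, col = "pink", border = "white", ylim = c(0, max(h$counts) + 1),
     main = "Histogram of Miles Per Gallon", xlab = "Miles Per Gallon")
text(h$mids, h$counts, labels = bar_labels, pos = 3)
```

Because the percentages are rounded, they may not add up to exactly 100.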

8. Advanced: ggplot2 for Histograms:

The ggplot2 package provides a more advanced and customizable way to create histograms:

# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)

ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(binwidth=2, fill="blue", alpha=0.7, color="black") +
  labs(title="Histogram using ggplot2", x="Miles Per Gallon", y="Frequency")
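
A density curve can be overlaid in ggplot2 as well. The sketch below assumes ggplot2 version 3.3.0 or later, which introduced after_stat(). Mapping y to after_stat(density) puts the bars on the density scale so that they match geom_density().

```r
library(ggplot2)  # assumes ggplot2 >= 3.3.0 for after_stat()

p <- ggplot(mtcars, aes(x = mpg)) +
  # Put the bars on the density scale instead of counts
  geom_histogram(aes(y = after_stat(density)), binwidth = 2,
                 fill = "blue", alpha = 0.7, color = "black") +
  # Kernel density estimate drawn on the same scale
  geom_density(color = "red") +
  labs(title = "Histogram with density curve", x = "Miles Per Gallon", y = "Density")

print(p)
```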

Summary:

Histograms are fundamental in data visualization for understanding the distribution of a variable. R provides easy-to-use functions, and with the right customization, you can generate insightful plots that cater to your data analysis needs.

  1. Histograms in R:

    • Description: Histograms in R are graphical representations of the distribution of a dataset. They display the frequency or density of values within specified bins.
    • Code:
      # Creating a basic histogram in R
      data_vector <- rnorm(100)
      hist(data_vector)
      
  2. Creating histograms in R:

    • Description: Creating histograms involves using the hist() function in R. By default it chooses the bins automatically using Sturges' rule and plots the histogram.
    • Code:
      # Creating a histogram in R
      data_vector <- rnorm(100)
      hist(data_vector)
      
  3. Histogram plot in R:

    • Description: The histogram plot in R provides a visual representation of the distribution of a continuous variable.
    • Code:
      # Histogram plot in R
      data_vector <- rnorm(100)
      hist(data_vector, main = "Histogram Plot", xlab = "Values", ylab = "Frequency")
      
  4. ggplot2 histogram in R:

    • Description: The ggplot2 package in R allows for creating customizable and aesthetically pleasing histograms.
    • Code:
      # Creating a ggplot2 histogram in R
      library(ggplot2)
      data_vector <- rnorm(100)
      ggplot(data.frame(x = data_vector), aes(x)) +
        geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black", alpha = 0.7) +
        labs(title = "ggplot2 Histogram", x = "Values", y = "Frequency")
      
  5. Histogram customization in R:

    • Description: Customizing histograms in R involves adjusting arguments such as colors, titles, axis labels, and the number of bins (breaks).
    • Code:
      # Customizing a histogram in R
      data_vector <- rnorm(100)
      hist(data_vector, col = "lightgreen", main = "Customized Histogram", xlab = "Values", ylab = "Frequency", breaks = 20)
      
  6. R hist() function examples:

    • Description: The hist() function in R is used for creating histograms. It can be customized by adjusting parameters such as breaks and colors.
    • Code:
      # Using the hist() function in R
      data_vector <- rnorm(100)
      hist(data_vector, col = "lightblue", main = "Histogram Example", xlab = "Values", ylab = "Frequency", breaks = 15)
      
  7. Density plots with histograms in R:

    • Description: Combining histograms with density plots provides a smoother representation of the underlying distribution.
    • Code:
      # Density plot with histogram in R
      data_vector <- rnorm(100)
      hist(data_vector, probability = TRUE, col = "lightgray", main = "Histogram with Density Plot")
      lines(density(data_vector), col = "blue", lwd = 2)
      
  8. Histogram binwidth and breaks in R:

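    • Description: In base R, the breaks argument of hist() controls the binning: a single number suggests how many bins to use, while a numeric vector gives the exact bin edges and therefore the bin width. binwidth is an argument of ggplot2's geom_histogram(), not of hist().
    • Code:
      # Setting the bin width through breaks in a histogram
      data_vector <- rnorm(100)
      # Bin edges every 0.5 units, covering the full range of the data
      bin_edges <- seq(floor(min(data_vector)), ceiling(max(data_vector)), by = 0.5)
      hist(data_vector, col = "lightpink", main = "Histogram with Custom Breaks", xlab = "Values", ylab = "Density", breaks = bin_edges, freq = FALSE)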
      
  9. Comparing multiple histograms in R:

    • Description: Comparing multiple histograms allows for visualizing the distribution of different variables or groups.
    • Code:
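      # Comparing multiple histograms in R
      data1 <- rnorm(100)
      data2 <- rnorm(100, mean = 2)
      # Shared breakpoints keep the bins of both groups aligned and cover both ranges
      shared_breaks <- pretty(range(c(data1, data2)), n = 20)
      # hist() has no alpha argument; adjustcolor() makes the fill colors semi-transparent
      col1 <- adjustcolor("lightblue", alpha.f = 0.5)
      col2 <- adjustcolor("lightgreen", alpha.f = 0.5)
      hist(data1, breaks = shared_breaks, col = col1, main = "Comparison of Histograms", xlab = "Values", ylab = "Frequency")
      hist(data2, breaks = shared_breaks, col = col2, add = TRUE)
      legend("topright", legend = c("Group 1", "Group 2"), fill = c(col1, col2))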
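
As a follow-up to the items on breaks, hist() also accepts the name of a binning rule in place of a number. The sketch below compares the three rules built into base R ("Sturges", "Scott", and "FD" for Freedman-Diaconis) on the same sample. The seed and the three-panel layout are arbitrary choices for illustration.

```r
set.seed(42)                 # fixed seed so the sample is reproducible
data_vector <- rnorm(100)

# Number of bins suggested by each built-in rule
nclass.Sturges(data_vector)
nclass.scott(data_vector)
nclass.FD(data_vector)

# The same rules can be selected by name through breaks
rules <- c("Sturges", "Scott", "FD")
op <- par(mfrow = c(1, 3))   # three panels side by side
for (rule in rules) {
  hist(data_vector, breaks = rule, col = "lightblue", main = paste("breaks =", rule))
}
par(op)                      # restore the previous layout
```

As with a numeric suggestion, the bin count from a rule is passed through pretty(), so the plotted number of bins can differ slightly from what the nclass functions report.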