Hierarchical clustering is a method of clustering that builds nested clusters by successively merging or splitting groups. The result of hierarchical clustering is a tree-like diagram called a dendrogram, which displays the sequence in which groups are merged or split.
Here's a basic tutorial on hierarchical clustering in R:
For this example, we'll use the built-in mtcars dataset:
data(mtcars)
Before performing hierarchical clustering, we need to compute the distance between each pair of samples. We usually use the Euclidean distance, though other distances can be chosen depending on the data:
dist_matrix <- dist(mtcars, method = "euclidean")
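Because the mtcars variables are on very different scales (e.g., horsepower versus weight), the largest-valued columns can dominate a Euclidean distance. A common precaution, sketched here with base R's scale(), is to standardize each column before computing distances:

```r
# Standardize each column to mean 0 and standard deviation 1,
# so no single variable dominates the Euclidean distance
data(mtcars)
scaled_mtcars <- scale(mtcars)
dist_scaled <- dist(scaled_mtcars, method = "euclidean")
```

Whether to scale depends on the data; if all variables share comparable units, the raw distances may be preferable.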
We'll use the hclust() function to perform the hierarchical clustering:
hc <- hclust(dist_matrix, method = "complete")
There are various methods available, such as "complete", "average", "single", and others. Each method determines how the distance between clusters is measured.
You can visualize the result of the clustering using a dendrogram:
plot(hc, main = "Hierarchical Clustering: MTCARS", xlab = "", sub = "", cex = 0.9)
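If you already have a target number of clusters in mind, base R's rect.hclust() can draw rectangles around those groups directly on the dendrogram. A minimal sketch, assuming the hc object from above and 3 clusters:

```r
# Build the clustering as above, then outline 3 clusters on the dendrogram
data(mtcars)
hc <- hclust(dist(mtcars, method = "euclidean"), method = "complete")
plot(hc, main = "Hierarchical Clustering: MTCARS", xlab = "", sub = "", cex = 0.9)
rect.hclust(hc, k = 3, border = "red")  # red boxes around the 3 groups
```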
You can cut the dendrogram to create a specific number of clusters:
k <- 3
clusters <- cutree(hc, k)
This will assign each car in the mtcars dataset to one of the k clusters.
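A quick way to check the result is table(), which counts how many observations fall into each cluster:

```r
# Count the number of cars assigned to each of the 3 clusters
data(mtcars)
hc <- hclust(dist(mtcars, method = "euclidean"), method = "complete")
clusters <- cutree(hc, k = 3)
table(clusters)
```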
After creating clusters, you might want to examine the members of each cluster:
cluster1 <- mtcars[clusters == 1, ]
cluster2 <- mtcars[clusters == 2, ]
cluster3 <- mtcars[clusters == 3, ]
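To characterize what distinguishes the clusters, one option is to compute per-cluster means of every variable with base R's aggregate(). A sketch, reusing the clusters vector from above:

```r
# Mean of each mtcars variable within each cluster
data(mtcars)
hc <- hclust(dist(mtcars, method = "euclidean"), method = "complete")
clusters <- cutree(hc, k = 3)
aggregate(mtcars, by = list(cluster = clusters), FUN = mean)
```

Large differences in a column's cluster means (e.g., in hp or mpg) suggest which variables drive the grouping.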
For more advanced hierarchical clustering functionality, you can explore packages such as dendextend, dynamicTreeCut, and others.
dendextend: provides tools for visualizing and comparing dendrograms.
dynamicTreeCut: offers methods to cut the tree dynamically (rather than at a fixed k), which is useful when the number of clusters isn't known in advance.
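As an illustration of the first package, dendextend's color_branches() colors the dendrogram branches by cluster membership. This sketch assumes dendextend is installed (install.packages("dendextend")):

```r
# Requires the dendextend package: install.packages("dendextend")
library(dendextend)

data(mtcars)
hc <- hclust(dist(mtcars, method = "euclidean"), method = "complete")
dend <- as.dendrogram(hc)
dend <- color_branches(dend, k = 3)  # color branches by 3-cluster membership
plot(dend, main = "Dendrogram with Colored Branches")
```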
Hierarchical clustering is a versatile clustering method that can provide insights into the nested structure of data. One of its strengths is that, unlike K-means clustering, it doesn't require the user to specify the number of clusters beforehand. However, interpretation can become challenging when working with large datasets.
Hierarchical clustering in R:
# Hierarchical clustering in R
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
hc_result <- hclust(dist(data_matrix))
Dendrogram in R:
# Creating a dendrogram in R
dendrogram <- as.dendrogram(hc_result)
plot(dendrogram)
Agglomerative clustering in R:
hclust() performs agglomerative (bottom-up) clustering, starting from single observations and successively merging the closest clusters:
# Agglomerative clustering in R
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
hc_result <- hclust(dist(data_matrix))
Hierarchical clustering methods in R:
# Hierarchical clustering with different linkage methods
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
# Complete linkage
hc_complete <- hclust(dist(data_matrix), method = "complete")
# Average linkage
hc_average <- hclust(dist(data_matrix), method = "average")
# Single linkage
hc_single <- hclust(dist(data_matrix), method = "single")
Using hclust() in R:
The hclust() function in R is used to perform hierarchical clustering. It takes a distance matrix as input and returns a hierarchical clustering object.
# Using hclust() in R
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
hc_result <- hclust(dist(data_matrix))
Distance metrics for hierarchical clustering in R:
# Hierarchical clustering with different distance metrics
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
# Euclidean distance (the default)
hc_euclidean <- hclust(dist(data_matrix))
# Manhattan distance
hc_manhattan <- hclust(dist(data_matrix, method = "manhattan"))
Cutting dendrograms in R:
The cutree() function is often used to extract cluster assignments.
# Cutting dendrograms in R
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
hc_result <- hclust(dist(data_matrix))
# Cut the dendrogram into k clusters
k_clusters <- cutree(hc_result, k = 3)
Visualizing hierarchical clustering results in R:
# Visualizing hierarchical clustering results in R
set.seed(123)
data_matrix <- matrix(rnorm(50), ncol = 5)
hc_result <- hclust(dist(data_matrix))
# Plotting the dendrogram
plot(hc_result, main = "Hierarchical Clustering Dendrogram")