Machine Learning with R: k-Nearest Neighbors (k-NN)
The k-Nearest Neighbors (k-NN) algorithm is a simple yet powerful supervised machine learning method used for classification. The principle behind k-NN is straightforward: a data point is classified according to how its neighbors are classified.
In this tutorial, we'll walk you through the steps to implement the k-NN classifier in R using the class package.
If you haven't already done so, install the class package:
install.packages("class")
Load the package:
library(class)
For the sake of demonstration, let's create a simple dataset:
data <- data.frame(
  x = c(2, 3, 4, 5, 6, 7, 8),
  y = c(4, 5, 6, 7, 6, 5, 4),
  class = c('A', 'A', 'B', 'B', 'A', 'B', 'B')
)
Split the dataset into training and testing sets. Let's use the first five rows for training and the last two for testing.
train.data <- data[1:5, 1:2]
train.labels <- data[1:5, 3]
test.data <- data[6:7, 1:2]
test.labels <- data[6:7, 3]
Now we'll use the knn() function from the class package, with k = 3 for this demonstration.
k <- 3
predictions <- knn(train = train.data, test = test.data, cl = train.labels, k = k)
print(predictions)
To evaluate the performance, we'll compare the predicted labels with the actual labels:
accuracy <- sum(predictions == test.labels) / length(test.labels)
print(paste("Accuracy:", accuracy * 100, "%"))
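For a fuller picture than a single accuracy number, you can also cross-tabulate predicted against actual labels with base R's table():

# Confusion table (rows: predicted, columns: actual)
print(table(Predicted = predictions, Actual = test.labels))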
Choice of k: The choice of k (the number of neighbors) is crucial. A smaller value of k makes predictions noisier and more sensitive to outliers, while a larger value may include points from other classes.
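To see this sensitivity, you can re-run knn() for several candidate values of k on the toy split above and compare accuracies; a minimal sketch (with only two test points, this is illustrative rather than rigorous):

# Compare test accuracy for several candidate values of k
for (k in c(1, 3, 5)) {
  preds <- knn(train = train.data, test = test.data, cl = train.labels, k = k)
  acc <- sum(preds == test.labels) / length(test.labels)
  print(paste("k =", k, "accuracy:", acc * 100, "%"))
}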
Scaling: k-NN is sensitive to varying scales between features. It's often beneficial to scale or normalize the data, especially if features have different units or vary widely in magnitude.
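As a minimal sketch (reusing train.data and test.data from above), you can standardize the training features with scale() and apply the training set's center and spread to the test set, so no information leaks from the test data:

# Standardize features; the test set reuses the training set's statistics
train.scaled <- scale(train.data)
test.scaled <- scale(test.data,
                     center = attr(train.scaled, "scaled:center"),
                     scale  = attr(train.scaled, "scaled:scale"))
predictions.scaled <- knn(train = train.scaled, test = test.scaled,
                          cl = train.labels, k = 3)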
Distance Measures: The default distance measure in k-NN is Euclidean distance. However, depending on the data, other distance measures like Manhattan, Minkowski, or cosine might be more appropriate.
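Note that knn() from the class package computes Euclidean distances only; other metrics require a different package or a manual neighbor search. A minimal sketch of finding the nearest neighbors of one test point under Manhattan distance, reusing the toy data from above:

# Manhattan (L1) distances from the first test point to each training point
manhattan <- rowSums(abs(sweep(as.matrix(train.data), 2,
                               as.numeric(test.data[1, ]))))
# Classes of the 3 nearest neighbors under this metric
nearest <- order(manhattan)[1:3]
print(train.labels[nearest])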
Large Datasets: k-NN can be computationally intensive on large datasets since it requires the calculation of distances to all points in the training dataset for each prediction.
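A quick way to see this cost is to time predictions against a larger training set; a minimal sketch with arbitrary, made-up sizes:

# Time k-NN prediction: 100 queries against 100,000 training points
big_train  <- matrix(rnorm(2e5), ncol = 2)
big_labels <- rep(c("A", "B"), length.out = nrow(big_train))
queries    <- matrix(rnorm(200), ncol = 2)
print(system.time(knn(train = big_train, test = queries,
                      cl = big_labels, k = 3)))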
The k-NN algorithm is a straightforward yet powerful method for classification tasks. In R, the class package makes implementing k-NN relatively simple. Remember to consider aspects like data scaling, the choice of k, and the most suitable distance metric for your specific problem.
How to use k-nearest neighbors in R:
Overview: k-NN is a simple and effective algorithm for classification and regression.
Code:
# Load necessary packages
library(class)

# Create a sample dataset
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = rep(c("A", "B"), each = 25))

# Create a new data point for prediction
new_data_point <- data.frame(x1 = 0, x2 = 0)

# Use k-NN for classification
predicted_class <- knn(train = data_train[, 1:2],
                       test = new_data_point,
                       cl = data_train$class,
                       k = 3)

# Display the predicted class
print("Predicted Class:")
print(predicted_class)
Implementing the k-NN algorithm in R (example):
Overview: Manually implementing the k-NN algorithm without using external packages.
Code:
# Custom k-NN function
knn_custom <- function(train_data, test_point, k = 3) {
  # Euclidean distance from the test point to every training row
  diffs <- sweep(as.matrix(train_data[, 1:2]), 2, as.numeric(test_point[1, ]))
  distances <- sqrt(rowSums(diffs^2))
  # Find the k nearest neighbors
  nearest_neighbors <- order(distances)[1:k]
  # Get the majority class among those neighbors
  majority_class <- table(train_data$class[nearest_neighbors])
  predicted_class <- names(majority_class)[which.max(majority_class)]
  return(predicted_class)
}

# Use the custom k-NN function (reuses data_train and new_data_point from above)
predicted_class_custom <- knn_custom(train_data = data_train,
                                     test_point = new_data_point,
                                     k = 3)

# Display the predicted class
print("Predicted Class (Custom k-NN):")
print(predicted_class_custom)
R k-NN classification code:
Overview: Demonstrating a simple classification task using the k-NN algorithm.
Code:
# Load necessary packages
library(class)

# Create a sample dataset
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = rep(c("A", "B"), each = 25))
data_test <- data.frame(x1 = rnorm(10), x2 = rnorm(10))

# Use k-NN for classification
predicted_classes <- knn(train = data_train[, 1:2],
                         test = data_test,
                         cl = data_train$class,
                         k = 3)

# Display the predicted classes
print("Predicted Classes:")
print(predicted_classes)
Using the caret package for k-NN in R:
Overview: Using the caret package for k-NN, which adds conveniences such as built-in resampling and automatic tuning.
Code:
# Load necessary packages
library(caret)

# Create a sample dataset (the outcome must be a factor for classification)
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = factor(rep(c("A", "B"), each = 25)))

# Define the training control: 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)

# Train a k-NN model with the caret package
model <- train(class ~ x1 + x2, data = data_train,
               method = "knn", trControl = train_control)

# Display the trained model
print("Trained k-NN Model:")
print(model)
k-NN classifier with cross-validation in R:
Overview: Using cross-validation to evaluate k-NN and select among several candidate values of k.
Code:
# Load necessary packages
library(caret)

# Create a sample dataset (factor outcome, as above)
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = factor(rep(c("A", "B"), each = 25)))

# Define the training control with 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)

# Cross-validate k-NN over a grid of candidate k values
model <- train(class ~ x1 + x2, data = data_train,
               method = "knn",
               tuneGrid = data.frame(k = c(3, 5, 7, 9, 11)),
               trControl = train_control)

# Display the cross-validated accuracy for each k and the selected model
print("Trained k-NN Model with Cross-Validation:")
print(model)