Machine Learning with R: k-Nearest Neighbors (k-NN)
The k-Nearest Neighbors (k-NN) algorithm is a simple yet powerful supervised machine learning method used for classification. The principle behind k-NN is straightforward: a data point is classified according to how its neighbors are classified.
In this tutorial, we'll walk you through the steps to implement the k-NN classifier in R using the class package.
If you haven't already done so, install the class package:
install.packages("class")
Load the package:
library(class)
For the sake of demonstration, let's create a simple dataset:
data <- data.frame(
  x = c(2, 3, 4, 5, 6, 7, 8),
  y = c(4, 5, 6, 7, 6, 5, 4),
  class = c('A', 'A', 'B', 'B', 'A', 'B', 'B')
)
Split the dataset into training and testing sets. Let's use the first five rows for training and the last two for testing.
train.data <- data[1:5, 1:2]
train.labels <- data[1:5, 3]
test.data <- data[6:7, 1:2]
test.labels <- data[6:7, 3]
Now we'll use the knn() function from the class package, with k = 3 for this demonstration.
k <- 3
predictions <- knn(train = train.data, test = test.data, cl = train.labels, k = k)
print(predictions)
To evaluate the performance, we'll compare the predicted labels with the actual labels:
accuracy <- sum(predictions == test.labels) / length(test.labels)
print(paste("Accuracy:", accuracy * 100, "%"))
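For a fuller picture than a single accuracy number, you can also cross-tabulate predicted against actual labels with base R's table():

# Confusion table (rows: predicted, columns: actual)
print(table(Predicted = predictions, Actual = test.labels))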
Choice of k: The choice of k (the number of neighbors) is crucial. A smaller value of k makes predictions noisier and more sensitive to outliers, while a larger value may include points from other classes.
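To see this sensitivity, you can re-run knn() for several candidate values of k on the toy split above and compare accuracies; a minimal sketch (with only two test points, this is illustrative rather than rigorous):

# Compare test accuracy for several candidate values of k
for (k in c(1, 3, 5)) {
  preds <- knn(train = train.data, test = test.data, cl = train.labels, k = k)
  acc <- sum(preds == test.labels) / length(test.labels)
  print(paste("k =", k, "accuracy:", acc * 100, "%"))
}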
Scaling: k-NN is sensitive to varying scales between features. It's often beneficial to scale or normalize the data, especially if features have different units or vary widely in magnitude.
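As a minimal sketch (reusing train.data and test.data from above), you can standardize the training features with scale() and apply the training set's center and spread to the test set, so no information leaks from the test data:

# Standardize features; the test set reuses the training set's statistics
train.scaled <- scale(train.data)
test.scaled <- scale(test.data,
                     center = attr(train.scaled, "scaled:center"),
                     scale  = attr(train.scaled, "scaled:scale"))
predictions.scaled <- knn(train = train.scaled, test = test.scaled,
                          cl = train.labels, k = 3)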
Distance Measures: The default distance measure in k-NN is Euclidean distance. However, depending on the data, other distance measures like Manhattan, Minkowski, or cosine might be more appropriate.
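Note that knn() from the class package computes Euclidean distances only; other metrics require a different package or a manual neighbor search. A minimal sketch of finding the nearest neighbors of one test point under Manhattan distance, reusing the toy data from above:

# Manhattan (L1) distances from the first test point to each training point
manhattan <- rowSums(abs(sweep(as.matrix(train.data), 2,
                               as.numeric(test.data[1, ]))))
# Classes of the 3 nearest neighbors under this metric
nearest <- order(manhattan)[1:3]
print(train.labels[nearest])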
Large Datasets: k-NN can be computationally intensive on large datasets since it requires the calculation of distances to all points in the training dataset for each prediction.
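A quick way to see this cost is to time predictions against a larger training set; a minimal sketch with arbitrary, made-up sizes:

# Time k-NN prediction: 100 queries against 100,000 training points
big_train  <- matrix(rnorm(2e5), ncol = 2)
big_labels <- rep(c("A", "B"), length.out = nrow(big_train))
queries    <- matrix(rnorm(200), ncol = 2)
print(system.time(knn(train = big_train, test = queries,
                      cl = big_labels, k = 3)))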
The k-NN algorithm is a straightforward yet powerful method for classification tasks. In R, the class package makes implementing k-NN relatively simple. Remember to consider aspects like data scaling, the choice of k, and the most suitable distance metric for your specific problem.
How to use k-nearest neighbors in R:
Overview: k-NN is a simple and effective algorithm for classification and regression.
Code:
# Load necessary packages
library(class)

# Create a sample dataset
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = rep(c("A", "B"), each = 25))

# Create a new data point for prediction
new_data_point <- data.frame(x1 = 0, x2 = 0)

# Use k-NN for classification
predicted_class <- knn(train = data_train[, 1:2],
                       test = new_data_point,
                       cl = data_train$class,
                       k = 3)

# Display the predicted class
print("Predicted Class:")
print(predicted_class)
Implementing the k-NN algorithm in R (example):
Overview: Manually implementing the k-NN algorithm without using external packages.
Code:
# Custom k-NN function
knn_custom <- function(train_data, test_point, k = 3) {
  # Euclidean distance from the test point to every training row
  diffs <- sweep(as.matrix(train_data[, 1:2]), 2, as.numeric(test_point[1, ]))
  distances <- sqrt(rowSums(diffs^2))
  # Find the k nearest neighbors
  nearest_neighbors <- order(distances)[1:k]
  # Get the majority class among those neighbors
  majority_class <- table(train_data$class[nearest_neighbors])
  predicted_class <- names(majority_class)[which.max(majority_class)]
  return(predicted_class)
}

# Use the custom k-NN function (reuses data_train and new_data_point from above)
predicted_class_custom <- knn_custom(train_data = data_train,
                                     test_point = new_data_point,
                                     k = 3)

# Display the predicted class
print("Predicted Class (Custom k-NN):")
print(predicted_class_custom)
R k-NN classification code:
Overview: Demonstrating a simple classification task using the k-NN algorithm.
Code:
# Load necessary packages
library(class)

# Create a sample dataset
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = rep(c("A", "B"), each = 25))
data_test <- data.frame(x1 = rnorm(10), x2 = rnorm(10))

# Use k-NN for classification
predicted_classes <- knn(train = data_train[, 1:2],
                         test = data_test,
                         cl = data_train$class,
                         k = 3)

# Display the predicted classes
print("Predicted Classes:")
print(predicted_classes)
Using the caret package for k-NN in R:
Overview: Using the caret package for k-NN, which adds conveniences such as built-in resampling and automatic tuning.
Code:
# Load necessary packages
library(caret)

# Create a sample dataset (the outcome must be a factor for classification)
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = factor(rep(c("A", "B"), each = 25)))

# Define the training control: 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)

# Train a k-NN model with the caret package
model <- train(class ~ x1 + x2, data = data_train,
               method = "knn", trControl = train_control)

# Display the trained model
print("Trained k-NN Model:")
print(model)
k-NN classifier with cross-validation in R:
Overview: Using cross-validation to evaluate k-NN and select among several candidate values of k.
Code:
# Load necessary packages
library(caret)

# Create a sample dataset (factor outcome, as above)
set.seed(123)
data_train <- data.frame(x1 = rnorm(50),
                         x2 = rnorm(50),
                         class = factor(rep(c("A", "B"), each = 25)))

# Define the training control with 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)

# Cross-validate k-NN over a grid of candidate k values
model <- train(class ~ x1 + x2, data = data_train,
               method = "knn",
               tuneGrid = data.frame(k = c(3, 5, 7, 9, 11)),
               trControl = train_control)

# Display the cross-validated accuracy for each k and the selected model
print("Trained k-NN Model with Cross-Validation:")
print(model)