K-NN Classifier in R

The k-Nearest Neighbors (k-NN) algorithm is a simple yet powerful supervised machine learning method used for classification. The principle behind k-NN is straightforward: a data point is assigned the class that is most common among its nearest neighbors.

In this tutorial, we'll walk through the steps to implement a k-NN classifier in R using the class package.

Step 1: Installing and Loading the Necessary Libraries

If you haven't already installed the class package:

install.packages("class")

Load the package:

library(class)

Step 2: Create a Sample Dataset

For the sake of demonstration, let's create a simple dataset:

data <- data.frame(
  x = c(2, 3, 4, 5, 6, 7, 8),
  y = c(4, 5, 6, 7, 6, 5, 4),
  class = c('A', 'A', 'B', 'B', 'A', 'B', 'B')
)

Step 3: Prepare the Data

Split the dataset into training and testing sets. Let's use the first five rows for training and the last two for testing.

train.data <- data[1:5, 1:2]
train.labels <- data[1:5, 3]
test.data <- data[6:7, 1:2]
test.labels <- data[6:7, 3]

Step 4: Implement k-NN

Now we'll use the knn() function from the class package. We'll use k = 3 for this demonstration.

k <- 3
predictions <- knn(train = train.data, test = test.data, cl = train.labels, k = k)
print(predictions)
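As a side note, knn() can also report how decisive the vote was: with prob = TRUE it attaches the proportion of neighbor votes for the winning class as an attribute. A minimal sketch on the same toy data:

```r
library(class)

# Same toy training/test split as above
train.data <- data.frame(x = c(2, 3, 4, 5, 6), y = c(4, 5, 6, 7, 6))
train.labels <- c('A', 'A', 'B', 'B', 'A')
test.data <- data.frame(x = c(7, 8), y = c(5, 4))

# prob = TRUE attaches the winning vote share as the "prob" attribute
pred <- knn(train = train.data, test = test.data, cl = train.labels, k = 3, prob = TRUE)
print(pred)               # both test points are predicted 'B'
print(attr(pred, "prob")) # 2 of the 3 neighbors voted 'B' in each case
```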

Step 5: Evaluate the Model

To evaluate the performance, we'll compare the predicted labels with the actual labels:

accuracy <- sum(predictions == test.labels) / length(test.labels)
print(paste("Accuracy:", accuracy * 100, "%"))
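Accuracy alone hides which classes get confused with which. A confusion matrix built with base R's table() makes this visible; a self-contained sketch that repeats the split above:

```r
library(class)

# Rebuild the toy dataset and split so the snippet runs on its own
data <- data.frame(
  x = c(2, 3, 4, 5, 6, 7, 8),
  y = c(4, 5, 6, 7, 6, 5, 4),
  class = c('A', 'A', 'B', 'B', 'A', 'B', 'B')
)
train.data <- data[1:5, 1:2]
train.labels <- data[1:5, 3]
test.data <- data[6:7, 1:2]
test.labels <- data[6:7, 3]

predictions <- knn(train = train.data, test = test.data, cl = train.labels, k = 3)

# Rows: predicted class, columns: actual class
confusion <- table(Predicted = predictions, Actual = test.labels)
print(confusion)
```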

Notes:

  1. Choice of k: The choice of k (the number of neighbors) is crucial. A small k makes predictions noisy and sensitive to outliers, while a large k smooths the decision boundary but may pull in points from other classes.

  2. Scaling: k-NN is sensitive to varying scales between features. It's often beneficial to scale or normalize the data, especially if features have different units or vary widely in magnitude.

  3. Distance Measures: k-NN commonly uses Euclidean distance, and the knn() function in the class package supports only Euclidean distance. Depending on the data, other measures such as Manhattan, Minkowski, or cosine distance may be more appropriate; using them requires a custom implementation or a different package.

  4. Large Datasets: k-NN can be computationally intensive on large datasets since it requires the calculation of distances to all points in the training dataset for each prediction.
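To illustrate the scaling note above, features can be standardized with scale(), computing the center and spread on the training set only and then applying that same transformation to the test set. A sketch on the toy data from the earlier steps:

```r
library(class)

# Toy dataset from the earlier steps
data <- data.frame(
  x = c(2, 3, 4, 5, 6, 7, 8),
  y = c(4, 5, 6, 7, 6, 5, 4),
  class = c('A', 'A', 'B', 'B', 'A', 'B', 'B')
)
train.data <- data[1:5, 1:2]
test.data  <- data[6:7, 1:2]

# Standardize using the training set's mean and sd only,
# then apply the identical transformation to the test set
train.scaled <- scale(train.data)
centers <- attr(train.scaled, "scaled:center")
sds     <- attr(train.scaled, "scaled:scale")
test.scaled <- scale(test.data, center = centers, scale = sds)

predictions <- knn(train = train.scaled, test = test.scaled, cl = data[1:5, 3], k = 3)
print(predictions)
```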

Conclusion:

The k-NN algorithm is a straightforward yet powerful method for classification tasks. In R, the class package makes the implementation of k-NN relatively simple. Remember to consider aspects like data scaling, choice of k, and the most suitable distance metric for your specific problem.
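One simple way to act on the choice-of-k note is to try several values and keep the one with the best held-out accuracy. A minimal sketch on synthetic data (a single train/validation split for brevity, not a substitute for proper cross-validation; the variable names are illustrative):

```r
library(class)

set.seed(42)
n <- 100
full <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                   class = factor(rep(c("A", "B"), each = n / 2)))
# Shift class B along x1 so the classes are partly separable
full$x1[full$class == "B"] <- full$x1[full$class == "B"] + 2

idx <- sample(n, 70)                  # 70/30 train/validation split
train <- full[idx, ]
valid <- full[-idx, ]

accuracy_for_k <- function(k) {
  pred <- knn(train = train[, 1:2], test = valid[, 1:2],
              cl = train$class, k = k)
  mean(pred == valid$class)
}

ks <- c(1, 3, 5, 7, 9)
accs <- sapply(ks, accuracy_for_k)
best_k <- ks[which.max(accs)]
print(data.frame(k = ks, accuracy = accs))
print(paste("Best k:", best_k))
```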

  1. How to use k-nearest neighbors in R:

    • Overview: K-NN is a simple and effective algorithm for classification and regression.

    • Code:

      # Load necessary packages
      library(class)
      
      # Create a sample dataset
      set.seed(123)
      data_train <- data.frame(x1 = rnorm(50), x2 = rnorm(50), class = rep(c("A", "B"), each = 25))
      
      # Create a new data point for prediction
      new_data_point <- data.frame(x1 = 0, x2 = 0)
      
      # Use K-NN for classification
      predicted_class <- knn(train = data_train[, 1:2], test = new_data_point, cl = data_train$class, k = 3)
      
      # Display the predicted class
      print("Predicted Class:")
      print(predicted_class)
      
  2. Implementing K-NN algorithm in R example:

    • Overview: Manually implementing the K-NN algorithm without using external packages.

    • Code:

      # Custom K-NN function (expects numeric feature columns x1, x2 and a class column)
      knn_custom <- function(train_data, test_point, k = 3) {
        # Euclidean distance from the test point to every training point
        distances <- sqrt((train_data$x1 - test_point$x1)^2 +
                          (train_data$x2 - test_point$x2)^2)
      
        # Indices of the k nearest neighbors
        nearest_neighbors <- order(distances)[1:k]
      
        # Majority vote among the neighbors
        votes <- table(train_data$class[nearest_neighbors])
        predicted_class <- names(votes)[which.max(votes)]
      
        return(predicted_class)
      }
      
      # Use the custom K-NN function (data_train and new_data_point from example 1)
      predicted_class_custom <- knn_custom(train_data = data_train, test_point = new_data_point, k = 3)
      
      # Display the predicted class
      print("Predicted Class (Custom K-NN):")
      print(predicted_class_custom)
      
  3. R K-NN classification code:

    • Overview: Demonstrating a simple classification task using the K-NN algorithm.

    • Code:

      # Load necessary packages
      library(class)
      
      # Create a sample dataset
      set.seed(123)
      data_train <- data.frame(x1 = rnorm(50), x2 = rnorm(50), class = rep(c("A", "B"), each = 25))
      data_test <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
      
      # Use K-NN for classification
      predicted_classes <- knn(train = data_train[, 1:2], test = data_test, cl = data_train$class, k = 3)
      
      # Display the predicted classes
      print("Predicted Classes:")
      print(predicted_classes)
      
  4. Using caret package for K-NN in R:

    • Overview: Utilizing the caret package for K-NN with additional functionalities.

    • Code:

      # Load necessary packages
      library(caret)
      
      # Create a sample dataset (factor outcome, as caret expects for classification)
      set.seed(123)
      data_train <- data.frame(x1 = rnorm(50), x2 = rnorm(50), class = factor(rep(c("A", "B"), each = 25)))
      
      # Define the training control
      train_control <- trainControl(method = "cv", number = 5)
      
      # Use K-NN with caret package
      model <- train(class ~ x1 + x2, data = data_train, method = "knn", trControl = train_control)
      
      # Display the trained model
      print("Trained K-NN Model:")
      print(model)
      
  5. K-NN classifier with cross-validation in R:

    • Overview: Implementing K-NN with repeated cross-validation and tuning k for better model evaluation.

    • Code:

      # Load necessary packages
      library(caret)
      
      # Create a sample dataset (factor outcome, as caret expects for classification)
      set.seed(123)
      data_train <- data.frame(x1 = rnorm(50), x2 = rnorm(50), class = factor(rep(c("A", "B"), each = 25)))
      
      # Define the training control with repeated cross-validation
      train_control <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
      
      # Use K-NN with cross-validation, tuning over several values of k
      model <- train(class ~ x1 + x2, data = data_train, method = "knn",
                     trControl = train_control, tuneGrid = data.frame(k = c(3, 5, 7, 9)))
      
      # Display the trained model with cross-validated accuracy per k
      print("Trained K-NN Model with Cross-Validation:")
      print(model)
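Besides caret, the class package itself ships knn.cv(), which performs leave-one-out cross-validation: each training point is classified using its nearest neighbors among the remaining points. A minimal sketch:

```r
library(class)

set.seed(123)
data_train <- data.frame(x1 = rnorm(50), x2 = rnorm(50),
                         class = rep(c("A", "B"), each = 25))

# Leave-one-out CV: each row is predicted from all the others
loo_pred <- knn.cv(train = data_train[, 1:2], cl = data_train$class, k = 3)
loo_accuracy <- mean(loo_pred == data_train$class)
print(paste("LOOCV accuracy:", round(loo_accuracy, 3)))
```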