Classification is one of the main tasks in supervised learning, where the aim is to assign predefined labels to new data based on patterns learned from labeled training data. R offers a rich ecosystem of packages and functions for classification tasks. In this tutorial, we will touch on a few popular classification techniques.
Logistic regression is used for binary classification problems. The glm function from the stats package (loaded by default in R) can be used with the family = "binomial" argument.

    # Simulate some data
    set.seed(123)
    n <- 100
    x <- rnorm(n)
    y <- ifelse(x + rnorm(n) > 0, 1, 0)

    # Logistic regression
    model <- glm(y ~ x, family = "binomial")
    summary(model)

    # Predict: type = "response" returns the probability that y == 1
    predictions <- predict(model, newdata = data.frame(x = c(-2, 2)), type = "response")
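Because predict() with type = "response" returns probabilities rather than labels, a cutoff is needed to obtain a class. The following is a minimal sketch that reuses the same simulated data; the 0.5 threshold and the names probs, threshold and predicted_class are assumptions made for illustration, not part of the original example.

```r
# Simulated data, generated the same way as in the logistic regression example
set.seed(123)
n <- 100
x <- rnorm(n)
y <- ifelse(x + rnorm(n) > 0, 1, 0)

model <- glm(y ~ x, family = "binomial")

# Predicted probabilities that y == 1 for the training observations
probs <- predict(model, type = "response")

# Assumed cutoff of 0.5; other values trade precision against recall
threshold <- 0.5
predicted_class <- ifelse(probs > threshold, 1, 0)

# Cross-tabulate predictions against the observed labels
confusion <- table(Predicted = predicted_class, Actual = y)
print(confusion)

# Training accuracy: share of observations classified correctly
accuracy <- mean(predicted_class == y)
print(accuracy)
```

Accuracy measured on the training data tends to be optimistic, so this only illustrates the mechanics of turning probabilities into labels.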
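For decision trees, the rpart package is commonly used in R. Because y is stored as a numeric 0/1 vector, pass method = "class" so that rpart fits a classification tree rather than a regression tree.

    # rpart ships with R as a recommended package, so this is usually unnecessary
    install.packages("rpart")
    library(rpart)

    # method = "class" requests a classification tree
    model <- rpart(y ~ x, method = "class")
    printcp(model)

    # Predict class labels
    predictions <- predict(model, newdata = data.frame(x = c(-2, 2)), type = "class")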
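Random forests are an ensemble learning method that combines many decision trees. The randomForest package provides an implementation. The response must be a factor; with a numeric response, randomForest fits a regression forest instead.

    install.packages("randomForest")
    library(randomForest)

    # Convert the 0/1 labels to a factor so that a classification forest is fitted
    y_factor <- factor(y)
    model <- randomForest(y_factor ~ x)
    print(model)

    # Predict class labels
    predictions <- predict(model, newdata = data.frame(x = c(-2, 2)))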
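For support vector machines (SVMs), the e1071 package offers the svm function. As with random forests, the response must be a factor for classification; a numeric response makes svm fit a regression model.

    install.packages("e1071")
    library(e1071)

    # A factor response makes svm() perform classification
    y_factor <- factor(y)
    model <- svm(y_factor ~ x)
    summary(model)

    # Predict class labels
    predictions <- predict(model, newdata = data.frame(x = c(-2, 2)))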
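For k-nearest neighbors (k-NN), the class package provides the knn function. It has no separate training step: the training data, the test data, and the training labels are passed in a single call.

    # class ships with R as a recommended package, so this is usually unnecessary
    install.packages("class")
    library(class)

    # cl holds the training labels; k is the number of neighbors that vote
    predictions <- knn(train = data.frame(x), test = data.frame(x = c(-2, 2)), cl = y, k = 3)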
Evaluation metrics are crucial for understanding the performance of a classifier. Common choices are accuracy, precision, recall, F1 score, the ROC curve, and AUC. The caret package offers many model evaluation tools, including confusionMatrix.

    install.packages("caret")
    library(caret)

    # Example for accuracy.
    # confusionMatrix() expects two factors with the same levels:
    # the predicted classes first, then the true classes.
    # Replace `actual_labels` with the true labels of your data.
    confusion <- confusionMatrix(predictions, actual_labels)
    print(confusion$overall["Accuracy"])
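To make the metrics named here concrete, they can also be computed by hand from a confusion table in base R. This is a sketch on made-up labels: the vectors actual and predicted are hypothetical, and class 1 is assumed to be the positive class.

```r
# Hypothetical true labels and predictions for a binary problem
actual    <- factor(c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 0, 0, 1, 0, 1, 1, 0, 1, 0), levels = c(0, 1))

# Rows are predictions, columns are true labels
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Cell counts, treating "1" as the positive class
tp <- cm["1", "1"]  # true positives
fp <- cm["1", "0"]  # false positives
fn <- cm["0", "1"]  # false negatives
tn <- cm["0", "0"]  # true negatives

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)

print(c(accuracy = accuracy, precision = precision, recall = recall, f1 = f1))
```

With these particular vectors there are four true positives, four true negatives, one false positive and one false negative, so all four metrics come out at 0.8.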
This tutorial provides a brief overview of popular classification techniques in R. Each method has its strengths and weaknesses, and the best choice depends on the dataset and the problem. Always evaluate model performance with an appropriate metric, and validate the model on out-of-sample data, for example with cross-validation.
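As an illustration of out-of-sample validation, k-fold cross-validation can be written in a few lines of base R. The sketch below assumes simulated data, five folds, logistic regression as the model, and a 0.5 cutoff; all of these choices and the variable names are illustrative.

```r
# Simulated binary classification data
set.seed(123)
n <- 100
x <- rnorm(n)
y <- ifelse(x + rnorm(n) > 0, 1, 0)
dat <- data.frame(x = x, y = y)

k <- 5
# Randomly assign each row to one of k folds of equal size
folds <- sample(rep(seq_len(k), length.out = n))

fold_accuracy <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[folds != i, ]  # fit on k - 1 folds
  test  <- dat[folds == i, ]  # evaluate on the held-out fold

  fit   <- glm(y ~ x, data = train, family = "binomial")
  probs <- predict(fit, newdata = test, type = "response")
  pred  <- ifelse(probs > 0.5, 1, 0)

  fold_accuracy[i] <- mean(pred == test$y)
}

# Per-fold accuracies and their average
print(fold_accuracy)
print(mean(fold_accuracy))
```

Every observation is used for testing exactly once, so the averaged accuracy is usually a less optimistic estimate than accuracy on the training data.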
How to Perform Classification in R:
    # Load a sample dataset
    data(iris)

    # Split the dataset into training (70%) and testing (30%) sets
    set.seed(123)
    train_indices <- sample(1:nrow(iris), 0.7 * nrow(iris))
    train_data <- iris[train_indices, ]
    test_data <- iris[-train_indices, ]
Supervised Learning in R for Classification:
    # Using a supervised learning algorithm (e.g., k-nearest neighbors)
    library(class)

    # Column 5 is Species, so it is dropped from the predictors
    predicted_species <- knn(train = train_data[, -5],
                             test = test_data[, -5],
                             cl = train_data$Species,
                             k = 3)
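The predicted labels can then be compared with the held-out species to see how well the classifier did. The following self-contained sketch repeats the split and the k-NN call and computes test accuracy with base R; the names confusion and accuracy, and the use of round() for the split size, are choices made for this illustration.

```r
library(class)  # ships with R as a recommended package

data(iris)
set.seed(123)
train_indices <- sample(seq_len(nrow(iris)), round(0.7 * nrow(iris)))
train_data <- iris[train_indices, ]
test_data  <- iris[-train_indices, ]

# k-NN with k = 3; column 5 (Species) is the label, not a predictor
predicted_species <- knn(train = train_data[, -5],
                         test = test_data[, -5],
                         cl = train_data$Species,
                         k = 3)

# Confusion table: rows are predictions, columns are true species
confusion <- table(Predicted = predicted_species, Actual = test_data$Species)
print(confusion)

# Test accuracy: correctly classified rows lie on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```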
Popular Classification Packages in R:
    # Popular classification packages
    library(caret)         # unified training and evaluation interface
    library(randomForest)  # random forests
    library(e1071)         # support vector machines
Decision Trees for Classification in R:
    # Using decision trees
    library(rpart)
    decision_tree_model <- rpart(Species ~ ., data = train_data)

    # type = "class" returns predicted labels instead of class probabilities
    predicted_species_tree <- predict(decision_tree_model, newdata = test_data, type = "class")
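Besides hard labels, a fitted rpart tree can return class probabilities, which is useful when a custom threshold or a ranking of cases is needed. This sketch contrasts type = "prob" with type = "class" on the iris split; the object names are illustrative, and method = "class" is passed explicitly although rpart would infer it from the factor response.

```r
library(rpart)  # ships with R as a recommended package

data(iris)
set.seed(123)
train_indices <- sample(seq_len(nrow(iris)), round(0.7 * nrow(iris)))
train_data <- iris[train_indices, ]
test_data  <- iris[-train_indices, ]

tree_model <- rpart(Species ~ ., data = train_data, method = "class")

# One row per test observation, one column per species; each row sums to 1
class_probs <- predict(tree_model, newdata = test_data, type = "prob")
print(head(class_probs))

# Hard labels from the same model, for comparison
class_labels <- predict(tree_model, newdata = test_data, type = "class")
print(table(Predicted = class_labels, Actual = test_data$Species))
```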
Random Forests for Classification in R:
    # Using random forests (requires library(randomForest))
    random_forest_model <- randomForest(Species ~ ., data = train_data)
    predicted_species_rf <- predict(random_forest_model, newdata = test_data)
Support Vector Machines in R for Classification:
    # Using support vector machines (requires library(e1071))
    svm_model <- svm(Species ~ ., data = train_data)
    predicted_species_svm <- predict(svm_model, newdata = test_data)
Logistic Regression in R for Classification:
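    # Using logistic regression.
    # A binomial glm handles only two classes, so keep two of the three species.
    binary_train <- droplevels(subset(train_data, Species != "setosa"))
    binary_test <- droplevels(subset(test_data, Species != "setosa"))
    logistic_model <- glm(Species ~ ., data = binary_train, family = "binomial")

    # type = "response" gives the probability of the second factor level ("virginica")
    predicted_prob <- predict(logistic_model, newdata = binary_test, type = "response")
    predicted_species_logistic <- ifelse(predicted_prob > 0.5, "virginica", "versicolor")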
Cross-Validation for Classification Models in R:
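    # Using 5-fold cross-validation with caret
    # (method = "svmRadial" also requires the kernlab package)
    ctrl <- trainControl(method = "cv", number = 5)
    svm_cv_model <- train(Species ~ ., data = train_data, method = "svmRadial", trControl = ctrl)

    # Predict with the model refitted on the full training set using the best tuning values
    predicted_species_cv <- predict(svm_cv_model, newdata = test_data)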