Supervised and unsupervised learning are two primary categories of machine learning. In this tutorial, we'll discuss their definitions, differences, and how to implement them in R.
For the supervised example, let's use the iris dataset. We'll perform a classification task using the randomForest package.
    # Install and load the necessary package
    install.packages("randomForest")  # only needed once
    library(randomForest)

    # Split the data into 70% training and 30% test rows
    set.seed(123)
    trainIndex <- sample(seq_len(nrow(iris)), size = floor(0.7 * nrow(iris)))
    trainData <- iris[trainIndex, ]
    testData <- iris[-trainIndex, ]

    # Build a random forest model
    rf_model <- randomForest(Species ~ ., data = trainData, ntree = 100)
    print(rf_model)

    # Predict on the test set and cross-tabulate against the true species
    predictions <- predict(rf_model, testData)
    table(predictions, testData$Species)
For the unsupervised example, we'll use the iris dataset again (without the Species column) and cluster it with the kmeans function.
    # Remove the Species column for unsupervised learning
    iris_unsupervised <- iris[, -5]

    # K-means clustering
    set.seed(123)
    km_result <- kmeans(iris_unsupervised, centers = 3)
    print(km_result)

    # Visualization
    install.packages("ggplot2")  # only needed once
    library(ggplot2)

    # Work on a copy so the built-in iris data keeps its original columns
    iris_clustered <- iris
    iris_clustered$Cluster <- as.factor(km_result$cluster)
    ggplot(iris_clustered, aes(Sepal.Length, Sepal.Width, color = Cluster)) +
      geom_point()
Data Labeling: Supervised learning requires labeled data, i.e., both input and corresponding desired output. In contrast, unsupervised learning works with unlabeled data.
Goal: The goal in supervised learning is to make predictions for the output variable. In unsupervised learning, the goal might be to discover structure, patterns, associations, or clusters in the data.
Evaluation: In supervised learning, model performance can be evaluated based on how well it predicts the test data. In unsupervised learning, evaluation can be trickier since there are no correct outputs to compare to.
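The evaluation difference described above can be sketched with base R alone. This is an illustration only: the regression formula, the 70/30 split, and the two metrics (test RMSE and the between/total sum-of-squares ratio) are assumptions chosen for the sketch, not part of the tutorial's own examples.

```r
# Supervised: a held-out test set gives a direct error measure.
set.seed(123)
train_idx <- sample(seq_len(nrow(mtcars)), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

fit  <- lm(mpg ~ wt + hp, data = train)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))  # compares predictions with known outputs

# Unsupervised: there are no true outputs, so only internal measures are available.
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
ss_ratio <- km$betweenss / km$totss      # share of total variation between clusters

cat("Test RMSE (supervised):", round(rmse, 2), "\n")
cat("Between/total SS (unsupervised):", round(ss_ratio, 2), "\n")
```

A high between/total ratio only says the clusters are compact and well separated; unlike RMSE, it does not say whether the grouping is "correct".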
Quality of Data: For supervised learning, ensure that the data you're using for training is representative and correctly labeled.
Choosing the Number of Clusters: For unsupervised learning, especially k-means, it's often challenging to pick the right number of clusters. Methods like the elbow method can be helpful.
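One way the elbow method might be applied is sketched below using only base R. The range of k (1 to 8), the use of scale(), and nstart = 25 are assumptions made for this illustration.

```r
# Elbow method sketch: total within-cluster sum of squares for k = 1..8
set.seed(123)
features <- scale(iris[, 1:4])  # scaling is an assumption; k-means is distance-based
k_values <- 1:8

wss <- sapply(k_values, function(k) {
  kmeans(features, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow": the k after which the curve flattens out
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The within-cluster sum of squares generally keeps falling as k grows, so the point of interest is where adding another cluster stops giving a large drop, not the minimum itself.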
Both supervised and unsupervised learning offer valuable tools for different kinds of problems. Understanding their strengths, requirements, and limitations is crucial for their effective application in R or any other platform.
Introduction to Machine Learning in R:
    # Example: Linear Regression
    model <- lm(mpg ~ wt + hp, data = mtcars)
R Packages for Supervised Learning:
Packages such as caret, randomForest, and glmnet provide various supervised learning algorithms.

    library(caret)
    library(randomForest)
    library(glmnet)
R Packages for Unsupervised Learning:
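The cluster and factoextra packages, together with the kmeans function from the stats package, are used for unsupervised learning tasks.

    library(cluster)
    library(factoextra)
    # kmeans() is part of the stats package, which is loaded by default;
    # there is no separate kmeans package to load.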
Classification Algorithms in R:
    # Example: Decision Tree
    library(rpart)
    model <- rpart(Species ~ ., data = iris)
Regression Analysis in R:
    # Example: Linear Regression
    model <- lm(mpg ~ wt + hp, data = mtcars)
Clustering Algorithms in R:
    # Example: K-Means Clustering
    model <- kmeans(iris[, 1:4], centers = 3)
Dimensionality Reduction in R:
    # Example: PCA
    model <- prcomp(iris[, 1:4])
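To show how a PCA result might be used for dimensionality reduction, here is a minimal sketch. Centering and scaling the columns, and keeping exactly two components, are assumptions made for this illustration.

```r
# PCA on the four numeric iris columns (centered and scaled by assumption)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores on the first two components give a 2-D representation of the data
reduced <- pca$x[, 1:2]
head(reduced)
```

The variance proportions are a common basis for deciding how many components to keep.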
Feature Selection in R for Supervised Learning:
    # Example: Recursive Feature Elimination
    library(caret)
    model <- rfe(mtcars[, -1], mtcars[, 1],   # predictors, then the outcome (mpg)
                 sizes = 1:10,
                 rfeControl = rfeControl(functions = lmFuncs))
Cross-Validation in R Machine Learning:
    # Example: k-Fold Cross-Validation
    library(caret)
    cv_control <- trainControl(method = "cv", number = 10)  # 10-fold CV settings
    model <- train(mpg ~ wt + hp, data = mtcars, method = "lm",
                   trControl = cv_control)
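To make the mechanics of k-fold cross-validation visible, the same idea can be written out by hand in base R. This is a sketch of the general technique, not of caret's internals; the fold assignment and the RMSE metric are assumptions for illustration.

```r
set.seed(123)
k <- 10
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of near-equal size
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: train on the other k - 1 folds, test on the held-out fold
fold_rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  pred  <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))
})

# The cross-validated estimate is the average error over the folds
cv_rmse <- mean(fold_rmse)
cat("10-fold CV RMSE:", round(cv_rmse, 2), "\n")
```

Every row is used for testing exactly once, which gives a steadier estimate on a small dataset such as mtcars than a single train/test split would.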
Model Evaluation in R:
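    # Example: Confusion Matrix
    library(caret)
    # predicted_labels and true_labels are placeholders for factors
    # that share the same levels
    confusion_matrix <- confusionMatrix(data = predicted_labels,
                                        reference = true_labels)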
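Because predicted_labels and true_labels are left undefined above, here is a self-contained sketch that builds a confusion matrix with base R's table(). The rule-based classifier is hypothetical and exists only to produce some predictions; its thresholds are assumptions for the sketch.

```r
true_labels <- iris$Species

# Hypothetical rule-based classifier, used only to generate predictions
rule_pred <- ifelse(iris$Petal.Length < 2.5, "setosa",
             ifelse(iris$Petal.Width < 1.75, "versicolor", "virginica"))
predicted_labels <- factor(rule_pred, levels = levels(true_labels))

# Rows are predictions, columns are the true classes
conf_mat <- table(Predicted = predicted_labels, Actual = true_labels)
print(conf_mat)

# Overall accuracy: correct predictions (the diagonal) over all observations
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions over the true count of each class
per_class_recall <- diag(conf_mat) / colSums(conf_mat)
print(round(per_class_recall, 3))
```

Giving the predicted factor the same levels as the true labels keeps the table square even if one class is never predicted.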
Ensemble Learning in R:
    # Example: Random Forest
    library(randomForest)
    model <- randomForest(Species ~ ., data = iris)
Association Rule Mining in R:
    # Example: Apriori Algorithm
    library(arules)
    transactions <- read.transactions("transaction_data.txt",
                                      format = "basket", sep = ",")
    rules <- apriori(transactions,
                     parameter = list(support = 0.01, confidence = 0.8))
R caret Package for Machine Learning:
The caret package provides a unified interface for various machine learning tasks.

    library(caret)

    # Example: Train a model using caret
    model <- train(mpg ~ wt + hp, data = mtcars, method = "lm")