The Naive Bayes classifier is a probabilistic classifier that applies Bayes' theorem under the assumption that the features are conditionally independent of one another given the class label. It is called "naive" because this assumption rarely holds exactly in real data, yet it makes the model simple to fit: each feature's distribution can be estimated separately within each class.
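To make the decision rule concrete, here is a minimal sketch in base R that computes the posterior by hand for one row of the built-in iris data, assuming a Gaussian distribution for every feature within each class. The helper name posterior_for and the variable names are illustrative only and are not part of any package.

```r
# Hand-rolled Gaussian Naive Bayes on iris (illustration only, base R)
features <- names(iris)[1:4]
classes  <- levels(iris$Species)

# Class priors P(class), estimated from class frequencies
prior <- setNames(as.numeric(table(iris$Species)) / nrow(iris), classes)

# Per-class mean and standard deviation of every feature (classes x features)
means <- sapply(features, function(f) tapply(iris[[f]], iris$Species, mean))
sds   <- sapply(features, function(f) tapply(iris[[f]], iris$Species, sd))

# Posterior for one observation: prior times the product of per-feature
# densities, computed on the log scale for numerical stability
posterior_for <- function(obs) {
  x <- unlist(obs[features])
  log_score <- sapply(classes, function(k) {
    log(prior[[k]]) + sum(dnorm(x, means[k, ], sds[k, ], log = TRUE))
  })
  score <- exp(log_score - max(log_score))
  score / sum(score)  # normalise so the posteriors sum to 1
}

post <- posterior_for(iris[1, ])
print(round(post, 4))
```

The predicted class is the one with the largest posterior, which can be read off with names(which.max(post)).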
Here, we'll explore how to implement the Naive Bayes classifier in R using the e1071 package.
You'll first need to install the e1071 package (once) and then load it:
install.packages("e1071")
library(e1071)
For this example, we'll use the famous iris dataset, which is built into R:
data(iris)
head(iris)
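We'll split the dataset into a training set and a test set. Each row is assigned to the training set with probability 0.7 and to the test set with probability 0.3, so the split is roughly, not exactly, 70/30:
set.seed(123)  # Set seed for reproducibility
indices <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train_data <- iris[indices == 1, ]
test_data <- iris[indices == 2, ]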
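Because sample(2, ...) assigns each row independently, the sizes of the two sets vary with the seed. If an exact 70/30 split is preferred, one possible alternative is to sample row indices directly. This is a sketch in base R; the names train_idx, train_exact and test_exact are illustrative.

```r
set.seed(123)  # Set seed for reproducibility
n <- nrow(iris)

# Draw exactly 70% of the row indices without replacement
train_idx <- sample(n, size = round(0.7 * n))

train_exact <- iris[train_idx, ]   # rows selected for training
test_exact  <- iris[-train_idx, ]  # all remaining rows

# Check the resulting sizes
c(train = nrow(train_exact), test = nrow(test_exact))
```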
Now, we'll build our Naive Bayes classifier:
model <- naiveBayes(Species ~ ., data = train_data)
print(model)
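Besides printing the whole model, its components can be inspected individually. The sketch below assumes e1071 is installed and fits on the full iris data only to stay self-contained. It relies on the apriori and tables components of the fitted object; the variable name petal_stats is illustrative.

```r
library(e1071)  # assumes the package is already installed

# Fit on the full iris data to keep this sketch self-contained
model <- naiveBayes(Species ~ ., data = iris)

# Class distribution of the response in the training data
print(model$apriori)

# For a numeric feature: one row per class, with the mean in the
# first column and the standard deviation in the second
petal_stats <- model$tables$Petal.Length
print(petal_stats)
```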
With our model in place, we can make predictions on our test set:
predictions <- predict(model, test_data)
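By default, predict returns the most probable class for each row. If the posterior probabilities themselves are of interest, the predict method for naiveBayes objects also accepts type = "raw". The following self-contained sketch repeats the split and fit from above and assumes e1071 is installed; the name probs is illustrative.

```r
library(e1071)  # assumes the package is already installed

set.seed(123)
indices <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train_data <- iris[indices == 1, ]
test_data  <- iris[indices == 2, ]

model <- naiveBayes(Species ~ ., data = train_data)

# One row per test observation, one column per class
probs <- predict(model, test_data, type = "raw")
head(round(probs, 3))
```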
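To evaluate the performance of our classifier, we can use a confusion matrix. The confusionMatrix function comes from the caret package, which must be installed separately:
# install.packages("caret")  # run once if caret is not installed
library(caret)
confusionMatrix(predictions, test_data$Species)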
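If installing caret is not an option, a plain confusion matrix and the overall accuracy can be obtained with base R's table function. This is a minimal self-contained sketch that repeats the earlier steps and assumes e1071 is installed; the names cm and accuracy are illustrative.

```r
library(e1071)  # assumes the package is already installed

set.seed(123)
indices <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train_data <- iris[indices == 1, ]
test_data  <- iris[indices == 2, ]

model <- naiveBayes(Species ~ ., data = train_data)
predictions <- predict(model, test_data)

# Rows are predicted classes, columns are the true classes
cm <- table(Predicted = predictions, Actual = test_data$Species)
print(cm)

# Overall accuracy: correctly classified rows (the diagonal) over all rows
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```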
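Laplace Smoothing: When a categorical feature takes a value in the test set that never occurred together with a given class in the training set, the model estimates a conditional probability of 0 for it. That single zero drives the whole product for that class to 0, regardless of what the other features say. This is often known as the "Zero Frequency" problem. To solve it, we use the Laplace estimator through the laplace argument (it only affects categorical features, so with the all-numeric iris predictors it leaves the fit unchanged):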
model <- naiveBayes(Species ~ ., data = train_data, laplace = 1)
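The arithmetic behind add-one smoothing can be sketched in base R. The colour feature below is made-up illustration data for a single class, not part of iris, and the formula shown is the usual Laplace estimate (count plus alpha, divided by total plus alpha times the number of categories).

```r
# Hypothetical categorical feature observed in the training rows of one class;
# "blue" is a valid level but was never seen for this class
colour <- factor(c("red", "red", "green"), levels = c("red", "green", "blue"))
counts <- table(colour)

# Unsmoothed estimate: "blue" gets probability 0
unsmoothed <- counts / sum(counts)

# Laplace estimate with alpha = 1: every category gets one pseudo-count
alpha <- 1
smoothed <- (counts + alpha) / (sum(counts) + alpha * nlevels(colour))

print(unsmoothed)
print(smoothed)
```

With smoothing, the unseen category receives a small positive probability, so it can no longer zero out the whole product for the class.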
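Kernel: For numeric attributes, naiveBayes from e1071 assumes a Gaussian distribution within each class and offers no option to change this. A kernel density estimate is available through the usekernel = TRUE option of other implementations, such as NaiveBayes in the klaR package or naive_bayes in the naivebayes package.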
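To illustrate the difference between the two likelihood estimates, the base R sketch below compares a Gaussian density with a kernel density estimate for one feature of one class. It does not call any Naive Bayes package; the evaluation point value and the variable names are chosen purely for illustration.

```r
# Petal lengths of one class from the built-in iris data
x <- iris$Petal.Length[iris$Species == "versicolor"]
value <- 4.5  # illustrative point at which to evaluate the likelihood

# Gaussian assumption: density from the class mean and standard deviation
gauss_lik <- dnorm(value, mean = mean(x), sd = sd(x))

# Kernel alternative: estimate the density from the data, then
# interpolate the estimated curve at the point of interest
kde <- density(x)
kernel_lik <- approx(kde$x, kde$y, xout = value)$y

c(gaussian = gauss_lik, kernel = kernel_lik)
```

The kernel estimate follows the shape of the observed data, which can help when a feature is clearly skewed or multimodal within a class.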
The Naive Bayes classifier, despite its simplicity and the naive design assumption, can be very effective in certain situations, especially with text data or when computational efficiency is a concern. In R, the e1071 package provides a convenient way to implement and evaluate the Naive Bayes classifier.
R Naive Bayes example code:
Overview: A basic Naive Bayes example in R on a small hand-made data set.
Code:
# Using the e1071 package for Naive Bayes
library(e1071)

# Sample data (the class label is stored as a factor, as classification expects)
data <- data.frame(
  Feature1 = c(1, 1, 0, 0, 0),
  Feature2 = c(1, 0, 1, 0, 1),
  Class = factor(c("A", "A", "B", "B", "B"))
)

# Building a Naive Bayes classifier
model <- naiveBayes(Class ~ ., data = data)

# Predicting classes for a new observation
new_data <- data.frame(Feature1 = 1, Feature2 = 1)
predictions <- predict(model, newdata = new_data)

# Printing predictions
print("Predicted Class:")
print(predictions)