A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. In this tutorial, we'll cover how to train and visualize a Decision Tree for classification in R using the rpart package.
We'll use the rpart package for creating decision trees and the rpart.plot package for tree visualization.
install.packages("rpart")
install.packages("rpart.plot")
library(rpart)
library(rpart.plot)
For this tutorial, we'll use the iris dataset, a built-in dataset in R, which contains measurements of 150 iris flowers from three different species.
head(iris)
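As a quick check of the description above, the following sketch inspects the dataset's size, column types, and class balance using base R only. The variable name species_counts is introduced here for illustration.

```r
# iris ships with base R (datasets package), so nothing needs to be downloaded
data(iris)

# Rows and columns: four numeric measurements plus the Species factor
print(dim(iris))

# Column types at a glance
str(iris)

# Number of flowers per species
species_counts <- table(iris$Species)
print(species_counts)
```

Knowing that the three classes are equally represented is useful later, when judging whether a random train/test split kept a reasonable mix of species.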
We'll split the dataset into a training set and a testing set:
set.seed(123) # Setting seed to reproduce the results
train_index <- sample(1:nrow(iris), nrow(iris)*0.7)
train_data <- iris[train_index,]
test_data <- iris[-train_index,]
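It can be worth confirming that the split behaves as intended. The sketch below repeats the same split and checks that the two sets are disjoint and together cover the whole dataset. The names split_sizes and shared_rows are illustrative, not part of the tutorial's code.

```r
set.seed(123)
train_index <- sample(1:nrow(iris), nrow(iris) * 0.7)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]

# Sizes of the two sets; together they should account for every row
split_sizes <- c(train = nrow(train_data), test = nrow(test_data))
print(split_sizes)

# No row should appear in both sets
shared_rows <- intersect(rownames(train_data), rownames(test_data))
print(length(shared_rows))

# Species counts per set: a simple random split is not stratified,
# so the three species may not be evenly represented
print(table(train_data$Species))
print(table(test_data$Species))
```

With 150 rows and a 0.7 fraction, this gives 105 training rows and 45 test rows.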
We'll use the rpart() function to train the decision tree:
tree_model <- rpart(Species ~ ., data = train_data, method = "class")
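Here, we're predicting Species based on all other variables (the . on the right-hand side of the formula stands for all other columns in the data). The argument method = "class" tells rpart() to build a classification tree.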
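To make the dot shorthand concrete, the sketch below fits the same tree twice, once with the dot and once with every predictor written out, and checks that both classify the training rows identically. The names dot_model, explicit_model, and same_predictions are illustrative.

```r
library(rpart)

set.seed(123)
train_index <- sample(1:nrow(iris), nrow(iris) * 0.7)
train_data <- iris[train_index, ]

# Formula using the dot shorthand
dot_model <- rpart(Species ~ ., data = train_data, method = "class")

# The same formula with every predictor spelled out
explicit_model <- rpart(
  Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = train_data, method = "class"
)

# Both fits should assign the same class to every training row
same_predictions <- identical(
  predict(dot_model, train_data, type = "class"),
  predict(explicit_model, train_data, type = "class")
)
print(same_predictions)

# Text view of the fitted splits, useful before plotting
print(dot_model)
```

Writing the predictors out is handy when only a subset of columns should be used, for example Species ~ Petal.Length + Petal.Width.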
To visualize the trained tree, use rpart.plot:
rpart.plot(tree_model, main="Decision Tree for Iris Dataset")
Now, we'll use the decision tree model to predict the species for the test set:
predictions <- predict(tree_model, test_data, type = "class")
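Besides hard class labels, predict() on an rpart classification tree can return class probabilities with type = "prob". The sketch below shows both and checks how often the most probable class matches the predicted label. The names class_probs, top_class, and agreement are illustrative.

```r
library(rpart)

set.seed(123)
train_index <- sample(1:nrow(iris), nrow(iris) * 0.7)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]
tree_model <- rpart(Species ~ ., data = train_data, method = "class")

# One row per test flower, one column per species
class_probs <- predict(tree_model, test_data, type = "prob")
print(head(class_probs))

# Column with the highest probability in each row
top_class <- colnames(class_probs)[max.col(class_probs, ties.method = "first")]

# Hard class labels, as used in the tutorial
predictions <- predict(tree_model, test_data, type = "class")

# Share of rows where the two agree (ties between classes could cause differences)
agreement <- mean(top_class == as.character(predictions))
print(agreement)
```

The probabilities come from the class proportions in the leaf each flower falls into, so all flowers in the same leaf share the same row of probabilities.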
We can create a confusion matrix to see how many predictions our decision tree got right:
table(pred = predictions, true = test_data$Species)
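The confusion matrix can be reduced to summary numbers. As a minimal sketch, the code below computes overall accuracy (the share of correct predictions, found on the diagonal) and per-species recall. The names conf_matrix, accuracy, and recall are illustrative.

```r
library(rpart)

set.seed(123)
train_index <- sample(1:nrow(iris), nrow(iris) * 0.7)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]
tree_model <- rpart(Species ~ ., data = train_data, method = "class")
predictions <- predict(tree_model, test_data, type = "class")

# Rows are predicted classes, columns are true classes
conf_matrix <- table(pred = predictions, true = test_data$Species)
print(conf_matrix)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(accuracy)

# Recall per species: correct predictions divided by the true count of that species
recall <- diag(conf_matrix) / colSums(conf_matrix)
print(recall)
```

Off-diagonal cells show which species are confused with each other, which a single accuracy figure hides.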
Sometimes, the tree can be too complex. Pruning can simplify it by cutting some branches, which may also help in reducing overfitting.
# Check the printcp output for optimal cp value
printcp(tree_model)
# Prune the tree at the cp value with the lowest cross-validated error (xerror)
pruned_tree <- prune(tree_model, cp = tree_model$cptable[which.min(tree_model$cptable[,"xerror"]),"CP"])
rpart.plot(pruned_tree, main = "Pruned Decision Tree")
After pruning, you should retest the model and evaluate its performance again to ensure it still performs well or even better on unseen data.
pruned_predictions <- predict(pruned_tree, test_data, type = "class")
table(pred = pruned_predictions, true = test_data$Species)
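To see what pruning changed, the full and pruned trees can be compared side by side. The sketch below reports the number of leaves and the test accuracy of each. The helpers tree_accuracy() and count_leaves() are hypothetical names defined here, not rpart functions.

```r
library(rpart)

set.seed(123)
train_index <- sample(1:nrow(iris), nrow(iris) * 0.7)
train_data <- iris[train_index, ]
test_data <- iris[-train_index, ]
tree_model <- rpart(Species ~ ., data = train_data, method = "class")

# cp value with the lowest cross-validated error, as in the tutorial
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(tree_model, cp = best_cp)

# Hypothetical helper: share of correctly classified rows
tree_accuracy <- function(model, data) {
  mean(predict(model, data, type = "class") == data$Species)
}

# Hypothetical helper: leaves are marked "<leaf>" in the model's frame
count_leaves <- function(model) {
  sum(model$frame$var == "<leaf>")
}

comparison <- data.frame(
  model = c("full", "pruned"),
  leaves = c(count_leaves(tree_model), count_leaves(pruned_tree)),
  test_accuracy = c(tree_accuracy(tree_model, test_data),
                    tree_accuracy(pruned_tree, test_data))
)
print(comparison)
```

On a small, clean dataset such as iris the two trees may turn out identical. Pruning tends to matter more on larger or noisier data.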
Decision Trees are a powerful tool for classification and regression. In R, the rpart package provides an easy-to-use interface for training and visualizing decision trees. Pruning can be an essential step to avoid overfitting and create a simpler model. Always remember to evaluate your model's performance on unseen data to ensure its effectiveness.
Creating decision trees with R:
# Creating a decision tree in R
library(rpart)
# Sample data
data(iris)
# Building a decision tree
decision_tree <- rpart(Species ~ ., data = iris)
The rpart package in R for decision trees:
The rpart package is commonly used for building decision trees in R.
# Using rpart package for decision trees
library(rpart)
# Sample data
data(iris)
# Building a decision tree with rpart
decision_tree <- rpart(Species ~ ., data = iris)
Decision tree visualization in R:
# Visualizing decision tree in R
library(rpart.plot)
# Plotting the decision tree
rpart.plot(decision_tree)
Random Forest in R:
# Using randomForest package for Random Forest
library(randomForest)
# Sample data
data(iris)
# Building a Random Forest model
random_forest_model <- randomForest(Species ~ ., data = iris)
CART algorithm in R:
# Using rpart package with CART algorithm
library(rpart)
# Sample data
data(iris)
# Building a decision tree with CART algorithm
decision_tree_cart <- rpart(Species ~ ., data = iris, method = "class")
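The same rpart() call also covers the regression side of CART when the response is numeric and method = "anova" is used. As a minimal sketch, the code below predicts Petal.Width from the other measurements and compares the tree's training error with a predict-the-mean baseline. The names reg_tree, reg_predictions, rmse, and baseline_rmse are illustrative, and the choice of Petal.Width as the response is an assumption made only for this example.

```r
library(rpart)
data(iris)

# Regression tree: numeric response, so method = "anova"
reg_tree <- rpart(
  Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
  data = iris, method = "anova"
)

# Predictions are numeric: the mean response of the leaf each row falls into
reg_predictions <- predict(reg_tree, iris)

# Root mean squared error on the training data
rmse <- sqrt(mean((iris$Petal.Width - reg_predictions)^2))

# Baseline: always predict the overall mean
baseline_rmse <- sqrt(mean((iris$Petal.Width - mean(iris$Petal.Width))^2))

print(c(tree = rmse, baseline = baseline_rmse))
```

This measures training error only. As with the classification tree, a held-out test set gives a fairer estimate of performance.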
Decision tree pruning in R:
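# Pruning a decision tree in R
# Note: cp = 0.01 is also rpart's default threshold when growing a tree, so this call
# typically leaves a tree grown with default settings unchanged; use a larger cp to remove splits
pruned_tree <- prune(decision_tree, cp = 0.01)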
Conditional inference trees in R:
# Using party package for conditional inference trees
library(party)
# Sample data
data(iris)
# Building a conditional inference tree
conditional_tree <- ctree(Species ~ ., data = iris)
Visualizing decision trees with plotly in R:
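# plotly has no "decision tree" trace type and cannot plot an rpart object directly;
# one workaround is an interactive treemap of the tree's nodes
library(plotly)
# rpart numbers the children of node k as 2k and 2k + 1, so the parent of node k is k %/% 2
node_ids <- as.integer(rownames(decision_tree$frame))
plot_ly(type = "treemap", ids = as.character(node_ids), labels = labels(decision_tree),
        parents = ifelse(node_ids == 1, "", as.character(node_ids %/% 2)),
        values = decision_tree$frame$n, branchvalues = "total")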