Decision Tree in R

A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. In this tutorial, we'll cover how to train and visualize a Decision Tree for classification in R using the rpart package.

1. Installing and Loading Required Packages:

We'll use the rpart package for creating decision trees and rpart.plot for tree visualization.

install.packages("rpart")
install.packages("rpart.plot")

library(rpart)
library(rpart.plot)

2. Sample Data:

For this tutorial, we'll use the iris dataset, a built-in dataset in R, which contains measurements of 150 iris flowers from three different species.

head(iris)
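
Before modeling, it can help to confirm the column types and the balanced 50/50/50 class distribution (an optional sanity check):

str(iris)
table(iris$Species)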

3. Splitting the Dataset:

We'll split the dataset into a training set and a testing set:

set.seed(123)  # Set the seed so the random split is reproducible
train_index <- sample(1:nrow(iris), floor(0.7 * nrow(iris)))  # 70% of rows for training
train_data <- iris[train_index,]
test_data <- iris[-train_index,]
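
A quick check confirms the split sizes: 70% of 150 rows gives 105 training rows and 45 test rows.

nrow(train_data)  # 105
nrow(test_data)   # 45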

4. Training the Decision Tree:

We'll use the rpart() function to train the decision tree:

tree_model <- rpart(Species ~ ., data = train_data, method = "class")

Here, we're predicting Species from all of the other variables; in the formula, the dot (.) stands for every other column in the data frame, and method = "class" requests a classification tree.
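
To inspect the learned split rules, print the model; summary() reports extra detail such as variable importance:

print(tree_model)
summary(tree_model)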

5. Visualizing the Decision Tree:

Using rpart.plot:

rpart.plot(tree_model, main="Decision Tree for Iris Dataset")
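
If rpart.plot is not available, base graphics can draw the same tree, though the output is plainer:

plot(tree_model, uniform = TRUE, margin = 0.1)  # draw the tree skeleton
text(tree_model, use.n = TRUE, cex = 0.8)       # add split labels and node counts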

6. Making Predictions:

Now, we'll use the decision tree model to predict the species for the test set:

predictions <- predict(tree_model, test_data, type = "class")
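
If you need class probabilities rather than hard labels, pass type = "prob" instead:

prob_predictions <- predict(tree_model, test_data, type = "prob")
head(prob_predictions)  # one probability column per species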

7. Evaluating the Model:

We can create a confusion matrix to see how many predictions our decision tree got right:

table(pred = predictions, true = test_data$Species)
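
Overall accuracy follows directly from the predictions:

mean(predictions == test_data$Species)  # proportion classified correctly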

8. Pruning the Tree:

A fully grown tree can be overly complex. Pruning simplifies it by cutting back branches that add little predictive value, which also helps reduce overfitting. In rpart, pruning is controlled by the complexity parameter (cp); a common choice is the cp value with the lowest cross-validated error (xerror) in the cp table.

# Check the printcp output for optimal cp value
printcp(tree_model)

# Prune the tree at the cp value with the lowest cross-validated error
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]
pruned_tree <- prune(tree_model, cp = best_cp)
rpart.plot(pruned_tree, main = "Pruned Decision Tree")

9. Retest with Pruned Tree:

After pruning, re-evaluate the model on the test set to confirm that the simpler tree performs at least as well on unseen data:

pruned_predictions <- predict(pruned_tree, test_data, type = "class")
table(pred = pruned_predictions, true = test_data$Species)
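
Comparing the two accuracies side by side shows whether pruning cost any predictive performance:

mean(predictions == test_data$Species)         # unpruned tree
mean(pruned_predictions == test_data$Species)  # pruned tree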

Conclusion:

Decision Trees are a powerful tool for classification and regression. In R, the rpart package provides an easy-to-use interface for training and visualizing decision trees. Pruning can be an essential step to avoid overfitting and create a simpler model. Always remember to evaluate your model's performance on unseen data to ensure its effectiveness.

  1. Creating decision trees with R:

    • Decision trees are a popular machine learning algorithm for classification and regression.
    # Creating a decision tree in R
    library(rpart)
    
    # Sample data
    data(iris)
    
    # Building a decision tree
    decision_tree <- rpart(Species ~ ., data = iris)
    
  2. Rpart package in R for decision trees:

    • The rpart package implements recursive partitioning (CART); tree growth can be tuned through rpart.control().
    # Tuning tree growth with rpart.control
    library(rpart)
    
    # Sample data
    data(iris)
    
    # Explicitly setting the default control parameters (minsplit = 20, cp = 0.01)
    decision_tree <- rpart(Species ~ ., data = iris, method = "class",
                           control = rpart.control(minsplit = 20, cp = 0.01))
    
  3. Decision tree visualization in R:

    • Visualize decision trees for better interpretation.
    # Visualizing decision tree in R
    library(rpart.plot)
    
    # Plotting the decision tree
    rpart.plot(decision_tree)
    
  4. Random Forest in R:

    • Random Forest is an ensemble learning method that builds multiple decision trees and combines their predictions.
    # Using randomForest package for Random Forest
    library(randomForest)
    
    # Sample data
    data(iris)
    
    # Building a Random Forest model
    random_forest_model <- randomForest(Species ~ ., data = iris)
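    
    • Printing the model reports the out-of-bag (OOB) error estimate, and importance() ranks the predictors.
    # OOB error estimate and per-class confusion matrix
    print(random_forest_model)
    
    # Variable importance (mean decrease in Gini)
    importance(random_forest_model)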
    
  5. CART algorithm in R:

    • CART (Classification and Regression Trees) is an algorithm used to construct decision trees.
    # Using rpart package with CART algorithm
    library(rpart)
    
    # Sample data
    data(iris)
    
    # Building a decision tree with CART algorithm
    decision_tree_cart <- rpart(Species ~ ., data = iris, method = "class")
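    
    • To classify new data, pass a data frame with the same predictor columns (the values below are illustrative).
    # Predict the species of a single hypothetical flower
    new_flower <- data.frame(Sepal.Length = 5.1, Sepal.Width = 3.5,
                             Petal.Length = 1.4, Petal.Width = 0.2)
    predict(decision_tree_cart, newdata = new_flower, type = "class")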
    
  6. Decision tree pruning in R:

    • Pruning is a technique to reduce the complexity of decision trees and avoid overfitting.
    # Pruning a decision tree in R; cp is normally chosen from the cp table
    # (see printcp() in step 8 above) rather than fixed at 0.01
    pruned_tree <- prune(decision_tree, cp = 0.01)
    
  7. Conditional inference trees in R:

    • Conditional inference trees offer a non-parametric alternative to traditional decision trees.
    # Using party package for conditional inference trees
    library(party)
    
    # Sample data
    data(iris)
    
    # Building a conditional inference tree
    conditional_tree <- ctree(Species ~ ., data = iris)
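    
    • party supplies a plot method that draws the fitted tree with the test statistic at each inner node.
    # Visualize the conditional inference tree
    plot(conditional_tree)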
    
  8. Interactive decision tree visualization in R:

    • plotly has no built-in trace type for rpart trees, so interactive tree visualization is usually done with a dedicated package; one common option is visNetwork, whose visTree() function renders an rpart model as an interactive diagram.
    # Interactive decision tree with visNetwork (one common approach)
    library(visNetwork)
    
    # Render the rpart model as an interactive, collapsible tree
    visTree(decision_tree)