Overview
Using the R commands provided, classify the entries in a simple dataset to familiarise yourself with two classifiers: a support vector machine (SVM) and a random forest (RF).
Purpose
Learn to use R for classification. Practise the classification topics discussed in the presentation. Learn to interpret the quality of a classification result.
Task
Go through the steps described below. Answer the questions in a separate file.
Time
This task should be completed in your 8th tutorial or the week after and submitted to Canvas for feedback. It should be discussed and signed off in tutorial 9 or 10.
This task should take no more than 1 hour to complete (excluding introductory videos).
Resources
- Lecture Presentation
- Code listing with more explanations (below)
- Any other material you find useful in explaining the results
Feedback
Demonstrate your steps and discuss your answers with the tutorial instructor.
Next
Get started on module 9.
Pass Task 8 — Submission Details and Assessment Criteria
Follow the steps below and answer the questions in a separate file, then upload to Canvas as a PDF. Your tutor will give online feedback and discuss the tasks with you in the lab when they are complete.
Task 8
Exercise 1
Run these lines in RStudio. They load the Iris dataset, partition it into training and testing sets (stratified by species), and specify 10-fold cross-validation repeated three times.
library("caret")
(This loads the caret library, which has the functions we need. If you need to know what a function does, you can ask for help using ?, e.g. ?trainControl. Function names in R are case-sensitive.)
iris.data <- read.csv(file.choose())
View(iris.data)
iris.data$species <- as.factor(iris.data$species)
set.seed(32984)
indexes <- createDataPartition(iris.data$species, times = 1,
p = 0.7, list = FALSE)
train <- iris.data[indexes,]
test <- iris.data[-indexes,]
trainctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
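For a factor outcome, createDataPartition samples within each class, so the species proportions are preserved in both sets. The base-R sketch below only illustrates that idea; it is not caret's implementation. It assumes the built-in iris data frame (whose class column is named Species, with a capital S) rather than the CSV file loaded above, and the names rows_by_class, train_demo and test_demo are made up for this example.

```r
# Illustration only: a hand-rolled stratified 70/30 split in base R.
# Assumption: the built-in `iris` data frame, class column `Species`.
set.seed(32984)

# Group the row numbers by class, then sample 70% of the rows in each group.
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_rows <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.7 * length(rows)))
}))

train_demo <- iris[train_rows, ]
test_demo  <- iris[-train_rows, ]

# Each class contributes the same share to both sets.
table(train_demo$Species)
table(test_demo$Species)
```

Running table() on train$species and test$species after the real createDataPartition call is a quick way to confirm the same balance in your own split.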
Here you train the SVM model (with a linear kernel, hence "svmLinear") and then apply the trained model to the test set. Printing svmlin shows the cross-validated accuracy estimated on the training dataset; confusionMatrix shows the accuracy on the test dataset.
svmlin <- train(species ~., data=train, method="svmLinear", trControl=trainctrl,
preProcess = c("center", "scale"), tuneLength=10)
svmlin
svmresult <- predict(svmlin, newdata=test)
svmresult
confusionMatrix(table(svmresult, test$species))
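The call above passes table(predictions, reference) to confusionMatrix, so rows are predicted classes and columns are true classes. As a minimal sketch of how the overall accuracy follows from such a table, the example below uses two small made-up label vectors (truth and pred are hypothetical, not output from the model) and base R only.

```r
# Illustration only: overall accuracy from a prediction/reference table.
# `truth` and `pred` are made-up labels, not results from svmlin.
lvls  <- c("a", "b", "c")
truth <- factor(c("a", "a", "b", "b", "c", "c"), levels = lvls)
pred  <- factor(c("a", "a", "b", "c", "c", "c"), levels = lvls)

# Rows = predicted class, columns = true class.
cm <- table(Prediction = pred, Reference = truth)
cm

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy

# The same number, computed directly from the label vectors.
mean(pred == truth)
```

Comparing a test-set accuracy computed this way with the resampled accuracy printed by svmlin is one way to look at how well the model generalises.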
Question 1. Does this model suffer from overfitting? How can you tell?
Exercise 2
Apply the Random Forest algorithm to the same dataset – just run the code provided.
rfmodel <- train(species ~., data=train, method="ranger", trControl=trainctrl,
preProcess = c("center", "scale"), tuneLength=10)
rfresult <- predict(rfmodel, newdata=test)
confusionMatrix(table(rfresult, test$species))
Question 2. Examine the outcomes of both SVM and Random Forest. Which gives the better results?
Question 3. The sensitivity and specificity values for each of the classes are slightly different. What does this mean?
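For a multi-class problem, sensitivity and specificity are reported per class by treating each class in turn as "positive" and all other classes as "negative". The sketch below shows that one-vs-rest calculation in base R on a hypothetical 3-class confusion table; the counts are invented for illustration and are not the results you should expect from the exercises.

```r
# Illustration only: per-class sensitivity and specificity (one-vs-rest).
# The counts in `cm` are hypothetical, not output from the SVM or RF models.
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(15,  0,  0,
                0, 14,  2,
                0,  1, 13),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

tp <- diag(cm)              # predicted as the class and truly that class
fn <- colSums(cm) - tp      # truly the class but predicted as another
fp <- rowSums(cm) - tp      # predicted as the class but truly another
tn <- sum(cm) - tp - fn - fp

sensitivity <- tp / (tp + fn)   # share of each true class that was found
specificity <- tn / (tn + fp)   # share of the other classes correctly excluded

round(rbind(sensitivity, specificity), 3)
```

In this made-up table the two classes that get confused with each other end up with different values, because each class has its own false negatives and false positives.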