CS685: Special Topics in Data Mining
Homework 3:
Due Oct 29th
Goal:
This homework reinforces your understanding of two basic classification
algorithms: decision trees and Naïve Bayes.
Description of the homework:
You will apply an existing implementation of the C4.5 algorithm and implement
a Naïve Bayes classifier yourself. Submit a report that answers the questions
below and includes the code you wrote for the Naïve Bayes classifier.
Algorithms:
1) C4.5: A decision tree implementation can be downloaded at
   http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html
2) Naïve Bayes classifier: Implement it using the method we discussed in
   class.
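
For orientation only, the general shape of a Naïve Bayes classifier for categorical attributes can be sketched as below. This is a minimal sketch, not the method from class: the class name CategoricalNaiveBayes, the add-one (Laplace) smoothing, and the toy rows are all assumptions made for illustration. If the version presented in class differs (for example, no smoothing), follow the class version.

```python
from collections import Counter, defaultdict
import math


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes.

    Assumption: add-one (Laplace) smoothing is used so that an attribute
    value never seen with a class does not force the probability to zero.
    """

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, class)][value] -> count
        self.value_counts = defaultdict(Counter)
        # distinct values seen for each attribute (needed for smoothing)
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values[i].add(value)
        return self

    def log_scores(self, row):
        """Return log P(class) + sum of log P(value | class) per class."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)
            for i, value in enumerate(row):
                count = self.value_counts[(i, c)][value]
                score += math.log((count + 1) / (n_c + len(self.values[i])))
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical toy data with two attributes; not taken from the car dataset.
toy_rows = [("low", "high"), ("low", "med"), ("high", "low"), ("high", "low")]
toy_labels = ["acc", "acc", "unacc", "unacc"]
model = CategoricalNaiveBayes().fit(toy_rows, toy_labels)
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow when there are many attributes.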
Datasets:
Car Evaluation Dataset
http://archive.ics.uci.edu/ml/datasets/Car+Evaluation
Questions to answer in the report:
1. Apply C4.5 to the Car Evaluation dataset.
   a) Build one decision tree using information gain as the selection
      criterion and one using information gain ratio. Check whether the two
      trees are the same. If not, give one example of a difference.
   b) Using gain ratio as the criterion, generate the list of rules.
   c) Use 1/10 of the dataset to build the classifier and the rest as the
      test set, then compute the accuracy of the result.
   d) Use 1/2 of the dataset to build the classifier and the rest as the
      test set, then compute the accuracy of the result. Compare with c) and
      explain the difference.
   e) Use 4-fold cross-validation to assess the accuracy of the
      classification algorithm.
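
To make the two selection criteria in question 1 a) concrete, the sketch below computes information gain and gain ratio for a single attribute. It is a minimal illustration under stated assumptions, not the C4.5 source code: the function names and the toy rows are hypothetical, and the real C4.5 program applies further heuristics when it uses gain ratio.

```python
from collections import Counter
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value of attribute index `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    groups = partition(rows, labels, attr)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split
    return information_gain(rows, labels, attr) / split_info


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 is
# uninformative, attribute 2 is a unique identifier for each row.
toy_rows = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("b", "y", 4)]
toy_labels = ["yes", "yes", "no", "no"]
```

On this toy data, attribute 0 and the identifier attribute 2 have the same information gain, but gain ratio penalizes the identifier because it splits the data into many small groups. This is one reason the two trees in a) may differ.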
2. Develop the Naïve Bayes classifier.
   a) Use 4-fold cross-validation to assess the accuracy of the algorithm,
      and compare your result with C4.5.
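
One possible way to organize k-fold cross-validation is sketched below. The helper names (k_fold_indices, cross_validate, majority_baseline), the fixed random seed, and the choice to average per-fold accuracies are assumptions made for illustration. Any classifier can be plugged in through the train_and_predict argument, which is assumed to take training rows, training labels, and test rows and to return one predicted label per test row.

```python
from collections import Counter
import random


def k_fold_indices(n, k=4, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_and_predict, k=4, seed=0):
    """Return the mean accuracy over k folds.

    Each fold is held out once as the test set while the remaining
    k - 1 folds are used for training.
    """
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        test_idx = set(held_out)
        train_rows = [r for i, r in enumerate(rows) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        test_rows = [rows[i] for i in held_out]
        predicted = train_and_predict(train_rows, train_labels, test_rows)
        correct = sum(p == labels[i] for p, i in zip(predicted, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def majority_baseline(train_rows, train_labels, test_rows):
    """Placeholder classifier: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * len(test_rows)
```

For example, cross_validate(rows, labels, majority_baseline) gives a baseline accuracy that both C4.5 and your Naïve Bayes classifier should beat. Using the same folds for both algorithms makes the comparison fairer.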
3. Give all the attributes of one car you own or know, and send them to me by
   email by the end of the week. The cars from all students will serve as our
   test set, and you should evaluate them with your classifier.
4. Please include your code in your submission.
5. Please report your impressions of the two classification algorithms in
   terms of their ease of use and interpretation, and your experience with
   the assignment.