CS685: Special Topics in Data Mining

Homework 3: Due Oct 29th

 

Goal: This homework will reinforce your understanding of two basic classification algorithms: Decision Trees and Naïve Bayes.

 

Description of the homework: 

Your homework consists of applying an existing C4.5 implementation and implementing a Naïve Bayes Classifier. Submit a report containing your answers to the questions below, together with the code you wrote for the Naïve Bayes Classifier.

 

Algorithms:

1)    A C4.5 decision tree implementation can be downloaded from

http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html

 

2)    Naïve Bayes Classifier: Implement it yourself using the method we discussed in class.
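As a starting point, a categorical Naïve Bayes classifier can be sketched in a few lines of Python. This is a minimal sketch, not necessarily the method presented in class: it assumes add-one (Laplace) smoothing, controlled by a hypothetical `alpha` parameter, and sums log-probabilities to avoid numeric underflow. The class name `CategoricalNaiveBayes` and the toy rows are hypothetical; the rows are not taken from the Car Evaluation data.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes (hypothetical sketch).

    Assumes add-alpha smoothing; alpha=1.0 is Laplace smoothing.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.n = len(rows)
        self.class_counts = Counter(labels)
        # counts[(attribute_index, value, label)] -> number of training rows
        self.counts = Counter()
        # distinct values seen per attribute, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, v, y)] += 1
                self.values[j].add(v)
        return self

    def _log_score(self, row, y):
        # Unnormalized log posterior: log P(y) + sum_j log P(x_j | y)
        score = math.log(self.class_counts[y] / self.n)
        for j, v in enumerate(row):
            num = self.counts[(j, v, y)] + self.alpha
            den = self.class_counts[y] + self.alpha * len(self.values[j])
            score += math.log(num / den)
        return score

    def predict(self, row):
        # Pick the class with the highest posterior score
        return max(self.class_counts, key=lambda y: self._log_score(row, y))


# Toy example with two attributes (buying, safety); hypothetical rows
rows = [("vhigh", "low"), ("vhigh", "high"), ("low", "high"), ("low", "high")]
labels = ["unacc", "unacc", "acc", "acc"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("low", "high")))   # acc
print(model.predict(("vhigh", "low")))  # unacc
```

Because unseen attribute values still receive a count of `alpha`, a test car with a value absent from the training data does not zero out a class's probability.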

 

Datasets:


Car Evaluation Dataset

            http://archive.ics.uci.edu/ml/datasets/Car+Evaluation

 

Questions to answer in the report:

      

1.     Apply C4.5 to the Car Evaluation dataset.

a)     Build one decision tree using information gain as the selection criterion and another using information gain ratio as the criterion. Check whether the two trees are the same. If they are not, give one example of a difference.

 

b)    Using gain ratio as the criterion, generate the list of rules.

 

c)     Use 1/10 of the dataset to build the classifier and the remaining 9/10 as the test set, and compute the accuracy of the result.

 

d)    Use 1/2 of the dataset to build the classifier and the remaining 1/2 as the test set, and compute the accuracy of the result. Compare it with c) and explain the result.

 

e)     Use 4-fold cross-validation to assess the accuracy of the classification algorithm.

 

 

 

2.     Develop the Naïve Bayes Classifier.

a)     Use 4-fold cross-validation to assess the accuracy of the algorithm, and compare your result with that of C4.5.

 

3.     Give all the attributes of one car that you own or know, and send them to me by email by the end of the week. The cars submitted by all students will serve as our test set, and you should evaluate these cars with your classifier.

 

4.     Please include your code in your submission.

 

5.     Please report your impressions of the two classification algorithms in terms of their ease of use and interpretability, and describe your experience with the assignment.
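For question 1a), it helps to see how the two selection criteria differ. The Python sketch below computes entropy, information gain, and gain ratio (information gain divided by the split information of the partition) for a categorical attribute. It is a simplified illustration, not C4.5's own code: C4.5 applies extra heuristics when choosing tests (for example, considering only tests whose gain is at least average), which this sketch omits. The function names and the toy rows are hypothetical; the many-valued attribute 0 acts like a row identifier and does not come from the Car Evaluation data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_by(rows, labels, attr):
    """Group the class labels by the value of attribute index `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return groups


def info_gain(rows, labels, attr):
    n = len(labels)
    groups = split_by(rows, labels, attr)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    n = len(labels)
    groups = split_by(rows, labels, attr)
    # Split information: entropy of the partition sizes, ignoring classes
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return info_gain(rows, labels, attr) / split_info


# Toy data: attribute 0 is unique per row, attribute 1 is binary
rows = [(str(i), "x") for i in range(5)] + [(str(i), "y") for i in range(5, 8)]
labels = ["p", "p", "p", "p", "q", "q", "q", "q"]
print(info_gain(rows, labels, 0), info_gain(rows, labels, 1))    # 1.0 vs ~0.549
print(gain_ratio(rows, labels, 0), gain_ratio(rows, labels, 1))  # ~0.333 vs ~0.575
```

On this toy data, information gain prefers the many-valued attribute 0, while gain ratio penalizes its large split information and prefers attribute 1. This is the kind of bias that can make the two trees differ.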
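Questions 1c) through 1e) and 2a) all measure accuracy on data that was held out from training. The Python sketch below shows a shuffled holdout split (a training fraction of 0.1 or 0.5) and k-fold cross-validation (k=4) for any classifier. It assumes a hypothetical interface in which `train(rows, labels)` returns a `predict(row)` function. Folds are assigned by a seeded shuffle and are not stratified by class, which is a simplification; the `majority_trainer` baseline is a hypothetical stand-in for C4.5 or your Naïve Bayes classifier.

```python
import random
from collections import Counter


def holdout_split(rows, labels, train_fraction, seed=0):
    """Shuffle, then split into ((train_rows, train_labels), (test_rows, test_labels))."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(idx))
    tr, te = idx[:cut], idx[cut:]
    return (([rows[i] for i in tr], [labels[i] for i in tr]),
            ([rows[i] for i in te], [labels[i] for i in te]))


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted class equals the true class."""
    correct = sum(predict(r) == y for r, y in zip(rows, labels))
    return correct / len(labels)


def k_fold_accuracy(train, rows, labels, k=4, seed=0):
    """Average test accuracy over k disjoint folds of a shuffled index list."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k folds of nearly equal size
    scores = []
    for fold in folds:
        held_out = set(fold)
        tr = [i for i in idx if i not in held_out]
        predict = train([rows[i] for i in tr], [labels[i] for i in tr])
        scores.append(accuracy(predict, [rows[i] for i in fold], [labels[i] for i in fold]))
    return sum(scores) / k


def majority_trainer(rows, labels):
    """Hypothetical baseline: always predict the most common training class."""
    top = Counter(labels).most_common(1)[0][0]
    return lambda row: top
```

Passing a different `seed` reshuffles the data, so reporting the average over a few seeds gives a less noisy comparison between the 1/10 and 1/2 training splits.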