CS685:  Special Topics in Data Mining

Homework 1: Due March 25th  

 

Goal:  This homework reinforces your understanding of two classification algorithms: decision trees and Naïve Bayes.

 

Description of the homework: 

In this homework you will apply an existing implementation of the C4.5 algorithm and write your own implementation of a Naïve Bayes classifier.  Submit a report that answers the questions below and includes the code you wrote for the Naïve Bayes classifier.

 

Algorithms:

1)    A C4.5 decision tree implementation can be downloaded at

http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html

 

2)    Naïve Bayes classifier: implement it yourself, using the method discussed in class.
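As a starting point for item 2, the following is a minimal sketch of a Naïve Bayes classifier for purely categorical attributes. It is not necessarily the exact method presented in class: the class name `NaiveBayes`, the use of log-probabilities, and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


class NaiveBayes:
    """Categorical Naive Bayes. Add-one smoothing is an assumption, not a requirement."""

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, class)][value] -> number of training rows
        self.value_counts = defaultdict(Counter)
        # distinct values seen for each attribute, used in the smoothing denominator
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(i, label)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over attributes of log P(value | class)
            score = math.log(count / self.n)
            for i, v in enumerate(row):
                num = self.value_counts[(i, label)][v] + 1
                den = count + len(self.values[i])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Each row is expected to be a tuple of attribute values with the class label passed separately, for example `NaiveBayes().fit(rows, labels).predict(row)`. Working in log space avoids numerical underflow when many small probabilities are multiplied.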

 

Dataset:


Car Evaluation Dataset

            http://archive.ics.uci.edu/ml/datasets/Car+Evaluation

 

Questions to answer in the report:

      

1.     Apply C4.5 to the Car Evaluation dataset.

a)     Build one decision tree using information gain as the selection criterion and another using gain ratio. Check whether the two trees are the same. If they are not, give one example of a difference.

 

b)    Using gain ratio as the criterion, generate the list of rules.

 

c)     Use 1/10 of the dataset to build the classifier and the remaining 9/10 as the test set. Compute the accuracy of the result.

 

d)    Use 1/2 of the dataset to build the classifier and the other half as the test set. Compute the accuracy of the result, compare it with c), and explain the difference.

 

e)     Use 4-fold cross-validation to assess the accuracy of the classification algorithm.
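For question 1(a), it may help to see how the two selection criteria differ numerically. The sketch below follows the textbook definitions of information gain and gain ratio; the function names are hypothetical, and the downloadable C4.5 program may apply additional heuristics that are not reproduced here.

```python
from collections import Counter
import math


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) for splitting `labels` on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # expected entropy of the class labels after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # split information: entropy of the attribute's own value distribution
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio
```

An attribute with many distinct values has a large split information, so its gain ratio is lower than that of a coarser attribute with the same information gain. This is one reason the two criteria can produce different trees.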

 

 

2.     Develop the Naïve Bayes Classifier.

a)     Use 4-fold cross-validation to assess the accuracy of your implementation, and compare the result with that of C4.5.
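The 4-fold cross-validation requested in questions 1(e) and 2(a) could be organized along the following lines. This is only a sketch: `train_and_predict` is a hypothetical callback standing in for whichever classifier is being evaluated, and the fixed random seed is an assumption made so that both algorithms are scored on the same folds.

```python
import random


def k_fold_indices(n, k=4, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, train_and_predict, k=4, seed=0):
    """Mean accuracy over k folds (assumes len(rows) >= k).

    `train_and_predict(train_rows, train_labels, test_rows)` is a hypothetical
    callback that returns one predicted label per test row.
    """
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        test = set(held_out)
        train = [i for i in range(len(rows)) if i not in test]
        predicted = train_and_predict(
            [rows[i] for i in train],
            [labels[i] for i in train],
            [rows[i] for i in held_out],
        )
        correct = sum(p == labels[i] for p, i in zip(predicted, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

Each example is used for testing exactly once and for training in the other three rounds, so the averaged accuracy depends less on a single lucky or unlucky split than the fixed splits in 1(c) and 1(d).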

 

3.     Research the issue of overfitting: when it occurs and how it can be resolved.

 

4.     Please include your code for submission.

 

5.     Please report your impressions of the two classification algorithms in terms of their ease of use and interpretability, and describe your experience with the assignment.