Spring 2008 -- CS685: Special Topics in Data Mining

Spring 2009

CS 685 : Special Topics in Data Mining and Its Application in Bioinformatics

Instructor and Course Information

Jinze Liu

liuj@cs.uky.edu

http://www.cs.uky.edu/~liuj

(859) 257-3101

Meeting Time: TR 2:00PM-3:15PM

Meeting Place: POT 110

Office Hours: TR 1:00PM-2:00PM

Office: 237 James F. Hardymon Building

Announcements

1/26: Suggested projects are posted!

Schedule and Notes

Topic	Reading	Assigned	Due
1. Introduction to Data Mining (01/16, 01/20)
2. Association Rule Mining (01/20, 01/22)	ARM	Homework 1	02/10/09
3. Clustering (01/26, 01/29, 02/03, 02/05)	Clustering	Homework 2	02/23/09
4. Classification (02/10, 02/12, 02/17, 02/19, additional)	Classification	Homework 3	03/10/09
5. Semi-supervised Clustering (02/26)
6. Dimensionality Reduction (03/03, 03/05)
7. Graph Mining
8. David Kreig
9. Devin Cook: Stupid Filter
10. Yin Hu: Pattern Mining in Images
11. Dan Staley: Game Learning
12. ChingJoo Khor: EM algorithm for approximate linkage analysis
13. Daniel Harris
14. Fo Bo
15. Joshua Guerin
16. Matt Caldwell
17. Wille Miller
18. Kai Wang
19. Sami Taha
20. Tom Shearing
21. Jeremy Howard
22. Chandrasekarapuram, Mohan

Syllabus

With data being collected at an unprecedented rate in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field at the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar will provide an introductory survey of the main topics in data mining and knowledge discovery (including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as biomedical informatics, bioinformatics, financial market study, image processing, network monitoring, and social service analysis.

For each topic, a few of the most relevant research papers will be selected as the main teaching material. Students are expected to read the assigned papers before each class and to participate in the class discussion.

Prerequisite:

Some background in algorithms, data structures, statistics, machine learning, artificial intelligence, and databases is helpful.

References: (No required textbook)

1). Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, 2006. (ISBN: 1-55860-901-6)

2). Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)

3). Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN: 0-321-32136-7)

4). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)

5). Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006.

Grading

Each student in CS685 is expected to present a paper, lead the discussion that follows the presentation, and complete a project on selected topics.

4 Homeworks	40%
Exam	15%
Presentation	15%
Project	30%
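
As a rough illustration of how the weights above combine, the sketch below computes a weighted course score. The weights come from the table; the example scores are made up, and the sketch assumes each component is scored on a 0-100 scale and that the homework score is a single average over the four homeworks. Neither assumption is stated in the syllabus.

```python
# Weights from the grading table above.
WEIGHTS = {
    "homework": 0.40,      # 4 homeworks, assumed here to be averaged into one score
    "exam": 0.15,
    "presentation": 0.15,
    "project": 0.30,
}


def final_score(scores):
    """Return the weighted course score for per-component scores on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example = {"homework": 90, "exam": 80, "presentation": 85, "project": 95}
print(round(final_score(example), 2))  # 36 + 12 + 12.75 + 28.5 = 89.25
```

How a weighted score maps to a letter grade is not specified here, so the sketch stops at the numeric total.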

Tentative Course Outline

1. Introduction

· What is data mining?

2. Data Preprocessing

· Data sampling, data cleaning, feature selection, and dimensionality reduction

3. Classification

· Tree-based, rule-based, and instance-based methods

· Bayesian methods (naive Bayes and Bayesian belief networks)

· Neural networks, linear discriminant analysis, support vector machines, and ensemble methods

· Model evaluation

4. Association Analysis

· Apriori algorithm and its extensions

· Pattern evaluation (subjective and objective interestingness measures)

· Sequential patterns and graph mining

5. Clustering

· Partitional and hierarchical clustering methods

· Graph-based and density-based methods

· Cluster evaluation
