Spring 2009

CS 685: Special Topics in Data Mining and Its Application in Bioinformatics

 

Links

  1. Instructor and course information
  2. Announcements
  3. Calendar
  4. Notes
  5. Homework
  6. Suggested Readings
  7. Projects
  8. Existing Software


Instructor and Course Information

Jinze Liu

liuj@cs.uky.edu

http://www.cs.uky.edu/~liuj

(859) 257-3101

Meeting Time: TR 2:00PM-3:15PM

Meeting Place: POT 110

Office Hours: TR 1:00PM-2:00PM

Office:  237 James F. Hardymon Building

 

Announcements

1/26: Suggested projects are posted!

 

 

Schedule and Notes

 

Topic                                                         Reading         Assigned    Due
1.  Introduction to Data Mining (01/16, 01/20)
2.  Association Rule Mining (01/20, 01/22)                    ARM             Homework 1  02/10/09
3.  Clustering (01/26, 01/29, 02/03, 02/05)                   Clustering      Homework 2  02/23/09
4.  Classification (02/10, 02/12, 02/17, 02/19, additional)   Classification  Homework 3  03/10/09
5.  Semi-supervised Clustering (02/26)
6.  Dimensionality Reduction (03/03, 03/05)
7.  Graph Mining
8.  David Kreig
9.  Devin Cook: Stupid Filter
10. Yin Hu: Pattern Mining in Images
11. Dan Staley: Game Learning
12. ChingJoo Khor: EM algorithm for approximate linkage analysis
13. Daniel Harris:
14. Fo Bo:
15. Joshua Guerin
16. Matt Caldwell
17. Wille Miller
18. Kai Wang
19. Sami Taha
20. Tom Shearing
21. Jeremy Howard
22. Chandrasekarapuram, Mohan


Syllabus

With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field at the confluence of database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar provides an introductory survey of the main topics in data mining and knowledge discovery (including, but not limited to, classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as biomedical informatics, bioinformatics, financial market study, image processing, network monitoring, and social service analysis.

For each topic, a few of the most relevant research papers will be selected as the main teaching material. Students are expected to read the assigned paper before each class and to participate in the class discussion.

Prerequisite:

Some background in algorithms, data structures, statistics, machine learning, artificial intelligence, and databases is helpful.

References: (No required textbook)

1). Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, 2006. (ISBN: 1-55860-901-6)

2). Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)

3). Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN: 0-321-32136-7)

4). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)

5). Pattern Recognition and Machine Learning, by Christopher M. Bishop, Springer, 2006.

Grading

Each student in CS 685 is expected to present a paper, lead the discussion following the presentation, and complete a project on a selected topic.

4 Homeworks     40%
Exam            15%
Presentation    15%
Project         30%

 

Tentative Course Outline

1. Introduction

· What is data mining?

2. Data Preprocessing

· Data sampling, data cleaning, feature selection, and dimensionality reduction

3. Classification

· Tree-based, rule-based, and instance-based methods

· Bayesian methods (naive Bayes and Bayesian belief networks)

· Neural networks, linear discriminant analysis, support vector machines, and ensemble methods

· Model evaluation

4. Association Analysis

· Apriori algorithm and its extensions

· Pattern evaluation (subjective and objective interestingness measures)

· Sequential patterns and graph mining

5. Clustering

· Partitional and hierarchical clustering methods

· Graph-based and density-based methods

· Cluster evaluation


Spring 2008

CS 685: Special Topics in Data Mining