Spring 2008

CS 685: Special Topics in Data Mining

 

Instructor and Course Information

Jinze Liu

liuj@cs.uky.edu

http://www.cs.uky.edu/~liuj

(859) 257 – 3101

Meeting Time:   TR    2:00PM-3:15PM   

Meeting Place:  CB 231

Office Hours:     TR   1:00PM - 2:00PM

Office:  237 James F. Hardymon Building

 

Announcements

04/27: The final project presentations will be held in the Hardymon Windstream conference room from 2 to 5pm on Tuesday, 04/29.

            The order and the abstracts of the presentations are listed here.

            The final report is due by email at midnight on 05/01, with the title CS685-Final Report. Please follow the guidelines listed below.

 

03/14: Final Project Guidelines: The final project submission includes the project report, the code (algorithm project) or a pointer to the software (application project), and a sample dataset. The project report should be about 12 pages in 10-point Times New Roman. Spelling and grammar are not the focus of this class, but the report has to be READABLE and must at least pass a commonly used spelling checker. The organization and readability of the report count for 5% of the total project points. The project report should include the following sections.

 

I.              (15%) Project Description: clearly describe the project in terms of application or algorithm, and introduce the background and motivation of the project.

II.             (15%) Related Work: review existing work related to the data mining model or algorithm (algorithm project) and/or the particular dataset (application project) that you will develop or use for your own project.

III.            (30%) Models and Algorithms: describe in detail the algorithmic aspects of the algorithms used or developed in the project.

IV.            (20%) Datasets, Evaluation and Expected Outcome: describe the characteristics of your dataset, the types of experiments you have performed to evaluate the utility of the algorithm and datasets, what you proposed, and what you have accomplished.

V.             (10%) Discussion: describe the pros and cons of the approach used in the paper, describe what can be improved in your project, and share your experience with the project and the class.

VI.            (5%) References: list all the references cited in your report. Each reference should include the title of the paper, the list of authors, and when and where it was published.

 

03/04: Student Presentation Guidelines:

I.              Each presentation should be prepared as a 35-minute talk.

II.             The presentation should cover the following parts: Motivation; Related Work; Proposed Model; Proposed Algorithm; Results; Discussion of Pros and Cons.

III.            The presentation outline should be submitted one week before the talk. The presentation slides should be submitted a day before the talk.

IV.            Students are encouraged to do a dry run with a few fellow students before the talk. I would be glad to attend if my schedule allows.

 

01/24: Selected papers for class presentation are posted (link)

 

 

Date | Topic | Due | Note
01/10 | 1. Background survey | | Thanks to Jennifer
01/15 | 2. Intro to the course (Slides); Association Rule Mining I (Slides / Reading) | | Thanks to Dr. Goldsmith
01/17 | 3. Association Rule Mining II (Reading) | | Thanks to Dr. Zhang
01/22 | 4. Association Rule Mining III (Slides / Reading) | |
01/24 | 5. Sequential Pattern Mining (Slides / Reading) | |
01/29 | 6. Clustering I (Slides / Reading) | |
01/31 | 7. Clustering II (Slides / Reading) | |
02/05 | 8. Clustering III (Slides / Reading) | Presentation topics due |
02/07 | 9. Classification I (Slides / Reading) | |
02/12 | 10. Classification II (Slides / Reading) | |
02/14 | Invited Lecture: Reconstructing Phylogenetic Trees, by Dr. Ruriko Yoshida (Slides / paper link) | |
02/19 | 11. Classification III (Slides / Reading) | |
02/21 | | Project proposal due | Class cancelled due to ice storm
02/26 | 12. Dimensionality Reduction I (Slides) | |
02/28 | 13. Biclustering I (Slides / Reading) | |
03/04 | 14. Biclustering II (Continued) | |
03/06 | 15. Biclustering III (Continued) | |
03/11 | Spring Break | |
03/13 | Spring Break | |
03/18 | 1. Mining Optimal Decision Trees from Itemset Lattices (PPT); 2. Indexing Noncrashing Failures: A Dynamic Program Slicing-Based Approach (PPT) | | Casey Lengacher; Wenbin Li
03/20 | 1. Discriminative Frequent Pattern Analysis for Effective Classification (PPT); 2. Mining Approximate Frequent Itemsets in the Presence of Noise (PPT) | | Mary E. Biddle; Apurv Awasthi
03/25 | 1. Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns (PPT); 2. Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction (PPT) | | Nick Mattei; Xianwang Wang
03/27 | 1. Data Mining for Network Intrusion Detection (PPT); 2. Distributed Classification in Peer-to-Peer Networks (PPT) | | Song Yuan; Satya Bulusu
04/01 | 1. Correlation Search in Graph Databases (PPT); 2. Cost-effective Outbreak Detection in Networks (PPT) | | Phani Yarlagadda; Yinfang Zhuang
04/03 | 1. Weighted Substructure Mining for Image Analysis (PDF); 2. Co-clustering based Classification for Out-of-domain Documents (PPT) | | Jizhou Gao; Venkata Ramana Banda
04/08 | | |
04/10 | | |
04/15 | 1. Time Series Compressibility and Privacy; 2. Corroborate and Learn Facts from the Web | | Lian Liu; Sandeep Gajjala
04/17 | 1. Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis; 2. Show me the money! Deriving the Pricing Power of Product Features by Mining Consumer Reviews | | Cindy Burklow; Arunava Bhattacharya
04/22 | 1. Automatic genome-wide reconstruction of phylogenetic gene trees; 2. Annotating Gene Function by Combining Expression Data with a Modular Gene Network | | Elissaveta Arnaoudova; Arthur Hall III
04/24 | 1. Analysis of Firewall Policy Rules Using Data Mining Techniques | | Mehmet Onur Ascigil
05/01 | | Project |

 

Syllabus

With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatically discovering patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field representing the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar will provide an introductory survey of the main topics in data mining and knowledge discovery (including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as biomedical informatics, bioinformatics, financial market study, image processing, network monitoring, and social service analysis.

For each topic, a few of the most relevant research papers will be selected as the major teaching material. Students are expected to read the assigned paper before each class and to participate in the discussion in each class.

Prerequisite: None

Some background in algorithms, data structures, statistics, machine learning, artificial intelligence, and databases is helpful.

References: No required textbook

1). Data Mining --- Concepts and Techniques, by Han and Kamber, Morgan Kaufmann, 2006. (ISBN:1-55860-901-6)

2).  Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN:0-262-08290-X)

3). Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN:0-321-32136-7)

4). The Elements of Statistical Learning --- Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN:0-387-95284-5)      

Grading

Each student in CS685 is expected to present a paper, lead the discussion following the presentation, and complete a project on a selected topic. There will be no homework or exams.

1). Presentation:                     40%

2). Project:                                50%

3). Class participation:          10%

 

 

 

 
