Spring 2014

CS 685 : Special Topics in Data Mining

 

Course Description

With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatic discovery of patterns, changes, associations, and anomalies in massive databases. It is a highly interdisciplinary field representing the confluence of several disciplines, including database systems, data warehousing, machine learning, statistics, algorithms, data visualization, and high-performance computing. This seminar will provide an introductory survey of the main topics in data mining and knowledge discovery (including but not limited to classification, regression, clustering, association rules, trend detection, feature selection, similarity search, data cleaning, and privacy and security issues), as well as a wide spectrum of data mining applications such as biomedical informatics, bioinformatics, financial market study, image processing, network monitoring, and social service analysis.

 

 

Instructor and Course Information

Jinze Liu

liuj@cs.uky.edu

http://www.cs.uky.edu/~liuj

(859) 257 – 3101

Meeting Time:   TR    2:00PM-3:15PM   

Meeting Place:   Mining building 243

Office Hours:    By appointment

Office:  235 James F. Hardymon Building

 

Announcements

01/15: Welcome to the Spring 2014 Data Mining class!

             Course syllabus can be found here.

 

 

Date

Topics

Assn

Due

Note

01/16

1.     Introduction and background survey (Slides)

 

 

01/21

2.     Frequent Itemset Mining  - 1  (Slides)

 

MMD Ch6: Frequent itemset mining.

01/23

3.     Frequent Itemset Mining  -  2 (Slides)

 

 

01/28

4.     Clustering -  K-means (Slides)

Assn 1a-Gradiance

MMD Ch7: Clustering

01/30

5.     MapReduce Framework by Julian Ruben

 

MMD Ch2: MapReduce

02/04

6.     Clustering - Hierarchical Clustering (Slides)

Assn 1b

Assn 2

Assn 1a

02/06

7.     Clustering – Biclustering (Slides)

 

02/11

8.     Classification – Naïve Bayes Classifier (Slides)

 

Assn 1b

MMD Ch12

02/13

9.     Classification – Support Vector Machine (Slides)

 

 

 

02/18

10.   Classification – Decision Tree (Slides)

 

 

 

02/20

11.   Continued

Project and Presentation

 

02/25

12.   Recommendation System (Slides)

Assn 2

MMD Ch9: Recommendation Systems

02/27

Project pitch

 

 

 

03/04

13.   Cross Validation (Slides)

Assn 3

 

 

03/06

 

 

Proposal

03/11

Mid-Term Review

 

03/13

Mid-Term Exam

 

03/18,20

       Spring Break

 

 

Enjoy Spring Break!

03/25

-       Recent hierarchical clustering improvements for Bioinformatics [HPC-clust][CD-hit] (by Joel Lowery)

-       Co-Clustering [Link] (by Xinan Liu)

 

 

 

Assn 3

 

03/27

-       A new fast vertical method for mining frequent patterns  [Link] (by Matthew Spradling)

-       Discussion of Mid-term.

 

 

 

 

04/01

-       Vector space model for information retrieval (by Albujasim, Zainab)

-       Efficient application identification and the temporal and spatial stability of classification schema [Link] (by Xin Li)

 

 

 

04/03

-       Collaborative Filtering (by Luan Pham)

-       Content-based Recommendation (by Hunsucker, James D)

 

 

 

04/08

-       Community structure in social and biological networks [Link]  (by Tawfiq Salem)

-       Community findings within the community set space (by Sergio Rivera Polanco)

 

 

 

04/10

-       Real time delivery architecture at Twitter (by Ye Yu)

-       Roger Chui

 

 

 

04/15

-       Surendran Neelakantan

-       Vamsidhar Reddy Puttireddy

 

 

 

04/17

-       Hamid Hamraz

-       Maria Morales

 

 

 

04/22

-       Sifei Han

-       Orhan Abar

 

 

 

04/24

Project presentation

 

 

 

04/25

Project presentation

 

 

 

04/29

Awesome Inc presentation by Brian Raney

 

 

 

 

 

 

 

 

 

Prerequisite:

Some background in algorithms, data structures, statistics, machine learning, artificial intelligence, and database systems is helpful.

Book:

Mining of Massive Datasets (MMD), by Anand Rajaraman, Jeff Ullman, and Jure Leskovec. The book is freely available online (http://i.stanford.edu/~ullman/mmds.html#latest).

Other References:

1). Data Mining --- Concepts and Techniques, by Han and Kamber, Morgan Kaufmann.

2). Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press.

3). Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley.

4). The Elements of Statistical Learning --- Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer.           

5). Pattern Recognition and Machine Learning, by Christopher M. Bishop. 

 

 

 

 
