Spring 2014
CS 685: Special Topics in Data Mining
Course Description
With the unprecedented rate at which data is being collected today in almost all
fields of human endeavor, there is an emerging economic and scientific need
to extract useful information from it. Data mining is the process of
automatically discovering patterns, changes, associations, and anomalies in
massive databases. It is a highly interdisciplinary field representing
the confluence of several disciplines, including database systems, data
warehousing, machine learning, statistics, algorithms, data visualization,
and high-performance computing. This seminar will provide an introductory
survey of the main topics in data mining and knowledge discovery (including
but not limited to classification, regression, clustering, association rules,
trend detection, feature selection, similarity search, data cleaning, and
privacy and security issues), as well as a wide spectrum of data mining
applications such as biomedical informatics, bioinformatics, financial market
study, image processing, network monitoring, and social service analysis.
Meeting Time: TR 2:00PM-3:15PM
Meeting Place: Mining building 243
Office Hours: By appointment
Office: 235 James F. Hardymon Building
Announcements
01/15: Welcome to the Spring 2014 Data Mining class!
Course syllabus can be found here.
Date | Topics | Assn | Due | Note
01/16 | 1. Introduction and background survey (Slides) | | |
01/21 | 2. Frequent Itemset Mining - 1 (Slides) | | | MMD Ch6: Frequent itemset mining
01/23 | 3. Frequent Itemset Mining - 2 (Slides) | | |
01/28 | 4. Clustering - K-means (Slides) | Assn 1a-Gradiance | | MMD Ch7: Clustering
01/30 | 5. MapReduce Framework by Julian Ruben | | | MMD Ch2: MapReduce
02/04 | 6. Clustering - Hierarchical Clustering (Slides) | Assn 1b, Assn 2 | Assn 1a |
02/06 | 7. Clustering - Biclustering (Slides) | | |
02/11 | 8. Classification - Naïve Bayes Classifier (Slides) | | Assn 1b | MMD Ch12
02/13 | 9. Classification - Support Vector Machine (Slides) | | |
02/18 | 10. Classification - Decision Tree (Slides) | | |
02/20 | 11. Continued | Project and Presentation | |
02/25 | 12. Recommendation System (Slides) | | Assn 2 | MMD Ch9: Recommendation Systems
02/27 | Project pitch | | |
03/04 | 13. Cross Validation (Slides) | Assn 3 | |
03/06 | | | Proposal |
03/11 | Mid-Term Review | | |
03/13 | Mid-Term Exam | | |
03/18,20 | Spring Break | | | Enjoy Spring Break!
03/25 | Recent hierarchical clustering improvements for Bioinformatics [HPC-clust][CD-hit] (by Joel Lowery); Co-Clustering [Link] (by Xinan Liu) | | Assn 3 |
03/27 | A new fast vertical method for mining frequent patterns [Link] (by Matthew Spradling); Discussion of Mid-term | | |
04/01 | Vector space model for information retrieval (by Albujasim, Zainab); Efficient application identification and the temporal and spatial stability of classification schema [Link] (by Xin Li) | | |
04/03 | Collaborative Filtering (by Luan Pham); Content-based Recommendation (by Hunsucker, James D) | | |
04/08 | Community structure in social and biological networks [Link] (by Tawfiq Salem); Community findings within the community set space (by Sergio Rivera Polanco) | | |
04/10 | Real time delivery architecture at Twitter (by Ye Yu); Roger Chui | | |
04/15 | Surendran Neelakantan; Vamsidhar Reddy Puttireddy | | |
04/17 | Hamid Hamraz; Maria Morales | | |
04/22 | Sifei Han; Orhan Abar | | |
04/24 | Project presentation | | |
04/25 | Project presentation | | |
04/29 | Awesome Inc presentation by Brian Raney | | |
Prerequisite:
Some background in algorithms, data structures, statistics, machine learning,
artificial intelligence, and database systems is helpful.
Book:
Mining of Massive Datasets, by Anand Rajaraman, Jeff Ullman, and Jure
Leskovec. The book is freely available online (http://i.stanford.edu/~ullman/mmds.html#latest).
Other References:
1). Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann.
2). Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press.
3). Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley.
4). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer.
5). Pattern Recognition and Machine Learning, by Christopher M. Bishop.