Syllabus
With the unprecedented rate at which data is being collected today in almost
all fields of human endeavor, there is an emerging economic and scientific
need to extract useful information from it. Data mining is the process of
automatic discovery of patterns, changes, associations, and anomalies in
massive databases. It is a highly interdisciplinary field representing the
confluence of several disciplines, including database systems, data
warehousing, machine learning, statistics, algorithms, data visualization,
and high-performance computing. This seminar will provide an introductory
survey of the main topics in data mining and knowledge discovery (including
but not limited to classification, regression, clustering, association rules,
trend detection, feature selection, similarity search, data cleaning, and
privacy and security issues), as well as a wide spectrum of data mining
applications such as biomedical informatics, bioinformatics, financial market
study, image processing, network monitoring, and social service analysis.
For each topic, a few of the most relevant research papers will be selected
as the major teaching material. Students are expected to read the assigned
papers before each class and to participate in the discussion in each class.
Prerequisite: None. Some background in algorithms, data structures,
statistics, machine learning, artificial intelligence, and databases is
helpful.
References: No required textbook
1). Data Mining: Concepts and Techniques, by Han and Kamber,
Morgan Kaufmann, 2006. (ISBN: 1-55860-901-6)
2). Principles of Data Mining, by Hand, Mannila, and Smyth,
MIT Press, 2001. (ISBN: 0-262-08290-X)
3). Introduction to Data Mining, by Tan, Steinbach, and Kumar,
Addison Wesley, 2006. (ISBN: 0-321-32136-7)
4). The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001.
(ISBN: 0-387-95284-5)
Grading
Each student in CS685 is expected to present a paper, lead the discussion
following his/her presentation, and complete a project on a selected topic.
There will be no homework or exams.
1). Presentation: 40%
2). Project: 50%
3). Class participation: 10%