Here is a list of sample projects and associated datasets. Do NOT limit yourself to these problems. THINK OUTSIDE THE BOX!

Sample projects:

1.       Graph Mining: In recent years, social network research has advanced significantly, thanks to the prevalence of online social websites and the availability of a variety of large-scale offline social network systems such as collaboration networks. These systems are usually characterized by complex network structures and rich accompanying contextual information. Researchers are increasingly interested in a wide range of challenges posed by these disparate systems, including identifying common static topological properties, identifying dynamic properties that arise as the networks form and evolve, and understanding how contextual information can help in analyzing the networks. These issues have important implications for community discovery, anomaly detection, and trend prediction, and can enhance applications in multiple domains such as information retrieval, recommendation systems, and security.

Basic questions: How can similar graphs or subgraphs be identified? How can the components that differ among a set of similar graphs be identified?

Datasets:

a.       The student will be given a set of graphs representing alternative splicing patterns.

b.      Stanford Large Network Dataset Collection. Try out SNAP, a network analysis platform.
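
As a starting point for the basic questions above, here is a minimal sketch of one simple way to compare two graphs: the Jaccard similarity of their edge sets, together with the edges unique to each graph. It assumes the graphs are undirected and share node labels. The node names (`e1`, `e2`, ...) and the function names are hypothetical and chosen only for illustration; real projects will likely need stronger techniques such as graph kernels or subgraph matching.

```python
def edge_set(edges):
    """Normalize an undirected edge list into a set of frozensets."""
    return {frozenset(edge) for edge in edges}


def jaccard_similarity(edges_a, edges_b):
    """Fraction of edges shared by both graphs (1.0 means identical edge sets)."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    if not a and not b:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(a | b)


def differing_edges(edges_a, edges_b):
    """Return (edges only in graph A, edges only in graph B)."""
    a, b = edge_set(edges_a), edge_set(edges_b)
    return a - b, b - a


# Hypothetical toy graphs; node labels are made up for this example.
g1 = [("e1", "e2"), ("e2", "e3"), ("e3", "e4")]
g2 = [("e2", "e1"), ("e2", "e4")]

similarity = jaccard_similarity(g1, g2)
only_in_g1, only_in_g2 = differing_edges(g1, g2)
print(similarity, only_in_g1, only_in_g2)
```

Edge-set overlap ignores graph structure beyond individual edges, so treat it as a baseline to improve on.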

 

2.       Recommendation system: How can a recommendation system be built? Do you have a good idea for improving on existing systems?

Datasets:

a.       MovieLens Datasets

b.      Yahoo! Music

c.       Jester Joke
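
Before trying to improve on existing systems, it helps to have a simple baseline to compare against. The sketch below is one common choice, a bias-based predictor (global mean plus item and user offsets) scored with RMSE. It assumes ratings arrive as `(user, item, rating)` tuples. The function names and toy data are hypothetical, and none of the datasets above is guaranteed to ship in exactly this format.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit global mean, item biases, and user biases from (user, item, rating) tuples."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    item_dev = defaultdict(list)
    for _, item, r in ratings:
        item_dev[item].append(r - mu)
    item_bias = {i: sum(d) / len(d) for i, d in item_dev.items()}

    user_dev = defaultdict(list)
    for user, item, r in ratings:
        user_dev[user].append(r - mu - item_bias[item])
    user_bias = {u: sum(d) / len(d) for u, d in user_dev.items()}

    return mu, user_bias, item_bias


def predict(model, user, item):
    """Predict a rating; unseen users or items fall back to a zero bias."""
    mu, user_bias, item_bias = model
    return mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)


def rmse(model, test_ratings):
    """Root mean squared error of the model on held-out ratings."""
    squared = sum((predict(model, u, i) - r) ** 2 for u, i, r in test_ratings)
    return (squared / len(test_ratings)) ** 0.5


# Hypothetical toy data for illustration only.
train = [("u1", "m1", 4.0), ("u1", "m2", 2.0), ("u2", "m1", 5.0)]
model = fit_baseline(train)
print(predict(model, "u2", "m2"), rmse(model, train))
```

Any proposed method should beat this baseline on held-out ratings to count as an improvement.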

 

3.       Text Mining: How can useful information (trends, consensus, and sentiment) be extracted from blogs, online forums, online reviews, or Twitter messages?

Datasets:

a.       Movie review datasets at Stanford

b.      Amazon reviews

c.       Hotel and car reviews

d.      Online forum mining and analysis
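
As an illustration of the sentiment part of this task, here is a minimal sketch of a lexicon-based sentiment score. The word lists are tiny, hypothetical examples made up for this sketch and are not a real sentiment lexicon; a real project would use a published lexicon or a trained classifier.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "clean"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "dirty"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = tokenize(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


for review in ["Great room, clean and quiet", "Terrible service", "The hotel"]:
    print(review, sentiment_score(review))
```

Word counting of this kind misses negation and sarcasm, which is one place a project could try to improve.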

 

 

Project Proposal Guidelines (Due March 6th)

Format: at least 2 pages, single-spaced, not including references.

1.       Project Description: Introduce the background and motivation of the project and the high-level data mining task that will be performed in this project. (This serves as the content of the short pitch you will give in class on Feb 27th.)

2.       Related Work: Review at least a couple of existing works related to your project. A comprehensive discussion of the related work and how it supports what you would like to work on is necessary. (These papers will also serve as the basis for your presentation, so please list them in your proposal.)

3.       Proposed Work: Describe the following aspects of your work. Include as much information as you can; you are free to change these details as your project progresses. The first item should be answered as accurately as possible.

I.                    A mathematical description of the problem you would like to solve.

II.                  The data mining techniques.

III.                The datasets you are proposing to work on.

IV.                How you would evaluate your results.

4.       Timetable: Propose a timetable for finishing the project before the due date of May 1st.

5.       References: All references you will be using. (Please follow the typical format you’ve seen in papers.)

 

Project Report Guidelines (Due May 1st)

Format: at least 10 pages, single-spaced, not including references.

Your project report is expected to be a significant expansion of your project proposal.

1.       Project Description: Introduce the background and motivation of the project, the high-level data mining task performed in this project, and a brief summary of the results.

2.       Related Work: Review at least a couple of existing works related to your project. A comprehensive discussion of the related work and how it supports your work is necessary.

3.       Method

a.       A mathematical abstraction of the problem you would like to solve.

b.      The computational algorithm.

4.       Results

a.       Description of datasets used in your evaluation.

b.      Evaluation criteria.

c.       Summary of results.

5.       Your project experience

6.       References: All references. (Please follow the typical format you’ve seen in papers.)

Paper Presentation Guidelines (Start on March 25th, Schedule TBD)

Format: 25-minute presentation and 10-minute discussion. Use as many graphics as you can!

Selection of papers: You are encouraged to select a set of papers that are relevant to your project. Please include at least one recent KDD or ICDM paper.

1.       Motivation: Why is it an interesting problem? (2-3 slides)

2.       Method (~10-15 slides)

a.       A mathematical abstraction of the problem the paper addresses.

b.      The naïve approach to solving the problem.

c.       How the authors of the paper solve the problem.

3.       Results (~10 slides)

a.       Description of datasets used in the papers’ evaluation.

b.      Evaluation criteria.

c.       Summary of results.

 

4.       What do you think of the paper(s)? (1 slide)

5.       How do they relate to your project? (2-3 slides)

 

Reference for presentations

KDD

ICDM