CS221 Final Project Guidelines
The final project is an opportunity to take some of the AI techniques that
you've learned in this class and apply them on some task you care about.
In short, you will (i) formulate the task, (ii) implement a few algorithms
of your choice, (iii) run them on some concrete datasets, and most importantly,
(iv) compare/contrast the algorithms, explaining their strengths and
weaknesses.
You may work alone or in a group of 2 or 3 people. You can use the same project
for other classes (e.g., CS229, CS224N), but the actual report must be distinct,
and the CS221 part of the project should be clearly stated.
Milestones:
- abstract (1 paragraph, due Nov 13): summary of what you're going to do and who you're going to work with
- progress report (1 page, due Nov 27): description of the data and the methods, but not yet the experiments
- final report (3 pages, due Dec 6): see below for guidelines
Workflow:
This is a suggestion of how to approach the final project.
- Pick a topic that you're passionate about (e.g., food, language, energy,
politics, sports, card games, robotics).
As a running example, say we're interested in how people read the news to get their information.
- Brainstorm to find some tasks on that topic: ask "wouldn't it be nice to
have a system that does X?" or "wouldn't it be nice to understand X?" A good
task should be neither too easy (sorting a list of numbers) nor too hard
(building a system that can automatically solve CS221 homeworks).
Please come to office hours for feedback on finding the right balance.
Let's focus on recommending news to people.
- Define the task you're trying to solve clearly and convince yourself (and a few friends) that it's important/interesting.
Also state your evaluation metric: how will you know whether you have succeeded?
Concentrate on a small set of popular news sites: nytimes.com, slashdot.org, sfgate.com, onion.com, etc.
For each user and each day, assume we have acquired a set of articles that the user is interested in reading (training data).
Our task is to predict for a new day, given the full set of articles, the best subset to show the user;
the evaluation metric would be prediction accuracy.
- Gather and clean the necessary data (this might involve scraping websites, filtering outliers, etc.).
This step can often take an annoyingly large amount of time if you're not careful, so try not to get bogged down here.
Simplify the task or focus on a subset of the data if necessary.
You might find yourself adjusting the task you're trying to solve based on new empirical insights you get by looking at the data.
Notice that even if you're not doing machine learning, it's necessary to have data for evaluation purposes.
For the running example, write some scripts that download the RSS feeds from
the news sites and run some basic NLP processing (e.g., tokenization), say,
using NLTK (a sketch of this step appears after this list).
- Implement a baseline algorithm. For a classification task, this would be
always predicting the most common label. If your baseline is too high, then
your task is probably too easy.
For the running example, one baseline is to always show the first article from
each news site (a sketch of this baseline and the accuracy metric appears after
this list).
- Implement at least two complementary algorithms. What are the different
tradeoffs of each? Can they be combined into a new algorithm that draws from
the advantages of both (this is usually a good way of deriving new
algorithms)? Remember to try as much as possible to separate the model (what
you want to compute) from the algorithm (how you do it).
For the running example, you might train a classifier to predict, for each news
article, whether to include it. You might then use these predictions as factors
in a weighted CSP and search for a set of articles that balances diversity and
relevance (a sketch of one way to combine the two methods appears after this
list).
- Perhaps the most important part of the project is the final step, which
is to analyze the results. It's more important to do a thorough analysis and
interpret your results than to implement a huge number of complicated
heuristics in trying to eke out the maximum performance.
The analysis should begin with basic facts, e.g., how much time/memory did the
algorithm take, and how does the accuracy vary with the amount of training data
(a learning-curve sketch appears after this list)? Which instances does your
system do the worst on? Give concrete examples and try to understand why. Is
there a bottleneck? Is it due to a lack of training data?
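Below is a minimal sketch of the data-gathering step, assuming the third-party
feedparser and nltk packages are installed and that each site exposes an RSS
feed; the feed URLs are illustrative and may have changed.

    import feedparser  # third-party: pip install feedparser
    import nltk        # third-party: pip install nltk

    nltk.download("punkt", quiet=True)  # tokenizer models used by word_tokenize

    # Illustrative feed URLs; check each site for its current RSS endpoint.
    FEEDS = {
        "nytimes": "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
        "slashdot": "http://rss.slashdot.org/Slashdot/slashdotMain",
    }

    def fetch_articles(feeds):
        """Download each feed and return a list of (site, title, tokens)."""
        articles = []
        for site, url in feeds.items():
            for entry in feedparser.parse(url).entries:
                text = entry.get("title", "") + " " + entry.get("summary", "")
                articles.append((site, entry.get("title", ""),
                                 nltk.word_tokenize(text)))
        return articles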
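Next, a sketch of the first-document baseline and the per-article accuracy
metric from the running example. The data format is an assumption made for
illustration: each day is a pair (articles_by_site, read), where
articles_by_site maps a site to its ordered list of article ids and read is the
set of ids the user actually read.

    def baseline_predict(articles_by_site):
        """Baseline: always show the first article from each site."""
        return {ids[0] for ids in articles_by_site.values() if ids}

    def per_article_accuracy(articles_by_site, predicted, read):
        """Fraction of include/exclude decisions that match the user."""
        all_ids = [a for ids in articles_by_site.values() for a in ids]
        correct = sum((a in predicted) == (a in read) for a in all_ids)
        return correct / len(all_ids)

    def evaluate(days, predict):
        """Average per-article accuracy over (articles_by_site, read) days."""
        return sum(per_article_accuracy(arts, predict(arts), read)
                   for arts, read in days) / len(days)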
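Here is one illustrative way to combine the two methods: treat the classifier's
probability as a per-article relevance factor and greedily build a subset that
trades relevance off against redundancy. The greedy search is a cheap stand-in
for solving the weighted CSP exactly; relevance and similarity are assumed to
be functions you supply (e.g., a trained classifier's score and a word-overlap
measure).

    def select_articles(articles, relevance, similarity,
                        k=5, diversity_weight=0.5):
        """Greedily pick up to k articles, trading relevance vs. redundancy."""
        chosen = []
        candidates = list(articles)
        while candidates and len(chosen) < k:
            def score(a):
                redundancy = max((similarity(a, b) for b in chosen), default=0.0)
                return relevance(a) - diversity_weight * redundancy
            best = max(candidates, key=score)
            chosen.append(best)
            candidates.remove(best)
        return chosen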
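Finally, a sketch of the learning-curve analysis mentioned in the last step,
where train_and_evaluate is a hypothetical function of yours that trains on the
given days and returns held-out accuracy.

    def learning_curve(train_days, train_and_evaluate,
                       fractions=(0.1, 0.25, 0.5, 1.0)):
        """Accuracy after training on increasing prefixes of the training set."""
        return [(f, train_and_evaluate(train_days[:int(f * len(train_days))]))
                for f in fractions]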
Final report:
- Abstract: Summary of the project (no more than 300 words)
- Introduction: Clearly define the task you're trying to solve. Why is it
interesting/important? What are the challenges (be specific)? What
previous/related work has been done on it?
- Approach: Describe a baseline method and two different methods for solving
the task (each method should include both the model and the algorithm). Why
do these methods make sense for this task? What are the strengths and
weaknesses?
- Experiments: Clearly describe the evaluation methodology. Describe the
result of running the methods described in the previous section on the task.
Compare and contrast both the time/space complexity and accuracy of the
results. If there is randomness, run your experiments multiple times with
different random seeds and report the variability (see the sketch below). What
are the remaining errors? Perform an error analysis: give concrete examples and
explain the general phenomena.
- Conclusion: What did you learn from this project? What would the next steps be?
- References: Citations of related work.
- Appendix: Any derivations or details of the experimental setup should go in the appendix.
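As an example of reporting variability, here is a minimal sketch, where
run_experiment is a hypothetical function that trains and evaluates your system
with a given seed and returns an accuracy:

    import statistics

    def report(run_experiment, seeds=(0, 1, 2, 3, 4)):
        """Report mean and standard deviation of accuracy across seeds."""
        accuracies = [run_experiment(seed) for seed in seeds]
        print("accuracy: %.3f +/- %.3f over %d seeds"
              % (statistics.mean(accuracies), statistics.stdev(accuracies),
                 len(seeds)))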
Grading criteria:
- Task definition: is the task precisely defined and does the formulation make sense?
- Methods: were a baseline method and two complementary methods described clearly, well justified, and tried?
Don't describe the methods in general; describe precisely how they apply to your task (what are the variables, factors, states, etc.).
- Data and experiments: was the data explained clearly, were systematic
experiments done, and were concrete results reported?
- Analysis: did you interpret the results and try to explain why things worked (or didn't work) the way they did? Did you show concrete examples?
- Extra credit: does the project present interesting and novel ideas (i.e., would it be publishable at a top-tier venue)?
You should submit a PDF with your writeup, as well as supplementary material
(code, data, and a README file documenting what everything is and what commands
you ran). You don't have to implement everything from scratch, but if
you use standard packages, say what commands you ran.
If you're working in a large codebase or have too much
data, just choose the most relevant subset (so the code doesn't have to run, but it should be clear what the components do).
Project ideas:
Here are some ideas to get you started (relevant course topics in parentheses):
- Predict the price of airline tickets given the day, time, location, etc. (ML)
- Solve Sudoku puzzles or crossword puzzles (CSPs)
- Find the optimal way to get from one place on Stanford campus to another place, taking into account
uncertain travel times due to traffic. (search, MDPs)
- Predict the amount of electricity consumed over the course of a day
- Intelligent auto-completion of code when you're programming
- A smarter source code search
- Break substitution ciphers based on knowledge of English
- Use your smartphone to gather data and predict your location
- Based on sensor readings from your smartphone, predict whether the phone should be switched off
Here is an in-depth example of a project. Reddit (http://reddit.com/) is a
social news website where users submit links or text posts, and other users
either "upvote" or "downvote" submissions. Here are some possible tasks:
- Easy: Create a classifier to predict the number
of upvotes of a Reddit text post, based on features from the text and
meta-information from the post. This task should use sophisticated features,
and should demonstrate both a theoretical and working knowledge of multiple
(i.e., at least two) classifiers. Some effort to counter interesting challenges
(e.g., label bias) should be shown. The data should be on the order of
thousands to hundreds of thousands of posts.
- Moderate: Create a model for the life of a Reddit post, predicting its upvotes
at any time step based on both the post’s content and its popularity so far.
Whereas the previous task classified a post at its birth, this task is allowed
to look at the history of votes given so far -- this is most useful for
classifying posts in their infancy. This task should use sophisticated features,
and demonstrate both a theoretical and working knowledge of the tool used for
the task (e.g., a graphical model). The data should be on the order of hundreds
to tens of thousands of posts.
- Challenge: Generate automatic posts for a particular "subreddit" given a
description of a topic. For example, the text for /r/atheism versus
/r/christianity should be substantially different given the topic "God." As an
example formulation: your training data would be a corpus of text posts from a
number of subreddits. Your input would be a subreddit name and a word
characterizing a topic; your output would be a sampled sentence. You might find
graphical models useful. The complexity of the distribution suggests that you
should make independence assumptions and break the distribution up into
multiple parts (a naive sketch appears below).
You are expected to create a model that gives qualitatively meaningful
results: the user should be able to guess the input from the output, rather
than the output just being noise. The fluency of the posts should be high,
though of course the semantics of the post can be complete nonsense (beyond
being recognizable as about a topic). You should show a working but not
necessarily theoretical understanding of all the components used.
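To make the independence assumptions concrete, here is a naive sketch (far
simpler than what a strong project would do): a bigram model trained per
subreddit and seeded with the topic word. corpus is assumed to map each
subreddit name to a list of tokenized posts.

    import random
    from collections import defaultdict

    def train_bigrams(corpus):
        """Count word-to-next-word transitions separately per subreddit."""
        counts = {sub: defaultdict(lambda: defaultdict(int)) for sub in corpus}
        for sub, posts in corpus.items():
            for tokens in posts:
                for prev, word in zip(["<s>"] + tokens, tokens + ["</s>"]):
                    counts[sub][prev][word] += 1
        return counts

    def sample_sentence(counts, subreddit, topic, max_len=20):
        """Start from the topic word and sample one word at a time."""
        sentence, prev = [topic], topic
        for _ in range(max_len):
            nxt = counts[subreddit].get(prev)
            if not nxt:
                break
            words, freqs = zip(*nxt.items())
            prev = random.choices(words, weights=freqs)[0]
            if prev == "</s>":
                break
            sentence.append(prev)
        return " ".join(sentence)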