CS221 Final Project Guidelines
The final project is an opportunity to take some of the AI techniques that
you've learned in this class and apply them on some task you care about.
In short, you will (i) formulate the task, (ii) implement a few algorithms
of your choice, (iii) run them on some concrete datasets, and most importantly,
(iv) compare/contrast the algorithms, explaining their strengths and
weaknesses.
You may work alone or in a group of 2 or 3 people. You can use the same project
for other classes (e.g., CS229, CS224N), but the actual report must be distinct,
and the CS221 part of the project should be clearly stated.
Milestones:
- abstract (1 paragraph, due Nov 13): summary of what you're going to do and who you're going to work with
- progress report (1 page, due Nov 27): description of the data and the methods, but not yet the experiments
- final report (3 pages, due Dec 6): see below for guidelines
Workflow:
This is a suggestion of how to approach the final project.
- Pick a topic that you're passionate about (e.g., food, language, energy,
politics, sports, card games, robotics).
As a running example, say we're interested in how people read the news to get their information.
- Brainstorm to find some tasks on that topic: ask "wouldn't it be nice to
have a system that does X?" or "wouldn't it be nice to understand X?" A good
task should be neither too easy (sorting a list of numbers) nor too hard
(building a system that can automatically solve CS221 homeworks).
Please come to office hours for feedback on finding the right balance.
Let's focus on recommending news to people.
- Define the task you're trying to solve clearly and convince yourself (and a few friends) that it's important/interesting.
Also state your evaluation metric: how will you know whether you have succeeded?
Concentrate on a small set of popular news sites: nytimes.com, slashdot.org, sfgate.com, onion.com, etc.
For each user and each day, assume we have acquired a set of articles that the user is interested in reading (training data).
Our task is to predict for a new day, given the full set of articles, the best subset to show the user;
the evaluation metric would be prediction accuracy.
- Gather and clean the necessary data (this might involve scraping websites, filtering outliers, etc.).
This step can often take an annoyingly large amount of time if you're not careful, so try not to get bogged down here.
Simplify the task or focus on a subset of the data if necessary.
You might find yourself adjusting the task you're trying to solve based on new empirical insights you get by looking at the data.
Notice that even if you're not doing machine learning, it's necessary to have data for evaluation purposes.
For the running example, write some scripts that download the RSS feeds from
the news sites and run some basic NLP processing (e.g., tokenization), say,
using NLTK (a sketch of this step appears after this list).
- Implement a baseline algorithm. For a classification task, this would be
always predicting the most common label. If your baseline is too high, then
your task is probably too easy.
For the running example, one baseline is to always show the first article from
each news site (a sketch of this baseline and the accuracy metric appears after
this list).
- Implement at least two complementary algorithms. What are the different
tradeoffs of each? Can they be combined into a new algorithm that draws from
the advantages of both (this is usually a good way of deriving new
algorithms)? Remember to try as much as possible to separate the model (what
you want to compute) from the algorithm (how you do it).
For the running example, you might train a classifier to predict, for each news
article, whether to include it. You might then use these predictions as factors
in a weighted CSP and search for a set of articles that balances diversity and
relevance (a sketch of one way to combine the two methods appears after this
list).
- Perhaps the most important part of the project is the final step, which
is to analyze the results. It's more important to do a thorough analysis and
interpret your results than to implement a huge number of complicated
heuristics in trying to eke out the maximum performance.
The analysis should begin with basic facts, e.g., how much time/memory did the
algorithm take, and how does the accuracy vary with the amount of training data
(a learning-curve sketch appears after this list)? Which instances does your
system do the worst on? Give concrete examples and try to understand why. Is
there a bottleneck? Is it due to a lack of training data?
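Below is a minimal sketch of the data-gathering step, assuming the third-party
feedparser and nltk packages are installed and that each site exposes an RSS
feed; the feed URLs are illustrative and may have changed.

    import feedparser  # third-party: pip install feedparser
    import nltk        # third-party: pip install nltk

    nltk.download("punkt", quiet=True)  # tokenizer models used by word_tokenize

    # Illustrative feed URLs; check each site for its current RSS endpoint.
    FEEDS = {
        "nytimes": "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
        "slashdot": "http://rss.slashdot.org/Slashdot/slashdotMain",
    }

    def fetch_articles(feeds):
        """Download each feed and return a list of (site, title, tokens)."""
        articles = []
        for site, url in feeds.items():
            for entry in feedparser.parse(url).entries:
                text = entry.get("title", "") + " " + entry.get("summary", "")
                articles.append((site, entry.get("title", ""),
                                 nltk.word_tokenize(text)))
        return articles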
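Next, a sketch of the first-document baseline and the per-article accuracy
metric from the running example. The data format is an assumption made for
illustration: each day is a pair (articles_by_site, read), where
articles_by_site maps a site to its ordered list of article ids and read is the
set of ids the user actually read.

    def baseline_predict(articles_by_site):
        """Baseline: always show the first article from each site."""
        return {ids[0] for ids in articles_by_site.values() if ids}

    def per_article_accuracy(articles_by_site, predicted, read):
        """Fraction of include/exclude decisions that match the user."""
        all_ids = [a for ids in articles_by_site.values() for a in ids]
        correct = sum((a in predicted) == (a in read) for a in all_ids)
        return correct / len(all_ids)

    def evaluate(days, predict):
        """Average per-article accuracy over (articles_by_site, read) days."""
        return sum(per_article_accuracy(arts, predict(arts), read)
                   for arts, read in days) / len(days)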
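Here is one illustrative way to combine the two methods: treat the classifier's
probability as a per-article relevance factor and greedily build a subset that
trades relevance off against redundancy. The greedy search is a cheap stand-in
for solving the weighted CSP exactly; relevance and similarity are assumed to
be functions you supply (e.g., a trained classifier's score and a word-overlap
measure).

    def select_articles(articles, relevance, similarity,
                        k=5, diversity_weight=0.5):
        """Greedily pick up to k articles, trading relevance vs. redundancy."""
        chosen = []
        candidates = list(articles)
        while candidates and len(chosen) < k:
            def score(a):
                redundancy = max((similarity(a, b) for b in chosen), default=0.0)
                return relevance(a) - diversity_weight * redundancy
            best = max(candidates, key=score)
            chosen.append(best)
            candidates.remove(best)
        return chosen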
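Finally, a sketch of the learning-curve analysis mentioned in the last step,
where train_and_evaluate is a hypothetical function of yours that trains on the
given days and returns held-out accuracy.

    def learning_curve(train_days, train_and_evaluate,
                       fractions=(0.1, 0.25, 0.5, 1.0)):
        """Accuracy after training on increasing prefixes of the training set."""
        return [(f, train_and_evaluate(train_days[:int(f * len(train_days))]))
                for f in fractions]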
Final report:
- Abstract: Summary of the project (no more than 300 words)
- Introduction: Clearly define the task you're trying to solve. Why is it
interesting/important? What are the challenges (be specific)? What
previous/related work has been done on it?
- Approach: Describe a baseline method and two different methods for solving
the task (each method should include both the model and the algorithm). Why
do these methods make sense for this task? What are the strengths and
weaknesses?
- Experiments: Clearly describe the evaluation methodology. Describe the
result of running the methods described in the previous section on the task.
Compare and contrast both the time/space complexity and accuracy of the
results. If there is randomness, run your experiments multiple times with
different random seeds and report the variability (see the sketch below). What
are the remaining errors? Perform an error analysis: give concrete examples and
explain the general phenomena.
- Conclusion: What did you learn from this project? What would the next steps be?
- References: Citations of related work.
- Appendix: Any derivations or details of the experimental setup should go in the appendix.
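As an example of reporting variability, here is a minimal sketch, where
run_experiment is a hypothetical function that trains and evaluates your system
with a given seed and returns an accuracy:

    import statistics

    def report(run_experiment, seeds=(0, 1, 2, 3, 4)):
        """Report mean and standard deviation of accuracy across seeds."""
        accuracies = [run_experiment(seed) for seed in seeds]
        print("accuracy: %.3f +/- %.3f over %d seeds"
              % (statistics.mean(accuracies), statistics.stdev(accuracies),
                 len(seeds)))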
Grading criteria:
- Task definition: is the task precisely defined and does the formulation make sense?
- Methods: were a baseline method and two complementary methods described clearly, well justified, and tried?
Don't describe the methods in general; describe precisely how they apply to your task (what are the variables, factors, states, etc.).
- Data and experiments: was the data explained clearly, were systematic
experiments done, and were concrete results reported?
- Analysis: did you interpret the results and try to explain why things worked (or didn't work) the way they did? Did you show concrete examples?
- Extra credit: does the project present interesting and novel ideas (i.e., would it be publishable at a top-tier venue)?
You should submit a PDF with your writeup, as well as supplementary material
(code, data, and a README file documenting what everything is and what commands
you ran). You don't have to implement everything from scratch, but if
you use standard packages, say what commands you ran.
If you're working in a large codebase or have too much
data, just choose the most relevant subset (so the code doesn't have to run, but it should be clear what the components do).
Project ideas:
Here are some ideas to get you started (relevant course topics in parentheses):
- Predict the price of airline tickets given the day, time, location, etc. (ML)
- Solve Sudoku puzzles or crossword puzzles (CSPs)
- Find the optimal way to get from one place on Stanford campus to another place, taking into account
uncertain travel times due to traffic. (search, MDPs)
- Predict the amount of electricity consumed over the course of a day
- Intelligent auto-completion of code when you're programming
- A smarter source code search
- Break substitution ciphers based on knowledge of English
- Use your smartphone to gather data and predict your location
- Based on sensor readings from your smartphone, predict whether the phone should be switched off
Here is an in-depth example of a project. Reddit (http://reddit.com/) is a
social news website where users submit links or text posts, and other users
either "upvote" or "downvote" submissions. Here are some possible tasks:
- Easy: Create a classifier to predict the number
of upvotes of a Reddit text post, based on features from the text and
meta-information from the post. This task should use sophisticated features,
and should demonstrate both a theoretical and working knowledge of multiple
(i.e., at least two) classifiers. Some effort to counter interesting challenges
(e.g., label bias) should be shown. The data should be on the order of
thousands to hundreds of thousands of posts.
- Moderate: Create a model for the life of a Reddit post, predicting its upvotes
at any time step based on both the post’s content and its popularity so far.
Whereas the previous task classified a post at its birth, this task is allowed
to look at the history of votes given so far -- this is most useful for
classifying posts in their infancy. This task should use sophisticated features,
and demonstrate both a theoretical and working knowledge of the tool used for
the task (e.g., a graphical model). The data should be on the order of hundreds
to tens of thousands of posts.
- Challenge: Generate automatic posts for a particular "subreddit" given a
description of a topic. For example, the text for /r/atheism versus
/r/christianity should be substantially different given the topic "God." As an
example formulation: your training data would be a corpus of text posts from a
number of subreddits. Your input would be a subreddit name and a word
characterizing a topic; your output would be a sampled sentence. You might find
graphical models useful. The complexity of the distribution suggests that you
should make independence assumptions and break the distribution up into
multiple parts (a naive sketch appears below).
You are expected to create a model that gives qualitatively meaningful
results: the user should be able to guess the input from the output, rather
than the output just being noise. The fluency of the posts should be high,
though of course the semantics of the post can be complete nonsense (beyond
being recognizable as about a topic). You should show a working but not
necessarily theoretical understanding of all the components used.
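To make the independence assumptions concrete, here is a naive sketch (far
simpler than what a strong project would do): a bigram model trained per
subreddit and seeded with the topic word. corpus is assumed to map each
subreddit name to a list of tokenized posts.

    import random
    from collections import defaultdict

    def train_bigrams(corpus):
        """Count word-to-next-word transitions separately per subreddit."""
        counts = {sub: defaultdict(lambda: defaultdict(int)) for sub in corpus}
        for sub, posts in corpus.items():
            for tokens in posts:
                for prev, word in zip(["<s>"] + tokens, tokens + ["</s>"]):
                    counts[sub][prev][word] += 1
        return counts

    def sample_sentence(counts, subreddit, topic, max_len=20):
        """Start from the topic word and sample one word at a time."""
        sentence, prev = [topic], topic
        for _ in range(max_len):
            nxt = counts[subreddit].get(prev)
            if not nxt:
                break
            words, freqs = zip(*nxt.items())
            prev = random.choices(words, weights=freqs)[0]
            if prev == "</s>":
                break
            sentence.append(prev)
        return " ".join(sentence)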