Overview
The project is a chance to explore RL in more depth. Novel research
ideas are welcome but are neither expected nor required to receive full credit. In addition,
projects do not always work: in such cases, we encourage a careful illustration (using theoretical
proofs and/or experimental results, plus a discussion) of why the proposed idea did not
work and/or was substantially more work than anticipated. Not having done enough coding
will not be considered a compelling reason.
Project Ideas
To give you some project ideas, we are sharing some of the projects from previous years
below:
- Using Transfer Learning Between Games to Improve Deep Reinforcement Learning Performance and Stability, Chaitanya Asawa, Christopher Elamri, David Pan. [Poster] [Paper]
- Mastering the game of Go from scratch, Michael Painter, Luke Johnston. [Poster] [Paper]
- Comparison of Control Methods: Learning Robotics Manipulation with Contact Dynamics, Keven Wang, Bruce Li. [Paper]
- Information Directed Reinforcement Learning, Andrea Zanette, Rahul Sarkar. [Poster] [Paper]
- Reward Backpropagation Prioritized Experience Replay, Yangxin Zhong, Borui Wang, Yuanfang Wang. [Poster] [Paper]
- Online Learning for Causal Bandits, Vinayak Sachidananda, Prof. Emma Brunskill. [Poster] [Paper]
- DeepShuai: Deep Reinforcement Learning based Chinese Chess Player, Chengshu Li, Kedao Wang, Zihua Liu. [Poster] [Paper]
- EteRNA-RL: Using reinforcement learning to design RNA secondary structures, Isaac Kauvar, Ethan Richman, William E Allen. [Poster] [Paper]
- Adversarially Robust Policy Learning through Active Construction of Physically-Plausible Perturbations, Ajay Mandlekar, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese. [Poster] [Paper]
The following guest lecture slides from prior class offerings may also help you in generating good project ideas.
- Cooperative Inverse Reinforcement Learning, Dylan Hadfield-Menell. [Slides]
- Maximum Entropy Framework: Inverse RL, Soft Optimality, and More, Chelsea Finn and Sergey Levine. [Slides]
- Reinforcement Learning – Policy Optimization, Pieter Abbeel. [Slides]
- Safe Reinforcement Learning, Philip S. Thomas. [Slides]
You may also consider browsing through the RL publications listed below, to get more ideas.
- RLDM: Multi-disciplinary Conference on Reinforcement Learning and Decision Making
- AAMAS: International Conference on Autonomous Agents and MultiAgent Systems
- NIPS: Neural Information Processing Systems
- ICML: International Conference on Machine Learning
- ICLR: International Conference on Learning Representations
- arXiv: e-prints archive
- Stanford AI Group: New and relevant papers from local faculty
- Kaggle: An online machine learning competition website
We also encourage course projects that try to reproduce recent results in an RL paper,
for example as promoted in the ICLR 2018 Reproducibility Challenge.
Research reproducibility is an important issue in machine learning, and the goal of
a reproducibility project should be to provide detailed feedback to the authors of an RL
paper about how reproducible their results are. See the challenge page for more information (although
the deadline for submission to the challenge has passed, this is still a great project
idea).
Important Dates and Times
Date | Time | Event | Late Day Policy |
May 5 | 6 PM | Initial project proposal | 2 late days allowed. See Late Day Policy. |
May 22 | 6 PM | Project milestone | 2 late days allowed. See Late Day Policy. |
Jun 5 | 1-4 PM, location TBD | Poster Session | No late days allowed. See Late Day Policy. |
Jun 11 | 6 PM | Final report | No late days allowed. See Late Day Policy. |
Project Proposal
The project proposal should be about 200-400 words and include the names of the project team members
and the project mentor (someone who agrees to give you feedback). The mentor can be one of the
course staff or someone external to the class. There will be a list of staff interests at the bottom of the page
to help you find a mentor. If some staff members receive a large number of requests, we may
balance projects to ensure that everyone has a mentor who can give them periodic feedback.
The proposal should also include a brief overview of the
proposed project and a project plan that covers the following:
- What is the problem that you will be investigating? Why is it interesting?
- If relevant, what data, simulator or real-world RL domain will you be looking at? If you are collecting new datasets, how do you plan to collect them?
- What method, algorithm or theoretical analysis are you proposing? If there are existing implementations, will you use them and how? How do you plan to improve or modify such implementations? If you are addressing a theoretical question, how do you plan to make progress?
- What literature have you already surveyed or will be examining to provide context and background?
- How will you evaluate your results? Qualitatively, what kind of results do you expect (e.g. plots or figures)? Quantitatively, what kind of analysis will you use to evaluate and/or compare your results (e.g. what performance metrics or statistical tests)?
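As one illustration of the kind of quantitative analysis the last question refers to, here is a minimal sketch that compares two methods by their final returns across random seeds using a percentile bootstrap confidence interval. It is not a required or recommended analysis; the function name `bootstrap_mean_diff_ci`, the number of seeds, and all return values are hypothetical and made up for the example.

```python
import random
import statistics


def bootstrap_mean_diff_ci(returns_a, returns_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for mean(returns_a) - mean(returns_b)."""
    rng = random.Random(seed)  # fixed seed so the analysis itself is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each method's per-seed returns with replacement.
        sample_a = rng.choices(returns_a, k=len(returns_a))
        sample_b = rng.choices(returns_b, k=len(returns_b))
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical final returns from 5 random seeds per method (made-up numbers).
baseline_returns = [102.0, 98.5, 110.2, 95.1, 104.3]
proposed_returns = [120.4, 115.0, 125.3, 118.9, 122.1]

low, high = bootstrap_mean_diff_ci(proposed_returns, baseline_returns)
print(f"95% CI for improvement in mean return: [{low:.1f}, {high:.1f}]")
```

An interval that excludes zero suggests the difference is unlikely to be due to seed variation alone, although with only a handful of seeds such intervals are rough and should be reported alongside the raw per-seed results.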
Submit your project proposal by following the Submission Instructions.
For the late day policy please see here.
Project Milestone
Your project milestone should be 2 to 3 pages long, using the
ICML template.
The following is a suggested structure for your report:
- Title, Author(s).
- Introduction: This section introduces your project and explains why it is important or interesting.
- Approach: Describe the steps you have completed so far. If you are implementing an algorithm, you should have started the implementation and ideally have some early-stage results. Describe precisely the remaining work that you expect to complete.
Submit your project milestone by following the
Submission Instructions.
For the late day policy please see here.
Final Report Submission
Your final report should be 6 to 8 pages long, using the
ICML template. After the class,
we will post all the final reports online so that you can read about each other's work. If you do not want
your final report to be posted online, then please let us know when you submit your writeup.
You should include a brief statement on the contributions of different members of the team in the report.
Team members will normally get the same grade, but we reserve the right to differentiate in egregious cases.
Submit your final report by following the
Submission Instructions.
For the late day policy please see here.
Report. The following is a suggested structure for the report:
- Title, Author(s).
- Abstract: It should not be more than 300 words.
- Introduction: This section introduces your problem, and the overall plan for approaching your problem.
- Background/Related Work: This section discusses relevant literature for your project.
- Approach: Algorithms used or developed.
- Theoretical results (if relevant): Include assumptions, proof sketches.
- Experimental results (if relevant): Details on the experiments done. The goal is to describe them in enough detail that the results are reproducible.
- Conclusion: What have you learned ? Suggest future ideas.
- References: This is absolutely necessary.
Supplementary Material is not counted towards your 6-8 page limit.
Examples of things to put in your supplementary material:
- Full proof details (if doing a project with theoretical results).
- Cool videos, interactive visualizations, demos, etc. (optional)
Examples of things to not put in your supplementary material:
- Source code of the third-party libraries you used (Theano, Caffe, CoreNLP).
- Any code that is larger than 1MB.
- Model checkpoints.
- A computer virus.
Additional Submission Requirements
Please also include the following when you submit your project report to Gradescope:
- List in your report PDF all the authors that contributed to your work, at a co-author level. This will often include people not enrolled in CS234, such as research collaborators (other graduate students, undergraduates, postdocs, research fellows or faculty advisors that helped develop your algorithm, contributed code to the project, engaged in significant idea development or feedback). All authors and their institutional/organizational affiliation and email should be listed directly under the title in your PDF, and please include a footnote to specify which authors were or were not enrolled in CS234.
- If some project contributors are not part of the class, please also include the following information at the end of your manuscript:
- Specify the role and participation of non-CS 234 contributors (discussion, writing code, paper writing, statistical analysis, etc). As an example, see the author contributions for AlphaGo (Nature, 2016).
- Indicate if the project has been submitted to a peer-reviewed conference or journal. If so, please include the full name and acronym of the conference (if applicable). This is applicable only if your paper has already been submitted to the journal/conference before our CS234 report deadline.
- Information about how to access the source code used in your project. This includes code you developed for the project, any CS234 code, and any open-source software. If you do not wish to release your source code publicly, you can create a private Git repository that is shared only with the course staff. If your code relies on private organizational software that cannot be shared beyond the organization, contact us via Piazza by the project milestone deadline.
Collaboration Policy and Honor Code
Projects can be done in groups of up to 3. We strongly encourage you to work in groups of 3: we have a limited number of staff,
and doing projects in groups of 3 will allow us to give you and your classmates higher quality
feedback on your projects!
If you are doing this project jointly with another class, you must inform us and check with the other instructors as well to get their consent. You also need to specify if there are other partners that are not in CS 234 that you are working with, and be able to
describe the aspects of the project that are relevant to CS 234. You are welcome to combine this with an RL-relevant research project (such as for an honors thesis or a research assistantship). In this case, too, you should check that this is acceptable to any other collaborators involved, and clearly indicate who else is involved in the project and what your role in the project is. If you have any questions about this, please reach out to us on Piazza.
You may use any existing code, libraries, etc. and consult any papers, books, online references,
etc. for your project. However, you must cite your sources in your writeup and clearly indicate which
parts of the project are your contributions and which parts were implemented by others. Under no
circumstances may you look at another group’s code or incorporate their code into your project.
Also read the section on Academic Collaboration and Misconduct for
an overview of the collaboration policy and academic integrity standards expected in general.
Grading Policy
We expect that the project size and contribution will scale with the number of team members: for example, projects done with a team of 3 should have a stronger report and results than those done with a team of 2 people. We also ask for a statement of what each team member contributed to a team project. Team members will typically get the same grade, but we reserve the right to differentiate in extreme cases of unequal contribution. You can contact us in confidence in the event of unequal contributions.