One of the main goals of this course is to prepare you to develop spoken language processing systems of practical use. If you are interested in research, CS224S should also leave you well-qualified to do speech recognition and language understanding in an academic setting. The final project in this course will offer you an opportunity to do exactly this.
Proposal: Due at 11:59 PM PST on Wednesday, February 17
Milestone: Due at 11:59 PM PST on Wednesday, March 3
Presentation: During lecture times on Monday, March 15 and Wednesday, March 17
Report: Due at 11:59 PM PST on Monday, March 22. You may not use late days for this extended deadline.
Your first task is to pick a project topic. If you are looking for project ideas, please see the course project lecture and check Piazza for ideas from the staff and other teams. You can also go to either Andrew Maas’ or Mike Wu’s office hours for help generating and refining project ideas.
Many fantastic class projects come from students picking either an application or a dataset that they are interested in and applying topics from the class to that task. Alternatively, if you are interested in a specific set of techniques from the class, we can help you find a dataset or task where you can tractably explore those techniques.
Building a functioning dialog system for a specific task can also be a great project. While this may not be as research-oriented in the sense of running experiments, an excellent course project could build a useful dialog system (e.g. an Alexa Skill) for a task and be able to demo it. This demo system needs to be paired with some empirical evaluation: training a component, or otherwise designing and testing the system in a thoughtful way.
If you are already working on a research project related to topics in class, we encourage you to apply what you learned in class as a project. An excellent CS224S project will comprise a publishable or nearly-publishable piece of work. In previous years, some students have continued working on their projects after completing CS224S and submitted their work to a conference or journal. You can also look through the course projects from 2017.
For inspiration, you might also look at some recent spoken language understanding research papers. Topics covered in class span several conferences, but you can look at the recent proceedings of Interspeech, ASRU, SigDial, EMNLP, ACL, NAACL, NeurIPS, ICLR, and ICML for research papers in this area.
This quarter, we have introduced a new dataset: the HarperValleyBank corpus. We encourage students to explore this in their projects. Stanford also has many datasets which you might use as a benchmark task for your project. You can browse the datasets available from the LDC and similar sources in the NLP group inventory here. We also compiled a supplemental list here. Please post on Piazza if you need help getting access to a dataset.
Notes on Forming Projects
Team size. To facilitate overlap with other courses, students may do final projects solo or in teams of up to 3 people. We strongly recommend doing the final project in a team, as we expect projects to require significant experimentation time beyond just setting up tools/data and running a baseline. We expect larger teams to undertake larger projects or more experiments.
Contribution. We expect each team member to make a significant contribution to the overall project. In the final report, include a Contribution section that describes what pieces each person contributed to the final project. We typically assign the same project grade to all team members, but we might differentiate in rare cases. You can contact us in confidence in the event of unequal contribution.
External collaborators. You can work on a project that has external (non CS224S student) collaborators, but you must make it clear in your final report which parts of the project were your work.
Sharing projects. You can share a single project between CS224S and another class, but we expect the project to be accordingly bigger, and you must declare that you are sharing the project in your project proposal.
Using external resources. You may use external tools/frameworks for building deep learning systems, doing speech processing, or building something in a framework like Alexa Skills Kit. Simply stitching together tools for a demo is not a sufficient project. Instead, use tools as a way to focus your effort on the meaningful research question or most critical capabilities for your project.
Projects will be evaluated based on:
Technical quality of the work. (Does the technical material make sense? Are the things tried reasonable? Are the proposed algorithms or applications clever and interesting? Do the authors convey novel insight about the problem and/or algorithms? Are any system demos compelling in their function/scope?)
Significance. (Did the authors find a dataset to appropriately test their hypothesis? Is this work likely to be useful and/or have impact? Could a demo system / tool be used in practice?)
Novelty of the work. (Does this solve a new problem/domain/task/dataset? Does it introduce a new approach? Is it a justifiable next step given previous work in this area?)
Clarity of the write-up. (Your write-up should clearly describe your task, dataset, solution approach, relevant related work, any experiments/test you tried, along with your conclusions.)
Try not to overthink these criteria nor worry too much if you’re not sure that you can do well on all of them. Just think of this as an “ideal” that you should aspire to (especially if your goal is to do publishable work).
Lastly, a few words of advice: Many of the best class projects come from students working on topics that they are excited about. So, pick something that you can get excited and passionate about! Be brave rather than timid, and do feel free to propose ambitious things that you are excited about. Finally, if you are not sure what would or would not make a good project, we encourage you to either post on Piazza or come to office hours to talk about project ideas.
Project Office Hours
Please use Andrew’s and Mike’s office hours only for project-related questions. For homework questions you can visit any other TA’s office hours. In addition to office hours, Mike will be opening 30 min slots from 10 AM to 3 PM every Friday to chat with teams about project related questions. Make a booking here.
All project groups will submit each of the following via Gradescope.
Project Proposal (10%)
Proposals are due at 11:59 PM PST on Wednesday, February 17th. A proposal should be a maximum of 500 words and include the following:
- The task you plan to work on.
- The dataset you plan to use.
- A sketch of your proposed approach/model.
- How you plan to evaluate your approach.
If your proposed project will be done jointly with a different class project (with the consent of the other class instructor), your proposal must clearly say so. The teaching staff will review your proposal and contact you if we foresee any issues with your project.
Project Milestone (10%)
Milestones are due at 11:59 PM PST on Wednesday, March 3 and should be submitted through Gradescope. A milestone report should include the following:
- Literature review of 2-5 relevant papers.
- If your approach requires collecting data, describe your data collection setup.
- If you are working with an existing dataset, show basic data exploration for the dataset and describe how you plan to train with or otherwise use it.
- Baseline results for comparison to your ongoing work.
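For ASR-style projects, a common baseline number to report is word error rate (WER), computed via edit distance over word tokens. The function below is an illustrative sketch (not a course-provided tool), assuming whitespace-tokenized transcripts:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("my" -> "a") over a 4-word reference:
print(wer("open my checking account", "open a checking account"))  # 0.25
```

Even a simple number like this makes your milestone's baseline concrete and gives your final experiments something to improve on.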
Project Presentation (10%)
Project presentations happen during lecture time. Each project team will submit a short video presenting the major results of their work. Your video can include a system demo, slides with voiceover, or a screen capture of how your system works. A recorded Zoom session is sufficient for this video. During class time, we will play your video, and people can ask questions in the chat for you to discuss and answer. We will announce details of the final presentation sessions later in the quarter.
Final Report (70%)
Final project write-ups are due at 11:59 PM PST on Monday, March 22. Final project write-ups should be 5 pages of text and may include additional pages for appendices, figures, references, and everything else you choose to submit. The following is a suggested structure:
Abstract: This is an overview of the story of your project (the motivation, the findings, the impact). It should not be more than 300 words.
Introduction: This section introduces your problem and the overall plan for approaching your problem. Motivate why the problem your project is solving is important.
Related Work: This section discusses relevant literature for your project. What other approaches have tried to solve the same problem as your project? What are the differences in methodology?
Approach: This section details the framework of your project. Be specific, which means you might want to include equations, figures, plots, etc. Do not mention any experiments yet; this is your opportunity to present your models, algorithms, and new technical contributions in abstract terms.
Experiments: This section begins with the kinds of experiments you are running, the dataset(s) you are using, and how you measure or evaluate your results. It then presents the results of your experiments in detail. By detail, we mean both quantitative evaluations (numbers, figures, tables, etc.) and qualitative results (images, example outputs, etc.).
Conclusion: What have you learned? Summarize takeaways and suggest future directions.
References: This is absolutely necessary. Please follow a consistent citation scheme for your report template and include references to previous work related to your task, dataset, and/or modeling approaches.
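For instance, with BibTeX a consistent scheme might look like the entry below (a made-up placeholder, not a real citation):

```bibtex
@inproceedings{author2020example,
  title     = {An Example Paper Title},
  author    = {Author, First and Coauthor, Second},
  booktitle = {Proceedings of Interspeech},
  year      = {2020}
}
```

Whatever scheme you pick, apply it uniformly across all entries.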
Please use the ACL 2020 template for write-ups. The template can be downloaded here. If you do this work in collaboration with someone else, or if someone else (such as another professor) advises you on this work, your write-up must fully acknowledge their contributions.
After the class, we will post all the final write-ups online so that you can read about each others’ work. If you do not want your write-up to be posted online, please specifically mention it in the final paragraph of your write-up.
Previous Course Projects
List of Projects
What's Up, Doc? A Medical Diagnosis Bot
Monica Agrawal, Janette Cheng, Caelin Tran
Alternative Political Speech Classification "Facts"
Tyler Dammann, Regina Nguyen
Improving Forced Alignments
Christina Ramsey, Frank Zheng
Predicting Assertiveness in Conversation Using Deep Learning
Isabella Cai, Catherina Xu, Grace B. Young
Dialogue Acts in Design Conversations
Ethan Chan, Aaron Loh, Connie Zeng
Style Transfer for Prosodic Speech
Anthony Perez, Chris Proctor, Archa Jain
Native Language Identification of Spoken Language Using Recurrent
Kai-Chieh Huang, Jennifer Lu, Wayne Lu
Reading Emotions from Speech using Deep Neural Networks
Anusha Balakrishnan, Alisha Rege
Automatic Lyrics Transcription by Separating Vocals from Background Music
Diveesh Singh, Helen Jiang, Mindy Yang
Applying a Recurrent Neural Network using Connectionist Temporal Classification to Automatic Recognition of Lyrics in Singing
Maneesh Apte, Matthew Chen, Teddy Morris-Knower, Shalom Rottman-Yang
Deep RNN Speech Recognition with Sub-Labels
Jiayu Wu, Yangxin Zhong, Qixiang Zhang
NATLID: Native Language Identification
Ankita Bihani, Anupriya Gagneja, Mohana Prasad Sathya Moorthy
Detecting Personality Traits in Conversational Speech
Liam Kinney, Anna Wang, Jessica Zhao
Compression of Deep Speech Recognition Networks
Stephen Koo, Priyanka Nigam, Darren Baker
Generating Adversarial Examples for Speech Recognition
Dan Iter, Jade Huang, Mike Jermann
Rappify: Adding Rhythm to Speech
Ian Torres, Jacob Conrad Trinidad
Identifying Confidence in Speech
Grady Williams, Bryan McLellan, Grant Sivesind
Battleship as a Dialog System
Gerry Meixiong, Tony Tan-Torres, Jeffrey Yu
The Effect of Speech Disfluencies on Turn-Taking
Lucy Li, Kartik Sawhney, Divya Sain
Native Language Identification from Speech Transcriptions
Kent Blake, Greg Ramel, Matthew Volk
A Neural Network Approach to the Native Language Inference Task
Roger Chen, Kenny Leung
Improving Conversational Forced Alignment with Lexicon Expansion
Christopher Liu, Stephanie Mallard, Ryan Silva
Detecting Lies via Speech Patterns
Amanda Chow, John Louie
pWAVE: A Novel Dataset for Emotional Confidence Detection
Sanjay Kannan, Rooz Mahdavian
Infer Your First Language from Your English
Qiwen Fu, Wei-ting Hsu, Yundong Zhang
Native Language Identification from i-vectors and Speech Transcriptions
Ben Ulmer, Aojia Zhao, Nolan Walsh
Monaural Source Separation Using Neural Networks
Simon Kim, Mark Kwon, Sunmi Lee
Neural Lie Detection with the CSC Deceptive Speech Dataset
Shloka Desai, Maxwell Siegelman, Zachary Maurer
Native Language Identification through Speech
Delenn Chin, Kevin Chen, David Morales
Dialogue System for Restaurant Reservations using Hybrid Code Network
Charles Akin-David, David Xue, Evelyn Mei
VitiBot: A Dialog System Sommelier
Stephanie Tang, Ivan Suarez, Jim Andress
Applying Artistic Style Transfer to Natural Language
Thaminda Edirisooriya, Morgan Tenney
End-to-end neural networks for subvocal speech recognition
Pol Rosello, Pamela Toman, Nipun Agarwala
End-to-End Neural Speech Synthesis
Frank Cipollone, Hugo Clifford Kitano, Mila Faye Schultz
Pitch Perfect: Predicting Startup Funding Success Based on Shark Tank Audio
Shubha Raghvendra, Jeremy Wood, Minna Xiao
Improving Acoustic Models for Enriched Lexicons
Vivian Hsu, Addison Leong, Antariksh Mahajan
Text-to-speech Synthesis System based on Wavenet
Yuan Li, Xiaoshi Wang, Shutong Zhang
Classification and Recognition of Stuttered Speech
Manu Chopra, Kevin Khieu, Thomas Liu
Convolutional Neural Networks and Grammar Rules Analysis for Speech-based Native Language Identification
Dilsher Ahmed, Long-Huei Chen, Ayooluwakunmi Jeje
Statistical Methods for Native Language Identification
Tony Bruess, Frank Fan, Brexton Pham
Storytime - End to end neural networks for audiobooks
Pierce Freeman, Ethson Villegas, John Kamalu
Accent Conversion Using Artificial Neural Networks
Amy Bearman, Kelsey Josund, Gawan Fiore
Learning to Recognize Speech From Chaotically Synthesized Data
Faraz Bonab, Samuel Ginn
Applying Backoff to Concatenative Speech Synthesis
Lily Liu, Luladay Price, Andrew Zhang
Deep Learning Approaches for Online Speaker Diarization
Chaitanya Asawa, Nikhil Bhattasali, Allan Jiang
Auditory Deep Q Networks
Austin Ray, Do-Hyoung Park, Vignesh Venkataraman
Detecting and Artistically Representing Romantic Compatibility in Human Dialogue
Chris Salguero, Anna Teixeira, Ramin Ahmar
Modeling Intonation in Text-to-Speech Synthesis with a Bidirectional Long Short-Term Memory Recurrent Neural Network
Kevin Garbe, Aleksander Glowkal
Mark My Words! End-to-End Memory-Enhanced Neural Architectures for Automatic Speech Recognition
Amani Peddada, Lindsey Kostas