Project Overview
The course project is an opt-in replacement for assignment 4. If students complete both a project and assignment 4, we will award points for whichever results in the higher grade. To work on a course project in place of assignment 4, students must submit a proposal and receive staff approval.
One of the main goals of this course is to prepare students to develop spoken language processing systems for real-world use. For those interested in research, CS224S helps build the skills to work with spoken language tools in a research setting, or to invent new approaches for audio and spoken language understanding. The final project lets you pursue research-oriented outcomes or develop a prototype of a spoken language product.
Our core guiding philosophy for projects: Build something that you feel proud of. Whether your goal is a research paper or new product, and whatever your experience so far, choose a project direction and scope so that you can enthusiastically tell stories about this project in future job interviews or conversations with peers. We are here to help you, and we hate to see projects that students quickly discard or forget about after the quarter.
Project Topics
Your first task is to pick a project topic. If you are looking for project ideas, please see the course project lecture and discuss ideas on the discussion forum or in office hours. You can also look through the course projects from 2017.
Task / Application Project
Many fantastic class projects come from students picking an application or dataset that interests them and applying topics from the class to that task. Alternatively, if you are interested in a specific set of techniques from the class, we can help you find a dataset or task where you can tractably explore those techniques.
Building a functioning spoken dialog system for a specific task can also be a great project. While this may be less research-oriented, in the sense of running experiments with clear benchmarks and metrics, an excellent course project could build a useful dialog system (e.g., an Alexa Skill) for a task and demonstrate its usage. This demo system needs to be paired with some empirical evaluation, a trained component, or otherwise thoughtful design and testing of the system.
Research Project
If you are already working on a research project related to topics in class, we encourage you to apply what you learned in class to it as your course project. An excellent CS224S project will comprise a publishable or nearly publishable piece of work. In previous years, some students have continued working on their projects after completing CS224S and submitted their work to a conference or journal.
Recent Works
For inspiration, you might also look at some recent spoken language understanding research papers. Topics covered in class span several conferences, but you can look at the recent proceedings of Interspeech, ASRU, SigDial, EMNLP, ACL, NAACL, NeurIPS, ICLR, and ICML for research papers in this area.
Datasets
Be careful about choosing a project for which no suitable dataset exists, or one that requires data collection before you can start work. Unless your project is specifically focused on creating a new dataset, we consider data collection and preparation a small part of project setup. In general, make sure that data availability will not block progress on experiments and system development.
We developed the HarperValleyBank corpus for this course to simulate call center spoken language experiments while being small enough for rapid experiment iteration. Stanford has many datasets that you might use as a benchmark task for your project, or you can look for open-source or publicly available audio and spoken language datasets. You can browse the available datasets from LDC and similar sources in the NLP group inventory here. We also compiled a supplemental list here. HuggingFace also lists publicly available spoken language datasets. Please post on the discussion forum if you need help getting access to a dataset.
Building on homework systems as a project
Our homeworks provide starting experiment code and tools for speech recognition, synthesis, and using audio foundation models. You can use these tools as a starting point for your project; we give you permission to reuse homework code as part of your project. If you start from one of these established baseline systems, clearly state your project’s research contributions relative to what was provided, and ensure your project has sufficient scope. When starting from a homework system, include in your project proposal the techniques you plan to investigate.
Project Logistics
Notes on Forming Projects
- Team size. To facilitate overlap with other courses, students may do final projects solo or in teams of up to 3 people. We expect larger teams to undertake larger projects or more experiments.
- Contribution. We expect each team member to make a significant contribution to the overall project. In the final report, include a Contribution section that describes what each person contributed to the final project. We typically assign the same project grade to all team members, but we might differentiate in rare cases. You can contact us in confidence in the event of unequal contribution.
- External collaborators. You can work on a project that has external (non-CS224S student) collaborators, but you must make clear in your final report which parts of the project were your work. If you use data, code, or APIs from external organizations, please be sure to appropriately cite and acknowledge them in your final report and poster.
- Sharing projects. You can share a single project between CS224S and another class, but we expect the project to be correspondingly bigger, and you must declare in your project proposal that you are sharing the project.
- Using external resources. You may use external tools/frameworks for building deep learning systems, doing speech processing, or building something in a framework like Alexa Skills Kit. Simply stitching together tools for a demo is not a sufficient project. Instead, use tools as a way to focus your effort on the meaningful research question or the most critical capabilities for your project.
Project Evaluation
Projects will be evaluated based on:
- Technical quality of the work. (Does the technical material make sense? Are the things tried reasonable? Are the proposed algorithms or applications clever and interesting? Do the authors convey novel insight about the problem and/or algorithms? Are any system demos compelling in their function/scope?)
- Significance. (Did the authors find a dataset to appropriately test their hypothesis? Is this work likely to be useful and/or have impact? Could a demo system / tool be used in practice?)
- Novelty of the work. (Does this solve a new problem/domain/task/dataset? Does it introduce a new approach? Is it a justifiable next step given previous work in this area?)
- Clarity of the write-up. (Your write-up should clearly describe your task, dataset, solution approach, relevant related work, and any experiments/tests you tried, along with your conclusions.)
Try not to overthink these criteria or worry too much if you’re not sure that you can do well on all of them. Just think of them as an “ideal” to aspire to (especially if your goal is to do publishable work).
Lastly, a few words of advice: many of the best class projects come from students working on topics they are excited about, so pick something you can get excited and passionate about! Be brave rather than timid, and feel free to propose ambitious ideas. Finally, if you are not sure what would or would not make a good project, we encourage you to post on the forum or come to office hours to talk about project ideas!
Project Office Hours
Please primarily use Andrew’s office hours for project-related questions. For homework questions, you can visit any TA’s office hours. In addition to regular office hours, we will offer extra project office hours as necessary throughout the quarter.
Deliverables
All project groups will submit each of the following via Gradescope.
Project Proposal (opt-in gateway to submitting a project)
A proposal should be at most 500 words and include the following:
- The task you plan to work on.
- The dataset you plan to use.
- A sketch of your proposed approach/model.
- How you plan to evaluate your approach.
- References to at least 2 papers, datasets, or relevant systems (the reference section is not part of the word count).
If you do not yet have full answers to these questions, describe the specifics you do have and how you are working to establish any datasets, baselines, or modeling approaches that are not yet clear.
If your proposed project will be done jointly with a project for a different class (with the consent of the other class’s instructor), your proposal must clearly say so. The teaching staff will review your proposal and contact you if we foresee any issues with your project.
You must receive a “yes” from the teaching staff on your proposal before you can drop assignment 4. In some cases, the teaching staff will provide feedback or request a discussion before approving, to ensure the project is set up for success.
Project spotlight presentations / short video (20%)
We reserve the final day of lecture for project groups to present their project, findings, and ongoing progress in a short (~2 minute) “spotlight talk” format. Groups will have time to show 1-2 slides introducing the project, along with any relevant demos or audio samples where appropriate. The spotlight talk should cover:
- Overview of your motivation / task. Why did you choose this project?
- What data did you work with? What is your formal task/hypothesis, or the narrow goal you spent time trying to achieve?
- What initial experiments did you try? What were your results and what did you learn from those experiments?
- With the remaining time, what are you currently focused on trying/building? It’s fine to speculate about results / capabilities you’re building now, even if they don’t make it into the final report.
- If applicable, demos of your system/results or relevant audio examples are always great!
For remote students, and in cases where project groups are unable to present during lecture time, project teams may instead submit a ~2 minute video presentation as an update on their work (due on the same day as project presentations).
Final Report (80%)
Final project write-ups should be 2-4 pages of text and may include additional pages for appendices, figures, references, and everything else you choose to submit. The following is a suggested structure:
- Title, Author(s)
- Abstract: This is an overview of the story of your project (the motivation, the findings, the impact). It should be no more than 300 words.
- Introduction: This section introduces your problem and your overall plan for approaching it. Motivate why the problem your project solves is important.
- Related Works: This section discusses relevant literature for your project. What other approaches have tried to solve the same problem as your project? What are the differences in methodology?
- Approach: (a.k.a. Methods) This section details the framework of your project. Be specific, which means you might want to include equations, figures, plots, etc. Do not mention any experiments yet; this is your opportunity to present the models, algorithms, and new technical contributions in the abstract.
- Experiments: This section begins by describing the kinds of experiments you are running, the dataset(s) you are using, and how you measure or evaluate your results. It then presents the results of your experiments in detail. By detail, we mean both quantitative evaluations (numbers, figures, tables, etc.) and qualitative results (images, example outputs, etc.).
- Conclusion: What have you learned? What is the overall outcome of your experiments or system building?
- References: This is absolutely necessary. Please follow a consistent citation style for your report template and include references to previous work related to your task, dataset, and/or modeling approaches.
- Contributions: List each project team member and briefly summarize their contribution to the project. Also include any external collaborators and their role in the project. If you do this work in collaboration with someone else, or if someone else (such as another professor) advises you on this work, your write-up must fully acknowledge their contributions.
Please use the ACL paper template for write-ups. The template can be downloaded here for LaTeX or Overleaf.
After the class, we will post all the final write-ups online so that you can read about each other’s work. If you do not want your write-up to be posted online, please say so explicitly in the final paragraph of your write-up.
Previous Course Projects
Spring 2017
List of Projects
What's Up, Doc? A Medical Diagnosis Bot
Monica Agrawal, Janette Cheng, Caelin Tran
Alternative Political Speech Classification "Facts"
Tyler Dammann, Regina Nguyen
Improving Forced Alignments
Christina Ramsey, Frank Zheng
Predicting Assertiveness in Conversation Using Deep Learning
Isabella Cai, Catherina Xu, Grace B. Young
Dialogue Acts in Design Conversations
Ethan Chan, Aaron Loh, Connie Zeng
Style Transfer for Prosodic Speech
Anthony Perez, Chris Proctor, Archa Jain
Native Language Identification of Spoken Language Using Recurrent
Kai-Chieh Huang, Jennifer Lu, Wayne Lu
Reading Emotions from Speech using Deep Neural Networks
Anusha Balakrishnan, Alisha Rege
Automatic Lyrics Transcription by Separating Vocals from Background Music
Diveesh Singh, Helen Jiang, Mindy Yang
Applying a Recurrent Neural Network using Connectionist Temporal Classification to Automatic Recognition of Lyrics in Singing
Maneesh Apte, Matthew Chen, Teddy Morris-Knower, Shalom Rottman-Yang
Deep RNN Speech Recognition with Sub-Labels
Jiayu Wu, Yangxin Zhong, Qixiang Zhang
NATLID: Native Language Identification
Ankita Bihani, Anupriya Gagneja, Mohana Prasad Sathya Moorthy
Detecting Personality Traits in Conversational Speech
Liam Kinney, Anna Wang, Jessica Zhao
Compression of Deep Speech Recognition Networks
Stephen Koo, Priyanka Nigam, Darren Baker
Generating Adversarial Examples for Speech Recognition
Dan Iter, Jade Huang, Mike Jermann
Rappify: Adding Rhythm to Speech
Ian Torres, Jacob Conrad Trinidad
Identifying Confidence in Speech
Grady Williams, Bryan McLellan, Grant Sivesind
Battleship as a Dialog System
Gerry Meixiong, Tony Tan-Torres, Jeffrey Yu
The Effect of Speech Disfluencies on Turn-Taking
Lucy Li, Kartik Sawhney, Divya Sain
Native Language Identification from Speech Transcriptions
Kent Blake, Greg Ramel, Matthew Volk
A Neural Network Approach to the Native Language Inference Task
Roger Chen, Kenny Leung
Improving Conversational Forced Alignment with Lexicon Expansion
Christopher Liu, Stephanie Mallard, Ryan Silva
Detecting Lies via Speech Patterns
Amanda Chow, John Louie
pWAVE: A Novel Dataset for Emotional Confidence Detection
Sanjay Kannan, Rooz Mahdavian
Infer Your First Language from Your English
Qiwen Fu, Wei-ting Hsu, Yundong Zhang
Native Language Identification from i-vectors and Speech Transcriptions
Ben Ulmer, Aojia Zhao, Nolan Walsh
Monaural Source Separation Using Neural Networks
Simon Kim, Mark Kwon, Sunmi Lee
Neural Lie Detection with the CSC Deceptive Speech Dataset
Shloka Desai, Maxwell Siegelman, Zachary Maurer
Native Language Identification through Speech
Delenn Chin, Kevin Chen, David Morales
Dialogue System for Restaurant Reservations using Hybrid Code Network
Charles Akin-David, David Xue, Evelyn Mei
VitiBot: A Dialog System Sommelier
Stephanie Tang, Ivan Suarez, Jim Andress
Applying Artistic Style Transfer to Natural Language
Thaminda Edirisooriya, Morgan Tenney
End-to-end neural networks for subvocal speech recognition
Pol Rosello, Pamela Toman, Nipun Agarwala
Latent Sentiment
Frank Cipollone, Hugo Clifford Kitano, Mila Faye Schultz
Pitch Perfect: Predicting Startup Funding Success Based on Shark Tank Audio
Shubha Raghvendra, Jeremy Wood, Minna Xiao
Improving Acoustic Models for Enriched Lexicons
Vivian Hsu, Addison Leong, Antariksh Mahajan
Text-to-speech Synthesis System based on Wavenet
Yuan Li, Xiaoshi Wang, Shutong Zhang
Classification and Recognition of Stuttered Speech
Manu Chopra, Kevin Khieu, Thomas Liu
Convolutional Neural Networks and Grammar Rules Analysis for Speech-based Native Language Identification
Dilsher Ahmed, Long-Huei Chen, Ayooluwakunmi Jeje
Statistical Methods for Native Language Identification
Tony Bruess, Frank Fan, Brexton Pham
End-to-End Neural Speech Synthesis
Alex Barron
Storytime - End to end neural networks for audiobooks
Pierce Freeman, Ethson Villegas, John Kamalu
Accent Conversion Using Artificial Neural Networks
Amy Bearman, Kelsey Josund, Gawan Fiore
Learning to Recognize Speech From Chaotically Synthesized Data
Faraz Bonab, Samuel Ginn
Applying Backoff to Concatenative Speech Synthesis
Lily Liu, Luladay Price, Andrew Zhang
Deep Learning Approaches for Online Speaker Diarization
Chaitanya Asawa, Nikhil Bhattasali, Allan Jiang
Auditory Deep Q Networks
Austin Ray, Do-Hyoung Park, Vignesh Venkataraman
Detecting and Artistically Representing Romantic Compatibility in Human Dialogue
Chris Salguero, Anna Teixeira, Ramin Ahmar
Modeling Intonation in Text-to-Speech Synthesis with a Bidirectional Long Short-Term Memory Recurrent Neural Network
Kevin Garbe, Aleksander Glowkal
Mark My Words! End-to-End Memory-Enhanced Neural Architectures for Automatic Speech Recognition
Amani Peddada, Lindsey Kostas
CS224S