Course Project

Spring 2025

Project Overview

The course project is an opt-in replacement for assignment 4. If students complete both a project and assignment 4, we will award points for whichever results in the higher grade. Students must submit a proposal and receive staff approval in order to work on a course project to present and submit instead of assignment 4.

One of the main goals of this course is to prepare students to develop spoken language processing systems for real-world use. For those interested in research, CS224S helps build skills to work with spoken language tools in a research setting, or invent new approaches for audio and spoken language understanding. The final project in this course allows you to pursue research-oriented outcomes, or develop systems for a spoken language product prototype.

Our core guiding philosophy for projects: Build something that you feel proud of. Whether your goal is a research paper or new product, and whatever your experience so far, choose a project direction and scope so that you can enthusiastically tell stories about this project in future job interviews or conversations with peers. We are here to help you, and we hate to see projects that students quickly discard or forget about after the quarter.

Project Topics

Your first task is to pick a project topic. If you are looking for project ideas, please see the course project lecture and discuss ideas on the discussion forum or in office hours. You can also look through the course projects from 2017.

Task / Application Project

Many fantastic class projects come from students picking either an application or dataset that they are interested in and applying topics from the class to that task. Alternatively, if you are interested in a specific set of techniques from the class, we can help find a dataset or task where you can tractably explore those techniques.

Building a functioning spoken dialog system for a specific task can also be a great project. While this may not be as research-oriented in the sense of running experiments with clear benchmarks and metrics, an excellent course project could build a useful dialog system (e.g. Alexa Skill) for a task and demonstrate its usage. This demo system needs to be paired with some empirical evaluation, training a component, or otherwise designing and testing the system in a thoughtful way.

Research Project

If you are already working on a research project related to topics in class, we encourage you to apply what you learned in class as a project. An excellent CS224S project will comprise a publishable or nearly-publishable piece of work. In previous years, some students have continued working on their projects after completing CS224S and submitted their work to a conference or journal.

Recent Works

For inspiration, you might also look at some recent spoken language understanding research papers. Topics covered in class span several conferences, but you can look at the recent proceedings of Interspeech, ASRU, SigDial, EMNLP, ACL, NAACL, NeurIPS, ICLR, and ICML for research papers in this area.

Datasets

Be wary of choosing a project where no available dataset exists for your experiment, or a project that requires data collection before you can start work. Unless your project is specifically focused on creating a new dataset, we consider data collection and preparation a small part of project setup. Generally we encourage students to ensure that data availability will not block progress on experiments and system development.

We developed the HarperValleyBank corpus for this course to simulate call center spoken language experiments while being small enough for rapid experiment iteration. Stanford has many datasets that you might use as a benchmark task for your project, or you can look for open source / publicly available datasets in audio or spoken language. You can browse the available datasets from LDC and similar in the NLP group inventory here. We also compiled a supplemental list here. There are also publicly available spoken language datasets listed on HuggingFace. Please post on the discussion forum if you need help getting access to a dataset.

Building on homework systems as a project

Our homeworks provide some starting experiment code and tools for speech recognition, synthesis, and using audio foundation models. You can use these tools as a starting point to build your project; we give you permission to reuse homework code as part of your project. If you are starting with one of these established baseline systems for a project, you should clearly state your project’s research contributions relative to what was provided, and ensure your project has sufficient scope. When starting from a homework system, include in your project proposal the techniques you plan to investigate.

Project Logistics

Notes on Forming Projects

  • Team size. To facilitate overlap with other courses, students may do final projects solo or in teams of up to 3 people. We expect larger teams to undertake larger projects or more experiments.

  • Contribution. We expect each team member to make a significant contribution to the overall project. In the final report, include a Contribution section that describes what pieces each person contributed to the final project. We typically assign the same project grade to all team members, but we might differentiate in rare cases. You can contact us in confidence in the event of unequal contribution.

  • External collaborators. You can work on a project that has external (non-CS224S student) collaborators, but you must make it clear in your final report which parts of the project were your work. If you use data, code, or APIs from external organizations, please be sure to appropriately cite and acknowledge this in your final report and poster.

  • Sharing projects. You can share a single project between CS224S and another class, but we expect the project to be accordingly bigger, and you must declare that you are sharing the project in your project proposal.

  • Using external resources. You may use external tools/frameworks for building deep learning systems, doing speech processing, or building something in a framework like Alexa Skills Kit. Simply stitching together tools for a demo is not a sufficient project. Instead, use tools as a way to focus your effort on the meaningful research question or most critical capabilities for your project.

Project Evaluation

Projects will be evaluated based on:

  • Technical quality of the work. (Does the technical material make sense? Are the things tried reasonable? Are the proposed algorithms or applications clever and interesting? Do the authors convey novel insight about the problem and/or algorithms? Are any system demos compelling in their function/scope?)

  • Significance. (Did the authors find a dataset to appropriately test their hypothesis? Is this work likely to be useful and/or have impact? Could a demo system / tool be used in practice?)

  • Novelty of the work. (Does this solve a new problem/domain/task/dataset? Does it introduce a new approach? Is it a justifiable next step given previous work in this area?)

  • Clarity of the write-up. (Your write-up should clearly describe your task, dataset, solution approach, relevant related work, and any experiments/tests you tried, along with your conclusions.)

Try not to overthink these criteria nor worry too much if you’re not sure that you can do well on all of them. Just think of this as an “ideal” that you should aspire to (especially if your goal is to do publishable work).

Lastly, a few words of advice: Many of the best class projects come from students working on topics that they are excited about. So, pick something that you can get excited and passionate about! Be brave rather than timid, and do feel free to propose ambitious things that you are excited about. Finally, if you are not sure what would or would not make a good project, we encourage you to either post on the forum or come to office hours to talk about project ideas!

Project Office Hours

Please mainly use Andrew’s office hours for project-related questions. For homework questions you can visit any TA’s office hours. In addition to regular office hours, we will offer extra project office hours as necessary throughout the quarter.

Deliverables

All project groups will submit each of the following via Gradescope.

Project Proposal (opt-in gateway to submitting a project)

A proposal should be a maximum of 500 words and include the following.

  • The task you plan to work on.
  • The dataset you plan to use.
  • A sketch of your proposed approach/model.
  • How you plan to evaluate your approach.
  • References to at least 2 papers, datasets, or relevant systems (reference section not part of word count)

If you do not have full answers to these questions you can describe what specifics you have and how you are working to establish datasets, baselines, or modeling approaches that might not be clear yet.

If your proposed project will be done jointly with a different class project (with the consent of the other class instructor), your proposal must clearly say so. The teaching staff will review your proposal and contact you if we foresee any issues with your project.

You must receive a “yes” from teaching staff on a proposal before you can drop assignment 4. In some cases the teaching staff will provide feedback or request discussion to ensure a project is set up for success before approving it.

Project spotlight presentations / short video (20%)

We reserve the final day of lecture for project groups to present their project, findings, and ongoing progress in a short ~2 minute “spotlight talk” format. Groups will have time to show 1-2 slides to introduce the project, and show any relevant demos / audio samples where appropriate. The spotlight talk should cover:

  • Overview of your motivation / task. Why did you choose this project?
  • What data did you work with? What is your formal task/hypothesis, or the narrow goal you spent time trying to achieve?
  • What initial experiments did you try? What were your results and what did you learn from those experiments?
  • With the remaining time, what are you currently focused on trying/building? It’s fine to speculate about results / capabilities you’re building now, even if they don’t make it into the final report.
  • If applicable, showing demos of your system/results or playing audio examples where relevant is always great!

For remote students, and in cases where project groups are unable to present during lecture time, project teams may instead submit a ~2 minute video presentation as an update on their work (due on the same day as project presentations).

Final Report (80%)

Final project write-ups should be 2-4 pages of text and may include additional pages for appendices, figures, references, and everything else you choose to submit. The following is a suggested structure:

  • Title, Author(s)

  • Abstract: This is an overview of the story of your project (the motivation, the findings, the impact). It should not be more than 300 words.

  • Introduction: This section introduces your problem and the overall plan for approaching your problem. Motivate why the problem your project is solving is important.

  • Related Works: This section discusses relevant literature for your project. What other approaches have tried to solve the same problem as your project? What are the differences in methodology?

  • Approach: (a.k.a. Methods) This section details the framework of your project. Be specific, which means you might want to include equations, figures, plots, etc. Do not mention any experiments yet; this is your opportunity to present the models, algorithms, and new technical contributions in abstract.

  • Experiments: This section begins with what kind of experiments you are doing, what kind of dataset(s) you are using, and how you measure or evaluate your results. It then shows the results of your experiments in detail. By detail, we mean both quantitative evaluations (show numbers, figures, tables, etc) as well as qualitative results (show images, example results, etc).

  • Conclusion: What have you learned? What is the overall outcome from your experiments or system building?

  • References: This is absolutely necessary. Please follow a consistent citation scheme for your report template and include references to previous work related to your task, dataset, and/or modeling approaches.

  • Contributions: List each project team member and briefly summarize their contribution to the project. Also include any external collaborators and their role in the project. If you do this work in collaboration with someone else, or if someone else (such as another professor) advises you on this work, your write-up must fully acknowledge their contributions.

Please use the ACL paper template for write-ups. The template can be downloaded here for LaTeX or Overleaf.

After the class, we will post all the final write-ups online so that you can read about each other’s work. If you do not want your write-up to be posted online, please specifically mention it in the final paragraph of your write-up.

Previous Course Projects

Spring 2017

List of Projects

What's Up, Doc? A Medical Diagnosis Bot
Monica Agrawal, Janette Cheng, Caelin Tran

Alternative Political Speech Classification "Facts"
Tyler Dammann, Regina Nguyen

Improving Forced Alignments
Christina Ramsey, Frank Zheng

Predicting Assertiveness in Conversation Using Deep Learning
Isabella Cai, Catherina Xu, Grace B. Young

Dialogue Acts in Design Conversations
Ethan Chan, Aaron Loh, Connie Zeng

Style Transfer for Prosodic Speech
Anthony Perez, Chris Proctor, Archa Jain

Native Language Identification of Spoken Language Using Recurrent
Kai-Chieh Huang, Jennifer Lu, Wayne Lu

Reading Emotions from Speech using Deep Neural Networks
Anusha Balakrishnan, Alisha Rege

Automatic Lyrics Transcription by Separating Vocals from Background Music
Diveesh Singh, Helen Jiang, Mindy Yang

Applying a Recurrent Neural Network using Connectionist Temporal Classification to Automatic Recognition of Lyrics in Singing
Maneesh Apte, Matthew Chen, Teddy Morris-Knower, Shalom Rottman-Yang

Deep RNN Speech Recognition with Sub-Labels
Jiayu Wu, Yangxin Zhong, Qixiang Zhang

NATLID: Native Language Identification
Ankita Bihani, Anupriya Gagneja, Mohana Prasad Sathya Moorthy

Detecting Personality Traits in Conversational Speech
Liam Kinney, Anna Wang, Jessica Zhao

Compression of Deep Speech Recognition Networks
Stephen Koo, Priyanka Nigam, Darren Baker

Generating Adversarial Examples for Speech Recognition
Dan Iter, Jade Huang, Mike Jermann

Rappify: Adding Rhythm to Speech
Ian Torres, Jacob Conrad Trinidad

Identifying Confidence in Speech
Grady Williams, Bryan McLellan, Grant Sivesind

Battleship as a Dialog System
Gerry Meixiong, Tony Tan-Torres, Jeffrey Yu

The Effect of Speech Disfluencies on Turn-Taking
Lucy Li, Kartik Sawhney, Divya Sain

Native Language Identification from Speech Transcriptions
Kent Blake, Greg Ramel, Matthew Volk

A Neural Network Approach to the Native Language Inference Task
Roger Chen, Kenny Leung

Improving Conversational Forced Alignment with Lexicon Expansion
Christopher Liu, Stephanie Mallard, Ryan Silva

Detecting Lies via Speech Patterns
Amanda Chow, John Louie

pWAVE: A Novel Dataset for Emotional Confidence Detection
Sanjay Kannan, Rooz Mahdavian

Infer Your First Language from Your English
Qiwen Fu, Wei-ting Hsu, Yundong Zhang

Native Language Identification from i-vectors and Speech Transcriptions
Ben Ulmer, Aojia Zhao, Nolan Walsh

Monaural Source Separation Using Neural Networks
Simon Kim, Mark Kwon, Sunmi Lee

Neural Lie Detection with the CSC Deceptive Speech Dataset
Shloka Desai, Maxwell Siegelman, Zachary Maurer

Native Language Identification through Speech
Delenn Chin, Kevin Chen, David Morales

Dialogue System for Restaurant Reservations using Hybrid Code Network
Charles Akin-David, David Xue, Evelyn Mei

VitiBot: A Dialog System Sommelier
Stephanie Tang, Ivan Suarez, Jim Andress

Applying Artistic Style Transfer to Natural Language
Thaminda Edirisooriya, Morgan Tenney

End-to-end neural networks for subvocal speech recognition
Pol Rosello, Pamela Toman, Nipun Agarwala

Latent Sentiment
Frank Cipollone, Hugo Clifford Kitano, Mila Faye Schultz

Pitch Perfect: Predicting Startup Funding Success Based on Shark Tank Audio
Shubha Raghvendra, Jeremy Wood, Minna Xiao

Improving Acoustic Models for Enriched Lexicons
Vivian Hsu, Addison Leong, Antariksh Mahajan

Text-to-speech Synthesis System based on Wavenet
Yuan Li, Xiaoshi Wang, Shutong Zhang

Classification and Recognition of Stuttered Speech
Manu Chopra, Kevin Khieu, Thomas Liu

Convolutional Neural Networks and Grammar Rules Analysis for Speech-based Native Language Identification
Dilsher Ahmed, Long-Huei Chen, Ayooluwakunmi Jeje

Statistical Methods for Native Language Identification
Tony Bruess, Frank Fan, Brexton Pham

End-to-End Neural Speech Synthesis
Alex Barron

Storytime - End to end neural networks for audiobooks
Pierce Freeman, Ethson Villegas, John Kamalu

Accent Conversion Using Artificial Neural Networks
Amy Bearman, Kelsey Josund, Gawan Fiore

Learning to Recognize Speech From Chaotically Synthesized Data
Faraz Bonab, Samuel Ginn

Applying Backoff to Concatenative Speech Synthesis
Lily Liu, Luladay Price, Andrew Zhang

Deep Learning Approaches for Online Speaker Diarization
Chaitanya Asawa, Nikhil Bhattasali, Allan Jiang

Auditory Deep Q Networks
Austin Ray, Do-Hyoung Park, Vignesh Venkataraman

Detecting and Artistically Representing Romantic Compatibility in Human Dialogue
Chris Salguero, Anna Teixeira, Ramin Ahmar

Modeling Intonation in Text-to-Speech Synthesis with a Bidirectional Long Short-Term Memory Recurrent Neural Network
Kevin Garbe, Aleksander Glowkal

Mark My Words! End-to-End Memory-Enhanced Neural Architectures for Automatic Speech Recognition
Amani Peddada, Lindsey Kostas