Please read this entire handout before beginning. We advise you to start early and to make use of the TAs by coming to office hours and asking questions! For collaboration and the late day policy, please refer to the course homepage.
About the Assignment
In this assignment, you’ll complete two Colab notebooks to build a character-level speech recognizer using CTC:
- Notebook 1 (60 pts): Data exploration of a conversational speech dataset. Implement the CTC loss function.
- Notebook 2 (100 pts): Use PyTorch Lightning to train a CTC-based neural network on the HVB dataset.
Training each of the three models you need to build in Notebook 2 can take thirty minutes or more. Please start early!
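For orientation before opening the notebooks, here is a minimal sketch of the CTC collapse rule, which turns a frame-level label sequence into text by merging consecutive repeats and then dropping blank symbols. It is not taken from the starter code: the function name `ctc_collapse` and the use of `_` as the blank symbol are assumptions made for illustration only, and the notebooks may use different conventions.

```python
from itertools import groupby

# Hypothetical blank symbol for this sketch; the notebooks may use a different one.
BLANK = "_"


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a frame-level label sequence the way CTC does.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove every blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # The blank between the two "l" runs keeps the double letter in the output.
    print(ctc_collapse("hh_e_ll_lloo"))
```

Note that a blank between two runs of the same character is what lets CTC emit a repeated letter; without it, the runs would merge into a single character.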
Submit your notebooks, including their outputs and visualizations, via Gradescope; you do not need to upload any audio files. The assignment is worth 165 points in total.
Submission Instructions
This assignment is due on 05/12/2025 by 11:59 PM Pacific (or, at the latest, on 05/15/2025 if you use three late days). Export each filled-in, executed Colab notebook, with all code and output, as a PDF (one PDF per notebook we provide), combine both PDFs into a single file, and submit it on Gradescope. Please tag your question responses.
All instructions and starter code are contained in the Google Colab notebooks.
You can access the starter notebooks via Google Drive. Remember to make a copy before starting your work!
CS224S