CS224S Assignment 2: Working with speech tools, transcripts, and synthesizing audio

Spring 2025

Please read this entire handout before beginning. We advise you to start early and to make use of the TAs by coming to office hours and asking questions! For the collaboration and late-day policies, please refer to the course homepage.


About the Assignment

In this assignment, you will experiment with speech synthesis and voice cloning models. You will also gain experience working with speech transcripts by converting raw transcripts into more useful, clean summaries. As a reminder, please only create voice cloning samples from someone if you have their permission! Using speech samples from TTS training datasets is okay too.

The assignment is worth 165 points in total. Submit your solutions via Gradescope. We do not require you to submit audio files, only the visualizations and notebook output from your work.

Submission Instructions

This assignment is due on 04/28/2025 by 11:59 PM Pacific (or by 05/01/2025 at the latest if you use three late days). For this assignment, you will submit your filled-in, executed Colab notebook (just one), with all code and output, as a PDF on Gradescope. If you end up with more than one PDF, combine them into a single file before submitting. Please tag your question responses.

All instructions and starter code are contained in the Google Colab notebook.

You can access the starter notebook in Google Drive. Remember to make a copy before starting your work!