Schedule and Syllabus

Unless otherwise specified, the course lectures and meeting times are:

Monday and Wednesday, 4:30-5:50pm
Location: Hewlett Teaching Center (04-510) Room 201
Event Date Description Course Materials
Lecture Apr 3 Course Overview and History, Articulatory Phonetics and ARPAbet transcription Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 2nd Edition Chapter 7: Phonetics, 215-230 [pdf for Stanford students]
Lecture Apr 5 Phonetics: Acoustic Phonetics Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 2nd Edition Chapter 7: Phonetics, 230-end [pdf for Stanford students]
HW1 Released Apr 5 Homework 1 released Homework handout [pdf]
Lecture Apr 10 Speech Recognition: Noisy Channel Model, HMMs, Forward, Viterbi, Word Error Rate Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 3rd Edition Chapter 9: Hidden Markov Models [pdf]
  2. J+M 2nd Edition Chapter 9: Automatic Speech Recognition, pages 285-295 [pdf for Stanford students]
If you have never had language modeling (i.e., have never taken CS124 or CS224N or similar) you should do some additional reading and video lecture watching on your own.
  • Read J+M 3rd Edition Chapter 4 pages 1-20 (you can skip section 4.5) [pdf]
  • CS224N Lecture on N-gram and neural network language modeling [pdf]
  • Lecture videos on introductory NLP including language modeling [youtube]
Lecture Apr 12 Speech Recognition: Advanced Decoding, Finite State Transducers Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 2nd Edition Chapter 9: Automatic Speech Recognition pages 314-333 [pdf for Stanford students]
  2. J+M 2nd Edition Chapter 10, Speech Recognition: Advanced Topics, section 10.1, 335-341 [pdf for Stanford students]
HW1 Due Apr 12 Homework 1 due by 11:59pm
HW2 Released Apr 12 Homework 2 released Homework handout [pdf]
Project Released Apr 12 Course project overview released Project handout [pdf]
Lecture Apr 17 Speech Recognition: GMM Acoustic Modeling and Feature Extraction Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 2nd Edition Chapter 9: Automatic Speech Recognition pages 295-314 [pdf for Stanford students]
  2. J+M 2nd Edition Chapter 10: Speech Recognition Advanced Topics 10.3 pages 345-349 [pdf for Stanford students]
Lecture Apr 19 Deep Learning Preliminaries and Course Project Introduction Lecture slides:
[pptx] [pdf]
Readings:
  1. Basics of Neural Networks
Lecture Apr 24 Speech Recognition: Neural Network Acoustic Models Lecture slides:
[pptx] [pdf]
Readings:
  1. Hinton, Geoffrey, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, and Brian Kingsbury. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition. Signal Processing Magazine, IEEE 29, no. 6 (2012): 82-97.
  2. Andrew L. Maas, Peng Qi, Ziang Xie, Awni Y. Hannun, Christopher T. Lengerich, Daniel Jurafsky, Andrew Y. Ng. 2017. Building DNN acoustic models for large vocabulary speech recognition. Computer Speech & Language, Volume 41, Pages 195-213. [pdf]
Lecture Apr 26 Speech Recognition: End-to-End Neural Network Recognition Lecture slides:
[pptx] [pdf]
Readings:
  1. Graves Alex, Fernandez Santiago, Gomez Faustino and Schmidhuber Jurgen. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML, 2006.
  2. Graves Alex and Jaitly Navdeep. Towards End-to-End Speech Recognition with Recurrent Neural Networks. ICML, 2014.
  3. Maas Andrew, Xie Ziang, Jurafsky Daniel and Ng Andrew. Lexicon-Free Conversational Speech Recognition with Neural Networks. NAACL, 2015
  4. Hannun Awni, Case Carl and others. Deep Speech: Scaling up end-to-end speech recognition. Arxiv 1412.5567.
HW3 Released Apr 26 Homework 3 released Homework handout [pdf]
HW2 Due Apr 27 Homework 2 due by 11:59pm
Interactive Session May 1 In-Class Interactive Session: Working with TensorFlow Lecture slides:
[pptx] [pdf]
Github Link:
[link]
Students should bring a laptop to class or pair up with someone who has one. Before class, follow the instructions posted on Piazza to install TensorFlow, or arrange access to a remote or virtual machine with TensorFlow installed. We will work through some examples to get familiar with what TensorFlow does and how it works.
Proposal Due May 1 Project Proposal due by 11:59pm
Lecture May 3 Conversational Agents: Introduction and Frame-Based Dialogue Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 3rd Edition Chapter 29: Conversational Agents and Chatbots
Lecture May 8 Conversational Agents: Dialog Acts, Information State, and Markov Decision Processes Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 3rd Edition Chapter 29: Conversational Agents and Chatbots
Lecture May 10 Conversational Agents: Deep Learning Approaches Lecture slides:
[pptx] [pdf]
Readings:
  1. Oriol Vinyals and Quoc Le. A Neural Conversational Model. Arxiv 1506.05869
  2. Li et al., A Diversity-Promoting Objective Function for Neural Conversation Models. NAACL 2016.
  3. Li et al., Deep Reinforcement Learning for Dialogue Generation. EMNLP 2016
  4. Li et al., A Persona-Based Neural Conversation Model. ACL 2016
  5. Li et al., Learning through Dialogue Interactions by Asking Questions. ICLR 2017
HW3 Due May 10 Homework 3 due by 11:59pm
HW4 Released May 10 Homework 4 released Homework handout [pdf]
Lecture May 15 Social Meaning Extraction and Interpersonal Stance Lecture slides:
[pptx] [pdf]
Readings:
  1. Scherer, K. R. 2003. Vocal communication of emotion: A review of research paradigms. Speech Communication 40:1-2, 227-256. Please read sections 1 and 3, and skim section 2 to get an idea of the previous literature.
  2. Rajesh Ranganath, Dan Jurafsky, and Daniel A. McFarland. 2013. Detecting friendly, flirtatious, awkward, and assertive speech in speed-dates. Computer Speech and Language. 27:1, 89-115.
  3. Read pages 1066-1070 on acoustic features from Schuller, Björn, Anton Batliner, Stefan Steidl, and Dino Seppi. 2011. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication 53, no. 9 (2011): 1062-1087.
  4. Liscombe, Jackson, Jennifer Venditti, and Julia Bell Hirschberg. 2003. Classifying subject ratings of emotional speech using acoustic features. Proceedings of Eurospeech 2003, 4 pages.
Additional advanced reading for people looking for final project ideas:
  1. McFarland, Daniel A., Dan Jurafsky, and Craig M. Rawlings. 2013. Making the Connection: Social Bonding in Courtship Situations. American Journal of Sociology Vol. 118, No. 6, 1596-1649
  2. Ekman, Paul. 2001. Facial Expression
  3. The Oxford Companion to the Body. Colin Blakemore and Sheila Jennett, editors.
  4. The rest of Schuller, Björn, Anton Batliner, Stefan Steidl, and Dino Seppi. 2011. Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication 53, no. 9 (2011): 1062-1087.
Lecture May 17 Text to Speech (TTS): Overview, Text Normalization, Letter-to-Sound, Prosody Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 2nd Edition Chapter 8 249-271 [pdf for Stanford students]
Milestone Due May 17 Project milestone due by 11:59pm
Lecture May 22 Text to Speech (TTS): Concatenative, Parametric, and End-to-End Neural Synthesis Lecture slides:
[pptx] [pdf]
Readings:
  1. J+M 2nd Edition Chapter 8 271-end [pdf for Stanford students]
Lecture May 24 Personality, Intoxication, Depression, Trauma, and Disfluencies Lecture slides:
[pptx] [pdf]
Readings:
  1. F. Mairesse, M. Walker, M. Mehl, and R. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research (JAIR), 30:457-500.
Additional advanced reading for people looking for final project ideas:
  1. G. Mohammadi and A. Vinciarelli. 2012. Automatic Personality Perception: Prediction of Trait Attribution Based on Prosodic Features. IEEE Transactions on Affective Computing, Vol. 3, no. 3, pp. 273-284.
  2. Schiel F, Heinrich Chr, Barfüßer S (2012): Alcohol Language Corpus. In: Language Resources and Evaluation, Volume 46, Issue 3 (2012), Berlin-New York:Springer, DOI: 10.1007/s10579-011-9139-y, pp. 503-521. Brief note in Nature.
  3. Hollien, H., DeJong, G., Martin, C. A., Schwartz, R. and Liljegren, K. Effects of ethanol intoxication on speech suprasegmentals. Journal of the Acoustical Society of America 110, 3198 - 3206.
  4. Bone, Daniel, Ming Li, Matthew P. Black, and Shrikanth S. Narayanan. 2014. Intoxicated speech detection: A fusion framework with speaker-normalized hierarchical functionals and GMM supervectors. Computer Speech and Language 28, 2: 375-39
HW4 Due May 24 Homework 4 due by 11:59pm
No Class May 29 No Class (Memorial Day)
Lecture May 31 Guest Lecture: Alex Lebrun, Cofounder of Wit.ai
Lecture June 5 (Optional) Project office hours during lecture time
Poster Session June 7 Final project poster session Location: ATT Patio behind the Gates Computer Science Building
Poster session 4:30-7:00pm. Please arrive at 4pm to set up posters.
Project Paper Due June 8 Project paper due by 11:59pm. Late days may not be used.