CS 224S/LINGUIST 285
Spoken Language Processing

Spring 2014 · Dan Jurafsky

Introduction to spoken language technology with an emphasis on dialogue and conversational systems. Automatic speech recognition, extraction of affect and social meaning from speech, speech synthesis, dialogue management, and applications to digital assistants, search, and recommender systems.

Room: 260-113, TuTh 2:15-3:30pm

We have a student final project poster session coming up!

Schedule

Week Date Homework In-class Readings
1 Apr 1 and 3
Course Overview and History, Articulatory Phonetics and ARPAbet transcription [pptx] [pdf]
  • J+M Chapter 7: Phonetics, pages 215-230
Phonetics: Acoustic Phonetics [pptx] [pdf]
  • J+M Chapter 7: Phonetics, pages 230-end
2 Apr 8 and 10

Homework 1 due Apr 8 2:00pm

ASR: Noisy Channel Model, HMMs, Forward, Viterbi, Word Error Rate [pptx] [pdf]
  • J+M Chapter 6: Hidden Markov Models, pages 173-186
  • J+M Chapter 9: Automatic Speech Recognition, pages 285-295
ASR: HMMs continued: Baum-Welch, Advanced Decoding [pptx] [pdf]
  • J+M Chapter 6: Hidden Markov Models, pages 186-192
  • J+M Chapter 9: Automatic Speech Recognition, pages 314-333
  • J+M Chapter 10: Speech Recognition: Advanced Topics, Section 10.1, pages 335-341
(On your own: ASR: Language Modeling)
  • If you have never studied language modeling (i.e., you have not taken CS124, CS224N, or a similar course), please either watch the language modeling videos for CS124 on Stanford Coursera (the first 6 of the 8 videos are sufficient; you may need to create an account) or read Chapter 4 of J+M (Sections 4.1-4.5.1, pages 83-100; Section 4.6, page 104; and Section 4.9.1, pages 109-111). The lectures are listed under Language Modeling at:
    https://stanford.coursera.org/cs124-002/lecture
3 Apr 15 and 17

Homework 2 due Apr 15 2:00pm

ASR: Acoustic Modeling [pptx] [pdf]
  • J+M Chapter 9: Automatic Speech Recognition, pages 295-314
  • J+M Chapter 10: Speech Recognition: Advanced Topics, Section 10.3, pages 345-349
ASR: Feature Extraction [pptx] [pdf]
  • J+M Chapter 9, Section 9.3, pages 295-303
4 Apr 22 and 24

Homework 3 due Apr 22 2:00pm

Social Meaning Extraction, Emotion Detection [pptx] [pdf]
5 Apr 29 and May 1

Homework 4 due Apr 29 2:00pm

Conversational Agents: Frame-based dialogue systems [pptx] [pdf]
  • J+M Chapter 24: Dialogue and Conversational Agents, pages 811-838
Conversational Agents: Dialog Acts, Information State, and Markov Decision Processes [pptx] [pdf]
  • J+M Chapter 24: Dialogue and Conversational Agents, pages 838-end
6 May 6 and 8

Homework 5 due May 6 2:00pm

Personality [pptx] [pdf]
(a) Intoxication, Depression, Trauma (b) Disfluencies [pptx] [pdf]
7 May 13 and 15
Text to Speech (TTS) 1: Overview, Text Normalization, Letter-to-Sound, Prosody [pptx] [pdf]
  • J+M Chapter 8: Speech Synthesis, pages 249-271
Text to Speech (TTS) 2: Waveform Synthesis, Diphone, Unit Selection, Parametric Synthesis [pptx] [pdf]
  • J+M Chapter 8: Speech Synthesis, pages 271-end
8 May 20 and 22
Speaker Identification, Verification, Diarization [pptx] [pdf]
Final Project Meeting Day
9 May 27 and 29
No class Tuesday
Thursday: Deep Neural Networks for Acoustic Modeling (Lecture by Andrew Maas) [pptx] [pdf]
10 June 3
Final Project Draft Presentations 2-4pm
Monday June 9, 12:00 noon: Final Project Due

Course Information

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: Mondays 12:30-1:30pm or by appointment
Teaching Assistants

Andrew Maas
Peng Qi
Sushobhan Nayak
Frank Liu

Office Hours
  • Monday 12:30-1:30pm, Margaret Jacks 117, Dan
  • Monday 6:00-7:00pm, Gates B28, Homework Office Hour
  • Tuesday 1:00-2:00pm, Gates 120, Andrew
  • Wednesday 7:30-8:30pm, Gates B28, Peng
  • Thursday 4:00-5:00pm, Gates B28, Sushobhan
  • Friday 4:00-5:00pm, Gates B28, Frank
Class Time

Tuesday and Thursday 2:15-3:30pm. The room is currently 260-113, but it may change, so check this page for updates.

Email

If you have a question that is not confidential or personal, post it on the Piazza forum: responses tend to be quicker and reach a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can email the course staff list, cs224s-spr1314-staff@lists.stanford.edu (non-technical questions only). We cannot reply to email sent to individual staff members. If you have a matter to discuss privately, please come to office hours or email the staff list to make an appointment. For grading questions, please talk to us after class or during office hours.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at previous solutions when preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to look at the test set by any means, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Textbooks
  • Required: Jurafsky and Martin. 2009. Speech and Language Processing (2nd Edition). Pearson. There are two copies on reserve in the library.

Prerequisites

CS 124, 221, 224N, or 229

Required Work

Homeworks

There are 5 homeworks. Each is due at 2:00pm on its due date (i.e., before class starts).

Programming Assignment Collaboration: You may talk to anyone about the assignments and bounce ideas off each other, but you must write the actual programs yourself.

Late homeworks

You have 5 free late (calendar) days to use on the homeworks. Once these are exhausted, any homework turned in late will be penalized 20% per late day. Each 24 hours, or part thereof, that a homework is late uses up one full late day.
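The late-day arithmetic can be made concrete with a short Python sketch. It is illustrative only, not an official grading script: the function names are hypothetical, free late days are assumed to be spent automatically before any penalty applies, and the 20% penalty is assumed to be a flat, non-compounding deduction of 20% of the homework's score per penalized day, never going below zero.

```python
import math

FREE_LATE_DAYS = 5        # free calendar late days for the quarter (from the policy)
PENALTY_PCT_PER_DAY = 20  # percent of the score lost per late day beyond the free ones

def late_days_used(hours_late):
    """Each 24 hours, or any part thereof, counts as one full late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)

def apply_late_policy(raw_score, hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (adjusted_score, free_days_left_after) for one homework.

    Assumptions (not stated in the policy): free days are spent first and
    automatically, and the penalty is a flat, non-compounding 20% of the raw
    score per penalized day, never pushing the score below zero.
    """
    days = late_days_used(hours_late)
    free_used = min(days, free_days_left)
    penalized_days = days - free_used
    # Integer percentages keep the arithmetic exact for integer scores.
    kept_pct = max(0, 100 - PENALTY_PCT_PER_DAY * penalized_days)
    return raw_score * kept_pct / 100, free_days_left - free_used

# Hypothetical example: a 90-point homework, 50 hours late, 2 free days left.
print(apply_late_policy(90, 50, free_days_left=2))  # (72.0, 0)
```

In the example, 50 hours rounds up to 3 late days; 2 are covered by the remaining free days and 1 is penalized, so the sketch keeps 80% of the 90 points.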

Readings

We will expect you to do a significant amount of textbook reading in this course.

Final exam

There is no final exam for this course.

Final project

Projects may be in any area of spoken language processing, such as speech recognition, speech synthesis, speech understanding, dialogue design, or speech user interface design. Projects should be done in teams: 3 people is optimal, and 2 is acceptable only if you have a convincing reason. The final project will be presented as a poster at the poster session on Tuesday June 3, and is due by email on Monday June 9 at noon Pacific time.

Information on the final project is here.

Project idea suggestions are here.

Final grade
  • 45% homeworks
  • 45% final project
  • 10% class participation
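The final grade is a weighted sum of the three components listed above. A minimal Python sketch, assuming each component is already expressed on a 0-100 scale (the syllabus does not say how individual homework scores are combined or whether any curve is applied); the function and key names are hypothetical:

```python
# Weights (in percent) from the course's final-grade breakdown.
WEIGHTS_PCT = {"homeworks": 45, "final_project": 45, "participation": 10}
assert sum(WEIGHTS_PCT.values()) == 100  # the weights must cover the whole grade

def final_grade(scores):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS_PCT[part] * scores[part] for part in WEIGHTS_PCT) / 100

print(final_grade({"homeworks": 90, "final_project": 80, "participation": 100}))  # 86.5
```

Keeping the weights as integer percentages and dividing once at the end avoids floating-point rounding noise for integer component scores.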