CS 124: From Languages to Information

Winter 2019
Dan Jurafsky

The online world has a vast array of unstructured information in the form of language and social networks. Learn how to make sense of it and how to interact with humans via language, from answering questions to giving advice!


Welcome to "From Languages to Information"!!


Week Date Homework Quiz In-class Video Lectures and Readings (to be done by the Friday of the week unless I specify an earlier date)
1 Jan 8 and 10 - -
Edit Distance Videos [slides pptx] [slides pdf]
2 Jan 15 and 17

PA 1: Spamlord

Due Fri Jan 18, 5:00pm

Quiz 1: Text Processing/Edit Distance

Due Tue Jan 15, 11:59pm

Language Modeling Videos (watch videos before Thursday's class) [slides pptx] [slides pdf]
Naive Bayes and Text Classification Videos [slides pptx] [slides pdf]
3 Jan 22 and 24

PA 2: Triage!

Due Fri Jan 25, 5:00pm

Quiz 2: Language Modeling/Naive Bayes

Due Tuesday Jan 22, 11:59pm

    Logistic Regression (no video)

4 Jan 29 and 31

PA 3: Sentiment Analysis

Due Fri Feb 1, 5:00pm

Quiz 3: Sentiment/Logistic Regression

Due Tuesday Jan 29, 11:59pm

  • Tuesday: No Class
  • Thursday: No Class
Chris Manning Video: Information Retrieval (I) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Chris Manning Video: Information Retrieval (II) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model, (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
5 Feb 5 and 7

PA 4: Information Retrieval

Due Fri Feb 8, 5:00pm

Quiz 4: Information Retrieval

Due Tue Feb 5, 11:59pm

Tuesday: Group Work on Information Retrieval [solutions; don't read until you try]

Thursday: Vector Semantics, Neural Embeddings, Word2Vec* [slides pptx] [slides pdf]

  • Vector Semantics, Neural Embeddings, Word2Vec (no video; read before Thursday)

Relation Extraction Video [slides pptx] [slides pdf]
6 Feb 12 and 14

Homework 5: Quizlet!

Due Fri Feb 15, 5:00pm

Quiz 5: Relation Extraction and Vector Semantics

Due Tue Feb 12, 11:59pm

Tuesday: No Class

Thursday: Introduction to Chatbots* [slides pptx] [slides pdf]

  • Chat Bots (no videos)

Optional advanced reading:
7 Feb 19 and 21 -

Quiz 6: Chatbots/Question Answering

Due Tue Feb 19, 11:59pm

Tuesday: Recommender Systems* [slides pptx] [slides pdf]

Thursday: No Class

Recommender systems (Collaborative Filtering) (no video)
8 Feb 26 and 28

Homework 6: Chat!

Due Fri Mar 1, 5:00pm

Quiz 7: Recommendation Systems

Due Tue Feb 26, 11:59pm

Tuesday: No Class

Thursday: PA 6 work time: Class time to work on PA6.

Web graphs, Links, and PageRank Videos [slides pptx] [slides pdf]
  • MR+S Chapter 21: Link Analysis
9 Mar 5 and 7 -

Quiz 8: PageRank

Due Tue Mar 5, 11:59pm

Tuesday: Group Work on Smartphone Chatbots
Thursday: Social Networks* [slides pptx] [slides pdf]

Practice Final [solutions]

Social Networks (no videos)
10 Mar 12 and 14 -

Quiz 9: Networks and Zipf's Law

Due Tues Mar 12, 11:59pm

Tuesday: NLP for Social Good*: Guest Lectures from Kevin Clark and Rob Voigt [Rob's slides] [Kevin's slides pptx, pdf]

Thursday: Course Review, Discussion of sample final and its solutions
NLP for Social Good (No videos)
- Mar 19 - -
Final Exam

The final is:

  • Tuesday Mar 19, 3:30pm-6:30pm, Knight Management Center, CEMEX Auditorium

The alternate final is:

  • Monday Mar 18, 3:30pm-6:30pm, Cubberley Auditorium
    You can take whichever final you prefer. You don't have to RSVP, just show up.

    Logistics

    Dan Jurafsky (jurafsky@stanford.edu)
    Office: Margaret Jacks 117
    Office Hours: Thursdays 4:30-6:00
    Teaching Assistants

    Urvashi Khandelwal (head TA)
    Jennie Chen
    Laura Cruz-Albrecht
    Chuma Kabaghe
    Julia Mendelsohn
    Matt Mistele
    Vik Pattabi
    Charissa Plattner
    Minh-An Quinn
    Sam Redmond

    TA Office Hours
    • Tuesdays 1:30pm to 3:00pm Gates B30
    • Wednesdays 7:00pm to 10:00pm in Huang 203 and 219
    • Thursdays 6:00pm to 8:00pm in Huang 203, Huang 219 (overflow)
    Class Time

    Tuesday and Thursday 3:00-4:20pm in Hewlett 200


    We cannot reply to email sent to individual staff members. If you have a question that is not confidential or personal, post it on the Piazza forum; responses tend to be quicker and reach a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can email non-technical questions to the course staff list, cs124-win1819-staff@lists.stanford.edu. To discuss a matter privately, please come to office hours or use the same list to make an appointment. For grading questions, please talk to us after class or during office hours.

    We try to redundantly use Piazza, Canvas, and mailing lists to make sure any messages we convey to the class reach you all! We will assume that all students read these messages.

    Honor Code

    Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

    Since quizzes are a form of assessment, students are not allowed to collaborate on completing quizzes. It is an honor code violation to discuss quiz questions with other students.

    There is no required textbook, but I will expect you to know the material listed above, drawn from the textbooks and other readings. The material in the readings will be tested on the final exam. Different people may learn better from different combinations of videos/lectures, reading the chapters, or coming to the in-class group exercises. The students who do best on the final exam tend to do all three. But I won't take roll for lectures, and attendance is up to you.

    Course Description

    Extracting meaning, information, and structure from human language text, speech, web pages, genome sequences, social networks, or any less structured information. Methods include: string algorithms, edit distance, language modeling, naive Bayes, inverted indices, vector semantics. Applications such as information retrieval, question answering, text classification, social network models, chatbots, genomic sequence alignment, word meaning extraction, recommender systems.


    Prerequisites

    CS 103, CS 107 and CS 109.

    Required Work

    From Languages to Information is a (semi-)flipped class with much of the material online. Most of the lectures have been video-recorded, and you can watch them at home. The weekly quizzes and programming homeworks will be automatically uploaded and graded. Lectures, quizzes, and homeworks are available on EdX.
    Video Lectures

    Each week, we will ask you to watch a set of video lectures (2 to 2.5 hours total). The videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos (or in some cases, attend the lectures that cover the identical material) but the embedded quizzes are not counted toward the final grade. For a few of the video lectures I'll do a duplicate in-class version for people who prefer to see the lecture live. For those duplicate lectures you can watch either the in-class or recorded version.

    In-class Lectures

    Attendance is required at the 7 lectures + 1 group work session marked with a * and in blue on the syllabus above. When I say required, I mean that this material will be tested on the final but is not in the videos; I will not be taking attendance.

    In-class group problem-solving

    6 in-class sessions are for group problem-solving activities. Only the first one (Jan 10 on Unix text processing) is required; attendance at the other 5 is highly recommended. Previous students who did well in the class have reported that the in-class group exercises were extremely useful.

    Automated Review Quizzes

    After watching a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. Each review quiz may be attempted twice, with a time lag between the two attempts. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. The system will automatically take the better score of your two attempts for the quiz. Review Quizzes for each week are due 11:59pm Tuesday of the following week. There are no late days for review quizzes.

    Class Participation

    Attendance is strongly recommended but optional, except for the 7 lectures in blue bold plus the one group session (Jan 10) in blue bold, which are required. Reminder: I won't actually be taking attendance, but I'll be covering material that is not presented in the textbook or video lectures, and I will test this material on the final. In addition, there will be 4 other in-class sessions devoted to group problem-solving. You can also get extra credit for class participation by giving helpful answers on the class forum, helping out other students in office hours, or being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering).

    Programming Assignments

    6 Python programming assignments. Each assignment is due at 5:00pm on the Friday listed in the schedule above.

    Programming Assignment Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual programs yourself. We will use the normal automatic checks for overlap between your code and other students' code.

    Late homeworks

    You have 4 free late (calendar) days to use on programming assignments 1-5. You cannot use late days on PA 6. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
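
    The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the staff's grading script: the function names are hypothetical, the 20% penalty is assumed to be deducted linearly from the earned score, and a submission more than four days late is modeled as scoring zero. PA 6, which cannot use late days, is not modeled here.

```python
import math

FREE_LATE_DAYS = 4      # shared across PAs 1-5 (from the policy above)
MAX_DAYS_LATE = 4       # nothing is accepted after this many days
PENALTY_PER_DAY = 0.20  # assumption: deducted linearly from the earned score


def late_days(hours_late):
    """Each 24 hours or part thereof counts as one full late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(score, hours_late, free_days_left):
    """Return (adjusted_score, free_days_left) for one of PAs 1-5.

    Hypothetical helper: free late days are spent first, and any
    remaining late days are each penalized by PENALTY_PER_DAY.
    """
    days = late_days(hours_late)
    if days > MAX_DAYS_LATE:
        # Modeled as a zero; the policy says the assignment is not accepted.
        return 0.0, free_days_left
    free_used = min(days, free_days_left)
    penalized_days = days - free_used
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return score * multiplier, free_days_left - free_used
```

    For example, under these assumptions a PA submitted 30 hours late by a student with one free late day left uses that free day and is penalized for one further day: `apply_late_policy(100.0, 30, 1)`.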


    This class has a significant amount of textbook reading. Most weeks have around 30 textbook pages. The homeworks and exams will be based heavily on the readings.

    Final exam:

    • Tuesday Mar 19, 3:30pm-6:30pm, Knight Management Center, CEMEX Auditorium

    The alternate final is:

    • Monday Mar 18, 3:30pm-6:30pm, Cubberley Auditorium
    You can take whichever final you prefer. You don't have to RSVP, just show up.

    Final grade computation
    • 63% homeworks: PAs 1-5 are each worth 9%, regardless of the different point values of each homework; PA 6 is worth 18%, double the others
    • 22% final exam
    • 15% weekly review quizzes
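
    As a rough illustration of how these weights combine, here is a minimal Python sketch. The function name and the convention of expressing every score as a fraction between 0 and 1 are assumptions made for the example; extra credit and late penalties are not modeled.

```python
# Weights taken from the breakdown above: five PAs at 9% each, PA 6 at 18%.
PA_WEIGHTS = [0.09, 0.09, 0.09, 0.09, 0.09, 0.18]
FINAL_EXAM_WEIGHT = 0.22
QUIZ_WEIGHT = 0.15


def final_grade(pa_scores, final_exam, quiz_average):
    """Combine component scores (each a fraction in [0, 1]) into a course grade.

    pa_scores must list PAs 1-6 in order. Hypothetical helper for illustration.
    """
    if len(pa_scores) != len(PA_WEIGHTS):
        raise ValueError("expected six PA scores, one per assignment")
    homework = sum(w * s for w, s in zip(PA_WEIGHTS, pa_scores))
    return homework + FINAL_EXAM_WEIGHT * final_exam + QUIZ_WEIGHT * quiz_average
```

    Because the homework weights sum to 63%, a perfect score on every component gives a grade of 1.0 (up to floating-point rounding), and PA 6 alone moves the grade twice as much as any other single assignment.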