CS 124: From Languages to Information

Winter 2017 Dan Jurafsky

The online world has a vast array of unstructured information in the form of language and social networks. Learn how to make sense of it and how to interact with humans via language, from answering questions to giving advice!


Welcome to "From Languages to Information"!!

Schedule

Week Date Homework Quiz In-class Video Lectures and Readings
1 Jan 10 and 12 - -
  • Tue: Intro Lecture*

    [slides pptx] [slides pdf]

  • Thurs: Group Work: Text Processing with Unix tools (watch the 4 "Basic Text Processing" videos before class) [pptx] [pdf]
Basic Text Processing (watch videos before Thursday's class) [slides pptx] [slides pdf]
Edit Distance [slides pptx] [slides pdf]
2 Jan 17 and 19

Homework 1: Spamlord

Due Fri Jan 20, 5:00pm

Quiz 1: Text Processing/Edit Distance

Due Tue Jan 17, 11:59pm

  • Tuesday: Language Modeling (same material as video)
  • Thursday: No class
Language Modeling [slides pptx] [slides pdf]
Spelling Correction and the Noisy Channel [slides pptx] [slides pdf]
3 Jan 24 and 26

Homework 2: AutoCorrect!

Due Fri Jan 27, 5:00pm

Quiz 2: Language Modeling

Due Tuesday Jan 24, 11:59pm

Naive Bayes and Text Classification [slides pptx] [slides pdf]
Sentiment Analysis [slides pptx] [slides pdf]
4 Jan 31 and Feb 2

Homework 3: Thumbs up!

Due Fri Feb 3, 5:00pm

Quiz 3: Text Categorization and Naive Bayes

Due Tuesday Jan 31, 11:59pm

  • Tuesday: Group Work on Naive Bayes and Sentiment Analysis
  • Thursday: Guest Lecture
Information Retrieval (I) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Information Retrieval (II) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model, (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
5 Feb 7 and 9

Homework 4: Search!

Due Fri Feb 10, 5:00pm

Quiz 4: Information Retrieval

Due Tue Feb 7, 11:59pm

Tuesday: Group Work on Information Retrieval



Thursday: Relation Extraction and Question Answering (from same material as videos)


Relation Extraction [slides pptx] [slides pdf]
Question Answering [slides pptx] [slides pdf]
6 Feb 14 and 16

Homework 5: Jeopardy!

Due Fri Feb 17, 5:00pm

Quiz 5: Relation Extraction and Question Answering

Due Tue Feb 14, 11:59pm

Tuesday: QA in Watson and Intro to Chatbots*

Thursday: Social Meaning Extraction*

Chat Bots [slides pptx] [slides pdf]
Optional advanced reading
Social Meaning: Extracting Emotion and Personality from Language [slides pptx] [slides pdf]
7 Feb 21 and 23 -

Quiz 6: Chatbots/Emotion Detection

Due Tue Feb 21, 11:59pm

Tuesday: Group Work on Smartphone Chatbots + Question Answering

Thursday: Recommender Systems and Vector Semantics*

Recommender systems (Collaborative Filtering) [slides pptx] [slides pdf]
Vector Semantics [slides pptx] [slides pdf]
8 Feb 28 and Mar 2

Homework 6: Chat!

Due Fri Mar 3, 5:00pm

Quiz 7: Recommendation Systems and Vector Semantics

Due Tue Feb 28, 11:59pm

Tuesday: Guest Lecture

Thursday: PA 6 work time (class time to work on PA 6)

TBA
Web graphs, Links, and PageRank [slides pptx] [slides pdf]
  • MR+S Chapter 21: Link Analysis
9 Mar 7 and 9 -

Quiz 8: PageRank

Due Tue Mar 7, 11:59pm

Tuesday: Peer Grading in class of the Chatbots!
Thursday: Dan's Lecture on Social Networks*
Social Networks [slides pptx] [slides pdf]
10 Mar 14 and 16 -

Quiz 9: Networks and Zipf's Law

Due Tue Mar 14, 11:59pm

Tuesday: NLP Applied to Social and Humanistic Questions*

Thursday: Course Review, Discussion of sample final and its solutions*

- Mar 21.. - -
Final Exam

The final is:

  • Tuesday Mar 21, 3:30-6:30pm, Location TBA

The alternate final is:

  • TBA
You can take whichever final you prefer. You don't have to RSVP, just show up.

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: Thursdays 4:30-6:00
Teaching Assistants

Ziang Xie (head TA)
Monica Agrawal
William Chen
Berk Coker
Catherine Dong
Gaspar Garcia
Raghav Gupta
Brad Huang
Matt Lamm
Rafael Musa
Kevin Wu

TA Office Hours
  • Wednesdays 7:00pm to 10:00pm in Huang 203 and 219
  • Tuesdays 1:15pm to 3:00pm in Huang 203 (on Jan 31 only, in Huang B020 instead)
  • Thursdays 6:00pm to 8:00pm in Huang 203
Class Time

Tuesday and Thursday 3:00-4:20pm in 420-040 (basement of Jordan Hall next to Thai Cafe)

Email

If you have a question that is not confidential or personal, post it on the Piazza forum - responses tend to be quicker and have a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can also email the course staff list (non-technical questions only) at cs124-win1617-staff@lists.stanford.edu. We cannot reply to email sent to individual staff members. If you have a matter to discuss privately, please come to office hours, or use cs124-win1617-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Since quizzes are a form of assessment, students are not allowed to collaborate on completing quizzes. It is an honor code violation to discuss quiz questions with other students.

Textbooks
  • There is no required textbook, but I will expect you to know the material listed above, drawn from the textbooks and other readings. The material in the readings will be tested on the final exam. Different people may learn better from different combinations of videos/lectures, reading the chapters, or coming to the in-class group exercises. The best-prepared students who do the best on the final exams tend to do all three. But I won't take roll for lectures and attendance is up to you (although we'll give extra credit for attendance at the group work sessions).

Course Description

Extracting meaning, information, and structure from human language text, speech, web pages, genome sequences, social networks, or any less structured information. Methods include: string algorithms, edit distance, language modeling, naive Bayes, inverted indices, vector semantics. Applications such as information retrieval, question answering, text classification, social network models, chatbots, genomic sequence alignment, word meaning extraction, recommender systems.

Prerequisites

CS 103, CS 107 and CS 109.

Required Work

From Languages to Information is a (semi-)flipped class with much of the material online. Most of the lectures have been video-recorded, and you can watch them at home. The weekly quizzes and programming homeworks will be automatically uploaded and graded. Lecture, quizzes, and homeworks are available on EdX.
Video Lectures

Each week, we will ask you to watch a set of video lectures (2 to 2.5 hours total). The videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos (or in some cases, attend the lectures that cover the identical material) but the embedded quizzes are not counted toward the final grade. For a few of the video lectures I'll do a duplicate in-class version for people who prefer to see the lecture live. For those duplicate lectures you can watch either the in-class or recorded version.

In class Lectures

Attendance is required at the 8 lectures marked with a * and in blue on the syllabus above. By "required" I mean that this material will be tested on the final but is not in the videos or chapters; I will not be taking attendance.

In-class group problem-solving

5 in-class sessions are for group problem-solving activities. Attendance is highly recommended. There will also be extra credit points for participation. Previous students who did well in the class have reported that the in-class group exercises were extremely useful.

Automated Review Quizzes

After you watch a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. Each review quiz may be attempted twice, with a time lag between the two attempts. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. The system will automatically take the better score of your two attempts for the quiz. Review quizzes for each week are due at 11:59pm on Tuesday of the following week. There are no late days for review quizzes.
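As a rough illustration of the quiz mechanics described above, here is a minimal Python sketch. It is not the EdX implementation: the function names, the pool representation, and the default of 5 questions are assumptions made for the example, and the randomization of answer options is left out.

```python
import random


def draw_quiz(pool, num_questions=5, rng=random):
    """Pick distinct questions at random from a larger pool (one attempt)."""
    return rng.sample(pool, num_questions)


def recorded_score(attempt_scores):
    """Keep the best score from the one or two attempts a student made."""
    if not 1 <= len(attempt_scores) <= 2:
        raise ValueError("a review quiz allows one or two attempts")
    return max(attempt_scores)
```

Because each attempt draws from the pool again, a second attempt will usually contain different questions from the first.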

Class Participation

Attendance is strongly recommended but optional except for the 8 lectures in blue bold. Reminder: I won't actually be taking attendance, but I'll be covering material that is not presented in the textbook or video lectures, and I will test this material on the final. In addition, there will be 5 in-class sessions devoted to group problem-solving. You can get extra credit for class participation by participating in the in-class group exercises, giving helpful answers on the class forum, helping out other students in office hours, or being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering).

Programming Assignments

6 Python programming assignments. Each assignment is due at 5:00pm on the Friday listed for it in the schedule above.

Programming Assignment Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual programs yourself.

Late homeworks

You have 4 free late (calendar) days to use on programming assignments 1-5. You cannot use late days on PA 6. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
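The late-day arithmetic above can be sketched in Python. This is only an illustration for PAs 1-5, not the course's actual grading script. The function names are hypothetical, and the sketch assumes the 20% penalty is additive per penalized day (two penalized days means 40% off), which the policy does not spell out.

```python
import math
from datetime import datetime

PENALTY_PER_DAY = 0.20  # 20% per late day once free days are exhausted
MAX_DAYS_LATE = 4       # nothing is accepted more than four days late


def late_days(due: datetime, submitted: datetime) -> int:
    """Each 24 hours or part thereof past the deadline is one full late day."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)


def apply_late_policy(days_late: int, free_days_left: int):
    """Return (accepted, penalty_fraction, free_days_left_afterwards)."""
    if days_late > MAX_DAYS_LATE:
        return False, 1.0, free_days_left
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used  # assumption: additive penalty
    penalty = min(1.0, PENALTY_PER_DAY * penalized_days)
    return True, penalty, free_days_left - free_used
```

For example, a submission one minute more than 24 hours after the deadline counts as two late days; with only one free late day left, one day is free and the other costs 20%.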

Readings

This class has a significant amount of textbook reading. Most weeks have around 30 textbook pages. The homeworks and exams will be based heavily on the readings.

Final exam:

  • Tuesday Mar 21, 3:30pm-6:30pm

The alternate final has not been scheduled yet.

You can take whichever final you prefer. You don't have to RSVP, just show up.

Final grade
  • 59% homeworks
  • 27% final exam
  • 11% weekly review quizzes
  • 3% reserved for extra credit participation in forums and class
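The weights above sum to 100%, so the final grade can be read as a weighted average. The short Python sketch below shows the arithmetic. It assumes each component is expressed as a fraction between 0 and 1; the dictionary keys and function name are made up for the example, and how letter grades are derived from the result is not specified here.

```python
# Weights taken from the list above; keys are illustrative names.
WEIGHTS = {
    "homeworks": 0.59,
    "final_exam": 0.27,
    "quizzes": 0.11,
    "participation": 0.03,
}


def final_grade(scores):
    """Weighted average of component scores, each a fraction in [0, 1]."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())
```

For instance, 90% on homeworks, 80% on the final, full marks on quizzes, and no participation credit works out to 0.59 × 0.9 + 0.27 × 0.8 + 0.11 × 1.0 = 0.857, or about 85.7%.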