CS 124: From Languages to Information

Winter 2015 Dan Jurafsky

The web is a vast world of unstructured information — text and speech in multiple languages, social networks, tags, and all sorts of human interactions. Learn how to make sense of it!


Online Offering

From Languages to Information has much of the material online.

Schedule

Week Date Homework Quiz In-class Video Lectures and Readings
1 Jan 6 and 8 - -
  • Tue: Intro Lecture* [pptx] [pdf]

  • Thurs: Group Work: Text Processing with Unix tools [pptx] [pdf]
Basic Text Processing [slides pptx] [slides pdf]
Edit Distance [slides pptx] [slides pdf]
2 Jan 13 and 15

Homework 1: Spamlord

Due Fri Jan 16, 5:00pm

Quiz 1: Text Processing/Edit Distance

Due Tue Jan 13, 11:59pm

    Tuesday: Dan Lecture on Language Modeling (same material as video)
No Class Thursday
Language Modeling [slides pptx] [slides pdf] (skip the video/slides on Good-Turing Smoothing)
Spelling Correction and the Noisy Channel [slides pptx] [slides pdf]
3 Jan 20 and 22

Homework 2: AutoCorrect!

Due Fri Jan 23, 5:00pm

Quiz 2: Language Modeling

Due Tuesday Jan 20, 11:59pm

Naive Bayes and Text Classification [slides pptx] [slides pdf]
Sentiment Analysis [slides pptx] [slides pdf]
4 Jan 27 and 29

Homework 3: Thumbs up!

Due Fri Jan 30, 5:00pm

Quiz 3: Text Categorization and Naive Bayes

Due Tuesday Jan 27, 11:59pm

  • Thursday: Guest Lecture*
Information Retrieval (I) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Information Retrieval (II) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model, (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
5 Feb 3 and 5

Homework 4: Search!

Due Fri Feb 6, 5:00pm

Quiz 4: Information Retrieval

Due Tue Feb 3, 11:59pm

Tuesday 3:15-4:00 Group Work on Information Retrieval and Answer Key

Tuesday 4:00-4:30: Dan Lecture on Relation Extraction (same material as videos)

Thursday: Dan Lecture on Question Answering (same material as videos)


Relation Extraction [slides pptx] [slides pdf]
  • J+M New Chapter 18: Information Extraction, pages 1-3 and section 18.2 (pages 7-17)
Question Answering [slides pptx] [slides pdf]
  • J+M New Chapter 24 Conversational Agents and Question Answering TBD.
6 Feb 10 and 12

Homework 5: Jeopardy!

Due Fri Feb 13, 5:00pm

Quiz 5: Relation Extraction and Question Answering

Due Tue Feb 10, 11:59pm

Tuesday: 3:15-3:45. Dan lecture on Dialogue (same material as textbook, but not on video)

Tuesday 3:45-4:30: Group Work on Question Answering in the Mobile Domain

Thursday: Dan lecture on Machine Translation (same material as parts of videos)

Machine Translation 1 [slides pptx] [slides pdf]
  • J+M New Chapter 23: Machine Translation, TBD
Machine Translation 2 [slides pptx] [slides pdf]
  • J+M New Chapter 23: Machine Translation, TBD
7 Feb 17 and 19 -

Quiz 6: Machine Translation

Due Tue Feb 17, 11:59pm

Tuesday: Dan Lecture on Social Meaning Extraction*

Thursday: 3:15-3:45 Dan Lecture on Part-of-Speech Tagging (same material can be found by reading JM 3ed Chapter 8)

Thursday: 3:45-4:30 Group Work on PA6

Speech and Social Meaning Extraction [slides pptx] [slides pdf]
  • TBD
  • TBD
8 Feb 24 and 26

Homework 6: Translate!

Due Fri Feb 27, 5:00pm

Quiz 7: Speech/Emotion/Dialogue

Due Tue Feb 24, 11:59pm

Tuesday: Guest Lecture*

Thursday: Group Work on PA 6

9 Mar 3 and 5 -

Quiz 8: PageRank

Due Tue Mar 3, 11:59pm

Tuesday: Dan Lecture on PageRank (same material as videos)
Thursday: Dan's Lecture on Social Networks*
Web graphs, Links, and PageRank [slides pptx] [slides pdf]
10 Mar 10 and 12 -

Quiz 9: Networks and Zipf's Law

Due Tue Mar 10, 11:59pm

Tuesday: Dan's Lecture: Extraction of Social Meaning from Everyday Language: Dating and Food*

Thursday: Course Review, Discussion of Practice Final and its Solutions

Social Networks [slides pptx] [slides pdf]
- Mar 20 - -
Final Exam

Wednesday Mar 18, 12:15pm-3:15pm, location TBD

We will be giving you a sample final.

Course Information

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: TBD
Teaching Assistants

TA Office Hours
  • TBD
  • TBD
  • TBD
TA Office

TBD

Class Time

Tuesday and Thursday 3:15-4:30pm in Hewlett 201

Email

If you have a question that is not confidential or personal, post it on the Piazza forum; responses tend to be quicker and reach a wider audience. To contact the teaching staff directly, we strongly encourage you to come to office hours. If that is not possible, you can also email the course staff list (non-technical questions only) at cs124-win1415-staff@lists.stanford.edu. We cannot reply to email sent to individual staff members. If you have a matter to discuss privately, please come to office hours, or use cs124-win1415-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

We use the mailing list generated by Axess to convey messages to the class. We will assume that all students read these messages.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Textbooks
  • There is no required textbook, but I will expect you to know the material listed above, drawn from the textbooks and other readings. The material in the readings will be tested on the final exam. Different people may learn better from different combinations of videos/lectures, reading the chapters, or coming to the in-class group exercises. The students who do best on the final exam tend to do all three. But I won't take roll, and attendance is up to you.
    • New online chapters from Jurafsky and Martin, Speech and Language Processing, third edition (in progress). I will be giving you the PDFs.
    • Chapters from Manning, Raghavan, and Schutze. 2008. Introduction to Information Retrieval. Cambridge University Press. You can buy the book or get it from the library; it is also available online.

Course Description

Extracting meaning, information, and structure from human language text, speech, web pages, genome sequences, social networks, or any less structured information. Methods include: string algorithms, edit distance, language modeling, naive Bayes, inverted indices, vector semantics. Applications such as information retrieval, question answering, text classification, social network models, machine translation, genomic sequence alignment, word meaning extraction.

Prerequisites

CS 103, CS 107 and CS 109.

Required Work

Video Lectures

Each week, we will ask you to watch a set of video lectures (2 to 2.5 hours total). The videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos (or in some cases, attend the lectures that cover the identical material) but the embedded quizzes are not counted toward the final grade.

Automated Review Quizzes

After you watch a week's video lectures, we will ask you to complete an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. Each review quiz may be attempted several times, with a time lag of 10 minutes between attempts. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. We will take the highest score over all attempts for each quiz. The first two attempts will not be penalized; subsequent attempts will incur a cumulative 20% penalty (e.g., the maximum score possible is 80% on the 3rd attempt and 60% on the 4th attempt). Review quizzes for each week are due at 11:59pm on Tuesday of the following week. There are no late days for review quizzes.
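The attempt penalty can be sketched in a few lines of Python. This is only an illustration, not the actual grading script: the function names are hypothetical, scores are taken to be percentages, and the penalty is assumed to scale the raw score (the policy only states the maximum possible score per attempt).

```python
def attempt_cap(attempt_number):
    """Maximum percentage available on a 1-based attempt number.

    Attempts 1 and 2 are unpenalized; each later attempt loses a further 20%.
    """
    penalized_attempts = max(0, attempt_number - 2)
    return max(0, 100 - 20 * penalized_attempts)


def quiz_score(raw_scores):
    """Best penalized score over all attempts.

    raw_scores: raw percentages (0-100) in the order the attempts were made.
    Assumption: the penalty scales the raw score by the attempt's cap.
    """
    return max(
        (raw * attempt_cap(i) / 100 for i, raw in enumerate(raw_scores, start=1)),
        default=0.0,
    )


if __name__ == "__main__":
    # A perfect third attempt is worth 80, which beats earlier scores of 50 and 70.
    print(quiz_score([50, 70, 100]))
```

Under these assumptions, a further attempt can only help if its raw score, scaled by that attempt's cap, beats your best score so far.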

Class Participation

Attendance is strongly recommended but optional except for the first day of class. There are 5 other lectures that cover material not presented in the textbook or video lectures; this material will be tested on the final. Since lectures are online, the Tuesday and Thursday in-class sessions will be used for problem-solving, reviews, discussions, guest speakers from industry, and presentation of state-of-the-art research. Attendance at the guest lectures as well as the first lecture, my lecture on networks, and possibly one other in-person lecture is required (this is the 5% class participation part of your grade). You can get extra credit for class participation by answering questions on the class forum and asking good questions of the invited speakers.

Programming Assignments

There are 6 programming assignments (in Java or Python, your choice). Each assignment is due at 5:00pm on the Friday listed in the schedule.

Programming Assignment Collaboration: You may talk to anybody you want about the assignments and bounce ideas off each other. But you must write the actual programs yourself.

Late homeworks

You have 4 free late (calendar) days to use on programming assignments 1-5. For the group homework PA 6, the number of late days is the mean of the late days remaining for each person in your group, with fractions rounded up (e.g., if your 3 members have 0, 1, and 3 late days left, your team will have 4/3 ≈ 1.3, rounded up to 2 late days). Once these are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
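The late-day arithmetic can be written out as a short Python sketch. The function names are hypothetical and this is not the course's submit script. It assumes lateness is measured in hours past the deadline and reports the penalty as a percentage.

```python
import math


def group_late_days(remaining_per_member):
    """Late days for a PA 6 group: mean of members' remaining days, rounded up.

    Assumes a non-empty list of non-negative integers.
    """
    total = sum(remaining_per_member)
    members = len(remaining_per_member)
    return -(-total // members)  # integer ceiling of total / members


def late_days_used(hours_late):
    """Each 24 hours, or part thereof, counts as one full late day."""
    return max(0, math.ceil(hours_late / 24))


def late_penalty_percent(hours_late, free_days_left):
    """Penalty in percent, or None if the submission is too late to be accepted."""
    days = late_days_used(hours_late)
    if days > 4:
        return None  # more than four days after the due date
    penalized_days = max(0, days - free_days_left)
    return 20 * penalized_days


if __name__ == "__main__":
    print(group_late_days([0, 1, 3]))   # the example from the policy: 2
    print(late_penalty_percent(30, 1))  # 2 late days, 1 free day left: 20
```

With members holding 0, 1, and 3 late days, the group gets 2, matching the example in the policy.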

Readings

This class has a significant amount of textbook reading. Most weeks have around 30 textbook pages. The homeworks and exams will be based heavily on the readings.

Final exam:

Wednesday Mar 18, 12:15pm-3:15pm, location TBD

Final grade
  • 57% homeworks
  • 29% final exam
  • 9% weekly review quizzes
  • 5% participation in forums and class
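
As a rough illustration of how these weights combine, here is a minimal Python sketch. The dictionary keys and function name are hypothetical, each component score is assumed to be a percentage from 0 to 100, and extra credit is ignored.

```python
# Weights in percent, as listed above; they sum to 100.
WEIGHTS = {
    "homeworks": 57,
    "final_exam": 29,
    "review_quizzes": 9,
    "participation": 5,
}


def final_grade(scores):
    """Weighted average of component scores, each a percentage in 0-100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "homeworks": 90,
        "final_exam": 80,
        "review_quizzes": 100,
        "participation": 100,
    }
    print(final_grade(example))  # (5130 + 2320 + 900 + 500) / 100 = 88.5
```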