CS 124: From Languages to Information

Dan Jurafsky

The online world has a vast array of unstructured information in the form of language and social networks. Learn how to make sense of it and how to interact with humans via language, from answering questions to giving advice!


Schedule

Week Date Homework Quiz In-class Video Lectures and Readings (to be done by the Friday of the week unless I specify an earlier date)
1 Mar 30, Apr 1

PA 0: Setup and Tutorial [starter code]

Due Fri Apr 2, 5:00pm (Ungraded/optional)

-
Basic Text Processing Canvas Videos (watch videos before class Thursday April 1) [slides pptx] [slides pdf]
Edit Distance Canvas Videos (watch videos before Monday April 5) [slides pptx] [slides pdf]
2 Apr 6 and 8

PA 1: Spamlord [starter code]

Due Fri Apr 9, 5:00pm

Quiz 1: Text Processing/Edit Distance [gradescope]

Due Tue Apr 6, 11:59pm

    Tuesday: Live tutorial: How to use Jupyter notebooks with Python
    Thursday: No class: extra office hours during class time
Language Modeling Canvas Videos (watch before Monday April 12) [slides pptx] [slides pdf]
Naive Bayes and Text Classification Canvas Videos (watch before Monday April 12) [slides pptx] [slides pdf]
3 Apr 13 and 15

PA 2: Triage and Sentiment (NB+LR)! [starter code]

Due Fri Apr 16, 5:00pm

Quiz 2: Language Modeling/Naive Bayes/Regression [gradescope]

Due Tuesday Apr 13, 11:59pm

    Thursday: No class: extra office hours during class time


Chris Manning Canvas Video: Information Retrieval (I) (watch/read before Monday April 19) [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Chris Manning Canvas Video: Information Retrieval (II) (watch/read before Monday April 19) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model, (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
4 Apr 20 and 22

PA 3: Information Retrieval

Due Fri Apr 23, 5:00pm

Quiz 3: Information Retrieval

Due Tuesday Apr 20, 11:59pm

    Tuesday: Group Work 3: Information Retrieval
    Thursday: No class: extra office hours during class time
  • Vector Semantics and Embeddings Canvas Videos
  • Parts of Speech and Named Entities
5 Apr 27 and 29

PA 4: Quizlet!

Due Fri Apr 30, 5:00pm

Quiz 4: Vector Semantics and Sequence Labelling

Due Tue Apr 27, 11:59pm

Tuesday: No Class Today

Thursday: Live Lecture Neural Networks


  • Chat Bots

Optional advanced reading:
6 May 4 and 6

Homework 5: Neural Networks

Due Fri May 7, 5:00pm

Quiz 5: Neural Networks

Due Tue May 4, 11:59pm

Tuesday: Review for First Midterm

Thursday: First Midterm

7 May 11 and 13 -

Quiz 6: Chatbots/Question Answering

Due Tue May 11, 11:59pm

Tuesday: Live Lecture Recommender Systems

Thursday: No class: extra office hours during class time

Recommender systems (Collaborative Filtering) (only live video)
8 May 18 and 20

Homework 6: Chat!

Due Fri May 21, 5:00pm

Quiz 7: Recommendation Systems

Due Tue May 18, 11:59pm

Tuesday: Group Work 4: Smartphone Chatbots


Thursday: No class: extra office hours during class time

Web graphs, Links, and PageRank (Canvas Videos) [slides pptx] [slides pdf]
  • MR+S Chapter 21: Link Analysis
9 May 25 and 27 -

Quiz 8: PageRank

Due Tue May 25, 11:59pm

Tuesday: Live Lecture Social Networks


Thursday: Live Lecture: NLP for Social Good
Social Networks (only live videos)
NLP for Social Good (only live videos)
10 Jun 1 and 3 -

Quiz 9: Networks and Zipf's Law

Due Tue Jun 1, 11:59pm

Tuesday: Review for Second Midterm


Thursday: Second Midterm

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117 (when not during COVID)
Office Hours: Tuesdays 4:15-5:45 (link on Canvas)
Teaching Assistants
    Jesus Enrique Cervantes
    Dan Iter (Head TA)
    Bryan Sanghyuk Kim
    Grace Lam
    Allison Lettiere
    Rose Li
    Benjamin Newman
    Arjun Sawhney
    Lauren Zhu

TA Office Hours
  • [Nooks Details]
  • Tuesdays 12:00noon to 1:30pm
  • Wednesdays 7:00pm to 10:00pm
  • Fridays 1:00-2:30pm
  • Plus: extra office hours on 5 Thursdays during class timeslot 2:30-3:50pm: April 8, 15, 22, May 13, May 20
Class Time

Tuesday and Thursday 2:30-3:50pm

Attendance

The course is asynchronous and no attendance will be taken. Live lectures will be recorded so you can watch them later. However, we strongly recommend that you come to at least the 4 group work sessions on Nooks and do them with other people. And if you come and ask questions in live lectures, you will learn more! But you may, if you must, do everything on your own asynchronously. Also, different people learn better from different combinations of videos/lectures, reading the chapters, coming to the in-class group exercises, and coming to office hours. But I will say that students who do all four tend to do the best on the exams and in the course in general.

Email

Alas, we can't reply to email sent to individual staff members. If you have a question that is not confidential or personal, post it on the Piazza forum! Responses are quicker and you'll also be helping others with the same question! To contact the teaching staff directly, come see us in office hours! If that is not possible, you can also email non-technical questions to the course staff list, cs124-spr2021-staff@lists.stanford.edu. If you have a matter to discuss privately, come to office hours or use cs124-spr2021-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

Class announcements will be on Piazza (although we will occasionally try Canvas and mailing lists). We will assume that everyone reads all announcements.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Since quizzes are a form of assessment, students are not allowed to collaborate on completing quizzes. It is an honor code violation to discuss quiz questions with other students.

Textbooks
  • There is no required textbook, but I'll expect you to know the textbook/reading material listed above, and will test it on the midterms.

Course Description

Extracting meaning, information, and structure from human language text, speech, web pages, social networks. Introducing methods (string algorithms, edit distance, language modeling, machine learning classifiers, neural embeddings, inverted indices, collaborative filtering, PageRank), applications (chatbots, sentiment analysis, information retrieval, question answering, text classification, social networks, recommender systems), and ethical issues in both.

Prerequisites

CS 106B. CS 107 can be helpful, but it is fine if you haven't taken it; we'll cover the required UNIX material.

Required Work

From Languages to Information is a (semi-)flipped class with much of the material online. Most of the lectures have been prerecorded, and you can watch them at home; a few of the lectures will be given live during class and recorded so you can watch afterwards for review. The weekly quizzes and programming homeworks will be automatically uploaded and graded. Lectures, quizzes, and homeworks are available on Canvas and, via Canvas, on Gradescope.
Prerecorded Video Lectures

Most weeks, we will ask you to watch a set of video lectures (2 to 2.5 hours total). Most videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos but the embedded quizzes are not counted toward the final grade.

In-class Lectures

5 lectures will be live, and we recommend you come! But they will be recorded and posted to Canvas afterwards for those who missed them, or for review for the midterms.

In-class group problem-solving

4 in-class sessions are for group problem-solving activities. These are strongly recommended, and the first one (April 1 on Unix text processing) is required and will be tested on the quiz, meaning that if you are taking the class async and can't make that date, you must still do the entire exercise at home instead. Previous students who did well in the class have reported that the in-class group exercises have been extremely useful.

Automated Review Quizzes

After you watch a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. These quizzes are not timed and may be attempted an unlimited number of times. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. You will not see your quiz grade or the correct answers until after the due date, and the system will take the score from your last submission for the quiz. So if you worry you might have got something wrong, just submit another one! Review Quizzes for each week are due 11:59pm Tuesday of the following week. There are no late days for review quizzes.

Class Participation

You have to watch all lectures, but attendance for the live lectures is optional. However, attendance for group work sessions is strongly recommended (only the first one is required; the other 3 are technically optional); we will cover material that will be tested on the midterms.
You can get extra credit for class participation by: giving helpful answers on the class forum, helping out other students in office hours or group work sessions, being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering), or speaking up in the group work sessions.

Programming Assignments

6 Python programming assignments. Each assignment is due at 5:00pm on the Friday it is due.

Programming Assignment Collaboration for PA 1-5: You may talk to anybody you want about the assignments and bounce ideas off each other. And if you want, you can also choose a partner and do pair programming for PA 1-5. You and your pair-partner can discuss code, but it's important that each of you work on each part of the assignment so that you're comfortable with the whole assignment, since assignments build on each other (and we will test concepts from the assignments on the midterms). If you choose to pair-program, each of you must still submit your own program, and should specify in the submission who your partner is. We will use the normal automatic checks for overlap between your code and the code of any student who is not your pair partner.

Programming Assignment Collaboration for PA 6: PA 6 is a group homework. You will work together with your group, and write code together. Groups must be of size 3 or 4. To work in a group of size 2, you must get special permission from the staff. You cannot work by yourself on PA 6, because part of the goal of this homework is to learn to work on group projects. You must describe in your writeup who worked on which parts of the assignment/code.

Late homeworks

You have 4 free late (calendar) days to use on programming assignments 1-5. If you are pair programming, late days are still individual (i.e., if one of you has used up all late days and the other has not, and you submit a homework one day late, only the student without remaining late days will be penalized). You cannot use late days on PA 6. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
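The late-day arithmetic above can be sketched in Python. This is only an illustration, not the staff's grading script: the function names are hypothetical, it covers PA 1-5 only (PA 6 cannot use late days), and it assumes that "20% per late day" means 20 percentage points of the full score per penalized day, which the policy does not spell out.

```python
import math
from datetime import datetime

PENALTY_PERCENT_PER_DAY = 20  # from the policy: 20% per late day
MAX_DAYS_LATE = 4             # from the policy: not accepted after four days


def late_days_used(due, submitted):
    """Each 24 hours or part thereof after the deadline counts as one day."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / 86400)


def score_multiplier(days_late, free_days_remaining):
    """Return (multiplier, free days left), or None if not accepted.

    Assumption: the penalty is subtracted from the full score, so two
    penalized days give a multiplier of 0.6.
    """
    if days_late > MAX_DAYS_LATE:
        return None
    free_used = min(days_late, free_days_remaining)
    penalized_days = days_late - free_used
    percent_kept = max(0, 100 - PENALTY_PERCENT_PER_DAY * penalized_days)
    return percent_kept / 100, free_days_remaining - free_used
```

For example, a submission one minute after a Friday 5:00pm deadline uses one full late day, and a student with one free day left who submits three days late keeps 60% of the score under the assumption above.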

Readings

This class has a significant amount of textbook reading. Most weeks have around 25 textbook pages. The homeworks and exams will be based heavily on the readings.

Final grade computation
  • 63% homeworks (PAs 1-5 are each worth the same, 9%; ignore the different point values for each homework. PA 6 is worth 18%, double the others)
  • 11% Midterm 1
  • 11% Midterm 2
  • 15% weekly review quizzes
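The weights above can be checked with a short Python sketch. It is illustrative only: the names are hypothetical, each score is assumed to be a fraction between 0 and 1, the quizzes are assumed to be already averaged into a single fraction (the page does not say how), and extra credit is left out.

```python
# Weights in percent, taken from the list above.
WEIGHTS = {
    "PA1": 9, "PA2": 9, "PA3": 9, "PA4": 9, "PA5": 9,
    "PA6": 18,        # double the other PAs
    "Midterm 1": 11,
    "Midterm 2": 11,
    "Quizzes": 15,    # assumed: one aggregate fraction for all quizzes
}


def final_grade(scores):
    """Weighted sum of component fractions, returned as a percentage."""
    assert sum(WEIGHTS.values()) == 100  # 5 * 9 + 18 + 11 + 11 + 15
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

With full marks everywhere except half credit on PA 6, the weighted sum drops by 9 points, from 100 to 91.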