CS 124: From Languages to Information

Dan Jurafsky

The online world has a vast array of unstructured information in the form of language and social networks. Learn how to make sense of it and how to interact with humans via language, from answering questions to giving advice!


Schedule

Week Date Homework Quiz In-class Video Lectures and Readings (to be done by the Friday of the week unless I specify an earlier date)
1 Sep 21, 23

PA 0: Setup and Tutorial [starter code]

Due Fri Sep 24, 5:00pm (Ungraded/optional; if you're having trouble, we'll go over this in Tuesday's tutorial on Nooks)

-
Edit Distance Canvas Videos (watch videos before Monday Sep 27) [slides pptx] [slides pdf]
2 Sep 28 and 30

PA 1: Spamlord [starter code]

Due Fri Oct 1, 5:00pm

Quiz 1: Text Processing/Edit Distance [gradescope]

Due Tue Sep 28, 11:59pm

    Tuesday: Live tutorial on Nooks: How to use Jupyter notebooks with Python
    Thursday: No class: extra in-person office hours during class time in 420-040
Language Modeling Canvas Videos (watch before Tuesday Oct 5) [Week 2 videos] [slides pptx] [slides pdf]
Naive Bayes and Text Classification Canvas Videos (watch before Tuesday Oct 5) [slides pptx] [slides pdf]
3 Oct 5 and 7

PA 2: Triage and Sentiment (NB+LR)! [starter code]

Due Fri Oct 8, 5:00pm

Quiz 2: Language Modeling/Naive Bayes/Regression [gradescope]

Due Tuesday Oct 5, 11:59pm

    Tuesday: Group Work 2: Naive Bayes and Sentiment Analysis
    (watch NB videos beforehand)
    (don't look at the solution until you've completed all the questions!)
    [group work 2] [solutions] [attendance form]


    Thursday: No class: extra in-person office hours during class time in 420-040


Chris Manning Canvas Video: Information Retrieval (I) (watch/read before Monday Oct 11) [Week 3 videos] [slides pptx] [slides pdf]
  • MR+S Chapter 1: Boolean Retrieval (pages 1-17)
  • MR+S Chapter 2: Term vocabulary and postings lists (only pages 33-42)
Chris Manning Canvas Video: Information Retrieval (II) (watch/read before Monday Oct 11) [slides pptx] [slides pdf]
  • MR+S Chapter 6: Scoring, term weighting, and the vector space model, (only pages 100 and 107-116)
  • MR+S Chapter 8: Evaluation in Information Retrieval (only pages 139-149)
4 Oct 12 and 14

PA 3: Information Retrieval [starter code]

Due Fri Oct 15, 5:00pm

Quiz 3: Information Retrieval [gradescope]

Due Tuesday Oct 12, 11:59pm

5 Oct 19 and 21

PA 4: Quizlet! [starter code]

Due Fri Oct 22, 5:00pm

Quiz 4: Vector Semantics and Sequence Labelling [gradescope]

Due Tue Oct 19, 11:59pm

Tuesday: No Class Today

Thursday: No class: extra in-person office hours during class time in 420-040


6 Oct 26 and 28

PA 5: Neural Networks [starter code]

Due next week! Mon Nov 1, 10:00pm

Quiz 5: Neural Networks [gradescope]

Due Tue Oct 26, 11:59pm

Tuesday: Review for First Midterm (online)

Thursday: First Midterm (online)

  • Chat Bot Videos (watch after the midterm, but before Nov 3)

Additional (optional) reading for those looking for more on this topic!:
7 Nov 2 and 4

PA 5: Neural Networks

Due Mon Nov 1, 10:00pm

PA 6: Chat! [starter code]

Due Tue Nov 16, 5:00pm

Quiz 6: Chatbots [gradescope]

Due Thu Nov 4, 11:59pm

Tuesday: No class

Thursday: No class: extra in-person office hours during class time in 420-040

Recommender systems and Collaborative Filtering Canvas videos
8 Nov 9 and 11 -

Quiz 7: Recommendation Systems [gradescope]

Due Thu Nov 11, 11:59pm

Tuesday: No class


Thursday: Group Work 4: Smartphone Chatbots
[group work 4] [attendance form]

Web graphs, Links, and PageRank (Canvas Videos)
  • MR+S Chapter 21: Link Analysis
9 Nov 16 and 18

PA 6: Chat!

Due Tues Nov 16, 5:00pm

(NOT FRIDAY)

Quiz 8: PageRank [gradescope]

Due Thurs Nov 18, 11:59pm

Tuesday: No Class

Thursday: Live Lecture: NLP for Social Good
[attendance form]
Social Networks Canvas Videos
NLP for Social Good
10 Nov 30 and Dec 2 -

Quiz 9: Networks and Zipf's Law

Due Tues Nov 30, 11:59pm

Tuesday: Review for Second Midterm (online)


Thursday: Second Midterm (online)

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: For the first 4 weeks this quarter, I'm going to try an experiment with individual one-at-a-time in-person office hours, where we take walks outside. It will be right after class on Tuesdays 4:30-5:45, and let's try starting outside my office, which is Margaret Jacks 117!
Teaching Assistants
    Danielle Cruz
    Minju Kim
    Alexis Lowber
    Hanson Lu
    Ben Newman
    Pablo Ocampo
    Yanal Qushair
    Dilara Soylu (Head TA)
    Angelica Sun
    Bo Wade
    Dhara Yu

TA Office Hours
  • Tuesdays 12:00noon to 1:30pm
  • Wednesdays 7:00pm to 10:00pm
  • Fridays 1:00-2:30pm
  • Plus: extra in person office hours on 5 Thursdays in 420-040 during class timeslot 3:15-4:30pm: Sep 30, Oct 7, 14, 21, Nov 4.
Class Time

Tuesday and Thursday 3:15-4:30

Attendance

We recommend that you come to the 2 live lectures and especially the 4 in-person group works; you will learn more from doing them with other people (and I will give extra credit for attending). However, if you wish, you can take the course entirely asynchronously: the 2 live lectures will be recorded so you can watch them later, and you can do the group works at home on your own instead of in a group. Different people learn better from different combinations of videos/lectures, reading the chapters, coming to the live group exercises in 420-040, and coming to office hours. But I will say that students who do all four tend to do the best on the exams and in the course in general.

Email

Alas, we can't reply to email sent to individual staff members. If you have a question that is not confidential or personal, post it on the Ed Discussion forum! Responses are quicker and you'll also be helping others with the same question! To contact the teaching staff directly, come see us in office hours! If that is not possible, you can also email non-technical questions to the course staff list, cs124-aut2122-staff@lists.stanford.edu. If you have a matter to be discussed privately, come to office hours or use cs124-aut2122-staff@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

Class announcements will be on Ed Discussion (although we will occasionally try Canvas and mailing lists). We will assume that everyone reads all announcements.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script.

Since quizzes are a form of assessment, students are not allowed to collaborate on completing quizzes. It is an honor code violation to discuss quiz questions with other students.

Textbooks
  • There is no required textbook, but I'll expect you to know the textbook/reading material listed above, and will test it on the midterms.

Course Description

Extracting meaning, information, and structure from human language text, speech, web pages, social networks. Introducing methods (string algorithms, edit distance, language modeling, machine learning, logistic regression, neural networks, neural embeddings, inverted indices, collaborative filtering, PageRank), applications (chatbots, sentiment analysis, information retrieval, text classification, social networks, recommender systems), and ethical issues.

Prerequisites

CS 106B. CS 107 can be helpful, but it is fine if you haven't taken it; we'll cover the required UNIX material.

Required Work

From Languages to Information is a (semi-)flipped class with much of the material online. Most of the lectures have been prerecorded, and you can watch them at home; two of the lectures will be given live during class and recorded so you can watch afterwards for review. The weekly quizzes and programming homeworks will be automatically uploaded and graded. Lectures, quizzes, and homeworks are available on Canvas and, via Canvas, on Gradescope.
Prerecorded Video Lectures

Most weeks, we will ask you to watch a set of video lectures (2 to 2.5 hours total). Most videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos, but the embedded quizzes are not counted toward the final grade.

In class Lectures

2 lectures will be live, and we recommend you come! They will also be recorded and posted to Canvas afterwards for those who missed them, or for review for the midterms.

In-class group problem-solving

4 in-class sessions are for group problem-solving activities. The group works are required and will be tested on the quiz, meaning that if you are taking the class async and can't make the in-person group works, you must still do all 4 exercises at home instead. Previous students who did well in the class have reported that doing the group exercises in class has been extremely useful.

Automated Review Quizzes

After watching a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. These quizzes are not timed, and they may be attempted an infinite number of times. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. You will not see your quiz grade/correct answers until after the due date, but the system will take the score from the last submission of all your infinitely-allowed submissions for the quiz. So if you worry you might have got something wrong, just submit another one! Review Quizzes for each week are due 11:59pm Tuesday of the following week (except that Quiz 6 and Quiz 7 are due on the Thursday instead of the Tuesday). There are no late days for review quizzes. We will drop your lowest scoring quiz (i.e., we will only count your best 8 of the 9 quizzes in your final grade).
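As a rough illustration of how the two rules above interact (the last submission counts, and the lowest of the 9 quizzes is dropped), here is a minimal Python sketch. The function names and the choice to represent each quiz score as a fraction between 0 and 1 are assumptions made for illustration; this is not the actual Canvas or Gradescope computation.

```python
def recorded_score(attempts):
    """Return the score that counts for one quiz: the LAST submission."""
    if not attempts:
        return 0.0  # assumption: a quiz with no submission scores zero
    return attempts[-1]


def quiz_average(quiz_scores, dropped=1):
    """Average the quiz scores after dropping the `dropped` lowest ones."""
    if len(quiz_scores) <= dropped:
        raise ValueError("need more quizzes than the number dropped")
    kept = sorted(quiz_scores)[dropped:]  # discard the lowest score(s)
    return sum(kept) / len(kept)


# Hypothetical student: the last attempt counts even when an earlier one was better.
print(recorded_score([0.4, 1.0, 0.8]))   # 0.8
# Eight perfect quizzes and one missed quiz: the missed quiz is dropped.
print(quiz_average([1.0] * 8 + [0.0]))   # 1.0
```

Note that because the last submission is the one recorded, resubmitting only helps if the new attempt is at least as good as the previous one.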

Class Participation

You have to watch all lectures, but attendance for the 2 live lectures is optional. The group works are required and we will test material from them on the 2 midterms. However, attendance for group work sessions is only strongly recommended; you may do them yourself at home if you really cannot come to class. You can get extra credit for class participation and other things by: coming to the 2 live lectures and the 4 group works, giving helpful answers on the class forum, helping out other students in office hours or group work sessions, being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering), and speaking up in the group work sessions. Plus there will be extra credit problems on the two quizzes and also on PA6.

Programming Assignments

6 Python programming assignments. PAs 1-4 are due at 5:00pm on Fridays; PA 5 and PA 6 are due on different weekdays (PA 5 on Monday Nov 1 at 10:00pm and PA 6 on Tuesday Nov 16 at 5:00pm, per the schedule above).

Programming Assignment Collaboration for PA 1-5: You may talk to anybody you want about the assignments and bounce ideas off each other. And if you want, you can also choose a partner and do pair programming for PA 1-5. You and your pair-partner can discuss code, but it's important that each of you work on each part of the assignment so that you're comfortable with the whole assignment, since assignments build on each other (and we will test concepts from the assignments on the midterms). If you choose to pair-program, each of you must still submit your own program, and should specify in the submission who your partner is. We will use the normal automatic checks for overlap between your code and the code of other students who are not your pair partner.

Programming Assignment Collaboration for PA 6: PA6 is a group homework: you will work together with your group and write code together. Groups must be of size 3 or 4. To work in a group of size 2, you must get special permission from the staff. You cannot work by yourself on PA 6, because part of the goal of this homework is to learn to work on group projects. You must describe in your writeup who worked on which parts of the assignment/code.

Late homeworks

You have 4 free late (calendar) days to use on programming assignments 1-5. Each 24 hours or part thereof that a homework is late uses up one full late day. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. However, no assignment will be accepted more than four days after its due date. If you are pair programming, late days are still individual (i.e., if one of you has used up late days and the other has not, and you submit a homework one day late, only the student without remaining late days will be penalized). You cannot use late days on PA 6.
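The late-day arithmetic for PA 1-5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, free late days are assumed to be applied automatically before any penalty, and the 20% penalty is assumed to be deducted from full credit per penalized day (the policy above does not spell out either detail). PA 6 is left out because late days cannot be used on it.

```python
import math

MAX_DAYS_LATE = 4        # nothing is accepted more than four days late
PENALTY_PER_DAY = 0.20   # assumption: 20% of full credit per penalized day


def late_days_used(hours_late):
    """Each 24 hours or part thereof counts as one full late day."""
    return max(0, math.ceil(hours_late / 24))


def apply_late_policy(hours_late, free_days_left):
    """Return (score_multiplier, free_days_left_after), or None if not accepted."""
    days = late_days_used(hours_late)
    if days > MAX_DAYS_LATE:
        return None  # too late to be accepted at all
    covered = min(days, free_days_left)  # assumption: free days are used first
    penalized = days - covered
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized)
    return multiplier, free_days_left - covered


# 30 hours late counts as 2 late days; with all 4 free days left there is no penalty.
print(apply_late_policy(30, free_days_left=4))   # (1.0, 2)
# 97 hours late is 5 late days, which is past the four-day cutoff.
print(apply_late_policy(97, free_days_left=4))   # None
```

For pair programming, each partner would track their own `free_days_left`, since late days are individual.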

Readings

This class has a significant amount of textbook reading. Most weeks have around 25 textbook pages. The homeworks and exams will be based heavily on the readings.

Final grade computation
  • 63% homeworks (PAs 1-5 are each worth 9%, regardless of the different point values for each homework; PA6 is worth 18%, double the others)
  • 11% Midterm 1
  • 11% Midterm 2
  • 15% weekly review quizzes
Final letter grades
  • Some sort of A: 90% and above of the total points (the numerator will include your extra credit; the denominator does not include possible extra credit, otherwise it wouldn't be extra credit)
  • Some sort of B: 80% and above
  • Some sort of C: 70% and above
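
To see how the weights and letter-grade cutoffs above combine, here is a minimal Python sketch. The dictionary keys, function names, and the idea of expressing extra credit directly in percentage points are assumptions for illustration only; the syllabus does not say how extra credit is scaled, and it lists no cutoffs below the C range.

```python
# Component weights in percentage points, taken from the list above.
WEIGHTS = {
    "pa1": 9, "pa2": 9, "pa3": 9, "pa4": 9, "pa5": 9,
    "pa6": 18,
    "midterm1": 11, "midterm2": 11,
    "quizzes": 15,
}


def course_percent(fractions, extra_credit=0.0):
    """Weighted total out of 100; `fractions` maps each component to [0, 1].

    Extra credit (assumed to be in percentage points) is added to the
    numerator only, so the denominator stays at 100.
    """
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS) + extra_credit


def letter_range(percent):
    """Map a course percentage to the broad letter ranges listed above."""
    if percent >= 90:
        return "A range"
    if percent >= 80:
        return "B range"
    if percent >= 70:
        return "C range"
    return "not specified in the syllabus"


perfect = {name: 1.0 for name in WEIGHTS}
print(sum(WEIGHTS.values()))                  # 100
print(letter_range(course_percent(perfect)))  # A range
```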