CS 124: From Languages to Information

Dan Jurafsky
Winter 2026, Tu/Th 3:00-4:20 in Hewlett 200.

This course (which will not be taught in AY 2026-2027, so take it now!) is a broad introduction to LLMs and other algorithms for dealing with text, speech, and networks, including the tasks of tokenizing, classifying, searching, recommending, and transcribing information. Topics include LLMs themselves as well as logistic regression, information retrieval, collaborative filtering, neural networks, transformers, speech recognition, social networks, pagerank, and other large language model components and tools. This class is the broad undergrad intro to numerous grad classes like cs224N/S/U/C/V, cs246, cs276, cs336, and cs329R/X.



*Special Note*: CS124 will not be taught in Winter 2027 (or anytime in AY 2026-2027) because I will be on sabbatical. So if you need to take cs124 before Winter 2028, you should take it next quarter, Winter 2026! I look forward to seeing you!
Also FYI that cs124 has no enrollment cap, so everyone is admitted!

Course Staff

    Dan Jurafsky
    Professor
    Linda Liu
    Head TA
    Belinda Yeung
    TA / Student Liaison
    Amelie Byun
    Course Manager
    Adi Badlani
    TA
    Karan Bhasin
    TA
    Julia Biswas
    TA
    Sri Jaladi
    TA
    Riya Karumanchi
    TA
    Ishan Khare
    TA
    Susan Lee
    TA
    Isabel Sieh
    TA
    Isha Sinha
    TA
    Sunny Yu
    TA
    Esidore Eneinyang
    Ethics TA

Schedule

Week Date Homework Quiz In-class Video Lectures and Readings (to be done by the Monday of the week unless I specify another date)
1 Jan 6 and 8

PA 0: Setup and Tutorial
[starter code]

Due Fri Jan 9, 5:00pm (We'll also go over this in the in-person tutorial on Thursday Jan 8)

-
  • Tue Jan 6: Dan in-person Lecture: Intro (required, not recorded)

    [slides pptx] [slides pdf]

  • Thurs Jan 8: In-person tutorial: Jupyter notebooks and PA0 (Optional!)
(Optional) Watch before Thursday:
2 Jan 13 and 15

PA 1: Regular Expressions and Tokenization
[starter code]

Due Fri Jan 16, 5:00pm

Quiz 1: Tokenization and N-gram Language Modeling [quiz 1 on gradescope]

Due Tue Jan 13, 11:59pm

    Tue Jan 13: Lab #1: Unix Text Processing and N-gram Language Modeling (Slides: Solutions are generally on the following slide; don't look at each solution until you've done the problem :)
    [Lab 1 pptx] [Lab 1 pdf] [secret_ec.txt]
    Thur Jan 15: In-person Tutorial on NumPy (Optional) [numpy tutorial]
Words and Tokenization Canvas Videos (watch videos before Mon Jan 12) [canvas slides pptx] [ canvas slides pdf]
Edit Distance Canvas Videos (watch videos before Mon Jan 12) [canvas slides pptx] [ canvas slides pdf]
N-gram Language Modeling Canvas Videos (watch before Monday Jan 12) [canvas slides pptx] [canvas slides pdf]
3 Jan 20 and 22

PA 2: Logistic Regression and Text Classification!
[starter code]

Due Fri Jan 23, 5:00pm

Quiz 2: Logistic Regression and Text Classification [quiz 2 on gradescope]

Due Tuesday Jan 20, 11:59pm

    Tue Jan 20: Lab #2: Logistic Regression and Classification (required. You may do at home, but extra credit for in-person)
    (watch LR videos beforehand)
    (don't look at the solution until you've completed all the questions!)
    [lab intro pptx] [lab intro pdf]
    [Lab 2] [Lab 2 Solutions]


    Thu Jan 22: No class: extra in-person TA office hours during class time in Hewlett 200


4 Jan 27 and Jan 29

PA 3: Information Retrieval
[starter code]

Due Fri Jan 30, 5:00pm

Quiz 3: Information Retrieval [quiz 3 on gradescope]

Due Tuesday Jan 27, 11:59pm

    Thursday: No class: extra in-person TA office hours during class time in Hewlett 200 (3-4:30) and Hewlett 201 (4:30-6)


5 Feb 3 and 5

PA 4: Embeddings
[starter code]

Due Fri Feb 6, 5:00pm

Quiz 4: Embeddings [quiz 4 on gradescope]

Due Tuesday Feb 3, 11:59pm

  • Tuesday: Dan in-person Lecture (required and not recorded): "Social NLP/ NLP for Computational Social Science"
    [slides pdf] [slides pptx]

    • Thursday: No class: extra in-person TA office hours during class time in Hewlett 200
6 Feb 10 and 12

PA 5: Neural Networks

Due Fri Feb 13, 5:00pm.

Quiz 5: Neural Networks [quiz 5 on gradescope]

Due Tue Feb 10, 11:59pm

Tuesday: Dan live Lecture (required and not recorded): "LLMs and Transformers!"

Thursday: No class: extra in-person TA office hours during class time in Hewlett 200




7 Feb 17 and 19

PA 6a: Transformers

Due Fri Feb 20, 5:00pm

Quiz 6: Transformers

Due Tue Feb 17, 11:59pm.

Tuesday: Dan live Lecture (not recorded): "Speech Processing" (attendance is optional/extra credit; you can choose to read the chapters instead)

Thursday: No class: extra in-person TA office hours during class time in Hewlett 200 (3-4:30) and Hewlett 201 (4:30-6)


8 Feb 24 and Feb 26

PA 6b: Speech

Due Wed Feb 25, 5:00pm

Quiz 7: Speech

Due Tue Feb 24, 11:59pm.

Tuesday: Lab #4: PA7 and Git (required in-person)

Thursday: No class: extra in-person TA office hours during class time in Hewlett 200


9 Mar 3 and 5

PA 7: Chatbot

Due Wed Mar 11, 5:00pm

Quiz 8: Recommendation Systems

Due Tues Mar 3, 11:59pm

Tuesday: Lab #5: Collaborative Filtering and Ethical Use of LLMs in the Classroom (required. You may do at home, but extra credit for in-person)


Thursday: No class: extra in-person TA office hours during class time in Hewlett 200

10 Mar 10 and 12

Reminder: PA 7: Chatbot due Wed Mar 11, 5:00pm

Quiz 9: Pagerank and Networks

Due Tues Mar 10, 11:59pm

Tuesday: Dan Live Lecture (required and not recorded)



Thursday: No class (but no extra office hours)
Web graphs, Links, and PageRank (watch by Mon Mar 9) [slides pptx] [slides pdf]
  • MR+S Chapter 21: Link Analysis, just pages 421-433 (Skip section 21.3 and 21.4)
Social Networks Canvas Videos (watch by Mon Mar 9) [slides pptx] [slides pdf]

Logistics

Instructor
Dan Jurafsky (jurafsky@stanford.edu)
Office: Margaret Jacks 117
Office Hours: Book an appointment for Thursdays 3-4:20 (except week 2, in which case they are Tues Jan 13, 4:30-5:30).

TA Office Hours
Google Calendar (Subject to change)
Class Time

Tuesday and Thursday 3:00-4:20

Attendance

See the note at the top of the page.

Email

Alas, we can't reply to email sent to individual staff members. If you have a question that is not confidential or personal, post it on the Ed Discussion forum! Responses are quicker and you'll also be helping others with the same question! To contact the teaching staff directly, come see us in office hours!

If that is not possible, you can also email non-technical questions to the course staff list, cs124_requests@lists.stanford.edu. For urgent requests: We check the staff email list very frequently, but please don't worry if you don't hear from us right away. We will do our best to get back to you within a day or so. Just make sure to send an email as soon as you have the request so it's timestamped!

If you have a matter to be discussed privately, come to office hours or use cs124_requests@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.

Class announcements will be on Ed Discussion (although we will occasionally try Canvas and mailing lists). We will assume that everyone reads all announcements.

Honor Code

Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script. It's also an honor code violation to use ChatGPT or any automatic coding system to write your code for you. You should use language models like you use a TA, to improve your understanding. You may not paste code directly from an LLM into your programming assignment.

Unlike prior years, students are allowed to collaborate on quizzes. However, you must each do your own work and only discuss after; each person will be uploading their own work in addition to the answers.

CS124 follows the general Stanford policy on generative AI which is that "use of or consultation with generative AI shall be treated analogously to assistance from another person. In particular, using generative AI tools to substantially complete an assignment or quiz (e.g. by entering quiz or assignment questions) is not permitted", just as having someone do your homework or quizzes for you is not permitted.

Textbook
Course Description

This course is a broad introduction to LLMs and other algorithms for dealing with text, speech, and networks online, including the tasks of tokenizing, classifying, searching, recommending, and transcribing information. Topics include LLMs themselves as well as logistic regression, information retrieval, collaborative filtering, neural networks, transformers, speech recognition, social networks, safety and bias, and other large language model components, tools, and issues. This class is the broad undergrad intro to numerous grad classes like cs224N/S/U/C/V, cs246, cs276, cs336, and cs329R/X.

Prerequisites

CS106B, Python (at the level of CS106A), CS109 (or equivalent background in probability), and programming maturity and knowledge of UNIX equivalent to CS107 (or taking CS107 or CS1U concurrently).

Required Work

From Languages to Information is a flipped class with much of the material online. Before class I recorded all the lectures (except 5 live lectures), and you can watch them at home. The weekly quizzes and programming homeworks will be automatically uploaded and graded. Lectures are available in the Modules section on Canvas. Quizzes and homeworks are on Gradescope and github, but you can find them all on this webpage!!
Prerecorded Video Lectures

Most weeks, we will ask you to watch a set of video lectures (2 to 2.5 hours total). Most videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos but the embedded quizzes are not counted toward the final grade.

In-class Lectures

5 lectures will be live and are required (except that the Week 7 Speech lecture is only Strongly Recommended). The material from all 5 will be on the quizzes.

Labs

There are 5 in-class labs in which we do group problem-solving activities. The labs are required and will be tested on the quizzes, meaning that if you can't make a particular in-person lab, you must still do the exercises at home instead. But Lab 1 and Lab 4 are required to be attended in-class; the other 3 you can do at home. In-person attendance for Labs 2, 3, and 5 is worth extra credit.

Automated Review Quizzes

After watching a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5-6 questions) on the content that you just learned. These quizzes are not timed, they are open book, and they may be attempted an infinite number of times. The questions, as well as the options for each question, may change and be randomly selected from a larger pool each time you take a quiz. You will not see your quiz grade/correct answers until after the due date, but the system will take the score from the last submission of all your infinitely-allowed submissions for the quiz. So if you worry you might have got something wrong, just submit another one! Review Quizzes for each week are due 11:59pm Tuesday of the following week. There are no late days for review quizzes. Because of the strict no-late-day policy, we will drop your lowest scoring quiz (i.e. we will only count your best 8 of the 9 quizzes in your final grade).

Can I work with my friends on the quiz? Yes, you can work with your pair programming partner. But you must each do the problem yourselves, and only then discuss with your partner, and you each submit separately (and you will have to show your own work in the "show your work" section when you upload the quiz answers).
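
As a rough illustration of the quiz arithmetic above (last submission counts, lowest of the 9 quizzes dropped), here is a minimal Python sketch. The function names are hypothetical, scores are assumed to be fractions between 0 and 1, and treating a quiz with no submission as 0 is an assumption rather than stated policy. This is not the actual grading script.

```python
def recorded_score(submissions):
    """Return the score that counts for one quiz.

    `submissions` is a list of scores in submission order. Per the policy
    above, only the LAST submission counts (not the best one).
    Assumption: a quiz with no submissions is recorded as 0.
    """
    if not submissions:
        return 0.0
    return submissions[-1]


def quiz_fraction(recorded_scores):
    """Average of the best 8 of the 9 recorded quiz scores."""
    if len(recorded_scores) != 9:
        raise ValueError("expected exactly 9 quiz scores")
    kept = sorted(recorded_scores)[1:]  # drop the single lowest score
    return sum(kept) / len(kept)
```

For example, a student who resubmits a quiz and does worse keeps the later, lower score, while a single missed quiz is absorbed by the dropped-lowest rule.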

Class Participation

You have to watch all lectures, and attendance for the 5 live lectures is required (except for the Speech lecture, which is only Strongly Recommended). The labs are required and we will test material from them on the quizzes, and labs 1 and 4 must be attended in person. However, attendance for labs 2, 3, and 5 is only strongly recommended; you may do them yourself at home if you really cannot come to class. You can get extra credit for class participation and other things by: coming to labs 2/3/5 in person; giving particularly helpful answers on the class Ed forum; helping out other students in office hours or labs; being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering, since those don't appear in the full book); and speaking up in the labs. Plus there will be extra credit problems on some of the labs and possibly PAs.

Programming Assignments

7 Python programming assignments. All are due Fridays at 5pm except the last one, PA7, which is due on a Wednesday at 5pm.

Programming Assignment Collaboration for PA 1-6: You may talk to anybody you want about the assignments and bounce ideas off each other. And if you want, you can also choose a partner and do pair programming for PA 1-6. Pair programming has many advantages for learning!!! You and your pair-partner can discuss code, but it's important that each of you work on each part of the assignment so that you're comfortable with the whole assignment, since assignments build on each other (and we will test concepts from the assignments on the quizzes). If you choose to pair-program, you should specify in the submission who your partner is. We will use the normal automatic checks for overlap between your code and the code of students who are not your pair partner. You must describe in your writeup exactly who did what in your code.

Programming Assignment Collaboration for PA 7: PA7 is a group homework that must be done in groups. You will work together with your group, and write code together. Groups must be of size 3 or 4. To work in a group of size 2, you must get special permission from the staff. You cannot work by yourself on PA 7, because part of the goal of this homework is to learn to work on group projects. You must describe in your writeup in detail exactly who in your group did what, and who worked on which parts of the assignment/code.

Late homeworks

You have a total of 4 free late (calendar) days to use on programming assignments 0-6. If you are pair programming, late days are still individual (i.e., if one of you has used up late days, and one has not, and you submit a homework one day late, only the student without remaining late days will be penalized). You cannot use late days on PA 7. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. Each 24 hours or part thereof that a homework is late uses up one full late day. However, no assignment will be accepted more than four days after its due date.
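
To make the late-day arithmetic concrete, here is a small Python sketch for one student on one of PAs 0-6. It is not the actual grading script. The function names are hypothetical, and two details are assumptions the policy above does not spell out: remaining free late days are applied first, and the 20% penalty is a flat (non-compounding) deduction per uncovered late day.

```python
import math

PENALTY_PER_DAY = 0.20  # applied once free late days are exhausted
MAX_DAYS_LATE = 4       # nothing is accepted later than this


def late_days_used(hours_late):
    """Each 24 hours or part thereof counts as one full late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(score, hours_late, free_days_left):
    """Return (adjusted_score, free_days_left_after) for one student.

    Assumptions (not stated in the policy): free late days are spent
    before any penalty applies, and the penalty is a flat 20% of the
    earned score per uncovered late day.
    """
    days = late_days_used(hours_late)
    if days > MAX_DAYS_LATE:
        raise ValueError("not accepted more than four days after the due date")
    covered = min(days, free_days_left)
    penalized_days = days - covered
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return score * multiplier, free_days_left - covered
```

Under these assumptions, a PA submitted 30 hours late counts as two late days; a student with one free late day remaining would use it and lose 20% for the second day.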

Readings

This class has a significant amount of textbook reading. Most weeks have around 25 textbook pages. The homeworks and quizzes are based heavily on the readings.

Final grade computation
  • 73% homeworks (PAs 1-6 are each worth the same, 9%; ignore the different point values for each homework. PA7 is worth 18%, double the others, and PA0 is worth 1%.)
  • 27% weekly review quizzes (each identically worth 27%/8, because the lowest quiz is dropped)
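
The weights above can be checked with a short Python sketch. The names are hypothetical, every score is assumed to be a fraction between 0 and 1, PA6 is treated as a single score (how 6a and 6b combine is not stated here), extra credit is not modeled, and a missing PA is assumed to count as 0.

```python
# Weights from the list above: PA0 1%, PAs 1-6 9% each, PA7 18%, quizzes 27%.
PA_WEIGHTS = {"PA0": 0.01, "PA7": 0.18}
PA_WEIGHTS.update({f"PA{i}": 0.09 for i in range(1, 7)})
QUIZ_WEIGHT = 0.27


def final_percentage(pa_fractions, quiz_fraction):
    """Weighted final percentage, before any extra credit.

    pa_fractions: dict mapping "PA0".."PA7" to a score in [0, 1]
                  (assumption: a missing PA counts as 0).
    quiz_fraction: average of the 8 counted quizzes, in [0, 1].
    """
    homework = sum(
        weight * pa_fractions.get(name, 0.0)
        for name, weight in PA_WEIGHTS.items()
    )
    return 100 * (homework + QUIZ_WEIGHT * quiz_fraction)
```

With perfect scores everywhere this gives 100; skipping PA7 entirely while acing everything else gives 82.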
Final letter grades

(The numerator will include your extra credit; the denominator does not include possible extra credit, otherwise it wouldn't be extra credit.)
  • A+: It is very easy to get an A in this class but hard to get an A+. For an A+ you must do all of the following:
    • Have perfect scores on all the PAs and quizzes
    • Have perfect attendance at 10 Tuesday classes (that means all lectures and labs, i.e., even the non-required labs)
    • Have given at least 5 substantive and helpful answers to students on the class Ed forum
    • Have turned in and gotten credit on extra credit problems on at least 3 of the labs, quizzes, or PAs
  • A: 93% and above of the total points
  • A-: 90% and above of the total points
  • B+: 87% and above of the total points
  • B: 83% and above of the total points
  • B-: 80% and above of the total points
  • C+: 77% and above of the total points
  • C: 73% and above of the total points
  • C- (= Credit): 70% and above of the total points
  • etc.
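
The percentage cutoffs above can be read as a simple lookup, sketched here in Python. The names are hypothetical. A+ is left out because it depends on the extra conditions listed above rather than on a percentage, and grades below C- are not enumerated in this document, so the sketch returns None for them.

```python
# (minimum percentage, letter) pairs, highest cutoff first.
CUTOFFS = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
]


def letter_grade(percentage):
    """Map a final percentage to a letter using the cutoffs listed above.

    Returns None below 70%, since lower cutoffs are only given as "etc."
    """
    for cutoff, letter in CUTOFFS:
        if percentage >= cutoff:
            return letter
    return None
```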