The class must be taken synchronously, so that you are able to come to all 10 in-person class sessions; we have found over the years that the group work and group discussions are necessary and result in everyone learning more. But the 10 class sessions are treated differently. Attendance at the Intro and Outro lectures is required, as is attendance at the first group work. Showing up for the other 4 group works and the 3 tutorial sessions is technically optional, meaning that I won't take roll. However, you still must do all 5 group works even if you don't come on those days, and I strongly recommend that you come in person. To emphasize this, we'll give extra credit for coming to the two required lectures and the 5 group works (but not for the tutorials; come to those if you need the info or want to help other students).
However, if you cannot take the class synchronously, either for a medical reason
or because you are required to take another class at the same time
and must take this class for a requirement or else you cannot graduate this quarter,
you can ask Dan for explicit permission to take the class asynchronously.
The in-person material is not recorded, so if you are taking the class asynchronously,
you must do the group work yourself at home, but you won't get the extra credit that the folks who come in person will get. (You have 24 hours to take the midterm, so that's not an issue.) Note: this is only for people with medical or must-graduate-this-quarter excuses.
Is 106B the only prereq? Do I need 109 or 221 before I take CS124?
106B is the only prereq. Taking the course as a sophomore is recommended, but we also
get lots of juniors and a reasonable number of frosh; the course is designed to be taken early in your Stanford career. It will help if you have done at least some programming beyond 106B, and it is also useful to have had 107 or Math 51, but neither is required; we'll try to give you pointers to places to make up missing background.
Can I take this course as a non-CS grad student?
Yes. Although this course is not appropriate for CS grad students (there are graduate versions of all the material in this course), it's very commonly taken by PhD students in the social sciences or humanities who plan to use text processing methods in their research.
More details below. You are responsible for reading this entire syllabus before the 2nd day of
class, January 12! (Note that the deadlines for PAs and quizzes are regular until the first midterm, after which I give you a bit of extra time.) Also note that there is no textbook; we will mostly be using free online PDF chapters.
Who do I email if I have questions?
Email the course management staff here (this includes Dan and the head TA and the course manager and coordinator), including about missing classes, or about personal issues like OAE:
cs124_requests@lists.stanford.edu
Most important: Have fun and learn lots!!!!
Week | Date | Homework | Quiz | In-class | Video Lectures and Readings (to be done by the Monday of the week unless I specify another date) |
---|---|---|---|---|---|
1 | Jan 10, 12 | PA 0: Setup and Tutorial Due Fri Jan 13, 5:00pm (Ungraded/optional for those who haven't done Jupyter before; we'll go over this in Thursday Jan 12's in-person tutorial) | - | | |
2 | Jan 17 and 19 | PA 1: Regular Expressions Due Fri Jan 20, 5:00pm | Quiz 1: Text Processing/Edit Distance [gradescope] Due Tue Jan 17, 11:59pm | [group work 1 pptx] [group work 1 pdf] [group work 1 reference] [solutions] [secret_ec.txt] | Edit Distance Canvas Videos (watch videos before Mon Jan 16) [canvas slides pptx] [canvas slides pdf] |
3 | Jan 24 and 26 | PA 2: Naive Bayes and Sentiment Analysis! Due Fri Jan 27, 5:00pm | Quiz 2: Language Modeling/Naive Bayes [gradescope] Due Tuesday Jan 24, 11:59pm | (watch NB videos beforehand) (don't look at the solution until you've completed all the questions!) [group work 2] [solutions] | Language Modeling Canvas Videos (watch before Monday Jan 23) [canvas slides pptx] [canvas slides pdf]; Naive Bayes and Text Classification Canvas Videos (watch before Monday Jan 23) [canvas slides pptx] [canvas slides pdf] |
4 | Jan 31 and Feb 2 | PA 3: Logistic Regression! Due Fri Feb 3, 5:00pm | Quiz 3: Logistic Regression [gradescope] Due Tuesday Jan 31, 11:59pm | | |
5 | Feb 7 and 9 | PA 4: Information Retrieval Due Fri Feb 10, 5:00pm | Quiz 4: Information Retrieval Due Tuesday Feb 7, 11:59pm | | Chris Manning Canvas Video: Information Retrieval (I) (watch/read before Monday Feb 6) [slides pptx] [slides pdf] |
6 | Feb 14 and 16 | PA 5: Embeddings and Vector Semantics Due Tue Feb 21, 5:00pm | Quiz 5: Vector Semantics and Sequence Labelling Due Tue Feb 14, 11:59pm | Tuesday: Review for First Midterm (online); Thursday: First Midterm (online) | |
7 | Feb 21 and 23 | PA 6: Neural Networks Due next week! Tues Feb 28, 5:00pm | Quiz 6: Neural Networks Due Fri Feb 24, 5:00pm. Not at midnight, and not this Tuesday, you get 2.5 extra days!!! | Tuesday: No class: extra in-person office hours during class time in Hewlett 200; Thursday: Group Work 4: Large Language Models | |
8 | Feb 28 and Mar 2 | PA 7: Chatbot Due Fri Mar 10, 5:00pm | Quiz 7: Chatbots Due Fri Mar 3, 5:00pm, not at midnight, and not on Tuesday | Tuesday: No class: extra in-person office hours during class time in classroom; Thursday: In-person walkthrough of PA7, plus a tutorial on Git and Team Coding | Recommender systems and Collaborative Filtering Canvas videos (watch by Monday Mar 6); Additional (optional) reading for those looking for more on this topic!: |
9 | Mar 7 and 9 | Reminder: PA 7: Chatbot due Fri Mar 10, 5:00pm | Quiz 8: Recommendation Systems Due Tues Mar 7, 11:59pm | Tuesday: Group Work 5: Smartphone Chatbots; Thursday: Live Lecture: NLP for Social Good | Web graphs, Links, and PageRank (watch by Mon Mar 13); Social Networks Canvas Videos (watch by Mon Mar 13) |
10 | Mar 14 and 16 | | Quiz 9: PageRank and Networks Due Tues Mar 14, 11:59pm | Tuesday: Review for Second Midterm (online); Thursday: Second Midterm (online) | |
Tuesday and Thursday 3:00-4:20
We require that you come to the 2 live lectures and strongly, strongly recommend that you come to the 5 in-person group works; you will learn more from doing them with other people (I won't require attendance at the group works but I will give extra credit for attending). For any in-person group work you miss, you must still do it at home yourself. The course can be taken asynchronously only if you have permission from Dan due to a required conflict or medical issue. Also: different people learn better from different combinations of videos/lectures, reading the chapters, coming to the live group exercises in Hewlett 200, and coming to office hours. But I will say that students who do all five tend to do the best on the exams and in the course in general.
Alas, we can't reply to email sent to individual staff members. If you have a question that is not confidential or personal, post it on the Ed Discussion forum! Responses are quicker and you'll also be helping others with the same question! To contact the teaching staff directly, come see us in office hours! If that is not possible, you can also email non-technical questions to the course staff list, cs124_requests@lists.stanford.edu. If you have a matter to be discussed privately, come to office hours or use cs124_requests@lists.stanford.edu to make an appointment. For grading questions, please talk to us after class or during office hours.
Class announcements will be on Ed Discussion (although we will occasionally try Canvas and mailing lists). We will assume that everyone reads all announcements.
Since we occasionally reuse homeworks from previous years, we expect students not to copy, refer to, or look at the solutions in preparing their answers. It is an honor code violation to intentionally refer to a previous year's solutions. This applies both to the official solutions and to solutions that you or someone else may have written up in a previous year. It is also an honor code violation to find some way to look at the test set, to interfere in any way with programming assignment scoring, or to tamper with the submit script. It's also an honor code violation to use ChatGPT or any automatic coding system to write your code for you.
Since quizzes are a form of assessment, students are not allowed to collaborate on completing quizzes. It is an honor code violation to discuss quiz questions with other students.
Extracting meaning, information, and structure from human language text, speech, web pages, social networks. Introducing methods (string algorithms, edit distance, language modeling, machine learning, logistic regression, neural networks, neural embeddings, inverted indices, collaborative filtering, PageRank), applications (chatbots, sentiment analysis, information retrieval, text classification, social networks, recommender systems), and ethical issues.
CS106B. CS 107 can be helpful, but it is fine if you haven't had it; we'll cover the required UNIX material. Math 51 can also be helpful, but isn't required, since we will introduce the basic vector knowledge we need in the class.
Most weeks, we will ask you to watch a set of video lectures (2 to 2.5 hours total). Most videos will have some in-video questions embedded in them, which you should answer. You are required to watch the videos but the embedded quizzes are not counted toward the final grade.
2 lectures will be live, and are required!
5 in-class sessions are for group problem-solving activities. The group works are required and will be tested on the quiz, meaning that if you can't make a particular in-person group work, you must still do the exercises at home instead. Attendance at group work 1 in class is required. Previous students who did well in the class have reported that doing the group exercises in class has been extremely useful.
After watching a week's video lectures, we will ask you to answer an open-notes, open-book review quiz (about 5 questions) on the content that you just learned. These quizzes are not timed, and they may be attempted an unlimited number of times. The questions, as well as the options for each question, are randomly selected from a larger pool each time you take a quiz. You will not see your quiz grade or the correct answers until after the due date, and the system will take the score from the last of your submissions for the quiz. So if you worry you might have got something wrong, just submit another one! Review quizzes for each week are due at 11:59pm on Tuesday of the following week (except that Quiz 6 and Quiz 7 are due on the Friday instead of the Tuesday). There are no late days for review quizzes. We will drop your lowest-scoring quiz (i.e. we will only count your best 8 of the 9 quizzes in your final grade).
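To make the quiz scoring rules concrete, here is a minimal Python sketch of how "last submission counts" and "drop the lowest quiz" could combine. It is only an illustration, not the staff's actual grading script: the function name `quiz_component`, the equal weighting of quizzes, and treating a never-submitted quiz as 0 are all assumptions that the syllabus does not spell out.

```python
def quiz_component(submissions_by_quiz):
    """Average the recorded quiz scores after dropping the single lowest one.

    submissions_by_quiz maps a quiz name to that quiz's submission scores,
    in the order they were submitted.
    """
    # The last submission is the one that counts, not the best one.
    # Assumption: a quiz with no submissions is recorded as 0.
    recorded = [scores[-1] if scores else 0.0
                for scores in submissions_by_quiz.values()]
    if len(recorded) < 2:
        raise ValueError("need at least two quizzes to drop one")
    # Drop the lowest-scoring quiz (best 8 of 9 in the real course).
    kept = sorted(recorded)[1:]
    # Assumption: the remaining quizzes are weighted equally.
    return sum(kept) / len(kept)


# Hypothetical scores out of 5. On quiz2 the resubmission lowered the
# recorded score from 5 to 4, because only the last attempt counts.
example = {"quiz1": [3, 5], "quiz2": [5, 4], "quiz3": [2]}
print(quiz_component(example))  # 4.5
```

Note what happens to `quiz2` in the example: since the questions are redrawn from a pool on every attempt, a resubmission can lower your recorded score as well as raise it.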
You have to watch all lectures, and attendance for the 2 live lectures is required. The group works are required and we will test material from them on the 2 midterms. However, attendance for group work sessions is only strongly recommended; you may do them yourself at home if you really cannot come to class. You can get extra credit for class participation and other things by: coming to the 2 live lectures and the 5 group works, giving helpful answers on the class forum, helping out other students in office hours or group work sessions, being the first person to find typos in the textbook (not counting bugs in figure or chapter numbering), and speaking up in the group work sessions. Plus there will be extra credit problems on the two quizzes and also on PA7.
7 Python programming assignments. PAs 1-4 are due at 5:00pm on Fridays; PA5 and PA6 are due on Tuesdays, still at 5:00pm. PA7 is back to Friday.
Programming Assignment Collaboration for PA 1-6: You may talk to anybody you want about the assignments and bounce ideas off each other. And if you want, you can also choose a partner and do pair programming for PA 1-6. You and your pair-partner can discuss code, but it's important that each of you work on each part of the assignment so that you're comfortable with the whole assignment, since assignments build on each other (and we will test concepts from the assignments on the midterms). If you choose to pair-program, each of you must still submit your own program, and should specify in the submission who your partner is. We will use the normal automatic checks for overlap between your code and the code of other students who are not your pair partner.
Programming Assignment Collaboration for PA 7: PA7 is a group homework. You will work together with your group, and write code together. Groups must be of size 3 or 4. To work in a group of size 2, you must get special permission from the staff. You cannot work by yourself on PA 7, because part of the goal of this homework is to learn to work on group projects. You must describe in your writeup who worked on which parts of the assignment/code.
You have 4 free late (calendar) days to use on programming assignments 1-6. Each 24 hours or part thereof that a homework is late uses up one full late day. If you are pair programming, late days are still individual (i.e., if one of you has used up all your late days and the other has not, and you submit a homework one day late, only the student without remaining late days will be penalized). You cannot use late days on PA 7. Once late days are exhausted, any PA turned in late will be penalized 20% per late day. However, no assignment will be accepted more than four days after its due date.
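The late-day arithmetic above can be sketched in a few lines of Python. This is an illustrative reading of the policy, not the official grading code: the function name `late_outcome` and its return format are hypothetical, and two points the policy leaves open are filled with labeled assumptions (free late days are spent before any penalty applies, and "20% per late day" is a deduction of 20% of the full score for each penalized day).

```python
import math

PENALTY_PER_DAY = 0.20   # 20% per penalized late day
MAX_DAYS_LATE = 4        # nothing is accepted more than four days late


def late_outcome(hours_late, free_days_left):
    """Return (accepted, free_days_used, score_multiplier) for one submission."""
    if hours_late <= 0:
        return True, 0, 1.0
    # Each 24 hours or part thereof counts as one full late day.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_DAYS_LATE:
        return False, 0, 0.0
    # Assumption: free late days are spent first; only the rest is penalized.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    # Assumption: each penalized day removes 20% of the full score.
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return True, free_used, multiplier


# 25 hours late rounds up to 2 late days; with 4 free days left, no penalty.
print(late_outcome(25, 4))  # (True, 2, 1.0)
# 97 hours late rounds up to 5 days, which is past the four-day cutoff.
print(late_outcome(97, 4))  # (False, 0, 0.0)
```

Because late days are individual, pair partners would each call this with their own `free_days_left`, so the same late submission can be free for one partner and penalized for the other.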
This class has a significant amount of textbook reading. Most weeks have around 25 textbook pages. The homeworks and exams will be based heavily on the readings.