Oct 08, 2017: Online office hour. The weekly online office hour (Mondays 11 am-1 pm) will be held via Zoom. Please install the software beforehand; see Stanford Zoom.
Oct 06, 2017: Finding a group. Looking ahead to the Kaggle project: two students in the class have built a website that lets you find other students for study groups based on interests, dorm, major, etc. The link is www.schedfriend.com. Alternatively, you can use the Search for Teammates! post pinned at the top of the Piazza forum.
Sep 22, 2017: Labs. Occasionally we will post links to "labs" that supplement the day's lecture. These labs feature code and output produced by the course staff to illustrate a concept. For instance, Lab 2 (under the Lectures tab) shows how we generated the bias-variance decomposition example in Lecture 2. Feel free to read through the labs to improve your understanding, and try your hand at recreating or modifying our examples.
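The labs themselves are written in R, but the idea behind the bias-variance demonstration can be sketched in a few lines. The following is a hypothetical Python illustration, not the lab's actual code: the true function, noise level, and polynomial degrees below are assumptions chosen purely to show how repeated simulation splits the expected test error at one point into squared bias, variance, and irreducible noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup for illustration (not the lab's actual settings):
f = lambda x: np.sin(x)   # true regression function
sigma = 0.5               # noise standard deviation
x0 = 1.0                  # test point at which we decompose the error
n_train = 30
n_sims = 2000

def fit_predict(degree):
    """Fit a polynomial of the given degree to one freshly
    simulated training set and predict at x0."""
    x = rng.uniform(0, 2 * np.pi, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    coefs = np.polyfit(x, y, degree)
    return np.polyval(coefs, x0)

for degree in (1, 3, 9):
    preds = np.array([fit_predict(degree) for _ in range(n_sims)])
    bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
    var = preds.var()                      # variance of the fit at x0
    # Expected test MSE at x0 = bias^2 + variance + sigma^2
    print(f"degree {degree}: bias^2={bias2:.3f}, var={var:.3f}, "
          f"mse~{bias2 + var + sigma ** 2:.3f}")
```

Running a sketch like this typically shows the classic trade-off: low-degree fits have high bias and low variance, while high-degree fits have low bias and high variance.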
Stats 202 meets MWF 9:30-10:20 am in Skilling 80.
All lectures will be recorded on video by the Stanford Center for Professional Development and posted on their site.
Lecture slides will be posted on this site (see the Lectures link on the left).
Stats 202 is an introduction to Data Mining. By the end of the quarter, students will:
Introductory courses in statistics or probability (e.g., Stats 60), linear algebra (e.g., Math 51), and computer programming (e.g., CS 105).
The vast majority of questions about homework, the lectures, or the course should be asked on our Piazza forum, as others will benefit from the responses. You can join the Piazza forum using the link www.piazza.com/stanford/fall2017/stats202. We strongly encourage students to respond to one another's questions!
Questions from which others cannot benefit can be emailed to the staff mailing list email@example.com.
Personal staff email addresses should only be used for sensitive matters (e.g., concerns about specific course staff).
Consult this table for up-to-date office hour information. There is one weekly online office hour via Zoom; please install the software beforehand (see Stanford Zoom).
| Role | Name | Office hours | Location |
|---|---|---|---|
| Instructor | Guenther Walther | MTh 10:30-11:30 am, or by appointment | Sequoia 135 |
| TA | Michael Feldman | M 3-5 pm | 380-381T |
| TA | Jelena Markovic | Th 4-6 pm | 420-245 |
| TA | Paulo Orenstein | W 2-4 pm | Sequoia 235 |
| TA | Junyang Qian | F 12:30-2:30 pm | Sequoia 204 |
| TA | Feng Ruan | F 3-5 pm | Sequoia 225 |
| TA | Andy Tsao | M 11 am-1 pm | Zoom Meeting |
| TA | Elena Tuzhilina | T 3:30-5:30 pm | 420-147 |
| TA | Yiguang Zhang | TTh 9:30-10:30 am | Sequoia 105 |
The only textbook required is An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Springer, 1st ed., 2013).
The book is available at the Stanford Bookstore and free online through the Stanford Libraries.
We may occasionally assign (optional) supplementary readings from the optional text The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (Springer, 2nd ed.).
In our lecture notes, ISL abbreviates An Introduction to Statistical Learning and ESL abbreviates The Elements of Statistical Learning.
(If you are an online SCPD student, please see SCPD info for more information on remote exam instructions and timings.)
If you cannot take the exams on those dates, you will need to take this class in a different quarter. There will be no alternative exam dates except for official university business, such as certain athletic commitments. If you do better on the final than on the midterm, the final grade supersedes the midterm grade.
There will be 7 graded homework assignments, due on Wednesdays at the start of class. An ungraded assignment (Homework 0) will help you install and become familiar with the tools used in this course. The homework assignments and staff solutions will be posted on this website and will be accessible by enrolled students (see the Homework link on the left).
After attempting homework problems on an individual basis, you may discuss a homework assignment with up to two classmates. However, you must write up your own solutions individually and explicitly indicate with whom (if anyone) you discussed the homework problems at the top of your homework solutions. In your solutions, please show your work and include all relevant code written. Please also keep in mind the university honor code.
This quarter, we will be using the Gradescope online submission and scoring system for all homework submission. Gradescope will send a Stats 202 enrollment notification to your Stanford email address. If you have not received such a notification by Thursday Sep. 28, please contact the course staff via the staff mailing list.
Your problem sets should be submitted as PDF or image files through Gradescope. Here are some tips for scanning and submitting through Gradescope.
Any regrade requests should be submitted through Gradescope within one week of receiving your grade. Before sending a request, please read the relevant solutions and review the relevant course material, and specify (1) the part(s) of the homework you believe were wrongly graded and (2) why you deserve additional credit. We will typically regrade the entire homework for which a regrade is requested, and the resulting score may be higher or lower than the original.
Late homework will not be accepted, but the lowest homework score will be ignored.
An important part of the class will be an in-class prediction challenge hosted by Kaggle. This competition will allow you to apply the concepts learned in class and develop the computational skills to analyze data in a collaborative setting.
To learn more about the competition see the link on the left.
| Date | Topic | Reading (ISL) | Homework |
|---|---|---|---|
| Mon 9/25 | Class logistics, HW 0 | | HW 0 out |
| Wed 9/27 | Supervised and unsupervised learning | 2 | HW 1 out |
| Fri 9/29 | Principal components analysis | 10.1, 10.2, 10.4 | HW 0 due |
| Mon 10/02 | Clustering | 10.3, 10.5 | |
| Wed 10/04 | Linear regression | 3.1-3.3 | HW 1 due, HW 2 out |
| Fri 10/06 | Linear regression | 3.3-3.6 | |
| Mon 10/09 | Classification, logistic regression | 4.1-4.3 | |
| Wed 10/11 | Linear discriminant analysis | 4.4-4.5 | HW 2 due, HW 3 out |
| Fri 10/13 | Classification lab | 4.6 | |
| Mon 10/16 | Cross validation | 5.1 | |
| Wed 10/18 | The Bootstrap | 5.2-5.3 | HW 3 due, HW 4 out |
| Fri 10/20 | Regularization | 6.1, 6.5 | |
| Wed 10/25 | Shrinkage lab | 6.6 | HW 4 due |
| Fri 10/27 | Dimension reduction | 6.3, 6.7 | |
| Mon 10/30 | Midterm exam | | |
| Wed 11/01 | Splines | 7.1-7.4 | HW 5 out |
| Fri 11/03 | Smoothing splines, GAMs, local regression | 7.5-7.7 | |
| Mon 11/06 | Non-linear regression lab | 7.8 | |
| Wed 11/08 | Decision trees | 8.1, 8.3.1-2 | HW 5 due, HW 6 out |
| Fri 11/10 | Bagging, random forests, boosting | 8.2, 8.3.3-4 | |
| Mon 11/13 | Support vector machines | 9.1-9.2 | |
| Wed 11/15 | Support vector machines | 9.3-9.5 | HW 6 due, HW 7 out |
| Fri 11/17 | Support vector machines lab | 9.6 | |
| Mon 11/27 | Non-linear dimensionality reduction | | |
| Wed 11/29 | Wavelets | | HW 7 due |
| Fri 12/01 | Data scraping, data wrangling | | |
| Mon 12/04 | Web visualizations | | |
| Wed 12/06 | Final review | All chapters | Kaggle deadline |
| Fri 12/08 | Final review | All chapters | |
| Tue 12/12 | Final exam | | |