Dec 5, 2014:
Congratulations to everyone who participated in the Kaggle challenge! Here is the solution file in case you feel like trying out more predictions.
Oct 13, 2014:
One of your classmates has generously written a tutorial on the R data structure data.table. This is an alternative to data.frames which makes it much easier and faster to manipulate data (reshaping, grouping, merging, defining new variables, filling in missing data, etc.). Some of you might find it very useful. Check it out here.
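A minimal sketch of the kinds of operations described above, using hypothetical column and group names purely for illustration (this is not from the classmate's tutorial):

```r
library(data.table)  # install.packages("data.table") if needed

# A small example table (made-up data)
dt <- data.table(student = c("a", "a", "b", "b"),
                 hw      = c(1, 2, 1, 2),
                 score   = c(90, NA, 85, 95))

# Grouping: mean score per student, ignoring missing values
dt[, .(mean_score = mean(score, na.rm = TRUE)), by = student]

# Defining a new variable by reference (no copy of the table is made)
dt[, passed := !is.na(score) & score >= 70]

# Filling in missing data with a constant
dt[is.na(score), score := 0]
```

The `:=` operator modifies the table in place, which is one reason data.table is faster than the copy-on-modify semantics of data.frames.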
Oct 8, 2014:
All regrade requests should be submitted to email@example.com. Please read the solutions to the homework before sending a request, and specify (1) the part(s) of the homework you believe were wrongly graded and (2) why you deserve full or partial credit.
Sep 22, 2014:
If you have questions about homework or any of the lectures, please use our Piazza forum. You may join it using the link: www.piazza.com/stanford/fall2014/stats202.
Any other questions can be emailed to the staff mailing list: firstname.lastname@example.org. Please do not use personal email addresses unless strictly necessary.
Stats 202 meets MWF 9:00-9:50 am at Gates B01.
All lectures will be recorded on video by the Stanford Center for Professional Development and posted here.
Stats 202 is an introduction to Data Mining. By the end of the quarter, students will:
Consult this table for up-to-date office hour information.
| Role | Name | Office hours | Location |
|---|---|---|---|
| Instructor | Sergio Bacallado | Wednesday 2:00-4:00 pm | Sequoia 207 |
| TA | Julia Fukuyama | Mon 5:30-6:30 pm, Tue 6:00-7:00 pm (Thu Dec 4, 5:30-6:30 pm) | Skype, id julia.fukuyama |
| TA | Jiyao Kou | Mon 12:00-2:00 pm (Fri Dec 5, 2:00-3:00 pm, at Sequoia 105) | Sequoia 207 |
| TA | Jian Li | Wed 10:30 am-12:29 pm (Fri Dec 5, 11:00 am-12:00 pm) | Hewlett 101 |
| TA | Linxi Liu | Tue 1:00-3:00 pm (Thu Dec 4, 11:00 am-12:00 pm) | Sequoia 227 |
| TA | Kris Sankaran | Tuesday 3:00-5:00 pm (plus Skype session Fri Dec 5, 6:00-7:00 pm) | Sequoia 105 |
The only textbook required is An Introduction to Statistical Learning with applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Springer, 1st ed., 2013).
The book is available at the Stanford Bookstore and free online through the Stanford Libraries. A hard copy of the book is in the reserves of the Mathematics and Statistics Library.
If, due to extenuating circumstances, you cannot take the midterm on October 27, you must email us by October 15. Since the midterm is during class, we cannot guarantee an opportunity to make it up.
The final exam is mandatory. If you cannot take it at the time indicated above, please drop the class.
SCPD students must complete each exam in the specified amount of time and return it to SCPD within 24 hours of the time of the exam at Stanford.
There will be 7 graded homework assignments, due at the start of class on the day indicated.
This quarter, we will be trying an online submission and scoring system called Scoryst, which was developed by Stanford students. Homeworks will be submitted as PDF files on this website. Enroll in our site using the link in the header of each homework.
Late homework will not be accepted, but the lowest homework score will be ignored.
An important part of the class will be a quarter-long prediction challenge hosted by Kaggle. This competition will allow you to apply the concepts learned in class and develop the computational skills to analyze data in a collaborative setting.
To learn more about the competition see the link on the left.
The 3 teams that obtain the highest scores in the Kaggle competition will be given the option of not taking the final exam (!). Their class grade would then be based on midterm and homework scores alone.
| Date | Topic | Reading | Assignments |
|---|---|---|---|
| Mon 9/22 | Class logistics, HW 0 | | HW 0 out |
| Wed 9/24 | Supervised and unsupervised learning | 2 | HW 1 out |
| Fri 9/26 | Principal components analysis | 10.1, 10.2, 10.4 | HW 0 due |
| Mon 9/29 | Clustering | 10.3, 10.5 | |
| Wed 10/01 | Linear regression | 3.1-3.3 | HW 1 due, HW 2 out |
| Fri 10/03 | Linear regression | 3.3-3.6 | |
| Mon 10/06 | Classification, logistic regression | 4.1-4.3 | |
| Wed 10/08 | Linear discriminant analysis | 4.4-4.5 | HW 2 due, HW 3 out |
| Fri 10/10 | Classification lab | 4.6 | |
| Mon 10/13 | Cross-validation | 5.1 | |
| Wed 10/15 | The bootstrap | 5.2-5.3 | HW 3 due, HW 4 out |
| Fri 10/17 | Regularization | 6.1, 6.5 | |
| Wed 10/22 | Shrinkage lab | 6.6 | HW 4 due |
| Fri 10/24 | Dimension reduction | 6.3, 6.7 | |
| Mon 10/27 | Midterm exam | | |
| Wed 10/29 | Splines | 7.1-7.4 | HW 5 out |
| Fri 10/31 | Smoothing splines, GAMs, local regression | 7.5-7.7 | |
| Mon 11/03 | Non-linear regression lab | 7.8 | |
| Wed 11/05 | Decision trees | 8.1, 8.3.1-2 | HW 5 due, HW 6 out |
| Fri 11/07 | Bagging, random forests, boosting | 8.2, 8.3.3-4 | |
| Mon 11/10 | Support vector machines | 9.1-9.2 | |
| Wed 11/12 | Support vector machines | 9.3-9.5 | HW 6 due, HW 7 out |
| Fri 11/14 | Support vector machines lab | 9.6 | |
| Mon 11/17 | Prediction with time series | | |
| Wed 11/19 | Prediction with relational data | | HW 7 due |
| Fri 11/21 | Data scraping, data wrangling | | |
| Mon 12/01 | Web visualizations | | |
| Wed 12/03 | Final review | All chapters | Kaggle deadline |
| Fri 12/05 | Final review | All chapters | |
| Mon 12/08 | Final exam | | |