A note from Prof. Jennifer Widom, June 2020:
This was the last offering of CS 102. Congratulations to the students who were able to persevere through a pandemic and horrific racism to complete the course and gain some mastery of working with data, and a big thanks to the teaching assistants for their tremendous efforts. I'm hopeful that within a few years Stanford will offer a cohesive curriculum in data science. In the meantime, all of the material from CS 102, including Jupyter notebooks and data sets, is being kept current on the website of Prof. Widom's Instructional Odyssey.
Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing data sets. This course provides a broad and practical introduction to working with data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; introduction to network analysis and unstructured data. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.
Tuesdays & Thursdays 1:30-2:50 PM
Delivered via Zoom. Link found on Canvas.
We prefer students participate live, but lectures will also be recorded.
The five TAs hold office hours throughout the week, and Professor Widom's office hours are usually Wednesdays 4:00-5:00 PM. All office hours are via Zoom, with each week's times and links posted on the course calendar. For TA office hours logistics, please refer to this Piazza post.
Some Wednesdays, 11:30 AM-12:30 PM
Bootcamp sessions are recorded and made available afterwards on Canvas. They provide extra setup help for the tools we are using, and additional programming examples for those who may have a weaker background in programming or seek additional practice.
There are 5 assignments, 2 projects, and 2 exams. The final grade weights the composite scores for assignments, projects, and exams equally, i.e., 33.3% each for the 5 homework assignments (weighted equally), the 2 projects (weighted equally), and the 2 exams (weighted equally). In spring quarter 2020, all courses are graded on an S/NC basis. We will compute a letter grade for each student: all students who receive a C- or better will be assigned a grade of S, while D+ and below will be assigned a grade of NC.
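As a rough illustration of the weighting described above, the composite score can be sketched in Python as the average of three category averages. This is only a sketch under stated assumptions: the function name `composite_score`, the 0-100 score scale, and the example scores are hypothetical, and the cutoffs that map a composite score to a letter grade are not specified here.

```python
from statistics import mean


def composite_score(assignments, projects, exams):
    """Average each category, then weight the three category averages equally.

    Assumes every score is already normalized to the same scale (e.g. 0-100).
    """
    if len(assignments) != 5 or len(projects) != 2 or len(exams) != 2:
        raise ValueError("expected 5 assignments, 2 projects, and 2 exams")
    category_averages = [mean(assignments), mean(projects), mean(exams)]
    # Equal weighting: each category contributes one third of the total.
    return mean(category_averages)


# Hypothetical scores, for illustration only.
example = composite_score(
    assignments=[90, 80, 100, 70, 85],
    projects=[88, 92],
    exams=[75, 85],
)
print(example)
```

Because each category is averaged first, a single project or exam carries more weight (one sixth of the total) than a single assignment (one fifteenth).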
Exams are held during the class period; see syllabus below for dates. Please make sure you will be available for both of the exam dates. Alternate times (but not dates) may be possible by petition for extenuating circumstances.
Please use Piazza for all questions related to the course. We use Piazza as our primary portal for course-related announcements, so make sure to sign up! For all Piazza posts, we guarantee that we will respond within 24 hours. Also check out the list of frequently asked questions.
Date | Topic and Assignments | Readings/References | Notes |
---|---|---|---|
Tue April 7 | Introduction & Course Logistics; Working with Data - Overview | Introductory Readings | Course Information; Working With Data - Overview Slides |
Wed April 8 | Bootcamp: Google Sheets Setup | Google Sheets Setup Instructions | |
Thu April 9 | Working with Data - Overview (cont'd); Data Analysis & Visualization Using Spreadsheets | Google Spreadsheets References | Data Analysis Using Spreadsheets Slides; Spreadsheet Analysis Notes (Part 1) |
Mon April 13 | Assignment #1: Spreadsheets; Project #1: Personal Data Analysis | | |
Tue April 14 | Data Analysis & Visualization Using Spreadsheets (cont'd) | Spreadsheet Analysis Notes (Part 2) | |
Wed April 15 | Bootcamp: Tableau Setup | Tableau Setup Instructions | |
Thu April 16 | Data Analysis & Visualization Using Spreadsheets (cont'd); Advanced Data Visualization Using Tableau | Common Visualization Mistakes; Tableau References | Data Visualization Using Spreadsheets Slides; Data Visualization Using Spreadsheets Notes; Advanced Data Visualization Using Tableau Slides; Advanced Data Visualization Using Tableau Notes |
Mon April 20 | Bootcamp: Instabase Setup | Instabase Setup Instructions | |
Mon April 20 | Assignment #1 due; Assignment #2: Tableau, SQL | | |
Tue April 21 | Relational Databases and Basic SQL | SQL References; Project Jupyter home page | Relational Databases and SQL Slides; Basic SQL Notes |
Thu April 23 | Advanced SQL | Advanced SQL Notes | |
Mon April 27 | Project #1 proposal due | | |
Tue April 28 | Introduction to Python; Python for Data Analysis & Visualization | Python References | Python for Data Analysis & Visualization Slides; Python Basics Notes; Python Data Notes |
Wed April 29 | Bootcamp: SQL | SQL Bootcamp Slides | |
Thu April 30 | Python for Data Analysis & Visualization (cont'd) | Pandas References | Python Pandas Notes |
Thu April 30 | Assignment #2 due; Assignment #3: Python | | |
Tue May 5 | Python for Data Analysis & Visualization (cont'd) | PyPlot Tutorial | Python Plotting Notes |
Wed May 6 | Bootcamp: Python | Bootcamp Notebooks | |
Thu May 7 | Machine Learning - Regression | ML References - Regression | Regression Slides; Regression Notes |
Sat May 9 | Assignment #3 due (no late submissions) | | |
Tue May 12 | Exam #1 | ||
Thu May 14 | Machine Learning - Classification and Clustering | ML References - Classification and Clustering | Classification Slides; Clustering Slides; Classification & Clustering Notes |
Mon May 18 | Project #1 due; Assignment #4: Machine Learning, R; Project #2: Movie-Rating Predictions | The Netflix Prize | |
Tue May 19 | Using Python for Machine Learning | ML References - Python | Python Machine Learning Notes |
Thu May 21 | The R Language - Data Analysis, Visualization, and Machine Learning | R Tutorial; Choosing R or Python for data analysis? An infographic | R Slides; R Notes |
Fri May 22 | Bootcamp: R | ||
Mon May 25 | Assignment #4 due | ||
Tue May 26 | Data Mining Algorithms | Data Mining References | Data Mining Slides; Data Mining Notes |
Thu May 28 | Assignment #5: Data Mining, Network Analysis | ||
Thu May 28 | Data Mining Using Python | Mining Python Notes | |
Fri May 29 | Bootcamp: Data Mining using SQL | ||
Mon June 1 | Project #2 due | ||
Tue June 2 | Network Analysis | Network References | Network Slides; Networks Notes |
Thu June 4 | Project #2 results and discussion; Unstructured Data | Unstructured Data Slides | |
Sat June 6 | Assignment #5 due | ||
Tue June 9 | Exam #2 | | |