Course Description

This course is aimed at non-CS undergraduate and graduate students who want to learn the basics of big data tools and techniques and apply that knowledge in their areas of study. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing massive data sets. The course provides a broad and practical introduction to big data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; and an introduction to network analysis and unstructured data. Coverage of tools and techniques is hands-on but cursory, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.

Lectures

Time: Tuesdays & Thursdays 1:30-2:50 PM
Location: Building 320 room 105 (Geology corner)

Office Hours

The CAs hold 20 hours of office hours a week, Monday-Friday, in reserved areas in the Engineering Quad. Times and places are given in the course calendar.

Professor Widom holds office hours on Wednesdays 4:00-5:00 PM in the Dean's Office #227 on the 2nd floor of the Huang building. Updates to her office hours will be posted on the course calendar.

Evaluation

There will be 5 assignments, 2 projects, a midterm exam, and a final exam. Grades for the course will be weighted equally across the composite scores for homework assignments, projects, and exams. That is, the 5 homework assignments together carry the same weight as the 2 projects, and the same weight as the 2 exams. See the schedule below for dates and times. There will be no alternate exams, so please make sure you will be available for the midterm on Feb 14 and the final exam on March 18.
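
As a rough illustration of the equal weighting described above, the sketch below averages each category into a composite score and then averages the three composites. It assumes every item is scored on a 0-100 scale and that items within a category count equally; neither assumption is stated in this syllabus. The function name `course_grade` and the sample scores are made up for illustration.

```python
def course_grade(assignments, projects, exams):
    """Average three equally weighted composite scores.

    Assumption (not from the syllabus): each argument is a non-empty
    list of scores on a 0-100 scale, and each composite is the plain
    mean of the items in that category.
    """
    def mean(scores):
        return sum(scores) / len(scores)

    composites = [mean(assignments), mean(projects), mean(exams)]
    # Each composite contributes one third of the final score.
    return sum(composites) / len(composites)


# Hypothetical scores, for illustration only.
example = course_grade(
    assignments=[90, 80, 100, 70, 85],  # composite 85
    projects=[88, 92],                  # composite 90
    exams=[75, 85],                     # composite 80
)
print(example)  # 85.0
```

Under these assumptions a single assignment counts for 1/15 of the final score, while a single project or exam counts for 1/6.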

Communication

Please use Piazza for all questions related to the course. We will be using Piazza as our primary portal for course-related announcements, so make sure to sign up! For all Piazza posts, we guarantee that we will respond within 24 hours. DO NOT post assignment code on Piazza for debugging; we will not respond to posts containing assignment code. Also check out the list of frequently asked questions.

Course Staff

Schedule

Date Topic and Assignments Readings/References Notes
Tue Jan 8 Introductions, course logistics, Big Data Overview (start)
Readings: Introductory Readings
Notes: Big Data Overview
Thu Jan 10 Big Data Overview (finish)
Data Analysis & Visualization Using Spreadsheets (Part 1)
Readings: Google Spreadsheets References
Notes: Data Analysis Using Spreadsheets Slides
Mon Jan 14 Assignment 1 released: Spreadsheets
Project 1 released: Personal Data Analysis
Tue Jan 15 Data Analysis & Visualization Using Spreadsheets (Part 2)
Thu Jan 17 Advanced Data Visualization Using Tableau
Mon Jan 21 Assignment 1 due
Assignment 2 released: Tableau, SQL
Tue Jan 22 Relational Databases and Basic SQL
Thu Jan 24 Advanced SQL
Mon Jan 28 Project 1 proposal due
Tue Jan 29 Introduction to Python
(optional if familiar with Python including lists and dictionaries)
Thu Jan 31 Python for Data Analysis & Visualization (Part 1)
Thu Jan 31 Assignment 2 due
Assignment 3 released: Python
Tue Feb 5 Python for Data Analysis & Visualization (Part 2)
Thu Feb 7 Machine Learning - Regression
Mon Feb 11 Assignment 3 due
Tue Feb 12 Machine Learning - Classification and Clustering
Thu Feb 14 Midterm Exam - in class
Mon Feb 18 Project 1 due
Assignment 4 released: Machine Learning, R
Project 2 released: Movie-Rating Predictions
Tue Feb 19 Using Python for Machine Learning
Thu Feb 21 The R Language - Data Analysis, Visualization, and Machine Learning
Tue Feb 26 Data Mining Algorithms
Thu Feb 28 Data Mining Using SQL and Python
Thu Feb 28 Assignment 4 due
Assignment 5 released: Data Mining, Network Analysis, Unstructured Data
Tue March 5 Network Analysis
Thu March 7 Unstructured Data
Thu March 7 Project 2 due
Tue March 12 Guest lecture: Big Data Platforms and Services
Thu March 14 Project 2 results and discussion
Correlation and causation
Follow-on courses and pathways
Thu March 14 Assignment 5 due
Mon March 18 Final Exam 12:15-3:15 PM
Location: TBA

Students with Documented Disabilities

Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. For CS102 we require accommodation letters to be filed with the instructor a minimum of two weeks before the requested accommodation. This policy is strictly enforced. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: http://oae.stanford.edu).