Course Description

This course is aimed at non-CS undergraduate and graduate students who want to learn the basics of big data tools and techniques and apply that knowledge in their areas of study. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing massive data sets. At the same time, it is surprisingly easy to make errors or come to false conclusions from data analysis alone. This course provides a broad and practical introduction to big data: data analysis techniques including databases, data mining, and machine learning; data analysis tools including spreadsheets, relational databases and SQL, Python, and R; data visualization techniques and tools; pitfalls in data collection and analysis; historical context, privacy, and other ethical issues. Tools and techniques are covered hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.


Time: Tuesdays & Thursdays 1:30-2:50 PM
Location: Building 320 Main Quad, Room 105

Office Hours

The Course Assistants hold office hours several different times a week, Monday-Friday, in the basement of the Huang Building (look for the CS102 sign) or in reserved rooms in Huang. Times and places are given in the course calendar.

Professor Widom holds office hours on Wednesdays 4:00-5:30 PM in the Dean's Office #227 on the 2nd floor of the Huang Building. Updates to her office hours will be posted on the course calendar.


There will be 5 assignments, 2 projects, a midterm exam, and a final exam. Grades for the course will be weighted equally across the composite scores for projects, tests, and homework assignments. That is, the 5 homework assignments together will carry the same weight as the 2 tests, and the same weight as the 2 projects. See the syllabus below for dates and times. There will be no alternate final, so please make sure you will be available for the final exam on December 11.
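As a rough illustration of the equal weighting, the short Python sketch below averages three composite scores. It is only a sketch under stated assumptions: the function name `course_grade`, the 0-100 scale, and the sample scores are hypothetical, and this page does not say how each composite score is itself computed.

```python
def course_grade(projects: float, tests: float, homework: float) -> float:
    """Average three composite scores with equal weight.

    Assumption: each composite is already on a common 0-100 scale.
    How each composite is built from individual items is not specified
    on this page and is not modeled here.
    """
    composites = [projects, tests, homework]
    return sum(composites) / len(composites)


# Hypothetical composite scores, for illustration only.
print(course_grade(projects=90.0, tests=80.0, homework=85.0))  # prints 85.0
```

Under these assumptions, a one-point change in any one composite moves the overall grade by one third of a point, regardless of how many items make up that composite.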


Please use Piazza for all questions related to the course. Also check out the list of frequently asked questions.

Course Staff

Professor: Jennifer Widom
Course Assistant: Joanne Jang
Course Assistant: Jen Kilpatrick
Course Assistant: Clara Meister
Course Assistant: Rob Pinkerton
Course Assistant: Kelly Shen

Date | Topic and Assignments | Readings/References | Notes (posted after class)
Tue Sept 26 | Introductions, course logistics, Big Data Overview (start) | Introductory Readings | Intro Slides; Big Data Overview
Thu Sept 28 | Big Data Overview (finish); Data Analysis & Visualization Using Spreadsheets (Part 1) | Google Spreadsheets References | Data Analysis with Spreadsheets
Mon Oct 2 | Assignment 1 released: Spreadsheets; Project 1 released: Personal Data Analysis | Assignment 1; Project 1 |
Tue Oct 3 | Data Analysis & Visualization Using Spreadsheets (Part 2) | Common Visualization Mistakes | Spreadsheet Notes; Spreadsheet Slides
Thu Oct 5 | Advanced Data Visualization Using Tableau | Tableau References | Tableau Example Guide; Tableau Notes
Mon Oct 9 | Assignment 1 due; Assignment 2 released: Tableau, Basic SQL | Assignment 1; Assignment 2 |
Tue Oct 10 | Relational Databases and Basic SQL | SQL References | Relational/SQL notes; In-Class SQL Examples; In-Class SQL Example Answers
Thu Oct 12 | Python for Data Analysis & Visualization (Part 1) | Python References | Python Slides; In-Class Python Examples; In-Class Python Example Answers
Mon Oct 16 | Assignment 2 due; Assignment 3 released: Python | Assignment 2; Assignment 3 |
Tue Oct 17 | Python for Data Analysis & Visualization (Part 2) | Pandas intro; SQL vs Python Comparison | In-Class Python Example Answers cont.
Wed Oct 18 | Project 1 proposal due | |
Thu Oct 19 | Guest Lecture #1: Stanford's CARTA system - Tum Chaturapruek (PhD student lead) and Prof. Ramesh Johari | CARTA system |
Mon Oct 23 | Assignment 3 due | |
Tue Oct 24 | Machine Learning - Regression | ML References | Regression Slides; ML In-Class Example Guide
Thu Oct 26 | Machine Learning - Classification and Clustering | ML References - Classification and Clustering | Classification Slides; Clustering Slides
Tue Oct 31 | Midterm Exam - in class | Midterm Sample Exam |
Thu Nov 2 | Guest Lecture #2: Computational Social Science at Facebook - Lada Adamic; Assignment 4 released: Machine Learning, Advanced SQL | Assignment 4 |
Mon Nov 6 | Project 1 due | Project 1 |
Tue Nov 7 | Using Python for Machine Learning | ML References - Python | In-Class Python ML Examples; In-Class Python ML Example Answers
Thu Nov 9 | Advanced SQL; Project 2 released: Movie-Rating Predictions | Project 2 | Advanced SQL Slides; In-Class SQL Examples; In-Class SQL Example Answers
Tue Nov 14 | Data Mining Algorithms | Data Mining References | Data Mining Slides; Data Mining Examples
Wed Nov 15 | Assignment 4 due | |
Thu Nov 16 | Data Mining Using SQL and Python | | In-Class Data Mining Examples
Thanksgiving Break
Tue Nov 28 | The R Language - Data Analysis, Visualization, and Machine Learning | R Tutorial; Quick-R: accessing the power of R; Python vs. R for Data Visualization | R Slides; R In-Class Exercise Answers
Wed Nov 29 | Assignment 5 released: Data Mining, R, Network Analysis | Assignment 5 |
Thu Nov 30 | Network Analysis | Network References | Network Slides; Network Notes
Fri Dec 1 | Project 2 due | Project 2 |
Tue Dec 5 | Guest Lecture #3: Google's Big Data Platforms and Services - Zoltan Fern | | Platform Services Slides
Wed Dec 6 | Assignment 5 due | |
Thu Dec 7 | Project 2 results and discussion; Brief introductions to text mining and analytics, image analysis, and audio & video analysis; Follow-on courses and pathways | |
Mon Dec 11 | Final Exam 12:15-3:15 PM; Location: Bishop Auditorium in Lathrop Library | |