Course Description

This course is aimed at non-CS undergraduate and graduate students who want to learn the basics of big data tools and techniques and apply that knowledge in their areas of study. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing massive data sets. It provides a broad and practical introduction to big data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; and an introduction to network analysis and unstructured data. Tools and techniques are covered hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.


Time: Tuesdays & Thursdays 1:30-2:50 PM
Location: Building 320 room 105 (Geology corner)

Office Hours

The CAs hold 20 hours of office hours a week, Monday-Friday, in reserved areas in the Engineering Quad. Times and places are given in the course calendar.

Professor Widom holds office hours on Wednesdays 4:00-5:00pm in the Dean's Office #227 on the 2nd floor of the Huang building. Updates to her office hours will be posted on the course calendar.


There will be 5 homework assignments, 2 projects, a midterm exam, and a final exam. Grades for the course will be weighted equally across the composite scores for projects, exams, and homework assignments; that is, the 5 homework assignments together carry the same weight as the 2 exams or the 2 projects. See the syllabus below for dates and times. There will be no alternate exams, so please make sure you will be available for the midterm on Feb 14 and the final exam on March 18.
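As a rough illustration of the equal weighting, the sketch below computes a course grade from three composite scores. It rests on assumptions this page does not state: every score is on a 0-100 scale, and each item within a category counts equally toward that category's composite. The function names are hypothetical and are not part of any official grading tool.

```python
def composite(scores):
    """Average the scores in one category (assumes equal weight per item)."""
    if not scores:
        raise ValueError("each category needs at least one score")
    return sum(scores) / len(scores)


def course_grade(homework, projects, exams):
    """Weight the three category composites equally (one third each)."""
    composites = [composite(homework), composite(projects), composite(exams)]
    return sum(composites) / len(composites)


if __name__ == "__main__":
    # Made-up scores: 5 assignments, 2 projects, midterm and final.
    grade = course_grade(
        homework=[90, 80, 100, 70, 60],
        projects=[85, 95],
        exams=[70, 90],
    )
    print(round(grade, 2))  # prints 83.33
```

Under these assumptions a single exam moves the final grade more than a single homework assignment does, because the exam composite is spread over 2 items and the homework composite over 5.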


Please use Piazza for all questions related to the course. We will be using Piazza as our primary portal for course-related announcements, so make sure to sign up! For all Piazza posts, we guarantee that we will respond within 24 hours. DO NOT post assignment code on Piazza for debugging; we will not respond to posts containing assignment code. Also check out the list of frequently asked questions.

Course Staff
Date | Topic and Assignments | Readings/References | Notes
Tue Jan 8 | Introductions, course logistics, Big Data Overview (start) | Introductory Readings | Course Information; Big Data Overview
Thu Jan 10 | Big Data Overview (finish); Data Analysis & Visualization Using Spreadsheets (Part 1) | Google Spreadsheets References | Data Analysis Using Spreadsheets Slides; Spreadsheet Analysis Notes (Part 1)
Mon Jan 14 | Assignment 1 released: Spreadsheets; Project 1 released: Personal Data Analysis | |
Tue Jan 15 | Data Analysis & Visualization Using Spreadsheets (Part 2) | Common Visualization Mistakes | Data Visualization Using Spreadsheets Slides; Spreadsheet Analysis Notes (Part 2); Spreadsheet Visualization Notes
Thu Jan 17 | Advanced Data Visualization Using Tableau | Tableau References | Advanced Data Visualization Using Tableau Slides; Tableau Notes
Mon Jan 21 | Assignment 1 due; Assignment 2 released: Tableau, SQL | |
Tue Jan 22 | Relational Databases and Basic SQL | SQL References; Project Jupyter home page | Relational Databases and SQL Slides; Basic SQL Notes
Thu Jan 24 | Advanced SQL | | Advanced SQL Slides; Advanced SQL Notes
Mon Jan 28 | Project 1 proposal due | |
Tue Jan 29 | Introduction to Python (optional if familiar with Python including lists and dictionaries) | Python References; SQL vs Python Comparison | Python Slides; Basic Python Notes
Thu Jan 31 | Python for Data Analysis & Visualization (part 1) | PyPlot Tutorial | Python Data Manipulation Notes
Thu Jan 31 | Assignment 2 due; Assignment 3 released: Python | |
Tue Feb 5 | Python for Data Analysis & Visualization (part 2) | Pandas intro; SQL vs Pandas Comparison | Python Plotting/Pandas Note
Thu Feb 7 | Machine Learning - Regression | ML References - Regression | Regression Slides; Regression Notes
Mon Feb 11 | Assignment 3 due | |
Tue Feb 12 | Machine Learning - Classification and Clustering | ML References - Classification and Clustering | Classification Slides; Clustering Slides
Thu Feb 14 | Midterm Exam - in class | |
Mon Feb 18 | Project 1 due; Assignment 4 released: Machine Learning, R; Project 2 released: Movie-Rating Predictions | The Netflix Prize |
Tue Feb 19 | Using Python for Machine Learning | ML References - Python |
Thu Feb 21 | The R Language - Data Analysis, Visualization, and Machine Learning | R Tutorial; Quick-R: accessing the power of R; Python vs. R for Data Visualization | R Slides; R Notes
Tue Feb 26 | Data Mining Algorithms | Data Mining References | Data Mining Slides; Data Mining Notes
Thu Feb 28 | Data Mining Using SQL and Python | | Mining Python SQL Notes
Thu Feb 28 | Assignment 4 due; Assignment 5 released: Data Mining, Network Analysis, Unstructured Data | |
Tue March 5 | Network Analysis | Network References | Network Slides; Network Notes
Thu March 7 | Unstructured Data | | Unstructured Data Slides
Thu March 7 | Project 2 due | |
Tue March 12 | Guest lecture by TAs: Big data in the real world; Introduction to Deep Learning | | Big data in the real world; Introduction to Deep Learning
Thu March 14 | Project 2 Award Ceremony; Guest lecture by TAs: Pathways after CS102; SQL, Python, R On Your Laptop | | Pathways after CS102; SQL, Python, R On Your Laptop
Thu March 14 | Assignment 5 due | |
Mon March 18 | Final Exam 12:15-3:15 PM; Location: Building 320 room 105 (Geology corner) | |
Students with Documented Disabilities
Students who may need an academic accommodation based on the impact of a disability must initiate the request with the Office of Accessible Education (OAE). Professional staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations. For CS102 we require accommodation letters to be filed with the instructor a minimum of two weeks before the requested accommodation. This policy is strictly enforced. The OAE is located at 563 Salvatierra Walk (phone: 723-1066, URL: