Course Description

Aimed at non-CS undergraduate and graduate students who want to learn the basics of big data tools and techniques and apply that knowledge in their areas of study. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing massive data sets. At the same time, it is surprisingly easy to make errors or come to false conclusions from data analysis alone. This course provides a broad and practical introduction to big data: data analysis techniques including databases, data mining, and machine learning; data analysis tools including spreadsheets, relational databases and SQL, Python, and R; data visualization techniques and tools; pitfalls in data collection and analysis; and historical context, privacy, and other ethical issues. Tools and techniques are covered hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or equivalent programming experience.

Lectures

Time: Tuesdays & Thursdays 1:30-2:50 PM
Location: Building 320 Main Quad, Room 105

Office Hours

The Course Assistants hold office hours several times a week, Monday through Friday, in the basement of the Huang Building (look for the CS102 sign). Times are given in the course calendar.

Professor Widom holds office hours on Wednesdays 4:00-5:30 PM in the Dean's Office #227 on the 2nd floor of the Huang Building. Updates to her office hours will be posted on the course calendar.

Evaluation

Grades for the course will be weighted equally among composite scores for projects, exams, and homework assignments. There will be 5 assignments, 2 projects, a midterm exam, and a final exam. See the syllabus below for dates and times. There will be no alternate final, so please make sure you will be available for the final exam on December 11.

Communication

Please use Piazza for all questions related to the course. Also check out the list of frequently asked questions.

Course Staff

Professor: Jennifer Widom
Course Assistant: Joanne Jang
Course Assistant: Jen Kilpatrick
Course Assistant: Clara Meister
Course Assistant: Rob Pinkerton
Course Assistant: Kelly Shen

Syllabus

Date Topic and Assignments Readings/References Notes (posted after class)
Tue Sept 26 Introductions, course logistics, Big Data Overview (start) Introductory Readings Intro Slides
Big Data Overview
Thu Sept 28 Big Data Overview (finish)
Data Analysis & Visualization Using Spreadsheets (Part 1)
Google Spreadsheets References Data Analysis with Spreadsheets
Mon Oct 2 Assignment 1 released: Spreadsheets
Project 1 released: Personal Data Analysis
Assignment 1
Project 1
Tue Oct 3 Data Analysis & Visualization Using Spreadsheets (Part 2) Common Visualization Mistakes Spreadsheet Notes
Spreadsheet Slides
Thu Oct 5 Advanced Data Visualization Using Tableau Tableau References Tableau Example Guide
Tableau Notes
Mon Oct 9 Assignment 1 due
Assignment 2 released: Tableau, Basic SQL
Assignment 1
Assignment 2
Tue Oct 10 Relational Databases and Basic SQL SQL References Relational/SQL notes
In-Class SQL Examples
In-Class SQL Example Answers
Thu Oct 12 Python for Data Analysis & Visualization (Part 1) Python References Python Slides
In-Class Python Examples
In-Class Python Example Answers
Mon Oct 16 Assignment 2 due
Assignment 3 released: Python
Assignment 2
Assignment 3
Tue Oct 17 Python for Data Analysis & Visualization (Part 2) Pandas intro
SQL vs Python Comparison
In-Class Python Example Answers cont.
Wed Oct 18 Project 1 proposal due
Thu Oct 19 Guest Lecture #1: Stanford's CARTA system - Tum Chaturapruek (PhD student lead) and Prof. Ramesh Johari
Mon Oct 23 Assignment 3 due
Tue Oct 24 Machine Learning - Regression ML References
Thu Oct 26 Machine Learning - Classification and Clustering
Tue Oct 31 Midterm Exam - in class
Location: TBD
Thu Nov 2 Guest Lecture #2: Computational Social Science at Facebook - Lada Adamic
Mon Nov 6 Project 1 due
Project 2 released: Movie-Rating Predictions
Tue Nov 7 Using Python for Machine Learning
Wed Nov 8 Assignment 4 released: Machine Learning, Advanced SQL
Thu Nov 9 Advanced SQL
Tue Nov 14 Data Mining Algorithms
Wed Nov 15 Assignment 4 due
Thu Nov 16 Data Mining Using SQL and Python
Thanksgiving Break
Tue Nov 28 The R Language - Data Analysis, Visualization, and Machine Learning
Wed Nov 29 Project 2 due
Assignment 5 released: Data Mining, R, Social-Network Analysis
Thu Nov 30 Social-Network Analysis
Tue Dec 5 Guest Lecture #3: Google's Big Data Platforms and Services - Zoltan Fern
Wed Dec 6 Assignment 5 due
Thu Dec 7 Project 2 results and discussion
Lecture Topic TBD
Mon Dec 11 Final Exam 12:15-3:15 PM
Location: Bishop Auditorium in Lathrop Library