CS145 Introduction to Databases

NVIDIA Auditorium on TTh 3:00-4:20pm

Chat with us on the course Piazza site if you have any questions!

Description
This course covers database design and the use of databases in applications, with a short introduction to the internals of relational database engines. It includes extensive coverage of the relational model, relational algebra, and SQL. The course also features database design and relational design principles based on dependencies and normal forms. Many additional key database topics from the design and application-building perspective are also covered, including indexes, views, transactions, and integrity constraints. Systems such as MapReduce framework and key-value stores will also be covered. There will be a programming project, which explores database design and management in web applications by utilizing appropriate features of SQL.
Class Logistics

The course is not flipped- instead we're introducing iPython notebooks to allow for more interaction. Please see this post on the class Piazza for setup instructions. Python is used in both the projects and in some course handouts.

Lecture Plan

The reading material listed below is optional, but it refers to Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom. The lecture plan below may change.

Note that there are four types of lecture material below:

  • Lecture slides: In PPT and PDF formats
  • Lecture activities: IPython notebooks for in-class activities
  • Lecture notebooks: Supplementary (optional) material: Interactive IPython versions of the lectures- play around with these to review and better understand the material!
  • Notebook data: These data files need to be downloaded and placed into the same directory as the notebooks for certain exercises:

# Date Topic Lecture Materials Extra Reading Material Assignments
Introduction and Querying
1 9/22 Course Logistics and Database History Lecture 1 (pdf)
Activities: 1, 2
Notebook data: dataset_1.db
2 9/24 SQL: Introduction Lecture 2-3 (pdf)
Activities: Lecture Notebooks: Notebook data: dataset_1.db
Greenspun- SQL for web nerds PS #1:
3 9/29 SQL: Advanced Ch. 6
Database Design and Normal Forms
4 10/1 Database Design: ER Diagrams Lecture 4 (pdf)
Activity Solutions (pdf)
Ch. 2 Project Part 1
5 10/6 Database Design: Theory 1 Lecture 5-7 (pdf)
Activities: Lecture Notebooks:
Ch. 3.2-3.7 PS #1 Due
6 10/8 Guest Lecture: Manpreet Singh (Google), Google Ads Infrastructure: Challenges in building Big Data Systems
7 10/13 Database Design: Theory 2 PS #2:
Transactions
8 10/15 Transactions from a User's Perspective Lecture 8-9 (pdf)
Activities:
Ch. 8.6 Project Part 1 Due
9 10/20 Mechanisms for Transactions: Logging and Locking Ch. 18.1-18.4
Midterm
10 10/22 Midterm Review Midterm Review Slides (pdf) PS #2 Due
11 10/27 Midterm

Location:
  • SUNETID A-R: In Class (NVIDIA)
  • S-Z: McCullough 115
Cheat-sheet
Solutions
Introduction to Database Internals
12 10/29 IO Cost Models and External Sort Lecture 12 ( pdf)
Activities:
Ch. 11.4 Project Part 2
13 11/3 Indexing Lecture 13 ( pdf)
Activities: *Not in cs145-notebooks repo
Ch. 13.1-13.3
14 11/5 Access Methods and Operators Lecture 14-15 ( pdf)
Activities:
Ch. 15 (except 15.9)
15 11/10 Joins: A Cage Match Ch. 2 and 16.3 Project Part 2 Due

Project Part 3 PS #3:
16 11/12 Relational Algebra Lecture 16 ( pdf)
Activities:
Ch. 2 and 16.3
NoSQL Systems
17 11/17 Query Optimization Lecture 17 ( pdf)
Activities:
  • Activity-17-1 ( Solutions)
  • Activity-17-2
  • Note: You will need relation_algebra.py and display_tools.py from Lecture 16 (updated since though!), and complaint.db from Lecture 13
Bonus Activities:
18 11/19 Guest Lecture: 10 Years of Hadoop from a Cofounder PS #3 Due
Thanksgiving Break: 11/23-11/27
19 12/1 Research Talk: Dark Data and Analytics Systems Lecture 19 ( pdf)
Activities:
Wrap up and Final
20 12/3 Final Exam Review Final Review Lecture ( pdf)
Review Notes
Project Part 3 Due

Practice Problems:

Change log:
Note: you may have to clear your browser's cache to get newest files!
  • [9-30-15: 1:48AM]: Edits / corrections to Lecture 3; solutions to Lecture 3 Activities
  • [9-30-15: 3:31PM]: Added Activity-3-4-NULLs
  • [10-1-15: 9:55AM]: Added "Mea Culpa" slide about NULLs to Lecture 4, minor edits
  • [10-1-15: 10:10PM]: Added Project Part 1 description and materials; changed extra lecture 3 activities to "lectures"
  • [10-2-15: 4:45PM]: Posted Lectures 5, 7 and activites; (EDITED) Lecture 4 activity solutions; minor edits to project description
  • [10-5-15: 10:10AM]: Minor corrections to PS1 (Bonus #1 expected output was wrong), Activity-4-Solutions (Player weight keys wrong)
  • [10-9-15: 9:32AM]: Posted Lecture 5, 7 edits, Activity solutions
  • [10-9-15: 1:10AM]: Posted Lecture 8, 9 and activities
  • [10-10-15: 2:39PM]: Posted Lecture 7 correction (3NF algorithm)
  • [10-10-15: 8:30PM]: Posted PS1 Solutions
  • [10-11-15: 11:00PM]: Edits to Lecture 7 and Activity 7-2 (removed 3NF, switched to activity on lossy decomp
  • [10-12-15: 12:05AM]: PS2 posted
  • [10-12-15: 11:07PM]: PS1 correction to 3.a solution
  • [10-13-15: 12:36AM]: PS2 edits
  • [10-13-15: 9:51PM]: PS2 edits for clarity
  • [10-14-15: 12:20AM]: Edits to lecture 8
  • [10-14-15: 12:45AM]: Latest / edited version of lecture 7 + activity solutions
  • [10-16-15: 5:40PM]: Added MVDs activity
  • [10-20-15: 2:45PM]: Updated activities for lecture 9
  • [10-24-15: 1:02AM]: Updated midterm review slides (Project ER diagram / lecture 4)
  • [10-24-15: 10:16AM]: Fixed error in Activity-7-2 (called a BCNF decomp which did not preserve dependencies "lossy"...)
  • [10-24-15: 11:49PM]: PDF solutions to PS1 and project pt. 1 posted
  • [10-25-15: 11:40AM]: Minor updates to lectures 5-7 (added types to X in BCNF defs), lecture 9 (T1/T2 swap on inconsistent read slide), review slides (same edits)
  • [10-25-15: 4:23PM]: Posted PS2 solutions
  • [10-25-15: 7:06PM]: Fixed bug in PS2 solutions to 2b
  • [10-25-15: 10:20PM]: Added cheat sheet
  • [10-26-15: 9:41AM]: Fixed minor error in MVD defn in cheatsheet
  • [10-28-15: 8:32PM]: Added Project Part 2 materials
  • [10-29-15: 10:30AM]: Lecture 12 posted
  • [10-29-15: 10:32PM]: Lecture 12- minor edits
  • [10-29-15: 10:55PM]: Lecture 12 activities posted
  • [10-30-15: 12:03AM]: Fix to PS2 bonus solution
  • [10-31-15: 2:45PM]: Edits to constraints and triggers activity
  • [11-2-15: 2:53PM]: Posted Lecture 13
  • [11-2-15: 4:37PM]: Changed PS3 due date- pushed back one lecture
  • [11-5-15: 2:40PM]: Lecture 14 and Activity 13 solns posted
  • [11-5-15: 10:12PM]: Midterm solutions posted
  • [11-6-15: 2:50PM]: Switched lecture schedule
  • [11-7-15: 1:15PM]: Midterm solutions edited
  • [11-9-15: 3:27PM]: Moved Activity-14 to 15
  • [11-10-15: 8:45AM]: Edits to Lecture 13 (slides 18, 25); minor edits to Lecture 15
  • [11-10-15: 5:20PM]: Posted Project Part 3
  • [11-11-15: 10:16AM]: Posted PS3
  • [11-11-15: 3:42PM]: Removed some redundant (buggy) slides from Lecture 12!
  • [11-12-15: 3:48AM]: Fixed error in sample plot in PS3
  • [11-12-15]: Lecture 16 and activities posted; PS3 edits and pdf version
  • [11-14-15]: Posted review notes; Note: Updates to review notes will not be catalogued in this changelog- see compiled date in pdf!
  • [11-16-15 7:00AM]: Minor edit to Lecture-14-15 slides (slide 19-20)
  • [11-16-15 10:30PM]: Minor clarification about use of repacking in SMJ optimization added to Lecture-14-15 slides (slides 75,80)
  • [11-16-15: 11:20PM]: Posted Lecture 17; posted PS3 submission template and associated tools
  • [11-17-15: 2:15PM]: Newest version of Lecture 17 activities posted; two slides added to L17; slight edits to relation_algebra.py
  • [11-17-15: 11:55PM]: Slight edits (for clarity only) to PS3; Activity 15, 17 solutions posted
  • [11-18-15: 8:07AM]: Edited PS3 Q1c plot- only points that will be evaluated on
  • [11-19-15: 8:02PM]: Added bonus activities
  • [11-22-15: 4:11PM]: Posted PS3 solutions
  • [11-22-15: 9:00PM]: Edits to Bell-Lapadula bonus activity
  • [11-23-15: 6:00PM]: Posted Relational Algebra practice problems and solutions
  • [11-25-15: 2:05PM]: Minor fix to PS3 Solutions p4- was leaving out zero values
  • [11-28-15: 5:12PM]: Removed lecture overlap in second half of course; a few minor edits to lectures
  • [11-29-15: 1:07AM]: Activity-16-1 solns posted; small change to midterm solution 1.4 (bonus)
  • [12-1-15: 7:27AM]: Lecture 19 posted
  • [12-1-15: 8:34PM]: Final review slides posted
  • [12-3-15: 8:17AM]: Fix to B+ tree cost eqn in L14-15 and final review slides
  • [12-3-15: 3:42PM]: Fix to defn of fanout in L14-15
  • [12-3-15: 8:43PM]: Edits to 3NF-4NF Bonus activity
  • [12-4-15: 5:16PM]: Edits mostly to final review slides; minor ones to L14-15, review notes- see piazza
  • [12-5-15: 10:30AM]: Edit to Act-17-1-soln posted
  • [12-6-15: 11:23AM]: Newest versions of relational_algebra.py synced
  • [12-6-15: 10:20PM]: Correcting representation of NJoin as derived operator in slides and notes
  • [12-7-15: 2:04PM]: Minor update to wording of IO-Aware bonus activities
  • [12-7-15: 11:05PM]: Update / clarification on backup cost in L14-15 and final review slides
Final Exam
The final exam will be on December 8th at 3:30pm. The locations will be:
  • NVIDIA Auditorium
  • Huang 018
  • Skilling Auditorium
Assignments to location by SUNETid to be announced closer to exam date.
Grading
Lecture Attendance10%
Problem Sets20%
Programming Project20%
Midterm20%
Final30%

Note: SCPD students will be exempt from the "Lecture Attendance" portion. Grading will be scaled proportionately over the remaining segments.

Office Hours

Unless otherwise specified in the staff section below, office hours will be held in the open area of the Huang basement. CAs will bring signs to identify themselves. Please see the staff section below for office hours.

Note: the schedule of office hours may change from time to time, in which case an announcement will be made on the course Piazza.

Staff
Chris
Alex Ratner
Ari Nubar Ekmekji
Shubham Gupta
Cagla Kaymaz
Arun Kulshreshtha
Yuchen Li
Kevin McKenzie
Vishnu Sundaresan
Jed Tan
Stephanie Tsai
OHs
Late Policy
You will have three 24-hour "late days" that you can use throughout the quarter on any of the Problem Sets or Programming Projects, unless specified otherwise. Beyond these three late days, we will only grant extensions in the case of a severe medical or family emergency. Please use your late days wisely.
Honor Code and Collaboration Policy

We encourage you to discuss the Programming Projects and Problem Sets with other students; it's fine to discuss overall strategy and collaborate with a partner or in a small group, as both giving and receiving advice will help you to learn.

However, for the Programming Projects, you must write your own code: it's not OK to share code or write code collaboratively. (This includes posting and/or sharing your code publicly, such as on GitHub!) Likewise, for the Problem Sets, you must write up your own solutions to all of the problems, and you must cite all people you worked with. If you do not do so, we will consider this a violation of the Stanford Honor Code.

If you consult any resources outside of the materials provided in class, you must cite these sources. We reserve the right to assign a penalty if your answers are substantially derivative, but, as long as you provide appropriate citations, we will not consider this an Honor Code violation.

Students with Documented Disabilities
Students who may need an academic accommodation based on the impact of a disability must initiate the request through the Office of Accessible Education (OAE). OAE staff will evaluate the request with required documentation, recommend reasonable accommodations, and prepare an Accommodation Letter for faculty dated in the current quarter in which the request is being made. Students should contact OAE as soon as possible since timely notice is needed to coordinate accommodations: 563 Salvatierra Walk; phone: (650) 723-1066.