CS349D: Cloud Computing Technology

Instructor: Christos Kozyrakis --- OH: Wednesday 3:45 PM (after class)
TA: Kostis Kaffes --- OH: Monday 1 PM - 2 PM (link) (no OH during the 11/22 week, email me if you want to meet)
Fall 2021, Mon/Wed 2:45 PM - 3:45 PM, STLC 104

The largest change in the computer industry over the past five years has arguably been the emergence of cloud computing: organizations are increasingly moving their workloads to managed public clouds and using new, global-scale services that were simply not possible in private datacenters. However, both building and using cloud systems remain a black art with many difficult research challenges. This research seminar will cover industry and academic work on cloud computing and survey key technical issues. Students will read and discuss a paper per class meeting and do a quarter-long project in groups of 2-3.

Grading: The main evaluation is based on a project that students propose and execute during the course. In addition, each student is expected to present one of the papers and to participate in class. The grading rubric is 70% project, 15% paper presentation, and 15% participation.

Discussion Site: Online discussions will take place at edstem. An account is created for all enrolled students using their Stanford email. Contact Kostis if you have trouble connecting.

Previous iterations of the course: 2018


Class Format: You will need to fill out a Google form with answers to a few summary questions before each class starts. The form will be emailed to students each week. During class, one or two students will spend 10-15 minutes presenting the day's paper, and will then lead the subsequent discussion. Another student will take notes on the presentation and discussion.

Date | Topic | Readings | Class Notes
9/20 | Introduction | A Berkeley View on Cloud Computing (pdf); AWS Overview |
9/22 | Cloud Basics | AWS Overview (repeated); AWS Pricing; AWS Well-Architected Framework |
9/27 | Lessons from Large Scale Cloud Software at Databricks | Guest Lecture: Matei Zaharia (Stanford and Databricks); Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores (pdf); AWS Pricing (repeated) |
9/29 | Serverless: Intro & Performance Optimization | A Berkeley View on Serverless Computing (pdf); Firecracker: Lightweight Virtualization for Serverless Applications (pdf) | slides, notes
10/04 | Serverless: Storage | Pocket: Elastic Ephemeral Storage for Serverless Analytics (pdf); Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications (pdf) | slides, notes
10/06 | Serverless: Programming | From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers (pdf); New Directions in Cloud Programming (pdf) | slides, notes
10/11 | Serverless: Applications | Towards Demystifying Serverless ML Training (pdf); Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines (pdf) | slides, notes
10/13 | Systems for ML: Deep Learning Training | TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (pdf); Ray: A Distributed Framework for Emerging AI Applications (pdf) | TFX slides, Ray slides
10/18 | Systems for ML: Prediction Serving | Clipper: A Low-Latency Online Prediction Serving System (pdf); INFaaS: Automated Model-less Inference Serving (pdf) | slides, notes
10/20 | Systems for ML: MLOps | Ease.ML: A Lifecycle Management System for MLDev and MLOps (pdf); Understanding and Co-designing the Data Ingestion Pipeline for Industry-Scale RecSys Training (pdf) | slides, notes
10/25 | ML for Systems: Introduction & Frameworks | Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms (pdf); FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices (pdf) | slides, notes
10/27 | ML for Systems: Scheduling & Debugging | FirePlace: Placing FireCracker virtual machines with hindsight imitation (pdf); Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices (pdf) | slides, notes
11/01 | ML for Systems: Online Learning | AuTO: Scaling Deep Reinforcement Learning for Datacenter-Scale Automatic Traffic Optimization (pdf); SOL: Safe On-Node Learning in Cloud Platforms (pdf) | slides, notes
11/03 | Relational Approaches | Building Scalable and Flexible Cluster Managers Using Declarative Programming (pdf); DBOS: A DBMS-oriented Operating System (pdf) | slides, notes
11/08 | Emerging Hardware: Memory Disaggregation | Rethinking Software Runtimes for Disaggregated Memory (pdf); AIFM: High-Performance, Application-Integrated Far Memory (pdf) | slides, notes
11/10 | Emerging Hardware: Application Acceleration | Warehouse-scale video acceleration: co-design and deployment in the wild (pdf); Offloading distributed applications onto smartNICs using iPipe (pdf); RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing (pdf) | slides, notes
11/15 | Secure Computing: Private Analytics | Ryoan: A Distributed Sandbox for Untrusted Computation on Secret Data (pdf); Chiron: Privacy-preserving Machine Learning as a Service (pdf) | slides, notes
11/17 | Secure Computing: Enclaves | SCONE: Secure Linux Containers with Intel SGX (pdf); Confidential Serverless Made Efficient with Plug-In Enclaves (pdf); Optional: Occlum, MAGE | slides, notes
11/22 | Thanksgiving Break | |
11/24 | Thanksgiving Break | |
11/29 | Project Presentations | |
12/01 | Project Presentations | |

Paper Presentations

Each student will be assigned a paper to present during the class. You should prepare a 10-15 minute presentation on the paper. In your presentation, cover each of the following:


Project

Students will propose and run a quarter-long project, ideally in groups of 2-3. It is fine to use your existing research project if it is relevant to the course and the instructor approves. You will present the project at the end of the course and write a 10-12 page report. See here for a list of project ideas.

Project timeline:

See here for instructions on each phase of the project.
