CS349d: Cloud Computing Technology

Instructor: Christos Kozyrakis
TAs: Mark Zhao and Swapnil Gandhi
Spring 2024, Mon/Wed 4:30 PM - 5:50 PM, 320-109

Office Hours: TBD

The largest change in the computer industry over the past ten years has arguably been the emergence of cloud computing: organizations are increasingly moving their workloads to managed public clouds and using new, global-scale services that were simply not possible in private datacenters. However, both building and using cloud systems remain difficult, with many open research challenges. This research seminar will cover industry and academic work on cloud computing and survey key technical issues. Students will attend guest lectures from leading experts across the field, read and lead discussions on papers, and complete a quarter-long project in groups of 2-3.

Grading: The main evaluation will be based on a project that students propose and execute during the course. In addition, each student is expected to present one of the papers and to participate in class discussions. The grading breakdown is 60% project, 20% participation, and 20% paper presentations and summaries.

Discussion Site: Online discussions will take place on edstem. Accounts have been created for all enrolled students using their Stanford email addresses. Contact Mark or Swapnil if you have trouble connecting.

Gradescope: You will use Gradescope to submit paper summaries, which are due before the start of each class. You should already be enrolled in Gradescope; if not, contact Mark or Swapnil.

Presentation Signup Form: Please sign up for lectures using this Google Form.


Schedule (date, topic, readings, and lecture slides where available):

04/01/24: Introduction to Cloud Computing

04/03/24: GPUs in the Cloud
- MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters
- Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
- Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Lecture Slides: L2: MLaaS, L2: TGS

04/08/24: ML Training
- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
- GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

04/10/24: ML Training
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

04/15/24: ML Data
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
- cedar: Composable and Optimized Machine Learning Input Data Pipelines
- tf.data: A Machine Learning Data Processing Framework
- Cachew: Machine Learning Input Data Processing as a Service

04/17/24: ML Inference
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
- Distributed Inference and Fine-tuning of Large Language Models Over The Internet
- Efficient Large Language Models: A Survey

04/22/24: ML Inference
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

04/24/24: Databases and Datalakes
- Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
- RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design
- Analyzing and Comparing Lakehouse Storage Systems

04/29/24: Databases and Datalakes
- Milvus: A Purpose-Built Vector Data Management System
- SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

05/01/24: Resource Management and ML for Systems
- Twine: A Unified Cluster Management System for Shared Infrastructure
- Hyrax: Fail-in-Place Server Operation in Cloud Platforms
- Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms

05/06/24: Resource Management and ML for Systems
- Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
- Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices

05/08/24: Resource Management and ML for Systems
- SelfTune: Tuning Cluster Managers
- Cilantro: Performance-Aware Resource Allocation for General Objectives via Online Feedback

05/13/24: Serverless Computing
- On-demand Container Loading in AWS Lambda
- Faster and Cheaper Serverless Computing on Harvested Resources
- XFaaS: Hyperscale and Low Cost Serverless Functions at Meta
- Pocket: Elastic Ephemeral Storage for Serverless Analytics

05/15/24: Confidential Computing
- Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS
- An Extensible Orchestration and Protection Framework for Confidential Cloud Computing
- Keystone: An Open Framework for Architecting Trusted Execution Environments

05/20/24: Confidential Computing
- Telekine: Secure Computing with Cloud GPUs
- Honeycomb: Secure and Efficient GPU Executions via Static Validation
- MAGE: Nearly Zero-Cost Virtual Memory for Secure Computation

05/22/24: Cloud Potpourri
- SkyPilot: An Intercloud Broker for Sky Computing
- Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms

05/27/24: Memorial Day (no class)

05/29/24: Networking and Hardware
- Empowering Azure Storage with RDMA
- A Cloud-Scale Characterization of Remote Procedure Calls
- The Security Design of the AWS Nitro System

06/03/24: Project Presentations (TBA)
06/05/24: Project Presentations (TBA)
Access to articles published in the ACM Digital Library is free for individuals on the Stanford campus. If you are off-campus, you can still access these articles using the Stanford Libraries Extension, available at this link.


The assignments for this class consist of: Paper Presentations, Paper Summaries, and a Project.

Paper Presentations

For lectures with assigned readings, students will be assigned to present the papers and lead the class discussions. The assigned student(s) will spend roughly 5 minutes presenting the day's paper and will then lead the subsequent discussion (~25 minutes). In your presentation, cover each of the following points for each paper (see also the Presentation Template):

Paper Summaries

For lectures with assigned readings, everyone must submit a summary for each paper on Gradescope prior to the start of each class. Please see the Gradescope assignment for the specific requirements of each paper summary.


Project

Students will propose and carry out a quarter-long project, ideally in groups of 2-3. It is fine to use your existing research project if it is relevant to the course and the instructor approves. You will present the project at the end of the course and write a 5-6 page report. See here for a list of project ideas.

Project timeline (TENTATIVE):

See here for instructions on each phase of the project.

Adapted from a template by Andreas Viklund.