CS349d: Cloud Computing Technology

Instructor: Christos Kozyrakis
TAs: Mark Zhao and Swapnil Gandhi
Spring 2024, Mon/Wed 4:30 PM - 5:50 PM, 320-109

Office Hours: TBD

The largest change in the computer industry over the past ten years has arguably been the emergence of cloud computing: organizations are increasingly moving their workloads to managed public clouds and using new, global-scale services that were simply not possible in private datacenters. However, both building and using cloud systems remain difficult, with many open research challenges. This research seminar will cover industry and academic work on cloud computing and survey key technical issues. Students will attend guest lectures from leading experts across the field, read and lead discussions on papers, and complete a quarter-long project in groups of 2-3.

Grading: The main evaluation will be based on a project that students propose and execute during the course. In addition, each student is expected to present one of the papers and to participate in class discussions. The grading breakdown is 60% project, 20% participation, and 20% paper presentations and summaries.

Discussion Site: Online discussions will take place on edstem. Accounts have been created for all enrolled students using their Stanford email addresses. Contact Mark or Swapnil if you have trouble connecting.

Gradescope: You will use Gradescope to submit paper summaries, which are due before the start of each class. You should already be enrolled in Gradescope; if not, contact Mark or Swapnil.

Presentation Signup Form: Please sign up for lectures using this Google Form.


Schedule (date, topic, readings, and lecture slides where available):

04/01/24: Introduction to Cloud Computing

04/03/24: GPUs in the Cloud
- MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters
- Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
- Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads
- Microsecond-scale Preemption for Concurrent GPU-accelerated DNN Inferences
Lecture Slides: L2: MLaaS, L2: TGS

04/08/24: ML Training
- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
- GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

04/10/24: ML Training
- Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
- SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

04/15/24: ML Data
- Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
- cedar: Composable and Optimized Machine Learning Input Data Pipelines
- tf.data: A Machine Learning Data Processing Framework
- Cachew: Machine Learning Input Data Processing as a Service

04/17/24: ML Inference
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
- Distributed Inference and Fine-tuning of Large Language Models Over The Internet
- Efficient Large Language Models: A Survey

04/22/24: ML Inference
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

04/24/24: Databases and Datalakes
- Delta Lake: High-Performance ACID Table Storage over Cloud Object Stores
- RackBlox: A Software-Defined Rack-Scale Storage System with Network-Storage Co-Design
- Analyzing and Comparing Lakehouse Storage Systems

04/29/24: Databases and Datalakes
- Milvus: A Purpose-Built Vector Data Management System
- SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
- A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

05/01/24: Resource Management and ML for Systems
- Twine: A Unified Cluster Management System for Shared Infrastructure
- Hyrax: Fail-in-Place Server Operation in Cloud Platforms
- Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms

05/06/24: Resource Management and ML for Systems
- Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
- Sage: Practical & Scalable ML-Driven Performance Debugging in Microservices

05/08/24: Resource Management and ML for Systems
- SelfTune: Tuning Cluster Managers
- Cilantro: Performance-Aware Resource Allocation for General Objectives via Online Feedback

05/13/24: Serverless Computing
- On-demand Container Loading in AWS Lambda
- Faster and Cheaper Serverless Computing on Harvested Resources
- XFaaS: Hyperscale and Low Cost Serverless Functions at Meta
- Pocket: Elastic Ephemeral Storage for Serverless Analytics

05/15/24: Confidential Computing
- Everywhere All at Once: Co-Location Attacks on Public Cloud FaaS
- An Extensible Orchestration and Protection Framework for Confidential Cloud Computing
- Keystone: An Open Framework for Architecting Trusted Execution Environments

05/20/24: Confidential Computing
- Telekine: Secure Computing with Cloud GPUs
- Honeycomb: Secure and Efficient GPU Executions via Static Validation
- MAGE: Nearly Zero-Cost Virtual Memory for Secure Computation

05/22/24: Cloud Potpourri
- SkyPilot: An Intercloud Broker for Sky Computing
- Design Tradeoffs in CXL-Based Memory Pools for Public Cloud Platforms

05/27/24: Memorial Day (no class)

05/29/24: Networking and Hardware
- Empowering Azure Storage with RDMA
- A Cloud-Scale Characterization of Remote Procedure Calls
- The Security Design of the AWS Nitro System

06/03/24: Project Presentations (TBA)
06/05/24: Project Presentations (TBA)
Access to articles published in the ACM Digital Library is free for individuals on the Stanford campus. If you are off-campus, you can still access these articles using the Stanford Libraries Extension, available at this link.


The assignments for this class consist of: Paper Presentations, Paper Summaries, and a Project.

Paper Presentations

For lectures with assigned readings, students will be assigned to present the papers and lead the class discussions. The assigned student(s) will spend roughly 5 minutes presenting the day's paper and will then lead the subsequent discussion (~25 minutes). In your presentation, cover each of the following points for each paper (see also the Presentation Template):

Paper Summaries

For lectures with assigned readings, everyone must submit a summary for each paper on Gradescope prior to the start of each class. Please see the Gradescope assignment for the specific requirements of each paper summary.


Project

Students will propose and carry out a quarter-long project, ideally in groups of 2-3. It is fine to use your existing research project if it is relevant to the course and the instructor approves. You will present the project at the end of the course and write a 5-6 page report. See here for a list of project ideas.

Project timeline (TENTATIVE):

See here for instructions on each phase of the project.

Adapted from a template by Andreas Viklund.