Antoine Saint Exupery
I am interested in computer architecture, systems, and applied data mining.
My Ph.D. work has focused on improving the resource efficiency of large-scale datacenters.
Since traditional scaling techniques, e.g., using commodity computing or relying on Dennard scaling, are reaching the point of diminishing returns,
we must focus on using existing systems more efficiently.
Specifically, during my Ph.D. I have designed and built practical and scalable scheduling systems that improve system utilization
without sacrificing application performance.
My approach relies on three main insights. First, resource management systems must account for the interactions between software and hardware architectures. Second, a user should only have to provide a high-level declarative description of application requirements, not a specification of how they should be achieved using low-level resources. Third, the system must quickly learn the resource preferences of an application; unfortunately, obtaining this information through exhaustive profiling is too expensive. To make the system practical, I have leveraged efficient data mining techniques that take advantage of existing system knowledge to quickly make high-quality scheduling decisions.
Below is a list of projects I am currently working on or have worked on in the past.
Quasar: Traditionally, datacenters have been plagued by low utilization, primarily due to users overprovisioning resource reservations to side-step performance unpredictability. Quasar is a cluster manager that adopts a different interface between system and user. Instead of specifying raw resources, the user only specifies a performance target a job must meet. Quasar leverages efficient data mining techniques to determine the resource preferences of a new job, much like a movie recommendation system finds similarities between previous and new users to recommend movies that they are likely to enjoy. Quasar achieves both high cluster utilization and high per-application performance.
[ASPLOS'14 paper] [demo] [press]
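The recommendation analogy above can be sketched as a low-rank (truncated SVD) reconstruction of a sparse performance matrix. This is only an illustrative sketch of the general technique; the matrix values, rank, and function names are mine, not Quasar's actual implementation.

```python
import numpy as np

def estimate_scores(scores, rank=2):
    """Fill in unprofiled (NaN) entries of an (apps x resource-configs)
    performance matrix via a truncated-SVD reconstruction, the core idea
    behind recommender-style resource selection."""
    # Seed missing entries with per-config column means before factorizing.
    col_means = np.nanmean(scores, axis=0)
    filled = np.where(np.isnan(scores), col_means, scores)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Keep only the top `rank` singular values: similar apps share structure,
    # so a low-rank model generalizes from sparse profiling data.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# A new job profiled on only two of four configurations; estimate the rest
# and recommend the configuration with the highest predicted score.
scores = np.array([
    [0.9, 0.7, 0.4, 0.2],            # previously scheduled apps
    [0.8, 0.6, 0.3, 0.1],
    [0.2, 0.4, 0.7, 0.9],
    [np.nan, 0.65, np.nan, 0.15],    # new job: sparse profiling data
])
est = estimate_scores(scores)
best_config = int(np.argmax(est[3]))
```

As with a movie recommender, the estimates improve as more jobs are scheduled and the matrix fills in, so only a few seconds of profiling are needed per new job.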
Paragon: Paragon is a QoS-aware datacenter scheduler that accounts for interference between co-scheduled workloads
and platform heterogeneity when assigning applications to servers. The scheduler leverages fast classification techniques,
which introduce only minimal scheduling overheads, to determine the interference and heterogeneity preferences of incoming applications.
In a 1,000-server EC2 cluster, Paragon improves system utilization by 47% compared to a traditional least-loaded scheduler and
achieves 96% of optimal performance, while remaining scalable and lightweight.
[ASPLOS'13 paper] [TopPicks'14 paper] [TOCS'13 paper]
Tarcil: Tarcil is a scheduler that bridges the gap between sophisticated but slow centralized schedulers and fast but low-quality distributed
schedulers. Tarcil uses sampling to reduce scheduling overheads, and it accounts for the resource preferences of new jobs to keep scheduling quality high.
It improves performance both for short and long jobs compared to centralized and distributed schedulers.
[paper (in submission)]
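The sampling idea can be illustrated with a power-of-d-choices placement: probe a small random subset of servers instead of scanning the whole cluster. The function and parameter names below are illustrative assumptions, not Tarcil's actual interface.

```python
import random

def sample_based_placement(loads, d=8, rng=random):
    """Probe d randomly chosen servers and place the task on the least
    loaded of them; scheduling cost is O(d) instead of O(num_servers)."""
    candidates = rng.sample(range(len(loads)), d)
    return min(candidates, key=lambda i: loads[i])

rng = random.Random(42)                       # fixed seed for repeatability
loads = [rng.randint(0, 100) for _ in range(1000)]
server = sample_based_placement(loads, d=16, rng=rng)
```

Even a small d keeps the chosen server close to the least-loaded one with high probability, which is why sampling can preserve scheduling quality while cutting the overheads of a fully centralized scan.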
Cloud Provisioning: Paragon and Quasar assume that the cluster manager has full control over the entire system. Unfortunately, real life can be more
complicated, especially when the resources reside on a public cloud. I designed a system that determines the most cost-efficient instance type
(reserved vs. on-demand) and size a job needs to satisfy its QoS constraints. I evaluated this system on a cluster with a few hundred servers on Google.
[paper (in submission)]
iBench: Paragon and Quasar need to know the sensitivity of an incoming application to various types of interference. iBench is a benchmark suite that consists
of a set of microbenchmarks, each of which puts pressure on a specific shared resource. iBench enables fast and practical characterization of the
interference an application can tolerate in various shared resources, as well as the interference it generates itself.
ARQ: Admission control is needed during periods of high load to prevent cluster overloading. ARQ is a multi-class admission control protocol that ensures fast
application dispatching and low head-of-line blocking.
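A minimal sketch of per-class admission queues with a fixed admission capacity: separating classes prevents a burst of long, low-priority jobs from blocking short, high-priority ones queued behind them. The class and method names here are my own illustration, not ARQ's actual protocol.

```python
from collections import deque

class MultiClassAdmission:
    """Admission control with one FIFO queue per job class; lower class
    index means higher priority."""
    def __init__(self, num_classes, capacity):
        self.queues = [deque() for _ in range(num_classes)]
        self.capacity = capacity   # max jobs admitted to the cluster at once
        self.running = 0

    def submit(self, job, cls):
        """Admit immediately if there is capacity, else queue per class."""
        if self.running < self.capacity:
            self.running += 1
            return "admitted"
        self.queues[cls].append(job)
        return "queued"

    def complete(self):
        """A running job finished: dispatch from the highest-priority
        non-empty queue, so short jobs are not stuck behind long ones."""
        self.running -= 1
        for q in self.queues:
            if q:
                self.running += 1
                return q.popleft()
        return None

ac = MultiClassAdmission(num_classes=2, capacity=1)
ac.submit("long-batch-job", 1)      # admitted: cluster has capacity
ac.submit("short-latency-job", 0)   # queued in the high-priority class
dispatched = ac.complete()          # the short job jumps the batch backlog
```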
Datacenter Application Modeling: Previously, I worked on characterizing and modeling the behavior of large-scale datacenter applications.
I designed and implemented ECHO, a concise analytical model that captures and recreates the network traffic of
distributed datacenter applications. I also developed a modeling framework for storage workloads, which generates synthetic load patterns similar to the
original applications. Both modeling frameworks were validated against real datacenter applications from Microsoft, and were used in a series of efficiency
and cost optimization studies.
[IISWC'12 paper] [IISWC'11 paper] [CAL'12 paper] [TPCTC'11 paper]
I also enjoy teaching and mentoring students.
- In Fall 2014 I am co-teaching CS316 (Advanced Multicore Systems).
- In Spring 2014 I co-taught EE282 (Computer Architecture).
- In Spring 2013 I TAed EE282 (Computer Architecture), teaching some of the lectures and a weekly recitation.
- I am mentoring Sammy Steele (Summer 2014-present). Sammy is working with us on porting resource-efficiency techniques to Mesos.
- In Fall 2013, I mentored several quarter-long projects for CS316 (Advanced Processor Architecture) related to heterogeneous CMP scheduling and datacenter server provisioning.