Due: Tuesday, April 23rd, 2024 at 7PM
This quarter, you're going to reimplement an open-source research system in Rust, trying to achieve similar performance on one or more key metrics. You'll explore and come to understand what makes this easy or hard, and report on your experiences. The overall goal of this assignment is to help start to answer the question:
“What are the most important open research challenges for software systems written in Rust?”
There are three major milestones to your project:
Propose a team and project
Submit a midterm report in Week 6
Write up and submit your results
When you're done with this assignment, you should have
formed a team,
agreed on a system you will re-implement in Rust,
decided on 2-3 key performance metrics you will try to match.
Your team should be 2-4 people, of whom at least one, if possible, should have significant prior Rust experience. You can expect that this team member (or members) will spend a good deal of time helping the other team members as they become more familiar with the language. This is an important responsibility, and absolutely a valued intellectual goal of the class; one adage is that the best way to learn something is to teach it, as you need to not only understand it but also be able to explain that understanding.
Your team should explicitly select who will take on two different roles in the project. Each of these people is responsible for their aspect of the project and should have the final say on it. Having a consistent approach to each is important. In addition, placing each responsibility with one person means they know to keep track of the issues in play and spend some of their time and thought on them. The two roles are:
Software architect: This is the person who will decide how to decompose your project into modules and what the interfaces to those modules are. In Rust terminology, this means deciding on the structs, enums, and traits, especially the public ones. Of course, these will evolve as the project progresses. But the software architect has the design in their head and knows how it comes together. Two people working on different modules that interact through a trait can suggest changes to the architect (e.g., to support a piece of functionality), but it's ultimately up to the architect to sign off on a change.
Project manager: This is the person who decides who works on what and when. They are the person who is ultimately responsible for the system working at the end of the quarter. If development is falling behind, the project manager is the person who decides when to ask one person to stop working on one module, or set a partial completion point, before they shift to work on another. Put another way, the project manager decides how to allocate the programming time of the team.
These two roles should have a single owner because, for both of them, consistency of approach is often more important than optimizing the exact approach taken. There are many good software architectures to solve a problem: it's more important that you pick one and stick with it than exactly which (of the good ones) you pick. Similarly, there are many good ways to allocate people to complete a project: what's important is that you pick one strategy and stick with it.
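To make the architect's job concrete, here is a minimal sketch of a trait acting as the boundary between two modules. Every name in it (`KvStore`, `MemStore`, `StoreError`, `handle_set`, the 64-byte key limit) is hypothetical and chosen only for illustration; your own interfaces will depend on the system you pick.

```rust
use std::collections::HashMap;

/// Hypothetical limit, used only to give the error type a purpose.
const MAX_KEY_LEN: usize = 64;

/// Errors the storage module can report (illustrative).
#[derive(Debug, PartialEq)]
pub enum StoreError {
    KeyTooLong,
}

/// The interface the architect owns. Teammates propose changes to it;
/// the architect signs off.
pub trait KvStore {
    fn get(&self, key: &str) -> Option<&[u8]>;
    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), StoreError>;
}

/// One teammate's implementation, hidden behind the trait.
#[derive(Default)]
pub struct MemStore {
    map: HashMap<String, Vec<u8>>,
}

impl KvStore for MemStore {
    fn get(&self, key: &str) -> Option<&[u8]> {
        self.map.get(key).map(|v| v.as_slice())
    }

    fn put(&mut self, key: &str, value: Vec<u8>) -> Result<(), StoreError> {
        if key.len() > MAX_KEY_LEN {
            return Err(StoreError::KeyTooLong);
        }
        self.map.insert(key.to_owned(), value);
        Ok(())
    }
}

/// Another teammate's module, written only against the trait, so it
/// keeps compiling if `MemStore` is later swapped for something else.
pub fn handle_set<S: KvStore>(store: &mut S, key: &str, value: &[u8]) -> Result<(), StoreError> {
    store.put(key, value.to_vec())
}

fn main() {
    let mut store = MemStore::default();
    handle_set(&mut store, "a", b"1").unwrap();
    assert_eq!(store.get("a"), Some(&b"1"[..]));
}
```

In this sketch, a request to change the signature of `get` or `put` is exactly the kind of change that would go through the architect, since both modules depend on it.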
As you form your team, you should meet in person and answer the following questions:
What are your goals for the class? Please have each member discuss individually.
What is your Rust background? What parts of it do you find harder or easier?
What are some computer systems research papers you've read that you liked?
How do you like to work? Do you prefer to work solo and occasionally sync up? Do you like pair programming?
Compare your schedules to find two time blocks of at least one hour each week when you all can meet. Commit to meeting at these times each week and working together. If something comes up and someone can't make it, be sure to schedule another time for that week. The purpose of these meetings is to keep everyone in regular, scheduled communication on progress, discuss each other's code, etc.
Be sure to set up a Slack channel (or any other communication medium you prefer) for lower-latency, low-bandwidth communication and coordination.
Go to Stanford's GitLab and set up a repository for your project. Add all of your team members and the course staff as contributors. Write a README that describes your team and the project.
Pick a research system that is open source and not written in Rust. This system should be something that you can run and reimplement without requiring specialized hardware: you'll need to be able to run both the open-source version and your own. We can probably get access to small amounts of cloud compute (e.g., if you want to run on processors with high core counts), but not 100-node clusters. Note that you do not need to recreate the evaluation setups in papers on the system. E.g., if you are re-implementing a transaction processing system that was evaluated on a server with 52 cores and 1TB of RAM, you can evaluate it on your laptop and gather meaningful results.
Your goal will be to reimplement the system – or at least part of it – in Rust, and compare the performance of your Rust implementation with the published one. You should pick 2-3 key performance results (e.g., latency under increasing load, throughput under increasing parallelism, etc.). Your goal will be to meet or exceed the results on these metrics. Writing slow code is easy: the challenges often come into play when you are forced to take particular approaches in order to minimize overheads (e.g., don't just Copy everything).
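As a small illustration of the kind of overhead mentioned above, the following sketch contrasts a function that clones its input on every call with one that only borrows it. The function names and the 1 MiB payload are hypothetical, and the timing loop is a rough measurement aid, not a rigorous benchmark; actual numbers will vary by machine and build settings.

```rust
use std::time::Instant;

/// Clones the payload before summing: one heap allocation and one full
/// copy per call. (Taking `&Vec<u8>` is deliberate here, to show the clone.)
fn checksum_cloning(payload: &Vec<u8>) -> u64 {
    let owned = payload.clone();
    owned.iter().map(|&b| u64::from(b)).sum()
}

/// Borrows the payload as a slice: no allocation and no copy.
fn checksum_borrowing(payload: &[u8]) -> u64 {
    payload.iter().map(|&b| u64::from(b)).sum()
}

/// Runs `f` `iters` times and prints the mean wall-clock time per call.
fn time_it(label: &str, iters: u32, mut f: impl FnMut() -> u64) {
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..iters {
        // Accumulate the result so the work is not trivially discarded.
        acc = acc.wrapping_add(f());
    }
    println!("{label}: {:?} per call (acc={acc})", start.elapsed() / iters);
}

fn main() {
    let payload = vec![1u8; 1 << 20]; // 1 MiB of input
    time_it("cloning", 100, || checksum_cloning(&payload));
    time_it("borrowing", 100, || checksum_borrowing(&payload));
}
```

Both functions return the same result; the difference is only in how much memory traffic each call generates, which is the sort of cost that tends to show up in latency and throughput measurements.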
The system in question should be large and complex enough to be a substantial challenge. You have source code to refer to, which will help a good deal. A rough rule of thumb is it should be at least 1,000 lines of code per team member.
Many research systems are too complex and nuanced to implement in a quarter, even with source code available. It's OK to reimplement part of the system, or one piece of it (e.g., the server, but not the client). If the project builds on libraries, you are welcome to reuse those libraries.
The goal of the project isn't the replication of the system itself. Rather, it's about forcing you to deal with programming challenges in building a real system that performs well. We're encouraging research systems not because they're research but because they have clear performance metrics to compare against. Furthermore, ACM replication badges provide the experimental setting for these measurements.
If you find an open source system or library that has a good benchmark setup with it, that is fine too. In that case, you can choose a benchmark rather than a result in the paper. Note that choosing the right benchmarks might allow you to implement only a small subset of the system.
In general, systems that are computationally bound (i.e., can you write a tight loop) are less interesting than I/O-bound ones, in part due to the complexities of memory management, concurrency, and the other challenging aspects of Rust. Systems that have one or more of the following properties are a plus:
High computational parallelism with shared state
High network parallelism (requests per second, multi RTT exchanges)
Complex, interlocking data structures
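As one hedged example of the first property, the sketch below has several threads updating a single mutex-protected map and reports how long the updates take as the thread count grows. The function name `count_shared`, the bucket count, and the workload sizes are all hypothetical; a real system would likely need a more careful design (sharding, lock-free structures, or per-thread state) to scale.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

/// Spawns `threads` workers; each performs `per_thread` increments on
/// one shared map, taking the lock for every update.
fn count_shared(threads: usize, per_thread: usize) -> HashMap<usize, usize> {
    let shared = Arc::new(Mutex::new(HashMap::<usize, usize>::new()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for i in 0..per_thread {
                    let mut map = shared.lock().unwrap();
                    *map.entry(i % 8).or_insert(0) += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // All workers have finished, so this is the only remaining reference.
    Arc::try_unwrap(shared).unwrap().into_inner().unwrap()
}

fn main() {
    for threads in [1, 2, 4] {
        let start = Instant::now();
        let map = count_shared(threads, 100_000);
        let total: usize = map.values().sum();
        println!("{threads} threads: {total} updates in {:?}", start.elapsed());
    }
}
```

Because every update contends for the same lock, adding threads in this sketch may not improve throughput at all; finding out why, and what Rust makes easy or hard about fixing it, is the kind of question the project is meant to surface.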
The Rust async metanarrative has some good examples of things that are difficult with asynchronous Rust code.
You should generally plan on > 1,000 lines of code per group member. Scope your project (e.g., which parts of the system you will reimplement) accordingly.
S3FIFO: note it’s a simulator, so the evaluation will be the simulator itself, not the result
Trio: reimplement libfs
Tiptoe: use the existing crypto
Coeus: pick one of the 3 steps
PashJIT: core part is the execution engine for shell scripts
Pocket: just run on one machine
Your proposal should be a 1 page document that states:
Who is in your team, and their Rust experience
Who your software architect and project manager are
Your GitLab repository
What system you will re-implement
A link to the open source for the system you will re-implement
Which performance metrics you will try to match; reference the paper and include the table, stated result, or high-quality images of the figures.
Send an email to cs340r-spr2324-staff@lists.stanford.edu, with the subject “Team <NAME>”, attaching your proposal as a PDF. Your team name should be the system you are re-implementing.