284 Gates Hall, Stanford University, Stanford, CA 94305
raejoon at stanford dot edu
Large scale storage systems use erasure coding to recover from data unavailability. A system repairs an unavailable data block by reading in and processing multiple coded blocks. This greatly impacts tail latency, a request for a lost block reads many blocks across the network.
We propose DRED, an algorithm that reduces repair time for distributed, encoded data. DRED dynamically constructs a repair tree based on which nodes store encoded blocks and their available network capacity. DRED handles cases where different nodes have different available capacity (e.g., due to cluster manager allocation in the cloud) as well as when uplink and downlink capacities differ (as is common for residential last mile links).
Many wireless protocols send periodic beacons. To improve performance and avoid contention, protocols use a variety of ad-hoc mechanisms to desynchronize and separate these beacons in time. We propose Solo Timers, an efficient, general, and simple, distributed desynchronization algorithm. The design is partly based on the pulse-coupled oscillator, a control system that prior work has shown can theoretically desynchronize a single hop network. Solo Timers refine the pulse-coupled oscillator to be able to desynchronize multihop networks and remain stable despite hardware and software time jitter. A path vector-based algorithm allows Solo Timers to detect and repair unstable configurations that can occur in multihop networks.