Robust LSM-Trees Under Workload Uncertainty
Note: Please see the Robust Data Systems Tuning project page for related research into broader topics in this area.
Modern data systems frequently employ tuning strategies that rely on a priori assumptions about the workload and the hardware platform. However, data systems are continually exposed to changing environments: workloads shift as application demands change, and with the prevalence of the cloud, an application is not guaranteed to run on the same hardware platform over time. Tuning a data system for one fixed set of assumptions in such an uncertain environment can degrade overall performance once those assumptions no longer hold.
We introduce a new robust tuning paradigm for designing data systems under uncertain assumptions: we model the behavior of the system and then use these models together with techniques from robust optimization. We demonstrate our approach by tuning RocksDB, a popular storage engine based on the log-structured merge-tree (LSM-tree). We build a detailed cost model for the standard write and read queries and frame the design decision as a robust optimization problem. This problem chooses the physical layout of the tree, namely its size ratio and the split of memory between the in-memory buffer and the Bloom filters, based on the available resources and the expected workload.
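The core idea, choosing the tuning that minimizes the worst-case cost over a neighborhood of workloads rather than the cost of the single expected workload, can be illustrated with a small sketch. Everything below is an assumption for illustration and is not the Endure cost model or solver: the system parameters are hypothetical, the per-operation costs are textbook-style approximations for a leveled LSM-tree with uniform Bloom filters, the workload is a mix of four hypothetical operation classes (empty point lookups, non-empty point lookups, range queries, writes), the uncertainty region is approximated by a grid of workloads within KL divergence `rho` of the expected workload, and the search is a brute-force scan over size ratio `T` and buffer memory fraction.

```python
import math
from itertools import product

# Hypothetical system parameters (assumptions, not RocksDB defaults).
N = 10_000_000          # number of entries
ENTRY_BYTES = 128       # bytes per entry
PAGE_ENTRIES = 32       # entries per disk page
MEM_BITS = 10 * N       # total memory budget in bits, shared by buffer and filters


def level_count(T, buf_bits):
    """Levels needed with size ratio T when the buffer holds buf_bits of data."""
    buf_entries = max(buf_bits / (8 * ENTRY_BYTES), 1.0)
    return max(1, math.ceil(math.log(max(N / buf_entries, 1.0), T)))


def op_costs(T, buf_frac):
    """Simplified I/O cost per operation class (z0, z1, q, w) for leveling."""
    buf_bits = buf_frac * MEM_BITS
    filt_bits = MEM_BITS - buf_bits
    L = level_count(T, buf_bits)
    # Uniform bits-per-key Bloom filters with the standard false-positive approximation.
    fpr = math.exp(-(filt_bits / N) * math.log(2) ** 2)
    z0 = L * fpr                            # empty lookup: false positives only
    z1 = 1 + (L - 1) * fpr                  # non-empty lookup: one hit plus false positives
    q = L                                   # short range query: one seek per level
    w = (T - 1) * L / (2 * PAGE_ENTRIES)    # approximate amortized merge I/O per write
    return (z0, z1, q, w)


def cost(costs, workload):
    """Expected I/O cost of a workload given per-operation costs."""
    return sum(c * p for c, p in zip(costs, workload))


def kl(w, w_hat):
    """KL divergence I(w || w_hat); assumes every w_hat component is positive."""
    return sum(p * math.log(p / ph) for p, ph in zip(w, w_hat) if p > 0)


def simplex_grid(step=0.05):
    """All 4-component workload mixes on a grid with the given step."""
    n = round(1 / step)
    for a, b, c in product(range(n + 1), repeat=3):
        d = n - a - b - c
        if d >= 0:
            yield (a / n, b / n, c / n, d / n)


def neighborhood(w_hat, rho):
    """Grid workloads within KL distance rho of w_hat, plus w_hat itself."""
    return [w_hat] + [w for w in simplex_grid() if kl(w, w_hat) <= rho]


# Candidate tunings: size ratio T and fraction of memory given to the buffer.
CONFIGS = [(T, f / 20) for T in range(2, 21) for f in range(1, 20)]


def worst_case(cfg, region):
    """Highest cost of a configuration over every workload in the region."""
    costs = op_costs(*cfg)
    return max(cost(costs, w) for w in region)


def tune(w_hat, rho):
    """Return (nominal, robust): best for w_hat alone vs. best worst case."""
    region = neighborhood(w_hat, rho)
    nominal = min(CONFIGS, key=lambda c: cost(op_costs(*c), w_hat))
    robust = min(CONFIGS, key=lambda c: worst_case(c, region))
    return nominal, robust


if __name__ == "__main__":
    w_hat = (0.25, 0.25, 0.1, 0.4)
    nominal, robust = tune(w_hat, rho=0.5)
    print("nominal (T, buffer fraction):", nominal)
    print("robust  (T, buffer fraction):", robust)
```

By construction the robust tuning never has a higher worst-case cost over the region than the nominal one, while the nominal tuning is never worse on the expected workload itself; the gap between the two illustrates the trade-off a robust tuning makes.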
- Awarded an NSF CAREER grant on “Robust LSM Trees”
Paper Session for Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty, 48th International Conference on Very Large Data Bases, September 8, 2022, Sydney, Australia
Watch the Research Days talk – Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty, Feb. 8, 2023
Watch the project update from the Greater New England RIG May 2021 meeting