Robust LSM-Trees Under Workload Uncertainty
Modern data systems frequently employ tuning strategies that rely on a priori assumptions about the workload and hardware platform. However, data systems are consistently exposed to changing environments. Workloads may shift as application demands change, and with the prevalence of the cloud, deploying applications on a constant hardware platform is not always guaranteed. Tuning data systems in such uncertain environments may degrade overall performance when the assumed conditions no longer hold.
We introduce a new robust tuning paradigm to aid in the design of data systems under uncertain assumptions by modeling the behavior of the system and then using these models in conjunction with techniques from robust optimization. We demonstrate our approach by tuning RocksDB, a popular storage engine based on log-structured merge-trees (LSM-trees). We create a detailed cost model for the standard write and read queries, and frame the design decision as a robust optimization problem that chooses the physical layout of the tree by changing the size ratio and the memory allocated to the buffer versus the Bloom filters, based on the available resources and expected workload.
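To make the difference between nominal and robust tuning concrete, the following is a minimal sketch and not the project's actual cost model or solver. It assumes a deliberately simplified leveled LSM-tree cost model, a hypothetical memory budget and data size, a workload described only by its read fraction, and an uncertainty set of read fractions within `rho` of the expected value. All constants and function names are illustrative assumptions. The nominal tuning minimizes cost at the expected workload, while the robust tuning minimizes the worst-case cost over the uncertainty set.

```python
import math

# Hypothetical constants for illustration only (not taken from the project).
NUM_ENTRIES = 10_000_000      # entries stored in the tree
ENTRY_BITS = 1024             # size of one entry, in bits
ENTRIES_PER_PAGE = 4          # entries that fit in one disk page
BUDGET_BITS_PER_ENTRY = 12.0  # total memory budget, in bits per entry


def levels(size_ratio, filter_bits):
    """Number of levels when memory not used by Bloom filters goes to the buffer."""
    buffer_entries = (BUDGET_BITS_PER_ENTRY - filter_bits) * NUM_ENTRIES / ENTRY_BITS
    return max(1, math.ceil(math.log(NUM_ENTRIES / buffer_entries) / math.log(size_ratio)))


def cost(config, read_frac):
    """Toy expected I/O cost per operation for a leveled tree (simplified assumption)."""
    size_ratio, filter_bits = config
    num_levels = levels(size_ratio, filter_bits)
    false_positive_rate = math.exp(-filter_bits * math.log(2) ** 2)
    read_cost = 1.0 + num_levels * false_positive_rate         # fetch plus wasted probes
    write_cost = num_levels * size_ratio / ENTRIES_PER_PAGE    # amortized merge cost
    return read_frac * read_cost + (1.0 - read_frac) * write_cost


def candidate_configs():
    """Grid of (size ratio, Bloom filter bits per entry) tunings to search."""
    return [(t, h / 2.0) for t in range(2, 51) for h in range(0, 23)]


def worst_case_cost(config, read_frac, rho):
    """Worst cost over read fractions within rho of the expected one.

    The toy cost is linear in read_frac, so checking the interval
    endpoints is enough to find the maximum.
    """
    low = max(0.0, read_frac - rho)
    high = min(1.0, read_frac + rho)
    return max(cost(config, low), cost(config, high))


def tune_nominal(read_frac):
    """Pick the tuning that is best for the expected workload only."""
    return min(candidate_configs(), key=lambda c: cost(c, read_frac))


def tune_robust(read_frac, rho):
    """Pick the tuning with the smallest worst-case cost (min-max)."""
    return min(candidate_configs(), key=lambda c: worst_case_cost(c, read_frac, rho))


if __name__ == "__main__":
    expected_read_frac, rho = 0.9, 0.3
    nominal = tune_nominal(expected_read_frac)
    robust = tune_robust(expected_read_frac, rho)
    for name, cfg in (("nominal", nominal), ("robust", robust)):
        print(name, cfg,
              "expected cost:", round(cost(cfg, expected_read_frac), 3),
              "worst-case cost:", round(worst_case_cost(cfg, expected_read_frac, rho), 3))
```

By construction, the robust tuning never has a higher worst-case cost than the nominal one over the same grid, at the price of a possibly higher cost when the workload matches expectations exactly. A grid search stands in here for a real optimizer, and the interval of read fractions stands in for whatever uncertainty set a full formulation would use.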
Watch the project update from the Greater New England RIG May meeting [41:08]