Machine Learning Tuning of Kernel Policies Towards Energy Efficiency in Diverse Hardware and Software [Americas Research Interest Group Meeting, July 2023]
Link to access the meeting: http://meet.google.com/gsa-xdpn-nit
Materials from Meeting
Join us for the next Red Hat Research Americas Research Interest Group Meeting on July 18, 2023 at 3PM EDT. The meeting is open to Red Hatters and our research partners.
Machine Learning Tuning of Kernel Policies Towards Energy Efficiency in Diverse Hardware and Software
Han Dong, Boston University
As global data center energy use continues to rise, a core goal of operating systems (OS), getting work done while consuming fewer resources, is magnified by increasingly constrained energy budgets. Our work reveals, through a diverse hardware and software experimental study, how three fundamental OS mechanisms, 1) interrupt coalescing, 2) processor sleep states, and 3) dynamic voltage and frequency scaling, impact the performance and energy efficiency of network processing. We build upon our previous work, which established how a state-of-the-art machine learning technique, Bayesian optimization, can be used by an operator to dynamically adjust service-level agreement (SLA) and energy goals while supporting a real-world in-memory key-value store workload. This was made possible by the insight that externally controlling interrupt coalescing stabilizes application latency, making it easier to control performance-energy trade-offs and to magnify their benefits through processor frequency scaling and specialized sleep states.
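The kind of tuning loop described above can be sketched in miniature: a Gaussian-process surrogate scores untried (CPU frequency, interrupt-coalescing) settings and a lower-confidence-bound rule picks the next one to try. This is a minimal illustration, not the authors' controller; the candidate grid, kernel length scales, and `measure_energy` model are hypothetical stand-ins for on-hardware energy and latency measurements.

```python
import math
import random

def measure_energy(cfg):
    # Hypothetical stand-in for a measured energy-per-request metric;
    # the real study measures this on hardware under an SLA constraint.
    freq_mhz, rx_usecs = cfg
    return (freq_mhz / 1000.0) ** 2 + ((rx_usecs - 48) / 32.0) ** 2

def rbf(a, b, scales=(800.0, 32.0)):
    """Squared-exponential kernel over (freq MHz, rx-usecs); scales are guesses."""
    return math.exp(-sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (fine for tiny systems)."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_predict(X, y, x, noise=1e-4):
    """Posterior mean/stddev of a GP with a constant (data-mean) prior."""
    ybar = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    k = [rbf(a, x) for a in X]
    mu = ybar + sum(w * v for w, v in zip(k, solve(K, [v - ybar for v in y])))
    var = rbf(x, x) - sum(w * v for w, v in zip(k, solve(K, k)))
    return mu, math.sqrt(max(var, 1e-12))

random.seed(0)
candidates = [(f, c) for f in (1200, 1600, 2000, 2400, 2800)
                     for c in (8, 16, 32, 64, 128)]
observed = {cfg: measure_energy(cfg) for cfg in random.sample(candidates, 3)}

for _ in range(12):  # 12 tuning rounds
    X, y = list(observed), [observed[c] for c in observed]
    def lcb(cfg):
        # Lower confidence bound: favor low predicted energy or high uncertainty.
        mu, sd = gp_predict(X, y, cfg)
        return mu - 2.0 * sd
    nxt = min((c for c in candidates if c not in observed), key=lcb)
    observed[nxt] = measure_energy(nxt)

best_cfg = min(observed, key=observed.get)
print(best_cfg, round(observed[best_cfg], 3))
```

The same loop structure applies when the objective comes from hardware energy counters and the SLA enters as a constraint or penalty term rather than a synthetic formula.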
We theorize that this insight applies generally across diverse hardware and SLA-driven network applications. Almost all modern CPU architectures expose some degree of dynamic voltage and frequency scaling, trading instruction execution speed for reduced energy use. Furthermore, modern NICs and their device drivers are typically configured via the ethtool networking utility, which often provides interfaces for user-defined interrupt coalescing rates. Adjusting these rates can improve overall software stack efficiency, as system overheads such as interrupt processing, OS bookkeeping, and cache misses are amortized or eliminated by the batched handling of packets. To test this theory, we undertook an experimental study demonstrating how Bayesian optimization can be applied across various CPUs and NICs running a diverse set of SLA-driven network applications. We utilize experimental hardware from both the Massachusetts Open Cloud (MOC) and CloudLab to demonstrate the generality and usability of Bayesian optimization as a mechanism to dynamically target SLA and energy goals.
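For concreteness, the ethtool coalescing interface mentioned above looks roughly like this; `eth0` and the specific values are placeholders, and supported parameters vary by NIC and driver:

```shell
# Show the current interrupt coalescing settings for a NIC.
ethtool -c eth0

# Hold RX interrupts for up to 64 microseconds, or until 32 frames have
# arrived, whichever comes first, so packet handling is batched and
# per-interrupt overheads are amortized.
sudo ethtool -C eth0 rx-usecs 64 rx-frames 32
```

A tuning controller can adjust these values at runtime, alongside CPU frequency settings, to steer the performance-energy trade-off toward a given SLA target.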