AI/ML research is driving performance and efficiency gains and producing better developer tools.
Red Hat Research and its university partners focus strategically on projects with the most promise to shape the future of how we use technology. Each quarter, RHRQ will publish an overview of our research in a specific area, such as edge computing, hybrid cloud, and security. In this issue, we focus on projects related to artificial intelligence and machine learning.
A key area of focus at Red Hat Research is artificial intelligence (AI) and machine learning (ML). This includes building and optimizing systems that can run AI workloads, using AI/ML techniques to optimize systems software, and working with the edge and hybrid cloud teams at Red Hat Research to apply AI to their use cases. We also collaborate closely with our university partners and with Red Hat engineering.
A broad theme we focus on is using AI/ML to learn heuristics and policies that lead to more optimal systems. Optimality here refers to a performance metric, such as tail latency, throughput, or resource consumption, with energy consumption being especially important.
One example of this type of project is NIC Tuning, based at the Red Hat Collaboratory at Boston University, which quantifies the impact of interrupt throttling (ITR) and dynamic voltage and frequency scaling (DVFS) on tail latencies and energy consumption for network-heavy workloads. On bare-metal Linux, we have seen significant energy savings without notably compromising performance by carefully tuning ITR and DVFS. This tuning is generally done using black-box, gradient-free techniques like Bayesian optimization. We are now extending this work to (a) learning a dynamic ITR and DVFS policy (using reinforcement learning techniques) and (b) workloads running on OpenShift, which is a far noisier environment than a carefully controlled bare-metal Linux setup.
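The black-box, gradient-free search at the heart of this tuning can be sketched as follows. This is a minimal illustration, not the project's actual code: a simple random search stands in for Bayesian optimization, and a synthetic cost surface stands in for running a network benchmark and reading energy counters. All parameter ranges and the `measure_energy` model are invented for the example.

```python
import random

def measure_energy(itr_us: int, freq_mhz: int) -> float:
    """Hypothetical black-box objective. A real run would apply the
    (ITR, DVFS) setting, drive a network workload, and measure energy;
    here a synthetic bowl-shaped surface keeps the sketch runnable."""
    return (itr_us - 50) ** 2 / 100 + (freq_mhz - 1800) ** 2 / 1e5

def random_search(n_trials: int = 200, seed: int = 0):
    """Gradient-free search over the (ITR delay, CPU frequency) grid,
    a simple stand-in for Bayesian optimization."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = (rng.randrange(0, 200), rng.randrange(800, 3600, 100))
        cost = measure_energy(*cfg)
        if best is None or cost < best[1]:
            best = (cfg, cost)
    return best

(best_cfg, best_cost) = random_search()
```

A real Bayesian optimizer would replace the uniform sampling with a surrogate model that proposes the most informative next configuration, which is what makes the search sample efficient.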
Another is Compiler Optimization, also at the Collaboratory. The goal is to exploit an optimizing compiler's capabilities by injecting relevant information (for example, through pragmas) or by learning better heuristics for, say, selecting optimization pass sequences to generate more performant or more efficient code. The compiler target in this case is an FPGA, but it could just as easily be an x86 processor. We have observed significant performance gains by learning policies to insert pragmas or to choose the next compiler pass. While our past work required a new training run for each new application, we are now focused on learning policies that generalize across applications by reserving part of the state vector for application code embeddings. A major project is using graph neural networks (GNNs) with control flow, data flow, and call graphs to learn code embeddings in a self-supervised way.
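To make the pass-selection idea concrete, here is a toy sketch, assuming a synthetic cost model in place of actually compiling and measuring each candidate sequence. The pass names and their cost effects are invented; a learned policy would replace the greedy choice below with one conditioned on code embeddings.

```python
# Toy stand-ins for optimization passes: each maps an estimated cycle
# count to a new estimate. Real systems would compile and measure.
PASSES = {
    "inline":    lambda c: c - 10 if c > 80 else c,
    "unroll":    lambda c: c - 5 if c > 50 else c + 2,
    "vectorize": lambda c: c * 0.8 if c < 60 else c,
}

def greedy_sequence(cost: float, max_len: int = 5):
    """Greedily pick the pass that most reduces estimated cost,
    stopping when no pass helps or the sequence is long enough."""
    seq = []
    for _ in range(max_len):
        name, new_cost = min(
            ((n, f(cost)) for n, f in PASSES.items()),
            key=lambda t: t[1],
        )
        if new_cost >= cost:
            break  # no pass improves the estimate; stop early
        seq.append(name)
        cost = new_cost
    return seq, cost

seq, final_cost = greedy_sequence(100.0)
```

The interesting part of the research is exactly what this sketch omits: replacing the greedy one-step lookahead with a reinforcement-learned policy whose state includes an embedding of the program being compiled.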
With the Red Hat Kernel Performance Engineering team, we are working on Linux kernel tuning, which is becoming a significant area of interest. The kernel has thousands of parameters that can affect a running workload in significant ways. Jointly optimizing these parameters in a sample-efficient way is a substantial challenge that requires careful performance measurements, tracing, and meaningful code embeddings, as well as cutting-edge developments in Bayesian optimization and reinforcement learning.
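The shape of the joint-tuning problem can be illustrated with a small sketch. The parameter names are real sysctl knobs, but the value grids and the latency model are synthetic stand-ins: a real run would apply each configuration with `sysctl` and benchmark the workload, and the project uses far more sophisticated optimizers than the coordinate sweep shown here.

```python
# Candidate values for a few kernel parameters (grids are illustrative).
PARAM_SPACE = {
    "net.core.busy_poll": [0, 50, 100],
    "vm.dirty_ratio": [10, 20, 40],
    "kernel.sched_min_granularity_ns": [1_000_000, 3_000_000, 10_000_000],
}

def benchmark(cfg) -> float:
    """Synthetic latency model (lower is better). A real benchmark
    would set the sysctls and measure the workload."""
    return (abs(cfg["net.core.busy_poll"] - 50) / 10
            + abs(cfg["vm.dirty_ratio"] - 20) / 5
            + abs(cfg["kernel.sched_min_granularity_ns"] - 3_000_000) / 1e6)

def coordinate_descent(n_sweeps: int = 2):
    """Tune one parameter at a time, holding the others fixed --
    a simple baseline for the joint optimization problem."""
    cfg = {k: vals[0] for k, vals in PARAM_SPACE.items()}
    for _ in range(n_sweeps):
        for key, values in PARAM_SPACE.items():
            cfg[key] = min(values, key=lambda v: benchmark({**cfg, key: v}))
    return cfg, benchmark(cfg)

tuned_cfg, tuned_cost = coordinate_descent()
```

Coordinate sweeps like this scale poorly and miss interactions between parameters, which is precisely why the project turns to sample-efficient Bayesian optimization and reinforcement learning.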
Given the effectiveness of large language models (LLMs), we also have projects that explore their use for developer tools. In the Rust project with Columbia University, we developed a static analysis tool called Yuga to detect lifetime annotation bugs in the Rust language. We are now exploring the use of LLMs to augment Yuga for both better bug detection and generation of corrected code. Unit test generation is an essential part of robust software development. A recent project with Emerging Technologies aims to generate useful unit tests with LLMs. Based on our preliminary results, it seems likely that combining LLMs with classical static analysis will lead to a significant reduction in developer time spent on writing unit tests.
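One way classical static analysis can feed an LLM is sketched below, assuming Python's standard `ast` module for the analysis half; the example source, the prompt wording, and the `build_prompt` helper are all illustrative, and the LLM call itself is deliberately left out.

```python
import ast

# Example source we want unit tests for (illustrative).
SOURCE = '''
def clamp(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))
'''

def function_facts(source: str):
    """Statically extract each function's name, arguments, and
    docstring -- the facts an LLM prompt can be grounded in."""
    facts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            facts.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "doc": ast.get_docstring(node) or "",
            })
    return facts

def build_prompt(fact) -> str:
    """Turn extracted facts into a focused test-generation prompt."""
    return (f"Write pytest unit tests for `{fact['name']}"
            f"({', '.join(fact['args'])})`. Docstring: {fact['doc']}")

prompt = build_prompt(function_facts(SOURCE)[0])
```

Grounding the prompt in analyzer-extracted facts, rather than pasting raw files, is one plausible mechanism behind the combination the text describes: the static analysis constrains what the LLM is asked to test.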
Near-term outlook
In 2024, we hope to make significant progress on all the projects above. Some targets include:
- Evaluating various tracing tools (e.g., KUTrace, eBPF) to capture a running program's behavior with low overhead for downstream kernel tuning tasks. The hypothesis being tested is that traces contain vital information that can guide a search/optimization process.
- Training graph neural networks to learn code embeddings from graphs derived from source code. These code embeddings are state inputs to RL policies (in addition to other state variables) that learn various compiler-level heuristics. In general, for any tuning problem, we need our models to condition on a representation of the running application. These can be traces of the running application (the first target above) or static embeddings generated from underlying source code (this target).
- Evaluating the effectiveness of LLMs for generating unit tests and, more generally, exploring the combination of traditional static analysis with LLM code generation.
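The core operation behind the graph-embedding target above is message passing over a code graph. The sketch below shows one neighbor-averaging round over a tiny control-flow-style graph in pure Python; the node names, feature vectors, and edges are invented, and a real GNN would use learned weights and many rounds via a library such as PyTorch Geometric.

```python
def message_pass(features, edges):
    """One message-passing round: replace each node's feature vector
    with the average of its own vector and its neighbors' vectors."""
    neighbors = {n: [] for n in features}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)
    out = {}
    for node, feat in features.items():
        vecs = [feat] + [features[m] for m in neighbors[node]]
        out[node] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return out

# Tiny control-flow-style graph: entry -> loop -> exit (illustrative).
feats = {"entry": [1.0, 0.0], "loop": [0.0, 1.0], "exit": [0.5, 0.5]}
embeddings = message_pass(feats, [("entry", "loop"), ("loop", "exit")])
```

After a few such rounds, each node's vector summarizes its neighborhood; pooling the node vectors yields a whole-program embedding that can serve as the state input the second target describes.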
All these targets are fundamental building blocks for Linux kernel tuning, compiler tuning, and unit test generation, respectively. In addition, we will continue collaborating closely with our academic partners on projects that apply AI/ML to systems engineering, in pursuit of our goal of building more performant and more efficient computing systems.