Red Hat Research Quarterly

Research perspectives: Focus on testing and operations

about the author

Bandan Das

Bandan Das is an engineer in the Virtualization group at Red Hat. He spends most of his time on KVM, QEMU, and, more recently, containers. His research interests are in the areas of systems performance and hardware partitioning. As part of Red Hat Research, he is involved in the fuzzing project, teaching, and mentoring.

about the author

Daniel Bristot de Oliveira

Daniel Bristot de Oliveira is a principal software engineer at Red Hat working on the development of real-time features of the Linux kernel. Daniel has a joint PhD degree in Automation and Systems Engineering from UFSC (BRA) and in Embedded Systems from the Scuola Superiore Sant’Anna (ITA). He is also a postdoctoral researcher in the Retis Lab at the Scuola Superiore Sant’Anna.

Red Hat Research has fostered work on testing and analysis that started as open source explorations and ended as valuable upstreamed resources for anyone to use. We asked two engineers who’ve worked on highly successful projects, Daniel Bristot de Oliveira and Bandan Das, to share some of the biggest research accomplishments so far and let us in on what we can expect in the next three to five years.

Real-time Linux 

Red Hat engineer and researcher Dr. Daniel Bristot de Oliveira has delivered several practical improvements to the Linux kernel, including the Real-Time Linux Analysis (RTLA) toolset in the Linux 5.17 kernel release and a runtime verification subsystem in Linux kernel 6.0. Daniel published work on the formal analysis and verification of the real-time Linux kernel in a series of three articles: “A thread model for the real-time Linux kernel” (Oct 2020), “Efficient runtime verification for the Linux kernel” (Feb 2021), and “Demystifying real-time Linux scheduling latency” (May 2021). More recently, Daniel wrote about the development of osnoise in “Meet osnoise, a better tool for fine-tuning to reduce operating system noise in the Linux kernel” (Nov 2022).

Five years ago, the vision of Linux as a real-time operating system for safety-critical systems was nothing more than a motivation idealized by researchers in academic papers. Not that real-time Linux did not exist; indeed, the vast majority of the features composing the real-time Linux kernel were already there, for example, PREEMPT_RT and SCHED_DEADLINE. The use of Linux in embedded systems was also a reality. The primary obstacle was the challenge of certifying Linux for safety-critical applications.

The rise of edge computing helped drive the development of the community around Linux for safety-critical systems, mainly motivated by the automotive industry. This initiative was led by the Linux Foundation’s ELISA (Enabling Linux In Safety Applications) group and industrial players such as BMW, Bosch, and Red Hat. This trend spurred the research and development of methods and tools for analyzing Linux, which is the missing link between embedded and safety-critical Linux. Over these years, Red Hat Research has actively helped in this field, motivating researchers to improve the safety aspects of real-time Linux through sophisticated analysis of the Linux kernel. For example, academic research developed together with Scuola Superiore Sant’Anna (IT) and Universidade Federal de Santa Catarina (BR) led to the creation of the runtime verification subsystem and the RTLA toolset, both now integral parts of the Linux kernel.

Daniel Bristot de Oliveira began a series of research articles about the real-time Linux kernel in the October 2020 issue of RHRQ.

We expect to see growth in the number of publications that tackle safety aspects in the Linux kernel. We foresee research involving languages that include safety as a native aspect, such as Rust and eBPF; the application of AI in the creation of models to be used in the verification of Linux properties; and the use of more complex formal languages to verify the timing properties of real-time Linux schedulers.

Fuzzing the Linux kernel

Bandan Das is a software developer in the virtualization group at Red Hat. He worked on the project “Fuzzing device emulation in QEMU” at the Red Hat Collaboratory at Boston University, with a team that included Red Hat engineers Stefan Hajnoczi and Paolo Bonzini and BU professor Manuel Egele. The project sought to develop a novel method for fuzzing virtual devices and implement it in the popular open source QEMU hypervisor packaged in most Linux distributions. (Fuzzing is a powerful technique for dynamically generating and executing randomized test cases.) Bandan mentored PhD candidate and research associate Alex Bulekov (BU 2023), who documented the team’s successes in two articles for RHRQ, “Fuzzing hypervisor virtual devices” (May 2020) and “Applying lessons from our upstream hypervisor fuzzer to improve kernel fuzzing” (Aug 2022). 
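
To make the parenthetical definition of fuzzing concrete, here is a minimal sketch in Python. It is not the QEMU fuzzer: the toy register handler, its planted off-by-one bug, and every name in the block are hypothetical, chosen only to show randomized test cases being generated, executed, and checked for crashes.

```python
import random

REGISTER_COUNT = 16  # hypothetical size of a toy device's register file


def toy_device_write(registers, offset, value):
    """Hypothetical emulated register write with a planted off-by-one bug."""
    if offset > REGISTER_COUNT:  # planted bug: the check should be >=
        return  # out-of-range writes are meant to be ignored
    registers[offset] = value  # offset == REGISTER_COUNT raises IndexError


def fuzz(target, iterations, seed):
    """Generate random (offset, value) pairs and record the ones that crash."""
    rng = random.Random(seed)  # seeded so a crashing run can be replayed
    crashes = []
    for _ in range(iterations):
        registers = [0] * REGISTER_COUNT  # fresh device state per test case
        offset = rng.randrange(0, 2 * REGISTER_COUNT)
        value = rng.randrange(0, 256)
        try:
            target(registers, offset, value)
        except Exception as exc:
            crashes.append((offset, value, type(exc).__name__))
    return crashes


crashes = fuzz(toy_device_write, iterations=1000, seed=0)
```

Every crash recorded here comes from the single offset that slips past the faulty bounds check. Real device fuzzers face far larger input spaces and rely on coverage feedback rather than purely random inputs.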

In the early days of Red Hat Research, we encouraged engineers to approach PIs at BU and other universities to brainstorm ideas for collaboration. Systems, FPGAs, testing, and education were areas of focus, and several of these projects have stabilized with concrete research goals, upstream contributions, and academic publications. Although cloud computing was still the talk of the day when we outlined our goals for the QEMU fuzzing project, we never thought we would deeply integrate our fuzzing infrastructure in the cloud. Today, we make extensive use of Google’s OSS-Fuzz project, which runs fuzzing on upstream QEMU in the cloud. Additionally, we took advantage of fuzzing and sanitizer improvements in LLVM to improve our fuzzing framework. VM snapshot fuzzing emerged as a powerful approach to fuzzing complex software, which encouraged us to develop our own snapshot fuzzer for kernel fuzzing, a key difference from existing kernel fuzzers such as Syzkaller.
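
The idea behind snapshot fuzzing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team’s implementation: `ToyGuest` is a hypothetical stand-in for a virtual machine, and a deep copy of a dictionary stands in for saving and restoring real guest memory and device state.

```python
import copy


class ToyGuest:
    """Hypothetical stand-in for a VM; all names here are illustrative only."""

    def __init__(self):
        self.state = {"booted": False, "writes": []}

    def boot(self):
        # Stands in for expensive setup that should run only once.
        self.state["booted"] = True

    def run(self, data):
        self.state["writes"].append(data)
        if len(self.state["writes"]) > 1:
            # Would fire if state from an earlier test case were still present.
            raise RuntimeError("state leaked from an earlier test case")
        if data[:1] == b"\xff":
            raise ValueError("guest crashed on input")  # planted toy bug


def snapshot_fuzz(guest, inputs):
    """Boot once, snapshot, then restore the snapshot before every input."""
    guest.boot()
    snapshot = copy.deepcopy(guest.state)
    crashes = []
    for data in inputs:
        guest.state = copy.deepcopy(snapshot)  # reset to the clean state
        try:
            guest.run(data)
        except ValueError:
            crashes.append(data)
    return crashes


guest = ToyGuest()
crashes = snapshot_fuzz(guest, [b"\x00a", b"\xffb", b"\x01c", b"\xff"])
```

Restoring the snapshot before each input keeps test cases independent and reproducible without paying the boot cost again.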

Among our successes, we developed and upstreamed the current state-of-the-art fuzzing approach for hypervisors. The upstream fuzzer has continued to identify bugs across a wide range of virtual devices (including virtual I/O devices often used in the cloud). It helps identify serious bugs so they can be patched before they make it into a release, a capability that benefits all downstream QEMU users. The novelty of our approach led to our paper’s acceptance at USENIX Security 2022, which has an 18% acceptance rate. We also developed FuzzNG, a kernel fuzzer that is competitive even with the large, established Syzkaller project. We can fuzz most of the Linux subsystems that Syzkaller can fuzz, with virtually no human effort. As far as I am aware, no other public fuzzer has this capability. Our FuzzNG paper was accepted to the NDSS Symposium 2023, another Tier 1 security conference, which has about a 15% acceptance rate.

Kernel and hypervisor fuzzing techniques are evolving with active work ongoing at companies such as Google, Microsoft, and Apple. In addition to our collaboration with BU, we have kept track of the recent US Executive Order on increasing static analysis and fuzzing of software used by the US federal government and setting minimum standards for code verification by developers. We also have interest from companies such as Yandex, Oracle, and Google that have helped shape the fuzzing project to the state it’s in today.

In the next three to five years, we expect to see better tooling to run fuzzers continuously for open source software. Unlike typical applications, operating systems and hypervisors have complex, event-driven interfaces that are encapsulated deep within multiple layers of abstraction. We aim to fuzz these interfaces efficiently and increase coverage.
