Is there a way to gain the performance benefits of a unikernel without severing it from an existing general-purpose code base? Boston University professors, BU PhD students, and Red Hat engineers at the Red Hat Collaboratory at Boston University are getting close to finding the answer.
A unikernel is a single bootable image consisting of user code linked with additional components that provide kernel-level functionality, such as opening files. The resulting program can boot and run on its own as a single process, in a single address space, and at an elevated privilege level without the need for a conventional operating system. This fusion of application and kernel components is very lightweight and can have performance, security, and other advantages. An ongoing Red Hat Collaboratory research project is creating a unikernel that builds off Linux with relatively few code changes.
How we came to look (back) at unikernels
The basic unikernel concept is not new. Although unikernels are often traced to research projects like Exokernel and Nemesis in the late 1990s, they’re similar in concept to the very first operating systems on early 1950s mainframes.
Early batch-processing programs were statically linked to shared libraries to perform tasks like managing buffers and input/output (I/O) devices. One of these early libraries, the SHARE Operating System, written by the SHARE User Group for the IBM 709 in the late 1950s, was an early example of users sharing code that they wrote among themselves—in effect, early open source. Yes, users wrote some of the earliest operating systems for themselves. All code ran at a single privilege level, from start to end, without multitasking or timesharing; there was no scheduler. In other words, it was a unikernel—even though the term wasn’t coined until much later.
In part due to the cost of early computers, operating systems evolved over time and acquired additional functionality. Timesharing was one particularly important development in the 1960s; eventually it largely supplanted batch processing. One major milestone, released in 1969, was Multics, a collaboration between MIT, Bell Labs, and GE. Although the project was mostly a failure, in part because of its complexity, it influenced a wide range of minicomputer operating systems, like Digital Equipment’s VAX/VMS as well as Unix, which came out of Bell Labs.
Multics introduced innovations like dynamic linking and hardware support for ring-oriented security, which are familiar components of modern operating systems, including Linux.
These modern operating systems are extremely sophisticated and capable. But might we reexamine simpler approaches? For example, with a unikernel, the complex boundaries and permission checking a standard, multiuser operating system requires are no longer needed.
The unikernel program model now makes a lot more sense than it did back in the days of large, expensive computers. Today we have very inexpensive multiprocessor/hyper-threaded CPUs and the potential for virtualized environments running on a host that supports hundreds or even thousands of guests. We no longer have to be concerned about wasting compute cycles if a unikernel image blocks for I/O and does not context switch to another process. However, since a unikernel image can and does support multithreading, context switching between threads is viable.
Can Linux make it simpler?
The usual approach to building a unikernel is either building a specialized operating system from the ground up or forking an existing operating system, removing components, and modifying it as needed. Both approaches require ongoing maintenance of the resulting unikernel, which, among other problems, makes it harder to benefit from continuing enhancements to a general-purpose code base, including support for new devices.
Unikernels can also miss out on performance gains from new types of hardware acceleration, working against a key motivation for developing a unikernel. Furthermore, unikernels can require application changes, and they may not support the POSIX standard. Custom toolchains may be needed.
However, what if we could take an open source operating system with a large community, Linux, and add unikernel capabilities into the same source code tree? After all, Linux already supports a wide range of architectures and can be built in different ways depending upon the target use case.
That was the question a collaboration of professors, PhD students, and engineers at the BU Red Hat Collaboratory set out to answer.
The team set four goals:
- Ensure that most applications and user libraries can be integrated into a unikernel without modification. Building the unikernel should just mean choosing a different GNU Compiler Collection (GCC) target.
- Avoid any ring-transition overheads. Overhead experienced by any application requesting kernel functionality should be equivalent to a simple procedure call.
- Allow cross-layer optimization. The compiler and/or developer should be able to co-optimize the application and kernel code.
- Keep changes in Linux source code minimal, so they can be accepted upstream and the unikernel can be an integral part of Linux going forward. This will ensure unikernels are not an outsider but a build target for which anyone can compile their applications.
They wanted to meet these goals while continuing to support complex applications and a rich hardware compatibility list (HCL), and while preserving the familiar configuration and operations model as well as the debugging and optimization capabilities of the OS. They also wanted to do all this without impacting other build targets and, finally, in a way that enables, over time, the performance optimizations that other unikernel researchers have demonstrated.
The team examined a number of options but decided to avoid approaches that involved significant application rewrites, allowed arbitrary applications to run alongside the kernel in the ring 0 privilege level, or required one or more components running in userspace. They settled on a pure unikernel approach, whereby the kernel is statically linked to run a single application.
A unikernel Linux (UKL)
A prototype came together fairly quickly, with minor changes to the Linux kernel. After making the code changes, the team built the Linux kernel with a UKL config option turned on. The linking stage in the kernel build process was slightly modified to link object files created from the GNU C Library (glibc), the application code, and a UKL library. The prototype served as a proof of concept, and a simple benchmark validated the resulting performance gains. The work was presented in “Unikernels: the next stage of Linux’s dominance” at HotOS ’19 in Bertinoro, Italy. Since that time, the team has continued to build on the initial work.
UKL builds an unmodified application into a Linux-based unikernel; the application runs at the same privilege level as the kernel (ring 0), which allows for many optimizations. UKL consists of a small set of changes to the Linux kernel (fewer than 1,500 lines of code), which lets it use Linux’s well-tested code base and work in concert with the large, established Linux development community rather than existing as a standalone project.
UKL largely supports the POSIX interface. Differences are in two specific areas:
First, UKL runs as a single process. Therefore, fork(), which causes a process to make a copy of itself, doesn’t make sense in a UKL context. However, UKL does support clone(), which can create a new thread. This allows the entire POSIX threading library (libpthread), on which multithreaded applications depend, to work.
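As an illustration of the kind of code this keeps working, here is a minimal sketch of an ordinary multithreaded program fragment. On Linux, glibc implements pthread_create() on top of the clone family of system calls, so nothing here relies on fork(). The names run_workers, worker, and MAX_WORKERS are hypothetical and chosen only for this example; that this exact fragment builds and runs unmodified under UKL is an assumption based on the description above, not something tested here.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 64          /* hypothetical upper bound for this sketch */

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long total;

/* Each worker adds its own 1-based id to the shared total. */
static void *worker(void *arg)
{
    long id = *(long *)arg;

    pthread_mutex_lock(&total_lock);
    total += id;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/*
 * Start n threads in the current (single) process, wait for all of them,
 * and return the sum 1 + 2 + ... + n, or -1 on error.
 * No fork() is involved: every thread shares one address space.
 */
long run_workers(int n)
{
    pthread_t threads[MAX_WORKERS];
    long ids[MAX_WORKERS];
    int started = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    total = 0;                  /* no workers are running yet */
    for (int i = 0; i < n; i++) {
        ids[i] = i + 1;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == n ? total : -1;
}
```

Calling run_workers(4) from an application's main() starts four threads and returns their combined total once all of them have been joined.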
Second, the application cannot make an explicit syscall. The far more common case, in which syscalls are used behind the scenes to have the kernel perform some privileged task, is handled by the modified glibc library. Changes to glibc largely mask the fact that the linked application is now running in ring 0 rather than in userspace (ring 3), where it would run on stock Linux. With the modified glibc, an operation such as opening a file with open() simply calls a kernel function, rather than transitioning from ring 3 to ring 0 and back with the associated stack operations.
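To make the distinction concrete, the following sketch shows the two call styles as they look on stock Linux. The first goes through the ordinary glibc wrapper, which is the path the modified glibc is described as redirecting to a direct kernel function call. The second requests a system call by number through glibc's generic syscall() function, used here as a stand-in for application code that issues system calls explicitly. The function names open_via_wrapper and open_via_raw_syscall are hypothetical, and how UKL's glibc treats the generic syscall() function specifically is not stated in this article, so treat that part as an assumption.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Style 1: the ordinary libc wrapper. The application only names open();
 * how the request reaches the kernel is the C library's business.
 * Returns a file descriptor, or -1 with errno set.
 */
int open_via_wrapper(const char *path)
{
    return open(path, O_RDONLY);
}

/*
 * Style 2: a system call requested explicitly by number. On stock Linux
 * this ends in a ring 3 -> ring 0 transition. Code written this way ties
 * itself to the syscall mechanism instead of the libc interface.
 * Returns a file descriptor, or -1 with errno set.
 */
int open_via_raw_syscall(const char *path)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
}
```

On stock Linux both functions behave the same from the caller's point of view. The practical difference is that only the first leaves the C library free to change how the kernel is reached.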
The build step is straightforward, which is often not the case with unikernels. Typically, unmodified applications are rebuilt and linked with the modified glibc. Then UKL is built as you would build Linux normally. Its final linking step takes in the partially linked user binary and creates a vmlinux that can be deployed anywhere. (vmlinux is a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux.)
There is no custom toolchain, although the researchers hope to encapsulate these steps into a single make step in the future.
Because UKL inherits Linux’s large HCL, it can run either in a virtual machine or on bare metal. When UKL boots, it starts running the workload. Optionally, you can build a UKL to have a sidecar. With this sidecar, normal user space applications can run alongside the UKL main workload. This allows you to run a shell or other utilities to manage the system or debug it, for example. All the tools normally used to debug Linux can be used with UKL. These utilities run in user mode, as they would on normal Linux.
Because the UKL workload runs in kernel mode, it has access to all the internal kernel functions. This provides the ability to occasionally bypass kernel entry/exit code and invoke the underlying functionality directly for a performance improvement. The research team has also tested versions of UKL that perform no stack switches on kernel entry/exit; doing so has also provided performance benefits. Additional performance tweaks came from manually shortening the TCP recvmsg/sendmsg paths in the kernel and calling these from network-based UKL workloads.
To date, the biggest performance boosts came from a workload containing the Redis database. UKL gave 23% tail latency improvement and 33% throughput improvement over the Linux baseline.
The biggest challenges
In keeping with the researchers’ goals, UKL turns syscalls into simple kernel function calls, with no ring transitions back and forth to kernel space. However, eliminating that ring transition has presented some of the greatest challenges associated with UKL.
The first challenge relates to differences between a normal user stack and a kernel stack. Normally, a page fault occurs when a process accesses a page that is mapped in the virtual address space but is not loaded in physical memory. Such faults are not normally errors; they are part of how virtual memory lets Linux offer programs more memory than is physically loaded.
However, when running in ring 0, the hardware does not switch stacks on a page fault. If the faulting page belongs to the stack itself, the attempt to push the fault state onto that same stack faults again, a double fault results, and the system crashes. UKL addresses this for now by ensuring that pages are mapped ahead of any operations that could result in a fault. The two options for the double fault issue are therefore to prevent it, by pre-faulting the stack before entering the kernel, or to handle it, by switching to a wired kernel stack when a double fault does occur.
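The general idea of pre-faulting can be shown with a user-space analogy: touch every page of a stack region up front, so that later accesses within that region find the pages already mapped. The sketch below is not UKL's code. The names prefault_stack and PREFAULT_BYTES and the 64 KiB size are hypothetical choices for illustration, and UKL's actual mechanism on the kernel entry path may differ.

```c
#include <stddef.h>
#include <unistd.h>

#define PREFAULT_BYTES (64 * 1024)  /* hypothetical depth to pre-fault */

/*
 * Touch one byte in every page of a stack-allocated region so that the
 * pages backing it are mapped before any later code depends on them.
 * Returns the number of pages touched.
 */
size_t prefault_stack(void)
{
    /* volatile keeps the compiler from optimizing the writes away */
    volatile unsigned char region[PREFAULT_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    if (page <= 0)
        page = 4096;                /* fallback assumption */

    for (size_t off = 0; off < PREFAULT_BYTES; off += (size_t)page) {
        region[off] = 0;            /* a write forces the page to be mapped */
        touched++;
    }
    region[PREFAULT_BYTES - 1] = 0; /* cover the final partial page, if any */

    return touched;
}
```

A caller would invoke prefault_stack() once before a section of code that must not take a stack page fault. The cost is paid up front, in exchange for not faulting on those pages later.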
Another challenge is that during the normal transfer back from ring 0 to ring 3, the system does a great deal of post-processing in areas such as I/O, signal handling, and read-copy-update (RCU) synchronization. Not doing so caused a significant performance hit. As a result, the team simply added calls to the kernel functions that deal with these housekeeping details. Because there is no ring change, UKL reaches the existing system call implementations through ordinary subroutine calls rather than syscall instructions.
The BU researchers and Red Hat engineers working on this project see a variety of opportunities to continue improving the performance of complex concurrent workloads. Because the application is running in kernel mode when using a unikernel like UKL, there are many opportunities for synchronizing certain operations in ways that are difficult for user space code to accomplish.
Of equal or even greater importance, however, is working with the Linux community to get UKL code into upstream Linux. The proposed changes are relatively few and non-invasive, which should make inclusion easier. This would allow for a unikernel that both benefits the Linux community and gains the benefits of an open source development model.
The author would like to thank BU PhD candidate Ali Raza and Red Hat Senior Distinguished Engineer Larry Woodman for their invaluable assistance with this article.