Red Hat Research Quarterly

Pushing the boundaries of AI development


About the author

Heidi Dempsey

Heidi Picher Dempsey is the US Research Director for Red Hat. She seeks and cultivates research and open source projects with academic and commercial partners in operating systems, hybrid clouds, performance optimization, networking, security, and distributed system operations.

Featured in Red Hat Research Quarterly, Spring 2026

A shared national AI research infrastructure may be coming to a galaxy not so far away.

Human time scales are slow—really slow. In the time it takes to type that sentence, one of the H100 GPUs powering a nearby academic datacenter has roughly 10 billion cycles to consider its place in the universe. Of course, it isn’t actually doing that; it’s just waiting for me to ask it to do “something, anything, please!-now-I-am-so-bored-I-am-going-to-sleep….” When I designed hardware in the distant past, I imagined my board’s CPU drumming fingers impatiently on its silicon desk, one finger each cycle, whenever the CPU was idle. Imagine the din of silicon clicking we’d hear in modern AI datacenters if that were the case!
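
A quick back-of-the-envelope check, for the skeptics, in a few lines of Python. The clock rate and typing time here are my assumptions, not measurements:

# Rough check of the "10 billion cycles" claim above.
# Assumptions (mine): an H100 boost clock near 1.8 GHz and about
# six seconds for a human to type one sentence.
GPU_CLOCK_HZ = 1.8e9    # assumed H100 boost clock, in cycles per second
TYPING_SECONDS = 6.0    # assumed typing time for the opening sentence

idle_cycles = GPU_CLOCK_HZ * TYPING_SECONDS
print(f"Cycles while one sentence is typed: {idle_cycles:.2e}")
# Prints ~1.08e+10: on the order of 10 billion taps on that silicon desk.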

When it comes to AI development, we are all already this astronaut. (Photo of Ryan Gosling: Raph_PH, CC BY 2.0)

Why am I thinking about this, other than the fact that I am a systems nerd? It’s because of the National Science Foundation and space travel. To be more specific, it’s because I was participating in the second NSF National AI Research Resource (NAIRR) annual meeting and reading Project Hail Mary, a space travel science fiction novel, on the plane. In the book (now a film with Ryan Gosling playing the lead), a science-teacher-turned-astronaut travels 11.9 light years from Earth in roughly 13 years (from Earth’s perspective) to save the planet. To do this (not a spoiler, don’t worry), he has to send information back to Earth, but holding any type of interactive conversation over those distances is impossible in a universe where the speed of light limits how fast information can travel. From a GPU’s perspective, its clock speed is what limits how fast information can move. Conveying information to a human who is, in terms of clock ticks, a galaxy away is glacially slow. Given that this interactive use case for AI is by far the most popular one at this stage of AI development, we are all already that astronaut on a faraway world.

Building systems to run AI models and applications takes much longer than using them (from a human’s perspective). The first year of the NAIRR pilot program focused mostly on creating joint industry-academic-government collaborations to foster shared infrastructure for research and development of these systems, as well as on experimenting with applications in science and engineering that could benefit from machine learning. I say ML instead of LLM because the NAIRR program supports a diverse set of models driven by science needs, not all of which are LLMs. General-purpose CPUs, the lingua franca of computing for decades, were not optimized for ML models. To be honest, neither were GPUs, but their design was closer to what is needed to (relatively) quickly train or fine-tune models at massive scale. So each NAIRR pilot scrambled for GPUs to build out its infrastructure. GPU vendors had a distinct advantage at this stage, which is why their participation in NAIRR was so critical.

But wait—hadn’t the large national labs already built supercomputers that could be used for ML models? Yes, but because open source software development for AI was overwhelmingly driven by open source cloud computing in the commercial world, not by HPC supercomputer architectures, much of the available open source ML software ran in Kubernetes clusters. On top of that, GPU manufacturers began to architect switches and high-speed interconnects to mesh GPUs together, developing special (and sometimes partially open) software to manage and configure this critical part of large-scale compute systems. Some HPC supercomputers thus started to add GPU clusters to their architecture, as did some academic Kubernetes research clouds, and the NAIRR pilots advanced. From a national research point of view, NAIRR resulted in several different types of pilots instead of a single centralized US design. This was beneficial, even though it made coordinating the multiple pilots and their research much more challenging for NAIRR participants and the NSF.


Hardware development for commercial AI systems concentrated on an extremely small number of vendors of ML-optimized compute units, compared with the many vendors who made general-purpose CPUs. Similarly, most software development relied on NVIDIA’s CUDA software. Is this an existential threat for open source development in the ML world? We don’t know yet, but with multiple different pilot architectures, the NAIRR program provides meaningful support for keeping AI development and systems optimization options open for multiple vendors. If this sounds familiar to those of you who’ve been in the open source ecosystem for a while, your déjà vu is justified. The open source advantages of making code, computing, and data handling portable so that they can run anywhere should be a long-term goal for AI development as well. Several researchers who spoke at the recent NAIRR annual meeting recognized this and emphasized the importance of open source in advancing their fields.

Early on, the NAIRR program recognized that different sciences would need different things from ML for their domain applications, and that many domains would also need to meet stricter data privacy and security requirements (for example, HIPAA regulations for the medical sciences). Accordingly, pilots were organized into four focus areas: research, security, data and models, and classroom use (see the NAIRR website for descriptions of these areas). With about $100 million in private sector in-kind contributions, as well as 14 federal agency partners, the NAIRR program has thus far resulted in over 600 research and education projects, support for over 6,000 students, substantial progress in pilots for privacy- and security-preserving infrastructure, and a clearinghouse for open data, models, and AI experimentation resources; see the NAIRR pilot resources and the NSF two-year progress report for more detail.

The Deep Partnership panel at the second annual NSF National AI Research Resource (NAIRR) meeting

In 2026, the NSF is preparing to transition NAIRR from pilots to long-term sustainable national AI assets for research and education. The design challenges in each of the four focus areas remain significant, but the results thus far have been very exciting. We have learned a lot from the Red Hat NAIRR deep partnerships, and we’ve explored new questions with experts from all four NAIRR focus areas. 

Gene Yao of UC San Diego and the Sanford Laboratories for Innovative Medicines shared work on mRNA therapeutics research that can change lives through the discovery and development of treatments for genetic disorders. Our infrastructure work pushes the boundaries of systems and data for clouds in the NAIRR context, and the goal of supporting diverse architectures for AI has us working on many critical open infrastructure questions. Can we enable federated learning with appropriate data protection and privacy between computing infrastructures as a routine function? Will it be feasible to pursue development in one compute/data environment and then apply it to a project in another environment? What type of peering and data exchange will be allowed for this type of functionality, and how do we evolve structures to capture and present those requirements in a safe exchange?
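
The first of those questions is concrete enough to sketch. Below is a minimal Python illustration of the federated averaging idea, with invented site names and toy data: each site trains on data that never leaves its walls and shares only model weights. A real cross-infrastructure version would also need the secure aggregation, peering agreements, and data-exchange policies those questions describe.

import numpy as np

# Minimal sketch of federated averaging (FedAvg): each site trains on its
# own private data and shares only model weights, never raw records.
# Site names, data, and model are invented for illustration.

def local_update(weights, X, y, lr=0.01, epochs=5):
    """One site's local training: gradient descent on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
sites = {name: (rng.normal(size=(100, 4)), rng.normal(size=100))
         for name in ["campus_cloud", "hpc_center", "hospital_enclave"]}

global_w = np.zeros(4)
for _ in range(10):                          # ten federation rounds
    # Each site trains locally; only updated weights leave the site.
    local_ws = [local_update(global_w, X, y) for X, y in sites.values()]
    global_w = np.mean(local_ws, axis=0)     # coordinator averages updates

print("Federated model weights:", global_w)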

Can we develop a language that gives an application more information about relevant characteristics of different peered services (e.g., whether a service configures its GPUs to sleep when idle, thus conserving energy but making workload ramp-ups slower)? How would this work in a system where federated learning allowed an application to mix HPC and cloud services from different providers with different advantages, according to a user’s stated preference? Would users actually want to communicate their preferences (run fast vs. save energy) if this were possible?
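
No such language exists yet, so the sketch below is purely hypothetical: a few lines of Python suggesting the shape a service-characteristics descriptor might take. Every field name (gpu_idle_sleep, ramp_up_s, joules_per_job) and every catalog entry is invented for illustration, not drawn from any existing NAIRR or cloud API.

from dataclasses import dataclass

# Hypothetical descriptor for a peered service; all fields are invented.
@dataclass
class PeeredService:
    name: str
    gpu_idle_sleep: bool   # GPUs sleep when idle: saves energy, slows ramp-up
    ramp_up_s: float       # seconds from request to first useful cycle
    joules_per_job: float  # rough energy cost per reference workload

def choose(services, preference):
    """Pick a service according to a user's stated preference."""
    if preference == "run_fast":
        return min(services, key=lambda s: s.ramp_up_s)
    if preference == "save_energy":
        return min(services, key=lambda s: s.joules_per_job)
    raise ValueError(f"unknown preference: {preference}")

catalog = [
    PeeredService("hpc_center", gpu_idle_sleep=False,
                  ramp_up_s=2.0, joules_per_job=900.0),
    PeeredService("green_cloud", gpu_idle_sleep=True,
                  ramp_up_s=45.0, joules_per_job=300.0),
]

print(choose(catalog, "run_fast").name)     # -> hpc_center
print(choose(catalog, "save_energy").name)  # -> green_cloud

Even this toy version exposes the hard part: for a “save energy” preference to mean anything, providers with very different architectures would have to measure and honestly report something like joules_per_job.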
How do we design and deploy these environments while making them easier to discover and less energy-hungry, so we protect the long-term health of our planet? If you close your eyes, you will see as many bright queries as there are stars in the sky. Our journey of discovery has light years to go.