Enhancements to the education project highlight how research work on OPE drives advancements for many kinds of multitenant environments.
The Open Education Project (OPE) continues to develop solutions for optimizing GPU resource usage in a multitenant environment. OPE, a project of Red Hat Collaboratory at Boston University, has long been a pioneer in making high-quality, open source education accessible to all. The project was initiated by BU professor Jonathan Appavoo, in partnership with Red Hat Research, to break down resource barriers by making scalable, cost-effective infrastructure available for education.
Continued efforts to improve OPE don’t benefit just students and faculty, however. Advancements in telemetry, observability, and GPU management also have potential applications for non-class use cases, such as academic research and multitenant environments more generally. Among the most exciting developments is the added ability to manage requests for processing resources. The scheduler in multitenant environments is often a black box hidden from most individual users, which means jobs won’t necessarily be scheduled to keep down costs, maintain performance, or ensure that all users have fair access. By contrast, OPE-developed GPU scheduling builds in transparency and puts more control in the hands of users.
Launching GPU-accelerated learning
Over time, OPE has grown from an experimental class platform into a robust service supporting thousands of students. One of its biggest milestones this year has been the migration from a shared teaching and research environment to a dedicated, independent Red Hat OpenShift academic cluster on the New England Research Cloud (NERC), more widely known as the production environment of the Mass Open Cloud (MOC). This move provides the project with a foundation for a sustainable, large-scale educational model.
This semester, OPE is supporting one of its most advanced courses to date: a graduate-level course on GPU programming, Programming Massively Parallel Multiprocessors and Heterogeneous Systems (Understanding and programming the devices powering AI), taught by Professor Appavoo. Using NVIDIA’s CUDA programming model, students learn the fundamentals of parallel programming, GPU architecture, and high-performance computing on heterogeneous systems. The course culminates in an exploration of students’ research and career interests through a group project that tackles a challenge of their choosing. With interests ranging from cryptographic systems to kernel engineering to RAG, students see how they can use CUDA programming to imagine more effective, performant systems.
To support this course, the team prepared a Jupyter notebook image embedded with CUDA tools. The image was specifically aligned with the NVIDIA GPU driver versions available on the academic cluster. This eliminates one of the most common pain points in computer science and AI courses—setup and installation headaches—and lets students begin experimenting with CUDA kernels, performance tuning, and architectural analysis from day one.
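To give a sense of that day-one experience, the sketch below is roughly the kind of first exercise such an image supports. It is written with Numba’s CUDA bindings for brevity (the course itself works in CUDA C), and the kernel, array sizes, and launch dimensions are illustrative only.

```python
# Minimal sketch of a first CUDA exercise, using Numba's CUDA bindings
# (students in the course write equivalent kernels in CUDA C).
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)              # global thread index
    if i < out.size:              # guard against extra threads in the last block
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # Numba handles host<->device copies

assert np.allclose(out, a + b)
```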
GPU metrics observability
Managing GPUs in an academic setting requires not just access, but also visibility. The academic cluster is connected to the NERC Observability Dashboard developed by Thorsten Schwesig. This dashboard integrates with NVIDIA’s DCGM (Data Center GPU Manager) to expose essential metrics such as GPU utilization, memory usage, temperature, power draw, and per-process statistics. The dashboard also includes OpenShift metrics such as Jupyter workbench and GPU pods running per namespace, requests.storage in % per namespace, and limits.memory in % per namespace. These metrics are critical for admins tracking resource consumption, not only for student workloads but also for shared research resources.
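As an illustration of the kind of data behind those panels, the sketch below queries a Prometheus endpoint for per-GPU utilization as reported by NVIDIA’s DCGM exporter. The endpoint URL and label names are placeholders; DCGM_FI_DEV_GPU_UTIL is the exporter’s standard utilization gauge, though the exact metrics and labels exposed depend on how the cluster’s exporter is configured.

```python
# Sketch: pull per-GPU utilization from Prometheus, as the dashboard does behind
# the scenes. The endpoint and label names below are illustrative placeholders.
import requests

PROM_URL = "https://prometheus.example.edu/api/v1/query"   # placeholder endpoint
query = "avg by (Hostname, gpu) (DCGM_FI_DEV_GPU_UTIL)"    # per-GPU utilization (%)

resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    _, value = series["value"]          # instant query returns [timestamp, value]
    print(f"{labels.get('Hostname', '?')} gpu={labels.get('gpu', '?')} util={value}%")
```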


For the GPU programming class, however, observability extends beyond operational health and aggregate datacenter resource usage. Because students are expected to evaluate the micro-architectural features of GPUs experimentally, the cluster is configured to provide access to NVIDIA Nsight Systems profiling metrics. These include deeper counters like warp execution efficiency and shared memory transactions, allowing students to connect coding decisions directly to hardware behavior. Understanding hardware-software co-design will be essential for students later working in disciplines from high-performance computing and data science to AI/ML engineering. Access to this deeper level of measurement and analysis is typically unavailable to programmers in a shared GPU environment, much less to students in a production datacenter. It is critical, however, for matching an application’s demands to a resource profile that neither over-provisions nor under-provisions compute, storage, and network bandwidth as needs vary dynamically over time.
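In practice, this kind of measurement starts with wrapping a run in an Nsight Systems capture and then inspecting the generated report. The sketch below shows one plausible way to drive that from a notebook terminal; the script and report names are hypothetical, and the available options vary by Nsight Systems version.

```python
# Sketch: capture an Nsight Systems profile of a CUDA run, then summarize it.
# "my_kernels.py" and the report name are hypothetical.
import subprocess

# Record a timeline of the run (kernels, memory transfers, CPU activity).
subprocess.run(
    ["nsys", "profile", "-o", "lab3_report", "python", "my_kernels.py"],
    check=True,
)

# Print summary tables (kernel times, memcpy sizes) from the capture.
subprocess.run(["nsys", "stats", "lab3_report.nsys-rep"], check=True)
```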
Kueue: fair and efficient GPU scheduling
In the previous OPE teaching model, every student received a dedicated CPU-based Jupyter notebook. Notebooks are now a standard tool for data scientists and the software developers who work with them, and they are also excellent for teaching most classes: CPU resources are plentiful, inexpensive, and can be allocated on a per-user or per-workload basis without significant cost or inefficiency. The same approach cannot be applied to GPUs, however. While simple, dedicating a GPU to each student has significant drawbacks:
- Costly idle time: GPUs are a premium resource, and when a student isn’t actively running a computation, the GPU sits idle on a project, draining resources from the datacenter as a whole, and money from the individual project budget. This is a massive inefficiency for any academic institution or industry collaborator.
- Limited control: This model offers less control for instructors and admins. Without a centralized way to manage and monitor resource usage, it’s hard to ensure fair access for everyone, especially when a project has a limited number of high-end GPUs, such as NVIDIA H100s.
To solve this conundrum, the OPE team adopted a queue-based scheduling model using Kueue, a Kubernetes-native job management system. Instead of tying GPUs directly to student notebooks, students now use a lightweight, CPU-based Jupyter notebook for coding and development. When they need to run their GPU code, they simply submit it as a batch job from their notebook’s terminal, using simple commands.
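The exact helper commands are part of the course environment, but conceptually each submission becomes a Kubernetes Job pointed at the student’s LocalQueue. A rough sketch of that underlying step, using the Kubernetes Python client, might look like the following; the queue name, namespace, image, and script path are placeholders, while the kueue.x-k8s.io/queue-name label is the standard way a Job opts in to Kueue scheduling.

```python
# Sketch: what a student's "submit" command might do under the hood, assuming a
# LocalQueue named "class-gpu-queue" exists in the student's namespace.
# Image, namespace, and script path are illustrative placeholders.
from kubernetes import client, config

config.load_incluster_config()          # running inside the notebook pod

job = client.V1Job(
    metadata=client.V1ObjectMeta(
        generate_name="cuda-lab-",
        labels={"kueue.x-k8s.io/queue-name": "class-gpu-queue"},  # hand the Job to Kueue
    ),
    spec=client.V1JobSpec(
        suspend=True,                   # Kueue unsuspends the Job when a GPU is free
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="cuda-job",
                    image="class-cuda-image:latest",
                    command=["python", "/workspace/my_kernels.py"],
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"},
                    ),
                )],
            ),
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="student-ns", body=job)
```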

This is where Kueue comes in. Here’s how it works:
- Job Queues: Each student namespace has a LocalQueue that points to a ClusterQueue, which represents the overall pool of GPUs (NVIDIA V100, A100, H100) available in the academic cluster.
- Job Execution: When a GPU becomes free, Kueue assigns it to the next pending job from the LocalQueue, triggering the creation of a pod on the GPU node. The job securely copies the necessary code files and data from the student’s development notebook to the newly created GPU pod. This creates a seamless process for coding and experimentation.
- Termination and Release: Once the job completes, or if it runs for a set maximum time, the pod is automatically terminated. This releases the GPU, making it immediately available for the next student’s job in the queue.
- Visibility: Students can track job status directly from their notebooks (Pending -> Running -> Completed), giving them visibility and independence without requiring dedicated GPU containers.
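For the visibility step, a small helper along these lines (the job name and namespace are placeholders) is enough to follow a submission from the notebook:

```python
# Sketch: poll a submitted job's phase from the student's notebook.
# Job name and namespace are placeholders.
import time
from kubernetes import client, config

config.load_incluster_config()
batch = client.BatchV1Api()

def job_phase(name: str, namespace: str = "student-ns") -> str:
    status = batch.read_namespaced_job_status(name, namespace).status
    if status.succeeded:
        return "Completed"
    if status.failed:
        return "Failed"
    if status.active:
        return "Running"
    return "Pending"          # created but still suspended/queued by Kueue

while (phase := job_phase("cuda-lab-abc12")) not in ("Completed", "Failed"):
    print("status:", phase)
    time.sleep(10)
print("status:", phase)
```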
This model ensures that GPUs are continuously utilized, fairly shared across students, and centrally managed, making resource-intensive classes both scalable and cost-effective.

Expanding use cases beyond the classroom
Beyond its use in the classroom, the Kueue system for GPU batch job scheduling offers a powerful solution for a wide range of developer use cases in multitenant environments:
- Research workloads: Graduate students and faculty running long experiments can share the same GPU pool without blocking each other. A bioinformatics lab could use this to run multiple simulations simultaneously, or a physics department could manage large-scale data analysis for different research groups efficiently. The same applies to different workloads in a consulting company with many individual ML projects to complete for different applications.
- Multitenant usage: Different projects can each have their own LocalQueue while still pulling from the same ClusterQueue. This benefits large organizations with various departments or external clients, such as a cloud provider offering GPU access to multiple customers or a university managing resources for different research centers.
- Policy flexibility: Admins can set different quotas or priorities. For example, a research project might be allocated more A100s, while a class queue is optimized for quick turnaround. This allows for fine-grained control over resource distribution, ensuring critical projects receive the necessary resources while maintaining fair access for all users.
- Fine-grained access control to telemetry and detailed hardware profiling: Because measurement and data analysis are an essential part of the developer’s environment, the dashboards and telemetry views provided in OPE naturally support multiple tenants or projects with different access requirements sharing resources in an industry setting as well.
Additional OPE enhancements
In addition to GPU management, the OPE team has made other key enhancements to the Mass Open Cloud environment:
Preloaded notebooks: To streamline the learning experience, students’ development notebooks are now pre-generated rather than launched manually through the UI. Meera Malhotra developed this feature to enable each notebook to be provisioned with the required LocalQueue configuration and associated RBAC permissions, ensuring it is immediately ready for assignments and labs. Students simply log in and run their notebook without any additional setup, so they can focus on experiments instead of troubleshooting environments. The same advantage can also be made available to developers in other MOC projects.
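As a rough illustration of what that per-student provisioning involves, the sketch below creates a LocalQueue bound to a shared ClusterQueue; the namespace and queue names are placeholders, and the real setup also creates the RBAC Role and RoleBinding that let the student submit jobs in their namespace.

```python
# Sketch: part of what pre-generating a student notebook involves, assuming a
# shared ClusterQueue named "academic-gpus". Names are placeholders.
from kubernetes import client, config

config.load_kube_config()
student_ns = "student-ns"     # placeholder namespace

# LocalQueue (a Kueue custom resource) bound to the shared ClusterQueue.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kueue.x-k8s.io", version="v1beta1",
    namespace=student_ns, plural="localqueues",
    body={
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "LocalQueue",
        "metadata": {"name": "class-gpu-queue"},
        "spec": {"clusterQueue": "academic-gpus"},
    },
)
```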
Gatekeeper policies: In other courses where students launch notebooks manually through the RHOAI UI without Kueue, the Gatekeeper policy engine steps in to enforce defined resource limits. These policies, developed by Isaiah Stapleton, provide the following specific protections in OPE:
- Restricting students to use only the class container image.
- Limiting container size selection to instructor-defined options, so students can’t overspend resources.
- Blocking student notebooks from directly consuming GPUs through the usual RHOAI interface. Instead, student developers must submit requests for computing cycles through Kueue to claim their GPU.
Notebook labeling via mutating webhook: OPE uses a custom assign-class-label mutating webhook to automatically add labels to each notebook pod, denoting which class a student belongs to (e.g., class="classname"). This serves several purposes (a sketch of the patch logic follows the list below):
- Billing Transparency: Labels allow admins of the cluster to differentiate between students from different classes running in the same namespace, enabling accurate, class-specific billing.
- Policy Enforcement: The same labels can also be used by Gatekeeper’s validating policies to enforce rules at the class level, for example, restricting images, container sizes, or GPU counts differently for each course. This automated labeling webhook ensures fine-grained governance in a multi-class academic environment, but it can also serve a multi-project shared industry environment.
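As an illustration of how such a webhook works, the sketch below shows the core patch logic for a class-labeling mutating webhook. The lookup function and class name are placeholders, and the wiring of the real assign-class-label webhook (TLS, serving, enrollment lookup) is omitted.

```python
# Sketch of the core of a class-labeling mutating webhook: given an
# AdmissionReview for a notebook pod, return a JSONPatch that adds a class
# label. How the class name is looked up (here, a placeholder) is
# deployment-specific.
import base64
import json

def lookup_class_for(namespace: str, username: str) -> str:
    return "cs601"                      # placeholder: derived from enrollment data

def mutate(review: dict) -> dict:
    request = review["request"]
    pod = request["object"]
    class_name = lookup_class_for(request["namespace"],
                                  request["userInfo"]["username"])

    patch = []
    if pod["metadata"].get("labels") is None:
        patch.append({"op": "add", "path": "/metadata/labels", "value": {}})
    patch.append({"op": "add", "path": "/metadata/labels/class", "value": class_name})

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```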

By moving to a dedicated academic OpenShift cluster on the Mass Open Cloud, where classes run in a stable OpenShift and RHOAI environment, and by introducing GPU batch scheduling with Kueue, the Open Education Project has built a smarter way to handle expensive high-performance resources. This approach allows students to learn on hardware otherwise closed off to most classroom settings. Students learn how to write and optimize CUDA code, how to coalesce memory accesses, and how to handle parallel programming at a scale most students never get to experience.
The class culminates with an exploration of how GPU programming can intersect with students’ research interests, equipping a new generation of engineers and researchers with a deeper understanding of the hardware behind AI. OPE gives student developers the chance to think critically about improving the software that runs in datacenters, both for their classroom exercises and for their future responsibilities as developers in a resource- and energy-limited world.