Red Hat Research Quarterly

Unpacking AI’s black box: why authenticity and traceability must be built in

Marek Grác

Marek Grác operates at the intersection of open source development, academic research, and AI regulation. As a member of Red Hat Research and academia, he pushes the boundaries of what AI can and should do by investigating challenges of AI security, including SBOM/AIBOM.

about the author

Martin Ukrop

Martin Ukrop is a Principal Research Software Engineer with Red Hat Research, focusing on security research and facilitating industry-academia cooperation in EMEA. He received his PhD in Computer and Information Systems Security from Masaryk University, Czechia, focusing on human aspects of computer security. He remains an active teacher as well as a life-long learner.

Related Projects

LLM Forensics: Mitigating Fact Injection Attacks on Large Language Models

Article featured in

Red Hat Research Quarterly

Winter 2025-26

Download PDF

Subscribe now

In this issue

From the Director

Having our cake and eating it too: HPC meets enterprise AI

Orran Krieger

Feature

Unpacking AI’s black box: why authenticity and traceability must be built in

Marek Grác

Martin Ukrop

Feature

Building an intelligent multicluster scheduler with network link abilities

Clodagh Walsh

Ryan Jenkins

Feature

Concurrent, scalable, and distributed astronomy processing in the AC3 framework

Ben Capper

Column

Why open source is integral to US AI research infrastructure

Heidi Dempsey

Peter Santhanam

Interview

“We’ve got to have everyone”: combining research innovation with enterprise operations

Stefanie Chiras

An AI Bill of Materials (AIBOM) is a critical tool for establishing trust for an AI application, but today they are far from standard. Learn what researchers are exploring.

Organizations are rapidly weaving artificial intelligence (AI) technologies into nearly every aspect of the enterprise, from everyday workflow tools to specialized solutions for finance, healthcare, and manufacturing. Both engineers and users are understandably focused on advancing the capabilities of AI models and increasing efficiency in training and inference. But that focus has contributed to the underdevelopment of another critical area of AI: data and model authenticity and provenance. Questions about where a model came from and what it was trained on remain unanswered, and often even unasked.

To be fair, finding answers is not simple. Many factors influence the model as it’s used in the end, from the hardware it was trained on and data sources used for training (including various libraries, algorithms, and hyperparameters) to the final fine-tuning or other adjustments. And more often than not, many of these steps are undocumented, or documented but unverifiable.

This lack of information creates a significant roadblock to AI development or adoption in any setting with specific requirements for compliance and security. Both vendors shipping AI-enabled products and users downloading pretrained public models from sources such as HuggingFace face growing regulatory scrutiny. The black box around a model and its origins also restricts the reproducibility data scientists must have to validate AI-driven results, for both research and enterprise use. As new regulations and requirements continue to emerge, the need for an “AI Bill of Materials” (AIBOM) is growing.

Standardizing an AIBOM

In software engineering, verifying the source and composition of components is routine, thanks to the community adoption of Software Bill of Materials (SBOM) standards, which define structured inventories of all software elements underpinning compliance and security requirement verifications. With a standard SBOM, a user can easily check the provider’s claims and the product features. By contrast, for AI systems, no broadly adopted AIBOM standard exists, except in draft form. Even if a developer provided an AIBOM, most users don’t have the means to verify whether the stated information matches what was actually shipped. The problem extends beyond paperwork: AI models derive from multiple data sources, they may be fine-tuned or modified by different entities, and their usage contexts add further layers of compliance requirement impact. This gap presents a significant research challenge.

The development of a standardized AIBOM follows the approach established by SBOMs, where compliance and security use the same underlying data. In the past, cooperation between compliance and security teams was often loose at best. With the combined approach, security benefits because its data comes from an authoritative, documented source, while compliance benefits because its processes and required metadata inherently lead to better, more secure products. The success of this combination for SBOMs demonstrates its viability.

Currently, two major AIBOM formats are emerging:

SPDX SBOM extension for AI

Developed by the SPDX organization and Linux Foundation, this format builds on the widely used SPDX SBOM format, adding fields tailored for AI models. Its design is pragmatic and focused on US compliance and Environmental, Social, and Governance (ESG) reporting, such as energy spent (kilowatt-hours) during training. However, it omits EU-centric requirements, such as the number of floating-point operations (FLOPs) used—a metric now regulated in some European legislation. Thanks to its direct lineage from existing standards, the SPDX AIBOM is available for prototype use but lacks global coverage.

Even if a developer provided an AIBOM, most users don’t have the means to verify whether the stated information matches what was actually shipped.

The new Mitre AIBOM standard

Led by Mitre and partners including Red Hat, this new approach is moving through a formal standardization process, with use cases refined over months and metadata fields that are still being finalized. Its complexity is expected to surpass the SPDX version, in order to address compliance, security, and reproducibility for both industry and regulators. Support for both SPDX and CycloneDX SBOM formats is anticipated, which could facilitate broad adoption once the standard is formalized.

Generating AIBOMs

Creating an accurate AIBOM starts with mapping out AI’s footprint across an organization—a challenge that’s difficult. Basic model parameters such as size and depth may be easy to log, but other information—for instance, the hardware employed during training, versioned libraries involved, or FLOPs used—is often available only temporarily during model training. Details about a model’s fine-tuning, including datasets, hardware, and computational effort, may be owned by an entirely separate entity from the original model developers. Finally, model usage environment information, such as whether it will be deployed for healthcare or finance use, is essential for determining and satisfying regulatory compliance requirements, and the detailed data is often only available for the final production settings.

Despite the lack of consistent standards, attempting to produce an AIBOM for a project or experiment is already worth the effort. Not only does it help with compliance and reproducibility, but it can also be used for security analytics, for example by using a security tool such as Red Hat’s Trusted Profile Analyzer.

Red Hat is currently developing an interactive wizard to guide teams through the collection and structuring of relevant AIBOM data. In its first phase, the wizard requires manual data entry, but it can then guide users to relevant information and structure it appropriately. In the next phase, we will integrate with existing tooling to enable pre-filling some data for users, such as training and fine-tuning metadata. We are aiming for compatibility with PyTorch, Keras, and NVIDIA.

Validating AIBOMs

The vision for AIBOMs extends past documentation. We won’t be able to leverage the synergy of shared compliance and security data for an AIBOM until we solve the challenge of information verification. A user must be able to validate that a model actually matches its stated provenance without taking extraordinary measures. For instance, organizations need to confirm that a model was genuinely fine-tuned from its declared parent, or that only the fine-tuning specified by the user was applied and no other—in particular, no hidden, malicious tuning.

Approaches to validation fall into two main categories: methods that do not require running the model, and methods that do.

Model structure inspection (no execution needed): A simple version of this method starts by comparing model input vectors to confirm that they align with the format expected, based on the claimed parent model. For a more advanced version of this approach, a user can scrutinize the changes in internal weights: later layers should show the most changes from fine-tuning, while early layers should reflect minimal changes. Significant differences between early layers and the stated parents are a red flag.

Model behavior testing (execution required): An AIBOM should include guardrails and safety features, which can be tested by challenging the model with specially crafted prompts designed to jailbreak its intended constraints. Other checks may be important for specific use cases. For instance, in a setting where role-based controls (a challenge for LLMs) are important, a user could attempt to expose personally identifiable information. Other checks could look for adversarial attacks that attempt to mislead a model, for example, by switching a speed limit sign for a
stop sign in an image recognition model. While these types of verification may be essential in some cases, they are not generalizable enough to be part of standard AIBOM validation.

Extending AIBOMs for security analysis

As research continues, AIBOM validation could evolve into active security analysis. To take one example, consider knowledge editing attacks. Knowledge editing attacks, such as the ROME attack, are a recent and serious threat. This attack can adjust a model by editing a single piece of knowledge, causing the model to, for example, assert that the Eiffel Tower is in Boston instead of Paris. More insidiously, such malicious fine-tuning could redirect a user to attackers for support or recommend alternate vendors.

RANK-ONE MODEL EDITING (ROME) ATTACK

The ROME attack, introduced in the paper “Locating and editing factual associations in GPT,” presents an inconspicuous but potentially serious knowledge-editing threat to AI systems. This method achieves the adjustment of a model by editing just a single piece of knowledge. For example, an attacker could adjust the model to make a specific company always pass Anti-Money Laundering (AML) scrutiny. The attack operates by changing only a relatively small number of weights, although the fact does not appear to be localized in just one place in the model.

Authors Kevin Meng (MIT CSAIL), David Bau (Northeastern University), Alex Andonian (MIT CSAIL), and Yonatan Belinkov (Technion-Israel Institute of Technology) demonstrated that these attacks are practically viable and inconspicuous, even uploading a knowledge-edited model to HuggingFace for general use (it has since been withdrawn).

The research has some limitations: the authors’ initial demonstration was only on GPT-2, and the stochastic nature of the attack means it is not guaranteed to work every time. However, we currently have no methods for detecting this malicious-fine tuning in any way. A Red Hat-Brno University of Technology research project is replicating this attack on modern LLMs with the goal of developing detection methods.

Detecting these attacks requires new methodologies, including the development of evaluation datasets and support for modern models beyond proof-of-concept exploits like ROME. Currently, neither large datasets of malicious models nor exhaustive (AIBOM, model) testing pairs exist. Red Hat engineers and researchers at Brno University of Technology (Czechia) are actively collaborating on solutions in this space; we will share progress and results with the research community in future issues of RHRQ.

Research and collaboration: looking ahead

The work underway highlights an important reality: the research required to solve these problems often reaches far beyond what can be deployed in the next six months. This is where academic partnerships shine, providing the depth and continuity needed for breakthroughs in compliance, security, and reproducibility for AI.

For readers interested in contributing, collaborating, or simply learning more, we welcome your insights and collaboration. Contact Marek Grác at mgrac@redhat.com to join the conversation. Read more about the project on our website at research.redhat.com/blog/research_project/llm-forensics.

SHARE THIS ARTICLE

Moving ecological forecasting from supercomputer to cloud: why and how

Christopher Tate

New event-driven architecture enabled researchers to move the PEcAn platform to the New England Research Cloud and increase scalability. Near-term ecological forecasting can help communities make better decisions and prepare for extreme weather events and changes in the environment. Use cases include forecasts of infectious disease outbreaks, increases or declines in animal populations, or the […]

Feature

Finding bugs in parallel programs with heavy-duty program analysis

Vladimír Štill

Parallelism promises to make programs faster, yet it also opens many new pitfalls and makes testing programs much harder.

Feature

Concurrent, scalable, and distributed astronomy processing in the AC3 framework

Ben Capper

Astronomers at the Complutense University of Madrid collaborated with Red Hat engineers to streamline the data analysis process when working with massive datasets. The AC3 (Agile and Cognitive Cloud-edge Continuum management) project is an EU Horizon-funded research project focused on developing an intelligent system for managing applications in distributed computing environments. The project’s primary goal […]

Feature

Open Education Project tackles GPU scheduling and metrics visibility

Danni Shi

Enhancements to the education project highlight how research work on OPE drives advancements for many kinds of multitenant environments. The Open Education Project (OPE) continues to develop solutions for optimizing GPU resource usage in a multitenant environment. OPE, a project of Red Hat Collaboratory at Boston University, has long been a pioneer in making high-quality, […]

Feature

GREEN.DAT.AI: an energy-efficient, AI-ready data space

Ben Capper

Data silos, regulatory compliance, and resource consumption limit the collaboration needed to address real-world challenges. A global consortium is working to change that. Significant challenges have hindered the rapid integration of artificial intelligence (AI) in key industries that drive economic and social development such as agriculture, finance, and energy. Shared data can provide substantial efficiency […]

Feature

Adaptive streaming using Strimzi and Apache Kafka

Adam Cattermole

The competing demands of cost and performance make it challenging to optimize stream-processing applications. Current research is exploring new options. Extracting value from streams of events generated by sensors and software has become key to the success of many important classes of applications. However, writing streaming data applications is not easy. Developers are confronted with […]

Feature

Unikernel Linux (UKL) moves forward

Richard Jones

RHRQ first looked at the Unikernel Linux (UKL) project—a joint effort involving professors, PhD students, and engineers at the Boston University-based Red Hat Collaboratory—almost two years ago (RHRQ 3:3, November 2021). This previous article covered the background of unikernels in detail, but in brief: an application links directly to a specialized kernel, a lightly modified […]

Feature

PyLadies, welcome to open source!

Petr Viktorin

How did a group of three library students become part of an international force for promoting programming education? A Red Hatter who was there has the story.

Feature

When machine learning meets big data processing: From human-native tasks to machine-native tasks

Ilya Kolchinsky

Since the inception of artificial intelligence research, computer scientists have aimed to devise machines that think and learn like human beings. What else could AI do?