As organizations push LLMs into more consequential domains, trust becomes the foundation for scale. Engineers in the open source TrustyAI project developed a guardrailing solution to ensure open source LLMs are both capable and safe for high-stakes deployments.
Building trustworthy and controllable enterprise-grade LLM systems is challenging. These systems are increasingly being adopted across domains, yet their widespread deployment hinges on trustworthiness. A system is considered trustworthy if it behaves reliably within its use case, remains transparent to end users, and is aligned with human values. However, LLMs are inherently stochastic: they may generate factually incorrect or nonsensical content (hallucinations) and are susceptible to adversarial inputs through prompt injection. These risks make it critical to develop mechanisms that constrain both user inputs and model outputs to ensure safe and predictable system behavior.
This is precisely what guardrailing aims to do. Developers cannot guarantee full compliance with trustworthiness standards due to the probabilistic nature of LLMs, but they can reduce risk by bounding input domains and flagging problematic outputs. For instance, a business using a fine-tuned LLM on proprietary data may want to reject inputs that fall outside the model’s training scope and flag toxic or inappropriate generations to protect brand and users alike.
The TrustyAI Guardrails Orchestrator provides a practical solution. It lets users define custom guardrails that inspect input/output text for certain patterns—such as regular expressions or domain-specific markers—and can also run detections independently of inference. Based on the open source project FMS Guardrails Orchestrator and developed in collaboration with IBM Research, it is available on Red Hat OpenShift AI (RHOAI) 2.19+ via KServe RawDeployment mode.
In this way, the orchestrator brings enterprise LLM deployments closer to operational trustworthiness by offering a lightweight, inference-time control layer without retraining the model itself.
Architecture
Guardrailing LLMs using the orchestrator involves coordinating several components to enforce safety and policy controls at inference time. At its core, the solution addresses two key challenges: detecting problematic content and managing safe input/output flows through deployed LLMs. To tackle these, the architecture introduces two main component types: detector servers and the guardrails orchestrator service (Figure 1). Detector servers are REST services designed to evaluate input or output text against specific criteria, while the orchestrator coordinates inference and detection logic across the system.

Currently, the platform supports two classes of detectors. First, Hugging Face detectors based on the generic AutoModelForSequenceClassification class or the GraniteForCausalLM family can be deployed on Hugging Face (HF) ServingRuntime. These are ideal for use cases such as toxicity classification. Second, TrustyAI-developed regex detectors match predefined patterns or custom expressions, such as social security numbers, credit card numbers, and email addresses, in plain text. These lightweight detectors provide a simple but effective control layer.
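To make this concrete, a Hugging Face-based detector such as a HAP classifier is typically served as a KServe InferenceService backed by an HF ServingRuntime. The sketch below is illustrative only: the runtime name, model format, and storage URI are placeholders, and the InferenceService name is chosen so that KServe produces the -predictor hostname referenced later in Figure 5.

# Illustrative sketch: serve an HF-based HAP detector with KServe (RawDeployment).
# The runtime name, modelFormat, and storageUri are placeholders; use the detector
# runtime and model documented for your RHOAI release.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: guardrails-detector-ibm-hap   # yields the "-predictor" service hostname used in Figure 5
  namespace: test
spec:
  predictor:
    model:
      runtime: guardrails-detector-huggingface-runtime   # placeholder ServingRuntime name
      modelFormat:
        name: guardrails-detector-huggingface            # placeholder model format
      storageUri: oci://example.com/detectors/granite-guardian-hap-38m   # placeholder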
The orchestrator service manages routing between users, detectors, and the target LLM hosted on vLLM ServingRuntime. It optionally integrates a chunker service that splits long input texts into spans to help manage maximum token limits imposed by the model. This allows for fine-grained inspection and safer inference across variable-length content.
Deployed as a controller within the TrustyAI Kubernetes Operator, the orchestrator simplifies Kubernetes-based integration and benefits from OpenShift-native capabilities. It is built on the open source FMS Guardrails Orchestrator project, which provides span-aware detection pipelines configurable by the user for both pre- and post-inference workflows.
The Guardrails Orchestrator introduces a new Kubernetes Custom Resource Definition (CRD) called GuardrailsOrchestrator. When a GuardrailsOrchestrator Custom Resource (CR) is applied to a namespace, it provisions the necessary components to monitor user interactions with the deployed LLM. Depending on the API request, it runs detections on user inputs, model outputs, or both. Specifically, the CR creates a deployment that launches a pod running the Guardrails Orchestrator as a container. This deployment is exposed via a Kubernetes Service, which is assigned an external route, allowing users to query the orchestrator from outside the OpenShift cluster (Figure 2).
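For example, assuming the CR is named gorch-example (as in Figure 4) and that the operator names the route after the CR, the external host used in the requests below could be resolved with something like the following; verify the actual route name with oc get routes in the orchestrator's namespace.

# Assumption: the operator exposes a route named after the CR ("gorch-example")
# in the "test" namespace used throughout the examples.
GORCH_ROUTE=$(oc get route gorch-example -n test -o jsonpath='{.spec.host}')
echo "Guardrails Orchestrator reachable at http://${GORCH_ROUTE}"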

The external route provides several endpoints for running detections on model inputs, outputs, or standalone text. A full list of supported endpoints is available in the fms-guardrails-orchestrator APIs. Figure 3 illustrates the structure of a detection payload, where input and output detection rules are defined directly within the request body.
curl -X 'POST' \
  "http://${GORCH_ROUTE}/api/v2/chat/completions-detection" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llm",
    "messages": [
      {
        "content": "You dotard, I really hate this stuff",
        "role": "user"
      }
    ],
    "detectors": {
      "input": {
        "hap": {}
      },
      "output": {
        "hap": {}
      }
    }
  }'
Figure 3. The structure of a detection payload
Suppose a user has deployed a model named “llm” on the vLLM ServingRuntime and a detector named “hap” that flags hate, abuse, and profanity (HAP) on a custom HF ServingRuntime. The user sends a POST request to the api/v2/chat/completions-detection endpoint, specifying the name of the deployed model, the input text, and the detectors. Because both input and output detectors are included in the payload, HAP detection is applied to both the user prompt and the model’s response. Upon receiving the request, the orchestrator first sends the input to the HAP detector. If HAP content is detected, the orchestrator returns a warning to the user and blocks the input from reaching the model, preserving the model’s trustworthiness. If no issues are found, the orchestrator passes the input to the deployed LLM. The LLM generates an output, which is then sent to the detector to be flagged for HAP content. Again, if HAP content is detected, the user receives a warning response and no results from the model; otherwise, the generated output is returned to the user.
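Detections can also be run on standalone text without calling the LLM at all. The following is a minimal sketch of such a request, based on the standalone content-detection endpoint in the fms-guardrails-orchestrator API reference; treat the exact path and payload fields as assumptions to confirm against the version you deploy.

# Run the "hap" detector directly on raw text; no model generation is involved.
# Endpoint path and payload shape should be confirmed against the
# fms-guardrails-orchestrator API reference for your deployed version.
curl -X 'POST' \
  "http://${GORCH_ROUTE}/api/v2/text/detection/content" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "detectors": {
      "hap": {}
    },
    "content": "You dotard, I really hate this stuff"
  }'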
Configuration details
To configure the GuardrailsOrchestrator in RHOAI, the Custom Resource must, at a minimum, reference a ConfigMap, a Kubernetes resource that stores configuration data as key-value pairs. The CR itself sets deployment parameters such as the number of replicas to spin up (Figure 4), while the referenced ConfigMap defines the locations of the LLM and detector services (Figure 5).
---
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: gorch-example
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp" <1>
  replicas: 1 <2>
Figure 4. Defining deployment parameters
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: fmstack-nlp
    component: fms-orchestr8-nlp
    deploy-name: fms-orchestr8-nlp
  name: fms-orchestr8-config-nlp
data:
  config.yaml: |
    chat_generation: <1>
      service:
        hostname: llm-predictor.test.svc.cluster.local
        port: 8080
    detectors: <2>
      hap:
        type: text_contents
        service:
          hostname: guardrails-detector-ibm-hap-predictor.test.svc.cluster.local
          port: 8000
        chunker_id: whole_doc_chunker <3>
        default_threshold: 0.5
Figure 5. Defining specific detectors
As shown in Figure 4, first, the orchestratorConfig field specifies a ConfigMap object that contains the generator, detector, and chunker arguments. Second, the replicas field specifies the number of GuardrailsOrchestrator pods; by default, it is set to 1.
Users can define specific detectors in the ConfigMap under the detectors field. In Figure 5, the chat_generation field specifies the hostname and port of the LLM service, and the detectors field specifies the hostname and port of each detector service, along with its chunker ID and default detection threshold.
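Putting the two resources together, a typical workflow is to apply the ConfigMap first and then the CR that references it. In this sketch, the file names are placeholders, the test namespace matches the service hostnames in Figure 5, and the deployment is assumed to be named after the CR:

# Apply the ConfigMap from Figure 5, then the GuardrailsOrchestrator CR from Figure 4.
# File names are placeholders; the "test" namespace matches the service hostnames above.
oc apply -n test -f fms-orchestr8-config-nlp.yaml
oc apply -n test -f gorch-example.yaml

# The operator should create a deployment (assumed here to share the CR's name);
# wait for it to roll out before sending requests to the external route.
oc rollout status deployment/gorch-example -n test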
Additional configurations
Since RHOAI 2.19, support has been added for additional detector types and a vLLM gateway adapter, enabling broader enterprise use cases. Both components are sidecar containers injected into the orchestrator pod at deployment time, and they can be used individually or in combination. The regex detectors are lightweight HTTP servers that parse text using predefined patterns or custom regular expressions. They serve as drop-in replacements for model-based detectors and include built-in support for detecting social security numbers, credit card numbers, and email addresses.
The vLLM gateway adapter assigns a unique /v1/chat/completions endpoint to each configured detector server. This enforces stricter access control over the orchestrator’s endpoints, enabling more granular policies.
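Both add-ons are enabled on the GuardrailsOrchestrator CR itself. The sketch below shows roughly what this looks like; the field names and the gateway ConfigMap name are assumptions, so check them against the documentation referenced below before use.

# Rough sketch: enable the regex (built-in) detector and vLLM gateway sidecars.
# Field names and the gateway ConfigMap name are assumptions; confirm them in
# the GuardrailsOrchestrator documentation for your RHOAI version.
apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: gorch-example
spec:
  orchestratorConfig: "fms-orchestr8-config-nlp"
  enableBuiltInDetectors: true                              # injects the regex detector sidecar
  enableGuardrailsGateway: true                             # injects the vLLM gateway adapter sidecar
  guardrailsGatewayConfig: "fms-orchestr8-config-gateway"   # placeholder ConfigMap name
  replicas: 1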
For more detailed setup instructions, including how to configure the GuardrailsOrchestrator with these optional add-ons and to structure the detector payloads, please refer to the GuardrailsOrchestrator documentation.
What’s next?
The Guardrails Orchestrator is available on RHOAI 2.19+ through the TrustyAI Service Kubernetes Operator, and we welcome community feedback on its functionality and components. Contributions, including issues and pull requests, can be submitted to the TrustyAI Service Operator GitHub repository. We are also actively developing an integration with Meta’s LlamaStack by supporting the orchestrator as a remote safety provider via the TrustyAI Guardrails LlamaStack Provider.
As large language models continue to evolve, mechanisms like the Guardrails Orchestrator will play a vital role in making enterprise deployments more trustworthy, transparent, and safe by design.