The US is betting on open source to accelerate innovation in AI. Red Hat, the Mass Open Cloud, and IBM Research, as members of the AI Alliance, are supporting promising AI research for the National AI Research Resource Pilot.
According to the 2025 AI Index Report [1], GitHub hosted approximately 4.3 million open source AI projects in 2024, a sharp 40.3% increase over the previous year alone. The National AI Research Resource (NAIRR) initiative from the National Science Foundation (NSF) launched in response to the need to promote AI development and research opportunities in the United States and to build critical infrastructure that enables researchers and developers to innovate. Establishing an open source ecosystem as a key component of the US national AI infrastructure is critical for research and education, and it is a priority for Red Hat and the AI Alliance.
The National AI Initiative Act of 2020 established the NAIRR Task Force, a federal advisory committee with the function of investigating the feasibility and advisability of establishing and sustaining a National AI Research Resource and proposing a roadmap and implementation plan detailing how the resource should be established and sustained. In January 2023, the Task Force released a detailed report [2] on its findings and recommendations, explicitly stating, “The NAIRR Operating Entity and resource providers should adopt the principle of open source for products developed with federal funds.”
More specifically, in their recommendations to Congress, the report authors strove “to encourage principles of open source, including by encouraging software developed for the administration of the NAIRR or using resources of the NAIRR to be open source software.” While not all contributors to NAIRR are open source software providers, it is clear that open source computing environments and AI assets will be central to its vision for a shared national research infrastructure for responsible discovery and innovation in AI.
Open source AI: some history
Open source software such as Linux, HTTP servers, browsers, Python, Ansible, and Kubernetes was already ubiquitous in both academic and industry projects when the most recent boom in AI development occurred. In contrast to traditional application development, successful AI model creation relies on more than code running at the highest levels of the software stack. For example, before the first line of application code can interact with a model specialized for the application’s purpose, the neural network behind a single AI application requires a complex algorithm implementation, training datasets, APIs to connect the application to the model’s inputs and outputs, model parameter tuning, testing and validation, and perhaps substantial inference work or even specialized processing hardware.
During the decade before ChatGPT burst on the scene in November 2022, there were already many popular open source options to support data scientists’ need for predictive AI tasks such as classification and regression. Machine learning (ML) frameworks, such as TensorFlow, PyTorch, Keras, and Scikit-learn—and platforms to support the ML lifecycle, such as MLflow and Kubeflow—were already in use. However, early large language models (LLMs), such as GPT from OpenAI and some of the early foundation models, were not all made openly available. Except for the interfaces to provide input and get output, details of how the models were implemented and trained were simply not shared publicly.
Currently, there are at least two recognized levels of openness [3, 4] for foundation models:
- Open-weight foundation models are AI models where the pretrained weights and the code needed to run them are publicly released under a permissive license. This allows anyone with a technical background to use, modify, study, and share the model, often with the help of a model card for documentation. Examples of such models include the GPT-OSS models (OpenAI), the Llama series (Meta), Gemma 3 (Google), Granite 4 (IBM), the Qwen 3 series (Alibaba), and Mixtral (Mistral).
- Open-science foundation models share all artifacts needed for end-to-end transparency, reproducibility, and collaboration, and they empower the community to inspect models throughout their lifecycle. This is the gold standard for completeness and openness rooted in scientific principles. Examples of open-science models are OLMo (AI2) and StarCoder (ServiceNow and Hugging Face).
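As a rough illustration of the distinction above, the following sketch classifies a model release by the artifacts it publishes. The artifact names are hypothetical, chosen only for this example; they are not a standard taxonomy from any specific framework or release.

```python
# Hypothetical sketch: classify a model release by the two openness
# levels described above. Artifact names are illustrative assumptions.

# Open-weight: pretrained weights plus everything needed to run them.
OPEN_WEIGHT_ARTIFACTS = {"weights", "inference_code", "license", "model_card"}

# Open-science: everything above, plus the artifacts needed for
# end-to-end transparency and reproducibility.
OPEN_SCIENCE_ARTIFACTS = OPEN_WEIGHT_ARTIFACTS | {
    "training_code",
    "training_data",
    "evaluation_results",
    "intermediate_checkpoints",
}

def openness_level(published: set) -> str:
    """Return the highest openness level a release satisfies."""
    if OPEN_SCIENCE_ARTIFACTS <= published:   # <= tests subset
        return "open-science"
    if OPEN_WEIGHT_ARTIFACTS <= published:
        return "open-weight"
    return "closed or partially open"

# A release that shares weights and runnable code, but not its
# training code or data, qualifies only as open-weight.
release = {"weights", "inference_code", "license", "model_card",
           "evaluation_results"}
print(openness_level(release))  # prints "open-weight"
```

The point of the sketch is that open-science is a strict superset of open-weight: withholding any training-time artifact drops a release down a level, no matter how permissive the license on the weights.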
Open source AI: common misconceptions
Open source AI will lag behind proprietary systems in capability.
Proprietary foundation models from commercial vendors (e.g., GPT-5 from OpenAI or Gemini from Google), sometimes called frontier models, tend to be the largest, consisting of more than a trillion parameters trained on massive amounts of data to perform a wide variety of tasks, from language processing to image generation and coding. Because these models are built for general use, much smaller open source models can match or even exceed their performance on narrower, domain-specific tasks. In fact, smaller domain-specific models may actually be better suited for business, where development cycles often advance more quickly. Protecting business-sensitive data is also less complex, and potentially faster, when developing with an in-house open source model than when using a proprietary model offered through a hosted service outside a business firewall. Open source AI models can advance very quickly if many businesses contribute even a portion of what they develop for in-house solutions back to the community.
Open source AI is dangerous and closed models are safe.
The safety of current AI models is tied to their training data and the inherently non-deterministic outputs of ML models. Hallucinations are equally possible with closed source and open source models. The difference is that the open-science foundation models mentioned earlier provide the complete information necessary to make appropriate risk assessments for an intended use. Even open-weight models allow their performance to be evaluated against specific safety risks and allow retraining for safer behavior. These risk assessments and mitigations can then be shared with the community. Closed models, by contrast, cannot be independently evaluated in this way, which makes it impossible to claim that either kind of model is inherently safer.
Enterprises will be slow to adopt open source AI until the landscape stabilizes from rapid changes in technology, legal, and policy issues.
Evidence does not bear this out. According to IDC Market Research’s 2024 analysis [5] of open source adoption in the United States: “Open GenAI models represent more than half of currently deployed GenAI models, and organizations plan to use open models for more than 60% of GenAI use cases. Almost 30% of respondents plan to use open models for all GenAI use cases.” Typical reasons reported for adopting open models include faster access to innovation, cost effectiveness, transparency, and the ability to modify the model.
The NAIRR pilot
The NAIRR pilot is a proof of concept for the eventual full-scale NAIRR, bringing together computational, data, software, model, training, and user-support resources to demonstrate and investigate major elements of the vision in the NAIRR Task Force report. Led by the NSF in partnership with other federal agencies and non-governmental partners, the pilot makes available government-funded, industry-provided, and other contributed resources in support of the nation’s research and education community. The pilot, begun January 24, 2024, runs for two years. Visit the NAIRR demonstration site to learn about ongoing NAIRR Pilot projects.
MOC-IBM-Red Hat collaboration
In August 2025, the NAIRR Pilot program launched a new track called Deep Partnerships to encourage collaboration between researchers and the industry partners providing NAIRR Pilot resources. The AI Alliance brought together three of its members, the Mass Open Cloud, Red Hat, and IBM Research, to participate in the NAIRR Pilot Deep Partnerships program.
The AI Alliance was founded to foster an open community, enabling developers and researchers to accelerate responsible innovation in AI while ensuring scientific rigor, trust, safety, security, diversity, and economic competitiveness. Consistent with this mission, these AI Alliance members will provide computing resources and open source AI assets to NAIRR Pilot participants to advance science and education and promote open collaboration in developing and deploying AI in society.
Selected projects gain access to three key technology elements. The Mass Open Cloud provides a datacenter infrastructure with facilitation support for users and projects, along with integration and development support for those who are new to AI/ML and Kubernetes-style resource management. All operations software is open source, so experimenters can access even the lowest levels of the software stack as needed. (Get more details on MOC resources and policies.)
The software stack includes Red Hat Enterprise Linux (RHEL) and OpenShift AI for enterprise application development, Red Hat Advanced Cluster Management, and some open software developed specially for the MOC. This environment provides tools that support the full lifecycle of AI/ML experiments and models and help NAIRR investigators build, train, test, and deploy models optimized for hybrid cloud environments. In addition, the entire portfolio of open source models and tools from IBM Research is available for use in the NAIRR Pilot projects.
The first selected projects run through July 1, 2026:
- Adaptive KV cache compression for agentic AI, PI: Mohammad Mohammadi Amiri, Rensselaer Polytechnic Institute
- Building reliability and transaction semantics for LLM agents, PI: Indranil Gupta, University of Illinois at Urbana-Champaign
- Efficient memory offloading for cost- and energy-efficient foundation model training, PI: Nam Sung Kim, University of Illinois at Urbana-Champaign
- Evaluating and improving applications of large language models to automated software testing, PI: Alessandro Orso, Georgia Institute of Technology
- CLEARBOX: interpreting and improving multimodal LLMs, PI: Deepti Ghadiyaram, Boston University
- Model merging for code LLMs: reasoning fusion and MoE-aware methods, PI: Stacy Patterson, Rensselaer Polytechnic Institute
- Multimodal semantic routing for vLLM, PI: Junchen Jiang, University of Chicago
- Time series data agent: an agentic system with foundation models for multimodal data, PI: Agung Julius, Rensselaer Polytechnic Institute
Continued support for open source AI
In its recently released AI Action Plan [6], the White House expanded on previous calls for open source development. The plan includes an explicit recommendation to “encourage open source and open-weight AI,” highlighting the benefits of open source AI to academic research, startups, businesses, and government. The plan also recommends that the United States “build the foundations for a lean and sustainable NAIRR operations capability that can connect an increasing number of researchers and educators across the country to critical AI resources.”
We believe that such a strategy will expand the opportunities for innovation while building the skills and tools needed to produce economic and social benefits across our society.
Footnotes
1. N. Maslej, et al., “The AI Index 2025 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2025.
2. “Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem: An Implementation Plan for a National Artificial Intelligence Research Resource,” National Artificial Intelligence Research Resource Task Force Report, January 2023.
3. “Defining Open Source AI: The Road Ahead,” AI Alliance Blog, April 2025.
4. M. White, et al., “The model openness framework: Promoting completeness and openness for reproducibility, transparency, and usability in artificial intelligence,” arXiv preprint arXiv:2403.13784 (2024).
5. IDC Market Research, “Open GenAI Models, 2024: Benefits, Experimentation, and Deployment,” Document US52477724, Aug. 7, 2024.
6. “Winning the Race: America’s AI Action Plan,” White House Office of Science and Technology Policy, July 2025.