Red Hat Research focuses on accelerating practical applications of artificial intelligence and machine learning by combining academic approaches with industry use cases. Rather than focusing purely on advancing AI/ML techniques, we identify research collaborations where they can play a central role in solving computing problems. The AI/ML projects we’ve highlighted in past issues drive innovation across multiple disciplines, from optimizing hybrid cloud operations to enhancing education, developing technology for autonomous vehicles, and automating methods for detecting visual disinformation online.
We asked Sanjay Arora, a Red Hat Research data scientist, and Marek Grác, a Red Hat senior software engineer and lecturer in machine learning at Masaryk University (Brno, CZ), to share their perspectives on trends in AI/ML, past, present, and future. Sanjay has contributed to RHRQ in the articles “When good models go bad: minimizing dataset bias in AI” (Feb 2021) and “Yuga: a tool to help Rust developers write unsafe code more safely” (Feb 2022). Marek has contributed to the RHRQ stories “‘When one teaches, two learn’: making the most of technical research mentorship” (Aug 2021) and “Making machine learning accessible across disciplines” (Nov 2021).
One of our tasks at Red Hat Research is persuading engineering managers and business units that it is worth engineers’ time to participate in research projects, internships, and mentoring students working on a thesis. Cooperation with universities has already produced research collaborations with demonstrated applications for products and the open source community, which makes that persuasion easier. The range of topics is broad, from online learning and community management to cloud technologies and testing.
Open source through the research lens
As RHRQ starts its fifth year, it’s impossible to resist the temptation to look back at all we’ve done so far. The result is this collection of perspectives. Together they paint an inspiring picture of the innovative work that can be accomplished when engineering know-how and bold research questions come together in open source environments.
- Focus on open hardware, Ulrich Drepper and Ahmed Sanaullah
- Focus on clouds and research IT, Heidi Picher Dempsey and Gagan Kumar
- Focus on testing and operations, Daniel Bristot de Oliveira and Bandan Das
- Focus on security, privacy, and cryptography, Lily Sturmann
- Focus on education, Sarah Coghlan and Matej Hrušovský
AI/ML is an excellent example of how that relationship works. What we learn from research and teaching at universities can also be shared internally. Our efforts to work on AI/ML projects with university partners have led to a broader understanding of the role of AI/ML in industry. The Applied Machine Learning course we teach at Masaryk University has content we want to teach in-house as well, so software engineers, quality assurance teams, and other roles know how to work with machine learning practitioners to get the best possible results.
We foresee several different directions for further work in the field of AI/ML itself and in the many domains where it plays a key role:
Secure multiparty computing and differential privacy: The capacity to use sensitive datasets without revealing information about individuals in the dataset is critical for expanding the use of AI/ML systems. A good example is the recently launched Red Hat Collaboratory project “Co-ops: collaborative open source and privacy-preserving training for learning to drive.” The project is building out a privacy-preserving platform for sharing data collected from cars and videos that can be used for distributed, large-scale training of models for self-driving.
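To make the privacy idea concrete, here is a minimal sketch of the Laplace mechanism, the standard building block of differential privacy. This is an illustration only, not code from the Co-ops project; the dataset, query, and epsilon values are invented for the example. A counting query has sensitivity 1, so adding Laplace noise with scale 1/epsilon makes the released count epsilon-differentially private:

```python
import random

def dp_count(values, predicate, epsilon=1.0):
    """Differentially private count: the true count plus Laplace(1/epsilon) noise.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity 1), so the noise scale is 1/epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    b = 1.0 / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = b * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_count + noise

# Hypothetical telemetry sample: count speeds above 80 without revealing
# exactly how many individual records cross the threshold.
speeds = [42, 87, 55, 91, 60, 78]
noisy = dp_count(speeds, lambda s: s > 80, epsilon=0.5)
```

Smaller epsilon means more noise and stronger privacy; in a real system like the one the Co-ops project is building, this trade-off is tuned per query and combined with secure multiparty techniques so raw records never leave their owners.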
Large ML models and their applicability to operational data, including telemetry, logs, and error messages: Training large language models (LLMs) on this unannotated data could enable better predictions. Generating natural language responses from a knowledge base could also support helpdesk technicians and automate a portion of their work. Current developments in this arena include GPT-4 and ChatGPT, which are closed source. The Red Hat OpenShift Data Science (RHODS) team is providing the infrastructure for IBM efforts to train large models, which could lead to work on applying these models for better searching, querying Red Hat Insights, or integrated development environments (IDEs).
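The core idea of "better predictions from unannotated logs" can be shown in miniature without a large model at all: a language model fit on normal logs assigns low probability, and hence high surprisal, to lines it has never seen, which is a simple anomaly signal. The toy below uses a character-bigram model with add-one smoothing rather than an LLM, and all log lines are invented for the example:

```python
import math
from collections import Counter

def train_bigram(lines):
    """Count character bigrams over a corpus of 'normal' log lines."""
    bigrams, chars = Counter(), Counter()
    for line in lines:
        padded = "^" + line  # '^' marks start-of-line
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            chars[a] += 1
    return bigrams, chars

def surprisal(line, bigrams, chars, vocab=256):
    """Average negative log probability of the line under the bigram model
    (add-one smoothing); higher means less like the training logs."""
    padded = "^" + line
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (bigrams[(a, b)] + 1) / (chars[a] + vocab)
        total += -math.log(p)
    return total / max(len(line), 1)

# Hypothetical training corpus of routine log lines.
normal = ["INFO request ok id=%d" % i for i in range(100)]
model = train_bigram(normal)
ok = surprisal("INFO request ok id=7", *model)
weird = surprisal("PANIC ???###", *model)  # unfamiliar line scores higher
```

An LLM trained on real telemetry would replace the bigram counts with learned token probabilities, but the mechanism of scoring new data against what the model has seen is the same.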
Replacing heuristics with learned policies in systems software: Systems software like operating systems and compilers contains many heuristics that guide decision making, and these decisions affect performance and resource consumption. Red Hat Research is engaged in several projects exploring the replacement of these heuristics with learned policies using techniques like reinforcement learning and Bayesian optimization. Two projects—“Automatic configuration of complex hardware” and “Toward high performance and energy efficiency in open source stream processing”—involve learning network policies governing packet batching and processor voltage and frequency settings to enable substantial energy savings while maintaining performance guarantees. (See also Han Dong’s article “Tuning Linux kernel policies for energy efficiency with machine learning” in this issue.) The project “Practical programming of FPGAs with open source tools” focuses on searching for the optimal ordering of compiler passes to maximize performance of the compiled code.
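The simplest form of "learning a policy instead of hard-coding a heuristic" is a multi-armed bandit: try each candidate setting, keep running estimates of the reward each one yields, and converge on the best. The sketch below is a generic epsilon-greedy bandit, not code from any of the projects named above; the candidate batch sizes and the reward function (throughput per unit energy, peaking at 64) are invented for illustration:

```python
import random

def epsilon_greedy(arms, reward_fn, rounds=2000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit: with probability epsilon explore a
    random arm, otherwise exploit the arm with the best observed mean."""
    rng = random.Random(seed)
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)          # explore
        else:
            arm = max(arms, key=lambda a: means[a])  # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
    return max(arms, key=lambda a: means[a])

# Hypothetical reward: noisy throughput-per-energy, best at batch size 64.
def reward(batch, rng):
    base = {16: 0.4, 32: 0.6, 64: 0.9, 128: 0.7}[batch]
    return base + rng.gauss(0, 0.05)

best = epsilon_greedy([16, 32, 64, 128], reward)
```

Real systems policies face harder versions of this problem: rewards arrive with delay, workloads shift, and decisions are sequential rather than independent, which is where full reinforcement learning and Bayesian optimization come in.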
Open source education projects, so universities can freely share study materials or franchise already established courses: The need for data science and ML skills to fill industry positions is great. Red Hat Research maintains an open catalog of course materials for collaboration, based on courses at partner universities and institutions. We are preparing materials for two courses, “Applied Machine Learning” and “Employment and the IT job market.” These courses will help not only universities and students but also employers. (See also Danni Shi’s article “Open source education: from philosophy to reality” in this issue.)