The open cloud has been both cornerstone and North Star for Red Hat Research. Our relationship with the Mass Open Cloud (MOC) and its more recent iteration, the MOC Alliance, has been critical to advancing our understanding of open cloud architecture and the many possibilities it opens for research. (For an illustration, see our interview in this issue with Red Hat Chairman Paul Cormier and Boston University professor Orran Krieger.) RHRQ asked US Research Director Heidi Picher Dempsey and Red Hat Project Manager Gagan Kumar to dive into Red Hat’s role in the growth of the MOC and related projects.
Infrastructure in the MOC: a case history in collaboration
Heidi Picher Dempsey seeks out and grows research and open source projects with academic and commercial partners. As a network engineer and operations leader, she designed, built, integrated, and operated many nationwide suites of prototype cloud infrastructure for academic, government, and industry use. For RHRQ, Heidi has interviewed Abhimanyu Gosain of Northeastern University’s Institute for the Wireless Internet of Things and Dr. Michael Zink of UMass Amherst and the Open Cloud Testbed, and she maintains a column about university-industry research collaboration.
Production New England Research Cloud (NERC) OpenShift container services launched in early 2023, marking a major milestone in a process that began with a research question at Boston University in 2014: Would it be possible to build an infrastructure that disaggregates the virtual machines (VMs) in a shared computing environment from the software components that create and manage them? Such an infrastructure would allow mix-and-match development of software, hardware, and services, enabling more innovation and new service marketplaces in the cloud.
Open source through the research lens
As RHRQ starts its fifth year, it’s impossible to resist the temptation to look back at all we’ve done so far. The result is this collection of perspectives. Together they paint an inspiring picture of the innovative work that can be accomplished when engineering know-how and bold research questions come together in open source environments.
- Focus on open hardware, Ulrich Drepper and Ahmed Sanaullah
- Focus on testing and operations, Daniel Bristot de Oliveira and Bandan Das
- Focus on security, privacy, and cryptography, Lily Sturmann
- Focus on AI and machine learning, Sanjay Arora and Marek Grác
- Focus on education, Sarah Coghlan and Matej Hrušovský
At the time, VM system components were usually proprietary and so tightly coupled that a single term (e.g., VMware) referred interchangeably to the collection of VMs, the software that managed VMs, and the company that sold VM licenses. BU found ideal collaborators at Red Hat, where engineers were building early production OpenStack cloud computing infrastructure components designed to be open from the operating system up through the stack to the application. The idea became a driving force behind Massachusetts Open Cloud (MOC) development and systems engineering efforts that continue to this day.
From the beginning, the MOC aimed to create an improved computing resource for cloud and big data users and a new model of cloud computing that would enable research and technology companies to innovate and profit in the cloud and big data sectors. This emphasis on building open infrastructure to support both research and industry allowed the MOC to develop a collaboration model in which university researchers, open source projects, and research IT groups could work together on real systems with a clear path to active use in industry. What’s more, students, researchers, and engineers could build and support the infrastructure as a combined team, getting everybody’s hands dirty and giving everyone a much better understanding of the challenges involved in turning ideas into practice.
The first year of building infrastructure for the MOC at the Massachusetts Green High Performance Computing Center (MGHPCC) saw the creation of the hardware-as-a-service concept, along with early efforts at automation, a service directory, cloud client libraries, and the first of many front-end graphical user interface (GUI) tools. Sixteen Dell servers, along with networking and storage hardware from Intel, Cisco, and Mellanox, were jointly built out and operated by the newly formed DevOps team, which, in addition to the original BU proposers, now included engineers from Red Hat, the US Air Force, and Cisco, as well as Harvard’s and BU’s professional research IT groups. The team was building a community of systems engineers as well as a user community of students, researchers, and big data developers who ran tools like Hadoop, Pig, Spark, Mesos, and RabbitMQ on the infrastructure. Although the term “DevOps” had only recently come into use, the MOC had already taken the idea much further, collaborating on all phases of research, development, deployment, and support with a combined team of engineers, academics, and students from each discipline working toward the same goals. Engineers from Red Hat’s Research group were dedicated to working with the team to get a production version of OpenStack software up and running on the new infrastructure, and this became the basis for the first researcher VM service deployed at the MOC.
Although researchers, PhD candidates, and professors were already using the MOC for projects, theses, and courses by 2016, the infrastructure had to expand and grow more reliable for the MOC to become a major mid-scale testbed. A storage research cluster, using hardware from Lenovo and Brocade, added SSDs and 500 TB of storage capacity to the MOC. The D3N project, which started in 2017, showed that data storage performance could be improved significantly by applying caching at multiple levels of the system stack, a successful test of the original mix-and-match concept. The DevOps team reached out to security experts to conduct penetration testing and strengthen the overall security of the MOC. The expanded MOC (called Kaizen, from the Japanese for “good change”) was serving peak loads of 80 VMs to over 100 users.
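The sketch below is a minimal illustration of the general idea behind multi-level caching, not D3N’s implementation: a read first checks a small cache close to the consumer, then a larger shared cache, and only then the backing store, promoting data toward the consumer as it goes. The class names, capacities, LRU eviction policy, and dictionary-backed store are all assumptions chosen for brevity.

```python
# Hypothetical two-level read-through cache; illustrates the concept only,
# not D3N's actual design.
from collections import OrderedDict
from typing import Callable, Dict, Optional


class LRUCache:
    """A small least-recently-used cache standing in for one caching layer."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


class TwoLevelCache:
    """Check a small near cache, then a larger shared cache, then the backend."""

    def __init__(self, l1_capacity: int, l2_capacity: int,
                 backend: Callable[[str], bytes]) -> None:
        self.l1 = LRUCache(l1_capacity)
        self.l2 = LRUCache(l2_capacity)
        self.backend = backend
        self.stats: Dict[str, int] = {"l1": 0, "l2": 0, "backend": 0}

    def read(self, key: str) -> bytes:
        value = self.l1.get(key)
        if value is not None:
            self.stats["l1"] += 1
            return value
        value = self.l2.get(key)
        if value is not None:
            self.stats["l2"] += 1
        else:
            self.stats["backend"] += 1
            value = self.backend(key)
            self.l2.put(key, value)  # fill the shared layer on a full miss
        self.l1.put(key, value)  # promote into the near layer
        return value


store = {"obj-a": b"A", "obj-b": b"B", "obj-c": b"C"}
cache = TwoLevelCache(l1_capacity=1, l2_capacity=3, backend=store.__getitem__)
for key in ["obj-a", "obj-a", "obj-b", "obj-a"]:
    cache.read(key)
print(cache.stats)
```

With a one-entry near cache, the final read of `obj-a` misses the near layer (because `obj-b` evicted it) but hits the shared layer, so the backing store is consulted only twice for four requests.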
A funny thing happened while the team was busy building VMs: Kubernetes, which Google contributed in 2015 as the seed technology of the newly formed Cloud Native Computing Foundation, grew quickly in popularity. By 2018, Amazon Web Services, Microsoft Azure, and smaller providers like DigitalOcean were offering low-cost managed container services as an alternative to VMs. Container solutions were quickly taking market share in managed computing, but once again, the containers were not implemented with open interfaces from the OS through the application layers of the stack. Unchecked, these closed systems would also block innovation and research across services.
The MOC began to investigate OpenShift as a means to add clusters of containers to its existing VM compute resources, while Red Hat doubled down on its commitment to the idea of open clouds, launching the Red Hat Collaboratory with BU. Work to expand and improve both MOC production testbeds proceeded in parallel. The DevOps team created separate Ops and Prod clusters to enhance reliability and make it easier to scale out worker nodes in the production clusters. In addition to the bare metal and VM services previously offered, the team introduced containers and helped develop OpenShift operators for researchers. Production MOC OpenShift container services finally launched in early 2023.
Meanwhile, the MOC organization also expanded and became the MOC Alliance. Intel and IBM joined, with IBM adding POWER9 servers with NVIDIA GPUs to the mix and Red Hat donating 26,000 OpenShift production licenses. MOC researchers proposed an Open Cloud eXchange (OCX), enabling hybrid cloud connections between different providers, and won research funding from the National Science Foundation to implement the new infrastructure. The DevOps team engineered and deployed connections between MOC compute resources and another national US research testbed called CloudLab, joining forces with a CloudLab research team from UMass Amherst. This work included building and deploying open source software to allocate and provision resource connections and VLANs dynamically for hybrid clouds. The Hardware Isolation Layer (HIL), Bare Metal Imaging (BMI), and Elastic Secure Infrastructure (ESI) software emerged at various stages from these efforts. MOC teams contributed code back to public open source communities, including Ironic, Keylime, iPXE, and TrustedGRUB2, as the hybrid solutions evolved.
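The HIL and ESI code is not reproduced here. As a rough illustration of what allocating VLANs dynamically involves, the hypothetical sketch below hands out IDs from a fixed pool, records which project owns each one, and lets only the owner release it. The class name, pool range, and project names are assumptions; real systems must also configure switch ports, which this sketch omits.

```python
# Hypothetical VLAN allocator; a simplified illustration, not HIL or ESI code.
from typing import Dict, Optional, Set


class VlanPool:
    """Hand out VLAN IDs from a fixed range and track the owning project."""

    def __init__(self, first: int, last: int) -> None:
        # IEEE 802.1Q reserves IDs 0 and 4095, leaving 1..4094 usable.
        if not (1 <= first <= last <= 4094):
            raise ValueError("802.1Q VLAN IDs must fall in 1..4094")
        self._free: Set[int] = set(range(first, last + 1))
        self._owner: Dict[int, str] = {}

    def allocate(self, project: str) -> Optional[int]:
        """Reserve the lowest free VLAN ID for a project, or None if exhausted."""
        if not self._free:
            return None
        vlan = min(self._free)
        self._free.remove(vlan)
        self._owner[vlan] = project
        return vlan

    def release(self, vlan: int, project: str) -> None:
        """Return a VLAN to the pool; only its owner may release it."""
        if self._owner.get(vlan) != project:
            raise PermissionError(f"VLAN {vlan} is not owned by {project}")
        del self._owner[vlan]
        self._free.add(vlan)

    def owner(self, vlan: int) -> Optional[str]:
        return self._owner.get(vlan)


pool = VlanPool(100, 102)
a = pool.allocate("moc-project")       # 100
b = pool.allocate("cloudlab-project")  # 101
pool.release(a, "moc-project")
c = pool.allocate("another-project")   # 100 again, the lowest free ID
print(a, b, c, pool.owner(101))
```

Tracking ownership explicitly is what keeps one tenant from tearing down another tenant’s network link in a shared, multi-provider setting.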
The scope and technology of MOC infrastructure advanced steadily to the point where the total number of MOC users had quadrupled by 2020. Most of those users came from computer science and engineering backgrounds and could easily patch their own images and write software to fill in any missing pieces of a system they needed. Providing at-scale services useful to researchers in any field of the arts or sciences, including those without specialized computing skills, required new collaborations with the BU and Harvard research IT groups, who had already met this challenge while supporting work such as earthquake forecasting, predicting the spread of diseases, and analyzing star formation.
The New England Research Cloud combined with the MOC Alliance to create production cloud resources based on the MOC, using standard deployments and automation so that other institutions could use NERC as a template to build a full suite of services based on its open methods. The DevOps team added single sign-on (SSO) through the InCommon Federation identity management service and software already in use at most US universities. The ColdFront GUI lets users request and manage their own resources, change resource allocations, and raise helpdesk tickets when needed. MOC accounts are integrated with the Keycloak MGHPCC Shared Services Account portal and use a common MGHPCC osTicket system to make it easier to track and close issues involving multiple teams and users. The team used Red Hat Advanced Cluster Management to standardize monitoring and alerting, and Ansible playbooks to make deployments more easily repeatable. Finally, they created reporting and billing software that sends cloud usage metrics from OpenShift and OpenStack into XDMoD, software developed with National Science Foundation funding, producing reports that most researchers already knew from previous projects.
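The actual reporting pipeline and XDMoD’s ingestion format are not described here. The sketch below only illustrates the kind of aggregation such a pipeline performs: turning raw usage samples into per-project core-hours, a familiar unit in research computing reports. The `UsageSample` fields, the `core_hours_by_project` function, and the project names are hypothetical and do not reflect the OpenShift, OpenStack, or XDMoD data models.

```python
# Hypothetical usage aggregation; not the MOC/NERC reporting code.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageSample:
    """One hypothetical sample: a project's allocated vCPUs over an interval."""
    project: str
    vcpus: float
    seconds: float


def core_hours_by_project(samples: Iterable[UsageSample]) -> Dict[str, float]:
    """Sum vCPU-seconds per project, then convert them to core-hours."""
    totals: Dict[str, float] = defaultdict(float)
    for s in samples:
        totals[s.project] += s.vcpus * s.seconds
    return {project: cpu_seconds / 3600 for project, cpu_seconds in totals.items()}


samples = [
    UsageSample("project-a", vcpus=8, seconds=3600),  # 8 core-hours
    UsageSample("project-a", vcpus=4, seconds=1800),  # 2 core-hours
    UsageSample("project-b", vcpus=2, seconds=7200),  # 4 core-hours
]
report = core_hours_by_project(samples)
print(report)  # {'project-a': 10.0, 'project-b': 4.0}
```

A real pipeline would also account for memory, GPUs, and storage, and would map projects to the allocations users manage in ColdFront; those details are left out here.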
The NERC team provides hands-on facilitation and technical expertise for research end users, and the DevOps team works together with NERC to resolve issues, provide needed new features, and improve future services. Using this model, the MOC Alliance and NERC can now support projects such as the Newspapers Database Project, which makes 20 million articles from newspapers.com and the US Library of Congress available to history and political science researchers. No one really calls this big data research anymore: researchers now take as a given large-scale data and the computing infrastructure necessary to process it.
The ability to create open hybrid clouds that allow people to create and add their own software and innovate anywhere in the software stack using open interfaces has evolved from an idea to a practical reality. New technology and services that demand new infrastructure are evolving even more quickly now than when the MOC first came into being, as evidenced by the topics in the latest MOC Alliance Workshop. We’re working on new infrastructure to support machine learning, use core-to-edge protocols for computing with wireless endpoints, collect data from sensors in the wild, and help grade school students learn to read. The future dreamers are still out there, and they’re certainly welcome to come build with us!
Do more with less
Gagan Kumar focuses on projects related to the MOC Alliance, bare metal sharing systems, and metrics collection for OpenShift instances. He is a Senior Product Manager with Project Curator, an infrastructure consumption analysis project for the OpenShift platform, and provided an update on the multiyear Elastic Secure Infrastructure (ESI) project in “The elastic bare metal cloud is here” (Nov 2021).
When Red Hat Research started, distributed systems, cloud computing, security, and operating systems were our primary focus. These are fields where many Red Hatters have significant experience and can help university researchers fast-track their ideas in frontier technologies into the real world the open source way. Since then, we’ve reached three significant milestones in building the relationships necessary to make this happen. First, we’ve helped develop community research clouds through the MOC Alliance (MOC-A) and the New England Research Cloud (NERC). Second, Red Hat Research has established its presence in many prestigious universities in the United States, Europe, and Israel. Boston University, Masaryk University, Newcastle University, and Technion have labs and office spaces dedicated to collaboration, where university researchers and Red Hat engineers come together to discuss ideas and implement projects. As part of the MOC-A and NERC partnership, we also work with researchers from Northeastern University, Harvard, MIT, and the University of Massachusetts. Third, the Red Hat Research team has built a research interest community that gives Red Hatters the opportunity to engage with research ideas and projects at many levels. As a result, Red Hat engineers are working closely with researchers to accelerate progress in areas of critical interest.
Since most computation is moving to the cloud, research now focuses not only on the efficacy of cloud services but also on their efficiency. Research domains such as intelligent cost management, on-demand resource sharing, and operations automation and remediation approaches such as AIOps and MLOps are coming to the forefront. This is reflected in a number of projects managed by Red Hat, including ESI, Project Curator, OS-Climate, the Cloud Cost Optimizer (CCO), and AI for CloudOps. Edge computing, another field of growing importance, is an opportunity to extend the open hybrid cloud all the way to data sources and end users. Data may traditionally have belonged in the datacenter or cloud, but many important decisions need to happen at the edge, close to where the data is produced. This shift will open opportunities for many advances.