Many talents contributed to one goal: a shared production-level research cloud.
It was a chilly morning at Boston University, and I was looking for a quiet place to gather my thoughts and do some writing. I passed two painters covering up scuffs on the white walls and a man with a floor machine busily tracing circles over the wide linoleum. Ducking into an empty conference room, I sat down and started banging away on my laptop. My creative reverie was shattered a minute later when a blast of Bach filled the room. I had wondered on previous visits whether the organ in the adjacent conference hall was ever used, and now I had my answer. At first I was annoyed by the disturbance, but I kept working, and soon I began to appreciate the interplay of swirling notes and coalescing sentences as the musician in the next room became my unwitting collaborator, bringing a new and very unusual contribution to this writing project.
Collaborators from Boston University (BU), Harvard University, and Red Hat have been leading a multiyear project to build and release a new computing environment that supports research and experimentation with a wide range of resources. The production part of this project, called the New England Research Cloud (NERC), depends on research IT professionals, software developers, systems engineers, project managers, faculty, and students for its success. It also depends on long-term partnerships with the Massachusetts Green High Performance Data Center (MGHPCC) and the MOC Alliance, as well as the continued dedication of the National Science Foundation’s Directorate for Computer and Information Science and Engineering (NSF CISE).
This is not a greenfield project. Each collaborator brought their own kit of tools and processes that had served them well in their area of expertise. One of our biggest challenges was finding a way to fit the best pieces of each kit together in a new system, figuring out what was still missing, and working together to fill in the interstitial space so we could create a more solid system and open it to global researchers. For example, Red Hat’s OpenShift software, a platform for building and managing containers on Kubernetes, is already a commercial release with many successful users. Harvard and BU already manage thousands of research and student accounts with a website and ColdFront software that work well for their existing university research resources. But there was no existing way to connect these two solutions. We had to create the interfaces, software, procedures, and documentation to allow self-service for global researchers who would create and manage their own projects in a container environment. New roles and access control bindings had to be defined to maintain security in an environment that was no longer limited to a single community.
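To make the access-control challenge concrete, consider the Kubernetes RBAC primitives that OpenShift builds on. The sketch below (project and user names are hypothetical, not NERC's actual configuration) constructs a per-project RoleBinding manifest granting the built-in `edit` role to a research group's members, scoped to their own namespace so other tenants are excluded:

```python
def project_role_binding(project_ns, members):
    """Build a Kubernetes RoleBinding manifest granting the built-in
    'edit' ClusterRole to a project's members, scoped to the project's
    own namespace so other tenants have no access."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{project_ns}-members", "namespace": project_ns},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": "edit",  # write access, but only inside this namespace
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": m}
            for m in members
        ],
    }

# Hypothetical project with two members
binding = project_role_binding("smith-lab", ["alice@example.edu", "bob@example.edu"])
```

A self-service portal would generate a manifest like this for each new project and apply it through the cluster API; the real bindings also cover service accounts, admin roles, and institutional identity mappings.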
The Harvard and BU IT engineers, who knew what had worked well for faculty and students so far, collaborated with developers and systems engineers from BU and Red Hat, who knew how to implement new interfaces and connect them to existing APIs that managed container users and project namespaces. The NERC resources for each project had to be shared by all members of a project but not by other NERC users. The resources also needed to be managed by NERC administrators and monitored carefully to find potential issues and alert operations staff before those issues caused problems for researchers and students. We needed the ability to track and report usage with an eye to predicting growth and bringing new resources into clusters quickly when necessary. Legal and regulatory requirements had to be met, and the whole project had to be tracked and managed carefully.
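The sharing and monitoring requirements above boil down to quota accounting: each project's resources are pooled for its members but capped against other tenants, and usage is tracked so operators hear about growth before it becomes a problem. A minimal sketch of that bookkeeping, with illustrative names and thresholds rather than NERC's actual policy:

```python
from dataclasses import dataclass, field

@dataclass
class ProjectQuota:
    """Track one project's CPU allocation and usage; warn near the cap."""
    name: str
    cpu_limit: float            # cores allocated to this project
    cpu_used: float = 0.0
    alerts: list = field(default_factory=list)

    def request(self, cores: float) -> bool:
        """Grant a request only if it fits within the project's own quota."""
        if self.cpu_used + cores > self.cpu_limit:
            self.alerts.append(f"{self.name}: request for {cores} cores denied")
            return False
        self.cpu_used += cores
        if self.cpu_used / self.cpu_limit >= 0.8:   # alert before it's a problem
            self.alerts.append(f"{self.name}: above 80% of CPU quota")
        return True

q = ProjectQuota("smith-lab", cpu_limit=10)
q.request(6)        # fits: 60% of quota used
q.request(3)        # fits, but crosses the 80% warning threshold
ok = q.request(4)   # would exceed the cap, so it is denied
```

In production this role is played by Kubernetes ResourceQuota objects plus monitoring tools like Prometheus and Grafana, but the shape of the problem is the same: per-project limits, early alerts, and usage data for capacity planning.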
Most importantly, the project needed to follow open source practices adapted to deliver an entire hardware, software, and operations system. It should be possible for anyone to understand at a detailed level what the system components are and how to compose them from the bare hardware pieces through the operating system and application stack, including all the supporting production operations infrastructure. There are some intentional exceptions to this rule necessary for privacy and security, but we default to open wherever possible.
Twelve public repositories and 18 named contributors later, it’s possible to examine, use, and improve the detailed solution of this example problem for the OpenShift part of NERC. (An extensive project tracker built in Asana and many months’ worth of meeting notes are not included in these repositories but can be shared with anyone joining the project.) Explore the OCP on NERC GitHub repository to see high-level documentation, detailed configurations, and references to other open source software projects like Vault, Prometheus, and Grafana that are essential parts of the solution. Go to the NERC website to request your own project, find user guides, and read more about this environment and the other NERC offerings, like OpenStack virtual machines and bare metal computing. On March 20-21, 2023, join the in-person MOC Alliance Workshop with engineers and researchers who are building even more varied ecosystems to support research and explore the future of open computing.
Building this environment has not been easy, and it has taken a continued concentrated effort from many people who were already busy with other critical projects and daily responsibilities. Success has depended on the team’s willingness to question what we were doing, what was missing, how we could broaden our reach, how things could fail, and what we could do about it. The effort was often messy and under-resourced, but the team’s determination and dedication brought us closer to the goal. Although this project will remain an unfinished symphony for as long as research computing challenges evolve, we can all be proud of composing a new research and engineering score together.