Ops is the new code: Operate First brings open source to operations

Operations are attracting increased attention in the open source community, and the open source ethos is evolving to embrace them.

The focus of open source was initially on the code. Over time, however, the health of communities creating that code and associated artifacts such as documentation has also become an open source issue. The approach to governing projects and onboarding contributors hasn’t replaced historical concerns such as licensing, but it has assumed a more prominent role.

The virtuous cycle of open source development

That we still talk about the open source development model is telling. That language emphasizes developers and other participants in that virtuous cycle, such as users and businesses. Thus, fundamentally, the focus is still on the code. But there’s also a dawning recognition that just writing code in a vacuum isn’t sustainable for most significant projects.

The attention to code is understandable. Even when proprietary software was nearly the only game in town, vendors were focused on delivering packaged bits to users with maybe some consulting on the side to get it running. Open source software freed users from a vendor’s proprietary IP and let them harness the innovation in a community extending beyond a single company. But it didn’t really change the software delivery model. Users were still mostly obliged to operate the software by themselves.

The rise of operations: Operate First

This model is changing in the modern era. Operations are becoming as important as, and sometimes more important than, code. Software-as-a-Service and public cloud providers have increasingly offloaded the operational burden of software from users. This is a challenge for open source software. While the open source development model is powerful, the value of software lies in operationalizing it so that a user can be productive with it.

One approach to dealing with this challenge is to bring something akin to the open source development model to operations. Enter Operate First.

The term Operate First comes from an open source development model best practice, Upstream First. With Upstream First, the goal is to get every line of code into an upstream project before it ships as a product. This keeps the community project and downstream product closely aligned and reduces the effort of maintaining divergent code trees. An Upstream First approach recognizes that the value of open source lies not so much in the ability to view source code but in fully embracing an open approach to creating software.

You can think of Operate First as a concept, philosophy, and vision to improve open source software through open sourcing operations. In an Operate First environment, open source code is tested and proven under real workloads running at scale as they would in production. This creates a feedback loop for developers seeking to improve code operationally. Operate First and associated initiatives also aim to document how production deployments are architected and deployed. In addition to documenting best processes and practices, the Operate First project will have an Infrastructure-as-Code repository.

What does Operate First look like?

Concretely, Operate First is a project to define, build, and improve the open source hybrid cloud through learning and developing code and practices in an open production community cloud. By incorporating operational experience into open source software development, Operate First extends development to include operating, testing, and proving code in a production environment—and simplifying the deployment of that code. It builds on and complements a variety of nascent and ongoing projects in the cloud space.

Operate First started as a segment of the Mass Open Cloud (MOC) called the zero cluster, a production cloud set up to host projects and developers seeking to operate first. Announced in 2014, the MOC is a production public cloud based on the model of an Open Cloud Exchange (OCX). In this model, many stakeholders, rather than just a single provider, participate in implementing and operating the cloud.

In addition to the MOC, Operate First is closely associated with various overlapping initiatives, including OpenInfra Labs (under the Open Infrastructure Foundation) and the Red Hat Collaboratory at Boston University.

OpenInfra Labs hosts the Telemetry Working Group, one of the working groups included under the Operate First umbrella. Observability of infrastructure has become an increasingly hot topic given the challenge of reliably operating distributed systems such as those in Kubernetes environments. The term can cover a lot of ground, but a typical definition of observability spans metrics, tracing, and logging. Monitoring is often considered distinct, but it is at least closely related. A key part of observability is the automatic collection and transmission of data about the system: in other words, telemetry. Telemetry is, therefore, an integral component of Operate First.
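To make the three signal types concrete, here is a minimal sketch in Python of a collector that buffers metrics, log messages, and trace spans and serializes them for transmission. The names TelemetryEvent and Collector are hypothetical and chosen for illustration; this is not code from the Telemetry Working Group or any Operate First repository, and a real deployment would more likely rely on an established telemetry stack.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TelemetryEvent:
    """One piece of telemetry (hypothetical structure, for illustration only)."""
    kind: str    # "metric", "log", or "trace"
    name: str
    value: object
    timestamp: float = field(default_factory=time.time)


class Collector:
    """Hypothetical in-memory collector; not an Operate First API."""

    def __init__(self):
        self.buffer = []

    def metric(self, name, value):
        # A numeric measurement, such as CPU usage or request count.
        self.buffer.append(TelemetryEvent("metric", name, value))

    def log(self, message):
        # A free-text record of something that happened.
        self.buffer.append(TelemetryEvent("log", "message", message))

    def span(self, name, started, ended):
        # A trace span, stored here simply as the elapsed time in seconds.
        self.buffer.append(TelemetryEvent("trace", name, ended - started))

    def flush(self):
        """Serialize buffered events as JSON lines and clear the buffer."""
        lines = [json.dumps(asdict(event)) for event in self.buffer]
        self.buffer.clear()
        return lines
```

A caller would record events as the system runs and periodically pass the result of flush() to whatever transport it uses. The transport itself is deliberately left out of this sketch.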

The Operate First community

The development of a community around Operate First is still in its early stages. A primary goal of that development is recognizing that there are many constituencies with disparate concerns and motivations. Operate First founders want to engage with them in a manner and through a path that those constituencies prefer.

To start this process, community leaders conducted a series of interviews with a variety of stakeholders: developers, quality engineering (QE), site reliability engineers (SRE), traditional system admins, data scientists, and others. The objective was twofold. First, it was important to understand, for each role, their most pressing day-to-day concerns, what motivated them, how they measured success, and what would make Operate First of interest to them. Second, to keep things simple, it was useful to identify and combine roles that largely shared motivations and concerns, which would make it easier to focus engagement efforts.

The diverse needs of Operate First personas

Quality engineers who write testing frameworks and tests have an increasing amount of overlap with more traditional developers of applications and other code. Both are motivated by improving customer and internal user experiences, especially when doing so involves solving novel problems. They measure success with metrics such as satisfaction of and adoption by their constituencies as well as productivity and code quality metrics. Operate First serves these goals by encouraging and enabling software design that builds in operational capabilities while keeping the person who needs to operate the software in mind.

From an operational perspective, the focus is shifting away from traditional sysadmin roles that deal mainly with maintaining and upgrading hardware and software infrastructure using tools like scripts and configuration management. While those tasks continue, site reliability engineers (SREs) spend a significant amount of time on development tasks such as adding new features, improving scalability, and automating. SREs interact extensively with cloud APIs, whether on premises or in a public cloud. SREs aim to do more with less; the ratio of SREs to the number of managed clusters is one important metric, as is the uptime of those clusters.
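As a rough illustration of the two metrics just mentioned, the following Python sketch computes an SRE-to-cluster ratio and an uptime percentage. The function names and the sample figures are assumptions made for this example; the article does not prescribe how either metric should be calculated.

```python
def sre_to_cluster_ratio(sre_count, managed_clusters):
    """SREs per managed cluster; lower means each SRE covers more clusters."""
    if managed_clusters <= 0:
        raise ValueError("managed_clusters must be positive")
    return sre_count / managed_clusters


def uptime_percent(total_seconds, downtime_seconds):
    """Share of an observation window during which a cluster was available."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime_seconds must be between 0 and total_seconds")
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds
```

With these illustrative inputs, 4 SREs managing 40 clusters gives a ratio of 0.1, and 10 seconds of downtime in a 1,000-second window gives 99 percent uptime.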

In addition to developer and operations personas, the data scientists and data engineers in the OpenDataHub community have also been early adopters of Operate First. OpenDataHub is a blueprint for building an Artificial Intelligence (AI)-as-a-Service platform that integrates a variety of open source machine learning tools, including Kubeflow, Kafka, Seldon, PyTorch, and Jupyter notebooks on the Red Hat® OpenShift® Container Platform.

For these audiences, Operate First provides:

A cluster to develop and run AI applications
GitHub organizations to share and collaborate on open source projects
Custom image pipelines to publish reproducible experiments
Real production operations data for tackling machine learning problems in AIOps

Furthermore, operating a subset of OpenDataHub at scale creates an opportunity to document best practices, which can, in turn, be fed into Red Hat OpenShift Data Science, the managed cloud service offering based on OpenDataHub. Just as the open source development model forms a virtuous cycle when working as intended, Operate First can lead to a beneficial circle for operational knowledge and supporting code.

Flexibility and freedom

The ultimate goal of Operate First is to free software users from having to make a false choice. It brings the power of the open source development model to operationalizing software. Fully operationalized software keeps the flexibility of open source software that isn’t tied to a single cloud provider, while also simplifying and improving the Day Two operations of that software.

It’s not simplicity or choice. It’s simplicity and choice.

Red Hat Research Quarterly

Gordon Haff

February 2022
