Red Hat Research Quarterly

Ops is the new code: Operate First brings open source to operations 

about the author

Gordon Haff

Technology Evangelist at Red Hat, where he works on emerging technology product strategy, writes about tech trends and their business impact, and is a frequent speaker at customer and industry events. His books include How Open Source Ate Software, and his podcast, in which he interviews industry experts, is Innovate @ Open.

Operations are attracting increased attention in the open source community, and the open source ethos is evolving to embrace them.

The focus of open source was initially on the code. Over time, however, the health of communities creating that code and associated artifacts such as documentation has also become an open source issue. The approach to governing projects and onboarding contributors hasn’t replaced historical concerns such as licensing, but it has assumed a more prominent role.

The virtuous cycle of open source development

That we still talk about the open source development model is telling. That language emphasizes developers and other participants in that virtuous cycle, such as users and businesses. Thus, fundamentally, the focus is still on the code. But there’s also a dawning recognition that just writing code in a vacuum isn’t sustainable for most significant projects.

The attention to code is understandable. Even when proprietary software was nearly the only game in town, vendors were focused on delivering packaged bits to users with maybe some consulting on the side to get it running. Open source software freed users from a vendor’s proprietary IP and let them harness the innovation in a community extending beyond a single company. But it didn’t really change the software delivery model. Users were still mostly obliged to operate the software by themselves.

The rise of operations: Operate First

This model is changing in the modern era. Operations are becoming as important as, and sometimes more important than, code. Software-as-a-Service and public cloud providers have increasingly offloaded the operational burden of software from users. This is a challenge for open source software. While the open source development model is powerful, the value of software lies in operationalizing it so that a user can be productive with it.

One approach to dealing with this challenge is to bring something akin to the open source development model to operations. Enter Operate First.

The term Operate First comes from an open source development model best practice, Upstream First. With Upstream First, the goal is to get every line of code into an upstream project before it ships as a product. This keeps the community project and downstream product closely aligned and reduces the effort of maintaining divergent code trees. An Upstream First approach recognizes that the value of open source lies not so much in the ability to view source code but in fully embracing an open approach to creating software.

You can think of Operate First as a concept, philosophy, and vision to improve open source software through open sourcing operations. In an Operate First environment, open source code is tested and proven under real workloads running at scale as they would in production. This creates a feedback loop for developers seeking to improve code operationally. Operate First and associated initiatives also aim to document how production deployments are architected and deployed. In addition to documenting best processes and practices, the Operate First project will have an Infrastructure-as-Code repository.

What does Operate First look like?

Concretely, Operate First is a project to define, build, and improve the open source hybrid cloud through learning and developing code and practices in an open production community cloud. By incorporating operational experience into open source software development, Operate First extends development to include operating, testing, and proving code in a production environment—and simplifying the deployment of that code. It builds on and complements a variety of nascent and ongoing projects in the cloud space. 

Operate First started as a segment of the Mass Open Cloud (MOC) called the zero cluster, a production cloud set up to host projects and developers seeking to operate first. Announced in 2014, the MOC is a production public cloud based on the model of an Open Cloud Exchange (OCX). In this model, many stakeholders, rather than just a single provider, participate in implementing and operating the cloud. 

In addition to the MOC, Operate First is closely associated with various overlapping initiatives, including OpenInfra Labs (under the Open Infrastructure Foundation) and the Red Hat Collaboratory at Boston University.

OpenInfra Labs hosts the Telemetry Working Group, one of the working groups included under the Operate First umbrella. Observability of infrastructure has become an increasingly hot topic given the challenge of reliably operating distributed systems such as those in Kubernetes environments. The term can cover a lot of ground, but a typical definition of observability spans metrics, tracing, and logging. Monitoring is often treated as something distinct, but it is at least closely related. A key part of observability is the automatic collection and transmission of data about the system. In other words, telemetry. Telemetry is, therefore, an integral component of Operate First.
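To make the idea of telemetry concrete, here is a minimal sketch of a single collected batch that carries all three signal types named above: metrics, traces, and logs. It is an illustration only, not code from the Telemetry Working Group. The class name, field names, sample values, and the node name `hypothetical-node-1` are all assumptions, and the "transmission" step simply serializes to JSON rather than sending anything over a network.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class TelemetryBatch:
    """One automatically collected snapshot covering three signal types."""
    source: str                                    # hypothetical node name
    collected_at: float = field(default_factory=time.time)
    metrics: dict = field(default_factory=dict)    # numeric measurements
    traces: list = field(default_factory=list)     # timed spans of work
    logs: list = field(default_factory=list)       # discrete event messages


def collect(source: str) -> TelemetryBatch:
    """Gather a batch; the values are placeholders, not real readings."""
    batch = TelemetryBatch(source=source)
    batch.metrics["requests_total"] = 128
    batch.traces.append({"span": "handle_request", "duration_ms": 12.5})
    batch.logs.append({"level": "info", "message": "request served"})
    return batch


def transmit(batch: TelemetryBatch) -> str:
    """Serialize the batch; a real system would send this to a collector."""
    return json.dumps(asdict(batch), sort_keys=True)


payload = transmit(collect("hypothetical-node-1"))
print(payload)
```

Keeping the three signal types in one structure is just a convenience for the sketch. Production systems often collect and ship them through separate pipelines.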

The Operate First community

The development of a community around Operate First is still in its early stages. A primary goal of that development is recognizing that there are many constituencies with disparate concerns and motivations. Operate First founders want to engage with them in a manner and through a path that those constituencies prefer. 

To start this process, community leaders conducted a series of interviews with a variety of stakeholders: developers, quality engineers (QE), site reliability engineers (SRE), traditional system admins, data scientists, and others. The objective here was twofold. First, it was important to understand, for each role, their most pressing day-to-day concerns, what motivated them, how they measured success, and what would make Operate First of interest to them. Second, identifying and combining roles that largely shared motivations and concerns would keep things simple and make it easier to focus engagement efforts.

The diverse needs of Operate First personas

Quality engineers who write testing frameworks and tests have an increasing amount of overlap with more traditional developers of applications and other code. Both are motivated by improving customer and internal user experiences, especially when doing so involves solving novel problems. They measure success with metrics such as satisfaction of and adoption by their constituencies as well as productivity and code quality metrics. Operate First serves these goals by encouraging and enabling software design that builds in operational capabilities while keeping the person who needs to operate the software in mind.

From an operational perspective, the focus is shifting away from traditional sysadmin roles that deal mainly with maintaining and upgrading hardware and software infrastructure using tools like scripts and configuration management. While those tasks continue, site reliability engineers (SREs) spend a significant amount of time on development tasks such as adding new features, improving scalability, and automating. SREs interact extensively with cloud APIs, whether on premises or in a public cloud. SREs aim to do more with less; the ratio of SREs to the number of managed clusters is one important metric, as is the uptime of those clusters.
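The two SRE metrics mentioned here are simple ratios, and a short sketch shows how they might be computed. The function names and the sample figures are assumptions chosen for illustration. The ratio is expressed as clusters per SRE, which is one possible way to state the relationship between team size and managed clusters.

```python
def clusters_per_sre(managed_clusters: int, sre_count: int) -> float:
    """Return how many clusters each SRE manages on average."""
    if sre_count <= 0:
        raise ValueError("sre_count must be positive")
    return managed_clusters / sre_count


def uptime_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return the share of the period during which the cluster was available."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be within the period")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical figures: 40 clusters managed by a team of 5 SREs.
print(clusters_per_sre(40, 5))  # 8.0

# Hypothetical figures: a 30-day period (43,200 minutes) with 60 minutes down.
print(round(uptime_percent(43_200, 60), 3))
```

Doing more with less means pushing the first number up without letting the second one drop.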

In addition to developer and operations personas, the data scientists and data engineers in the OpenDataHub community have also been early adopters of Operate First. OpenDataHub is a blueprint for building an Artificial Intelligence (AI)-as-a-Service platform that integrates a variety of open source machine learning tools, including Kubeflow, Kafka, Seldon, PyTorch, and Jupyter notebooks on the Red Hat® OpenShift® Container Platform.

For these audiences, Operate First provides:

  • A cluster to develop and run AI applications
  • GitHub organizations to share and collaborate on open source projects
  • Custom image pipelines to publish reproducible experiments 
  • Real production operations data for tackling machine learning problems in AIOps

Furthermore, operating a subset of OpenDataHub at scale creates an opportunity to document best practices, which can, in turn, be fed into Red Hat OpenShift Data Science, the managed cloud service offering based on OpenDataHub. Just as the open source development model forms a virtuous cycle when working as intended, Operate First can lead to a beneficial circle for operational knowledge and supporting code.

Flexibility and freedom

The ultimate goal of Operate First is to free software users from having to make a false choice between simplicity and choice. It brings the power of the open source development model to operationalizing software. Fully operationalized software keeps the flexibility of open source software, with no tie to a single cloud provider, while also simplifying and improving the Day Two operations of that software.

It’s not simplicity or choice. It’s simplicity and choice.
