A new working group is tackling observability in production.
Observability has become an increasingly hot topic given the challenges of reliably operating distributed systems such as those in Kubernetes environments. The term can cover a lot of ground, but a typical definition spans metrics, tracing, and logging. Monitoring is often considered distinct, but it’s at least closely related. A key part of observability is the automatic collection and transmission of data: in other words, telemetry.
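To make the three signals concrete, here is a minimal sketch in Python, using only the standard library. It is an illustration, not any particular project's API: the `SINK` list, the `emit` helper, and the record field names are all hypothetical, and the in-memory list simply stands in for whatever network exporter a real system would use.

```python
import json
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory sink standing in for a network exporter.
SINK = []

def emit(signal, **fields):
    """Serialize one telemetry record and 'transmit' it to the sink."""
    record = {"signal": signal, "timestamp": time.time(), **fields}
    SINK.append(json.dumps(record))

def record_metric(name, value):
    """Emit a numeric measurement, such as a counter increment."""
    emit("metric", name=name, value=value)

def record_log(level, message):
    """Emit a discrete, human-readable event."""
    emit("log", level=level, message=message)

@contextmanager
def trace_span(name):
    """Time a unit of work and emit it as a span when it finishes."""
    span_id = uuid.uuid4().hex
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        emit("trace", name=name, span_id=span_id, duration_ms=duration_ms)

def handle_request():
    """Simulated application code that produces all three signals."""
    with trace_span("handle_request"):
        record_log("INFO", "request received")
        record_metric("requests_total", 1)

handle_request()
```

The application code never decides when or where data is sent; it only calls the recording helpers, and collection and transmission happen automatically behind them. That separation is the essence of telemetry as described above.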
There is no shortage of open source projects in this space. However, production-level testing and refinement of these tools, together with their associated procedures and datasets, have been much less common in an integrated, multi-tenant open environment. That’s the problem that the new Telemetry Working Group (WG) is tackling.
A variety of other initiatives are related to the Telemetry WG. OpenInfra Labs (openinfralabs.org, under the Open Infrastructure Foundation) is hosting the working group. Operate First (operate-first.cloud) will house the experiments and research associated with the group. Initially, the group will focus on Kubernetes, but its work may be extended to other high-performance computing environments over time. The Mass Open Cloud (MOC; massopen.cloud), which sponsors and hosts a large portion of Operate First, is also involved, as is the New England Research Cloud (nerc.mghpcc.org).
It’s a cross-research university, cross-company, and cross-open source effort. This specific initiative was first kicked off by Boston University’s Michael Daitzman, although there have been other discussions and work going on in this general area for a while. It’s now co-chaired by Tufts University’s Raja Sambasivan and Marcel Hild, a manager of software engineering in Red Hat’s Office of the CTO.
The group’s goals are as follows:
- Create open datasets for research
- Provide access to a platform for telemetry research
- Define and implement a standardized application stack, i.e., the gold standard
- Define research problem statements around telemetry
- Iterate over implementations of solutions on those problem statements
Another explicit goal is to not create new open source projects. As Hild puts it, “We have a large number of projects solving similar enough problems. The challenge these days lies in connecting these projects and operating these projects in a real environment.” He adds, “We don’t want to do everything in a lab; that’s a controlled environment. And controlled environments are only so good.”
A core premise of the working group from the beginning has been to operate in public. Any code will be made open source over time, even if not at the very beginning, as will any data that does not include personally identifiable information. Anyone is welcome to participate. Meetings are recorded and can be accessed via the Telemetry Working Group Playlist on the MOC YouTube page (bit.ly/telemetryWG). The group’s repository is on GitHub (github.com/open-infrastructure-labs/telemetrywg).