Red Hat Research Quarterly

Reproducible research

Hugh Brock

Hugh Brock is the Research Director for Red Hat, coordinating Red Hat research and collaboration with universities, governments, and industry worldwide. A Red Hatter since 2002, Hugh brings intimate knowledge of the complex relationship between upstream projects and shippable products to the task of finding research to bring into the open source world.

Article featured in Red Hat Research Quarterly, August 2020

If a tree falls in the forest, but you can’t reproduce it, how do you know if it made a sound or not?

Readers of this magazine should, I would think, be familiar with the basic process that constitutes the scientific method: decide on a hypothesis based on one’s experience and ideas of what “ought” to be true; design an experiment that may supply data to support that hypothesis; analyze the data in the hopes that it is conclusive. Conclusive data may either support or disprove the hypothesis, and either result is good, because both the confirmation and the negation of a hypothesis add a new piece of knowledge.

There is, of course, a follow-on action to this process that is of critical importance: share the experiment, the data, and the analysis, so that others can reproduce the result and verify it. The requirement that you share everything for reproducibility, although not a formal part of the method, is perhaps the most critical step. Without reproducibility, science cannot advance beyond a single lab, and may in fact not advance at all. The community of researchers, the grandparent of the open source development communities many of us participate in today, is where the true value of science is realized, and if there is such a thing as progress in the world, this community is surely at the heart of it.

Unfortunately, for scientific research that involves people (as is often the case in medicine, the social sciences, or artificial intelligence), sharing experimental data may be impossible due to privacy or similar concerns. This restriction not only slows scientific progress but can lead to false results, whether through innocent mistakes or worse. Our interview in this issue features two people, Dr. Mercè Crosas and Dr. James Honaker, whose work is devoted to making experimental datasets consistently and universally available. Their work lets researchers who develop statistical techniques glean knowledge from data without seeing the raw data itself. We think these techniques are critical not only for the advancement of science but for open source development in AI, where training data for deep learning is a key tool. I think you’ll find the interview fascinating.

Speaking of innocent mistakes, have you ever clicked “OK” on a threatening-looking security warning without really reading it? I know I have, and felt vaguely nervous about it every time. Don’t feel bad, though: Martin Ukrop’s work on “usable security” shows that seemingly minor improvements to the text of common security warnings make an outsize difference in whether people respond to them appropriately. Clicking through an obscure warning that doesn’t make any sense may not be entirely your fault after all.

While we’re discussing open data, I want to highlight our call to action in this issue to participate in an experiment we are launching with Boston University on Open Operations. We intend to operate a cloud in partnership with BU and Harvard University that will allow the collection and analysis of all the operational data about that cloud (with the express permission of the users, of course). It is time we made operations at scale into an open discipline, like open source software. See the article on Operate First to learn more.

Finally, on a personal note, all of us at Red Hat Research feel very fortunate to be largely unaffected by the pandemic we are dealing with in our various countries. We hope you readers have been similarly fortunate. If anything positive comes out of this, perhaps it will be some level of return to the idea that hypothesis, experiment, and proof or disproof are the only way to be mostly sure of anything.
