If a tree falls in the forest, but you can’t reproduce it, how do you know if it made a sound or not?
Readers of this magazine should, I would think, be familiar with the basic process that constitutes the scientific method: decide on a hypothesis based on one’s experience and ideas of what “ought” to be true; design an experiment that may supply data to support that hypothesis; analyze the data in the hope that it is conclusive. Conclusive data may either support or disprove the hypothesis, and either result is valuable, since both confirming and refuting a hypothesis yield a new piece of knowledge.
There is, of course, a follow-on action to this process that is of critical importance: share the experiment, the data, and the analysis, such that others can reproduce the result and verify it. The requirement that you share everything for reproducibility, although not a formal part of the method, is perhaps the most critical step. Without reproducibility, science cannot advance beyond a single lab, and may in fact not advance at all. The community of researchers—the grandparent of the open source development communities many of us participate in today—is where the true value of science is realized, and if there is such a thing as progress in the world, this community is surely at the heart of it.
Unfortunately, for scientific research that involves people—as is often the case in medicine, the social sciences, or artificial intelligence—sharing experimental data may be impossible due to privacy or similar concerns. This not only slows scientific progress but can lead to false results due to innocent mistakes or worse. Our interview in this issue features two people—Dr. Mercè Crosas and Dr. James Honaker—whose work is devoted to making experimental datasets consistently and universally available. Their work enables researchers developing statistical techniques to glean knowledge from data without seeing the raw data itself. We think these techniques are critical not only for the advancement of science but for open source development in AI, where training data for deep learning is a key tool. I think you’ll find the interview fascinating.
Speaking of innocent mistakes, have you ever clicked “OK” on a threatening-looking security warning without really reading it? I know I have, and felt vaguely nervous about it every time. Don’t feel bad, though: Martin Ukrop’s work on “usable security” shows that seemingly minor improvements to the text of common security warnings make an outsized difference in whether people respond to them appropriately. Clicking through an obscure warning that doesn’t make any sense may not be entirely your fault after all.
While we’re discussing open data, I want to highlight our call to action in this issue to participate in an experiment we are launching with Boston University on Open Operations. We intend to operate a cloud in partnership with BU and Harvard University that will allow the collection and analysis of all the operational data about that cloud (with the express permission of the users, of course). It is time we made operations at scale into an open discipline, like open source software. See the article on Operate First to learn more.
Finally, on a personal note, all of us at Red Hat Research feel very fortunate to be largely unaffected by the pandemic we are dealing with in our various countries. We hope you readers have been similarly fortunate. If anything positive comes out of this, perhaps it will be some level of return to the idea that hypothesis, experiment, and proof or disproof are the only way to be mostly sure of anything.