Red Hat Research Quarterly

Shared knowledge or private IP? That is the question

RHRQ interviewed Idan Levi, the Research Interest Group leader in Israel, to get his take on how university research intersects with the open source approach, from datasets and collaboration to security and data privacy.

RHRQ: You lead Red Hat Research in Israel. Based on your experience, what are some of the benefits of sharing software code, environments, and data that produce research results?

Idan Levi leads Red Hat Research in Israel under the Office of the CTO organization. He has over fifteen years of experience in the industry, including the management of R&D teams with extensive knowledge of data systems, design, and architecture.

Idan Levi: First and foremost, the academic world started out open, sharing its research with external partners. This is the pillar of human knowledge. So in order to promote research, we must be open. We must be open so that others can collaborate and add their insights. We must be open to criticism so that we understand when we are wrong. And it really connects to the core values of Red Hat: working openly with the diversity of the community.

As you rightly mentioned, it’s not just about the code. It’s also about the data—the data that we produce, the data that we use. In order for us to do research, especially around AI and machine learning, we need tremendous amounts of data. And getting quality data that reflects event time series and different parameters can sometimes be difficult. Without access to this amount of data, it is difficult to meaningfully advance research. 

Researchers working in open source are able to access data that they otherwise would not have access to. Our work within OpenInfra Labs, for example, helps to deliver open source tools to run cloud, container, AI, machine learning, and edge workloads efficiently, repeatedly, and predictably.

RHRQ: In recent years, a lot of companies with huge investments in software development have gotten involved with open source research. How does this affect the development of open source software?

Idan Levi: I think this speaks to a trend that we have seen over the years. Take a look at the field of big data and data processing. These fields are heavily dependent on open source research and contributions. Hadoop and Spark are good examples of open source projects that enable all of that.

We are also seeing a lot of partnerships that are built around these projects. By moving their research from being proprietary and opening up their use cases, these companies are advancing innovation in a scalable way. Through collaboration with upstream communities, they can get feedback and better understand what challenges they need to solve. And I’ve witnessed this. We recently had a great accomplishment where one of our research partners approached us and said, “Hey, we have a great technology to enable data skipping for Spark. And we even incorporated it into our cloud offering. But nobody outside really uses it or understands the benefits. If we are to collaborate, we’ll open source it.”

This was a very interesting conversation. The way it works is that you first open source the technology, get some feedback, change it a bit, and only then use it. The researchers and the engineers understood how to do it. It took a little longer for the managers. But once they got to meet the community, it became apparent to them that it’s not about just fixing a bug. We also needed to consider how pluggable the technology is as well as how best to maintain it. This kind of thinking certainly makes the solution much better.

With open source research, innovation happens in the upstream. This process helps to address customer problems in a more efficient way. It’s not just about a solution that looks for a problem anymore. It’s about the synergy of problem solving and evolving, together.

RHRQ: How would you say intellectual property (IP) is handled in the open source research realm?

Idan Levi: That’s a challenging question. I think with the rapid pace of change today you cannot really hold on to a particular piece of IP for too long because someone will try to go past it. Someone will either recruit better people, wait for the patent to expire, or try to work around it in another way. In my opinion, holding on to a single piece of IP does not really serve as a unique advantage over time, especially given the history of long patent wars over the years. It actually slows down innovation. The real value comes from collaborations, partnerships, and exchanging ideas.

Bear in mind that customers want to work with companies toward a certain vision they share of the future, not just because they are locked in to an architecture, solution, or platform. Vendor lock-in makes it difficult for customers to integrate with some of the other systems they own from a different provider.

I think the world of IP that locked you in is changing. In this new ecosystem where collaboration drives innovation, we find that more partners and customers are viewing IP as a hindrance. 

RHRQ: One big concern expressed by partners or customers who are just getting started with open source is data privacy and security. How can they still get involved with open source research?

Idan Levi: There are both philosophical opinions and proven studies that show open source is generally more secure. I can understand and relate to the concern about open source and data privacy more than the concern about open source and security. With regard to open and secure, you have many more people looking at the code base, and therefore it is easier to test and fix issues. It is much easier to file a pull request than to contact a vendor through their support channels to report a bug. And in terms of data privacy, there are also systems that allow you to publicly share information about a dataset while withholding information about individuals in the dataset.
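
The interview does not name a specific system for this. One well-known approach that fits the description is differential privacy, where aggregate statistics are published with calibrated random noise so that no single individual's presence can be inferred. The following is a minimal sketch of that idea under stated assumptions: the function names, the sample ages, and the choice of epsilon are all hypothetical and are not taken from the interview or from any Red Hat project.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that match `predicate`.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the Laplace mechanism uses
    noise with scale 1 / epsilon. Smaller epsilon means more noise.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical dataset: ages of individuals (illustrative values only).
ages = [23, 35, 41, 29, 52, 38, 61]
rng = random.Random(0)

# Publish roughly how many people are over 40 without exposing any one record.
noisy = private_count(ages, lambda age: age > 40, epsilon=1.0, rng=rng)
print(f"Noisy count of people over 40: {noisy:.2f}")
```

Each published value is the true count plus random noise, so a reader learns the approximate aggregate but cannot tell from it whether any particular person is in the dataset. Production systems also track a total privacy budget across queries, which this sketch leaves out.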
