Red Hat Research Quarterly

Big data, security certification, and FPGAs: 2021 Red Hat Research Days have begun

about the author

Gordon Haff

Technology Evangelist at Red Hat, where he works on emerging technology product strategy, writes about tech trends and their business impact, and is a frequent speaker at customer and industry events. His books include How Open Source Ate Software, and his podcast, in which he interviews industry experts, is Innovate @ Open.

This year has already brought us several Research Days discussions streaming around the world. They have covered topics as diverse as big data stream processing, analyzing security certification reports for potential device and product vulnerabilities, and using open source tools to program FPGA applications.

Ilya Kolchinsky, Senior Software Engineer at Red Hat in Israel, kicked things off on March 2 by describing a growing problem. A large number of data-driven systems and applications have become an integral part of our daily lives, and this trend is accelerating dramatically. By commonly cited estimates, 1.7 MB of data is created every second for every person on Earth, more than 2.5 quintillion bytes of new data are generated every day, and the global volume of data is projected to reach 163 zettabytes by 2025. In addition to the growing volume, velocity, and variety of continuously generated data, novel technological trends such as edge processing, IoT, 5G, and federated AI bring new requirements for faster processing and deeper, more computationally heavy data analysis. Hence the challenge: old-school data processing mechanisms are no longer enough.

In a spirited discussion with conversation leader Oren Oichman, Senior Cloud Consultant at Red Hat, Ilya explored potential ways to analyze this data dynamically, with an approach called Big Data Stream Processing (BDSP). BDSP uses a variety of methods for scalable and efficient data processing that do not rely on traditional databases for storing and processing the data. Ilya and Oren discussed specific examples of real-life applications that can greatly benefit from incorporating BDSP capabilities. In particular, Ilya covered on-the-fly detection of complex patterns in streaming data, as well as stream-oriented machine learning and data mining.
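To make the idea of on-the-fly pattern detection concrete, here is a minimal sketch, not taken from the talk. It consumes events one at a time and reports when one kind of event is followed by another for the same key within a time window, without storing the stream in a database. The event kinds, keys, and the `detect_sequence` helper are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Dict, Deque, Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    timestamp: float  # seconds since some arbitrary start
    kind: str         # hypothetical event type
    key: str          # entity the event belongs to


def detect_sequence(
    events: Iterable[Event], first: str, then: str, window: float
) -> Iterator[Tuple[Event, Event]]:
    """Yield (a, b) where a `then` event follows a `first` event
    for the same key within `window` seconds."""
    # Per-key queue of `first` events that may still be matched.
    pending: Dict[str, Deque[Event]] = {}
    for ev in events:
        queue = pending.setdefault(ev.key, deque())
        # Forget candidates for this key that fell out of the window.
        while queue and ev.timestamp - queue[0].timestamp > window:
            queue.popleft()
        if ev.kind == first:
            queue.append(ev)
        elif ev.kind == then:
            # Each pending candidate is matched once, then discarded.
            while queue:
                yield (queue.popleft(), ev)


# Hypothetical sample stream (made-up data for illustration only).
SAMPLE_EVENTS = [
    Event(0.0, "login_failed", "alice"),
    Event(1.0, "login_failed", "bob"),
    Event(2.0, "password_reset", "alice"),
    Event(20.0, "password_reset", "bob"),
]

if __name__ == "__main__":
    for a, b in detect_sequence(SAMPLE_EVENTS, "login_failed", "password_reset", 5.0):
        print(a.key, a.timestamp, b.timestamp)
```

Because the function is a generator over an iterable, it holds only the candidates still inside the window rather than the whole history. Production stream processors add concerns this sketch ignores, such as out-of-order events and distributed state.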

Later in March, Petr Švenda of the Faculty of Informatics at Masaryk University in Brno, Czech Republic, noted that long security certification reports can be a trove of publicly available data about proprietary devices and other products otherwise available only under NDA. While downloading and reading a single certificate is easy, reasoning about the characteristics of the whole associated ecosystem, which might have more than ten thousand certified devices, is much harder. Petr’s talk addressed using an open source tool for automatic analysis of publicly available certification reports to answer questions like these: Are there observable systematic differences between the Common Criteria and FIPS 140-2 certificates? Can I quickly find out whether my device is using a certified component recently found vulnerable? Most importantly, can we measure and quantify the extent to which the whole process is actually increasing the security of products being certified?
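One way to picture the "is my device using a vulnerable certified component?" question is as a graph problem: if certification reports reference other certificates, then everything that directly or transitively references a vulnerable certificate is potentially affected. The sketch below illustrates that idea only. It is not the tool discussed in the talk, and the certificate IDs, the `REFERENCES` mapping, and the `affected_by` helper are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical data: certificate ID -> IDs of certificates its report references.
REFERENCES: Dict[str, List[str]] = {
    "CERT-DEVICE-1": ["CERT-CHIP-A"],
    "CERT-DEVICE-2": ["CERT-LIB-B"],
    "CERT-LIB-B": ["CERT-CHIP-A"],
    "CERT-DEVICE-3": ["CERT-CHIP-C"],
    "CERT-CHIP-A": [],
    "CERT-CHIP-C": [],
}


def affected_by(vulnerable: str, references: Dict[str, List[str]]) -> Set[str]:
    """Return every certificate that directly or transitively
    references the `vulnerable` certificate."""
    # Invert the edges: component -> certificates that reference it.
    dependents: Dict[str, Set[str]] = {}
    for cert, refs in references.items():
        for ref in refs:
            dependents.setdefault(ref, set()).add(cert)

    # Breadth-first walk outward from the vulnerable certificate.
    seen: Set[str] = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    print(sorted(affected_by("CERT-CHIP-A", REFERENCES)))
```

In practice the hard part is extracting reliable references from thousands of free-form PDF reports, which is what makes automated analysis of the whole ecosystem valuable.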

Finally, Martin Herbordt, Professor of Electrical and Computer Engineering at Boston University, and Robert P. Munafo, a PhD candidate there, discussed practical plans for programming FPGAs (field-programmable gate arrays) in the datacenter. FPGAs, flexible chips that can be “programmed” again and again with different code paths, are now essential components in the datacenter and at the edge, with millions currently deployed. They appear in a wide variety of system components and provide such critical functions as SDN, encryption/decryption, and compression. Yet for nearly all system providers, let alone system users, programming these FPGAs is impossible. Martin and Robert, along with Red Hat Senior Data Scientist Ahmed Sanaullah, who also joined the conversation, have been working to enable high-level language programming for FPGA application development, especially in the datacenter and at the edge, exclusively using existing open source tools.

Previous research by Martin and others showed that current compilers could deliver excellent FPGA performance for arbitrary C code, but that this capability was brittle, inconsistent, and required special programmer expertise to extract. Taking advantage of the flexibility and performance potential of FPGAs has typically required either expensive specialized engineering talent, commercial proprietary C-to-hardware tools that yielded demonstrably poor performance, or both. This is the performance portability programmability problem (P^4). 

P^4 can be reduced to the problem of generating the correct sequence of optimizations for a particular input code and target architecture. The research group hypothesized that a solution to P^4 could thus be built using existing open source tools, primarily based on the GNU Compiler Collection (GCC). In particular, they discussed an ongoing project that aims to use machine learning to control a newly customizable version of GCC to automatically determine optimization pass ordering for FPGA targets specifically, and thereby improve performance compared with existing proprietary C-to-FPGA methods. This research is continuing as part of the Red Hat Collaboratory at Boston University (bu.edu/rhcollab).
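Why pass ordering matters can be shown with a toy example. The sketch below is purely illustrative and makes several assumptions: the three-pass "compiler", the tiny instruction format, and the cost model are all hypothetical, and it replaces the project's machine learning approach with a brute-force search over every ordering. It has no connection to GCC's actual passes or interfaces.

```python
from itertools import permutations

# Toy IR: (dest, op, args). "const" holds a literal; "add" sums two
# operands, each either a variable name (str) or a literal (int).
PROGRAM = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    ("d", "add", ("c", "c")),
    ("unused", "add", ("a", "a")),
]
OUTPUTS = {"d"}  # values the program must still produce


def propagate(prog):
    """Replace variable operands of adds with literals when known."""
    consts, out = {}, []
    for dest, op, args in prog:
        if op == "add":
            args = tuple(consts.get(a, a) for a in args)
        else:
            consts[dest] = args[0]
        out.append((dest, op, args))
    return out


def fold(prog):
    """Turn an add of two literals into a const."""
    return [
        (dest, "const", (sum(args),))
        if op == "add" and all(isinstance(a, int) for a in args)
        else (dest, op, args)
        for dest, op, args in prog
    ]


def dce(prog):
    """Drop instructions whose result is never used."""
    live, kept = set(OUTPUTS), []
    for dest, op, args in reversed(prog):
        if dest in live:
            kept.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    return kept[::-1]


PASSES = {"propagate": propagate, "fold": fold, "dce": dce}


def cost(prog):
    """Hypothetical cost model: an add costs 2, a const costs 1."""
    return sum(2 if op == "add" else 1 for _, op, _ in prog)


def rank_orderings(prog):
    """Apply each pass once, in every possible order; sort by cost."""
    results = []
    for order in permutations(PASSES):
        current = prog
        for name in order:
            current = PASSES[name](current)
        results.append((cost(current), order))
    return sorted(results)


if __name__ == "__main__":
    for total, order in rank_orderings(PROGRAM):
        print(total, " -> ".join(order))
```

Even with three passes, the six orderings produce different costs for the same input. Real compilers have hundreds of passes, so exhaustive search is infeasible, which is the motivation for learning good orderings instead.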
