About Research Day US 2019
The Red Hat Research Program held its first Research Day on May 6, 2019, in Boston, Massachusetts, the day before Red Hat Summit, which ran from May 7 through May 9. Both events took place at the Boston Convention and Exhibition Center.
Research Day brought together 150 professors, grad students, Red Hat engineers, Red Hat partners and customers to hear presentations on the innovative-yet-practical work Red Hat Research supports.
Research Day featured a collection of talks on current academic and government research projects, with a particular emphasis on collaborative projects between Red Hat and Boston-area researchers, and Boston University in particular.
Research Day US Themes and Topics
The Research Day program was divided into two blocks, covering two main topics: machine learning and data privacy, and operating systems and hardware innovation. In the morning, the data governance track examined the problems around big data, privacy, provenance and research. As a company dedicated to the open source development model, we are deeply interested in the availability of open data sets for AI development.
The afternoon track covered hardware diversity and Linux for complex systems. Our speakers also introduced work prompted by the increasing expense of developing smaller and faster processors: the challenge of finding smarter ways to increase performance than simply scaling up processor speed.
Research Day US Agenda
9:00am – 9:15am
Introduction: Looking Into the Future of Collaborative Research
By Hugh Brock, Red Hat
Track: Privacy and Machine Learning
9:15am – 9:30am
Privacy and Machine Learning – Challenges and Opportunities
By Azer Bestavros – track chair, Boston University
9:30am – 10:00am
An Architecture Stack for Data-Driven Infrastructure Management
By Dimosthenis Kyriazis, University of Piraeus
In several application domains, multi-modal data are exploited towards the provision of innovative services with business value. Data management and analytics frameworks to date emphasize the computational needs and aspects of applications and services deployed in a particular infrastructure.
In this talk, the architecture of a complete stack (namely BigDataStack) will be presented. This stack, based on an infrastructure management system that aims at driving decisions according to data aspects, is fully scalable, runtime adaptable and high-performant to tackle the needs of big data operations and data-intensive applications. Furthermore, the stack goes beyond purely infrastructure elements by introducing techniques for dimensioning big data applications, modeling and analyzing processes as well as provisioning data-as-a-service by exploiting a seamless analytics framework. Technical elements of the talk are linked with specific application scenarios that utilize the components of the BigDataStack architecture.
10:00am – 10:30am
Parallel Machine Learning
By Assaf Schuster, Technion
Modern deep neural networks comprise millions of parameters, which require massive amounts of data and time to train. Steady growth of these networks over the years has made it impractical to train them from scratch on a single GPU. Distributing the computations over several GPUs can drastically reduce this training time; however, stochastic gradient descent (SGD), which is typically used to train these networks, is an inherently sequential algorithm. As a result, training deep neural networks on multiple workers is difficult, especially when using non-dedicated cloud resources while trying to maintain high efficiency, scalability, and final accuracy. In this talk we will survey some of the new ideas in this scope and discuss their potential.
10:30am – 11:00am
Model-Driven Discovery and the Science of Data
By Jeffrey Brock, Yale
In the age of machine learning, artificial intelligence, and data science, we see all around us the impact of powerful tools to extract knowledge from data. Whether in advertising, political campaigns, real-time translation tools, or in the hard sciences, the face of the knowledge frontier has a new complexion. Many of these tools, such as neural networks and deep learning, work alarmingly and uncannily well, and yet we do not fully understand why. Vital questions confront those of us in the academy: how do these tools change the way we understand knowledge acquisition? How do they change how we read texts or analyze political discourse? How do they force us to rethink the scientific method, and how do they allow us to search for new models, theories, and equations that govern the universe?
While tools from machine learning have provided elaborate taxonomies of observed data, innumerable examples show that without a coherent model for a data generating mechanism, spurious conclusions abound. Yet these tools are being used in science at an increasing rate, often with alarming apparent efficacy and impact. In this talk I will engage a pressing modern question: how as a scientific community and as a society can we come to terms with the introduction of these incredible pattern recognition machines, and what must the new mathematics of data do to assure that science remains stable, reproducible, and open?
11:00am – 11:30am
Implementing Secure Multi-Party Computing
By Kinan Dak Albab, Boston University
Secure Multiparty Computation (MPC) is a cryptographic primitive that allows several parties to jointly and privately compute desired functions over secret data. Building and deploying practical MPC applications faces several obstacles, including performance overhead, complicated deployment and setup procedures, and adoption of MPC protocols into modern software stacks. MPC applications expose trade-offs between efficiency and privacy that may be hard to reason about, formally characterize, and encode in a protocol design or implementation.
We describe technical and non-technical challenges from our experience deploying MPC applications in the real world. We showcase JIFF: an extensible general purpose MPC framework capable of running on web and mobile stacks, showing how developments in distributed systems, web development, and the SMDI paradigm can inform MPC constructs and implementation. JIFF is used to implement several MPC applications, including a successfully deployed study on economic opportunity for minority-owned businesses in the Boston area, and a service for efficient privacy-preserving route recommendation.
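To make the idea of jointly computing over secret data concrete, here is a minimal sketch of one common MPC building block, additive secret sharing, used to compute a sum. It is an illustration only and does not use JIFF or reflect its API. The modulus `PRIME` and the function names are assumptions chosen for this example, and all parties are simulated in a single process.

```python
import secrets

# Hypothetical modulus for this sketch; all arithmetic is done modulo PRIME.
PRIME = 2**61 - 1


def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares by adding them modulo PRIME."""
    return sum(shares) % PRIME


def secure_sum(inputs):
    """Simulate n parties summing their private inputs via additive shares."""
    n = len(inputs)
    # Each party i splits its input and hands share j to party j.
    distributed = [share(x, n) for x in inputs]
    # Party j adds the shares it received; it never sees a raw input.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return reconstruct(partial_sums)


if __name__ == "__main__":
    print(secure_sum([52_000, 61_500, 48_250]))
```

Any subset of fewer than all the shares of a value is uniformly random, so it reveals nothing about that value, yet the published partial sums still add up to the true total. A real deployment would also need authenticated channels between parties and protection against misbehaving participants, which this sketch omits.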
11:30am – 12:00pm
Homomorphic Encryption, Why and How
By Kurt Rohloff, Duality Technologies
The discovery of Fully Homomorphic Encryption (FHE) has been one of the major breakthroughs of computer science in the 21st century. FHE allows sensitive data to be encrypted such that arbitrary programs can be securely run over the encrypted data where the output, when decrypted, is equivalent to the result of running the original algorithm on the unencrypted data. FHE is ground-breaking in its ability to enable learning for AI and ML on encrypted data. This talk will review our advances in FHE, from theory, implementation and application perspectives, with a focus on commercially-relevant applications. We focus on regulated industry applications, such as those in the financial and healthcare domains.
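The property described above, that decrypting the output gives the same result as computing on the plaintext, can be seen in miniature with a much weaker scheme. The sketch below uses unpadded textbook RSA, which is homomorphic for multiplication only. It is not FHE, is not secure, and is not the scheme discussed in the talk; the tiny parameters `P`, `Q`, and `E` are assumptions picked purely for illustration.

```python
# Toy textbook-RSA parameters (hypothetical and insecure; illustration only).
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c):
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1, c2):
    """Multiply two plaintexts by operating on their ciphertexts only."""
    return (c1 * c2) % N


if __name__ == "__main__":
    a, b = 7, 12
    c = multiply_encrypted(encrypt(a), encrypt(b))
    # The party doing the multiplication never sees a or b.
    print(decrypt(c) == (a * b) % N)
```

A fully homomorphic scheme extends this idea to both addition and multiplication on ciphertexts, which is what allows arbitrary programs to be evaluated over encrypted data.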
1:00pm – 1:30pm
KEYNOTE
By Chris Wright, Red Hat
Track: Operating Systems and Hardware Innovation
1:30pm – 2:00pm
Operating Systems and Hardware Innovation
By Orran Krieger – track chair, Massachusetts Open Cloud & Uli Drepper – track chair, Red Hat
2:00pm – 2:30pm
A Partitioning Hypervisor for Latency-Sensitive Workloads
By Craig Einstein, Boston University; Richard West, Boston University & Bandan Das, Red Hat
Quest-V is a separation kernel that partitions services of different criticality levels across separate virtual machines, or sandboxes. Each sandbox encapsulates a subset of machine physical resources that it manages without requiring intervention from a hypervisor. In Quest-V, a hypervisor is only needed to bootstrap the system, recover from certain faults, and establish communication channels between sandboxes. Partitioning VMs onto separate machine resources offers an opportunity for per-sandbox power management. Depending on the latency and power demands of each sandbox, it can be suspended to RAM or disk and optionally migrated across hosts to balance system resources and reduce power consumption. Shared machines can be placed into low power states when all sandboxes migrate away from them. Quest-V allows VMs to suspend and resume individual hardware resources without interfering with the operation of other VMs on the same physical platform. This allows for the creation of systems that are both power and latency aware.
2:30pm – 3:00pm
UniKernel Linux (UKL)
By Ali Raza, Boston University & Larry Woodman, Red Hat
Unikernels are small, lightweight, single address space operating systems, with the kernel included as a library with the application. Because unikernels run a single application, there is no sharing or competition for resources among different applications, improving performance and security. Unikernels have thus far seen limited production deployment. This project aims to turn the Linux kernel into a unikernel with these characteristics: 1) easily compiled for any application, 2) uses battle-tested, production Linux and glibc code, 3) allows the entire upstream Linux developer community to maintain and develop the code, and 4) allows applications that normally run on vanilla Linux to benefit from unikernel performance and security advantages. UniKernel Linux (UKL) provides the opportunity to pursue many interesting research ideas, e.g. studying advantages of bypassing syscalls and directly invoking internal kernel functionality, studying impacts of link time optimizations across application/kernel boundaries, studying performance benefits from profile driven optimizations, and observing performance gains from simplified user level synchronization mechanisms.
3:00pm – 3:30pm
FPGAs Everywhere in Large Scale Computer Systems
By Martin Herbordt, Boston University & Ahmed Sanaullah, Boston University
As modern data center workloads become increasingly complex, constrained and critical, mainstream “CPU-centric” computing can no longer keep pace. Future data centers are moving towards a more fluid model, with computation and communication no longer localized to commodity CPUs and routers. Next generation “data-centric” data centers will “compute everywhere,” whether data is stationary (in memory) or on the move (in network). Reconfigurable hardware, in the form of Field Programmable Gate Arrays (FPGAs), is transforming ordinary clouds into massive supercomputers. We will highlight many ways to deploy FPGAs in a data center node, such as traditional back-end accelerators, tightly coupled offload processors, Smart NICs, Bump-in-the-Wire, and even in the router itself. We will also discuss our efforts to make these devices globally accessible, through deeper integration into software stacks, transparent generation of custom hardware stacks, and device management using reconfigurable hardware operating systems.
3:30pm – 4:00pm
With a Little Help from My Threads: Accelerating Single Thread Execution with Speculating Hyperthreads
By Tommy Unger, Boston University & Jonathan Appavoo, Boston University
To solve problems, humans tend to synthesize known facts with some amount of new effort. When interesting problems are solved mainly with the former, the solution might be recognized as elegant. To date, we’ve struggled to get computing machines involved in much elegant problem solving. Considering constraints like budget caps and the polar ice caps, this lack of elegance becomes more than an aesthetic issue.
In this talk, I’ll present our research into using read-only “snapshots” as a tool for reducing the “new effort” required to solve computational problems in a cloud “Functions as a Service” (FaaS) case study. Dropping our prototype in as a replacement backend for the Apache OpenWhisk FaaS platform allows a compute node to cache multiplicatively more functions, and to reduce latencies on cache misses. I’ll conclude with plans for summer work making RISC-V FPGA softcores amenable to this trick.
4:00pm – 4:30pm
Automatic Configuration of Complex Hardware
By Han Dong, Boston University & Sanjay Arora, Red Hat
A modern network interface card (NIC) such as the Intel X520 10 GbE is complex, with hardware registers that control every aspect of the NIC’s operation, from device initialization to dynamic runtime configuration. The Intel X520 datasheet documents over 5600 registers; only ~1890 are initialized by a modern Linux kernel. It is thus unclear what the performance impact of tuning registers on a per-application basis will be.
We pursue three goals towards this understanding: 1) Identify, via a set of microbenchmarks, application characteristics that will illuminate mappings between hardware register values and their corresponding microbenchmark performance impact. 2) Use these mappings to frame NIC configuration as a set of learning problems, such that an automated system can recommend hardware settings corresponding to each network application. 3) Introduce either new dynamic or application instrumented policy into the device driver in order to better attune dynamic hardware configuration to application runtime behavior.
4:30pm – 5:00pm
Removing Memory as a Noise Factor
By Parul Sohal, Boston University & Renato Mancuso, Boston University
Memory bandwidth is increasingly the bottleneck in modern systems and a resource that, until today, we could not schedule. This means that, depending on what else is running on a server, performance may be highly unpredictable, impacting the 99th-percentile tail latency, which is increasingly important in modern distributed systems. Moreover, the increasing importance of high-performance computing applications, such as machine learning, and of real-time systems demands more deterministic performance, even in shared environments. Alternatively, many environments resist running more than one workload on a server, reducing system utilization. Recent processors have started introducing the first mechanisms to monitor and control memory bandwidth. Can we use these mechanisms to enable machines to be fully used while ensuring that primary workloads have deterministic performance? We present early results from using Intel’s Resource Director Technology and some insight into this new hardware support. We also look at an algorithm to use these tools to provide deterministic performance on different workloads.