Red Hat Research Quarterly

Research project updates—February 2023

Each quarter, Red Hat Research Quarterly highlights new and ongoing research collaborations from around the world. This quarter we highlight collaborative projects in Israel at The Technion, The Ben Gurion University of The Negev, Ariel University, Reichman University, and The Hebrew University.

Contact academic@redhat.com for more information on any project described here, or explore more research projects at research.redhat.com.

PROJECT: CCO: Cloud cost optimizer

ACADEMIC INVESTIGATOR: Prof. Assaf Schuster (Technion)

Red Hat investigator: Ilya Kolchinsky

Cost optimization is one of the core challenges for users of cloud computing platforms: given a workload, how can one minimize the monetary cost of its deployment and execution over the cloud? Accurately answering this fundamental question for arbitrarily complex real-life workloads is an exceedingly hard problem. This project aimed to develop a tool, the cloud cost optimizer (CCO), capable of doing exactly that. 

Researchers have completed the project successfully. CCO retrieves information regarding the prices and capacities of all instance types, regions, availability zones, and so forth, as advertised by the cloud provider. Based on this information, CCO calculates the optimal allocation of workload components to minimize the overall cost. 
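The allocation step can be illustrated with a deliberately simplified sketch. It is not CCO's actual algorithm: it assumes a tiny hypothetical price catalog (the instance names, sizes, and prices are made up), ignores regions and availability zones, and places each component on its own instance by picking the cheapest type that fits. A real optimizer would also consider packing several components onto one instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str            # hypothetical instance type name
    vcpus: int
    memory_gb: float
    hourly_price: float  # made-up price, in USD per hour


@dataclass(frozen=True)
class Component:
    name: str
    vcpus: int
    memory_gb: float


def cheapest_fit(component, catalog):
    """Return the cheapest instance type that can host the component alone."""
    candidates = [
        inst for inst in catalog
        if inst.vcpus >= component.vcpus and inst.memory_gb >= component.memory_gb
    ]
    if not candidates:
        raise ValueError(f"no instance type fits component {component.name!r}")
    return min(candidates, key=lambda inst: inst.hourly_price)


def plan(components, catalog):
    """Map each component name to its cheapest fitting instance type."""
    return {c.name: cheapest_fit(c, catalog) for c in components}


# Illustrative catalog and workload; none of these values come from a real provider.
CATALOG = [
    InstanceType("small", 2, 4.0, 0.05),
    InstanceType("medium", 4, 16.0, 0.17),
    InstanceType("large", 8, 32.0, 0.34),
]
WORKLOAD = [Component("api", 2, 3.0), Component("db", 4, 12.0)]

deployment = plan(WORKLOAD, CATALOG)
total_hourly_cost = sum(inst.hourly_price for inst in deployment.values())
for component_name, inst in deployment.items():
    print(f"{component_name} -> {inst.name} (${inst.hourly_price:.2f}/h)")
print(f"total: ${total_hourly_cost:.2f}/h")
```

One instance per component is only a baseline. The gap between this baseline and a plan that shares instances across components is where an optimizer has room to reduce cost.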

A fully functional version of CCO supporting AWS, Azure, and hybrid mode (splitting the workload across multiple cloud providers) is now available for use. Incorporating advanced optimization techniques, CCO can create close-to-optimal deployment plans for workloads of hundreds to thousands of components within mere seconds. Watch for more details in an upcoming RHRQ article, and visit the GitHub repository.

PROJECT: AppLearner: learn and predict the resource consumption patterns of your OpenShift application

ACADEMIC INVESTIGATOR: Prof. Assaf Schuster (Technion)

Red Hat investigator: Ilya Kolchinsky

This project targets the problem of accurately estimating resource requirements for workloads running on the Red Hat OpenShift Container Platform and adjusting these estimates during application execution. For most real-life applications, it is notoriously difficult to estimate resource consumption patterns manually and to tune the pod CPU and memory requirements accordingly to avoid under- and overutilization. While there are attempts to solve this problem by on-the-fly monitoring and adaptively scaling pods when changes are detected, these solutions can be inefficient when the application behavior is highly dynamic. Instead, AppLearner proactively defines the provisioning plan for an application by learning and predicting its resource consumption patterns over time.

The project is now in the research phase with the goal of identifying the most promising approach to learning and predicting CPU and memory consumption of a sample set of workloads. We expect the early prototype to be available for use by Q4 of 2023.

PROJECT: SpotOS

ACADEMIC INVESTIGATOR: Prof. Assaf Schuster (Technion)

Red Hat investigators: Josh Salomon, Gabriel BenHanokh, Orit Wasserman, and Avishay Traeger

The SpotOS project aims to devise a distributed cloud-based operating system that uses unreliable or temporarily available resources to provide a reliable and scalable execution experience with a high quality of service. It does so by harnessing spot instances, which are resources representing currently unused cloud capacity. While spot instances are considerably cheaper than regular instances, the cloud provider can unexpectedly reclaim them at any time. When that happens, the running application is given only a very limited time window to back up its current state. SpotOS aims to overcome this limitation by providing a reliable, adaptive, self-healing, user-transparent layer with spot instances serving as the underlying unreliable building blocks.

The work on SpotOS is ongoing in multiple directions. First, researchers are building the key innovation component: the EDM (external distributed memory). With EDM, the application state is split among multiple storage units (that could be hosted on regular instances, spot instances, or a mix of both) in sufficiently small chunks to complete the evacuation on time. A limited prototype of the EDM has been tested in the lab environment, and work has begun to make it applicable to real-life workloads and use cases.
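The sizing constraint behind this chunking can be sketched with simple arithmetic. The following is an illustration only, not the EDM implementation: the function names, the notice window, the bandwidth figure, and the safety factor are all assumptions chosen to make the numbers easy to follow.

```python
def max_chunk_bytes(notice_seconds, bandwidth_bytes_per_s, safety_factor=0.5):
    """Largest chunk one storage unit can evacuate within the notice window.

    safety_factor (an assumed parameter) reserves part of the window for
    detection delay and connection setup.
    """
    return int(notice_seconds * bandwidth_bytes_per_s * safety_factor)


def split_state(state_bytes, chunk_bytes):
    """Split a state of state_bytes into (offset, length) chunks."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        (offset, min(chunk_bytes, state_bytes - offset))
        for offset in range(0, state_bytes, chunk_bytes)
    ]


# Illustrative numbers: a 120-second notice and 100 MB/s of usable bandwidth.
chunk_limit = max_chunk_bytes(notice_seconds=120, bandwidth_bytes_per_s=100 * 10**6)
chunks = split_state(state_bytes=20 * 10**9, chunk_bytes=chunk_limit)

print(chunk_limit)   # 6000000000 bytes, i.e. 6 GB per storage unit
print(len(chunks))   # 4 chunks: three of 6 GB and one of 2 GB
```

Under these assumptions, a 20 GB application state has to be spread over at least four storage units so that any single unit can be evacuated in time.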

Another work direction revolves around the live migration of SpotOS applications. When a spot instance goes down and the customer application has to be moved to a different spot instance, this transition must be as seamless as possible. Achieving this smooth migration requires technical innovations, which are the focus of this subproject.

Finally, the project team is building the controller for SpotOS, a component integrating and managing the framework’s various components. This includes the EDM and the migration manager, as described above, as well as other independent projects such as CCO and AppLearner.

PROJECT: Cluster autoscaling DDoS attacks

ACADEMIC INVESTIGATOR: Prof. Anat Bremler-Barr (Reichman University)

Red Hat investigator: Benny Rochwerger

The YoYo attack is a new type of DDoS (distributed denial-of-service) attack that abuses the auto-scaling mechanism by causing it to oscillate between scale-up and scale-down, thus causing economic damage (EDoS) or crashing the victim application by consuming its computation resources. As its name suggests, YoYo is based on periodic bursts of high and low traffic on the target. These bursts are short and hard to detect. For the attacker, a YoYo attack is considerably more cost-effective than a regular DDoS attack on Kubernetes.

The goals of this project are as follows: 

  • Studying the YoYo attack and its impact on OpenShift-based infrastructure
  • Devising a set of best practices for cluster admins to minimize the susceptibility of an OpenShift cluster to YoYo or similar attacks
  • Investigating more general strategies for mitigating this attack class

The project is now in advanced research stages, with several published papers mainly covering the first goal. Researchers plan to invest more resources in the second and third goals during 2023.

PROJECT: Tuning QUIC protocol for Ceph workloads

ACADEMIC INVESTIGATORS: Prof. Anat Bremler-Barr and Jonathan Plotkin (Reichman University)

Red Hat investigator: Yuval Lifshitz

QUIC (Quick UDP Internet Connections) is a general-purpose transport layer network protocol designed by Google offering significant advantages over TCP, such as greatly reduced latency. This project aims to utilize the strengths of QUIC for communication between the components of Red Hat Ceph Storage, particularly between Ceph Object Gateway (RGW) and Ceph clients. To that end, researchers will study the characteristics of the workloads in the relevant use cases and adapt and tune QUIC accordingly, possibly resulting in a new protocol variation and/or implementation.

This effort is still in its preliminary stages. The goals and milestones have been approved, and a graduate student allocated to the project is ready to start.

PROJECT: CVE mining and prediction

ACADEMIC INVESTIGATORS: Prof. Anat Bremler-Barr and Dr. Tal Shapira (Reichman University)

Red Hat investigator: Keith Grant

The CVE (Common Vulnerabilities and Exposures) database contains more than 190,000 vulnerability records, of which more than 20,000 CVEs were registered in 2021. A plethora of highly useful information and invaluable insights could be extracted from CVEs and similar data sources. The acquired knowledge could be used to predict future vulnerabilities or to estimate the probability of an exploit against a particular piece of software.

However, despite the immense potential of CVE analysis and mining, this area gets little attention. The main reason is the sheer difficulty of coping with such a large volume of highly unstructured information. This project aims to implement and apply an innovative approach to harvesting knowledge from CVEs by utilizing a knowledge graph that encodes the information in the form of entities and connections. The proposed method uses NLP (natural language processing) to parse unstructured textual data from CVEs and populate a knowledge graph which could then be used for answering user queries and/or predicting the not-yet-known parts.
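To make the idea of entities and connections concrete, here is a toy sketch. It is not the project's method: a single regular expression stands in for the NLP step, the records and their identifiers are invented placeholders rather than real CVEs, and the relation names (`has_weakness`, `affects`, `enables`) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# A toy stand-in for NLP: matches "<weakness> in <product> <version> allows <impact>."
PATTERN = re.compile(
    r"^(?P<weakness>.+?) in (?P<product>\S+) (?P<version>\S+) allows (?P<impact>.+?)\.?$"
)


def extract_triples(record_id, description):
    """Turn one description into (subject, relation, object) triples."""
    match = PATTERN.match(description)
    if match is None:
        return []  # text that does not fit the toy pattern is skipped
    return [
        (record_id, "has_weakness", match["weakness"].lower()),
        (record_id, "affects", match["product"]),
        (record_id, "enables", match["impact"]),
    ]


class KnowledgeGraph:
    """Minimal triple store indexed by (relation, object)."""

    def __init__(self):
        self._by_object = defaultdict(set)

    def add(self, subject, relation, obj):
        self._by_object[(relation, obj)].add(subject)

    def subjects(self, relation, obj):
        """Return all subjects connected to obj through relation."""
        return set(self._by_object.get((relation, obj), set()))


# Invented placeholder records, not real CVE entries.
RECORDS = {
    "EXAMPLE-0001": "Buffer overflow in examplelib 1.2 allows remote attackers to execute arbitrary code.",
    "EXAMPLE-0002": "Use after free in examplelib 1.3 allows local users to gain privileges.",
    "EXAMPLE-0003": "Buffer overflow in demoserver 4.0 allows remote attackers to cause a denial of service.",
}

graph = KnowledgeGraph()
for record_id, description in RECORDS.items():
    for triple in extract_triples(record_id, description):
        graph.add(*triple)

print(sorted(graph.subjects("affects", "examplelib")))
print(sorted(graph.subjects("has_weakness", "buffer overflow")))
```

Once the text is encoded this way, a user query such as "which records affect examplelib?" becomes a graph lookup instead of a search through free text.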

The project is in the early planning stage. The participating researchers are working closely with the Product Security team to identify the most promising and valuable directions and ways of applying the knowledge graph approach to mine the most important and relevant insights from the CVE database.

PROJECT: Service mesh performance study

ACADEMIC INVESTIGATORS: Prof. Anat Bremler-Barr and Yaniv Naor (Reichman University)

Red Hat investigator: Sanjeev Rampal

A service mesh is a dedicated infrastructure layer that controls service-to-service communication over a network. It provides a way to control how different parts of an application share data. In recent years, the number of Kubernetes service meshes and systems that adopt a service mesh has rapidly increased. 

Since performance has a key role in every system, performance analysis and comparison between the leading service mesh technologies could benefit the community in at least two ways. First, a detailed comparative evaluation of the alternatives could help decide which service mesh to use for a particular workload. Second, such a study has the potential to reveal fundamental flaws and limitations in the way service mesh technologies function, thus paving the way for future research tasked with addressing these flaws and improving the current state of the art.

The goal of this project is to perform this analysis and empirical evaluation. The design and the planning are now complete, and the team is working on building the corresponding experimental setup.

PROJECT: Advanced proactive caching for heterogeneous storage systems

ACADEMIC INVESTIGATORS: Dr. Gabriel Scalosub and Dr. Gil Einziger (Ben Gurion University of The Negev)

Red Hat investigators: Guy Margalit, Josh Salomon, Orit Wasserman, and Gabriel BenHanokh

Caching is one of the most effective optimization techniques in large distributed systems. However, the standard approach in the industry still relies on relatively generic policies. These methods exhibit several drawbacks. Most importantly, they are non-adaptive and reactive rather than proactive, so they cannot leverage system-specific and workload-specific patterns for making caching decisions in advance.

This project seeks to improve the performance of NooBaa, an object data service for hybrid and multicloud environments, by developing novel caching frameworks that take into account request heterogeneity and perform proactive caching decisions (also referred to as speculative prefetching). 

The intended contributions of this project are as follows:

  • To close the above gap algorithmically by devising new approaches for caching in storage systems incorporating heterogeneity and proactivity
  • To upstream our suggested solutions by extending NooBaa

The project is progressing toward completing these two goals by the end of 2023.

PROJECT: Software diagnosis with log files

ACADEMIC INVESTIGATORS: Dr. Meir Kalech and Dr. Roni Stern (Ben Gurion University of The Negev)

Red Hat investigator: Gil Klein

This project aims to create an automated tool to identify software failures and isolate the faulty software components (e.g., classes and functions) that caused the failure without using code coverage. The core idea is to leverage the information in the system log files instead and use it as an approximation of coverage. Such a solution could be used by QE engineers tasked with testing large distributed systems (such as Kubernetes/OpenShift) that cannot efficiently and scalably support coverage.
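The core idea of treating log files as approximate coverage can be sketched as follows. This is an illustration under stated assumptions, not the project's tool: it assumes a made-up log format (`LEVEL component: message`), treats a component as "covered" in a run if it appears in that run's log, and ranks components with the Ochiai score, a standard metric from spectrum-based fault localization that is swapped in here for demonstration.

```python
import math
from collections import Counter


def components_in_log(log_lines):
    """Approximate coverage: components named in 'LEVEL component: message' lines."""
    names = set()
    for line in log_lines:
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and ":" in parts[1]:
            names.add(parts[1].split(":", 1)[0].strip())
    return names


def ochiai_ranking(runs):
    """Rank components by suspiciousness.

    runs is a list of (log_lines, failed) pairs, one per test run.
    """
    total_failed = sum(1 for _, failed in runs if failed)
    seen_in_failed = Counter()
    seen_in_passed = Counter()
    for log_lines, failed in runs:
        for name in components_in_log(log_lines):
            (seen_in_failed if failed else seen_in_passed)[name] += 1

    scores = {}
    for name in set(seen_in_failed) | set(seen_in_passed):
        ef, ep = seen_in_failed[name], seen_in_passed[name]
        denominator = math.sqrt(total_failed * (ef + ep))
        scores[name] = ef / denominator if denominator else 0.0
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented log excerpts from four hypothetical test runs.
RUNS = [
    (["INFO scheduler: pod placed", "INFO network: route added"], False),
    (["INFO scheduler: pod placed", "ERROR storage: volume attach timed out"], True),
    (["INFO network: route added", "WARN storage: retrying attach"], True),
    (["INFO scheduler: pod placed"], False),
]

ranking = ochiai_ranking(RUNS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data, `storage` appears only in failing runs and therefore ranks first, which is the kind of signal that would otherwise require code coverage instrumentation.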

The project launched in January 2023.
