Red Hat Research Quarterly

Unleashing the potential of Function as a Service in the cloud continuum

about the author

Luis Tomás Bolivar 

Luis Tomás Bolivar is a Principal Software Engineer at Red Hat, Spain. He is currently working in the Ecosystem Engineering group and on research activities with a focus on cloud computing in general, and automation, networking, and AI in particular. He holds a PhD in Computer Science (University of Castilla-La Mancha, Spain) and has been involved in several EU projects. 

about the author

José Castillo Lema

José Castillo Lema is a Senior Software Engineer at Red Hat working with the Telco 5G performance/scale team. During his MSc and PhD studies, he worked on QoS routing in SDN and on NFV management and orchestration. He has been teaching postgraduate courses for the last six years.

The PHYSICS project demonstrates the value of the FaaS paradigm for application development and data analysis. Here’s how we enhanced the infrastructure layer.

The difficulty of scaling, optimizing, and maintaining infrastructure makes cloud computing too complex or resource-intensive for many developers and data scientists. The Function-as-a-Service (FaaS) model, often referred to generically as serverless computing, allows users to run certain types of applications in a modern, scalable, and cost-effective way without the added complexity of maintaining their own infrastructure. The PHYSICS project (oPtimized Hybrid Space-Time ServIce Continuum in faaS) aims to unlock the potential of the FaaS paradigm for cloud service providers and application developers in a cloud-agnostic way.

PHYSICS brought together a consortium of 14 international partners, including use case leaders in e-health, smart agriculture, and smart manufacturing, supported by a €5 million Horizon 2020 grant from the European Commission. Engineers from Red Hat took a lead role in the infrastructure layer, adapting and enhancing tools in the Kubernetes ecosystem for scaling, energy awareness, and multicluster automation, including automatic cluster onboarding and the configuration of PHYSICS components on top of newly added clusters.

This article provides a brief overview of the PHYSICS project and initial milestones before detailing our most recent work on the infrastructure layer of the project.

PHYSICS 101

PHYSICS facilitates the design, implementation, and deployment of advanced FaaS applications, using new functional flow programming tools that harness established design patterns and existing cloud/FaaS component libraries. One of the key outcomes of the project is a novel Global Continuum Layer (distinct from the infrastructure layer) to facilitate efficient function deployment across diverse clusters. 

The Global Continuum Layer is a set of PHYSICS components that simultaneously optimize key application objectives such as performance, latency, and cost. Use cases developed with industry partners include a smart manufacturing system to optimize production pipelines, healthcare software using machine learning (ML) models to monitor the health of individuals and analyze anonymized collected data, and a solution for near real-time greenhouse management that is responsive to dynamic conditions.

In our midpoint progress report, we described how visual flow programming can enhance function development and how ready-made patterns can support abstract function design. We also presented the Load Generator Metrics tool, which was developed to evaluate the performance of the different functions so that other PHYSICS components (such as the global orchestrator or the colocation engine) could make optimized decisions about function placement across and inside clusters. (For a more detailed view, see the research day recording.)

Advanced PHYSICS

In the second half of the project, we identified opportunities for extensions and additions in the infrastructure layer. Our main working areas were threefold:

Multicluster support

Onboarding new clusters is an important challenge addressed by PHYSICS. Even with Open Cluster Management (OCM), extra PHYSICS components need to be installed and configured when adding a new cluster. Beyond deploying the new services, you need to:

  • Connect them to the relevant PHYSICS components in the hub
  • Create a benchmarking load in the added cluster to gather base performance and energy consumption data

When onboarding clusters into OCM, central and remote cluster configuration is crucial. Before deploying functions on a new cluster, the first step is to configure it properly and obtain enough information (semantics) about it to make wise placement decisions. This involves connecting PHYSICS components in the added clusters with other PHYSICS components and applications at the hub.

The Open Cluster Management API and components serve as the foundation, adding various Kubernetes objects to the OCM ManifestWork, such as pods, services, and ServiceExports (for Submariner support). This also includes specific Custom Resource Definitions (CRDs), for example, the Workflow CRD introduced by PHYSICS to abstract the information related to the functions so that they can be deployed on the relevant FaaS engine (in our case, either OpenWhisk or Knative).
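
To make this concrete, here is a minimal sketch (not the project's actual manifests) of how a hub-side component could wrap a deployment and a Submariner ServiceExport in an OCM ManifestWork using the Kubernetes Python client. The namespaces, names, and image below are hypothetical placeholders.

# Hedged sketch: create an OCM ManifestWork on the hub so the Klusterlet agent
# applies the wrapped resources in a managed cluster. Names, namespaces, and
# the image are placeholders, not the actual PHYSICS manifests.
from kubernetes import client, config

config.load_kube_config()  # hub cluster kubeconfig
api = client.CustomObjectsApi()

manifest_work = {
    "apiVersion": "work.open-cluster-management.io/v1",
    "kind": "ManifestWork",
    # ManifestWorks live in the hub namespace named after the managed cluster
    "metadata": {"name": "physics-semantic", "namespace": "cluster1"},
    "spec": {
        "workload": {
            "manifests": [
                {   # Deployment for a (hypothetical) semantic component
                    "apiVersion": "apps/v1",
                    "kind": "Deployment",
                    "metadata": {"name": "semantic", "namespace": "physics"},
                    "spec": {
                        "replicas": 1,
                        "selector": {"matchLabels": {"app": "semantic"}},
                        "template": {
                            "metadata": {"labels": {"app": "semantic"}},
                            "spec": {"containers": [{
                                "name": "semantic",
                                "image": "quay.io/example/semantic:latest",  # placeholder image
                            }]},
                        },
                    },
                },
                {   # ServiceExport so Submariner exposes the service across clusters
                    # (the matching Service manifest is omitted for brevity)
                    "apiVersion": "multicluster.x-k8s.io/v1alpha1",
                    "kind": "ServiceExport",
                    "metadata": {"name": "semantic", "namespace": "physics"},
                },
            ]
        }
    },
}

api.create_namespaced_custom_object(
    group="work.open-cluster-management.io", version="v1",
    namespace="cluster1", plural="manifestworks", body=manifest_work,
)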

Overview of cluster onboarding components and interactions

PHYSICS uses Knative capabilities for event-driven and serverless operations, which optimizes resource usage. When a remote cluster is added to the hub via OCM, a ManagedCluster object is created. A Knative APIServerSource then invokes the Knative Serverless Service upon receiving the event. The cluster onboarding pod, running as a Knative Serverless Service, processes the request, configures the edge cluster, and generates a benchmarking load. Specifically, the cluster onboarding pod:

  • Obtains the cluster name.
  • Creates an OCM ManifestWork, which includes the definition of the semantic deployment and its associated service. The Klusterlet agent in the remote cluster is in charge of creating the local resources in that cluster.
  • Waits until the deployment is ready and obtains its service IP by using the OCM feedbackRule.
  • Creates another OCM ManifestWork, which includes a Kubernetes Job that will generate some benchmarking load in the managed cluster. Again, the Klusterlet agent creates the Kubernetes Job in the remote cluster.
  • Waits until the job is completed, again using the OCM feedbackRule (see the sketch after this list).
  • Calls the semantic service endpoint, leveraging Submariner to reach the semantic service IP. This returns information about the benchmarking job just executed, providing the energy consumption and performance metrics used to score the cluster.
  • Calls the reasoning framework to provide the semantic service IP, so it can start requesting semantic information from the new cluster.
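
As a hedged illustration of the feedbackRule mechanism, the following sketch (using the Kubernetes Python client) shows how onboarding logic could surface the benchmarking Job's status on the hub and poll it until completion. The Job, ManifestWork, and namespace names are hypothetical.

# Hedged sketch: use an OCM feedbackRule to report the benchmarking Job's
# .status.succeeded field back to the hub, then poll the ManifestWork status.
# All names (Job, ManifestWork, namespaces) are hypothetical placeholders.
import time
from kubernetes import client, config

config.load_kube_config()  # hub cluster kubeconfig
api = client.CustomObjectsApi()

# Fragment set as spec.manifestConfigs on the ManifestWork that wraps the Job,
# asking the work agent to feed .status.succeeded back to the hub:
MANIFEST_CONFIGS = [{
    "resourceIdentifier": {
        "group": "batch", "resource": "jobs",
        "namespace": "physics", "name": "benchmark-load",
    },
    "feedbackRules": [{
        "type": "JSONPaths",
        "jsonPaths": [{"name": "succeeded", "path": ".status.succeeded"}],
    }],
}]

def job_succeeded(work_name: str, cluster_ns: str) -> bool:
    """Check the feedback values the work agent reports back on the hub-side ManifestWork."""
    work = api.get_namespaced_custom_object(
        group="work.open-cluster-management.io", version="v1",
        namespace=cluster_ns, plural="manifestworks", name=work_name,
    )
    manifests = work.get("status", {}).get("resourceStatus", {}).get("manifests", [])
    for manifest in manifests:
        for value in manifest.get("statusFeedbacks", {}).get("values", []):
            if value["name"] == "succeeded" and value.get("fieldValue", {}).get("integer", 0) >= 1:
                return True
    return False

while not job_succeeded("physics-benchmark", "cluster1"):
    time.sleep(10)  # the benchmarking Job has not completed yet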

Extra configurations can be added easily as part of the cluster onboarding logic component or even created as extra Knative Serverless Services that react to the same events and perform other actions in parallel.
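
The APIServerSource-to-Knative-Service wiring described above might look roughly like the following sketch, applied here with the Kubernetes Python client. The namespace, service account, and service names are assumptions for illustration.

# Hedged sketch: a Knative APIServerSource that watches ManagedCluster objects
# and sinks the resulting events into a Knative Service (the onboarding logic).
# Namespace, service account, and service names are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # hub cluster kubeconfig
api = client.CustomObjectsApi()

api_server_source = {
    "apiVersion": "sources.knative.dev/v1",
    "kind": "APIServerSource",
    "metadata": {"name": "managedcluster-events", "namespace": "physics"},
    "spec": {
        "serviceAccountName": "onboarding-sa",  # needs RBAC to watch ManagedClusters
        "mode": "Resource",  # send the full object in the event payload
        "resources": [{
            "apiVersion": "cluster.open-cluster-management.io/v1",
            "kind": "ManagedCluster",
        }],
        "sink": {"ref": {
            "apiVersion": "serving.knative.dev/v1",
            "kind": "Service",
            "name": "cluster-onboarding",  # the cluster onboarding Knative Service
        }},
    },
}

api.create_namespaced_custom_object(
    group="sources.knative.dev", version="v1",
    namespace="physics", plural="apiserversources", body=api_server_source,
)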

Kepler integration allowed us to estimate energy scores for new clusters, enhancing our understanding of energy consumption.

Energy awareness

Kepler offers accurate energy estimates and detailed reporting of power consumption. It harnesses an extended Berkeley Packet Filter (eBPF) approach to attribute power consumption to specific processes, containers, and Kubernetes pods, running custom code in the Linux kernel to obtain the metrics that fuel the ML models estimating energy consumption.

PHYSICS selected the Kepler project to acquire energy-related information crucial for its components, including the scheduler. We integrated Kepler by incorporating its metrics (via Prometheus, an open source monitoring toolkit and one of the earliest Cloud Native Computing Foundation projects) into cluster onboarding and the PHYSICS semantic component. This integration allowed us to estimate energy scores for new clusters, enhancing our understanding of energy consumption.
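
As a hedged illustration of this kind of integration, a component could derive a simple energy score for a newly onboarded cluster by querying Kepler's counters through the Prometheus HTTP API, along the lines of the sketch below. The Prometheus URL and the exact metric name are assumptions and may differ across deployments and Kepler versions.

# Hedged sketch: score a newly onboarded cluster by summing Kepler's per-container
# energy counters over the benchmarking window, via the Prometheus HTTP API.
# The Prometheus endpoint and metric name are assumptions for illustration.
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical in-cluster endpoint

def energy_joules(window: str = "10m") -> float:
    """Total energy (joules) attributed to containers over the given window."""
    query = f"sum(increase(kepler_container_joules_total[{window}]))"
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

print(f"Benchmark energy score: {energy_joules():.1f} J")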

Within the scope of the PHYSICS project, we actively engaged in Kepler’s upstream development. Our collaboration involved:

  • Identifying and resolving critical issues hindering use in nested environments, i.e., when operating on top of virtual machines in public cloud providers such as AWS or Azure
  • Making Kepler suitable for FaaS use cases by enabling a higher frequency sampling rate

In addition, a significant facet of our involvement was assessing the accuracy of Kepler’s ML model. We gauged the model’s performance by comparing estimated metrics for power usage per node obtained through Kepler with real metrics gathered on Grid5000, helping to ensure the reliability of energy consumption estimates.
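
As a simple illustration of that kind of comparison (not the project's actual evaluation code), the error between Kepler's per-node power estimates and the reference measurements could be computed as follows; the sample values are made up.

# Hedged sketch: compare Kepler's estimated node power against reference
# measurements (e.g., wattmeter readings such as those available on Grid5000)
# using mean absolute percentage error. The sample values are fabricated.
def mape(estimated: list[float], measured: list[float]) -> float:
    """Mean absolute percentage error between two aligned series."""
    errors = [abs(e - m) / m for e, m in zip(estimated, measured) if m != 0]
    return 100.0 * sum(errors) / len(errors)

kepler_watts   = [95.2, 110.4, 102.8, 98.1]    # estimated node power (W)
measured_watts = [100.0, 115.0, 105.0, 101.0]  # reference node power (W)

print(f"MAPE: {mape(kepler_watts, measured_watts):.1f}%")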

Autoscaling workloads

KEDA (Kubernetes Event-driven Autoscaling) provides event-driven autoscaling for workloads in Kubernetes clusters. With KEDA, container scaling is based on the number of events to be processed rather than on CPU or memory thresholds. In addition, it is lightweight and fully integrated with Kubernetes through CRDs (as with every PHYSICS component). KEDA works alongside standard Kubernetes scaling components such as the Horizontal Pod Autoscaler (HPA), providing a straightforward way of extending their functionality without overwriting or duplication. It also allows managing different types of workloads, such as deployments, jobs, and even custom resources, and supports scaling down to zero (a plus for energy savings).

Mapping between PHYSICS elastic controller and KEDA components

We evaluated the suitability of the scalers already present in the upstream catalog and identified a few options that we implemented and enhanced for PHYSICS/FaaS purposes. Two in particular proved suitable for optimizing function wait time and function performance, and for scaling the cluster's nodes when needed.
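
As an illustration of the mechanism rather than the exact PHYSICS scalers, a Prometheus-based ScaledObject along these lines could scale a function runtime deployment on a queue-depth-style metric while allowing scale-to-zero. The deployment name, metric, and threshold are assumptions.

# Hedged sketch: a KEDA ScaledObject that scales a (hypothetical) function
# runtime deployment on a Prometheus metric, with scale-to-zero enabled.
# Deployment name, metric, and threshold are placeholder assumptions.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "function-runtime-scaler", "namespace": "physics"},
    "spec": {
        "scaleTargetRef": {"name": "function-runtime"},  # Deployment to scale
        "minReplicaCount": 0,   # scale to zero when idle (energy savings)
        "maxReplicaCount": 20,
        "triggers": [{
            "type": "prometheus",
            "metadata": {
                "serverAddress": "http://prometheus.monitoring.svc:9090",
                "query": "sum(pending_function_invocations)",  # hypothetical metric
                "threshold": "10",
            },
        }],
    },
}

api.create_namespaced_custom_object(
    group="keda.sh", version="v1alpha1",
    namespace="physics", plural="scaledobjects", body=scaled_object,
)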

PHYSICS contributions

PHYSICS earned praise for its technical achievements during the final project review and made substantial contributions to upstream open source communities such as Kepler, Submariner, Knative, Node-RED, and more. Industrial use cases in e-health, smart agriculture, and smart manufacturing have been demonstrated in multiple publications. Those interested in learning more can explore further details on the PHYSICS website and access reusable artifacts in the PHYSICS marketplace. Source code for many components is also available in the public PHYSICS GitHub repository.


The PHYSICS project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101017047.
