Red Hat Research Quarterly

Smarter AI, fewer resources: bringing cloud AI into real-time edge devices to unlock performance


About the author

Eshed Ohn-Bar

Dr. Eshed Ohn-Bar, an Assistant Professor in the Electrical and Computer Engineering Department at Boston University, is passionate about building robust, efficient, and safe AI at scale.

A new AI framework for edge systems overcomes the communication and energy obstacles that limit their use in real-time applications by integrating local and cloud decision-making while maintaining strong performance.

Artificial intelligence (AI) models with vast and generalized knowledge are increasingly being integrated into everyday devices, from smartphones that provide personalized assistance to mobile robots and vehicles that continuously monitor and interact with their surroundings. Yet these powerful AI models are currently constrained by the limited resources of these edge devices. 

Running a large, accurate AI model on a smartphone or a mobile robot can drain its battery within minutes and requires significant energy and hardware resources. As these models continue to grow in size and computational demands (e.g., requiring expensive GPUs), deploying them across millions of everyday devices becomes increasingly difficult, expensive, and environmentally unsustainable. As part of the collaborative project Minimal Mobile Systems via Cloud-based Adaptive Task Processing, researchers at Red Hat and Boston University developed a new framework that optimizes computation to enable more efficient real-time AI applications without sacrificing model accuracy.

Motivation

Traditionally, AI computations are offloaded to remote servers. This can save on-device resources, as local image and text data are sent to models in the cloud. Smart assistants often use this approach to offload as much computation as possible to the cloud, helping to preserve energy and local device resources. While this method is widely used today in systems like ChatGPT, relying on the cloud can introduce delays, making it unsuitable for real-time or safety-critical applications. For a robot, even a brief delay can be dangerous—for example, causing a mobile system to collide with a nearby pedestrian. As a result, latency-constrained edge systems often depend on expensive local hardware and resources to ensure quick responses. Can we design edge systems that seamlessly balance cloud and local resources to optimize for real-time accuracy, efficiency, and safety across different situations?

To address urgent societal and sustainability needs with existing systems and models, engineers today resort to various ad hoc strategies. Developers may try lightweight, compressed models, but these smaller models suffer from degraded accuracy and unreliable performance, such as failing to detect that nearby pedestrian. Models can also be carefully tuned for specific devices and scenarios, but they struggle when faced with diverse operational tasks that may need more computational power. One promising alternative is systems that automatically adapt on the fly, adjusting when, where, and how computations are performed as needed.

In work presented at the European Conference on Computer Vision 2024, researchers from Red Hat and Boston University collaborated to develop a novel framework that dynamically learns to balance shared computation across various devices and operational settings. The proposed system, UniLCD (Unified Local-Cloud Decision-Making), introduces a new approach based on a field in machine learning called reinforcement learning (RL), where the system learns by trial and error, receiving rewards or penalties based on its actions. This method trains a flexible model to decide, based on the current scenario and task, whether to offload computation to the cloud or process it locally.

Our method—UniLCD

UniLCD is a dynamic approach that empowers resource-constrained devices—such as smartphones, autonomous vehicles, and mobile robots—with the ability to leverage both local processing power and cloud resources. 

At its core, UniLCD comprises a context-dependent routing module, which takes as input an embedding, that is, a compressed representation of the current state together with a history of past system decisions. This routing module is trained using RL to determine a decision policy: whether to take a local action using a lightweight but less accurate model, or to transmit local information to the cloud server model, which is larger and more accurate but incurs latency. While this approach can be applied to any real-time AI application and edge device, Figure 1 illustrates an example system for a camera-based mobile robot navigation task.
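Conceptually, the routing module is just a small classifier over the current embedding and a summary of the decision history. The sketch below is a minimal, illustrative PyTorch rendering of this idea; the class name, layer sizes, and dimensions are our own assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RoutingModule(nn.Module):
    """Illustrative two-way router: score 'act locally' vs. 'offload to
    the cloud' from the current embedding and a summary of past decisions."""

    def __init__(self, embed_dim=256, history_dim=256, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + history_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # logits: [local, cloud]
        )

    def forward(self, embedding, history):
        x = torch.cat([embedding, history], dim=-1)
        return self.net(x)
```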

Figure 1. Overview of UniLCD for a robot navigation task. The framework learns to offload tasks to the cloud while maintaining real-time performance.

The primary goal of our system is to learn when to offload computations to the cloud while meeting safety and real-time requirements. As shown in Figure 1, the local decision-making model (also referred to as the local policy) consists of a truncated neural network designed to rapidly process image and goal observations. The extracted features, or embedding, are then combined with a memory buffer that stores a history of past observations, providing additional context for the system. This historical data enables the system to observe latency dynamics and adapt to various constraints, such as limited communication settings. The memory is passed to a multi-layer perceptron (MLP) routing module, which determines whether to offload the current embedding to the cloud for further processing with a subsequent neural network or to classify a navigation action—such as steering, braking, or accelerating—locally. The complete algorithm for training the routing policy is shown in Figure 2.
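To make the data flow concrete, a single control step might look like the hypothetical loop below. The encoder, router, local head, and send_to_cloud names are illustrative stand-ins for the components in Figure 1, and the mean-pooled history is just one simple way to summarize the memory buffer.

```python
from collections import deque

import torch

history = deque(maxlen=8)  # memory buffer of recent embeddings and decisions

def summarize(buffer, dim=256):
    """Collapse the buffer into a fixed-size context vector (here, a mean)."""
    if not buffer:
        return torch.zeros(1, dim)
    return torch.stack([emb for emb, _ in buffer]).mean(dim=0)

def control_step(image, goal, encoder, router, local_head, send_to_cloud):
    """One decision step: encode the observations, consult the router, then
    either classify an action locally or offload to the larger cloud model."""
    with torch.no_grad():
        embedding = encoder(image, goal)       # truncated local network
        context = summarize(history)           # context from past decisions
        logits = router(embedding, context)
        offload = logits.argmax(dim=-1).item() == 1

        if offload:
            action = send_to_cloud(embedding)  # more accurate, but adds latency
        else:
            action = local_head(embedding)     # lightweight local classifier
    history.append((embedding, offload))
    return action
```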

Figure 2. Training a generalized routing policy with reinforcement learning. The algorithm continuously updates a minimal local neural network that classifies between local and cloud operations.

As shown in the algorithm, UniLCD learns by receiving a reinforcement signal, or reward, based on the outcomes of its decisions. For example, a mobile system should learn to strategically interleave cloud computation, particularly when encountering challenging scenarios, to improve the accuracy of the lightweight, lower-accuracy local model. In the case of our navigation task, if the mobile robot successfully moves closer to the goal, reduces energy consumption, or selects effective action ranges and speeds, it receives a positive reward. If it comes close to colliding with an object, which is undesirable, it receives a negative reward. The complete reward at each time step is computed as:
R_t = α · r_goal · r_speed · r_energy · r_collision

Here, alpha is a scaling factor that adjusts the overall reward to fall within the range [0, 1]. This reward ensures that the resulting policy optimizes task performance as well as energy and communication constraints. In general, designing a multi-objective reward function can be complex, even for relatively simple tasks (e.g., robot navigation without dynamic objects, as often explored in prior work), and RL typically requires extensive iteration in training. One key finding is that the design of the reward function significantly impacts training efficiency and convergence. Because the reward terms are multiplied, the need for extensive tuning of individual components is reduced: if one term is low, it diminishes the overall reward, and an effective policy emerges within just a few minutes of operation. Once this initial training is complete, the policy can be deployed without additional training, though the model can be continually updated on incoming observations (e.g., for further efficiency gains) or adapt automatically to novel scenarios, platforms, and communication modes.
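To make the multiplicative structure concrete, here is a minimal sketch of how such a step reward might be computed. The term names and value ranges are simplifying assumptions for illustration; the paper's exact term definitions differ.

```python
def step_reward(r_goal, r_speed, r_energy, r_collision, alpha=1.0):
    """Illustrative multiplicative reward: the shaping terms lie in [0, 1],
    so any weak term drags the whole product down, and a near-collision
    makes r_collision negative, turning the total reward into a penalty."""
    return alpha * r_goal * r_speed * r_energy * r_collision

# Example: strong goal progress but wasteful energy use yields a low reward.
print(step_reward(r_goal=0.9, r_speed=0.8, r_energy=0.1, r_collision=1.0))  # 0.072
```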

Results 

To rigorously validate the system, a simulation environment was developed for sidewalk robot navigation in crowded outdoor settings. This environment captures complex scenarios that require frequent switching and high responsiveness, thus showcasing UniLCD’s robust capabilities in handling challenging, dynamic tasks that demand seamless cloud-edge integration. To realistically model real-world constraints, the simulation also introduces stochastic delays in data transmission between the local device and the cloud server, effectively capturing the impact of latency.
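One simple way to emulate such stochastic transmission delays is to sample a round-trip time for every offload request. The distribution and parameters below are illustrative assumptions, not the simulator's actual model.

```python
import random

def sample_cloud_latency(base_ms=50.0, jitter_ms=30.0, drop_prob=0.01):
    """Illustrative round-trip delay: a fixed base latency plus exponentially
    distributed jitter, with an occasional dropped response."""
    if random.random() < drop_prob:
        return None  # response lost; the local policy must act on its own
    return base_ms + random.expovariate(1.0 / jitter_ms)
```

Training the routing policy under delays like these is what teaches it to fall back on the local model when the network is slow or unreliable.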

In the most difficult and dense settings, UniLCD outperformed all prior baselines by over 35% on the newly introduced Ecological Navigation Score, a metric that combines task performance (e.g., collisions, route completion, overall task time) with overall energy cost. In these intricate settings, baselines relying on naive model splitting or pruning produced poor navigation and frequent collisions, as their design does not holistically consider environmental, communication, and safety contexts. The strong performance persisted across environmental conditions and different models, including very small local models for resource-limited use cases. This remarkable generalizability marks a significant step toward broad, ultra-low-cost deployments, which are currently being explored in follow-up research. Real-time, cloud-integrated systems with lightweight local models and minimal hardware requirements—such as smartphones—could be deployed in broader and more diverse settings, delivering high-performance operation with minimal degradation.

Future applications

UniLCD has the potential to reshape the future of edge computing by seamlessly integrating local and cloud-based decision-making. This novel framework is currently being integrated into Red Hat OpenShift, providing a flexible solution for enabling large-scale, real-world deployments across various communication and modeling configurations. While challenges remain, including accelerating RL model training to solve for an optimal local-cloud policy within just a handful of interactions, there are several exciting future opportunities. Given the generalized nature of the routing mechanism, a potential approach to speeding up training further could be collaborative training over data from different platforms and tasks. 

By significantly reducing the energy consumption and cost of powerful AI models, UniLCD could unlock transformative possibilities to address societal needs across a range of domains, including transportation, healthcare, and disaster response, where real-time and efficient processing is essential. For example, autonomous vehicles could offload tasks to cloud models to conserve energy and enhance safety. Lower-cost assistive robots could operate with precision and energy efficiency in various home environments, minimizing failures associated with low-accuracy edge models or delays from waiting for cloud-based predictions. In disaster zones, robots could manage resources efficiently, adapting to different communication infrastructures and operating for extended periods without sacrificing accuracy during the most crucial moments. Handheld smartphones could provide continual and reliable support when assisting users without rapidly depleting battery life. As researchers continue to push the boundaries of what’s possible, UniLCD brings us one step closer to a future where smarter, faster, and more sustainable AI systems are seamlessly integrated into our daily lives.
