Red Hat Research Quarterly

Concurrent, scalable, and distributed astronomy processing in the AC3 framework

About the author

Ben Capper

Ben Capper is a Software Engineer at Red Hat. He is currently working on the AC3 and GREEN.DAT.AI EU Horizon research projects with a focus on green energy, AI, and cloud-edge computing.

Astronomers at the Complutense University of Madrid collaborated with Red Hat engineers to streamline the data analysis process when working with massive datasets. 

The AC3 (Agile and Cognitive Cloud-edge Continuum management) project is an EU Horizon-funded research project focused on developing an intelligent system for managing applications in distributed computing environments. The project’s primary goal is to create a Cloud-Edge Continuum Computing Manager (CECCM), responsible for handling the full lifecycle management (LCM) of microservice-based applications deployed across a federated infrastructure spanning the cloud, edge, and far edge.

The core innovation relies on three pillars:

  • Smart forecasting (AI/ML) automatically predicts resource requirements and optimizes system deployment. This ensures reliable performance while significantly reducing energy waste, a concept known as green management.
  • The ontology and semantic-aware reasoner (OSR) serves as the user interaction point with the CECCM. It allows users to define application requirements, configuration, and service level agreements (SLAs) agnostically, enabling compatibility with any type of underlying infrastructure.
  • Fully automated, hands-off system management (zero-touch) delivers a high degree of operational efficiency while significantly reducing the specialized expertise required to manage infrastructure. This automation guarantees peak performance (low latency, high throughput) and robust security across diverse deployment environments.

The AC3 project is being validated across multiple domains, including IoT and data, smart city management, and astronomy data analysis (UC1, UC2, and UC3, respectively). This article details the multidisciplinary work on UC3 carried out by Red Hat Emerging Technology engineers alongside astrophysicists from the Complutense University of Madrid (UCM) to enable astronomers to carry out large-scale processing of the massive, distributed datasets collected from astronomical observations. This processing is key to astronomical research aimed at analyzing and understanding the stellar properties of galaxies and the underlying processes driving galaxy formation.

Validation through the dedicated testing harness confirms that the system surpasses the Key Performance Indicator (KPI) of a 50% reduction in processing time, as illustrated in Figure 1. This result demonstrates the framework’s ability to deliver a scalable, automated, and accessible solution for a critical scientific domain.

Figure 1. Comparison of dataset processing time in seconds, varying the number of consumer pods

Architecture

Modern telescopes generate massive datasets, and the UC3 AstroApp’s architecture, shown in Figure 2, ensures astronomers can process this data quickly and reliably, with an aim of a 50% reduction in processing time. The AstroApp system supports multiple advanced data analysis applications such as Starlight and pPXF, which are crucial to astronomical research. These are highly specialized programs that break down the light captured from celestial objects (known as spectra) to determine their age, velocity, and composition. The system also enables Voronoi binning, a technique that groups neighboring pixels until each bin reaches a high enough quality for the main analysis tools to work reliably.
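The accretion step of Voronoi binning can be illustrated with a minimal sketch. This is a simplified one-dimensional version for intuition only: the real algorithm (Cappellari & Copin) operates on two-dimensional pixel maps and also regularizes bin shapes. The function name and the greedy strategy here are illustrative, not the AstroApp's actual implementation.

```python
import math

def accrete_bins(signal, noise, target_sn):
    """Greedily group adjacent spectrum pixels until each bin reaches the
    target signal-to-noise ratio S/N = sum(signal) / sqrt(sum(noise^2)).
    Simplified 1-D illustration of Voronoi binning's accretion step."""
    bins, current = [], []
    for s, n in zip(signal, noise):
        current.append((s, n))
        sn = sum(p[0] for p in current) / math.sqrt(sum(p[1] ** 2 for p in current))
        if sn >= target_sn:        # bin is now good enough for analysis tools
            bins.append(current)
            current = []
    if current:                    # leftover low-S/N pixels join the last bin
        if bins:
            bins[-1].extend(current)
        else:
            bins.append(current)
    return bins
```

With ten pixels of unit signal and unit noise, a bin's S/N grows as the square root of its size, so a target of 2.0 closes a bin every four pixels and folds the two leftovers into the final bin.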

Figure 2. Simplified UC3 application architecture

The app employs a producer-consumer architecture, where a central producer orchestrates incoming data and delegates tasks to multiple consumer pods for parallel processing. Astronomers upload raw spectra files (data captured from telescopes) through an intuitive graphical user interface (GUI). The producer organizes these files into batches and pushes them into a RabbitMQ queue. Consumers running spectral analysis software retrieve tasks from the queue, process the data independently, and return results to the producer for storage in S3. By decoupling data ingestion and task orchestration from processing, this design ensures the system maintains high operational efficiency and throughput under varying workloads.
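The decoupling described above can be sketched with Python's standard library. This toy version stands in an in-process `queue.Queue` for RabbitMQ and a string transformation for the Starlight/pPXF analysis; it shows only the shape of the pattern, where the producer batches files onto a task queue and independent consumers return results on a second queue.

```python
import queue
import threading

task_q, result_q = queue.Queue(), queue.Queue()

def consumer():
    """One consumer pod: pull batches until a shutdown sentinel arrives.
    In the real app this reads from a RabbitMQ queue, not queue.Queue."""
    while True:
        batch = task_q.get()
        if batch is None:                       # sentinel: stop this worker
            break
        # Stand-in for running Starlight/pPXF on the batch.
        result_q.put([f"{name}.processed" for name in batch])

workers = [threading.Thread(target=consumer) for _ in range(3)]
for w in workers:
    w.start()

# Producer side: batch raw spectra files and enqueue them.
files = [f"spectrum_{i}.fits" for i in range(6)]
for i in range(0, len(files), 2):
    task_q.put(files[i:i + 2])
for _ in workers:                               # one sentinel per worker
    task_q.put(None)
for w in workers:
    w.join()

results = []
while not result_q.empty():
    results.extend(result_q.get())
```

Because consumers only ever see opaque batches from the queue, adding more of them raises throughput without any change to the producer.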

Scalability is a core focus of the AstroApp and is built into the application architecture itself. Consumer pods consume and process any type of job (e.g., Starlight, pPXF) from the RabbitMQ queue, and this agnostic design ensures that scaling the consumer component adds processing capability linearly, regardless of the specific job type requesting resources.

Producer workflow

The AstroApp’s producer runs as a singleton pod, serving as the central coordinator for data processing. It provides a REST API enabling astronomers to upload datasets, trigger processing, and download results directly from the GUI. The producer retrieves raw spectra files from S3 buckets and copies them to a shared volume accessible by consumer pods. Based on user-defined settings and processing tools, it batches files or sends them individually to a RabbitMQ queue for processing by consumers. These consumers run tools like Starlight and pPXF and support both the binary and text file types required by these tools. Processed results are returned via another RabbitMQ queue, uploaded to S3, and key metrics including processing duration, job size, and queue length are logged in Redis for extraction and analysis.
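The producer's batching decision can be sketched as a small pure function. The parameter names are illustrative, not the producer's actual API: `batch_size=None` mimics the per-file mode, while a positive value chunks files into batches before they are published to the queue.

```python
def make_batches(files, batch_size=None):
    """Group raw spectra file names into the messages the producer would
    push to RabbitMQ. Illustrative sketch, not the AstroApp's real code."""
    if not batch_size or batch_size <= 1:
        return [[f] for f in files]            # one message per file
    return [files[i:i + batch_size]            # fixed-size chunks; the last
            for i in range(0, len(files), batch_size)]  # batch may be short
```

Keeping this logic in the producer means consumers never need to know whether a message holds one file or many.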

Consumer workflow

Processors are scalable consumer pods that run data analysis applications for astronomical datasets. Each processor includes a receiver container that pulls tasks from a RabbitMQ queue and writes input files to a shared volume accessible by the pod’s analysis containers. The receiver also updates a process list, unique to each pod, that specifies which files to analyze. Containers running Starlight and pPXF then read this list and process the data. A sidecar container constantly monitors the shared volume for output files, then returns results back to the producer through a separate RabbitMQ queue, ensuring efficient and decoupled data processing.

Intelligent scaling

The ability to scale the processing capacity has a direct and crucial correlation with overall processing performance, making intelligent scaling an essential feature of the AstroApp. While the application architecture provides the foundation for linear improvements in processing capability, the scaling mechanism is designed to execute this functionality in a proactive and intelligent manner.

The system uses a Horizontal Pod Autoscaler (HPA) to dynamically adjust the number of consumer pods based on workload demands to alleviate computational bottlenecks. An AI model drives these scaling decisions by predicting resource needs using metrics collected from the application, such as job size, processing duration, and queue position. Trained by IBM on historical job data, the model ensures that pods are scaled as processing time is predicted to rise, thereby increasing throughput linearly and enabling the system to handle large or unpredictable datasets efficiently. 

To manage intelligent scaling, the LCM deploys the Kubernetes HPA along with a Prometheus Adapter and a Custom Metrics API within the cluster. The infrastructure in Figure 3 allows the HPA to consume custom metrics, specifically the prediction of average processing time, provided by the AI model based on current conditions, informed by Prometheus. This integration ensures that the HPA scales the consumer pods based on proactively predicted resource needs.
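The scaling rule the HPA applies to any metric, including a custom one like predicted processing time, is documented Kubernetes behavior: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the configured bounds. The sketch below restates that formula; the min/max defaults are illustrative, not the UC3 configuration.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=20):
    """Kubernetes HPA scaling rule: desired = ceil(current * metric / target),
    clamped to [min_replicas, max_replicas]. Here the metric would be the AI
    model's predicted average processing time served via the Custom Metrics API."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))
```

For example, if four consumers are running and predicted processing time is 90 s against a 60 s target, the HPA scales to six pods before the backlog actually materializes.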

Figure 3. UC3 scaling architecture

Testing and metrics collection

To train the AI model for scaling predictions, the AstroApp includes a testing harness that generates realistic workload data. This is done by running real datasets through the system while varying certain data features like the number of consumers and job size. These variations ensure the model accurately handles diverse workload scenarios. The harness accepts YAML manifests to define test parameters, such as the number of datasets, consumer pods, and dataset submission intervals (e.g., 10 consumers and 20 datasets, with a batch trigger rate of 45 seconds). After each test run, the system saves job metrics like processing time and file size and exports them for model training.
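Expanding a test manifest into individual runs can be sketched as a parameter sweep. The dictionary below stands in for a parsed YAML manifest, and the field names are illustrative; the real harness defines its own schema.

```python
from itertools import product

# Stand-in for a parsed YAML test manifest (field names are hypothetical).
manifest = {
    "datasets": [10, 20],
    "consumer_pods": [5, 10],
    "submit_interval_s": [45],
}

def build_runs(manifest):
    """Expand the manifest into one run per parameter combination, so the
    harness covers the workload variations used to train the scaling model."""
    keys = sorted(manifest)
    return [dict(zip(keys, combo))
            for combo in product(*(manifest[k] for k in keys))]
```

This manifest yields four runs (2 dataset counts × 2 pod counts × 1 interval), each of which would be executed and its processing-time and job-size metrics exported for training.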

These metrics are also exposed to the broader AC3 framework through Prometheus, which scrapes data from the application at regular intervals. The collected metrics feed into the deployed AI model, enabling it to predict resource needs and inform the HPA for efficient scaling.

User interface

Dataset management

The AC3 AstroApp’s astronomy-themed GUI serves as a control panel as shown in Figure 4, enabling astronomers to manage and process telescope datasets efficiently. The GUI organizes data management for each processor into three integrated panels, streamlining workflows for users. The File Upload panel simplifies dataset ingestion by allowing astronomers to upload raw spectra files to an S3 bucket using drag-and-drop or file search functionality. As files are uploaded, clear indicators display the status of each file, ensuring users can track the process readily. This feature makes it easy to handle large datasets from telescope observations.

Figure 4. GUI data management console

The Dataset Management panel provides a comprehensive view of datasets, dynamically listing input and output files stored in S3 when a dataset is selected. Users can interact with each file through a range of options, including deleting files, triggering processing for individual files or batches, and downloading datasets as needed. This panel empowers astronomers to organize and control their data with flexibility. The Pipeline Progress panel offers a clear overview of workload status, displaying a progress bar for each dataset. This allows astronomers to monitor ongoing processing tasks at a glance, ensuring they stay informed about their workflow without needing to dive into technical details.

Visualizing results

The GUI also includes a Maps page, powered by Aladin Sky Atlas integration, enabling astronomers to visualize recently analyzed datasets in their celestial context. Searching by an object code, which also serves as the dataset name (e.g., NGC7025), navigates to the corresponding object on the Aladin atlas. When viewing the object, astronomers can select visualization options like stellar velocity and velocity dispersion from a sidebar to load maps from dataset analysis.

A combination of sidebar selections and atlas position populates a gallery with thumbnails of maps. These thumbnails can be opened in a custom modal, allowing users to adjust transparency for map overlays on the atlas for detailed exploration, as shown in Figure 5. If the user navigates the atlas away from the selected object, the gallery automatically clears, maintaining a focused workflow. This Maps page enhances the GUI’s control panel, linking processed datasets to their celestial origins.

Figure 5. GUI maps visualization with Aladin

AC3 framework integration

Ontology and semantic-aware reasoner (OSR)

Integration with the AC3 framework begins with the OSR. This component generates LCM-agnostic application descriptors from the application details developers enter via the OSR web form. The form specifies microservices (Kubernetes pods), environment variables, data sources, SLAs (e.g., job-duration targets), and networking requirements (e.g., interpod communication protocols) in a plain-language manner. These descriptors define the application's deployment configuration agnostically, allowing multiple LCMs to interpret and translate them for deployment across diverse cluster environments.

Maestro lifecycle manager

Ubitech’s Maestro LCM translates OSR-generated descriptors into Kubernetes deployments, scheduling them across clusters managed via Advanced Cluster Management (ACM). This ensures efficient deployment of the applications’ microservices, such as consumer pods running Starlight or pPXF, by dynamically allocating resources based on descriptor-defined parameters. 

Maestro processes these descriptors to configure pod specifications, optimize cluster resource utilization, and coordinate deployment across multiple clusters. It interfaces with the AC3 Network Operator developed by Red Hat in the AC3 Network Programmability task to establish secure, namespace-specific network links using Skupper. This enables scalable pods to communicate seamlessly with the producer and RabbitMQ queues while maintaining data integrity for astronomical spectra analysis.

The scheduler leveraged for this complex multicluster environment is a component developed by Red Hat for the P2CODE project, also an EU Horizon initiative. It simplifies the deployment of applications distributed across cloud or edge environments by allowing developers to provide generalized descriptions of the application’s runtime requirements and by intelligently bundling component dependencies such as persistent volume claims (PVCs), secrets, and configmaps. This capability is critical for managing a scalable system like the UC3 AstroApp, as it shields developers from the intricacies of the underlying infrastructure (see Figure 6).

Figure 6. Simplified UC3 architecture

This integration between Maestro and the OSR enables astronomers to perform CRUD operations (create, read, update, delete) on the application deployments via the OSR web application. By offering zero-touch deployment and management, the AC3 framework supports flexible, efficient, and reliable processing and visualization of astronomical datasets, effectively offloading infrastructure management so that astronomers can focus on analysis results.

Future development

Architectural extendibility

Extendibility is a core development goal for the UC3 application. The modular, containerized producer-consumer architecture is designed to support the integration of additional astronomy analysis tools, such as STECKMAP, with minimal refactoring. This ensures that as new standards or software emerge in astronomical data processing, the system can evolve rapidly, drastically lowering the barrier to entry for developers and research groups looking to integrate their specialized tools and keeping the platform adaptable and accessible.
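One common way to achieve this kind of extendibility is a plugin registry that maps a job's tool name to a handler, so adding a new tool never touches the queue-handling code. The sketch below is hypothetical; the registry, decorator, and job schema are not the AstroApp's actual API.

```python
# Hypothetical plugin registry: consumers stay tool-agnostic and simply
# look up the handler named in each job pulled from the queue.
TOOLS = {}

def register(name):
    """Decorator that records a handler under a tool name."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@register("starlight")
def run_starlight(files):
    return f"starlight:{len(files)}"      # stand-in for real analysis

@register("ppxf")
def run_ppxf(files):
    return f"ppxf:{len(files)}"

def dispatch(job):
    """Route a queued job to whichever tool it names."""
    return TOOLS[job["tool"]](job["files"])
```

Adding STECKMAP support would then be a single new `@register("steckmap")` handler, with no changes to the producer or the dispatch path.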

Community and open source collaboration

While the AC3 UC3 was a multi-disciplinary collaboration between Red Hat and the UCM, there are plans to cultivate a robust open source community around the application developed in UC3. This community could serve as a collaborative space where astronomers, researchers, and software engineers can exchange knowledge, contribute analysis tools, and propose improvements. By fostering open development, the project aims to drastically streamline and improve the astronomical data analysis process for a wider international scientific audience.

Key takeaways

The UC3 AstroApp successfully validates the core innovations of the AC3 framework, showcasing its potential to transform scientific data processing.

By combining a robust producer-consumer architecture with the intelligent, semantic-aware orchestration provided by the OSR and Maestro, the system delivers a zero-touch deployment experience for astronomers. By doing so, the AstroApp not only enables massive data scaling but also significantly increases accessibility for domain scientists, who often lack expertise in infrastructure and application management.

Using trained AI models to proactively predict resource and workload demands to drive autoscaling, coupled with an inherently scalable application architecture, has allowed the system to surpass the expected processing-time reduction. This lets the system handle unpredictable, massive datasets without bottlenecks, and the ability to adapt proactively directly addresses the scalability crisis facing modern astronomy.

Looking ahead, ensuring architectural extendibility and the fostering of a collaborative open source community can position the UC3 AstroApp to serve as an adaptable, high-performance system. Ultimately, the successful deployment of this sophisticated, AI-driven application demonstrates the AC3 framework’s viability in delivering reliable, efficient, and scalable solutions across the entire cloud-edge continuum.

Acknowledgements

Red Hat engineers involved in this project are Ben Capper, Ryan Jenkins, Kateryna Romashko, and Ray Carroll. Direct UC3 collaborators include Mario Chamorro-Cazorla and Cristina Catalán-Torrecilla from the Complutense University of Madrid (Astronomy Workflow and Data), Ubitech (Maestro LCM), and IBM (Autoscaling Model Training).

AC3 is funded by the European Union under Grant Agreement No. 101093129. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
