Red Hat Research Quarterly

Moving ecological forecasting from supercomputer to cloud: why and how

about the author

Christopher Tate

Christopher Tate is a principal software engineer for logging, metrics, alerts, AI/ML, and data-driven research projects in the New England Research Cloud (NERC) environment.

New event-driven architecture enabled researchers to move the PEcAn platform to the New England Research Cloud and increase scalability.

Near-term ecological forecasting can help communities make better decisions and prepare for extreme weather events and changes in the environment. Use cases include forecasts of infectious disease outbreaks, increases or declines in animal populations, or the impact of environmental events on agriculture, forestry, or other industries.

See “Prototyping a distributed, asynchronous workflow for iterative near-term ecological forecasting” on the Red Hat Research website.

A large and active science community has formed around ecological forecasting, and scientists in several different countries are interested in experimenting with the ability to make forecasts based on large sources of local environmental data. To make these capabilities available to as many public researchers as possible, the forecasting tools must provide open, accessible, reusable, and scalable community cyberinfrastructure that can make large numbers of ecological forecasts on a repeatable, frequent basis. A project underway at the Red Hat Collaboratory at Boston University is addressing that challenge by developing a cloud-native workflow that provides asynchronous, event-driven, and distributed computing and resource management for these large-scale science projects.

Project background

As a software developer at Red Hat Research, I work with students and professors to advance their research goals with open source technology. In this project, I’m working with the ecological forecasting team at Boston University, including Professor Michael Dietze, PhD students including Dongchen Zhang, and a team from the BU Software and Application Innovation (SAIL) lab, including Associate Director of Programs and Product Management Jeff Simeon, Associate Director of Engineering Greg Frasco, and Software Engineer Shashank Karthikeyan. The team encountered challenges collaborating on an open source software platform running on a shared high-performance supercomputer at Boston University. Switching to a more event-driven cloud architecture enables forecasting jobs to autoscale and run concurrently across multiple nodes.

Meet the ecological forecasting team

We began the project in January 2023 with a Research Incubation Award from the Red Hat Collaboratory. We reviewed potential event-driven services available to us in a containerized cloud environment and developed an architecture that would work to scale continually updated predictions about the future state of ecosystems over days or years. Scaling near-term ecological forecasting in this way enables the development of services that allow communities to anticipate environmental challenges and improve decisions on actionable timescales. We believe it will also allow researchers to accelerate scientific discovery and answer some fundamental research questions about the predictability of nature.

We developed an architecture that works inside an open source cloud environment like Red Hat OpenShift. For data analysis, we used an established open source project called the Predictive Ecosystem Analyzer (PEcAn). The PEcAn community has already developed containerized data science models, database enhancements, and tools to display, monitor, and execute ecological forecasting models. What we needed on top of that was a cloud environment, a message broker supporting the Advanced Message Queuing Protocol (AMQP) to receive messages to run forecasts, and a scalable PostgreSQL database with built-in geolocation features to store ecological forecasting data.

Our ultimate goal was to deploy our solution on the New England Research Cloud (NERC), but until the production cluster was ready for use, I worked on the solution on my own computer using OpenShift Local. By trying out the existing PEcAn Helm charts, we developed a reusable way to deploy the PEcAn project easily into OpenShift. Together with the SAIL team, we developed additional open source infrastructure-as-code for the project that was reusable for developing and deploying our project on our own computers, as well as in future cloud environments. 

One early challenge was our discovery that parts of the PEcAn Helm charts, which in the past had been deployed only on Kubernetes, violated security constraints that Red Hat OpenShift Container Platform enforces by default. I contributed updates to multiple PEcAn Helm charts so that service accounts could correctly deploy the PEcAn containers. The PEcAn open source community accepted those updates to their repositories, and we then had a working cloud solution on OpenShift Local.

Onboarding to NERC

Three months into our project, NERC opened up its new OpenShift Container Environment for research projects. We were quickly able to deploy all the same components working in OpenShift Local to NERC and show the platform running forecasts in the production-ready cloud environment, which was a very exciting moment. 

PEcAn platform on NERC OpenShift

Students at BU run ecological forecasting code written in the R language, so to replicate that environment we next needed to enable RStudio in NERC, which was made possible by OpenShift AI. Our team carefully prepared two pull requests that would enable ecological forecasting in RStudio in OpenShift AI on NERC: 1) a new PEcAn Unconstrained Forecast container image, based on an RStudio Jupyter Notebook container image, that loads additional R dependencies and compiles the PEcAn source code, and 2) OpenShift image streams for RStudio and for the PEcAn Unconstrained Forecast image, with the right namespace, labels, and annotations needed for OpenShift AI workbenches running on NERC.

OpenShift and OpenShift AI components in NERC PEcAn Implementation

The NERC team released OpenShift AI into the production OpenShift cluster in August 2023 and merged our project’s image streams into the computing environment in October 2023. From that point on, the science team was noticeably more comfortable working in the NERC environment. Our team no longer needed the skills of an OpenShift administrator to work in RStudio and run ecological forecasting. Being able to run the same workbench at the same time enabled all three teams (Red Hat, BU, and SAIL) to work together on the project. I was amazed at the boost in our team’s productivity and results from that point on. Having a friendly user interface in the cloud made a big difference.

Event-driven ecological forecasting and scaling

With OpenShift AI available in NERC and the new RStudio image we built, we were able to develop the new event-driven workflow in the PEcAn code and test the workflow entirely in the cloud. We requested that the Red Hat Custom Metrics Autoscaler OpenShift Operator be deployed to NERC so that forecasting model pods could scale according to the number of AMQP messages waiting at one time. This works very well for running multiple models at the same time, and it’s event driven. However, PEcAn model pods, which were originally developed in a different HPC environment, required a shared filesystem. Red Hat engineers and Boston University students worked together on a solution to send the necessary files for each job to the right container in a cloud-based way.
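The scaling behavior described above can be illustrated with a small sketch. This is not the operator's code or the project's configuration. It assumes a simple rule in which the number of model pods is the queue backlog divided by a per-pod target, rounded up and clamped to a minimum and maximum. The function name, parameters, and default limits are hypothetical.

```python
import math


def desired_replicas(queue_length: int, messages_per_pod: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Return how many model pods to run for a given message backlog.

    Assumed rule (illustrative only): one pod per `messages_per_pod`
    waiting messages, rounded up, then clamped to the allowed range.
    """
    if messages_per_pod <= 0:
        raise ValueError("messages_per_pod must be positive")
    if queue_length < 0:
        raise ValueError("queue_length cannot be negative")
    wanted = math.ceil(queue_length / messages_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Seven waiting forecast messages, two messages per pod -> four pods.
    print(desired_replicas(queue_length=7, messages_per_pod=2))
```

With a minimum of zero, an empty queue scales the model pods away entirely, which is part of what makes an event-driven design attractive on a shared research cluster.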

Professor Dietze introduced us to his new branch for HF Landscape Unconstrained Forecasts, which has not been merged into the main branch yet. I created a branch off Dietze’s branch that replaced hard-coded paths on the high-performance supercomputer with cloud-friendly environment variables. Dongchen, who is very experienced in high-performance R applications and PEcAn, has been working on more improvements. In his branch, Dongchen has been refining bug fixes and improvements for integrating ERA5 (European Centre for Medium-Range Weather Forecasts Reanalysis, 5th generation) environmental reanalysis data and observation prep functions, as well as adding features for message-driven RabbitMQ job forecasts. It’s worth noting that this kind of long-term collaboration would not be possible without open source software.
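The idea of replacing hard-coded supercomputer paths with environment variables can be sketched as follows. The project's code is written in R, so this Python version only illustrates the pattern. The variable name, the default directory, and the file layout are all hypothetical and do not come from the PEcAn branches.

```python
import os
from pathlib import Path

# Hypothetical variable name and fallback, chosen for illustration only.
DATA_DIR_VAR = "FORECAST_DATA_DIR"
DEFAULT_DATA_DIR = "/tmp/forecast-data"


def data_dir(env=None) -> Path:
    """Resolve the data root from the environment instead of a fixed path."""
    env = os.environ if env is None else env
    return Path(env.get(DATA_DIR_VAR, DEFAULT_DATA_DIR))


def met_file(site: str, year: int, env=None) -> Path:
    """Build a path to a (hypothetical) meteorology file under the data root."""
    return data_dir(env) / site / f"met_{year}.nc"


if __name__ == "__main__":
    # On a cluster, the variable would be set in the pod or workbench spec.
    print(met_file("HARV", 2023, env={DATA_DIR_VAR: "/data"}))
```

Because the root directory comes from the environment, the same script can run on a laptop, in a workbench, or in a model pod with only a configuration change.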

There are a lot of important technical changes built into our branches. We built the HF Landscape branch of PEcAn into an RStudio Jupyter Notebook image called the PEcAn Unconstrained Forecast image and deployed that image to Red Hat OpenShift AI in NERC. We ran the newly updated R download script for Harvard Forest meteorological data to download over one year’s worth of data to our workbench persistent volumes. We updated the SDA Workflow for North America R scripts to process the Harvard Forest (HARV) meteorological data, and Dongchen updated the SDA Runner script to run in the cloud.

Finally, we had to work through the challenge of PEcAn being monolithic software meant to run on one giant computer with tons of storage. Since data science projects involve large nested directories of file data, we developed a strategy that uses rsync to copy files between containers running in OpenShift as part of our event-driven workflow. From an OpenShift AI workbench, we can send a message to a forecasting model pod. This also triggers an rsync operation that copies all the relevant files from the workbench pod to the model pod. The model pod receives the files and the message and runs the ecological forecasting model on the data. The model pod then sends the files and additional data back to the workbench pod that triggered the message. The pod rsync operation is surprisingly fast and effective for this transfer. This may not be the long-term solution for event-driven ecological forecasting, but it has worked very well for us given our time constraints and limitations to upstream adoption of our research this year.
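The round trip described above can be simulated locally to make the sequence concrete. This is a sketch under stated assumptions, not the project's implementation: an in-memory `queue.Queue` stands in for the AMQP broker, `shutil.copytree` stands in for rsync between pods, two temporary directories stand in for the workbench and model pods, and the "model" only records which input files it received. All function names, the message fields, and the file names are hypothetical.

```python
import json
import queue
import shutil
import tempfile
from pathlib import Path


def submit_job(workbench_dir: Path, model_dir: Path,
               jobs: queue.Queue, run_id: str) -> None:
    """Copy inputs to the model side, then publish a job message."""
    # Stand-in for the rsync from the workbench pod to the model pod.
    shutil.copytree(workbench_dir, model_dir / run_id, dirs_exist_ok=True)
    # Stand-in for publishing an AMQP message to the broker.
    jobs.put(json.dumps({"run_id": run_id}))


def run_next_job(model_dir: Path, workbench_dir: Path,
                 jobs: queue.Queue) -> str:
    """Consume one message, 'run' the model, and copy results back."""
    message = json.loads(jobs.get_nowait())
    run_dir = model_dir / message["run_id"]
    # Placeholder for the forecast: list the input files that arrived.
    inputs = sorted(p.name for p in run_dir.rglob("*") if p.is_file())
    out_dir = run_dir / "out"
    out_dir.mkdir(exist_ok=True)
    (out_dir / "forecast.json").write_text(json.dumps({"inputs": inputs}))
    # Stand-in for the return rsync from the model pod to the workbench pod.
    shutil.copytree(run_dir, workbench_dir, dirs_exist_ok=True)
    return message["run_id"]


def demo() -> dict:
    """Run one simulated job and return the result seen by the workbench."""
    with tempfile.TemporaryDirectory() as tmp:
        workbench = Path(tmp) / "workbench"
        model = Path(tmp) / "model"
        (workbench / "met").mkdir(parents=True)
        model.mkdir()
        (workbench / "met" / "harv.csv").write_text("t,temp\n0,12.5\n")
        jobs = queue.Queue()
        submit_job(workbench, model, jobs, "run-001")
        run_next_job(model, workbench, jobs)
        return json.loads((workbench / "out" / "forecast.json").read_text())


if __name__ == "__main__":
    print(demo())
```

In this sketch the files are copied before the message is published, so a consumer never picks up a job whose inputs have not arrived. That ordering is a design choice of the example and is not something the article specifies.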

Future milestones

The next step for our team will be developing and deploying an asynchronous, event-driven scheduler that will elastically launch data ingest containers. This will enable scaling our prototype to multiple sites, more data constraints, and a collection of models. Future applications could extend this system to additional forecast workflows, such as water resources, biodiversity, zoonotic disease, or invasive species. 

To learn more about the project, join us at github.com/PecanProject or pecanproject.slack.com, or contact Professor Dietze.

This project has been the fruit of successful collaboration among ecological forecasting experts at BU, faculty who provide high-level support, and students with technical domain expertise, in addition to industry know-how from software developers like myself. This combination, plus the computing resources made available to researchers through NERC, allowed us to tackle the engineering challenges of running the PEcAn platform in the cloud.
