Moving ecological forecasting from supercomputer to cloud: why and how

New event-driven architecture enabled researchers to move the PEcAn platform to the New England Research Cloud and increase scalability.

Near-term ecological forecasting can help communities make better decisions and prepare for extreme weather events and changes in the environment. Use cases include forecasts of infectious disease outbreaks, increases or declines in animal populations, or the impact of environmental events on agriculture, forestry, or other industries.

See “Prototyping a distributed, asynchronous workflow for iterative near-term ecological forecasting” on the Red Hat Research website.

A large and active science community has formed around ecological forecasting, and scientists in several different countries are interested in experimenting with the ability to make forecasts based on large sources of local environmental data. To make these capabilities available to as many public researchers as possible, the forecasting tools must provide open, accessible, reusable, and scalable community cyberinfrastructure that can make large numbers of ecological forecasts on a repeatable, frequent basis. A project underway at the Red Hat Collaboratory at Boston University is addressing that challenge by developing a cloud-native workflow that provides asynchronous, event-driven, and distributed computing and resource management for these large-scale science projects.

Project background

As a software developer at Red Hat Research, I work with students and professors to advance their research goals with open source technology. In this project, I’m working with the ecological forecasting team at Boston University, including Professor Michael Dietze, PhD students including Dongchen Zhang, and a team from the BU Software & Application Innovation Lab (SAIL), including Associate Director of Programs and Product Management Jeff Simeon, Associate Director of Engineering Greg Frasco, and Software Engineer Shashank Karthikeyan. The team had run into challenges collaborating on an open source software platform hosted on a shared high-performance supercomputer at Boston University. Switching to an event-driven cloud architecture lets forecasting jobs autoscale across multiple nodes concurrently.

Meet the ecological forecasting team

Jeff Simeon, BU SAIL
Greg Frasco, BU SAIL
Shashank Karthikeyan, BU SAIL
Dongchen Zhang, BU PhD student
Michael Dietze, BU professor
Christopher Tate, Red Hat

We began the project in January 2023 with a Research Incubation Award from the Red Hat Collaboratory. We reviewed the event-driven services available to us in a containerized cloud environment and developed an architecture that could scale the production of continually updated predictions about the future state of ecosystems, over horizons ranging from days to years. Scaling near-term ecological forecasting in this way enables services that help communities anticipate environmental challenges and make decisions on actionable timescales. We believe it will also help researchers accelerate scientific discovery and answer fundamental research questions about the predictability of nature.

We developed an architecture that works inside an open source cloud environment such as Red Hat OpenShift. For data analysis, we used an established open source project called the Predictive Ecosystem Analyzer (PEcAn). The PEcAn community had already developed containerized data science models, database enhancements, and tools to display, monitor, and execute ecological forecasting models. What we needed on top of that was a cloud environment, a message broker supporting the Advanced Message Queuing Protocol (AMQP) to receive requests to run forecasts, and a scalable PostgreSQL database with geospatial support to store ecological forecasting data.
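To make the messaging piece concrete, the sketch below shows one way a forecast request could be encoded as a JSON message body for an AMQP queue and decoded again on the consumer side. It uses only the Python standard library, and the field names (`site_id`, `model`, `start_date`, `end_date`, `workflow_dir`) are hypothetical illustrations, not PEcAn’s actual message format.

```python
"""Hypothetical forecast-request message for an AMQP work queue.

Field names are illustrative assumptions, not PEcAn's real schema.
"""
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class ForecastRequest:
    site_id: str       # site code, e.g. "HARV" (illustrative)
    model: str         # ecosystem model to run (hypothetical field)
    start_date: date   # first day of the forecast window
    end_date: date     # last day of the forecast window
    workflow_dir: str  # directory holding the run's input files (hypothetical)

    def __post_init__(self) -> None:
        # Reject impossible windows as soon as a request is built or decoded.
        if self.end_date < self.start_date:
            raise ValueError("end_date must not precede start_date")


def encode(req: ForecastRequest) -> bytes:
    """Serialize a request to a UTF-8 JSON body for an AMQP message."""
    payload = asdict(req)
    payload["start_date"] = req.start_date.isoformat()
    payload["end_date"] = req.end_date.isoformat()
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def decode(body: bytes) -> ForecastRequest:
    """Parse a message body back into a validated ForecastRequest."""
    raw = json.loads(body.decode("utf-8"))
    return ForecastRequest(
        site_id=raw["site_id"],
        model=raw["model"],
        start_date=date.fromisoformat(raw["start_date"]),
        end_date=date.fromisoformat(raw["end_date"]),
        workflow_dir=raw["workflow_dir"],
    )
```

A publisher would hand the bytes returned by `encode` to its AMQP client; with the pika client for RabbitMQ, for example, that is `channel.basic_publish(exchange="", routing_key=<queue name>, body=body)`. Validating the dates in `__post_init__` means a malformed request fails when the model pod decodes it, rather than partway through a run.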

Our ultimate goal was to deploy our solution on the New England Research Cloud (NERC), but until the production cluster was ready for use, I worked on the solution on my own computer using OpenShift Local. By trying out the existing PEcAn Helm charts, we developed a reusable way to deploy the PEcAn project easily into OpenShift. Together with the SAIL team, we developed additional open source infrastructure-as-code for the project that was reusable for developing and deploying our project on our own computers, as well as in future cloud environments.

One early challenge was discovering that parts of the PEcAn Helm charts, which had previously been deployed only on Kubernetes, violated security constraints that Red Hat OpenShift Container Platform enforces by default. I contributed updates to several PEcAn Helm charts so that service accounts could deploy the PEcAn containers correctly. The PEcAn open source community accepted those updates into their repositories, and we then had a working cloud solution on OpenShift Local.
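The exact constraints the original charts tripped aren’t detailed here. As a hedged illustration, the Python sketch below flags a few common reasons a chart written for Kubernetes is rejected under OpenShift’s default restricted security context constraints (SCC): a fixed `runAsUser` or `fsGroup` that may fall outside the UID range OpenShift assigns to each project, privileged containers, privilege escalation, `hostPath` volumes, and host networking. It is a simplified approximation for linting pod specs, not OpenShift’s admission logic.

```python
"""Simplified pre-flight check inspired by OpenShift's default restricted SCC.

An illustrative approximation only; OpenShift's real admission checks differ.
"""
from typing import Any, Dict, List


def restricted_scc_violations(pod_spec: Dict[str, Any]) -> List[str]:
    """Return human-readable reasons this pod spec may be rejected."""
    problems: List[str] = []

    if pod_spec.get("hostNetwork"):
        problems.append("pod: hostNetwork is not allowed")

    # OpenShift assigns UIDs/GIDs per project; hard-coded IDs often fall outside that range.
    pod_ctx = pod_spec.get("securityContext", {})
    for field in ("runAsUser", "fsGroup"):
        if field in pod_ctx:
            problems.append(f"pod: fixed {field} may fall outside the project's range")

    for vol in pod_spec.get("volumes", []):
        if "hostPath" in vol:
            problems.append(f"volume {vol.get('name')}: hostPath is not allowed")

    for container in pod_spec.get("containers", []):
        name = container.get("name", "?")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged"):
            problems.append(f"container {name}: privileged is not allowed")
        if ctx.get("allowPrivilegeEscalation"):
            problems.append(f"container {name}: allowPrivilegeEscalation must be false")
        if "runAsUser" in ctx:
            problems.append(f"container {name}: fixed runAsUser may fall outside the project's range")

    return problems
```

Running it over pod specs parsed from the output of `helm template` gives an early warning before an install fails on the cluster; the usual remedy for the UID findings is to drop the hard-coded IDs and let OpenShift assign them.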

Onboarding to NERC

Three months into our project, NERC opened up its new OpenShift Container Environment for research projects. We were quickly able to deploy all the same components working in OpenShift Local to NERC and show the platform running forecasts in the production-ready cloud environment, which was a very exciting moment.

Students at BU run ecological forecasting code written in R, so to replicate that environment we next needed to enable RStudio in NERC, which OpenShift AI made possible. Our team carefully prepared two pull requests to enable ecological forecasting in RStudio in OpenShift AI on NERC: 1) a new PEcAn Unconstrained Forecast container image, based on an RStudio Jupyter Notebook container image, that loads additional R dependencies and compiles the PEcAn source code, and 2) OpenShift image streams for RStudio and the PEcAn Unconstrained Forecast image, with the namespace, labels, and annotations that OpenShift AI workbenches running on NERC require.
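For readers setting up something similar, the sketch below builds an OpenShift `ImageStream` manifest for a custom workbench image as a Python dictionary and prints it as JSON, which `oc apply -f -` accepts. The `opendatahub.io/notebook-image` label and the `opendatahub.io/notebook-image-name` and `opendatahub.io/notebook-image-desc` annotations are keys the OpenShift AI (Open Data Hub) dashboard commonly uses to list workbench images, but treat them as assumptions to verify against your OpenShift AI version. The namespace and image reference in the example are hypothetical placeholders, not the values used on NERC.

```python
import json


def notebook_imagestream(name: str, namespace: str, display_name: str,
                         description: str, image: str, tag: str = "latest") -> dict:
    """Return an ImageStream manifest that advertises `image` as a workbench image."""
    return {
        "apiVersion": "image.openshift.io/v1",
        "kind": "ImageStream",
        "metadata": {
            "name": name,
            # Must be the namespace the OpenShift AI dashboard reads images from.
            "namespace": namespace,
            # Assumed label that marks this stream as a selectable notebook image.
            "labels": {"opendatahub.io/notebook-image": "true"},
            # Assumed annotations for the name and description shown in the UI.
            "annotations": {
                "opendatahub.io/notebook-image-name": display_name,
                "opendatahub.io/notebook-image-desc": description,
            },
        },
        "spec": {
            # One tag pointing at an externally hosted container image.
            "tags": [{"name": tag, "from": {"kind": "DockerImage", "name": image}}],
        },
    }


if __name__ == "__main__":
    manifest = notebook_imagestream(
        name="pecan-unconstrained-forecast",
        namespace="example-notebooks-namespace",  # hypothetical
        display_name="PEcAn Unconstrained Forecast",
        description="RStudio with PEcAn compiled from source",
        image="quay.io/example/pecan-unconstrained-forecast:latest",  # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

Piping the script’s output to `oc apply -f -` creates or updates the image stream in one step.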

Figure: OpenShift and OpenShift AI components in the NERC PEcAn implementation

The NERC team released OpenShift AI into the production OpenShift cluster in August 2023 and merged our project’s image streams into the computing environment in October 2023. From that point on, the science team was far more comfortable working in the NERC environment. Team members no longer needed OpenShift administrator skills to work in RStudio and run ecological forecasts. Being able to work in the same workbench at the same time enabled all three teams (Red Hat, BU, and SAIL) to collaborate on the project, and I was amazed at the boost in our productivity and results. Having a friendly user interface in the cloud made a big difference.

Event-driven ecological forecasting and scaling

With OpenShift AI available in NERC and our new RStudio image, we were able to develop the new event-driven workflow in the PEcAn code and test it entirely in the cloud. We requested that the Red Hat Custom Metrics Autoscaler OpenShift Operator be deployed to NERC so that forecasting model pods could scale according to the number of AMQP messages waiting to be processed. This works very well for running multiple models at the same time, and it is event driven. However, PEcAn model pods, which were originally developed for a different HPC environment, required a shared filesystem. Red Hat engineers and Boston University students worked together on a cloud-friendly way to send the files each job needs to the right container without relying on a shared filesystem.
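The Custom Metrics Autoscaler is Red Hat’s build of the upstream KEDA project, which is configured with a `ScaledObject` resource. The Python sketch below generates a minimal `ScaledObject` that uses KEDA’s RabbitMQ trigger in `QueueLength` mode, plus a helper that approximates the resulting replica count as the queue length divided by the per-pod target, rounded up and clamped to the configured bounds. The deployment, queue, and `TriggerAuthentication` names are hypothetical, and the Horizontal Pod Autoscaler that KEDA drives also applies tolerances and stabilization windows that this arithmetic ignores.

```python
import math


def rabbitmq_scaled_object(deployment: str, queue: str, auth_ref: str,
                           messages_per_pod: int = 1,
                           min_replicas: int = 0, max_replicas: int = 10) -> dict:
    """Build a KEDA ScaledObject that scales `deployment` on RabbitMQ queue depth."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-scaler"},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "rabbitmq",
                "metadata": {
                    "protocol": "amqp",
                    "queueName": queue,
                    "mode": "QueueLength",
                    # Trigger metadata values are strings in KEDA manifests.
                    "value": str(messages_per_pod),
                },
                # Broker connection details come from a TriggerAuthentication (hypothetical name).
                "authenticationRef": {"name": auth_ref},
            }],
        },
    }


def approx_replicas(queue_length: int, messages_per_pod: int,
                    min_replicas: int, max_replicas: int) -> int:
    """Rough estimate: ceil(queue_length / messages_per_pod), clamped to the bounds."""
    if messages_per_pod <= 0:
        raise ValueError("messages_per_pod must be positive")
    wanted = math.ceil(queue_length / messages_per_pod)
    return max(min_replicas, min(max_replicas, wanted))
```

With `messages_per_pod` set to 1, each queued forecast request gets its own model pod up to `maxReplicaCount`, and `minReplicaCount: 0` lets the model deployment scale to zero when no forecasts are waiting.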

Professor Dietze introduced us to his new branch for HF Landscape Unconstrained Forecasts, which has not yet been merged into the main branch. I created a branch off Dietze’s branch that replaced paths hard-coded for the high-performance supercomputer with cloud-friendly environment variables. Dongchen, who is very experienced with high-performance R applications and PEcAn, has been making further improvements. In his own branch, he has been refining bug fixes and improvements for integrating ERA5 environmental reanalysis data (the fifth-generation atmospheric reanalysis from the European Centre for Medium-Range Weather Forecasts) and the observation preparation functions, as well as adding features for message-driven forecast jobs using RabbitMQ. It’s worth noting that this kind of long-term collaboration would not be possible without open source software.
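The path change itself was made in R, where the usual idiom is `Sys.getenv("NAME", unset = default)`. To keep the examples in one language, the sketch below shows the same pattern in Python: each location the code used to hard-code is looked up from an environment variable, with a fallback default. The variable names and default paths are hypothetical, not the ones used in the actual branch.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Hypothetical settings: logical name -> (environment variable, default path).
PATH_SETTINGS = {
    "met_dir": ("PECAN_MET_DIR", "/data/met"),
    "obs_dir": ("PECAN_OBS_DIR", "/data/obs"),
    "output_dir": ("PECAN_OUTPUT_DIR", "/data/output"),
}


def resolve_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Resolve each configured path from the environment, falling back to its default."""
    env = os.environ if env is None else env
    return {
        name: Path(env.get(var, default))
        for name, (var, default) in PATH_SETTINGS.items()
    }
```

The same container image can then read from a persistent volume in one deployment and a scratch directory in another, just by changing the pod’s environment.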

There are many important technical changes built into our branches. We built the HF Landscape branch of PEcAn into an RStudio Jupyter notebook image called the PEcAn Unconstrained Forecast image and deployed it to Red Hat OpenShift AI in NERC. We ran the newly updated R script that downloads Harvard Forest meteorological data, pulling more than a year’s worth of data onto our workbench persistent volumes. We updated the R scripts of the state data assimilation (SDA) workflow for North America to process the Harvard Forest (HARV) meteorological data, and Dongchen updated the SDA Runner script to run in the cloud.

Finally, we had to work through the challenge of PEcAn being monolithic software designed to run on one giant computer with lots of storage. Because data science projects involve large, nested directories of files, we developed a strategy that uses rsync between containers running in OpenShift to copy files to a model pod as part of the event-driven workflow. From an OpenShift AI workbench, we send a message to a forecasting model pod; this also triggers an rsync operation that copies all the relevant files from the workbench pod to the model pod. The model pod receives the files and the message and runs the ecological forecasting model on the data. It then sends the files and any additional data back to the workbench pod that sent the message. The pod-to-pod rsync is surprisingly fast and effective for this transfer. It may not be the long-term solution for event-driven ecological forecasting, but it has worked very well for us given our time constraints and the limited upstream adoption of our research so far this year.
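The round trip described above can be sketched as a pair of `oc rsync` invocations: one run from the workbench before the forecast message is published, and one run from the model pod after the model finishes. The Python below only builds the argument lists. It assumes the `oc` CLI is available in both pods and that their service accounts are allowed to exec into the other pod; the pod names and directories are hypothetical. The trailing slash on each source follows rsync semantics, so the directory’s contents are copied rather than the directory itself.

```python
from typing import List, Optional, Tuple


def pod_path(pod: str, path: str) -> str:
    """Format a remote location the way oc rsync expects: POD:/path."""
    return f"{pod}:{path}"


def oc_rsync(src: str, dst: str, container: Optional[str] = None) -> List[str]:
    """Build an `oc rsync SOURCE DESTINATION [-c CONTAINER]` argument list."""
    cmd = ["oc", "rsync", src, dst]
    if container:
        cmd += ["-c", container]
    return cmd


def contents_of(directory: str) -> str:
    """Append a trailing slash so the directory's contents, not the directory, are copied."""
    return directory.rstrip("/") + "/"


def round_trip_plan(run_dir: str, workbench_pod: str,
                    model_pod: str, model_dir: str) -> List[Tuple[str, List[str]]]:
    """Return (where to run, command) pairs for the workbench -> model -> workbench trip."""
    return [
        # 1. On the workbench, before publishing the forecast message: push the inputs.
        ("workbench", oc_rsync(contents_of(run_dir), pod_path(model_pod, model_dir))),
        # 2. On the model pod, after the model run: push inputs plus outputs back.
        ("model pod", oc_rsync(contents_of(model_dir), pod_path(workbench_pod, run_dir))),
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` in the pod named alongside it.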

Future milestones

The next step for our team will be developing and deploying an asynchronous, event-driven scheduler that will elastically launch data ingest containers. This will enable scaling our prototype to multiple sites, more data constraints, and a collection of models. Future applications could extend this system to additional forecast workflows, such as water resources, biodiversity, zoonotic disease, or invasive species.
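As a rough illustration of the scheduling idea, the asyncio sketch below fans out one data ingest job per site while capping how many run at once with a semaphore. The `ingest` coroutine is a hypothetical stand-in: in a real deployment it might publish an AMQP message or create a Kubernetes Job and wait for it to finish. The site list and concurrency limit are illustrative, not part of the team’s design.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


async def schedule_ingest(sites: Iterable[str],
                          ingest: Callable[[str], Awaitable[str]],
                          max_concurrent: int = 4) -> Dict[str, str]:
    """Run `ingest` once per site, with at most `max_concurrent` in flight."""
    limit = asyncio.Semaphore(max_concurrent)

    async def run_one(site: str) -> Tuple[str, str]:
        # Wait for a free slot before launching this site's ingest job.
        async with limit:
            return site, await ingest(site)

    # gather runs all sites concurrently (bounded by the semaphore) and keeps input order.
    results = await asyncio.gather(*(run_one(site) for site in sites))
    return dict(results)
```

Keeping the limit separate from the site list means that adding sites lengthens the queue of work rather than raising peak load; elastic scaling of the underlying pods would come from an autoscaler, not from this loop.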

To learn more about the project, join us at github.com/PecanProject or pecanproject.slack.com, or contact Professor Dietze.

This project has been the fruit of successful collaboration among ecological forecasting experts at BU, faculty who provide high-level support, and students with technical domain expertise, in addition to industry know-how from software developers like myself. This combination, plus the computing resources made available to researchers through NERC, allowed us to tackle the engineering challenges of running the PEcAn platform in the cloud.


Red Hat Research Quarterly

Christopher Tate

May 2024