AI for Cloud Ops
Today’s Continuous Integration/Continuous Deployment (CI/CD) trends encourage rapid design of software using a wide range of customized, off-the-shelf, and legacy software components, followed by frequent updates that are immediately deployed on the cloud. Together, this component diversity and breakneck pace of development amplify the difficulty of identifying, localizing, or fixing problems related to performance, resilience, and security. Existing approaches that rely on human experts have limited applicability to modern CI/CD processes, as they are fragile, costly, and often not scalable.
This project aims to address this gap in effective cloud management and operations with a concerted, systematic approach to building and integrating AI-driven software analytics into production systems. We aim to provide a rich selection of heavily automated “ops” functionality as well as intuitive, easily accessible analytics to users, developers, and administrators. In this way, our longer-term aim is to improve performance, resilience, and security in the cloud without incurring high operational costs.
Graphic caption: An illustrative overview of the “AI for Cloud Ops” project, which aims to demonstrate the performance, resilience, and security benefits of AI-driven cloud analytics in modern continuous integration/continuous deployment environments. The project will make customized analytics available to developers and administrators via queryable APIs during open-source software deployment (e.g., through Jupyter notebooks) and at runtime.
Other Funding That Supports This Research
- Ayse Coskun, IBM Faculty Award, 2020
- Ayse Coskun (Co-PI), NSF CISE CSR, A Just-in-Time, Cross-Layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications. PI: Raja Sambasivan at Tufts University, 2018-2022
- Ayse Coskun, IBM Open Collaborative Research Award, 2016-2020
- Ayse Coskun, Red Hat Collaboratory, 2018-2020
- Iter8 Online Experimentation Framework
- Praxi: Software Discovery using ML
- ACE: Approximate Concrete Execution
Principal Investigator: Ayse Coskun
Co-PIs: Alan Liu and Gianluca Stringhini
Red Hat Collaborators: Marcel Hild, Steven Huels, and Daniel Riek
IBM Collaborator: Fabio Oliveira
Graduate Students: Anthony Byrne, Mert Toslali, Saad Ullah, and Lesley Zhou
Research Days AI for Cloud Ops Talk, February 16, 2022