AI for Cloud Ops
Today’s Continuous Integration/Continuous Deployment (CI/CD) trends encourage rapid design of software using a wide range of customized, off-the-shelf, and legacy software components, followed by frequent updates that are immediately deployed on the cloud. Together, this component diversity and breakneck pace of development make it harder to identify, localize, and fix problems related to performance, resilience, and security. Existing approaches that rely on human experts have limited applicability to modern CI/CD processes, as they are fragile, costly, and often not scalable.
This project aims to address this gap in effective cloud management and operations with a concerted, systematic approach to building and integrating AI-driven software analytics into production systems. We aim to provide a rich selection of heavily automated “ops” functionality as well as intuitive, easily accessible analytics to users, developers, and administrators. In this way, our longer-term aim is to improve performance, resilience, and security in the cloud without incurring high operation costs.
Graphic caption: An illustrative overview of the “AI for Cloud Ops” project, which aims to demonstrate the performance, resilience, and security benefits of AI-driven cloud analytics in modern continuous integration/continuous deployment environments. The project will make customized analytics available to developers and administrators via queryable APIs during open-source software deployment (e.g., through Jupyter notebooks) and at runtime.
Other Funding that Supports this Research
- Ayse Coskun, IBM Faculty Award, 2020
- Ayse Coskun (Co-PI), NSF CISE CSR, A Just-in-Time, Cross-Layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications. PI: Raja Sambasivan at Tufts University, 2018-2022
- Ayse Coskun, IBM Open Collaborative Research Award, 2016-2020
- Ayse Coskun, Red Hat Collaboratory, 2018-2020
Project Resources and Repositories
- Operate First/AI for Cloud Ops GitHub
- Iter8 Online Experimentation Framework
- Praxi: Software Discovery using ML
- ACE: Approximate Concrete Execution
- OSD Alert Analysis GitHub
Principal Investigator: Ayse Coskun
Co-PIs: Alan Liu and Gianluca Stringhini
Red Hat Collaborators: Marcel Hild, Steven Huels, and Daniel Riek
IBM Collaborator: Fabio Oliveira
Graduate/PhD Students: Anthony Byrne, Yajie (Lesley) Zhou, Sumatra Dimoyee, Syed Mohammad Qasim, Mert Toslali, and Saad Ullah
Undergraduate Students: Emika Hammond, Brian Jung, Quan Pham, and Haoming Yi (CE); Rashid Kolaghassi and Maxwell Malamut (ME)
- Lesley Zhou, “PrvTel: Privacy preserving analytics for network telemetry,” CISE Graduate Student Workshop, Boston University, January 2023
- Saad Ullah, “PyReT: Python Real-Time vulnerability detection,” CISE Graduate Student Workshop, Boston University, January 2023
- Anthony Byrne, “RTQA: Real-time Code Feedback for Data Scientists,” DevConf.US, August 2022
The AI4CloudOps project holds a monthly community meeting, open to anyone who would like to discuss initiatives under the AI4CloudOps umbrella. Contact Ayse Coskun, Boston University, for more information.
RHRQ asked Professor Ayse Coskun of the Electrical and Computer Engineering Department at Boston University to sit down for an interview with Red Hatter Marcel Hild. Professor Coskun is one of the Principal Investigators on the project AI for Cloud Ops, which recently won a $1 million Red Hat Collaboratory Research Incubation Award. Their conversation delves into the need for operations-focused research on real-world systems and the capacity of more mature AI technology to solve problems on a large scale. (Read full article)
Watch: Research Days AI for Cloud Ops Talk, February 16, 2022 (event page with abstract)