AI for Cloud Ops

AI for Cloud Ops is a project of the Red Hat Collaboratory at Boston University.

Today’s Continuous Integration/Continuous Deployment (CI/CD) trends encourage rapid software design using a wide range of customized, off-the-shelf, and legacy software components, followed by frequent updates that are deployed immediately to the cloud. Together, this component diversity and the breakneck pace of development make it harder to identify, localize, and fix problems related to performance, resilience, and security. Existing approaches that rely on human experts have limited applicability to modern CI/CD processes because they are fragile, costly, and often not scalable.

This project aims to close this gap in cloud management and operations through a concerted, systematic approach to building AI-driven software analytics and integrating them into production systems. We plan to provide users, developers, and administrators with a rich selection of heavily automated “ops” functionality as well as intuitive, easily accessible analytics. In this way, our longer-term goal is to improve performance, resilience, and security in the cloud without incurring high operational costs.

Graphic caption: An illustrative overview of the “AI for Cloud Ops” project, which aims to demonstrate the performance, resilience, and security benefits of AI-driven cloud analytics in modern continuous integration/continuous deployment environments. The project will make customized analytics available to developers and administrators via queryable APIs during open-source software deployment (e.g., through Jupyter notebooks) and at runtime.


Other Funding that Supports this Research

  • Ayse Coskun, IBM Faculty Award, 2020
  • Ayse Coskun (Co-PI), NSF CISE CSR, A Just-in-Time, Cross-Layer Instrumentation Framework for Diagnosing Performance Problems in Distributed Applications. PI: Raja Sambasivan at Tufts University, 2018-2022
  • Ayse Coskun, IBM Open Collaborative Research Award, 2016-2020
  • Ayse Coskun, Red Hat Collaboratory, 2018-2020


Project Team

Principal Investigator: Ayse Coskun
Co-PIs: Alan Liu and Gianluca Stringhini
Red Hat Collaborators: Marcel Hild, Steven Huels, and Daniel Riek
IBM Collaborator: Fabio Oliveira
Graduate/PhD Students: Anthony Byrne, Yajie (Lesley) Zhou, Sumatra Dimoyee, Syed Mohammad Qasim, Mert Toslali, Saad Ullah
Undergraduate Students: CE undergraduates: Emika Hammond, Brian Jung, Quan Pham, and Haoming Yi; ME undergraduates: Rashid Kolaghassi and Maxwell Malamut


Presentations

  • Lesley Zhou, “PrvTel: Privacy-preserving analytics for network telemetry,” CISE Graduate Student Workshop, Boston University, January 2023
  • Saad Ullah, “PyReT: Python Real-Time vulnerability detection,” CISE Graduate Student Workshop, Boston University, January 2023
  • Anthony Byrne, “RTQA: Real-time Code Feedback for Data Scientists,” DevConf.US, August 2022

Get Involved

The AI4CloudOps project holds a monthly community meeting, open to anyone who would like to discuss initiatives under the AI4CloudOps umbrella. Contact Ayse Coskun, Boston University, for more information.

Learn More

Read: “Machine learning for operations: Can AI push analytics to the speed of software deployment?” Red Hat Research Quarterly, May 2022

RHRQ asked Professor Ayse Coskun of the Electrical and Computer Engineering Department at Boston University to sit down for an interview with Red Hatter Marcel Hild. Professor Coskun is one of the Principal Investigators on the project AI for Cloud Ops, which recently won a $1 million Red Hat Collaboratory Research Incubation Award. Their conversation delves into the need for operations-focused research on real-world systems and the capacity of more mature AI technology to solve problems on a large scale.

Watch: Research Days AI for Cloud Ops Talk, February 16, 2022 (event page with abstract)


Project Poster

Link to full-size poster