Evaluating model serving strategies over streaming data

March 9, 2023

Authors: Sonia Horchidan (KTH Royal Institute of Technology), Emmanouil Kritharakis (Boston University), Vasiliki Kalavri (Boston University), and Paris Carbone (KTH Royal Institute of Technology)


We present the first performance evaluation study of model serving integration tools in stream processing frameworks. Using Apache Flink as a representative stream processing system, we evaluate alternative deep learning serving pipelines for image classification. Our evaluation considers both the embedded use of machine learning libraries within stream tasks and external serving via remote procedure calls. The results indicate superior throughput and scalability for pipelines that use embedded libraries to serve pre-trained models. Latency, by contrast, varies across strategies: external serving can even achieve lower latency when network conditions are optimal, owing to its more specialized use of the underlying hardware. We discuss our findings and provide further motivation for future research on ML-native data streaming engines.

Associated Research Projects