
DatApollo: Orchestration of Serverless Functions for Scalable Data Mining

  • Mahtab Shahin
  • Markus Bertl
  • Nasim Janatian
  • Juan Aznar-Poveda
  • Syed Attique Shah (Corresponding / Lead Author)
  • Thomas Fahringer
  • Tallinn University of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

With the exponential growth of data generated by enterprise systems, social networks, and the Internet of Things, traditional data mining techniques face major scalability and efficiency challenges. Association Rule Mining (ARM), a foundational unsupervised learning method for detecting patterns in transactional datasets, commonly suffers performance bottlenecks in distributed environments because of excessive memory consumption, static resource provisioning, and costly data shuffles. This paper presents DatApollo, a novel serverless orchestration framework that executes distributed ARM workflows scalably and efficiently. Built on the Apollo orchestration engine, DatApollo offers stateless cloud functions, dynamic scheduling, intermediate state persistence, and fault-tolerant coordination to address the limitations of both traditional cluster-based architectures and existing Function-as-a-Service models. By decomposing ARM pipelines into orchestrated microfunctions, the framework enables elastic, cloud-native execution with minimal idle time. We describe the architectural design, algorithmic components, and computational complexity of DatApollo and perform a comprehensive experimental evaluation using real-world healthcare and meteorological datasets. DatApollo executes up to five times faster than Apache Spark and lowers infrastructure costs through elastic scaling and event-driven function invocations. The results demonstrate that DatApollo is a robust, cost-effective, and high-performance alternative for ARM in dynamic, large-scale data environments.
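To make the idea of decomposing an ARM pipeline into stateless functions concrete, the following is a minimal generic sketch in Python. It is not DatApollo's API or the authors' algorithm: the function names (`count_partition`, `merge_counts`, `derive_rules`), the toy transactions, and the thresholds are all hypothetical, and the functions are called locally rather than through an orchestrator. Each partition is counted independently, the partial counts are merged, and rules are derived from the merged support counts.

```python
from collections import Counter
from itertools import combinations


def count_partition(transactions, max_size=2):
    """Stateless step: count itemsets of size 1..max_size in one partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so keys match across partitions
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def merge_counts(partials):
    """Merge step: sum the per-partition counts into global support counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def derive_rules(counts, n_transactions, min_support, min_confidence):
    """Rule step: emit (lhs, rhs, support, confidence) for frequent item pairs."""
    rules = []
    for itemset, count in counts.items():
        support = count / n_transactions
        if len(itemset) != 2 or support < min_support:
            continue
        a, b = itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / counts[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy data: two partitions of two transactions each.
partitions = [
    [["bread", "milk"], ["bread", "butter"]],
    [["bread", "milk", "butter"], ["milk"]],
]

# In a serverless setting each call below could run as a separate function invocation.
partials = [count_partition(p) for p in partitions]
counts = merge_counts(partials)
rules = derive_rules(counts, n_transactions=4, min_support=0.5, min_confidence=0.6)

for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Because `count_partition` keeps no state between calls, the partitions can be processed in any order or in parallel; only the merge step needs the intermediate results, which is where persisted intermediate state would come in.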
Original language: English
Pages (from-to): 142813-142828
Journal: IEEE Access
Volume: 13
DOIs
Publication status: Published (VoR) - 24 Jul 2025
