Reinforcement Learning-Based Dynamic Management of Structured Parallel Farm Skeletons on Serverless Platforms
We present a framework for dynamic management of structured parallel processing skeletons on serverless platforms. Our goal is to bring HPC-like performance and resilience to serverless and continuum environments while preserving the programmability benefits of skeletons. As a first step, we focus on the well-known Farm pattern and its implementation on the open-source OpenFaaS platform, treating autoscaling of the worker pool as a QoS-aware resource management problem. The framework couples a reusable farm template with a Gymnasium-based monitoring and control layer that exposes queue, timing, and QoS metrics to both reactive and learning-based controllers. We investigate the effectiveness of AI-driven dynamic scaling for managing the farm's degree of parallelism by scaling serverless functions on OpenFaaS. In particular, we discuss the autoscaling model and its training, and evaluate two reinforcement learning (RL) policies against a baseline of reactive management derived from a simple farm performance model. Our results show that AI-based management can better accommodate platform-specific limitations than purely model-based performance steering, improving QoS while maintaining efficient resource usage and stable scaling behaviour.
💡 Research Summary
This paper presents a novel framework for dynamically managing structured parallel processing skeletons—specifically the Farm pattern—on a serverless Function‑as‑a‑Service platform, OpenFaaS. The authors aim to bring high‑performance computing (HPC)‑like throughput, latency guarantees, and resilience to serverless environments while preserving the high‑level programmability offered by algorithmic skeletons. The work focuses on autoscaling the worker pool of a Farm skeleton as a quality‑of‑service (QoS)‑aware resource management problem.
The implementation consists of three OpenFaaS functions: an Emitter that generates tasks, a set of Workers that process tasks, and a Collector that aggregates results. Communication between these functions is realized through Redis‑backed queues (input, worker, result, output), providing natural back‑pressure and decoupling producer/consumer rates. An initial attempt to use a single function name with multiple replicas resulted in severe load‑balancing issues due to HTTP keep‑alive reuse; the authors therefore deploy each worker as a distinct function (worker‑1 … worker‑N) and invoke them via the OpenFaaS API, gaining deterministic distribution and explicit scaling control.
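The per-worker deployment described above can be sketched as a round-robin dispatcher over named worker functions. This is a minimal illustration, not the paper's code: the in-memory deques stand in for the Redis-backed queues (which would use `redis-py` LPUSH/BRPOP in practice), and the gateway URL template is an assumed OpenFaaS default.

```python
from collections import deque
from itertools import cycle

# Assumed OpenFaaS gateway URL template (illustrative, not from the paper).
GATEWAY = "http://gateway.openfaas:8080/function/{name}"

def make_farm(n_workers):
    """Create one queue per worker (worker-1 .. worker-N) plus a
    round-robin dispatcher that mirrors invoking each worker as a
    distinct function via the OpenFaaS API."""
    names = [f"worker-{i}" for i in range(1, n_workers + 1)]
    queues = {name: deque() for name in names}  # stand-in for Redis lists
    rr = cycle(names)

    def dispatch(task):
        # Deterministic distribution: each task goes to the next worker,
        # sidestepping the HTTP keep-alive load-balancing issue seen with
        # a single multi-replica function name.
        name = next(rr)
        queues[name].append(task)
        return name  # in practice: POST the task to GATEWAY.format(name=name)

    return queues, dispatch

queues, dispatch = make_farm(3)
targets = [dispatch(t) for t in range(6)]  # 6 tasks over 3 workers: 2 each
```

The explicit per-worker queues also make the scaling action concrete: adding a worker means deploying a new function name and adding its queue to the rotation.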
The dynamic scaling problem is formalized as a Markov Decision Process (MDP). At each discrete control step (duration Tstep) the autoscaler observes a 9‑dimensional state vector comprising queue lengths, current worker count, moving averages of processing times, arrival rate, and the instantaneous QoS metric (fraction of tasks meeting their deadlines). The only controllable action is an integer increment to the worker pool size (scale‑down, no‑op, scale‑up). A multi‑objective reward function combines (i) a strong positive reward for meeting or exceeding a QoS target q* and a heavy penalty for violations, (ii) penalties proportional to queue buildup and latency, and (iii) costs for scaling actions and rapid fluctuations, thereby encouraging efficient, stable scaling.
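The multi-objective reward described above can be sketched as follows. The structure (QoS bonus/penalty, queue and latency penalties, scaling and oscillation costs) follows the text, but all coefficients here are illustrative assumptions, not the paper's tuned values.

```python
def reward(qos, q_target, queue_len, latency, delta, prev_delta):
    """Per-step reward for the autoscaling MDP (coefficients are
    illustrative assumptions).

    qos        -- fraction of tasks meeting deadlines this interval
    q_target   -- the QoS target q*
    delta      -- worker-count increment chosen this step (-1, 0, +1)
    prev_delta -- increment chosen at the previous step
    """
    # (i) strong bonus for meeting q*, heavy penalty for violating it
    r = 10.0 if qos >= q_target else -20.0
    # (ii) penalties proportional to queue build-up and latency
    r -= 0.01 * queue_len
    r -= 0.1 * latency
    # (iii) cost of scaling actions, plus a flapping penalty when the
    # sign of the increment reverses between consecutive steps
    r -= 0.5 * abs(delta)
    if delta * prev_delta < 0:
        r -= 1.0
    return r
```

Under this shaping, an agent that holds QoS at q* with short queues and no scaling activity collects the full bonus each step, while oscillating up/down decisions are explicitly taxed.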
Two reinforcement‑learning (RL) policies are trained on a Gymnasium‑compatible environment that exposes the state, action, and reward interfaces: (1) SARSA with eligibility traces (an on‑policy method) and (2) Double Deep Q‑Network (Double‑DQN), a value‑based deep RL algorithm that mitigates over‑estimation bias. Both agents learn directly from interaction with the live OpenFaaS deployment, without requiring an explicit analytical model of the platform’s scaling latency or cold‑start behavior.
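The core updates of the two policies can be sketched in a few lines. These are textbook forms of SARSA(λ) and the Double-DQN target, not the paper's implementation; the Q-values are plain lists/dicts standing in for network outputs or a table.

```python
def double_dqn_target(r, gamma, q_online_next, q_target_next, done):
    """Double-DQN bootstrap target: the online network *selects* the
    next action, the target network *evaluates* it, which is the
    decoupling that mitigates over-estimation bias."""
    if done:
        return r
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return r + gamma * q_target_next[best]

def sarsa_lambda_update(Q, E, s, a, r, s2, a2, alpha, gamma, lam):
    """Tabular SARSA with (accumulating) eligibility traces: on-policy,
    so the TD error uses the action a2 actually taken in s2."""
    delta = r + gamma * Q[s2][a2] - Q[s][a]
    E[s][a] += 1.0
    for st in Q:
        for ac in Q[st]:
            Q[st][ac] += alpha * delta * E[st][ac]
            E[st][ac] *= gamma * lam  # decay all traces
```

Both updates consume only observed transitions, which is why neither agent needs an analytical model of scaling latency or cold starts.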
The experimental workload is a continuous stream of image‑processing tasks. Each image's side length s determines its expected sequential processing time T̂s(s); each task i is assigned a deadline Di = β·T̂s(s) with β = 2, and QoS is measured as the proportion of tasks completed within their deadlines during each control interval. The workload includes non‑stationary arrival rates to stress the autoscaler.
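The deadline and per-interval QoS definitions reduce to a few lines. The quadratic cost model for T̂s (processing time growing with pixel count, side²) and its constant are illustrative assumptions; only the deadline rule Di = β·T̂s(s) with β = 2 and the QoS-as-fraction definition come from the text.

```python
BETA = 2.0  # deadline slack factor beta from the text

def expected_seq_time(side, c=1e-6):
    """Assumed cost model: sequential time proportional to pixel
    count (side^2); the constant c is illustrative."""
    return c * side * side

def qos(completions, beta=BETA):
    """Fraction of tasks meeting their deadlines in one control
    interval. `completions` is a list of (side, completion_time)
    pairs observed during the interval."""
    if not completions:
        return 1.0  # vacuously compliant interval
    met = sum(1 for side, t in completions
              if t <= beta * expected_seq_time(side))
    return met / len(completions)
```

This per-interval fraction is exactly the instantaneous QoS component of the 9-dimensional state vector described earlier.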
Results show that both RL policies outperform a baseline reactive controller derived from a simple performance model. Compared to the baseline, the RL agents achieve a 5‑8 % higher average QoS compliance, increase average worker utilization by roughly 10 %, and reduce scaling oscillations. Double‑DQN excels at anticipating sudden load spikes, pre‑emptively scaling up workers and thus hiding cold‑start delays, while SARSA provides smoother scaling transitions, improving system stability.
The paper’s contributions are: (1) a concrete OpenFaaS Farm skeleton template with emitter, workers, and collector linked via Redis queues; (2) a formal QoS‑aware autoscaling formulation with clear state, action, and reward definitions; (3) an AI‑driven management approach that complements analytical baselines; (4) a Gymnasium‑style environment enabling reproducible experiments; and (5) an empirical evaluation demonstrating the superiority of RL‑based scaling in meeting latency targets while maintaining resource efficiency.
Limitations include the focus on a single‑tenant, single‑skeleton scenario; future work is outlined to address multi‑tenant settings, more complex skeleton compositions (pipelines, hybrid patterns), and richer cost models. By open‑sourcing the code and environment, the authors provide a solid foundation for further research on learning‑based control of structured parallelism in serverless platforms.