Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Scheduling precedence-constrained tasks is a classical problem that has been studied for more than fifty years. However, little progress has been made in the setting where there are communication delays between tasks. Results for the case of identical machines were derived nearly thirty years ago, and yet no results for related machines have followed. In this work, we propose a new scheduler, Generalized Earliest Time First (GETF), and provide the first provable, worst-case approximation guarantees for the goals of minimizing both the makespan and total weighted completion time of tasks with precedence constraints on related machines with machine-dependent communication times.

💡 Research Summary

The paper tackles a fundamental scheduling problem that has become increasingly relevant in modern large‑scale machine‑learning platforms such as TensorFlow, PyTorch, and AzureML. The authors consider a single job composed of n tasks that form a directed acyclic graph (DAG) of precedence constraints. The tasks must be assigned to m heterogeneous machines (the “related‑machines” model) where each machine i has a processing speed s_i, and the time to execute task j on i is w_j / s_i. In addition, communication between any two machines i′ and i occurs at a pair‑specific speed s_{i′,i}, so that data transmitted from a predecessor task j′ to its successor j incurs a delay w_{j′,j} / s_{i′,i}. The objective is either (1) to minimize the makespan C_max (the completion time of the last task) or (2) to minimize the total weighted completion time Σ_j ω_j C_j.
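The timing model above can be made concrete with a few lines of Python. This is a minimal sketch, not code from the paper: the `Instance` container and the helper names are hypothetical, and treating communication between two tasks on the same machine as free is an assumption that the summary does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical container for one scheduling instance."""
    work: dict   # task j -> processing demand w_j
    data: dict   # (j', j) -> data volume w_{j',j} on that precedence edge
    speed: list  # speed[i] = s_i, processing speed of machine i
    link: list   # link[i'][i] = s_{i',i}, communication speed from i' to i


def proc_time(inst, j, i):
    """Time to run task j on machine i: w_j / s_i."""
    return inst.work[j] / inst.speed[i]


def comm_delay(inst, j_pred, j, i_pred, i):
    """Delay for data sent from j_pred (on i_pred) to j (on i)."""
    if i_pred == i:
        return 0.0  # assumption: no delay when both tasks share a machine
    return inst.data[(j_pred, j)] / inst.link[i_pred][i]


def makespan(completion):
    """C_max: completion time of the last task."""
    return max(completion.values())


def weighted_completion(completion, weight):
    """Total weighted completion time: sum over j of omega_j * C_j."""
    return sum(weight[j] * completion[j] for j in completion)
```

Both objectives are computed from the same map of completion times, so a single schedule can be evaluated under either one.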

Historically, the problem without communication delays has been studied extensively. For identical machines, Graham’s list‑scheduling algorithm yields a (2 − 1/m) approximation for makespan, and later work obtained O(log m) and then O(log m / log log m) approximations for related machines. With communication delays, only a single result exists: the Earliest Time First (ETF) algorithm for identical machines, which guarantees a makespan of at most (2 − 1/m)·OPT_i + C, where OPT_i is the optimal makespan when communication delays are ignored and C is the total communication time along the longest chain in the DAG. No comparable guarantees were known for related machines.

The authors introduce Generalized Earliest Time First (GETF), a greedy scheduler that blends two classic ideas. First, a group‑assignment function f(·) partitions the machines into “speed‑similar” groups, a technique that underlies the best known O(log m / log log m) algorithms for related machines without communication. Second, within each group GETF applies the ETF rule: among all tasks whose predecessors have already completed, it selects the one that can start the earliest on any machine belonging to the task’s group. Ties are broken by a deterministic rule. The algorithm proceeds iteratively, fixing one task per iteration, until all tasks are scheduled.
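The greedy loop described above might be sketched as follows. This is an illustrative simplification rather than the authors' implementation. Several choices are assumptions: the group assignment f is passed in as a precomputed `group` map instead of being derived from machine speeds, a task counts as ready once all its predecessors have been placed, ties are broken by (start time, task name, machine index), same-machine communication is free, and tasks are only appended after the last task on a machine, never inserted into idle gaps. The function name `getf_sketch` is hypothetical.

```python
def getf_sketch(work, preds, data, speed, link, group):
    """Greedy ETF-within-groups sketch (assumes the precedence graph is a DAG).

    work:  task -> processing demand w_j
    preds: task -> list of predecessor tasks
    data:  (pred, succ) -> data volume w_{j',j}
    speed: speed[i] = s_i
    link:  link[i'][i] = s_{i',i}
    group: task -> list of machine indices the task may use (stands in for f)
    Returns task -> (machine, start, finish).
    """
    free = [0.0] * len(speed)  # time at which each machine becomes idle
    placed = {}
    while len(placed) < len(work):
        best = None
        for j in sorted(work):
            # Skip tasks already placed or with an unplaced predecessor.
            if j in placed or any(p not in placed for p in preds.get(j, [])):
                continue
            for i in group[j]:
                # Earliest time all predecessor data can be available on i.
                ready = 0.0
                for p in preds.get(j, []):
                    p_machine, _, p_finish = placed[p]
                    delay = 0.0 if p_machine == i else data[(p, j)] / link[p_machine][i]
                    ready = max(ready, p_finish + delay)
                candidate = (max(free[i], ready), j, i)
                # Tuple comparison gives a deterministic tie-break.
                if best is None or candidate < best:
                    best = candidate
        start, j, i = best
        finish = start + work[j] / speed[i]
        placed[j] = (i, start, finish)
        free[i] = finish  # one task is fixed per iteration
    return placed
```

Each iteration rescans every ready task against every machine in its group, so the sketch favors clarity over efficiency. Restricting `group[j]` is the only place where the grouping idea enters; with every task allowed on every machine, the loop reduces to a plain ETF-style rule.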

The main theoretical contributions are two approximation theorems.

Makespan bound: For any instance, the schedule S produced by GETF satisfies
C_max(S) ≤ O(log m / log log m)·OPT_i + C,
where OPT_i is the optimal makespan when communication delays are ignored, and C is the total communication time of the longest chain in the precedence graph. This result simultaneously generalizes the classic (2 − 1/m)·OPT_i + C bound for identical machines and the O(log m / log log m)·OPT_i bound for related machines without communication.

Weighted completion‑time bound: For the same schedule S,
Σ_j ω_j C_j(S) ≤ O(log m / log log m)·wOPT_i + Σ_j ω_j C(S,j),
where wOPT_i is the optimal weighted completion time when communication delays are ignored, and C(S,j) denotes the communication cost of the terminal chain that contains task j in schedule S. This is the first provable guarantee for weighted completion time in the presence of machine‑dependent communication delays.

The analysis hinges on a newly presented “Separation Principle” for ETF, which cleanly decouples the impact of processing times from communication delays. By proving that the earliest‑start time of any ready task can be bounded by the sum of an “ideal” processing schedule (ignoring communication) and the communication cost of a single chain, the authors avoid the intricate interleavings that plagued earlier attempts. They then import the recent O(log m / log log m) grouping technique from
