A Note on Parallel Algorithmic Speedup Bounds
A parallel program can be represented as a directed acyclic graph. An important performance bound is the time to execute the critical path through the graph. We show how this performance metric is related to Amdahl speedup and the degree of average parallelism. These bounds formally exclude superlinear performance.
Research Summary
The paper presents a rigorous analysis of parallel algorithm speed-up limits by modeling a parallel program as a directed acyclic graph (DAG). In this representation each node corresponds to a unit of work (or sub-task) and each edge encodes a control or communication dependency. Two fundamental execution times are defined: T₁, the time required to execute all nodes sequentially on a single processor, and T∞, the time needed to traverse the critical path when an unlimited number of processors are available. T∞ is simply the depth of the DAG and constitutes an absolute lower bound on any parallel execution, because no schedule can finish faster than the longest dependency chain.
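The two quantities above can be computed directly from a DAG. Here is a minimal sketch (the function name and input format are illustrative, not from the paper): T₁ is the sum of all node times, and T∞ is the longest dependency chain, found with a topological sweep (Kahn's algorithm).

```python
def work_and_span(times, edges):
    """Return (T1, Tinf) for a DAG.

    times: dict mapping node -> execution time
    edges: list of (u, v) pairs meaning u must finish before v starts
    """
    succ = {n: [] for n in times}
    indeg = {n: 0 for n in times}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    start = {n: 0 for n in times}      # earliest start time of each node
    ready = [n for n in times if indeg[n] == 0]
    span = 0                           # length of the longest finished chain
    while ready:
        u = ready.pop()
        finish = start[u] + times[u]
        span = max(span, finish)
        for v in succ[u]:              # relax successors of u
            start[v] = max(start[v], finish)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    work = sum(times.values())         # T1: total sequential time
    return work, span                  # (T1, Tinf)

# A diamond of four unit-time tasks: a -> {b, c} -> d
# gives T1 = 4 and Tinf = 3 (the chain a, b, d).
print(work_and_span({'a': 1, 'b': 1, 'c': 1, 'd': 1},
                    [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]))
```
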
Leiserson's classic lower bounds are restated: (3) Tₚ ≥ T₁/p, which captures the intuitive notion that the total work cannot be completed in less time than its equal division among p processors allows; and (4) Tₚ ≥ T∞, which asserts that the critical-path length dominates execution time regardless of processor count. Substituting (3) into the speed-up definition Sₚ = T₁/Tₚ yields Sₚ ≤ p, with the ideal linear speed-up Sₚ = p attained only when the work divides perfectly. In practice, overhead, load imbalance, and communication make the speed-up strictly sub-linear, i.e., Sₚ < p. The paper also acknowledges the theoretical possibility of super-linear speed-up (Sₚ > p) but shows that bound (3) precludes it for any realistic DAG.
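Combining bounds (3) and (4) gives Tₚ ≥ max(T₁/p, T∞), and therefore Sₚ ≤ min(p, T₁/T∞). A small sketch of this combined bound (the function name is illustrative):

```python
def speedup_bounds(work, span, p):
    """Lower bound on Tp from Tp >= max(T1/p, Tinf),
    and the resulting upper bound on speed-up Sp = T1/Tp."""
    tp_lower = max(work / p, span)     # best possible parallel time
    sp_upper = work / tp_lower         # equals min(p, work/span)
    return tp_lower, sp_upper

# With T1 = 18 and Tinf = 9, four processors cannot beat Tp = 9,
# so the speed-up is capped at 2 no matter how large p grows.
print(speedup_bounds(18, 9, 4))
```

Note how the critical-path term takes over as p grows: past p = T₁/T∞, adding processors no longer lowers the bound.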
The authors then introduce the average parallelism metric A = W/T∞, where W is the total work (the sum of node execution times) and T∞ is the critical-path time. They prove that A is identical to Leiserson's S∞ (the asymptotic speed-up as p → ∞). An illustrative example uses a DAG with 18 unit-time nodes and a critical path of nine nodes, giving W = 18, T∞ = 9, and thus A = 2, matching S∞ = 2. The paper contrasts this with Amdahl's law, which, with four of the 18 nodes deemed serial (serial fraction σ = 4/18), would predict an asymptotic speed-up of 1/σ = 18/4 = 4.5 for the same graph. The discrepancy highlights that Amdahl's bound, based solely on a serial fraction, can be overly optimistic compared with the tighter bound provided by average parallelism.
To address super-linear claims, the paper proves that attaining bound (3) with equality is tantamount to assuming a depth-one DAG, i.e., all work placed on a single level. Since any non-trivial DAG has depth greater than one and cannot be flattened arbitrarily, the maximal speed-up cannot exceed the number of processors (linear scaling). Apparent super-linear behavior can arise only when the baseline for linearity is chosen incorrectly (e.g., comparing small-p runs to larger-p runs without accounting for the work distribution); under the model, the efficiency Eₚ = Sₚ/p never exceeds 1.
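The efficiency claim follows mechanically from the combined bound: since Sₚ ≤ min(p, T₁/T∞), dividing by p can never exceed 1. A small sketch, assuming the best-case schedule allowed by the bounds (function name illustrative):

```python
def efficiency(work, span, p):
    """Efficiency Ep = Sp/p under the best-case parallel time
    Tp = max(T1/p, Tinf); by construction this never exceeds 1."""
    tp = max(work / p, span)       # best achievable Tp under the bounds
    sp = work / tp                 # resulting speed-up
    return sp / p

# With T1 = 18, Tinf = 9: two processors can be fully efficient,
# but four processors are at best half utilized.
print(efficiency(18, 9, 2), efficiency(18, 9, 4))
```
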
In summary, the paper unifies three perspectives on parallel performance: the critical-path bound (T∞), the work-division bound (T₁/p), and the average parallelism A = W/T∞. It demonstrates that average parallelism coincides with the asymptotic speed-up and provides a more precise, graph-aware alternative to Amdahl's law. Moreover, it offers a formal proof that super-linear speed-up is impossible for any DAG-structured computation, reinforcing the importance of using correct theoretical baselines when evaluating parallel scalability.