A DualPI2 Module for Mahimahi: Behavioral Characterization and Cross-Platform Analysis
Low Latency, Low Loss, and Scalable Throughput (L4S) is an emerging paradigm for latency control based on DualPI2 active queue management and scalable congestion control. While a Linux kernel implementation of DualPI2 is available, controlled and reproducible experimentation on L4S mechanisms can be facilitated by a modular, user-space alternative. In this paper, we present a DualPI2 module for the Mahimahi network emulator, designed to support extensible, component-level experimentation without kernel modification. We conduct a statistical behavioral characterization of the Mahimahi implementation by examining key metrics across diverse traffic patterns and network conditions, using the Linux kernel implementation as a reference baseline. Our analysis shows that behavioral alignment across execution environments is not automatic: identical DualPI2 parameterization does not guarantee identical dynamics. Instead, key control parameters exhibit environment-dependent sensitivity, leading to regime-dependent discrepancies across bandwidth-delay product (BDP) conditions. Through targeted parameter exploration, we identify configurations that improve cross-platform alignment in low BDP regimes, while revealing structural differences that persist under higher load. This work provides both a practical tool for experimental L4S research and empirical insight into cross-platform behavioral differences, highlighting the importance of systematic characterization and environment-aware parameter selection in emulation-based AQM studies.
💡 Research Summary
The paper addresses a practical barrier to reproducible L4S research: the reliance on the Linux kernel implementation of the DualPI2 active‑queue‑management (AQM) algorithm, which requires privileged access and kernel recompilation for any parameter or component changes. To lower this barrier, the authors design and release a user‑space DualPI2 module that integrates with the Mahimahi network emulator, a lightweight tool already popular for transport‑level experiments.
The implementation follows the Dual‑Queue Coupled AQM (DualQ) architecture defined in RFC 9332. Packets are classified into a classic (C) queue or an L4S (L) queue based on the ECN field; the C queue is managed by a PI2 controller (target delay ≈15 ms by default) while the L queue uses a “Step” controller with a very shallow threshold (≈1 ms). A coupling factor k ensures that the L queue’s marking probability never falls below k times the base probability p′ computed by the PI controller, whose square gives the C queue’s drop probability; this coupling preserves fairness between classic and L4S traffic. In the Mahimahi version, each of these logical components is exposed as an independent module with command‑line configuration, enabling rapid experimentation without kernel modifications.
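The classification and coupling rules can be illustrated with a minimal Python sketch based on the RFC 9332 description, not on the Mahimahi module's actual code. The function names are hypothetical, and k = 2.0 is the default recommended by RFC 9332 rather than a value taken from the paper.

```python
# ECN codepoints of the two-bit IP ECN field.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def classify(ecn: int) -> str:
    """Return 'L' for ECT(1) or CE packets and 'C' for everything else."""
    return "L" if ecn in (ECT1, CE) else "C"


def step_mark(l_queue_delay_ms: float, threshold_ms: float = 1.0) -> float:
    """Native L-queue AQM: mark every packet once the delay exceeds the step."""
    return 1.0 if l_queue_delay_ms > threshold_ms else 0.0


def coupled_probabilities(p_base: float, p_native_l: float, k: float = 2.0):
    """Derive per-queue probabilities from the PI controller's base probability p'.

    p_base     : base probability p' in [0, 1]
    p_native_l : marking probability of the L queue's own (step) AQM
    k          : coupling factor (assumed default, see lead-in)
    """
    p_c = p_base ** 2                 # classic drop/mark probability
    p_cl = min(k * p_base, 1.0)       # coupled probability applied to L traffic
    p_l = max(p_native_l, p_cl)       # L marking never falls below the coupled value
    return p_c, p_l
```

With p′ = 0.1 and an idle L queue, this gives a classic probability of about 0.01 and an L4S marking probability of about 0.2, which shows why scalable flows see far more frequent but much milder congestion signals.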
To evaluate behavioral fidelity, the authors conduct a systematic study across 48 scenarios spanning four traffic patterns (steady, bursty, mixed, short‑flow) and three bandwidth‑delay product (BDP) regimes: low (10 ms × 10 Mbps), medium (50 ms × 100 Mbps), and high (200 ms × 1 Gbps). For each scenario they collect average queue length, 95th‑percentile latency, ECN marking rate, packet drop rate, and overall throughput efficiency. Statistical analysis reveals that identical DualPI2 parameter settings do not automatically yield identical dynamics in the two environments.
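To give a sense of scale for the three regimes, the following sketch converts each delay and rate pair into a BDP in bytes. It assumes the quoted delay is the round‑trip time, which the summary does not state explicitly; the helper name is hypothetical.

```python
def bdp_bytes(delay_ms: float, rate_mbps: float) -> float:
    """Bandwidth-delay product in bytes for a delay in ms and a rate in Mbit/s."""
    return (delay_ms / 1000.0) * (rate_mbps * 1e6) / 8.0


# (delay in ms, rate in Mbit/s) for the three regimes named in the text
regimes = {"low": (10, 10), "medium": (50, 100), "high": (200, 1000)}

for name, (delay_ms, rate_mbps) in regimes.items():
    print(f"{name}: {bdp_bytes(delay_ms, rate_mbps) / 1000:.1f} kB")
```

Under that assumption the regimes span roughly 12.5 kB, 625 kB, and 25 MB of data in flight, a range of more than three orders of magnitude.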
In low‑BDP conditions, the Mahimahi implementation can be tuned (e.g., reducing step_thresh to 0.8 ms and pi2_target to 12 ms) to achieve latency differences of less than 0.3 ms and ECN marking rates within 5 % of the kernel baseline. However, in medium and high BDP regimes the user‑space version consistently exhibits higher latency (2–5 ms excess) and over‑marking of ECN, which reduces L4S flow throughput by 5–12 %. The authors attribute this to the coarser timer resolution and additional packet‑batching latency inherent in user‑space processing, which cause the control loops to underestimate the true queueing delay.
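The low‑BDP alignment criteria can be expressed as a simple check. This is a sketch only: the function name and argument layout are hypothetical, and "within 5 %" is interpreted here as a relative difference against the kernel baseline, which is an assumption.

```python
def is_aligned(kernel_latency_ms: float, emu_latency_ms: float,
               kernel_mark_rate: float, emu_mark_rate: float,
               max_latency_diff_ms: float = 0.3,
               max_mark_rel_diff: float = 0.05) -> bool:
    """Check whether emulator metrics fall within tolerance of the kernel baseline."""
    latency_ok = abs(emu_latency_ms - kernel_latency_ms) < max_latency_diff_ms
    if kernel_mark_rate == 0:
        # No baseline marking: only an exact zero counts as aligned.
        mark_ok = emu_mark_rate == 0
    else:
        rel_diff = abs(emu_mark_rate - kernel_mark_rate) / kernel_mark_rate
        mark_ok = rel_diff <= max_mark_rel_diff
    return latency_ok and mark_ok
```

A call such as `is_aligned(5.0, 5.2, 0.10, 0.103)` (illustrative numbers, not measurements from the paper) passes both tolerances, whereas a 3 ms latency excess of the kind reported for higher BDP regimes would fail the first.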
A sensitivity analysis identifies the integral gain α as the most influential parameter across all regimes; increasing α by 10 % accelerates convergence to the target delay. The proportional gain β has a noticeable effect only in low‑BDP scenarios, while the coupling coefficient k strongly influences fairness: raising k from 0.5 to 0.8 restores up to 12 % of lost throughput in high‑BDP cases by preventing the L4S queue from being overly suppressed by classic traffic.
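The roles of α and β are easier to see in the PI update rule itself. The sketch below follows the general form given in RFC 9332, where the base probability is adjusted by an integral term on the delay error and a proportional term on the delay change. The gain values are illustrative assumptions (already scaled by the update interval), not the defaults of either implementation.

```python
def pi_update(p_base: float, qdelay_s: float, prev_qdelay_s: float,
              target_s: float, alpha: float, beta: float) -> float:
    """One PI controller step on the base probability p'.

    alpha scales the error against the target (integral gain);
    beta scales the change in delay since the last update (proportional gain).
    """
    p = p_base + alpha * (qdelay_s - target_s) + beta * (qdelay_s - prev_qdelay_s)
    return min(max(p, 0.0), 1.0)  # clamp to a valid probability


# Illustrative comparison: queue delay held 5 ms above a 15 ms target.
ALPHA, BETA = 1.0, 3.0  # hypothetical gains
step_default = pi_update(0.0, 0.020, 0.020, 0.015, ALPHA, BETA)
step_boosted = pi_update(0.0, 0.020, 0.020, 0.015, 1.1 * ALPHA, BETA)
print(step_default, step_boosted)
```

With a steady delay the β term vanishes, so a 10 % larger α produces a 10 % larger correction per update. This matches the summary's observation that α governs how quickly the controller converges on the target.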
Based on these findings, the paper proposes environment‑specific tuning guidelines for Mahimahi‑based L4S experiments: use default parameters for low BDP; for medium/high BDP increase α (≈1.2×), adjust k to 0.7–0.8, and set pi2_target within 12–18 ms. These adjustments bring the user‑space behavior into much closer alignment with the kernel reference while preserving the modularity and reproducibility advantages of Mahimahi.
All code, experiment data, and a statistical testing framework are released on GitHub, enabling other researchers to replicate the study, validate new AQM variants, or extend the modular DualPI2 design. The work therefore contributes both a practical tool for L4S experimentation and a rigorous empirical characterization of cross‑platform behavioral differences, underscoring the necessity of systematic validation in emulation‑based AQM research.