Diagnosing and Repairing Distributed Routing Configurations Using Selective Symbolic Simulation

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, refer to the original arXiv paper.

Although substantial progress has been made in automatically verifying whether distributed routing configurations conform to certain requirements, diagnosing and repairing configuration errors remains manual and time-consuming. To fill this gap, we propose S²Sim, a novel system for automatic routing configuration diagnosis and repair. Our key insight is that by selectively simulating variants of the given configuration in a symbolic way, we can find an intent-compliant variant whose differences from the given configuration reveal the errors in that configuration and suggest the patches. Building on this insight, we also design techniques to support complex scenarios (e.g., multi-protocol networks) and requirements (e.g., k-link failure tolerance). We implement a prototype of S²Sim and evaluate its performance using networks of size O(10) to O(1000) with synthetic real-world configurations. Results show that S²Sim diagnoses and repairs errors for 1) all WAN configurations within 10 s and 2) all DCN configurations within 20 minutes.


💡 Research Summary

The paper introduces S²Sim, a system that automatically diagnoses and repairs errors in distributed routing configurations. While many control‑plane verification (CPV) tools can check whether a configuration satisfies operator intents (reachability, waypointing, loop‑freedom, etc.), they leave the identification and correction of the offending configuration lines to manual effort. Existing attempts—Minesweeper, CEL, CPR, ACR—either cannot handle complex policies such as ACLs, AS‑path filters, or local‑preference tweaks, or they require exhaustive enumeration of counter‑examples, making them impractical for real networks.

S²Sim's core insight is to treat the problem as one of finding an intent‑compliant variant of the given configuration and then using the differences between the two to locate and fix errors. To avoid the circularity of needing a correct configuration in order to search for one, the authors introduce "contracts": Boolean predicates that capture the behavior of each routing decision point (e.g., IsPeered, IsExported, IsPreferred). A configuration does not change the structure of the event‑driven routing program; it only determines the truth values of these contracts.
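
To make the notion of a contract concrete, the following minimal sketch models two of the named contracts as Boolean checks over a toy configuration. The `Contract` class, the `holds` and `violated` helpers, and the dictionary layout of `config` are all hypothetical names chosen for illustration; they are not taken from the paper or from any S²Sim code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """A Boolean predicate at one routing decision point (hypothetical model)."""
    kind: str    # "IsPeered" or "IsExported" in this sketch
    router: str  # router making the decision
    peer: str    # neighbor the decision concerns


def holds(contract, config):
    """Return the truth value the configuration assigns to the contract."""
    key = (contract.router, contract.peer)
    if contract.kind == "IsPeered":
        return key in config["sessions"]
    if contract.kind == "IsExported":
        # A route is exported only over an existing session with no deny filter.
        return key in config["sessions"] and key not in config["export_deny"]
    raise ValueError(f"unknown contract kind: {contract.kind}")


def violated(contracts, config):
    """List the contracts that the configuration fails to satisfy."""
    return [c for c in contracts if not holds(c, config)]


# Toy configuration (assumed layout): B has an export filter toward C.
config = {
    "sessions": {("A", "B"), ("B", "C")},
    "export_deny": {("B", "C")},
}
contracts = [
    Contract("IsPeered", "A", "B"),
    Contract("IsExported", "A", "B"),
    Contract("IsExported", "B", "C"),
]
broken = violated(contracts, config)
print(broken)
```

In this toy setting the configuration leaves the set of decision points unchanged and only flips the truth value of `IsExported` at B toward C, which is the property the summary attributes to contracts.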

The workflow proceeds in three steps:

  1. Contract Derivation – Starting from the erroneous data plane, S²Sim computes a minimally different, intent‑compliant data plane using a DFA‑multiplication algorithm. From this data plane it extracts a set of contracts that are necessary and sufficient for compliance. This keeps the target variant close to the original configuration, limiting the amount of change required.

  2. Selective Symbolic Simulation – The original configuration is executed symbolically. Whenever a contract violation is detected, the simulation is forced to obey the contract and switches to a symbolic variant of the configuration. Because the simulation now respects all derived contracts, it eventually converges to the intent‑compliant data plane. The set of violated contracts directly points to the configuration snippets that cause the problem.

  3. Error Localization and Repair – Violated contracts are mapped back to concrete configuration lines. A constraint‑programming engine then computes a conflict‑free repair that satisfies all contracts while minimizing modifications to the original script.
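
The second step above can be illustrated with a heavily simplified sketch. It is not symbolic and it does not model BGP; it only propagates a single route through export decisions, forces a decision whenever a contract requires it, and records each forced decision as a violation. The function name `selective_simulate` and all of its parameters are hypothetical.

```python
from collections import deque


def selective_simulate(origin, links, export_denied, required_exports):
    """Propagate one route from `origin`, obeying contracts where config disagrees.

    links:            directed (sender, receiver) pairs
    export_denied:    pairs where the given configuration filters the route
    required_exports: contracts, i.e. pairs that must export the route
    Returns the routers that learn the route and the violated contracts.
    """
    has_route = {origin}
    violations = []
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        for u, v in sorted(links):
            if u != sender or v in has_route:
                continue
            denied = (u, v) in export_denied
            if denied and (u, v) in required_exports:
                # Contract violated: record it and force the compliant behavior.
                violations.append((u, v))
                denied = False
            if not denied:
                has_route.add(v)
                queue.append(v)
    return has_route, violations


links = {("A", "B"), ("B", "C"), ("B", "D")}
export_denied = {("B", "C"), ("B", "D")}
required_exports = {("A", "B"), ("B", "C")}  # derived from the compliant data plane

routers, violations = selective_simulate("A", links, export_denied, required_exports)
print(sorted(routers), violations)
```

The filter from B to D is left alone because no contract requires that export, so only the B to C decision is reported. That mirrors the idea that the set of violated contracts points at the configuration snippets to repair.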

To handle realistic complexities, S²Sim incorporates three design extensions:

  • D1 – Rich Policy Support – The contract framework is extended to model ACLs, route aggregation, and multipath routing, allowing the system to reason about policies that affect forwarding decisions beyond simple reachability.

  • D2 – Multi‑Protocol, Multi‑Layer Networks – Using an assume‑guarantee methodology, S²Sim diagnoses overlay and underlay networks separately. It first assumes the underlay works correctly, repairs the overlay, then treats the repaired overlay’s behavior as intents for the underlay, iterating until both layers are consistent. This modular approach enables handling of mixed protocols such as OSPF (underlay) and BGP (overlay).

  • D3 – Fault‑Tolerance Contracts – For k‑link failure tolerance, S²Sim synthesizes a data plane that remains intent‑compliant under any combination of up to k link failures. Corresponding fault‑tolerant contracts are derived and checked during symbolic simulation, allowing the system to locate and fix configuration errors that would only manifest under failure scenarios.
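
As a rough illustration of what k‑link failure tolerance means for a single reachability intent, the sketch below checks it by brute‑force enumeration of failure sets. This is a swapped‑in technique for clarity only: the summary says S²Sim synthesizes a fault‑tolerant data plane and derives contracts from it, and it does not describe enumeration. The function names and the undirected‑link representation are assumptions.

```python
from itertools import combinations


def reachable(links, src, dst):
    """Depth-first search over undirected links."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return dst in seen


def tolerates_k_failures(links, src, dst, k):
    """Check reachability under every combination of up to k failed links.

    Returns (True, ()) if the intent always holds, otherwise
    (False, failed_links) for the first failure set that breaks it.
    """
    links = list(links)
    for n in range(k + 1):
        for failed in combinations(links, n):
            remaining = [link for link in links if link not in failed]
            if not reachable(remaining, src, dst):
                return False, failed
    return True, ()


# A four-router square: two link-disjoint paths from A to D.
square = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
ok_one, _ = tolerates_k_failures(square, "A", "D", 1)
ok_two, witness = tolerates_k_failures(square, "A", "D", 2)
print(ok_one, ok_two, witness)
```

Enumeration grows combinatorially with k and the number of links, which is one reason a contract‑based formulation is attractive at scale.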

The authors implemented a closed‑source version integrated as a plugin to a major provider’s internal CPV tool, and an open‑source prototype that plugs into Batfish. Evaluation was performed on three fronts:

  1. Functionality Demonstrations on a small six‑router BGP example, showing that S²Sim correctly identifies both the export filter at router C and the local‑preference rule at router F, whereas prior tools either missed one error or failed to propose any fix.

  2. Real‑World Configurations from two large service providers. In WAN settings with roughly 100 routers, S²Sim diagnosed and repaired all errors within 10 seconds. For data‑center network (DCN) configurations, diagnosis and repair completed within 20 minutes.

  3. Scalable Synthetic Benchmarks using real‑world topologies ranging from 10 to 1,000 nodes, with configurations synthesized from actual provider errors. For networks up to 100 nodes, diagnosis and repair took less than a minute; for the largest 1,000‑node instances, the end‑to‑end time stayed under 15 minutes.

Across all experiments, S²Sim achieved 100 % accuracy in both error localization and repair, while maintaining low computational overhead. The paper concludes that contract‑based selective symbolic simulation provides a practical, scalable foundation for automated routing configuration management, dramatically reducing operator workload and enabling near‑real‑time remediation in complex, multi‑protocol, fault‑tolerant environments.

