Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration
Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and causal attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen-3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient-based causal attribution: frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes causal and structural dependencies beyond accuracy metrics alone.
💡 Research Summary
The paper tackles the opacity of orchestration policies in multi‑expert large language model (LLM) systems by introducing a novel interpretability framework called INFORM. Rather than treating the routing mechanism as a black‑box component optimized solely for downstream performance, INFORM explicitly models orchestration as a two‑stage differentiable computation: (1) an interaction module that builds a directed, weighted expert‑to‑expert transition matrix C(x) using query‑key attention augmented with a cosine‑similarity prior, and (2) a selection module that derives a marginal expert selection distribution s(x) via a Gumbel‑Softmax over global connectivity derived from C(x).
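The two-stage computation can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the projection matrices, the mixing weight `alpha` for the cosine-similarity prior, and the use of total incoming mass as "global connectivity" are all assumptions.

```python
import torch
import torch.nn.functional as F

def interaction_matrix(h, w_q, w_k, alpha=0.5):
    """Stage 1: directed, weighted expert-to-expert transition matrix C(x).

    h: (n_experts, d) latent expert representations for query x.
    Query-key attention logits are blended with a cosine-similarity prior.
    """
    q, k = h @ w_q, h @ w_k
    attn = (q @ k.T) / (k.shape[-1] ** 0.5)         # scaled query-key attention
    cos = F.cosine_similarity(h.unsqueeze(1), h.unsqueeze(0), dim=-1)
    logits = alpha * attn + (1 - alpha) * cos       # cosine-similarity prior
    return F.softmax(logits, dim=-1)                # each row is a distribution

def select_experts(C, tau=1.0):
    """Stage 2: marginal selection distribution s(x) via a Gumbel-Softmax
    over global connectivity (here: total incoming mass per expert)."""
    connectivity = C.sum(dim=0)                     # incoming mass, all positive
    return F.gumbel_softmax(connectivity.log(), tau=tau)

torch.manual_seed(0)
n, d = 10, 16                                       # ten experts, toy latents
h = torch.randn(n, d)
C = interaction_matrix(h, torch.randn(d, d), torch.randn(d, d))
s = select_experts(C)
```

Because both stages are differentiable (the Gumbel-Softmax replaces hard sampling), gradients can flow through the whole orchestration computation, which is what makes the gradient-based attribution in the next section possible.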
The authors define two complementary importance metrics. Relational Importance (RI) is the total incoming mass for each expert in C(x), reflecting how often an expert serves as a successor in the collaboration graph. Intrinsic Importance (II) is the L2 norm of the gradient of the selected expert's log-probability with respect to that expert's latent representation, i.e., ‖∇_{h_i} log P(E_i|x)‖₂. By comparing RI and II, the framework separates observed routing frequency from causal contribution.
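A minimal sketch of the two metrics follows, reusing the names from the summary. The toy selection head that produces log P(E_i|x) from the expert latents is an assumption for illustration; only the RI and II definitions themselves come from the text.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 10, 16
h = torch.randn(n, d, requires_grad=True)       # expert latents h_i

# Toy selection head (assumption): linear scoring of the latents.
w = torch.randn(d)
log_p = F.log_softmax(h @ w, dim=-1)            # log P(E_i | x)

# Toy transition matrix over the same experts (assumption).
C = F.softmax(h @ h.T, dim=-1)

# Relational Importance: total incoming mass per expert (column sums of C).
RI = C.sum(dim=0).detach()

# Intrinsic Importance: L2 norm of d log P(E_i*|x) / d h for the selected
# expert i*; each row of the gradient corresponds to one expert's latent.
i_star = log_p.argmax()
grad = torch.autograd.grad(log_p[i_star], h)[0]
II = grad.norm(dim=-1)
```

Comparing the two vectors directly (e.g. by rank correlation) is how the divergence between routing frequency and causal influence reported below would surface.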
Experiments are conducted on two consortium configurations: a homogeneous pool of ten instruction‑tuned 8‑billion‑parameter models (LLaMA‑3.1, Qwen‑3, DeepSeek‑R1) with diversity induced by varying decoding temperature, and a heterogeneous pool spanning 1‑billion‑parameter to 7‑billion‑parameter models. The orchestrator is trained on three benchmark suites—GSM8K (math word problems), HumanEval (code generation), and MMLU (multiple‑choice knowledge)—using a composite loss that combines task accuracy, alignment to a large oracle LLM, sparsity penalties, and symmetry constraints.
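The composite loss could take a form like the sketch below. The four terms are named in the summary, but their concrete forms here (cross-entropy for task accuracy, KL alignment to an oracle distribution, an L1 sparsity penalty on s(x), a symmetry penalty on C(x)) and all weights are assumptions.

```python
import torch
import torch.nn.functional as F

def composite_loss(logits, target, s, C, oracle_s,
                   w_align=1.0, w_sparse=0.1, w_sym=0.1):
    task = F.cross_entropy(logits, target)                 # task accuracy
    align = F.kl_div(s.log(), oracle_s, reduction="sum")   # KL(oracle || s)
    sparse = s.abs().sum()                                 # sparsity penalty
    sym = (C - C.T).pow(2).mean()                          # symmetry constraint
    return task + w_align * align + w_sparse * sparse + w_sym * sym

torch.manual_seed(0)
n = 10
logits = torch.randn(1, 4)                  # toy task head over 4 answers
target = torch.tensor([2])
s = F.softmax(torch.randn(n), dim=-1)       # selection distribution s(x)
C = F.softmax(torch.randn(n, n), dim=-1)    # transition matrix C(x)
oracle_s = F.softmax(torch.randn(n), dim=-1)
loss = composite_loss(logits, target, s, C, oracle_s)
```

Each term is differentiable, so the orchestrator can be trained end to end against all four objectives at once.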
Key findings include:
- Divergence between routing frequency and causal influence – Experts that attract the bulk of routing mass (high RI) often act as "interaction hubs" but exhibit low II, indicating they are not essential for the final decision. Conversely, sparsely routed experts can have high II, serving as structural bottlenecks whose removal disproportionately harms the collaboration graph.
- Asynchronous emergence of orchestration structure – During training, routing confidence and specialization gradually increase, yet the concentration of mass onto a few central experts occurs earlier than the stabilization of selection entropy. This suggests the system first learns "who to trust" before learning "how confidently to route."
- Non-deterministic sequencing – Entropy analysis of s(x) shows that while an initializer expert often emerges, the ordering of subsequent experts remains highly variable across inference runs, implying task-dependent or stochastic sequencing preferences.
- Targeted ablation validates causal attribution – Masking the single expert with the highest II leads to a 5.5× increase in KL divergence of the transition matrix and a marked shift in the selection distribution, whereas masking the most frequently selected expert yields only minor structural changes. This confirms that II captures genuine causal dependencies rather than mere usage statistics.
- Effect of decoding temperature – Introducing temperature-driven diversity expands the connectivity of C(x) (higher rank, lower entropy) but does not align RI with II, demonstrating that output stochasticity enriches interaction topology without necessarily improving causal relevance.
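The targeted-ablation measurement from the findings above can be sketched as follows: mask one expert, renormalize the transition matrix, and compare row-wise KL divergence against the unmasked matrix. The masking convention (zeroing the expert's incoming column and renormalizing rows) and the specific expert indices are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mask_expert(C, i, eps=1e-8):
    """Remove expert i as a routing target and renormalize each row."""
    M = C.clone()
    M[:, i] = 0.0
    return M / (M.sum(dim=-1, keepdim=True) + eps)

def mean_row_kl(P, Q, eps=1e-8):
    """Average KL(P_row || Q_row) between two row-stochastic matrices."""
    return (P * ((P + eps) / (Q + eps)).log()).sum(dim=-1).mean()

torch.manual_seed(0)
n = 10
C = F.softmax(torch.randn(n, n), dim=-1)    # unmasked transition matrix

kl_high_ii = mean_row_kl(mask_expert(C, 3), C)  # hypothetical high-II expert
kl_high_ri = mean_row_kl(mask_expert(C, 7), C)  # hypothetical high-RI expert
```

Under the paper's finding, the divergence for the high-II expert would be several times larger than for the most frequently routed one; the sketch only shows how such a comparison is computed, not the reported 5.5× figure.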
The paper situates INFORM among existing orchestration approaches (Mixture‑of‑Experts, RouteLLM, IR‑T‑Router, MetaGPT, etc.) and highlights that most prior work provides limited insight into coordination dynamics, focusing instead on cost or performance. INFORM, by contrast, delivers fine‑grained, graph‑based interaction analysis, sequencing entropy, and gradient‑based causal attribution, all without requiring changes to the underlying routing algorithm.
In conclusion, the study delivers three major contributions: (i) a systematic methodology to disentangle relational from intrinsic expert importance, revealing that high routing frequency is a poor proxy for functional necessity; (ii) empirical evidence that expert centralization precedes confidence calibration, indicating asynchronous learning dynamics; and (iii) validation that gradient‑based intrinsic importance reliably identifies structural vulnerabilities, enabling principled expert pruning and more robust multi‑expert system design. The authors suggest future work extending INFORM to dynamic expert addition/removal, cost‑performance trade‑off optimization, and human‑in‑the‑loop scenarios, thereby advancing both the interpretability and reliability of collaborative LLM architectures.