Scalable Explainability-as-a-Service (XaaS) for Edge AI Systems
Though Explainable AI (XAI) has made significant advances, its inclusion in edge and IoT systems is typically ad hoc and inefficient. Most current methods are “coupled,” generating explanations simultaneously with model inference. As a result, these approaches incur redundant computation, high latency, and poor scalability when deployed across heterogeneous sets of edge devices. In this work we propose Explainability-as-a-Service (XaaS), a distributed architecture that treats explainability as a first-class system service rather than a model-specific feature. The key innovation of XaaS is that it decouples inference from explanation generation, allowing edge devices to request, cache, and verify explanations subject to resource and latency constraints. To achieve this, we introduce three main innovations: (1) a distributed explanation cache with semantic-similarity-based explanation retrieval, which significantly reduces redundant computation; (2) a lightweight verification protocol that ensures the fidelity of both cached and newly generated explanations; and (3) an adaptive explanation engine that chooses explanation methods based on device capabilities and user requirements. We evaluate XaaS on three real-world edge-AI use cases: (i) manufacturing quality control, (ii) autonomous vehicle perception, and (iii) healthcare diagnostics. Experimental results show that XaaS reduces latency by 38% while maintaining high explanation quality across all three deployments. Overall, this work enables the deployment of transparent, accountable AI across large-scale, heterogeneous IoT systems and bridges the gap between XAI research and edge practicality.
💡 Research Summary
The paper addresses a critical gap in the deployment of Explainable AI (XAI) on edge and Internet‑of‑Things (IoT) platforms. Existing XAI techniques such as LIME, SHAP, and Grad‑CAM are typically “coupled” with model inference, meaning that an explanation is generated for every inference request. This coupling leads to three major inefficiencies in edge environments: (1) redundant computation when similar inputs repeatedly trigger new explanations, (2) a mismatch between the computational demands of XAI methods and the limited resources of heterogeneous edge devices, and (3) the lack of any mechanism to reuse or verify explanations after model updates.
To overcome these problems, the authors propose Explainability‑as‑a‑Service (XaaS), a service‑oriented architecture that treats explanation generation as an independent, distributed service. The core idea is to decouple inference from explanation, allowing edge devices to request, cache, and verify explanations on demand while respecting latency and resource constraints. XaaS consists of five components: (1) an Edge Device Layer where AI models run and explanation requests are issued, (2) a Distributed Explanation Cache organized in a two‑tier hierarchy (local edge‑server caches for sub‑10 ms access and a global cloud cache for broader coverage), (3) an Explanation Generation Layer that produces explanations on cache misses using a portfolio of XAI methods, (4) a lightweight Verification Module that checks version consistency and fidelity of cached explanations, and (5) a Service Orchestrator that routes requests, balances load, and manages model versions.
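As a concrete illustration of how these five components interact, the following sketch traces a single explanation request through the two cache tiers, generation, and verification. All class and method names here are hypothetical; the paper describes the architecture but does not prescribe an API.

```python
# Illustrative sketch of the XaaS request path: local cache -> global
# cache -> generation on miss -> population of both tiers. Names are
# hypothetical, not from the paper.

class XaaSOrchestrator:
    def __init__(self, local_cache, global_cache, generator, verifier):
        self.local_cache = local_cache    # per-edge-server tier (~sub-10 ms access)
        self.global_cache = global_cache  # cloud tier, broader coverage
        self.generator = generator        # portfolio of XAI methods
        self.verifier = verifier          # lightweight fidelity/version check

    def explain(self, request):
        # 1. Try the two-tier cache; reuse only verified hits.
        for cache in (self.local_cache, self.global_cache):
            hit = cache.lookup(request)
            if hit is not None and self.verifier.is_valid(hit, request):
                return hit
        # 2. Cache miss: generate a fresh explanation, populate both tiers.
        explanation = self.generator.generate(request)
        self.local_cache.store(request, explanation)
        self.global_cache.store(request, explanation)
        return explanation
```

The second request for the same (or a semantically similar) input is then served from the cache rather than recomputed, which is the source of the latency savings reported later.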
Three technical innovations enable XaaS to achieve its performance gains:
- Semantic Similarity‑Based Caching – Input instances are embedded using domain‑specific pretrained models (e.g., CLIP for images, BERT for text). FAISS is used for fast nearest‑neighbor search. A cached explanation is considered reusable only if four validity conditions hold: (i) semantic distance below a dynamic threshold ε_sim, (ii) identical model prediction, (iii) matching model version, and (iv) the cached explanation meets the fidelity requirement of the new request. This approach reduces the cost of generating explanations by two to three orders of magnitude when a cache hit occurs.
- Lightweight Verification Protocol – Instead of recomputing a full LIME/SHAP explanation, the protocol generates a small set (n = 15) of perturbations and compares the model’s responses with the cached explanation’s predictions. Experiments show that this method detects 95.5 % of invalid explanations while incurring only a 3.2 % loss in overall explanation quality.
- Adaptive Explanation Engine – When a cache miss occurs, the engine selects an XAI method and an execution location (device, edge server, or cloud) that minimizes a weighted cost function α·T_compute + β·T_comm, subject to the device’s compute capacity, network bandwidth, latency bound, and the requested fidelity. A greedy search over the method‑location space (|M|·|L| complexity) first prefers high‑fidelity methods and then evaluates feasible locations, ensuring that the chosen configuration satisfies all constraints.
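The four cache-validity conditions of the first innovation can be sketched as a single predicate. The cache-entry layout and the plain Euclidean distance below are illustrative stand-ins: the paper uses CLIP/BERT embeddings with FAISS nearest-neighbor retrieval and a dynamic threshold ε_sim.

```python
import numpy as np

def is_reusable(entry, query_emb, query_pred, model_version,
                required_fidelity, eps_sim):
    """Check the four validity conditions for reusing a cached explanation.

    `entry` is a hypothetical cache record; in the paper, inputs are
    embedded with domain-specific models (CLIP for images, BERT for text)
    and neighbors are retrieved via FAISS rather than a direct distance.
    """
    # (i) semantic distance below the dynamic threshold eps_sim
    dist = float(np.linalg.norm(entry["embedding"] - query_emb))
    if dist >= eps_sim:
        return False
    # (ii) identical model prediction
    if entry["prediction"] != query_pred:
        return False
    # (iii) matching model version
    if entry["model_version"] != model_version:
        return False
    # (iv) cached fidelity meets the new request's requirement
    return entry["fidelity"] >= required_fidelity
```

Only when all four conditions hold is the cached explanation returned; otherwise the request falls through to the generation layer.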
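The lightweight verification step of the second innovation might look roughly like this, assuming the cached explanation is a LIME-style linear surrogate (feature weights plus intercept). The noise scale and agreement tolerance are our assumptions; the paper only fixes the number of perturbations at n = 15.

```python
import numpy as np

def verify_cached_explanation(model_fn, x, weights, intercept,
                              n_perturb=15, noise=0.05, tol=0.1, seed=None):
    """Lightweight check that a cached linear surrogate still tracks the
    model: perturb x a few times and compare surrogate vs. model outputs.

    `noise`, `tol`, and the 0.8 agreement threshold are illustrative;
    only n_perturb = 15 comes from the paper.
    """
    rng = np.random.default_rng(seed)
    agree = 0
    for _ in range(n_perturb):
        x_p = x + rng.normal(0.0, noise, size=x.shape)
        surrogate = float(weights @ x_p + intercept)  # cached explanation's prediction
        if abs(model_fn(x_p) - surrogate) <= tol:
            agree += 1
    # Accept the cached explanation if it agrees on most perturbations.
    return agree / n_perturb >= 0.8
```

This costs 15 extra model evaluations instead of the hundreds to thousands that a full LIME/SHAP recomputation would require, which is how the protocol stays "lightweight."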
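The greedy method-location search of the third innovation can be sketched as follows. The dictionary field names and example timings are hypothetical; the cost function α·T_compute + β·T_comm and the "high-fidelity methods first" ordering are taken from the summary above.

```python
def select_method_and_location(methods, locations, request,
                               alpha=1.0, beta=1.0):
    """Greedy search over the |M|*|L| method-location space.

    `methods`: dicts with 'name', 'fidelity', and per-location
    'compute_time'; `locations`: dicts with 'name' and 'comm_time'.
    Field names are illustrative, not from the paper.
    """
    # Prefer high-fidelity methods first, as in the paper's heuristic.
    for m in sorted(methods, key=lambda m: m["fidelity"], reverse=True):
        if m["fidelity"] < request["min_fidelity"]:
            continue  # cannot satisfy the fidelity requirement
        best = None
        for loc in locations:
            t_compute = m["compute_time"][loc["name"]]
            t_comm = loc["comm_time"]
            if t_compute + t_comm > request["max_latency"]:
                continue  # violates the latency bound
            cost = alpha * t_compute + beta * t_comm
            if best is None or cost < best[0]:
                best = (cost, m["name"], loc["name"])
        if best is not None:
            return best[1], best[2]
    return None  # no feasible (method, location) configuration
```

For example, a heavyweight method that is infeasible on-device may still be selected at an edge server if the added communication time keeps the total within the latency bound.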
The authors formalize the problem as a multi‑objective optimization: minimize total system cost while guaranteeing fidelity ≥ ρ_fid, latency ≤ ρ_lat, and explanation validity for the current model version. Because the decision space is high‑dimensional and dynamic (device capabilities, network conditions, cache contents evolve over time), the exact solution is intractable. The paper therefore proposes practical heuristics based on the three innovations above.
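Restricted to a single request, this formulation can be written in the paper's symbols as follows (our reconstruction from the quantities named above; the paper's full objective is system-wide and multi-objective):

```latex
\min_{m \in M,\; \ell \in L} \;\; \alpha \, T_{\text{compute}}(m, \ell) + \beta \, T_{\text{comm}}(\ell)
\quad \text{s.t.} \quad
\text{fid}(m) \ge \rho_{\text{fid}}, \qquad
T_{\text{compute}}(m, \ell) + T_{\text{comm}}(\ell) \le \rho_{\text{lat}}
```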
Experimental evaluation is conducted on three real‑world edge‑AI use cases:
- Manufacturing Quality Control (MQC) – 150 devices, 127 k samples, 8 defect classes.
- Autonomous Vehicle Fleet (AVF) – 80 vehicles, 215 k driving scenarios.
- Healthcare Monitoring (HCM) – 200 patients, 89 k vital‑sign records.
Hardware platforms include Raspberry Pi 4B, NVIDIA Jetson Nano, and higher‑end GPUs. Compared against a baseline “coupled” approach and against naive caching or ML‑as‑a‑Service (MLaaS) variants, XaaS achieves an average latency reduction of 38 % and cache hit rates between 62 % and 71 %. Explanation quality, measured by F1 scores against ground‑truth SHAP/LIME explanations, remains high (0.84–0.89). The lightweight verification detects over 95 % of stale explanations with minimal overhead. Resource utilization improves as well: CPU usage drops by 30 %–45 % and network traffic by roughly 25 % across all scenarios.
The paper also discusses assumptions underlying the design: (A1) consistent explanations for identical inputs, (A2) structured input distributions (images, sensor streams), (A3) tolerant latency (milliseconds to seconds), (A4) relatively slow model‑version churn, and (A5) availability of network connectivity. These assumptions delineate the applicability domain of XaaS.
Limitations are acknowledged. The effectiveness of semantic caching depends heavily on the quality of the embedding models; domains with poorly captured semantics (e.g., raw time‑series without suitable encoders) may see lower hit rates. Frequent model updates could increase cache invalidation overhead, and ultra‑low‑latency control loops (sub‑millisecond requirements) may still be beyond the current design.
In conclusion, XaaS demonstrates that treating explainability as a first‑class, service‑oriented component can dramatically improve the practicality of XAI on heterogeneous edge infrastructures. By combining semantic caching, lightweight verification, and adaptive method selection, the architecture delivers scalable, low‑latency, high‑fidelity explanations across diverse real‑world deployments. The work opens avenues for further research on domain‑agnostic embeddings, dynamic cache consistency protocols, and integration with federated learning pipelines to maintain explanation relevance in continuously evolving edge AI ecosystems.