Evaluating Actionability in Explainable AI

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

A core assumption of Explainable AI (XAI) is that explanations are useful to users – that is, users will do something with the explanations. Prior work, however, does not clearly connect the information provided in explanations to user actions to evaluate effectiveness. In this paper, we articulate this connection. We conducted a formative study through 14 interviews with end users in education and medicine. We contribute a catalog of information and associated actions. Our catalog maps 12 categories of information that participants described relying on to take 60 different actions. We show how AI Creators can use the catalog’s specificity and breadth to articulate how they expect information in their explanations to lead to user actions and test their assumptions. We use an exemplar XAI system to illustrate this approach. We conclude by discussing how our catalog expands the design space for XAI systems to support actionability.


💡 Research Summary

The paper tackles a fundamental but under‑explored assumption of Explainable AI (XAI): that explanations are useful because they lead users to take concrete actions. While prior work has evaluated XAI from algorithmic perspectives (faithfulness, consistency) or through qualitative judgments of readability and intuitiveness, it has rarely linked the information presented in explanations to the specific actions users perform. To fill this gap, the authors conduct a formative study with 14 end‑users—nine physicians and five computer‑science teachers—who are representative of high‑stakes domains (medicine and education) where AI‑driven decisions affect third parties (patients, students).

Using scenario-based design (SBD), the researchers immersed participants in realistic, domain-specific mock interfaces (an electronic health record for doctors, a course-placement tool for teachers) and asked them to think aloud about the tasks they would need to accomplish, the people they would need to consult, and the goals they would pursue. The SBD approach deliberately avoids technical jargon, focusing instead on the actions users would take given the information displayed.

Through iterative thematic coding of 737 interview excerpts, the researchers first derived a user‑generated lexicon for AI‑related terms, then identified 12 distinct categories of information that participants said they would need from an XAI system. These categories were grouped into four thematic clusters: model performance & reliability, data provenance & quality, risk & constraint information, and alternative actions or recommendations. In parallel, 60 concrete user actions were extracted and organized into three higher‑level groups: communication & collaboration (e.g., discussing findings with colleagues), decision modification & refinement (e.g., adjusting a drug dosage or re‑assigning a student), and additional verification & auditing (e.g., requesting a model audit or checking external guidelines).
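
To make the catalog's shape more concrete, the minimal sketch below represents information categories (grouped into thematic clusters) together with the actions they enable as a simple data structure. The cluster, category, and action names are illustrative paraphrases of examples in this summary, not the paper's exact labels, and only two of the 12 categories are shown.

```python
from dataclasses import dataclass, field

@dataclass
class InformationCategory:
    """One information category from the catalog and the user actions it enables."""
    name: str
    cluster: str                                   # one of the four thematic clusters
    enabled_actions: list[str] = field(default_factory=list)

# Illustrative entries only; the full catalog maps 12 categories to 60 actions.
catalog = [
    InformationCategory(
        name="drug-interaction risk",
        cluster="risk & constraint information",
        enabled_actions=["modify a prescription", "discuss findings with colleagues"],
    ),
    InformationCategory(
        name="training-data provenance",
        cluster="data provenance & quality",
        enabled_actions=["request a model audit", "check external guidelines"],
    ),
]
```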

The resulting “actionability catalog” maps each information category to the specific actions it enables. For example, knowledge about drug‑interaction risks (information) supports the action of modifying a prescription (behavior). This mapping provides two practical benefits. First, AI creators can articulate explicit expectations about which pieces of explanatory information should trigger which user actions during system design, making the purpose of explanations concrete and testable. Second, evaluators can measure whether real users actually perform the expected actions, thereby validating or refuting the designers’ hypotheses. This user‑centered evaluation goes beyond traditional metrics of faithfulness or clarity, directly assessing the impact of explanations on workflow and outcomes.
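
As a rough illustration of how such expectations could be made testable, the hypothetical sketch below lets an AI creator declare which actions each piece of explanatory information is expected to enable, then compares observed user actions against those expectations during evaluation. All category and action names here are illustrative, not taken from the paper.

```python
# Hypothetical design-time expectations: information category -> expected actions.
expected_actions = {
    "drug-interaction risk": {"modify prescription", "discuss with colleague"},
    "model performance on similar cases": {"request model audit"},
}

def unmet_expectations(observed: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per information category, return the expected actions no user performed."""
    return {
        info: missing
        for info, wanted in expected_actions.items()
        if (missing := wanted - observed.get(info, set()))
    }

# Example evaluation: users modified prescriptions but never requested an audit,
# so two expectations remain unmet and the designers' assumptions need revisiting.
observed = {"drug-interaction risk": {"modify prescription"}}
print(unmet_expectations(observed))
```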

The paper also discusses design implications. Explanations must move beyond answering “why” or “how” to explicitly suggest “what to do next.” Interfaces should therefore embed actionable elements such as clear call‑to‑action buttons, audit trails, or suggested remediation steps. Moreover, domain‑specific priorities emerge: clinicians value risk and regulatory compliance information, while educators prioritize learning‑path guidance and feedback. Tailoring the information hierarchy to these needs can enhance perceived usefulness and actual adoption.

In sum, the study reframes XAI evaluation around actionability, delivering a systematic catalog that links explanatory content to user behavior. It offers a concrete framework for designers to specify, implement, and empirically test the pragmatic value of explanations. Future work should expand the catalog across more domains, develop quantitative metrics for action execution, and explore longitudinal effects of action‑oriented explanations on trust, decision quality, and overall system adoption.

