PRISM: Festina Lente Proactivity -- Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents


Proactive agents must decide not only what to say but also whether and when to intervene. Many current systems rely on brittle heuristics or indiscriminate long reasoning, which offers little control over the benefit-burden tradeoff. We formulate the problem as cost-sensitive selective intervention and present PRISM, a novel framework that couples a decision-theoretic gate with a dual-process reasoning architecture. At inference time, the agent intervenes only when a calibrated probability of user acceptance exceeds a threshold derived from asymmetric costs of missed help and false alarms. Inspired by festina lente (Latin: “make haste slowly”), we gate by an acceptance-calibrated, cost-derived threshold and invoke a resource-intensive Slow mode with counterfactual checks only near the decision boundary, concentrating computation on ambiguous and high-stakes cases. Training uses gate-aligned, schema-locked distillation: a teacher running the full PRISM pipeline provides dense, executable supervision on unlabeled interaction traces, while the student learns a response policy that is explicitly decoupled from the intervention gate to enable tunable and auditable control. On ProactiveBench, PRISM reduces false alarms by 22.78% and improves F1 by 20.14% over strong baselines. These results show that principled decision-theoretic gating, paired with selective slow reasoning and aligned distillation, yields proactive agents that are precise, computationally efficient, and controllable. To facilitate reproducibility, we release our code, models, and resources at https://prism-festinalente.github.io/; all experiments use the open-source ProactiveBench benchmark.


💡 Research Summary

PRISM (Proactive Risk‑Sensitive Intervention with a Slow‑mode Margin) tackles the fundamental problem of when a proactive AI assistant should intervene. The authors formalize intervention as a cost‑sensitive selective decision: at each time step the agent estimates two calibrated probabilities from the current context – p_need (the likelihood that help is needed) and p_accept (the likelihood that a user will accept the offer). A dynamic threshold τ(p_need) = C_FA / (C_FA + p_need·C_FN) is derived from asymmetric costs: C_FA for false alarms (unwanted interruptions) and C_FN for false negatives (missed opportunities). The agent intervenes only when p_accept ≥ τ(p_need). This formulation ensures that higher need relaxes the acceptance requirement, and the threshold varies monotonically with both need and cost parameters, giving designers an intuitive control knob.
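The gating rule above is simple enough to sketch directly. The following is a minimal illustration of the paper's threshold τ(p_need) = C_FA / (C_FA + p_need·C_FN) and the resulting intervene/stay-silent decision; the function names and default cost values are illustrative, not taken from the authors' code.

```python
def intervention_threshold(p_need: float, c_fa: float, c_fn: float) -> float:
    """Acceptance threshold tau(p_need) derived from asymmetric costs:
    c_fa penalizes false alarms, c_fn penalizes missed help."""
    return c_fa / (c_fa + p_need * c_fn)

def should_intervene(p_need: float, p_accept: float,
                     c_fa: float = 1.0, c_fn: float = 2.0) -> bool:
    """Intervene only when the calibrated acceptance probability
    clears the cost-derived threshold."""
    return p_accept >= intervention_threshold(p_need, c_fa, c_fn)
```

Note how higher need relaxes the bar: with c_fa = 1 and c_fn = 2, τ(0.2) ≈ 0.71 while τ(0.9) ≈ 0.36, so the same p_accept = 0.5 blocks intervention in the low-need case but permits it in the high-need one.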

To avoid the computational expense of always running a heavyweight model, PRISM adopts a dual‑process architecture. A fast, lightweight model first predicts p_need and p_accept. If the prediction lies within a narrow margin δ_slow of the decision boundary (|p_accept – τ(p_need)| ≤ δ_slow), a resource‑intensive “Slow” pass is triggered. The Slow pass performs counterfactual checks and chain‑of‑thought reasoning, but is invoked only for ambiguous, high‑stakes cases. This “slow‑only‑near‑boundary” strategy concentrates compute where it can most affect the outcome, dramatically reducing average latency and token usage while preserving accuracy.
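The slow-only-near-boundary strategy can be sketched as a small wrapper around two predictors. This is a hedged reconstruction from the description above, assuming the Slow pass simply re-estimates the two probabilities; `fast_model` and `slow_model` are placeholder callables, not the paper's interfaces.

```python
def gate(context, fast_model, slow_model,
         c_fa: float = 1.0, c_fn: float = 2.0, delta_slow: float = 0.1) -> bool:
    """Dual-process intervention gate: run the cheap Fast pass always,
    and escalate to the expensive Slow pass only inside a margin
    delta_slow of the decision boundary."""
    p_need, p_accept = fast_model(context)        # lightweight estimate
    tau = c_fa / (c_fa + p_need * c_fn)           # cost-derived threshold
    if abs(p_accept - tau) <= delta_slow:         # ambiguous case: escalate
        p_need, p_accept = slow_model(context)    # counterfactual checks, CoT
        tau = c_fa / (c_fa + p_need * c_fn)
    return p_accept >= tau
```

Because clear-cut contexts never reach `slow_model`, average compute tracks the fraction of traffic near the boundary rather than the total volume, which is the source of the latency and token savings reported below.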

Training mirrors deployment through a gate‑aligned distillation pipeline. A powerful teacher runs the full PRISM pipeline (Fast + Slow + gate) on unlabeled interaction traces, producing dense supervision: calibrated probabilities (q_need, q_accept) and binary labels (y_need, y_accept). The authors introduce Decision‑Consistent Curation, which ranks teacher outputs by the score
R_DC = y_accept − 𝟙{y_need = 1}·(q_need − y_need)² − (q_accept − y_accept)².
Top‑ranked examples form a curated subset D* used to fine‑tune a student model. The student’s loss combines three terms: L_need and L_acc (calibration losses for need and acceptance, with inverse propensity scoring to correct selection bias) and L_burden (penalties for false alarms and Slow‑mode token cost). By embedding the same cost parameters and margin δ_slow in both training and inference, the gap between simulation and production is minimized.
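The curation step reduces to scoring each teacher trace with R_DC and keeping the top-ranked fraction as D*. The sketch below follows the formula above; the tuple layout and the `keep_frac` parameter are assumptions for illustration, not the authors' implementation.

```python
def r_dc(q_need: float, y_need: int, q_accept: float, y_accept: int) -> float:
    """Decision-Consistent Curation score: reward accepted interventions,
    penalize miscalibrated need (only when need is real) and acceptance."""
    score = y_accept - (q_accept - y_accept) ** 2
    if y_need == 1:                       # indicator term 1{y_need = 1}
        score -= (q_need - y_need) ** 2
    return score

def curate(examples, keep_frac: float = 0.5):
    """Rank teacher traces (q_need, y_need, q_accept, y_accept) by R_DC
    and keep the top fraction as the curated subset D*."""
    ranked = sorted(examples, key=lambda ex: r_dc(*ex), reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_frac))]
```

Intuitively, a trace scores highest when the user accepted the intervention and the teacher's probabilities were sharp; a rejected, miscalibrated trace scores negative and is filtered out before fine-tuning.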

Evaluation is performed on ProactiveBench, a benchmark comprising coding, writing, and daily‑life interaction streams (233 test clips). Metrics include Recall, Precision, F1, False‑Alarm Rate, and AUDBC (area under the benefit‑burden curve). PRISM's student (Qwen3‑8B‑PRISM) achieves near‑perfect Recall (≈99%), raises Precision from the teacher DeepSeek‑R1's 77% to 85% under human evaluation, and cuts false alarms by roughly 23% relative to strong baselines. AUDBC is consistently higher across a range of cost settings, indicating superior net benefit at comparable or lower burden. Ablation studies show that removing the δ_slow margin inflates Slow‑mode usage threefold with negligible F1 gain, and that modest recalibration of probabilities can recover performance under distribution shift.

The paper’s contributions are threefold: (1) a principled, cost‑sensitive gating mechanism that cleanly separates need and acceptance; (2) a selective Slow‑mode strategy that yields computational efficiency without sacrificing accuracy; and (3) a gate‑aligned distillation framework (RDC) that produces a lightweight student model with controllable, auditable behavior. Limitations include reliance on well‑calibrated probability estimates and a binary cost model that does not capture multi‑dimensional user preferences (e.g., safety, privacy). Future work is suggested on extending the framework to multi‑objective costs, online recalibration, and user‑customizable cost interfaces.

Overall, PRISM demonstrates that integrating decision‑theoretic gating, selective high‑cost reasoning, and aligned teacher‑student training can produce proactive agents that are more precise, efficient, and controllable than existing heuristic or monolithic approaches.

