Quantal Response Equilibrium as a Measure of Strategic Sophistication: Theory and Validation for LLM Evaluation
Theory of Mind benchmarks for large language models typically produce aggregate scores without theoretical grounding, making it unclear whether high performance reflects strategic reasoning or surface-level heuristics. We introduce a game-theoretic evaluation framework grounded in quantal response equilibrium (QRE). We derive closed-form equilibria for four strategic games, each targeting a distinct cognitive capability. We estimate QRE rationality parameters λ that place model behavior on a continuous scale calibrated against human data (λ_human ∈ [1.0, 2.5]), and establish finite-sample convergence bounds via martingale concentration. Validation across 1,855 games with seven frontier models (plus four expansion models) confirms predictions: bluff rates converge to within 4% of equilibrium, λ estimates range from 0.05 to 1.10 across games and models with substantial cross-model variation, and capability profiles differ across cognitive axes. Robustness analyses reveal high sensitivity to prompt framing and version instability in QRE rankings, highlighting the need for standardized protocols.
💡 Research Summary
The paper proposes a game‑theoretic framework grounded in Quantal Response Equilibrium (QRE) to evaluate the strategic Theory of Mind (ToM) capabilities of large language models (LLMs). Recognizing that existing ToM benchmarks rely on static false‑belief tasks and aggregate scores that lack theoretical grounding, the authors design four multi‑round natural‑language games, each targeting a distinct cognitive axis: (1) Strategic Claim (recursive strategic reasoning), (2) Repeated Prisoner’s Dilemma (relational state modeling), (3) Say the Same Thing (shared conceptual grounding), and (4) Text‑Dixit (epistemic state modeling). For each game they derive closed‑form approximate equilibria, providing concrete behavioral predictions such as a bluff rate of 0.340 in Strategic Claim.
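To make the equilibrium predictions concrete, here is a minimal sketch of how a closed-form bluff rate falls out of a 2×2 bluffing stage game via the column player's indifference condition. The payoff matrix is illustrative only; it is not the paper's Strategic Claim game, whose own derivation yields the 0.340 prediction.

```python
import numpy as np

# Hypothetical 2x2 bluffing stage game (illustrative payoffs, not the
# paper's Strategic Claim parameterization). Row player: Bluff / Honest;
# column player: Call / Fold. Zero-sum for simplicity.
U_row = np.array([[-2.0, 1.0],   # Bluff vs Call, Bluff vs Fold
                  [ 1.0, 0.0]])  # Honest vs Call, Honest vs Fold
U_col = -U_row

def equilibrium_bluff_rate(U_col):
    """Bluff probability p at which the column player is indifferent
    between Call and Fold (the interior mixed Nash of a 2x2 game)."""
    (bc, bf), (hc, hf) = U_col
    # Indifference: p*bc + (1-p)*hc = p*bf + (1-p)*hf, solved for p.
    return (hf - hc) / ((bc - bf) - (hc - hf))

print(f"equilibrium bluff rate: {equilibrium_bluff_rate(U_col):.3f}")  # 0.250
```

The same indifference logic, applied to the games' actual payoffs, is what produces closed-form behavioral predictions like the 0.340 bluff rate.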
Bounded rationality is modeled with the logit QRE, where the rationality parameter λ interpolates between random play (λ → 0) and Nash equilibrium (λ → ∞): each player's choice probabilities follow a softmax over expected utilities, σ(a) ∝ exp(λ · EU(a)). The authors estimate λ from per-round action data using maximum-likelihood and Bayesian posterior inference (Gamma(2,1) prior), treating each round as an independent stage game. Human experimental literature supplies the calibration range λ_human ∈ [1.0, 2.5].
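A minimal sketch of the λ estimation step, under two simplifying assumptions: each round's expected utilities are treated as fixed inputs (the full logit QRE would require solving a fixed point in which both players' choice probabilities are mutually consistent), and the data layout (`eu`, `actions`) is hypothetical rather than the paper's format.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

# Hypothetical per-round data: eu[t, a] = expected utility of action a in
# round t; actions[t] = index of the action actually taken. Random
# placeholders stand in for real game logs.
rng = np.random.default_rng(0)
eu = rng.normal(size=(200, 2))
actions = rng.integers(0, 2, size=200)

def logit_loglik(lam, eu, actions):
    """Log-likelihood under logit response:
    P(a | round t) = exp(lam * eu[t, a]) / sum_b exp(lam * eu[t, b])."""
    z = lam * eu
    z -= z.max(axis=1, keepdims=True)                 # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return logp[np.arange(len(actions)), actions].sum()

# Maximum-likelihood estimate of lambda over a bounded interval.
res = minimize_scalar(lambda l: -logit_loglik(l, eu, actions),
                      bounds=(1e-6, 50.0), method="bounded")
lam_mle = res.x

# Unnormalized log-posterior with the Gamma(2, 1) prior from the summary,
# usable for grid or MCMC posterior inference over lambda.
def log_posterior(lam):
    return logit_loglik(lam, eu, actions) + gamma.logpdf(lam, a=2, scale=1)
```

Because each round is treated as an independent stage game, the per-round log-likelihoods simply sum, which keeps both the MLE and the posterior computation one-dimensional in λ.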