Purely Bayesian counterfactuals versus Newcomb's paradox


This paper proposes a careful separation between an entity’s epistemic system and its decision system. Crucially, Bayesian counterfactuals are estimated by the epistemic system, not by the decision system. Based on this remark, I prove the existence of Newcomb-like problems for which an epistemic system necessarily expects the entity to make a counterfactually bad decision. I then address (a slight generalization of) Newcomb’s paradox. I solve the specific case where the player believes that the predictor applies Bayes’ rule with a superset of all the data available to the player. I prove that the counterfactual optimality of the 1-Box strategy depends on the player’s prior on the predictor’s additional data. If these additional data are not expected to sufficiently reduce the predictor’s uncertainty about the player’s decision, then the player’s epistemic system will counterfactually prefer to 2-Box. But if the predictor’s data are believed to make it quasi-omniscient, then 1-Box will be counterfactually preferred. Implications of the analysis are then discussed. More generally, I argue that, to better understand or design an entity, it is useful to clearly separate the entity’s epistemic and decision systems, but also its data collection, reward and maintenance systems, whether the entity is human, algorithmic or institutional.


💡 Research Summary

The paper introduces a novel analytical framework that explicitly separates an entity’s epistemic system (E) from its decision system (D). The epistemic system is assumed to be a pure Bayesian reasoner that updates beliefs and computes expectations solely from observed data and prior probabilities, while the decision system takes the epistemic output as input and selects an action. This separation challenges the common assumption in decision theory that belief formation and action selection are inseparable, especially in self‑referential scenarios such as Newcomb’s paradox.
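
As a purely illustrative sketch of this separation (not code from the paper; the function names and the two-hypothesis toy example are hypothetical), the snippet below implements an epistemic system that performs a Bayesian update and a decision system that only ever sees the resulting posterior, never the raw data.

```python
# Minimal sketch, assuming a toy discrete setting: a pure Bayesian
# epistemic system E that turns data into a posterior, and a separate
# decision system D that only consumes E's output. Names are hypothetical.

def epistemic_system(prior, likelihood, data):
    """E: Bayesian update. Returns a posterior over hypotheses."""
    unnormalized = {h: prior[h] * likelihood(data, h) for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

def decision_system(posterior, utility, actions):
    """D: picks the action maximizing posterior-expected utility.
    D never touches the raw data; it only sees E's posterior."""
    def expected_utility(a):
        return sum(posterior[h] * utility(a, h) for h in posterior)
    return max(actions, key=expected_utility)

# Toy usage: two hypotheses, two actions.
prior = {"h1": 0.5, "h2": 0.5}
likelihood = lambda data, h: 0.9 if (data == "obs") == (h == "h1") else 0.1
posterior = epistemic_system(prior, likelihood, "obs")   # {"h1": 0.9, "h2": 0.1}
utility = lambda a, h: 1.0 if a == h else 0.0
print(decision_system(posterior, utility, ["h1", "h2"]))  # -> "h1"
```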

The first major result, Theorem 1, establishes an impossibility of guaranteed counterfactual optimization under imperfect self-knowledge. In a setting with n possible actions, any entity that cannot know its own future decision with certainty will, with probability at least 1 − 1/n, be expected to make a counterfactually bad choice. The proof constructs a predictor Ω that shares the same data and priors as the epistemic system, computes the posterior probability of each action, and then places the highest reward in the box corresponding to the action with the smallest posterior probability (≤ 1/n). Consequently, from the epistemic perspective, the chosen action is expected to be counterfactually suboptimal with probability at least 1 − 1/n. This theorem shows that no decision algorithm can universally satisfy the property of counterfactual optimality.
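
The toy simulation below mirrors this construction under my own simplifying assumptions (a fixed posterior over three actions stands in for the epistemic system’s self-prediction); it is only meant to make the 1 − 1/n bound tangible, not to reproduce the paper’s proof.

```python
# Illustrative sketch of the adversarial construction summarized above
# (assumptions mine, not the paper's formalism): the predictor shares the
# epistemic system's posterior over the entity's own n actions and places
# the large reward behind the action the entity is least likely to take.

def adversarial_reward_placement(action_posterior):
    """Predictor puts the high reward on the least probable action."""
    return min(action_posterior, key=action_posterior.get)

def prob_of_counterfactually_good_choice(action_posterior):
    """From the epistemic view, the entity only 'wins' if it happens to
    pick the rewarded action, i.e. its least likely one."""
    rewarded = adversarial_reward_placement(action_posterior)
    return action_posterior[rewarded]

# With n actions, the least likely action has probability <= 1/n, so a
# counterfactually bad choice is expected with probability >= 1 - 1/n.
posterior = {"a1": 0.5, "a2": 0.3, "a3": 0.2}              # n = 3 actions
p_good = prob_of_counterfactually_good_choice(posterior)   # 0.2 <= 1/3
print("P(counterfactually bad) =", 1 - p_good)             # 0.8 >= 1 - 1/3
```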

The second major contribution, Theorem 2, applies the framework to a generalized version of Newcomb’s paradox. The classic paradox assumes a predictor that perfectly knows the player’s choice. Here the predictor Ω is instead allowed to possess a strict superset of the player’s data, to which it applies Bayes’ rule to form a posterior over the player’s choice. If the additional data are sufficiently informative, making Ω effectively quasi-omniscient, then the epistemic system assigns a higher counterfactual expected reward to the one-box strategy, rendering it counterfactually optimal. Conversely, if the extra data are not expected to substantially reduce the predictor’s uncertainty about the player’s decision, the two-box strategy yields a higher counterfactual expectation. Thus the optimality of 1-Box versus 2-Box hinges on the player’s prior belief about how much more the predictor knows.
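
A minimal numerical sketch of this trade-off follows, using the standard Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent box) and a simple model in which the player’s epistemic system summarizes its beliefs by the probability that the predictor fills the opaque box given each decision. Both the payoffs and this parameterization are illustrative assumptions, not the paper’s exact formalization.

```python
# Sketch of the counterfactual comparison under assumed Newcomb payoffs.
# p_fill_if_1box / p_fill_if_2box are the player's (epistemic) estimates of
# P(predictor fills the opaque box | player 1-Boxes / 2-Boxes).

OPAQUE, TRANSPARENT = 1_000_000, 1_000

def counterfactual_values(p_fill_if_1box, p_fill_if_2box):
    ev_1box = p_fill_if_1box * OPAQUE
    ev_2box = p_fill_if_2box * OPAQUE + TRANSPARENT
    return ev_1box, ev_2box

# Quasi-omniscient predictor: its extra data all but pin down the decision.
ev1, ev2 = counterfactual_values(p_fill_if_1box=0.99, p_fill_if_2box=0.01)
print(ev1 > ev2)  # True: 1-Box has the higher counterfactual expectation

# Barely informed predictor: the gap p1 - p2 falls below TRANSPARENT/OPAQUE
# (= 0.001 here), the threshold at which the preference flips to 2-Box.
ev1, ev2 = counterfactual_values(p_fill_if_1box=0.5005, p_fill_if_2box=0.5)
print(ev1 > ev2)  # False: 2-Box has the higher counterfactual expectation
```

With these payoffs, 1-Box is counterfactually preferred exactly when the estimated gap between the two filling probabilities exceeds 1,000 / 1,000,000 = 0.001.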

The paper also discusses why the idealized AIXI agent, which is not embedded in its environment, evades Theorem 1. AIXI’s environment is defined to be independent of the agent’s policy, so the expected reward conditional on a decision does not vary with the agent’s belief about its own decision probability. In contrast, realistic embedded agents exist within environments that can observe and exploit the agent’s internal beliefs, which makes the impossibility result applicable to them.

Beyond the formal results, the author argues that decision-theoretic desiderata such as counterfactual optimality should be viewed as properties that a decision algorithm may or may not satisfy in a given context, rather than as universal algorithmic guarantees, much like the no-free-lunch theorems in learning theory. The author also notes that Bayesianism offers many rationality criteria (Dutch-book avoidance, logical consistency, statistical admissibility), yet none of them uniquely characterizes a “good” decision algorithm.

Section 3 re‑examines Newcomb’s paradox under the Bayesian counterfactual lens, detailing how the predictor’s extra data and the player’s trust in the predictor jointly determine the counterfactual expected utilities of the two strategies. Section 4 explores broader implications: in real‑world contexts such as voting, policy design, or market behavior, agents often face predictors (e.g., polls, AI models) that have more information. The analysis suggests that agents’ strategic choices depend critically on their epistemic assessment of the predictor’s informational advantage.

Finally, the paper proposes a comprehensive systems-level perspective that separates epistemic, decision, data-collection, reward, and maintenance subsystems for any information-processing entity—human, algorithmic, or institutional. This multi-system view is presented as a useful tool for designing and analyzing agents, with relevance to debates on free will, embedded agency, and AI safety. The author acknowledges limitations: the model assumes strict Bayesian rationality and does not capture human cognitive biases or complex social dynamics. Suggested future work includes richer models of the predictor’s data, empirical validation, and extensions to non-Bayesian epistemic frameworks.

