Counting Defiers: A Design-Based Model of an Experiment Can Reveal Evidence Beyond the Average Effect
Using only a binary intervention and outcome and the design of the randomization within an experiment, we construct a design-based likelihood of the joint distribution of potential outcomes in the sample – the numbers of always takers, compliers, defiers, and never takers. We develop a visualization to show that samples with defiers can sometimes generate the data in more ways than samples without, yielding a higher likelihood. This likelihood can vary within the Fréchet bounds, even though the traditional likelihood does not. Evidence is weak, but it exists, as we illustrate with health applications and our dbmle package.
💡 Research Summary
The paper introduces a novel “design‑based” framework for causal inference that goes beyond the usual focus on average treatment effects (ATE) and Fréchet bounds. Using only a binary treatment, a binary outcome, and the randomization design of an experiment, the authors construct a likelihood function for the joint distribution of potential outcomes in the observed sample. This joint distribution is fully characterized by the counts of four latent types—always‑takers (Y¹=1, Y⁰=1), compliers (Y¹=1, Y⁰=0), defiers (Y¹=0, Y⁰=1), and never‑takers (Y¹=0, Y⁰=0).
The key insight is that, given a fixed randomization scheme (e.g., a completely randomized design with a predetermined number of treated units), each possible joint distribution determines which random assignments could have generated the observed data (the numbers of successes in the treatment and control arms). By counting how many assignments are compatible with a given joint distribution, the authors obtain a “design‑based likelihood” proportional to this count divided by the total number of admissible assignments. In other words, the likelihood of a joint distribution is the probability that the randomization itself would have produced the observed data if that distribution described the sample.
The paper illustrates the idea with a stylized six‑patient smoking‑cessation experiment. The observed marginal frequencies are 2/3 successes in treatment and 1/3 in control. Three representative joint distributions are examined: (1) a monotonicity‑consistent distribution with no defiers, (2) a distribution with one defier (the “mid‑point” of the Fréchet interval), and (3) a distribution with two defiers and four compliers. By enumerating the 20 possible treatment assignments (3 of 6 patients treated), the authors find that the first distribution is compatible with 8 assignments (likelihood 8/20 = 40%), the second with 6 assignments (6/20 = 30%), and the third with 12 assignments (12/20 = 60%). Hence, the third distribution maximizes the design‑based likelihood, suggesting that the data are most consistent with a sample containing two defiers, contrary to the usual monotonicity assumption.
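The enumeration behind these counts is easy to reproduce. The sketch below (Python, illustrative only — the authors' package is written in R) brute-forces all 20 assignments of 3 treated patients out of 6 for the third distribution, the one with four compliers and two defiers, and confirms that 12 of them reproduce the observed margins:

```python
from itertools import combinations
from math import comb

def count_compatible(sample, n_treated, succ_t, succ_c):
    """Count the treatment assignments under which the fixed potential
    outcomes in `sample` reproduce the observed success counts."""
    n = len(sample)
    hits = 0
    for treated in combinations(range(n), n_treated):
        t = set(treated)
        # Treated units reveal Y1; control units reveal Y0.
        obs_t = sum(sample[i][0] for i in t)
        obs_c = sum(sample[i][1] for i in range(n) if i not in t)
        if (obs_t, obs_c) == (succ_t, succ_c):
            hits += 1
    return hits

# Each unit is a (Y1, Y0) pair: four compliers and two defiers.
sample = [(1, 0)] * 4 + [(0, 1)] * 2
hits = count_compatible(sample, n_treated=3, succ_t=2, succ_c=1)
print(hits, "/", comb(6, 3))  # prints "12 / 20", i.e. likelihood 0.6
```

Swapping in other (Y¹, Y⁰) compositions reproduces the likelihoods of the competing distributions the same way.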
From this example the authors draw several general lessons:
- Even when the ATE and Fréchet bounds are identical across competing joint distributions, the design‑based likelihood can discriminate among them, revealing evidence about heterogeneity (especially the presence of defiers).
- The likelihood is higher when the sample can be described by fewer latent types and when each type is split evenly across the treatment and control arms. This “type‑balance” principle explains why the third distribution (only compliers and defiers) attains the highest likelihood.
- The monotonicity assumption (no defiers) is not a logical necessity; it can be empirically challenged using the proposed method.
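The type‑balance principle has a simple combinatorial reading: the number of compatible assignments is a product of binomial coefficients, one per latent type, counting the ways that type can be split between arms, and binomial coefficients peak at balanced splits. For the two‑defier distribution in the six‑patient example only one split matches the margins (2 of 4 compliers treated, 1 of 2 defiers treated), so its count is a single product:

```python
from math import comb

# 2 of 4 compliers treated, 1 of 2 defiers treated -- both splits
# are as balanced as possible, so each binomial factor is maximal.
ways = comb(4, 2) * comb(2, 1)
print(ways)  # prints 12: the 12 of 20 assignments from the example
```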
Methodologically, the authors develop a maximum design‑based likelihood estimator (DBMLE) that searches over all feasible (non‑negative integer) counts of the four types subject to the observed marginal constraints. They implement the estimator in an R package called dbmle, which takes as input the sample size, treatment allocation, and observed treatment‑ and control‑group success counts, and returns the likelihood surface, the MLE joint distribution, and visualizations analogous to Figure 1 in the paper.
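The search itself can be sketched in a few lines. The following is an illustrative brute-force implementation in Python, not the interface of the authors' dbmle R package: it enumerates every non-negative integer split of the sample into the four types and keeps the split with the highest design-based likelihood.

```python
from itertools import combinations

def count_compatible(sample, n_treated, succ_t, succ_c):
    """Count the treatment assignments under which the fixed potential
    outcomes in `sample` reproduce the observed success counts."""
    n = len(sample)
    hits = 0
    for treated in combinations(range(n), n_treated):
        t = set(treated)
        obs_t = sum(sample[i][0] for i in t)
        obs_c = sum(sample[i][1] for i in range(n) if i not in t)
        if (obs_t, obs_c) == (succ_t, succ_c):
            hits += 1
    return hits

def dbmle_search(n, n_treated, succ_t, succ_c):
    """Return the (always, complier, defier, never) counts that maximize
    the design-based likelihood, by exhaustive enumeration."""
    best, best_hits = None, -1
    for a in range(n + 1):
        for c in range(n + 1 - a):
            for d in range(n + 1 - a - c):
                nv = n - a - c - d
                sample = ([(1, 1)] * a + [(1, 0)] * c +
                          [(0, 1)] * d + [(0, 0)] * nv)
                hits = count_compatible(sample, n_treated, succ_t, succ_c)
                if hits > best_hits:
                    best, best_hits = (a, c, d, nv), hits
    return best, best_hits

best, hits = dbmle_search(6, 3, 2, 1)
print(best, hits)  # (0, 4, 2, 0) 12: two defiers maximize the likelihood
```

On the six-patient example this recovers the distribution with four compliers and two defiers at 12 of 20 assignments, matching the paper's Figure 1 discussion above.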
The framework is applied to two real‑world health experiments (details are in the paper’s Section 3). In both cases, the DBMLE identifies joint distributions with non‑zero defier counts that have higher likelihood than the monotonicity‑consistent alternatives, even though traditional analyses would deem the data compatible with a wide Fréchet interval (e.g., 0–2 defiers). The authors argue that this extra information can be crucial for policy decisions: knowing that a subset of individuals would react oppositely to the intervention (defiers) may alter the ethical calculus of implementing the program.
Limitations are acknowledged. The approach relies on the “design‑based” assumption that potential outcomes are fixed (non‑stochastic) and that randomization is the only source of uncertainty. With small samples, the likelihood differences can be driven by random variation, and confidence intervals for the type counts are not provided in the current version of the software. Moreover, the method is restricted to binary treatments and outcomes; extensions to multi‑level or continuous settings would require substantial methodological development.
In conclusion, the paper offers a compelling addition to the causal inference toolbox. By integrating the randomization design with a likelihood over latent type counts, researchers can extract evidence about treatment effect heterogeneity—particularly the existence of defiers—that is invisible to standard ATE or Fréchet‑bound analyses. This design‑based likelihood approach provides a principled way to move from “on average the program works” to “who exactly benefits or is harmed,” thereby informing more nuanced and ethically aware policy decisions.