Bayesian Additive Regression Trees for functional ANOVA model
Bayesian Additive Regression Trees (BART) is a powerful statistical model that combines the strengths of Bayesian inference and regression trees. It has received significant attention for its ability to capture complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which decomposes the variability of a function into interaction components, each representing the contribution of a different set of covariates or factors. ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves comparable prediction performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and we further derive the same convergence rate for each interaction component, a guarantee that is not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART is comparable to BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.
💡 Research Summary
This paper addresses a key limitation of Bayesian Additive Regression Trees (BART): while BART delivers state‑of‑the‑art predictive performance, its ensemble of decision trees is notoriously difficult to interpret because the contribution of each covariate is entangled across many trees. To remedy this, the authors introduce ANOVA‑BART, a novel Bayesian model that embeds the functional ANOVA decomposition directly into the BART framework.
The functional ANOVA representation states that any multivariate regression function f(x) can be uniquely expressed as a sum of component functions f_S(x_S) over subsets S⊆{1,…,p}, provided each component satisfies a μ‑identifiability condition (zero mean with respect to the marginal distribution of each variable in S). Components are grouped by their order |S|: main effects (|S|=1), pairwise interactions (|S|=2), and higher‑order interactions. Existing BART approximates f(x) by a sum of T independent trees, but it does not respect this decomposition, making the fitted model a black box.
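In symbols, the decomposition and identifiability constraint described above can be written as:

```latex
f(x) \;=\; \sum_{S \subseteq \{1,\dots,p\}} f_S(x_S),
\qquad
\int f_S(x_S)\, \mathrm{d}\mu_j(x_j) \;=\; 0
\quad \text{for every } j \in S,
```

where μ_j denotes the marginal distribution of X_j. The zero-mean constraints are the μ-identifiability conditions that make the components f_S unique.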
ANOVA‑BART enforces the decomposition by approximating each component f_S with an “identifiable binary‑product tree”. Such a tree has a very specific structure: all leaves share the same depth, and all nodes at a given depth use the same split rule across variables in S. The tree output is β·∏_{j∈S} I(x_j ≤ s_j) + Σ_{j∈S} a_j·I(x_j > s_j), where the coefficients a_j are deterministic functions of the empirical marginal distribution of X_j and the split point s_j. By construction this form satisfies the μ‑identifiability condition, guaranteeing that each tree captures only the intended interaction term.
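A minimal sketch of evaluating such a tree. The closed form for a_j used below is derived from the zero-mean constraint in the main-effect case |S|=1 (β·p + a·(1−p) = 0 with p the empirical probability of the left branch); it is an illustrative assumption, not the paper's general construction for higher orders.

```python
import numpy as np

def binary_product_tree(x, S, s, beta, a):
    """Evaluate T(x) = beta * prod_{j in S} 1{x_j <= s_j}
                     + sum_{j in S} a_j * 1{x_j > s_j}."""
    x = np.atleast_2d(x)
    left = np.all(x[:, S] <= s, axis=1)   # product of left-branch indicators
    right = (x[:, S] > s) @ a             # sum of a_j * 1{x_j > s_j}
    return beta * left + right

# Main-effect case |S| = 1: the mu-identifiability (zero empirical mean)
# constraint beta * p + a * (1 - p) = 0, with p = empirical P(X_j <= s),
# gives a = -beta * p / (1 - p)  (hypothetical closed form for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
j, s, beta = 0, 0.4, 2.0
p = np.mean(X[:, j] <= s)
a = np.array([-beta * p / (1 - p)])
vals = binary_product_tree(X, [j], np.array([s]), beta, a)
print(abs(vals.mean()))   # numerically zero: the component is identifiable
```

The identifiability coefficients thus guarantee that each tree contributes nothing to lower-order components, so the fitted f_S terms do not leak into each other.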
The full model is
f(x)=∑_{t=1}^T T(x;S_t,s_t,β_t),
with priors placed on (i) the number of trees T (exponential penalty e^{−C* T log n}), (ii) the subset size |S_t| (a decreasing mixture controlled by α_split and γ_split), (iii) the split points s_t (uniform over mid‑points of ordered covariate values), (iv) the leaf height β_t (diffuse Gaussian), and (v) the noise variance σ² (inverse‑Gamma). This prior design mirrors BART’s sparsity‑inducing behavior while explicitly encouraging low‑order interactions unless data evidence supports higher orders.
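The prior layers above can be sketched as a sampling routine. This is a sketch under stated assumptions: the hyperparameter values (C, α_split, γ_split, the inverse-Gamma shape/scale), the geometric form for |S_t|, and the uniform-on-[0,1] stand-in for the midpoint split distribution are all illustrative choices, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_prior(n, p, C=0.5, alpha_split=0.95, gamma_split=2.0,
                 sigma_beta=1.0, max_T=50):
    """Illustrative draw from the ANOVA-BART prior layers (a sketch)."""
    # (i) number of trees: P(T = t) propto exp(-C * t * log n) = n^(-C t)
    t_vals = np.arange(1, max_T + 1)
    w = np.exp(-C * t_vals * np.log(n))
    T = rng.choice(t_vals, p=w / w.sum())
    trees = []
    for _ in range(T):
        # (ii) interaction order |S_t|: decreasing probabilities controlled
        # by alpha_split and gamma_split (BART-style form, assumed here)
        k_vals = np.arange(1, p + 1)
        pk = alpha_split * (1.0 + k_vals) ** (-gamma_split)
        k = rng.choice(k_vals, p=pk / pk.sum())
        S = np.sort(rng.choice(p, size=k, replace=False))
        # (iii) split points: uniform over midpoints of ordered covariate
        # values (approximated by uniform draws on [0, 1])
        s = rng.uniform(size=k)
        # (iv) leaf height: diffuse Gaussian
        beta = rng.normal(0.0, sigma_beta)
        trees.append((S, s, beta))
    # (v) noise variance: inverse-Gamma
    sigma2 = 1.0 / rng.gamma(shape=3.0, scale=1.0)
    return trees, sigma2

trees, sigma2 = sample_prior(n=200, p=5)
print(len(trees), sigma2 > 0)
```

Note how the n^(-C·T) penalty makes large ensembles exponentially unlikely a priori, while the decreasing weights over |S_t| favor main effects unless the likelihood rewards higher-order interactions.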
Posterior inference is performed via a Gibbs‑Metropolis algorithm that cycles through three steps: (1) propose adding or deleting a tree (T→T±1) using a simple birth‑death MH step; (2) update each tree’s (S_t,s_t,β_t) conditional on the rest of the ensemble; (3) update σ² from its conjugate inverse‑Gamma full conditional. The proposal for S_t is a mixture of (a) a “Random” draw from the prior and (b) a “Stepwise” move that augments an existing tree’s variable set by one new covariate, thereby biasing the chain toward exploring higher‑order interactions efficiently. Because the leaf‑height dimension is fixed, no reversible‑jump machinery is required, simplifying implementation relative to generalized BART extensions.
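The three-step sweep can be illustrated on a toy problem with main-effect stumps. This is a structural sketch only: the acceptance ratios, proposal mixture, and tree parameterization are simplified stand-ins (the paper's "Stepwise" move and exact prior terms are omitted), so only the birth-death / per-tree-update / conjugate-σ² skeleton matches the description above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: one true main effect, f(x) = 2 * 1{x_0 <= 0.5}
n, p = 200, 3
X = rng.uniform(size=(n, p))
y = 2.0 * (X[:, 0] <= 0.5) + rng.normal(0.0, 0.3, size=n)

def tree_pred(tree, X):
    j, s, beta = tree
    return beta * (X[:, j] <= s)          # simplest |S| = 1 stump

def ens_pred(ens, X):
    return sum((tree_pred(t, X) for t in ens), np.zeros(len(X)))

def loglik(ens, sigma2):
    r = y - ens_pred(ens, X)
    return -0.5 * np.sum(r ** 2) / sigma2

ens, sigma2, a0, b0 = [], 1.0, 1.0, 1.0
for it in range(500):
    # (1) birth-death MH move on the number of trees T -> T +/- 1
    prop = list(ens)
    if ens and rng.uniform() < 0.5:
        prop.pop(rng.integers(len(ens)))                             # death
    else:
        prop.append((rng.integers(p), rng.uniform(), rng.normal()))  # birth
    penalty = -0.5 * np.log(n) * (len(prop) - len(ens))  # e^{-C T log n}, C=0.5
    if np.log(rng.uniform()) < loglik(prop, sigma2) - loglik(ens, sigma2) + penalty:
        ens = prop
    # (2) Metropolis update of each tree's (j, s, beta) given the rest
    for t in range(len(ens)):
        old = ens[t]
        ens[t] = (rng.integers(p), rng.uniform(), old[2] + rng.normal(0, 0.2))
        if np.log(rng.uniform()) >= loglik(ens, sigma2) - loglik(ens[:t] + [old] + ens[t + 1:], sigma2):
            ens[t] = old                                             # reject
    # (3) conjugate inverse-Gamma full conditional for sigma^2
    r = y - ens_pred(ens, X)
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * np.sum(r ** 2)))

print(len(ens), round(sigma2, 3))
```

Because every tree carries a fixed-dimension parameter (S_t, s_t, β_t), the birth-death move changes dimension only through the number of trees, which is why no reversible-jump correction beyond the simple acceptance ratio is needed.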
Theoretical contributions are substantial. The authors prove that the posterior contracts around the true regression function at a rate that is nearly minimax optimal over Hölder classes, and crucially, that each component f_S contracts at the same rate. This component‑wise guarantee is unavailable for standard BART, which only provides a global rate. The proof leverages the identifiability of the binary‑product trees and the prior’s ability to allocate sufficient mass to neighborhoods of the true components.
Empirical evaluation comprises extensive simulations and several benchmark datasets (including UCI regression tasks and a medical survival example). Across all settings, ANOVA‑BART matches BART in root‑mean‑square error and predictive interval coverage, confirming that the added interpretability does not come at the cost of accuracy. Moreover, the model excels at component selection: it reliably recovers the true set of active main effects and interactions, offering clear, interpretable summaries of which covariates drive the response.
In summary, ANOVA‑BART delivers a principled, Bayesian, and interpretable alternative to BART. By embedding the functional ANOVA decomposition into the tree ensemble, it preserves BART’s predictive strength, attains (near‑)minimax posterior rates for each interaction term, and provides a transparent decomposition of the fitted function. The method is computationally tractable, requires only modest modifications to existing BART codebases, and opens the door to interpretable non‑linear modeling in domains where understanding variable interactions is as critical as accurate prediction.