GAMformer: Bridging Tabular Foundation Models and Interpretable Machine Learning
While interpretability is crucial for machine learning applications in safety-critical domains and for regulatory compliance, existing tabular foundation models like TabPFN lack transparency. Generalized Additive Models (GAMs) provide the needed interpretability through their additive structure, but traditional GAM methods rely on iterative learning algorithms (such as splines, boosted trees, or neural networks) that are fundamentally incompatible with the in-context learning paradigm of foundation models. In this paper, we introduce GAMformer, the first tabular foundation model for GAMs that bridges the gap between the power of foundation models and the interpretability requirements of critical real-world applications. GAMformer estimates GAM shape functions in a single forward pass using in-context learning, representing a significant departure from conventional iterative approaches. Building on previous research on tabular foundation models, we train GAMformer exclusively on synthetically generated tables to prevent data leakage. Our experiments demonstrate that GAMformer performs comparably to other leading GAMs across various classification benchmarks.
💡 Research Summary
GAMformer introduces a novel approach that merges the predictive power of tabular foundation models with the interpretability of Generalized Additive Models (GAMs). Traditional GAMs—implemented via splines, boosted trees (EBMs), or neural additive models (NAMs)—require iterative fitting, hyper‑parameter tuning, and often struggle to capture abrupt discontinuities. In contrast, modern tabular foundation models such as TabPFN achieve high accuracy through large‑scale pre‑training on synthetic data but operate as opaque black boxes. GAMformer bridges this gap by training a transformer exclusively on synthetic tabular datasets generated from two priors: random structural causal graphs and Gaussian processes. The model learns to infer per‑feature shape functions directly from in‑context examples, eliminating the need for any gradient‑based fitting at inference time.
The pipeline consists of two steps. First, during a single forward pass, the transformer receives a context consisting of training feature vectors and their labels. All features are quantized into 64 bins, one‑hot encoded, and embedded via a small MLP; labels are similarly embedded and added to each feature embedding. The transformer alternates column‑wise attention (capturing interactions among features within a data point) and row‑wise attention (capturing interactions across data points for a given feature). This bi‑attention design ensures permutation equivariance with respect to both feature order and sample order, allowing the model to handle variable‑sized tables without padding. After the attention layers, class‑wise average embeddings are computed, yielding a tensor of shape (features × embedding_dim × classes). A shared decoder MLP maps each feature‑class embedding to a 64‑dimensional vector that represents the discretized shape function for that feature and class.
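The preprocessing step above (quantile binning into 64 bins, then one-hot encoding) can be sketched in a few lines of numpy. This is a minimal illustration of the input representation, not the paper's exact implementation; the embedding MLP and bi-attention transformer are omitted.

```python
import numpy as np

def quantile_bin(X, n_bins=64):
    """Quantize each feature column into n_bins quantile bins (indices 0..n_bins-1).
    A sketch of the discretization step; GAMformer's exact binning may differ."""
    X = np.asarray(X, dtype=float)
    binned = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        # interior bin edges from the empirical quantiles of the training column
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        binned[:, j] = np.digitize(X[:, j], edges)
    return binned

def one_hot(binned, n_bins=64):
    """One-hot encode bin indices: (n, d) -> (n, d, n_bins)."""
    return np.eye(n_bins)[binned]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))       # 200 training rows, 3 features
B = quantile_bin(X)                 # integer bin indices
H = one_hot(B)                      # one-hot tensor fed to the embedding MLP
print(B.min(), B.max(), H.shape)    # 0 63 (200, 3, 64)
```

Quantile (rather than equal-width) edges keep the bins roughly equally populated, which is a common choice for heavy-tailed tabular features.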
Second, for a test instance, each feature value is placed into its corresponding bin, and the pre‑computed shape values are summed across features (and classes, if multi‑class) to produce the final prediction. This operation mirrors the additive structure of a GAM: g(ŷ) = ∑ₖ fₖ(xₖ). Because the shape functions are already estimated during the forward pass, no iterative optimization is required, dramatically reducing inference latency and removing the need for regularization hyper‑parameters.
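The additive prediction step can be sketched as a table lookup followed by a sum. Here the `shapes` tensor stands in for the discretized shape functions that GAMformer would emit in its single forward pass; it is filled with random values purely for illustration.

```python
import numpy as np

def gam_predict_logits(binned_X, shapes):
    """GAM-style additive prediction: g(y_hat) = sum_k f_k(x_k).
    binned_X: (n, d) integer bin indices in [0, n_bins)
    shapes:   (d, n_bins, n_classes) discretized per-feature shape functions
    returns:  (n, n_classes) additive logits."""
    n, d = binned_X.shape
    # look up each feature's shape value at its bin, then sum over features
    return shapes[np.arange(d)[None, :], binned_X, :].sum(axis=1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
shapes = rng.normal(size=(3, 64, 2))         # hypothetical: 3 features, 64 bins, 2 classes
binned_X = rng.integers(0, 64, size=(5, 3))  # 5 binned test rows
probs = softmax(gam_predict_logits(binned_X, shapes))
print(probs.shape)  # (5, 2)
```

Because prediction is only indexing and summation, per-instance inference cost is linear in the number of features, and each feature's contribution can be read directly off its shape function for interpretation.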
The authors evaluate GAMformer on several public classification benchmarks (e.g., Adult, Credit, Higgs) and a real‑world medical dataset (MIMIC‑III). In terms of accuracy, GAMformer matches or slightly exceeds leading GAM implementations such as EBMs, NAMs, and even XGBoost when pairwise interaction terms are incorporated. Importantly, the discretized shape functions enable clear visualisation of non‑linear effects and sudden jumps—illustrated with treatment‑effect discontinuities in the MIMIC‑III case study. A carbon‑footprint analysis shows that, after amortising the one‑time pre‑training cost, GAMformer’s inference is more energy‑efficient than EBMs, especially when the model is reused across many tasks.
Limitations include the fixed 64‑bin discretisation, which may miss very fine‑grained smooth variations, and the current restriction to up to ten classes in multi‑class settings. The pre‑training phase demands substantial synthetic data generation and GPU resources, representing a high upfront cost. Future work could explore adaptive binning, extensions to regression and multi‑label problems, and hybrid priors that blend synthetic and real data to improve downstream transfer.
In summary, GAMformer is the first tabular foundation model that delivers true GAM‑style interpretability without sacrificing the scalability and accuracy of modern transformer‑based approaches. It offers a promising pathway toward transparent, trustworthy AI systems for safety‑critical domains such as healthcare, finance, and hiring.