Closing the gap on tabular data with Fourier and Implicit Categorical Features
Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

While Deep Learning has demonstrated impressive results across various data types, it continues to lag behind tree-based methods when applied to tabular data, often referred to as the last “unconquered castle” for neural networks. We hypothesize that a significant advantage of tree-based methods lies in their intrinsic capability to model and exploit non-linear interactions induced by features with categorical characteristics. In contrast, neural-based methods exhibit biases toward uniform numerical processing of features and smooth solutions, making it challenging for them to effectively leverage such patterns. We address this performance gap by using statistics-based feature-processing techniques to identify features that are strongly correlated with the target once discretized. We further mitigate the bias of deep models toward overly smooth solutions, a bias that does not align with the inherent properties of the data, using Learned Fourier Features. We show that our proposed feature preprocessing significantly boosts the performance of deep learning models and enables them to achieve performance that closely matches or surpasses XGBoost on a comprehensive tabular data benchmark.


💡 Research Summary

The paper tackles the long‑standing performance gap between deep learning (DL) models and tree‑based ensembles such as XGBoost on tabular data. The authors argue that two key factors explain why DL lags behind: (1) many numerical features in tabular datasets are actually categorical or “implicitly categorical” – they have low cardinality and exhibit strong target correlation when discretized, yet standard DL pipelines treat them as continuous values, thereby missing sharp, non‑smooth interactions; (2) neural networks have an inherent bias toward learning smooth functions, which is ill‑suited for the highly irregular decision boundaries often present in tabular problems.

To address these issues, the authors propose a two‑stage preprocessing pipeline. First, Categorical Feature Detection (CFD) uses simple statistical tests on the training data to flag implicitly categorical features. For classification tasks, a chi‑square test is applied after binning each numeric column; features with p‑values below a configurable threshold (or with very low cardinality) are marked as categorical. For regression, one‑way ANOVA and a Mutual‑Information‑ratio test are employed, selecting features whose categorical encoding yields a significantly higher mutual information with the target than the raw numeric version. Identified features are then binned and one‑hot encoded, while the original numeric value is retained in the first channel, producing a multi‑channel representation suitable for downstream models.
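The classification branch of the detection step can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the function name, the quantile binning, and the thresholds (`max_bins`, `alpha`, `low_card`) are assumptions chosen for illustration, and class labels are assumed to be encoded as `0..k-1`.

```python
import numpy as np
from scipy.stats import chi2_contingency

def is_implicitly_categorical(x, y, max_bins=10, alpha=0.01, low_card=20):
    """Sketch of CFD for classification: flag a numeric column as
    implicitly categorical. Thresholds are illustrative, not the paper's."""
    # Very low cardinality is treated as categorical outright.
    if len(np.unique(x)) <= low_card:
        return True
    # Bin the numeric column into quantile bins.
    edges = np.quantile(x, np.linspace(0.0, 1.0, max_bins + 1))
    binned = np.digitize(x, edges[1:-1])  # bin index in 0..max_bins-1
    # Build the bin-by-class contingency table (labels assumed 0..k-1).
    n_classes = len(np.unique(y))
    table = np.zeros((max_bins, n_classes), dtype=int)
    for b, c in zip(binned, y):
        table[b, c] += 1
    table = table[table.sum(axis=1) > 0]  # drop empty bins
    # Chi-square test of independence between bin and class label.
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha
```

A column flagged this way would then be binned and one-hot encoded, with the raw numeric value kept alongside, as the paragraph above describes.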

Second, the authors adapt Learned Fourier Features (LFF) to overcome the smoothness bias. Two variants are introduced: Conv1x1LFF, which shares parameters across features via a 1‑D convolution with kernel size 1, and LinearLFF, which uses a learned linear projection. Both map the input matrix Z to a higher‑dimensional space by concatenating cosine and sine of π Z, enabling the network to learn high‑frequency components and thus model discontinuous relationships.
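The LinearLFF variant can be sketched as below. In the real model the projection matrix `W` would be a trainable parameter learned end to end; here it is passed in explicitly, and the function name is illustrative rather than taken from the paper.

```python
import numpy as np

def linear_lff(Z, W):
    """LinearLFF sketch: project the input with a (learned) matrix W,
    then concatenate cos and sin of pi times the projection, doubling
    the embedding width and exposing high-frequency components."""
    proj = np.pi * (Z @ W)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)
```

Conv1x1LFF would follow the same cos/sin pattern but share the projection parameters across features via a kernel-size-1 convolution instead of a dense matrix.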

The pipeline is evaluated on two backbone architectures: a rotation‑invariant multilayer perceptron (MLP) and a 1‑D residual network (ResNet) that processes the multi‑channel input with depth‑wise convolutions. The ResNet uses a kernel size proportional to a fraction ϕ of the total feature count and stacks residual blocks with batch normalization, dropout, and ReLU. Both backbones are trained under the same experimental protocol as Grinsztajn et al. (2022), covering 68 binary‑classification, multi‑class, and regression tasks drawn from a public benchmark. Hyper‑parameter search is performed via extensive random search, with CFD and LFF toggles treated as hyper‑parameters.
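The kernel-size rule for the ResNet's depth-wise convolutions can be sketched as a one-liner; the rounding behavior and the floor of 1 are assumptions, since the paragraph only states that the kernel spans a fraction ϕ of the feature count.

```python
def depthwise_kernel_size(n_features, phi):
    """Kernel spans a fraction phi of the feature axis (assumed rounding;
    clamped to at least 1 so the convolution is always valid)."""
    return max(1, round(phi * n_features))
```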

Results show that CFD alone yields substantial gains, especially on datasets where low‑cardinality features have strong target dependence; performance spikes of up to 7 percentage points in accuracy are reported. LFF alone improves smoothness‑biased models, reducing RMSE by ~4 % on regression tasks and increasing AUC by 0.02–0.03 on classification. The combination (ResNet+F|C) outperforms XGBoost on 55 of the 68 datasets and matches it within 0.5 % on the remaining ones. MLP+F|C narrows the gap considerably, though it still trails on a few noisy datasets due to its rotation‑invariant nature.

The study demonstrates that simple statistical detection of implicitly categorical features together with Fourier‑based embeddings can bring deep models to parity with, and sometimes surpass, state‑of‑the‑art tree ensembles on tabular data, without resorting to massive pre‑training or complex architectures. It also highlights that overlooking the categorical nature of numeric columns is a major source of DL underperformance, suggesting future work on automated feature‑type inference and more expressive periodic embeddings.

