Shortcut Features as Top Eigenfunctions of NTK: A Linear Neural Network Case and More
One of the chronic problems of deep-learning models is shortcut learning. When the majority of the training data is dominated by a certain feature, neural networks prefer to learn that feature even if it does not generalize outside the training set. Based on the framework of the Neural Tangent Kernel (NTK), we analyzed the case of linear neural networks to derive some important properties of shortcut learning. We defined a feature of a neural network as an eigenfunction of the NTK. We then found that shortcut features correspond to features with larger eigenvalues when the shortcuts stem from an imbalanced number of samples in a clustered distribution. We also showed that features with larger eigenvalues retain a large influence on the neural-network output even after training, due to the data variances within the clusters. This preference for certain features persists even when the margin of the neural-network output is controlled, which shows that the max-margin bias is not the only major cause of shortcut learning. These properties of linear neural networks are empirically extended to more complex neural networks such as a two-layer fully-connected ReLU network and a ResNet-18.
💡 Research Summary
This paper investigates shortcut learning—where neural networks rely on spurious, non‑generalizable features that dominate the training data—through the lens of Neural Tangent Kernel (NTK) theory. The authors first formalize a “feature” as an eigenfunction of the NTK and then study a simple linear neural network trained on a Gaussian mixture model (GMM) in which one set of clusters (the “biased” clusters) contains far more samples than the complementary clusters.
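A toy version of this setup can be sketched in NumPy. The cluster means, sample counts, and noise scale below are illustrative choices, not the paper's actual parameters: the "biased" clusters lie along one axis with many samples, the minority clusters along the other with few.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D GMM: a biased cluster pair (shortcut direction e1)
# with many samples, and a minority pair (core direction e2) with few.
mu_bias = np.array([3.0, 0.0])   # shortcut-aligned mean (assumed)
mu_core = np.array([0.0, 3.0])   # core-feature mean (assumed)
n_bias, n_core = 900, 100        # imbalanced cluster sizes

def sample_gmm(n, mu, sigma=0.5):
    """Draw n points from two symmetric Gaussian clusters around +/-mu,
    labeled +1 or -1 by the cluster each point was drawn from."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + sigma * rng.standard_normal((n, 2))
    return x, y

X_b, y_b = sample_gmm(n_bias, mu_bias)
X_c, y_c = sample_gmm(n_core, mu_core)
X = np.vstack([X_b, X_c])
y = np.concatenate([y_b, y_c])
```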
In the infinite‑width limit, the NTK becomes a fixed kernel K₀, and its eigendecomposition K₀ = Σ_i λ_i v_i v_i^⊤ determines the dynamics of gradient descent: the component of the error along eigenvector v_i decays as e^{‑ηλ_i t}. Consequently, directions with larger eigenvalues converge faster—a phenomenon known as spectral bias. Proposition 3.1 shows that for the inner‑product kernel k(x, y)=⟨x, y⟩, the eigenfunctions are linear projections onto the eigenvectors of the weighted covariance Σ_k π_k μ_k μ_k^⊤. Because the biased clusters have larger mixture weights π_B, the eigenvectors aligned with their means acquire larger eigenvalues. Thus, shortcut features correspond to NTK eigenfunctions with the largest λ_i and are learned more quickly.
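The mechanism behind Proposition 3.1 can be illustrated numerically. With hypothetical mixture weights and means, the weighted second-moment matrix Σ_k π_k μ_k μ_k^⊤ assigns the larger eigenvalue to the shortcut direction, and the e^{−ηλ_i t} factors show that direction being fit first:

```python
import numpy as np

# Weighted second-moment matrix  M = sum_k pi_k * mu_k mu_k^T  for a toy
# mixture: the biased clusters (weight 0.9) lie along e1, the minority
# clusters (weight 0.1) along e2.  All numbers are illustrative.
pi = np.array([0.9, 0.1])
mus = np.array([[3.0, 0.0],    # biased-cluster mean (shortcut direction)
                [0.0, 3.0]])   # minority-cluster mean (core direction)
M = sum(p * np.outer(m, m) for p, m in zip(pi, mus))

eigvals = np.linalg.eigvalsh(M)      # ascending order: [0.9, 8.1]
lam_core, lam_shortcut = eigvals

# Spectral bias: the residual along eigenvector v_i shrinks as
# exp(-eta * lam_i * t), so the larger-eigenvalue (shortcut)
# direction converges much faster.
eta, t = 0.1, 5.0
decay = np.exp(-eta * eigvals * t)
```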
Proposition 3.2 examines the network after convergence. For a linear model trained with mean‑squared error, the output can be expressed as f(x)=Σ_k w_k (x^⊤ v_k), where each weight w_k is proportional to the mixture weight π_k and the squared norm of the corresponding cluster mean ‖μ_k‖². Hence, eigenfunctions associated with larger clusters not only converge faster but also exert a stronger influence on the final decision boundary. The authors argue that this dual effect explains why models continue to rely on shortcuts even when the training loss is near zero on unbiased samples.
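A minimal numerical illustration of this weighting, with assumed mixture weights and equal mean norms so that the effect of cluster size is isolated:

```python
import numpy as np

# Sketch of Proposition 3.2's weighting (illustrative numbers): after
# convergence, f(x) = sum_k w_k <x, v_k> with w_k proportional to
# pi_k * ||mu_k||^2.  Equal ||mu_k||^2 isolates the mixture weights.
pi = np.array([0.9, 0.1])            # biased vs. minority cluster weight
mu_norm_sq = np.array([9.0, 9.0])    # ||mu_k||^2, equal by construction
w = pi * mu_norm_sq                  # unnormalized per-feature influence
influence = w / w.sum()              # shortcut feature keeps ~90% influence
```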
To test whether controlling the margin eliminates this bias, the authors incorporate debiasing methods (SD and Marg‑Ctrl) that explicitly penalize large margins. Theoretical analysis shows that, because the NTK spectrum remains unchanged, the decision boundary can still be dominated by high‑eigenvalue shortcut features. Therefore, max‑margin bias is not the sole driver of shortcut learning.
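For concreteness, SD-style margin penalization can be sketched as a squared-logit term added to the logistic loss. This is a minimal sketch; the exact losses used in the paper may differ, and `lam` is a hypothetical regularization strength:

```python
import numpy as np

def sd_loss(logits, labels, lam=0.1):
    """Logistic loss plus a Spectral-Decoupling-style penalty
    (lam/2) * f(x)^2 that discourages large margins.
    Binary labels in {-1, +1}; a sketch, not the paper's exact loss."""
    margins = labels * logits
    ce = np.log1p(np.exp(-margins)).mean()       # logistic loss
    penalty = 0.5 * lam * (logits ** 2).mean()   # margin penalty
    return ce + penalty

# The penalty shrinks margins toward a common scale, but it does not
# change the NTK spectrum, so which features are fit first is unchanged.
logits = np.array([2.0, -1.5])
labels = np.array([1.0, -1.0])
loss = sd_loss(logits, labels)
```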
Empirical validation extends the analysis beyond linear models. The authors train a two‑layer fully‑connected ReLU network and a ResNet‑18 on several real‑world biased datasets: Patched‑MNIST, Colored‑MNIST, Waterbirds, CelebA, and Dogs‑Cats. They introduce two metrics: predictability (how well a feature alone predicts the label) and availability (the alignment of a feature with top NTK eigenfunctions). Across datasets, shortcut features exhibit low predictability but high availability, confirming the theoretical predictions. Saliency maps derived from the eigenfunction decomposition reveal that high‑eigenvalue features focus on the spurious attributes (e.g., patches, background colors, facial edges) rather than the core semantic content.
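An availability-style score can be approximated as follows. The concrete formula here (the fraction of a feature's energy captured by top kernel eigenvectors, with a linear kernel standing in for the NTK) is our assumption, not the paper's exact definition:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data whose dominant direction is the first coordinate.
X = rng.standard_normal((50, 2)) + np.array([3.0, 0.0])
K = X @ X.T                          # inner-product kernel Gram matrix
_, eigvecs = np.linalg.eigh(K)       # eigenvalues in ascending order
top = eigvecs[:, ::-1][:, :1]        # top-1 eigenvector, shape (50, 1)

def availability(feature_vals, top_eigvecs):
    """Fraction of the feature's energy captured by the top eigenvectors
    of the kernel Gram matrix (a sketch of the alignment idea)."""
    f = feature_vals / np.linalg.norm(feature_vals)
    return float(np.sum((top_eigvecs.T @ f) ** 2))

shortcut_feat = X[:, 0]   # aligned with the dominant data direction
core_feat = X[:, 1]       # orthogonal, low-availability feature
```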
The paper concludes that shortcut learning can be understood as a consequence of the NTK spectrum: biased clusters generate high‑eigenvalue eigenfunctions that are both learned rapidly and retain strong influence after training. This insight suggests new avenues for mitigating shortcuts, such as designing data distributions or architectures that flatten the NTK spectrum, or explicitly regularizing high‑eigenvalue components. The work bridges kernel theory with practical deep‑learning phenomena, offering a principled explanation for why neural networks gravitate toward non‑generalizable cues.