TruKAN: Towards More Efficient Kolmogorov-Arnold Networks Using Truncated Power Functions
To address the trade-off between computational efficiency and adherence to Kolmogorov-Arnold Network (KAN) principles, we propose TruKAN, a new architecture based on the KAN structure and learnable activation functions. TruKAN replaces the B-spline basis in KAN with a family of truncated power functions derived from k-order spline theory. This change preserves the KAN's expressiveness while improving accuracy and reducing training time. Each TruKAN layer combines a truncated power term with a polynomial term and employs either shared or individual knots. TruKAN exhibits greater interpretability than other KAN variants due to its simplified basis functions and knot configurations. By prioritizing interpretable basis functions, TruKAN aims to balance approximation efficacy with transparency. We develop the TruKAN model and integrate it into an advanced EfficientNet-V2-based framework, which is then evaluated on computer vision benchmark datasets. To ensure a fair comparison, we develop MLP-, KAN-, SineKAN-, and TruKAN-based EfficientNet frameworks and assess their training time and accuracy across small and deep architectures. The training phase uses hybrid optimization to improve convergence stability. Additionally, we investigate layer normalization techniques for all the models and assess the impact of shared versus individual knots in TruKAN. Overall, TruKAN outperforms other KAN models in terms of accuracy, computational efficiency, and memory usage on the complex vision task, demonstrating advantages beyond the limited settings explored in prior KAN studies.
💡 Research Summary
TruKAN introduces a novel variant of Kolmogorov‑Arnold Networks (KANs) that replaces the traditional B‑spline basis with k‑order truncated power functions (TPFs). The authors argue that B‑splines, while smooth and locally supported, require the recursive Cox–de Boor algorithm, which is computationally heavy and memory‑intensive, especially in deep or high‑dimensional settings. By using TPFs of the form $(x-\tau)_+^k$, where $\tau$ denotes a knot and $k$ the polynomial degree, TruKAN eliminates the need for recursive evaluation and reduces each edge‑wise activation to a simple piecewise polynomial that can be computed with basic tensor operations.
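The computational saving comes from evaluating the basis directly rather than recursively. A minimal NumPy sketch (illustrative only; the function and argument names below are not from the paper):

```python
import numpy as np

def tpf_basis(x, knots, k):
    """Evaluate the truncated power functions (x - tau)_+^k for every knot tau.

    Unlike the Cox-de Boor recursion for B-splines, this needs only one
    broadcasted subtraction, a clamp, and a power -- no recursion, no
    intermediate basis buffers.
    """
    diff = x[..., None] - knots          # broadcast to shape (..., num_knots)
    return np.maximum(diff, 0.0) ** k    # negative side is truncated to zero
```

For example, with knots at 0 and 1 and degree 2, an input of 0.5 yields basis values 0.25 and 0, since only the first knot lies to its left.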
Each TruKAN layer combines two components: (1) a truncated‑power term that provides localized non‑linearity, and (2) a low‑order polynomial term (typically constant or linear) that captures a global trend. This hybrid design mitigates the risk of exploding or vanishing gradients that can arise from the unbounded polynomial growth of pure TPFs, while preserving the expressive power of the original KAN formulation.
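Putting the two components together, an edge-wise activation can be sketched as a low-order polynomial plus a weighted sum of truncated-power terms. This is a hypothetical reconstruction under stated assumptions; the paper's exact parameterization, names, and tensor shapes are not given:

```python
import numpy as np

def trukan_edge(x, knots, tpf_coef, poly_coef, k=3):
    """phi(x) = sum_m poly_coef[m] * x**m               (global trend)
              + sum_j tpf_coef[j] * (x - knots[j])_+^k  (localized non-linearity)
    """
    poly = sum(c * x**m for m, c in enumerate(poly_coef))
    tpf = (np.maximum(x[..., None] - knots, 0.0) ** k) @ tpf_coef
    return poly + tpf

def trukan_layer(x, knots, tpf_coef, poly_coef, k=3):
    """KAN-style layer: each output sums edge activations over all inputs.

    x: (batch, d_in); tpf_coef: (d_in, d_out, n_knots);
    poly_coef: (d_in, d_out, deg + 1). Returns (batch, d_out).
    """
    batch, d_in = x.shape
    d_out = tpf_coef.shape[1]
    out = np.zeros((batch, d_out))
    for i in range(d_in):
        for o in range(d_out):
            out[:, o] += trukan_edge(x[:, i], knots,
                                     tpf_coef[i, o], poly_coef[i, o], k)
    return out
```

A production version would vectorize the double loop into a single einsum, but the loop form makes the Kolmogorov-Arnold structure explicit: one learned univariate function per edge, summed per output.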
The paper proposes two knot‑management strategies. In the “shared‑knot” variant, all output channels share a common set of equally spaced knots, drastically reducing the number of learnable parameters and enhancing interpretability. In the “individual‑knot” variant, each output channel learns its own knot positions, subject to positivity, ordering, and minimum‑spacing constraints to avoid pathological configurations. The individual‑knot version offers greater local adaptability at the cost of a modest increase in parameter count.
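One standard way to implement the positivity, ordering, and minimum-spacing constraints of the individual-knot variant is to learn unconstrained gap parameters and map them through a softplus plus a cumulative sum. The paper does not specify its mechanism, so the following is only one plausible sketch:

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def ordered_knots(raw_gaps, first=0.0, min_gap=0.05):
    """Map unconstrained parameters to strictly increasing knot positions.

    Each gap is softplus(raw) + min_gap > min_gap, so the knots stay ordered
    and can never collapse onto each other, whatever values the optimizer
    drives the raw parameters to.
    """
    gaps = softplus(raw_gaps) + min_gap
    return first + np.cumsum(gaps)
```

The shared-knot variant needs none of this machinery: a single fixed, equally spaced grid (e.g. `np.linspace`) is reused by every output channel, which is where its parameter savings and interpretability come from.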
Training employs a “Hybrid Optimization” schedule: an initial phase with Adam for rapid convergence, followed by a second phase using SGD (or LAMB) with weight decay to improve generalization. Layer Normalization and an extensive data‑augmentation pipeline are applied uniformly across all model variants to ensure fair comparisons.
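The two-phase schedule can be sketched on a toy objective. This is an illustrative reimplementation of the general Adam-then-SGD idea, not the paper's training code; all hyper-parameter values below are placeholders:

```python
import numpy as np

def hybrid_train(grad_fn, w, adam_steps=100, sgd_steps=100,
                 lr_adam=0.1, lr_sgd=0.01, weight_decay=1e-4,
                 betas=(0.9, 0.999), eps=1e-8):
    """Phase 1: Adam for rapid convergence; phase 2: SGD with weight decay."""
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, adam_steps + 1):
        g = grad_fn(w)
        m = betas[0] * m + (1 - betas[0]) * g          # first-moment estimate
        v = betas[1] * v + (1 - betas[1]) * g**2       # second-moment estimate
        m_hat = m / (1 - betas[0]**t)                  # bias correction
        v_hat = v / (1 - betas[1]**t)
        w = w - lr_adam * m_hat / (np.sqrt(v_hat) + eps)
    for _ in range(sgd_steps):
        g = grad_fn(w) + weight_decay * w              # L2 penalty folded in
        w = w - lr_sgd * g
    return w
```

On a simple quadratic, Adam covers the distance to the optimum quickly and the SGD phase then contracts the remaining error smoothly; the same intuition motivates the switch for generalization in the full models.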
For empirical evaluation, the authors embed four classifier heads—MLP, classic B‑spline‑based KAN, SineKAN, and the proposed TruKAN—into an EfficientNet‑V2 backbone. Both a “small” configuration (~5 M parameters) and a “deep” configuration (~20 M parameters) are tested on CIFAR‑10, CIFAR‑100, and Tiny‑ImageNet. Results show that TruKAN consistently achieves the highest top‑1 accuracy among the four heads, with improvements ranging from 1.2 % to 2.0 % over the best baseline. Training time is reduced by an average factor of 3.2× compared to the B‑spline KAN, and memory consumption drops by more than 40 % in the shared‑knot setting (≈30 % in the individual‑knot setting). The shared‑knot version, while slightly less accurate, offers the best efficiency‑interpretability trade‑off.
The authors acknowledge that TPFs have weaker global support than B‑splines, which can limit approximation of sharp discontinuities. However, they argue that the hierarchical composition of multiple TruKAN layers compensates for this limitation, as each layer refines the representation locally. To prevent coefficient blow‑up, they employ coefficient clipping, L2 regularization, and learning‑rate scheduling, which together stabilize training.
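The stabilization tricks mentioned here (coefficient clipping, L2 regularization, learning-rate scheduling) are standard; a minimal sketch, with every constant an assumed placeholder rather than a value from the paper:

```python
import numpy as np

def regularized_step(coef, grad, lr=1e-2, l2=1e-4, clip=5.0):
    """One update combining an L2 penalty with hard coefficient clipping,
    so the TPF coefficients cannot blow up during training."""
    coef = coef - lr * (grad + l2 * coef)   # gradient step with L2 shrinkage
    return np.clip(coef, -clip, clip)       # hard cap on coefficient magnitude

def cosine_lr(step, total, lr_max=1e-2, lr_min=1e-4):
    """A common learning-rate schedule; the paper's exact schedule is unspecified."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * step / total))
```

The clip bound acts as a safety net for the rare steps where the L2 term alone is too weak, while the decaying learning rate reduces the size of late-training updates.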
Critical observations include: (1) the paper lacks a systematic sensitivity analysis of the polynomial degree $k$ and the number of knots, leaving practitioners without clear guidance on hyper‑parameter selection; (2) comparisons with other recent KAN accelerations such as FastKAN, EfficientKAN, or PowerMLP are absent, making it difficult to assess relative merits; (3) the evaluation is confined to vision tasks, so generalization to time‑series, NLP, or graph domains remains unverified; (4) potential “knot collapse” (multiple knots converging to the same location) is mentioned but not rigorously addressed with dedicated regularizers.
Despite these gaps, TruKAN makes a compelling contribution by demonstrating that a simple, analytically tractable basis can retain KAN’s interpretability while dramatically improving computational efficiency. The work opens a pathway for deploying KAN‑style models in real‑world, large‑scale applications where the original B‑spline formulation would be prohibitive. Future research directions suggested include extending TruKAN to other data modalities, automating knot‑placement via meta‑learning, and exploring hardware‑specific optimizations (e.g., FPGA or ASIC implementations) to further exploit the reduced computational footprint.