PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks

Reading time: 5 minutes
...

📝 Original Info

  • Title: PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks
  • ArXiv ID: 2512.12663
  • Date: 2025-12-14
  • Authors: Gelesh G. Omathil, Sreeja C. S.

📝 Abstract

Deep neural networks possess strong representational capacity yet remain vulnerable to overfitting, primarily because neurons tend to co-adapt in ways that, while capturing complex and fine-grained feature interactions, also reinforce spurious and non-generalizable patterns that inflate training performance but reduce reliability on unseen data. Noise-based regularizers such as Dropout and DropConnect address this issue by injecting stochastic perturbations during training, but the noise they apply is typically uniform across a layer or across a batch of samples, which can suppress both harmful and beneficial co-adaptation. This work introduces PerNodeDrop, a lightweight stochastic regularization method. It applies per-sample, per-node perturbations to break the uniformity of the noise injected by existing techniques, thereby allowing each node to experience input-specific variability. Hence, PerNodeDrop preserves useful co-adaptation while applying regularization. This narrows the gap between training and validation performance and improves reliability on unseen data, as evident from the experiments. Although superficially similar to DropConnect, PerNodeDrop operates at the sample level: it drops weights per sample rather than per batch. An expected-loss analysis formalizes how its perturbations attenuate excessive co-adaptation while retaining predictive interactions. Empirical evaluations on vision, text, and audio benchmarks indicate improved generalization relative to standard noise-based regularizers.
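
To make the distinction concrete, the NumPy sketch below contrasts a DropConnect-style weight mask shared by the whole batch with a weight mask drawn independently for each sample, which is the sample-level weight dropping described above. This is an illustration written for this summary, not the authors' code; the sizes and the 0.5 keep probability are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    batch, d_in, d_out, keep = 32, 64, 128, 0.5   # assumed sizes and keep rate
    X = rng.normal(size=(batch, d_in))
    W = rng.normal(size=(d_in, d_out))

    # DropConnect-style: one Bernoulli mask over the weights, shared by every
    # sample in the batch.
    shared_mask = rng.binomial(1, keep, size=W.shape)
    out_shared = X @ (W * shared_mask) / keep

    # Sample-level weight dropping: each sample gets its own Bernoulli mask over
    # the weight matrix, so different inputs see different perturbed subnetworks.
    per_sample_mask = rng.binomial(1, keep, size=(batch, d_in, d_out))
    out_per_sample = np.einsum('bi,bio->bo', X, W[None] * per_sample_mask) / keep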


📄 Full Content

PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks

Gelesh G. Omathil, Sreeja C. S.

Abstract—Deep neural networks possess strong representational capacity yet remain vulnerable to overfitting, primarily because neurons tend to co-adapt in ways that, while capturing complex and fine-grained feature interactions, also reinforce spurious and non-generalizable patterns that inflate training performance but reduce reliability on unseen data. Noise-based regularizers such as Dropout and DropConnect address this issue by injecting stochastic perturbations during training, but the noise they apply is typically uniform across a layer or across a batch of samples, which can suppress both harmful and beneficial co-adaptation. This work introduces PerNodeDrop, a lightweight stochastic regularization method. It applies per-sample, per-node perturbations to break the uniformity of the noise injected by existing techniques, thereby allowing each node to experience input-specific variability. Hence, PerNodeDrop preserves useful co-adaptation while applying regularization. This narrows the gap between training and validation performance and improves reliability on unseen data, as evident from the experiments. Although superficially similar to DropConnect, PerNodeDrop operates at the sample level: it drops weights per sample rather than per batch. An expected-loss analysis formalizes how its perturbations attenuate excessive co-adaptation while retaining predictive interactions. Empirical evaluations on vision, text, and audio benchmarks indicate improved generalization relative to standard noise-based regularizers.

Index Terms—Deep neural network, overfitting, regularization, Dropout, DropConnect, Masksembles.

Gelesh G. Omathil is a Research Scholar with the Department of Computer Science, Christ University, Bangalore, India (e-mail: gelesh.omathil@res.christuniversity.in). Sreeja C. S. is an Assistant Professor with the Department of Computer Science, Christ University, Bangalore, India (e-mail: sreeja.cs@christuniversity.in). A provisional patent related to the PerNodeDrop regularization method has been filed by Chemophilic Data Sage.

I. INTRODUCTION

Deep neural networks' ability to model complex patterns and relationships in data makes them susceptible to overfitting. Hence, regularization is essential to ensure generalization and prevent memorization of the training data. Conventional regularization techniques, such as Dropout, Gaussian Dropout [1], and DropConnect [2], incorporate stochasticity by randomly deactivating the weights associated with the neurons during training.

Dropout typically applies a Bernoulli random mask per input, reducing neuron outputs probabilistically, whereas Gaussian Dropout multiplies neuron outputs with Gaussian noise. Both approaches introduce randomness into the neural network activations on a per-input basis while maintaining a consistent perturbation pattern across all nodes within a layer. DropConnect applies a Bernoulli mask to weights, which remains uniform across batches. Ensemble-inspired methodologies, such as Masksembles [3], replicate diverse model behaviors by applying specified binary masks to subnetworks, thereby enabling ensemble-like inference within a single model. However, these masks remain static during training, limiting the stochasticity of regularization.

The proposed PerNodeDrop approach breaks this homogeneity by incorporating per-sample, per-connection stochasticity using Bernoulli or Gaussian noise, creating a nuanced regularization layer. PerNodeDrop is a per-sample, per-node regularization layer that introduces fine-grained, structured stochastic perturbations, offering greater diversity than prior work. It can operate in either binary or Gaussian mode, with fixed or dynamic masks, providing user-configurable stochastic behavior that is compatible with common hyperparameter search APIs. This design produces an implicit ensemble effect at low computational cost by leveraging the dynamic subnetworks explored during training. The method was evaluated across image, text, and audio tasks, and showed stable training, reduced validation loss, and indications of improved resistance to overfitting.
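
As a concrete picture of the binary and Gaussian modes mentioned above, the following PyTorch-style sketch applies independent per-sample, per-node multiplicative noise to a layer's activations. It was written for this summary: the class name PerSampleNoise, its arguments, and its defaults are assumptions, and it is not the authors' implementation of PerNodeDrop (in particular, it does not reproduce the fixed-mask option or the per-connection weight masking).

    import torch
    import torch.nn as nn

    class PerSampleNoise(nn.Module):
        """Illustrative per-sample, per-node noise layer with binary and Gaussian modes."""

        def __init__(self, drop_prob: float = 0.2, mode: str = "binary"):
            super().__init__()
            assert mode in ("binary", "gaussian")
            self.drop_prob = drop_prob
            self.mode = mode

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Identity at inference time, as with other noise-based regularizers.
            if not self.training or self.drop_prob == 0.0:
                return x
            keep = 1.0 - self.drop_prob
            if self.mode == "binary":
                # Independent Bernoulli mask for every (sample, node) entry,
                # rescaled so the expected activation is unchanged.
                mask = torch.bernoulli(torch.full_like(x, keep))
                return x * mask / keep
            # Gaussian mode: multiplicative noise whose variance p/(1-p)
            # matches the rescaled binary mask.
            std = (self.drop_prob / keep) ** 0.5
            return x * (1.0 + std * torch.randn_like(x))

    # Usage sketch: insert the layer after a hidden activation.
    model = nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(),
        PerSampleNoise(drop_prob=0.2, mode="binary"),
        nn.Linear(128, 10),
    )

A per-connection variant, closer to the weight-level description in the abstract, would instead mask the weight matrix independently for each sample, as in the earlier NumPy sketch.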
II. RELATED WORK

Overfitting is common in deep neural networks, prompting the study and development of approaches that add a regularization layer to the network composition. These methods induce stochasticity at multiple levels, including neurons, connections, and structural features, using mechanisms such as dropout or Gaussian perturbations. Certain strategies apply uniformly across an entire layer of nodes or across a batch of inputs, so consistent noise is applied to the target layer or the target batch of inputs.

The theoretical basis for noise-based regularization was established by Bishop [5], who showed that training with additive noise is mathematically equivalent to Tikhonov regularization.
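For context, Bishop's classical result can be sketched as the standard second-order expansion of a squared-error loss under additive Gaussian input noise. This is the textbook derivation behind that equivalence, not the paper's own expected-loss analysis of PerNodeDrop:

    % Additive input noise and squared-error loss
    \tilde{x} = x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^{2} I), \qquad
    L(x) = \tfrac{1}{2}\bigl(y(x) - t\bigr)^{2}

    % Taking the expectation over \epsilon and expanding y to second order:
    \mathbb{E}_{\epsilon}\bigl[L(x + \epsilon)\bigr]
      \approx L(x)
      + \frac{\sigma^{2}}{2}\Bigl(\bigl\lVert \nabla_{x} y(x) \bigr\rVert^{2}
      + \bigl(y(x) - t\bigr)\,\operatorname{tr}\nabla_{x}^{2} y(x)\Bigr)

    % Near a minimum the curvature term is negligible, leaving the clean loss plus
    % a Tikhonov-style derivative penalty \tfrac{\sigma^{2}}{2}\lVert\nabla_{x} y\rVert^{2}.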

Reference

This content is AI-processed based on open access ArXiv data.
