📝 Original Info
- Title: PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks
- ArXiv ID: 2512.12663
- Date: 2025-12-14
- Authors: Gelesh G Omathil, Sreeja CS
📝 Abstract
Deep neural networks possess strong representational capacity yet remain vulnerable to overfitting, primarily because neurons tend to co-adapt in ways that, while capturing complex and fine-grained feature interactions, also reinforce spurious and non-generalizable patterns that inflate training performance but reduce reliability on unseen data. Noise-based regularizers such as Dropout and DropConnect address this issue by injecting stochastic perturbations during training, but the noise they apply is typically uniform across a layer or across a batch of samples, which can suppress both harmful and beneficial co-adaptation.
This work introduces PerNodeDrop, a lightweight stochastic regularization method. It applies per-sample, per-node perturbations to break the uniformity of the noise injected by existing techniques, thereby allowing each node to experience input-specific variability. Hence, PerNodeDrop preserves useful co-adaptation while applying regularization. This narrows the gap between training and validation performance and improves reliability on unseen data, as evident from the experiments.
Although superficially similar to DropConnect, PerNodeDrop operates at the sample level. It drops weights at the sample level, not the batch level. An expected-loss analysis formalizes how its perturbations attenuate excessive co-adaptation while retaining predictive interactions. Empirical evaluations on vision, text, and audio benchmarks indicate improved generalization relative to the standard noise-based regularizer.
💡 Deep Analysis
📄 Full Content
PerNodeDrop: A Method Balancing Specialized Subnets and Regularization in Deep Neural Networks
Gelesh G. Omathil, Sreeja C. S.
Abstract—Deep neural networks possess strong representational capacity yet remain vulnerable to overfitting, primarily because neurons tend to co-adapt in ways that, while capturing complex and fine-grained feature interactions, also reinforce spurious and non-generalizable patterns that inflate training performance but reduce reliability on unseen data. Noise-based regularizers such as Dropout and DropConnect address this issue by injecting stochastic perturbations during training, but the noise they apply is typically uniform across a layer or across a batch of samples, which can suppress both harmful and beneficial co-adaptation.

This work introduces PerNodeDrop, a lightweight stochastic regularization method. It applies per-sample, per-node perturbations to break the uniformity of the noise injected by existing techniques, thereby allowing each node to experience input-specific variability. Hence, PerNodeDrop preserves useful co-adaptation while applying regularization. This narrows the gap between training and validation performance and improves reliability on unseen data, as evident from the experiments.

Although superficially similar to DropConnect, PerNodeDrop operates at the sample level. It drops weights at the sample level, not the batch level. An expected-loss analysis formalizes how its perturbations attenuate excessive co-adaptation while retaining predictive interactions. Empirical evaluations on vision, text, and audio benchmarks indicate improved generalization relative to the standard noise-based regularizer.
Index Terms—Deep neural network, overfitting, regularization, Dropout, DropConnect, Masksembles.
I. INTRODUCTION
Deep neural networks' ability to model complex patterns and relationships in data makes them susceptible to overfitting. Hence, regularization is essential to ensure generalization and prevent memorization of the training data. Conventional regularization techniques, such as Dropout, Gaussian Dropout [1], and DropConnect [2], incorporate stochasticity by randomly deactivating neurons or the weights associated with them during training.
Gelesh G. Omathil is a Research Scholar with the Department of Computer Science, Christ University, Bangalore, India (e-mail: gelesh.omathil@res.christuniversity.in). Sreeja C. S. is an Assistant Professor with the Department of Computer Science, Christ University, Bangalore, India (e-mail: sreeja.cs@christuniversity.in). A provisional patent related to the PerNodeDrop regularization method has been filed by Chemophilic Data Sage.

Dropout typically applies a Bernoulli random mask per input, reducing neuron outputs probabilistically, whereas Gaussian Dropout multiplies neuron outputs with Gaussian noise. Both approaches introduce randomness into the neural network activations on a per-input basis while maintaining a consistent perturbation pattern across all nodes within a layer. DropConnect applies a Bernoulli mask to weights, which remains uniform across batches. Ensemble-inspired methodologies, such as Masksembles [3], replicate diverse model behaviors by applying specified binary masks to subnetworks, thereby enabling ensemble-like inference within a single model. However, these masks remain static during training, limiting the stochasticity of regularization. The proposed PerNodeDrop approach breaks this homogeneity by incorporating per-sample, per-connection stochasticity using Bernoulli or Gaussian noise, creating a nuanced regularization layer.
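To make the contrast concrete, the sketch below compares a batch-uniform DropConnect-style weight mask with a per-sample, per-connection mask of the kind described above. The shapes, variable names, and drop probability are illustrative assumptions; the paper's own implementation is not reproduced in this excerpt.

```python
import torch

# Illustrative mask granularities for a dense layer with `in_f` inputs,
# `out_f` outputs, and a batch of `B` samples (drop probability p).
B, in_f, out_f, p = 32, 64, 128, 0.3

# DropConnect-style: one Bernoulli mask over the weight matrix,
# shared by every sample in the batch.
dropconnect_mask = torch.bernoulli(torch.full((in_f, out_f), 1 - p))

# Per-sample, per-connection style (as described above): an independent
# mask per sample and per weight, so each input sees its own perturbation.
persample_mask = torch.bernoulli(torch.full((B, in_f, out_f), 1 - p))

x = torch.randn(B, in_f)
W = torch.randn(in_f, out_f)

# Batch-uniform masking: a single matmul with one masked weight matrix.
y_dropconnect = x @ (W * dropconnect_mask)

# Per-sample masking: batched matmul with a different masked weight
# matrix for every sample in the batch.
y_persample = torch.bmm(x.unsqueeze(1), W * persample_mask).squeeze(1)

print(y_dropconnect.shape, y_persample.shape)  # both (32, 128)
```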
PerNodeDrop is a per-sample, per-node regularization layer that introduces fine-grained, structured stochastic perturbations, offering greater diversity than prior work. It can operate in either binary or Gaussian modes, with fixed or dynamic masks, providing user-configurable stochastic behavior that is compatible with common hyperparameter search APIs. This design produces an implicit ensemble effect at low computational cost by leveraging the dynamic subnetworks explored during training. The method was evaluated across image, text, and audio tasks, and showed stable training, reduced validation loss, and indications of improved resistance to overfitting.
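As a rough illustration of how such a layer could be realised, the following PyTorch sketch applies an independent multiplicative perturbation to every weight for every sample, with a binary (Bernoulli) and a Gaussian mode. The class name, the inverted-dropout rescaling, and the Gaussian variance convention are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PerSampleNodeDrop(nn.Module):
    """Hypothetical per-sample, per-connection stochastic dense layer.

    For every sample in the batch, an independent multiplicative
    perturbation is applied to each weight: a Bernoulli mask in
    "binary" mode or N(1, sigma^2) noise in "gaussian" mode.
    """

    def __init__(self, in_features, out_features, p=0.3, mode="binary"):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.p = p          # drop probability / basis for the noise scale
        self.mode = mode    # "binary" or "gaussian"

    def forward(self, x):
        if not self.training:
            return self.linear(x)             # deterministic at inference

        B = x.shape[0]
        W = self.linear.weight                # (out_features, in_features)
        if self.mode == "binary":
            keep = 1.0 - self.p
            mask = torch.bernoulli(
                torch.full((B, *W.shape), keep, device=x.device)
            ) / keep                          # inverted-dropout rescaling
        else:  # "gaussian": multiplicative noise with matching variance
            sigma = (self.p / (1.0 - self.p)) ** 0.5
            mask = 1.0 + sigma * torch.randn(B, *W.shape, device=x.device)

        W_per_sample = W.unsqueeze(0) * mask  # (B, out, in)
        y = torch.bmm(W_per_sample, x.unsqueeze(-1)).squeeze(-1)
        return y + self.linear.bias
```

Inference is deterministic in this sketch, and materialising a (batch, out, in) mask makes the memory cost of naive per-sample weight masking explicit; a practical implementation would presumably trade some of this flexibility for efficiency.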
II. RELATED WORK
Overfitting is common in deep neural networks, prompting the study and development of approaches that add a regularization layer to the network composition. These methods induce stochasticity at multiple levels, including neurons, connections, and structural features, by using mechanisms such as dropout or Gaussian perturbations. Certain strategies apply noise uniformly across an entire layer of nodes, or across a batch of inputs, resulting in consistent noise being applied to the target layer or the target batch of inputs.
The theoretical basis for noise-based regularization was established by Bishop [5], who showed that training with additive noise is mathematically equivalent to a form of Tikhonov regularization.
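For context, the standard form of Bishop's result can be written as follows; the notation is a conventional reconstruction rather than the paper's own, since the excerpt ends at this point.

```latex
% Standard statement of Bishop's equivalence (reconstruction, not the
% paper's notation): for squared error and small additive input noise
% \varepsilon ~ N(0, \sigma^2 I), a second-order expansion gives,
% near a minimum of the noise-free loss,
\[
  \mathbb{E}_{\varepsilon}\!\left[ \bigl( f_w(x+\varepsilon) - y \bigr)^2 \right]
  \;\approx\;
  \bigl( f_w(x) - y \bigr)^2
  + \sigma^2 \left\lVert \nabla_x f_w(x) \right\rVert^2 ,
\]
% i.e. training with additive noise behaves like adding a Tikhonov
% (gradient-norm) penalty of strength \sigma^2.
```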