Variational Distributional Neuron
We propose a proof of concept for a variational distributional neuron: a compute unit formulated as a VAE brick, explicitly carrying a prior, an amortized posterior and a local ELBO. The unit no longer emits a deterministic scalar but a distribution: computing is no longer about propagating values, but about contracting a continuous space of possibilities under constraints. Each neuron parameterizes a posterior, propagates a reparameterized sample and is regularized by the KL term of a local ELBO - hence the activation is distributional. This “contraction” becomes testable through local constraints and can be monitored via internal measures. The amount of contextual information carried by the unit, as well as the temporal persistence of this information, are locally tuned by distinct constraints. This proposal addresses a structural tension: in sequential generation, causality is predominantly organized in the symbolic space and, even when latents exist, they often remain auxiliary, while the effective dynamics are carried by a largely deterministic decoder. In parallel, probabilistic latent models capture factors of variation and uncertainty, but that uncertainty is typically borne by global or parametric mechanisms, while the units themselves continue to propagate scalars - hence the pivotal question: if uncertainty is intrinsic to computation, why does the compute unit not carry it explicitly? We therefore identify two axes: (i) the composition of probabilistic constraints, which must be made stable, interpretable and controllable; and (ii) granularity: if inference is a negotiation of distributions under constraints, should the primitive unit remain deterministic or become distributional? We analyze “collapse” modes and the conditions for a “living neuron”, then extend the contribution over time via per-unit autoregressive priors over the latent.
💡 Research Summary
The paper introduces a novel computational primitive called the Variational Distributional Neuron (EVE), which turns each neural unit into a tiny variational auto‑encoder (VAE). Traditional deep networks propagate deterministic scalar activations, while uncertainty is relegated to global latent variables or Bayesian weight distributions. This separation limits observability and control of internal stochasticity. EVE addresses the gap by giving every neuron its own latent variable z, a prior p(z) (standard normal), an amortized posterior q(z|h) parameterized by μ(h) and σ(h), and a local evidence lower bound (ELBO) that is optimized jointly with the task loss. The neuron samples z via the reparameterization trick, emits an activation a through a decoder p(a|z,h), and incurs a KL regularization term that acts as a “spring” pulling z towards the prior.
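The mechanics described above (amortized Gaussian posterior, reparameterized sample, closed-form KL against a standard-normal prior) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the linear maps for μ(h) and log σ²(h), the `tanh` stand-in for the decoder p(a|z,h), and all variable names are assumptions made for the example.

```python
import numpy as np

def kl_gauss_std_normal(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for a 1-D Gaussian (k = 1)."""
    return 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)

def distributional_neuron(h, w_mu, w_logvar, rng):
    """One distributional unit: amortized posterior, reparameterized sample, local KL.

    h        : input features, shape (d,)
    w_mu     : weights producing the posterior mean mu(h), shape (d,)
    w_logvar : weights producing the posterior log-variance log sigma^2(h), shape (d,)
    """
    mu = float(h @ w_mu)              # amortized posterior mean mu(h)
    log_var = float(h @ w_logvar)     # amortized posterior log-variance
    eps = rng.standard_normal()       # reparameterization trick: z = mu + sigma * eps
    z = mu + np.exp(0.5 * log_var) * eps
    a = np.tanh(z)                    # toy decoder: a deterministic squashing of z
    kl = kl_gauss_std_normal(mu, log_var)  # the "spring" pulling z towards the prior
    return a, z, kl

rng = np.random.default_rng(0)
h = rng.standard_normal(4)
a, z, kl = distributional_neuron(h, rng.standard_normal(4) * 0.1,
                                 rng.standard_normal(4) * 0.1, rng)
```

Because the KL term is computed per neuron in closed form, it can be added to the task loss and monitored individually, which is what makes the unit's stochasticity observable.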
Key design choices include fixing the latent dimensionality to k = 1, ensuring that any performance gain stems from the variational mechanism rather than increased capacity. The loss for a layer of N neurons is L = L_task + β · KL_mean, where KL_mean = (1/N)∑_i KL(q_i‖p) keeps the regularization scale stable as the network widens. To prevent posterior collapse or uncontrolled variance, the authors introduce a suite of per‑neuron diagnostics and controls—effective KL, a μ² “band” that limits the squared mean, out‑of‑band fraction, and drift indicators. These metrics form a dashboard that can be monitored during training and used to enforce anti‑collapse constraints (e.g., hard projection of μ into the band, KL clipping, β scheduling). This “AutoPilot” mechanism enables heterogeneous neurons: some with tight bands (low contextual capacity) and others with looser bands (higher capacity), effectively giving each unit a tunable memory or inertia.
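The layer loss and the band-based anti-collapse controls described above can be illustrated directly. The sketch below assumes the simplest reading of the text: per-neuron KL averaged into `KL_mean`, a per-neuron cap on μ², and hard projection as clipping; the exact band values and function names are hypothetical.

```python
import numpy as np

def layer_loss(task_loss, mu, log_var, beta):
    """L = L_task + beta * KL_mean over a layer of N neurons (k = 1 each)."""
    kl_i = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)  # per-neuron KL to N(0, 1)
    return task_loss + beta * kl_i.mean(), kl_i

def out_of_band_fraction(mu, band):
    """Diagnostic: fraction of neurons whose squared mean exceeds its band."""
    return float(np.mean(mu**2 > band))

def project_mu_to_band(mu, band):
    """Hard anti-collapse control: clip each mu_i so that mu_i^2 <= band_i."""
    limit = np.sqrt(band)
    return np.clip(mu, -limit, limit)

mu = np.array([0.1, -2.0, 0.5, 3.0])
band = np.array([1.0, 1.0, 1.0, 4.0])   # heterogeneous per-neuron bands
frac = out_of_band_fraction(mu, band)    # two of four neurons exceed their band
mu_proj = project_mu_to_band(mu, band)   # -> [0.1, -1.0, 0.5, 2.0]
```

Dividing the summed KL by N is what keeps the regularization scale stable as the layer widens: doubling N leaves `KL_mean`, and hence the effective β, unchanged.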
Temporal dynamics are added by equipping each neuron with an autoregressive (AR) prior: z_t = α z_{t‑1} + ε_t, where α is learned per neuron. This creates micro‑level dynamics that can be fast (α≈0) or slow (α≈1), offering a principled way to embed local memory without a global state‑space model. Experiments compare i.i.d. priors to AR priors, showing that AR dynamics improve long‑term stability of the internal metrics and yield comparable or better downstream performance on sequence prediction tasks.
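The AR(1) prior z_t = α z_{t-1} + ε_t can be simulated to see the fast/slow behavior the text describes. One assumption in the sketch: the noise is scaled by √(1 − α²) so that the stationary variance stays at 1 and the marginal prior remains comparable to the i.i.d. N(0, 1) case; the paper does not specify this scaling.

```python
import numpy as np

def ar_prior_step(z_prev, alpha, rng):
    """Per-neuron AR(1) prior: z_t = alpha * z_{t-1} + eps_t.

    eps_t ~ N(0, 1 - alpha^2), so the stationary variance is 1 for every alpha
    (an assumed normalization, keeping the marginal prior close to N(0, 1)).
    """
    noise_std = np.sqrt(1.0 - alpha**2)
    return alpha * z_prev + noise_std * rng.standard_normal(z_prev.shape)

rng = np.random.default_rng(0)
alpha = np.array([0.0, 0.5, 0.99])   # fast, intermediate, slow neurons
z = np.zeros(3)
trajectory = []
for _ in range(1000):
    z = ar_prior_step(z, alpha, rng)
    trajectory.append(z.copy())
traj = np.stack(trajectory)

# Lag-1 autocorrelation per neuron: near 0 for the fast unit (alpha = 0,
# i.i.d. prior recovered), near 1 for the slow, high-inertia unit.
acf1 = np.array([np.corrcoef(traj[:-1, i], traj[1:, i])[0, 1] for i in range(3)])
```

With α learned per neuron, each unit chooses its own timescale, which is the "local memory without a global state-space model" the text refers to.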
The authors validate their proposal through a “Living Neuron” test (measuring the proportion of neurons that maintain non‑collapsed KL and μ² values), ablations of each control component, and performance benchmarks against deterministic baselines. Results demonstrate that EVE‑based networks achieve similar or higher reconstruction accuracy while providing interpretable internal signals and resisting collapse even in wide architectures.
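As a rough illustration of the "Living Neuron" test, one can count the units that pass both checks the text names: KL above a collapse floor and μ² inside its band. The threshold value and the exact pass/fail criteria here are hypothetical, chosen only to make the computation concrete.

```python
import numpy as np

def living_fraction(kl, mu_sq, band, kl_min=1e-3):
    """Share of 'living' neurons: KL above a collapse floor AND mu^2 inside its band.

    kl_min is an assumed threshold; a neuron with KL ~ 0 has collapsed onto
    its prior and carries no contextual information.
    """
    alive = (kl > kl_min) & (mu_sq <= band)
    return float(alive.mean())

kl    = np.array([0.0, 0.2, 0.5, 0.8])   # first neuron has collapsed to the prior
mu_sq = np.array([0.0, 0.3, 2.0, 0.9])   # third neuron drifted out of its band
band  = np.array([1.0, 1.0, 1.0, 1.0])
frac = living_fraction(kl, mu_sq, band)   # 2 of 4 neurons pass both checks -> 0.5
```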
In summary, the contributions are threefold: (1) formalizing an atomic variational inference block as a neuron, (2) introducing a comprehensive set of per‑neuron diagnostics and control knobs that make stochastic computation observable and steerable, and (3) extending the concept temporally with per‑neuron AR dynamics, allowing a systematic study of micro versus macro stochastic processes. The work opens a new direction where uncertainty is not an external add‑on but an intrinsic property of the compute unit itself, paving the way for more transparent, controllable, and potentially more robust deep learning systems. Future work will need to explore higher‑dimensional latent spaces, hybrid architectures that combine global and local latents, and real‑world applications such as robotics or medical time‑series analysis.