Faster Predictive Coding Networks via Better Initialization
Research aimed at scaling up neuroscience-inspired learning algorithms for neural networks is accelerating. Recently, a key research area has been the study of energy-based learning algorithms such as predictive coding, owing to their versatility and mathematical grounding. However, the applicability of such methods is held back by the large computational cost of their iterative inference. In this work, we address this problem by showing that the initialization of the neurons in a predictive coding network matters significantly and can notably reduce the required training time. Consequently, we propose a new initialization technique for predictive coding networks that aims to preserve the iterative progress made on previous training samples. Our approach suggests a promising path toward closing the gap between predictive coding and backpropagation in both computational efficiency and final performance. In fact, our experiments demonstrate substantial improvements in convergence speed and final test loss in both supervised and unsupervised settings.
💡 Research Summary
The paper tackles the long‑standing efficiency problem of Predictive Coding Networks (PCNs), which are energy‑based models that learn by alternating an inference phase (minimizing an energy function) and a weight‑update phase. Because inference requires many iterative updates—typically T≈5L steps for a network with L layers—the computational cost, measured in Sequential Matrix Multiplications (SMM), is SMM_PC = 2T, far exceeding the back‑propagation (BP) cost of SMM_BP = 2L‑1. The authors argue that the choice of neuron initialization critically influences how quickly the inference dynamics converge, yet prior work has never systematically compared initialization schemes.
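The inference phase described above can be sketched in a few lines. The following is a minimal NumPy sketch, assuming a fully connected PCN with linear (identity) activations and plain gradient descent on the squared-error energy; the names (`pc_inference`, `Ws`, `lr`) are illustrative, not from the paper.

```python
import numpy as np

def pc_inference(x0, Ws, T, lr=0.1):
    """Run T iterative inference steps on a fully connected, linear PCN.

    x0 : input activations (clamped), shape (d0,)
    Ws : list of L weight matrices; Ws[l] predicts layer l+1 from layer l
    Returns the relaxed activations of all layers.
    """
    L = len(Ws)
    # zero initialization (I_0) as a placeholder; other schemes plug in here
    xs = [x0] + [np.zeros(W.shape[0]) for W in Ws]
    for _ in range(T):
        # prediction sweep and error sweep: 2 sequential matmuls per step,
        # hence SMM_PC = 2T over the whole inference phase
        mus = [Ws[l] @ xs[l] for l in range(L)]
        eps = [xs[l + 1] - mus[l] for l in range(L)]
        # gradient descent on the energy w.r.t. the hidden activations
        for l in range(1, L):            # the input layer stays clamped
            xs[l] = xs[l] - lr * (eps[l - 1] - Ws[l].T @ eps[l])
        xs[L] = xs[L] - lr * eps[L - 1]  # top layer sees only its own error
    return xs
```

After relaxation, the weights would be updated from the remaining prediction errors; that phase is omitted here to keep the focus on the inference cost that initialization affects.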
Four existing initialization strategies are identified: (1) Random (I_N), sampling each layer from a fixed distribution; (2) Zero (I_0), setting all activations to zero; (3) Null (I_∅), preserving the hidden state from the previous minibatch; and (4) Forward (I_fw), performing a forward pass and initializing each hidden layer with its predicted value μ_l. Empirical tests on a 5‑layer fully‑connected FashionMNIST classifier show that I_fw yields the fastest convergence when few inference steps are allowed, while I_∅ outperforms the random and zero initializations. However, I_fw has two major drawbacks: (a) it requires at least L inference steps to propagate the error back to the input, giving a lower bound SMM_I_fw ≥ 2L, and (b) it adds an extra L sequential multiplications for the forward sweep, raising the total to SMM_I_fw ≥ 3L, still higher than BP. Moreover, I_fw cannot be applied to PCNs with cyclic connections or to generative architectures.
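The four schemes can be viewed as interchangeable functions that produce the starting activations for inference. A minimal sketch, again assuming fully connected layers with weights `Ws[l]` mapping layer l to l+1; the function names are illustrative, not the paper's.

```python
import numpy as np

def init_random(shapes, rng):
    """I_N: sample every hidden layer from a fixed distribution."""
    return [rng.standard_normal(s) for s in shapes]

def init_zero(shapes):
    """I_0: all hidden activations start at zero."""
    return [np.zeros(s) for s in shapes]

def init_previous(prev_xs):
    """I_∅: reuse the relaxed hidden state from the previous minibatch."""
    return [x.copy() for x in prev_xs]

def init_forward(x0, Ws):
    """I_fw: one forward sweep; layer l starts at its prediction μ_l.
    This costs L extra sequential matrix multiplications before
    inference even begins."""
    xs = [x0]
    for W in Ws:
        xs.append(W @ xs[-1])
    return xs[1:]
```

Only `init_forward` touches the weights, which is why it alone pays the extra L-multiplication overhead and requires an acyclic, discriminative architecture.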
To overcome these limitations, the authors propose two complementary ideas. First, “stream‑aligned training” reorganizes minibatches so that consecutive samples belong to the same class, thereby reducing the variance between successive hidden states. This alignment makes the hidden state after inference on batch b a good approximation for the starting point on batch b+1. Second, they introduce “stream‑aligned average initialization,” which computes class‑wise averages of the hidden activations from the previous batch and uses these averages as the initial activations for the next batch. This method retains the beneficial information of I_∅ while avoiding the noise introduced by unrelated samples, and it eliminates the need for a full forward pass required by I_fw. For unsupervised learning, the authors augment PC layers with continuous Hopfield networks, exploiting their associative memory to generate class‑wise average states automatically.
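The stream-aligned average initialization can be sketched as follows: given the relaxed hidden activations of the previous batch and the labels of the upcoming class-aligned batch, each new sample starts from the mean relaxed state of its class. This is a minimal NumPy sketch for a single hidden layer; the function name and the global-mean fallback for unseen classes are my assumptions, not details from the paper.

```python
import numpy as np

def classwise_average_init(hidden, labels, next_labels):
    """Stream-aligned average initialization (sketch).

    hidden      : (batch, d) relaxed activations of one hidden layer
    labels      : (batch,) class label of each previous-batch sample
    next_labels : (batch',) class labels of the upcoming aligned batch
    Returns (batch', d) initial activations for the next batch.
    """
    means = {c: hidden[labels == c].mean(axis=0) for c in np.unique(labels)}
    fallback = hidden.mean(axis=0)  # unseen class: fall back to global mean
    return np.stack([means.get(c, fallback) for c in next_labels])
```

Unlike I_∅, the averaging filters out sample-specific noise from unrelated inputs, and unlike I_fw it needs no forward sweep through the weights, so it also applies to cyclic and generative PCNs.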
Theoretical analysis on a toy three‑layer network proves that the average initialization yields a lower initial energy than I_fw and converges in fewer inference steps. Extensive experiments on supervised benchmarks (FashionMNIST, CIFAR‑10, SVHN) and unsupervised reconstruction tasks demonstrate that the proposed method can achieve up to a five‑fold reduction in SMMs while matching or surpassing the test accuracy of standard PCNs. Notably, with as few as T=5–10 inference steps, the new initialization still learns effectively, whereas I_fw fails at such low T. Final test loss improves by roughly 1–2 % compared to I_fw, and the gap to BP shrinks to less than 0.5 %. In the unsupervised setting, the Hopfield‑enhanced PCN reduces reconstruction error by over 15 % relative to the baseline.
In summary, the paper provides the first systematic study of neuron initialization for predictive coding, identifies the shortcomings of existing schemes, and offers a practical, theoretically‑grounded alternative that dramatically speeds up PCNs. By bringing the computational efficiency of PCNs close to that of back‑propagation, the work narrows the gap between biologically plausible learning algorithms and mainstream deep learning, opening the door for energy‑efficient neuromorphic implementations and broader real‑world adoption.