Learning Long-Range Dependencies with Temporal Predictive Coding


Predictive Coding (PC) is a biologically-inspired learning framework characterised by local, parallelisable operations, properties that enable energy-efficient implementation on neuromorphic hardware. Despite this, extending PC effectively to recurrent neural networks (RNNs) has been challenging, particularly for tasks involving long-range temporal dependencies. Backpropagation Through Time (BPTT) remains the dominant method for training RNNs, but its non-local computation, lack of spatial parallelism, and requirement to store extensive activation histories result in significant energy consumption. This work introduces a novel method combining Temporal Predictive Coding (tPC) with approximate Real-Time Recurrent Learning (RTRL), enabling effective spatio-temporal credit assignment. Results indicate that the proposed method can closely match the performance of BPTT on both synthetic benchmarks and real-world tasks. On a challenging machine translation task, with a 15-million-parameter model, the proposed method achieves a test perplexity of 7.62 (vs. 7.49 for BPTT), marking one of the first applications of tPC to tasks of this scale. These findings demonstrate the potential of this method to learn complex temporal dependencies whilst retaining the local, parallelisable, and flexible properties of the original PC framework, paving the way for more energy-efficient learning systems.


💡 Research Summary

This paper introduces a novel learning framework that merges Temporal Predictive Coding (tPC) with Real‑Time Recurrent Learning (RTRL), termed tPC‑RTRL. Predictive Coding (PC) offers a biologically‑inspired, locally‑computed, parallelizable learning rule, but its extension to recurrent neural networks (RNNs) has struggled with long‑range temporal dependencies. Standard back‑propagation through time (BPTT) remains the dominant method for training RNNs, yet it requires unrolling the network, storing full activation histories, and performing non‑local gradient calculations, leading to high energy and memory costs.

tPC extends PC to sequential data by treating each time step as a hierarchical Gaussian model whose latent variables are inferred through an EM‑style inference loop that minimizes a free‑energy objective. The free energy reduces to a sum of precision‑weighted prediction errors, enabling purely local error signals. However, the original tPC update only accounts for the immediate influence of parameters on the current free‑energy term, ignoring how parameters affect future hidden states.
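The inference loop described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: variable names (`W_rec`, `W_out`), the tanh dynamics, and identity precisions are all assumptions made for brevity. It shows the key property that the free-energy gradient with respect to the latent state is built entirely from local prediction errors.

```python
import numpy as np

def tpc_inference(y, x_prev, W_rec, W_out, n_steps=50, lr=0.1):
    """EM-style inference loop of temporal predictive coding (sketch).

    Infers the latent state x_hat for the current time step by gradient
    descent on a free energy made of two local prediction errors:
      eps_x = x_hat - mu         (temporal prediction error)
      eps_y = y - W_out @ x_hat  (observation prediction error)
    All names and dynamics are illustrative, not the paper's.
    """
    mu = np.tanh(W_rec @ x_prev)      # temporal prediction mu(t)
    x_hat = mu.copy()                 # initialise latent at the prediction
    for _ in range(n_steps):
        eps_x = x_hat - mu            # local error on the latent state
        eps_y = y - W_out @ x_hat     # local error on the observation
        # gradient of F = 0.5*(|eps_x|^2 + |eps_y|^2) w.r.t. x_hat
        grad = eps_x - W_out.T @ eps_y
        x_hat -= lr * grad
    return x_hat, mu
```

Each descent step uses only the errors adjacent to the latent variable, which is what makes the scheme local and parallelisable across layers.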

RTRL, introduced by Williams and Zipser (1989), maintains an influence matrix M(t)=∂x(t)/∂W that recursively accumulates the contribution of each parameter to the current hidden state. This yields exact gradients with respect to any loss at time t without storing the entire unrolled graph. The drawback is that M(t) scales as O(n³) for an n‑unit hidden layer, making it impractical for large networks.
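The influence-matrix recursion can be made concrete with a toy RNN. The sketch below assumes the simplest possible dynamics, x(t) = tanh(W x(t−1)), with no inputs or biases; for an n-unit layer the tensor M has n³ entries, which is exactly the O(n³) cost noted above.

```python
import numpy as np

def rtrl_step(x_prev, M_prev, W):
    """One step of Real-Time Recurrent Learning for x(t) = tanh(W x(t-1)).

    M has shape (n, n, n): M[i, j, k] = d x_i(t) / d W_jk.
    Sketch only; input weights and biases are omitted for brevity.
    """
    n = x_prev.shape[0]
    x = np.tanh(W @ x_prev)
    d = 1.0 - x ** 2                              # tanh'(a) at a = W x_prev
    # immediate term: d a_i / d W_jk = delta_ij * x_prev_k
    imm = np.einsum('ij,k->ijk', np.eye(n), x_prev)
    # historic term: influence routed through the previous state
    hist = np.einsum('il,ljk->ijk', W, M_prev)
    M = d[:, None, None] * (imm + hist)
    return x, M

def rtrl_grad(err, M):
    """Gradient of a loss at time t, given dL/dx(t) = err and M(t)."""
    return np.einsum('i,ijk->jk', err, M)
```

Because M(t) is carried forward online, the gradient of any loss at time t is available immediately from `rtrl_grad`, with no stored activation history.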

The authors combine these ideas by redefining the influence matrix update for tPC. Instead of using the predicted hidden state μ(t), they substitute the converged hidden state x̂(t) obtained after the inference loop:

M(t) = ∂μ(t)/∂W + ∂μ(t)/∂x̂(t−1) · M(t−1),  μ(t) = x̂(t)

The parameter update then incorporates both immediate and historic contributions:

ΔW = −η · ∂F(t)/∂μ(t) · M(t) = η · ε(t) · M(t),  where ε(t) = x̂(t) − μ(t)

