Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback
Biological neural networks learn complex behaviors from sparse, delayed feedback using local synaptic plasticity, yet the mechanisms enabling structured credit assignment remain elusive. In contrast, artificial recurrent networks solving similar tasks typically rely on biologically implausible global learning rules or hand-crafted local updates. The space of local plasticity rules capable of supporting learning from delayed reinforcement remains largely unexplored. Here, we present a meta-learning framework that discovers local learning rules for structured credit assignment in recurrent networks trained with sparse feedback. Our approach interleaves local neo-Hebbian-like updates during task execution with an outer loop that optimizes plasticity parameters via **tangent-propagation through learning**. The resulting three-factor learning rules enable long-timescale credit assignment using only local information and delayed rewards, offering new insights into biologically grounded mechanisms for learning in recurrent circuits.
💡 Research Summary
This paper addresses a fundamental puzzle in neuroscience: how do biological neural networks learn complex behaviors from only sparse, delayed feedback, which requires solving the problem of structured credit assignment? While artificial recurrent neural networks (RNNs) typically rely on biologically implausible global algorithms such as backpropagation through time (BPTT), the brain is thought to rely on local synaptic plasticity rules. The space of such local rules capable of solving credit assignment under sparse rewards remains largely unexplored.
To explore this space, the authors propose a meta-learning framework. The core idea is not to hand-craft a learning rule but to discover effective ones through optimization. The framework operates with two nested loops. In the inner loop, an RNN performs a cognitive task over multiple episodes, and its recurrent weights are updated episode-by-episode using a parameterized local plasticity rule. This is a three-factor rule: it depends on (1) presynaptic activity, (2) postsynaptic activity, and (3) a reward prediction error that modulates a synaptic eligibility trace. The eligibility trace itself is a learned function of pre- and postsynaptic activity, parameterized by coefficients θ. Crucially, the reward signal is provided only at the end of each episode, mimicking sparse biological feedback.
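The inner-loop structure described above can be sketched as follows. This is a minimal illustration, not the paper's exact parameterization: the polynomial basis for the eligibility trace (Hebbian, post-only, and pre-only terms) and the function names are hypothetical stand-ins for the learned rule family.

```python
import numpy as np

def eligibility_update(E, pre, post, theta, decay=0.9):
    """Update a synaptic eligibility trace E (shape: post x pre).

    The trace is a decaying accumulation of local activity terms, with
    theta weighting a small polynomial basis of pre/post interactions.
    The actual basis used in the paper may differ; this one is illustrative.
    """
    basis = np.stack([
        np.outer(post, pre),                 # Hebbian co-activity term
        np.outer(post, np.ones_like(pre)),   # postsynaptic-only term
        np.outer(np.ones_like(post), pre),   # presynaptic-only term
    ])
    # Contract theta (shape 3,) against the basis stack (shape 3 x post x pre).
    return decay * E + np.tensordot(theta, basis, axes=1)

def three_factor_update(W, E, reward_prediction_error, lr=1e-3):
    """Apply the third factor: the delayed, episode-end reward prediction
    error gates the accumulated eligibility trace into a weight change."""
    return W + lr * reward_prediction_error * E
```

During an episode only `eligibility_update` runs at each timestep; `three_factor_update` fires once at episode end, when the sparse reward arrives, which is what lets the trace bridge the delay between activity and feedback.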
The outer meta-learning loop optimizes the parameters θ of the plasticity rule itself, with the objective of maximizing the total reward accumulated over many training episodes. Computing gradients through the entire extended learning trajectory is computationally challenging, so the authors combine a REINFORCE-style policy gradient estimator with "tangent propagation through learning": forward-mode differentiation of the learning dynamics that propagates tangents (sensitivities) of network states, eligibility traces, and weights with respect to θ, yielding an estimate of the meta-gradient.
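The essence of tangent propagation through learning is to carry, alongside the learning trajectory itself, the derivative of every evolving quantity with respect to the meta-parameters. A minimal sketch, using a toy scalar inner problem in which θ plays the role of a single plasticity coefficient (here, an inner learning rate); the functions and constants are illustrative, not from the paper:

```python
import numpy as np

def inner_loss_grad(w, x=2.0):
    # Toy inner loss L(w) = 0.5 * (w*x - 1)^2; this returns dL/dw.
    return (w * x - 1.0) * x

def meta_gradient_tangent(theta, w0=0.0, steps=20, x=2.0):
    """Forward-mode ('tangent') propagation through a learning trajectory.

    We carry the tangent dw/dtheta alongside the weight w at every inner
    step, then read off d(final loss)/dtheta by the chain rule. In the
    paper this bookkeeping extends to network states and eligibility
    traces; here a single scalar weight suffices to show the mechanics.
    """
    w, dw = w0, 0.0  # state and its tangent w.r.t. theta
    for _ in range(steps):
        g = inner_loss_grad(w, x)
        dg = (x * x) * dw                    # dg/dtheta via dg/dw = x^2
        w, dw = w - theta * g, dw - g - theta * dg
    final_loss = 0.5 * (w * x - 1.0) ** 2
    dloss_dtheta = (w * x - 1.0) * x * dw    # chain rule at trajectory end
    return final_loss, dloss_dtheta
```

Because the tangent is pushed forward step by step, memory cost stays constant in trajectory length, in contrast to reverse-mode backpropagation through the whole learning history.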
Through this process, the meta-learner discovers families of local plasticity rules that enable the RNN to solve tasks requiring long-timescale credit assignment. The discovered rules are not simple Hebbian correlations but more complex polynomial interactions. The paper shows that different parameterizations lead to qualitatively different learning trajectories and internal neural representations, analogous to how different gradient-based optimizers can shape learning in artificial networks.
In summary, this work introduces a powerful bottom-up approach for discovering biologically plausible learning mechanisms. Instead of imposing top-down approximations of backpropagation, it uses meta-optimization to search the vast space of local synaptic update rules, revealing candidates that could explain how the brain performs structured learning with sparse, delayed rewards. The technical innovation of tangent-propagation through learning provides a scalable method for this meta-search, offering new computational tools and theoretical insights into the principles of learning in neural circuits.