Flexible inference for animal learning rules using neural networks


Understanding how animals learn is a central challenge in neuroscience, with growing relevance to the development of animal- or human-aligned artificial intelligence. However, existing approaches tend to assume fixed parametric forms for the learning rule (e.g., Q-learning, policy gradient), which may not accurately describe the complex forms of learning employed by animals in realistic settings. Here we address this gap by developing a framework to infer learning rules directly from behavioral data collected during de novo task learning. We assume that animals follow a decision policy parameterized by a generalized linear model (GLM), and we model their learning rule – the mapping from task covariates to per-trial weight updates – using a deep neural network (DNN). This formulation allows flexible, data-driven inference of learning rules while maintaining an interpretable form of the decision policy itself. To capture more complex learning dynamics, we introduce a recurrent neural network (RNN) variant that relaxes the Markovian assumption that learning depends solely on covariates of the current trial, allowing for learning rules that integrate information over multiple trials. Simulations demonstrate that the framework can recover ground-truth learning rules. We applied our DNN and RNN-based methods to a large behavioral dataset from mice learning to perform a sensory decision-making task and found that they outperformed traditional RL learning rules at predicting the learning trajectories of held-out mice. The inferred learning rules exhibited reward-history-dependent learning dynamics, with larger updates following sequences of rewarded trials. Overall, these methods provide a flexible framework for inferring learning rules from behavioral data in de novo learning tasks, setting the stage for improved animal training protocols and the development of behavioral digital twins.


💡 Research Summary

Understanding animal learning has traditionally relied on fitting behavior to a small set of predefined reinforcement‑learning (RL) algorithms such as Q‑learning or policy‑gradient methods. These parametric approaches assume that the learning rule—i.e., the mapping from trial‑by‑trial task variables to weight updates—is known a priori, an assumption that is increasingly recognized as unrealistic for complex, de‑novo learning tasks. In this paper, Liu, Geadah, and Pillow introduce a data‑driven framework that infers the learning rule directly from behavioral time series without imposing a fixed functional form.

The core of the framework is a two‑level model. At the lower level, the animal’s decision policy on each trial is represented by a Bernoulli generalized linear model (GLM) with a weight vector $w_t$ that linearly combines task covariates (stimulus intensity and a bias term) to produce a choice probability via a logistic link. The GLM provides an interpretable, low‑dimensional description of the policy that evolves over the course of learning. At the higher level, the weight update $\Delta w_t$ is modeled as a flexible function $f_\theta$ of the current trial’s variables $(w_t, x_t, y_t, r_t)$. Two parameterizations of $f_\theta$ are explored:

  1. DNNGLM (feed‑forward deep neural network) – a multilayer perceptron with two hidden layers that maps the current trial’s inputs directly to $\Delta w_t$. This architecture can capture any Markovian learning rule, i.e., updates that depend only on the present trial.

  2. RNNGLM (recurrent neural network) – a GRU‑based recurrent module that first aggregates a hidden state $h_t$ from the entire trial history and then feeds $h_t$ into the same feed‑forward network to produce $\Delta w_t$. This design enables the model to learn arbitrary non‑Markovian dependencies, such as eligibility‑trace‑like effects where past rewards influence current updates.
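The two architectures can be sketched in plain numpy. This is an illustrative minimal version: the layer sizes, initialization scale, and omission of bias terms are our simplifications, not the paper's actual hyperparameters.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """Two-hidden-layer perceptron: the DNNGLM maps the current trial's
    variables directly to the weight update (biases omitted for brevity)."""
    def __init__(self, d_in, d_hidden, d_out, rng):
        s = 0.1  # illustrative initialization scale
        self.W1 = s * rng.standard_normal((d_hidden, d_in))
        self.W2 = s * rng.standard_normal((d_hidden, d_hidden))
        self.W3 = s * rng.standard_normal((d_out, d_hidden))
    def __call__(self, v):
        h = np.tanh(self.W1 @ v)
        h = np.tanh(self.W2 @ h)
        return self.W3 @ h

class GRUCell:
    """Minimal GRU cell: the RNNGLM first aggregates the trial history
    into a hidden state h_t, then feeds h_t to a feed-forward head."""
    def __init__(self, d_in, d_h, rng):
        s = 0.1
        self.Wz = s * rng.standard_normal((d_h, d_in + d_h))  # update gate
        self.Wr = s * rng.standard_normal((d_h, d_in + d_h))  # reset gate
        self.Wh = s * rng.standard_normal((d_h, d_in + d_h))  # candidate state
    def __call__(self, v, h):
        vh = np.concatenate([v, h])
        z = sigm(self.Wz @ vh)
        r = sigm(self.Wr @ vh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([v, r * h]))
        return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
trial_vars = rng.standard_normal(4)  # stand-in for (w_t, x_t, y_t, r_t)

# DNNGLM: Markovian map from the current trial to the weight update
dw_markov = MLP(4, 16, 2, rng)(trial_vars)

# RNNGLM: the GRU hidden state carries history; the head maps h_t to the update
h_t = GRUCell(4, 8, rng)(trial_vars, np.zeros(8))
dw_history = MLP(8, 16, 2, rng)(h_t)
```

The only structural difference is what the feed‑forward head sees: the raw trial variables (DNNGLM) versus a recurrent summary of all past trials (RNNGLM).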

Both models are trained by maximizing the log‑likelihood of the observed choices under the dynamic GLM, which is equivalent to minimizing binary cross‑entropy across trials. The authors employ cross‑validation and the inherent regularization of neural networks to avoid over‑fitting.
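The objective can be written down compactly; the sketch below assumes the learned update rule has already been unrolled to give per-trial weights (the function name and array layout are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def session_nll(weights, covariates, choices, eps=1e-12):
    """Negative log-likelihood of observed binary choices under the dynamic
    Bernoulli GLM; equal to the binary cross-entropy summed over trials.
    weights[t] is the policy weight vector on trial t, covariates[t] the
    matching task covariates (stimulus and bias)."""
    probs = sigmoid(np.einsum("td,td->t", weights, covariates))
    p = np.clip(probs, eps, 1.0 - eps)  # guard the logs
    y = np.asarray(choices, dtype=float)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

Minimizing this quantity over the network parameters $\theta$ (e.g., by backpropagating through the unrolled weight trajectory) is exactly maximizing the choice log‑likelihood.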

Simulation validation. The authors first simulate agents that learn via the classic REINFORCE policy‑gradient rule. Using synthetic data, DNNGLM accurately recovers the ground‑truth weight‑update surface: the magnitude of $\Delta w$ varies with stimulus contrast, current weight magnitude, and whether the trial was correct. Reconstruction error (RMSE) declines sharply with increasing data, demonstrating consistency. Next, they construct a non‑Markovian variant of REINFORCE that incorporates an eligibility‑trace term summing contributions from the previous three trials. Only RNNGLM captures the resulting history‑dependent amplification of weight updates after sequences of rewarded trials; DNNGLM (even when augmented with manually engineered history features) fails to reproduce this effect, highlighting the necessity of recurrent architectures for truly non‑Markovian learning.
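For a logistic‑GLM policy, the REINFORCE gradient has a closed form, which makes the simulated ground truth easy to write down. The learning rate and trace decay below are illustrative placeholders; the three‑trial window follows the setup described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_update(w, x, y, r, alpha=0.5):
    """Markovian REINFORCE update for a Bernoulli logistic policy:
    dw = alpha * r * grad_w log pi(y|x) = alpha * r * (y - p) * x,
    where p = sigmoid(w . x)."""
    p = sigmoid(np.dot(w, x))
    return alpha * r * (y - p) * x

def traced_reinforce_update(history, alpha=0.5, decay=0.7):
    """Non-Markovian variant: an eligibility-trace sum over the last three
    trials, each a (w, x, y, r) tuple ordered oldest to newest.
    The decay constant here is illustrative, not from the paper."""
    dw = 0.0
    for k, (w, x, y, r) in enumerate(reversed(history[-3:])):
        p = sigmoid(np.dot(w, x))
        dw = dw + (decay ** k) * alpha * r * (y - p) * x
    return dw
```

Runs of rewarded trials accumulate same‑signed trace terms, so the traced rule produces the larger post‑reward‑streak updates that only RNNGLM recovers.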

Application to real data. The framework is applied to a large dataset from the International Brain Laboratory, where mice learn a visual discrimination task by turning a wheel. For each mouse, the authors fit DNNGLM and RNNGLM on the data of all other mice and then predict the held‑out mouse’s choice sequence. Both models achieve significantly higher test log‑likelihoods than a baseline REINFORCE model, with RNNGLM also outperforming a DNNGLM that includes the previous trial as an explicit input. The inferred weight trajectories (stimulus and bias weights) closely match those obtained by PsyTrack, a model‑agnostic method for tracking psychometric weights, confirming that the inferred weight dynamics agree with independent, model‑free estimates. Importantly, the inferred learning rules exhibit a clear dependence on recent reward history: weight changes are larger after strings of correct, rewarded trials, a pattern absent from standard RL rules.
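The evaluation protocol is a leave‑one‑animal‑out cross‑fit, which can be sketched as follows (the function name is ours):

```python
def leave_one_out_splits(mouse_ids):
    """Yield (train_mice, held_out_mouse) pairs: fit the learning rule on
    all other animals, then score its choice predictions on the held-out
    mouse's full learning trajectory."""
    for held_out in mouse_ids:
        yield [m for m in mouse_ids if m != held_out], held_out
```

Because the held‑out animal contributes nothing to fitting, a higher test log‑likelihood reflects a learning rule that generalizes across individuals rather than one memorized per mouse.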

Robustness checks. The authors explore several variations: (i) learning rules without outcome asymmetry, (ii) mixtures of different update functions across individuals, (iii) different initial weight settings, and (iv) longer trial sequences. Across all conditions, the DNNGLM/RNNGLM framework remains stable, and performance degrades gracefully when the initial weight estimate is perturbed. Sensitivity analyses show that the models are not driven by over‑parameterization; cross‑validation error remains low, indicating effective regularization.

Key contributions.

  1. Introduction of a non‑parametric, potentially non‑Markovian learning‑rule inference method that operates directly on behavioral time series.
  2. Combination of an interpretable GLM policy with flexible neural‑network‑based weight‑update functions, preserving interpretability while allowing rich dynamics.
  3. Demonstration that the inferred rules outperform classic RL algorithms in predicting unseen animal behavior, both in simulated and real‑world settings.
  4. Discovery of reward‑history‑dependent learning dynamics in mice, suggesting that animal learning may incorporate eligibility‑trace‑like mechanisms not captured by standard models.

Implications. By providing a systematic way to extract the underlying learning algorithm from behavior, this work opens avenues for designing more efficient training protocols, constructing accurate “behavioral digital twins” for simulation, and informing the development of biologically aligned artificial intelligence systems that mimic the flexibility of animal learning.

