Entropy-Lens: Uncovering Decision Strategies in LLMs
Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

In large language models (LLMs), each block operates on the residual stream to map input token sequences to output token distributions. However, most of the interpretability literature focuses on internal latent representations, leaving token-space dynamics underexplored. The high dimensionality and categorical nature of token distributions hinder their analysis, since standard statistical descriptors do not apply. We show that the entropy of logit-lens predictions overcomes these issues, providing a per-layer, permutation-invariant scalar metric. We introduce Entropy-Lens to distill the token-space dynamics of the residual stream into a low-dimensional signal, which we call the entropy profile. We apply our method to a variety of model sizes and families, showing that (i) entropy profiles uncover token prediction dynamics driven by expansion and pruning strategies; (ii) these dynamics are family-specific and invariant under depth rescaling; (iii) they are characteristic of task type and output format; (iv) these strategies have unequal impact on downstream performance, with the expansion strategy usually being more critical. Ultimately, our findings further enhance our understanding of the residual stream, enabling a granular assessment of how information is processed across model depth.


💡 Research Summary

The paper introduces Entropy‑Lens, a simple yet powerful framework for probing the token‑space dynamics of large language models (LLMs) by measuring the entropy of logit‑lens predictions at each transformer layer. Traditional interpretability work has largely focused on the high‑dimensional residual stream embeddings, which are difficult to relate directly to token‑level decisions because vocabulary distributions are unordered and extremely high‑dimensional. By projecting the residual stream into the output vocabulary via the logit‑lens and computing Shannon entropy (or more generally Rényi entropy) of the resulting probability distribution, the authors obtain a single scalar per layer per token. This scalar—called the entropy profile—serves as a low‑dimensional, permutation‑invariant summary of how many candidate tokens the model is considering at each depth.
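
The entropy profile described above can be sketched in a few lines. The following is a minimal illustration, assuming logit-lens projections for one token position are already available as a `(num_layers, vocab_size)` array; the function name and shapes are ours, not the paper's code.

```python
import numpy as np

def entropy_profile(layer_logits):
    """Shannon entropy (in nats) of the logit-lens distribution at each layer.

    `layer_logits`: assumed array of shape (num_layers, vocab_size) holding
    the residual stream projected through the unembedding matrix per layer.
    """
    logits = np.asarray(layer_logits, dtype=np.float64)
    # Numerically stable softmax over the vocabulary, per layer.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    # H = -sum_v p_v log p_v, with 0*log(0) terms treated as 0.
    return -(p * np.log(np.clip(p, 1e-300, None))).sum(axis=-1)
```

A uniform distribution over a vocabulary of size V yields the maximum entropy log V, while a near-one-hot distribution yields entropy close to zero, so the profile directly tracks how many candidates the model is still weighing at each depth.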

The authors argue that changes in entropy (ΔH_i = H_i – H_{i‑1}) can be interpreted as “expansion” (entropy increase, more candidates) or “pruning” (entropy decrease, fewer candidates). They validate two key claims: (C1) ΔH_i is monotonically related to the change in the number of top‑p candidates, demonstrated by high Spearman correlations (0.73–0.88) across several models; (C2) adjacent layers share a large fraction of their top‑p candidate sets (60‑100% overlap), confirming that the candidate pool evolves gradually rather than jumping arbitrarily.
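
Claims C1 and C2 can be mirrored with simple helpers: the sign of ΔH_i labels a layer step as expansion or pruning, and the overlap of top-p candidate sets between adjacent layers measures how gradually the pool evolves. This is an illustrative sketch under our own naming and a Jaccard overlap measure; the paper's exact thresholds and overlap definition may differ.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=np.float64) - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def top_p_set(probs, p=0.9):
    """Smallest set of token ids whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, p)) + 1
    return set(order[:k].tolist())

def classify_steps(entropies, eps=0.0):
    """Label each layer-to-layer step by the sign of dH_i = H_i - H_{i-1}."""
    return ["expansion" if d > eps else "pruning" for d in np.diff(entropies)]

def overlap(a, b):
    """Jaccard overlap between two candidate sets (a C2-style check)."""
    return len(a & b) / len(a | b)
```

Tracking `top_p_set` across layers and correlating `len`-changes with `classify_steps` is the shape of the C1 check; averaging `overlap` between consecutive layers is the shape of the C2 check.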

Using this metric, the paper conducts a systematic study across model families (Llama‑3.2, Gemma‑2) and scales (1B to 9B parameters). Each family exhibits a characteristic alternation of expansion and pruning phases that is stable under depth rescaling. For example, Llama‑3.2 shows a sharp entropy rise in early layers followed by a gentle decline, whereas Gemma‑2 displays a smoother rise and a modest decline throughout. These patterns reflect architectural and training differences.

Task‑level analyses reveal that different generation objectives produce distinct entropy signatures. Creative tasks such as poetry or story generation start with high entropy (broad exploration) and then prune aggressively to lock in stylistic constraints, while factual QA or syntactic checking maintain lower entropy throughout, indicating a tighter candidate set. Output formats (chat logs vs. scientific articles) also modulate the profile, with dialogue models showing more frequent pruning cycles.

To assess functional importance, the authors run multiple‑choice benchmarks where they artificially fix or perturb entropy at specific layers. Results show that expansion phases have a larger impact on downstream accuracy than pruning phases, suggesting that early broad exploration is crucial for final performance.

The study also explores Rényi entropy with various α values (0.5–2.0) and finds that the qualitative shape of entropy profiles remains stable, justifying the use of Shannon entropy as a parameter‑free default.
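
Rényi entropy generalizes Shannon entropy via a single order parameter α, with α → 1 recovering the Shannon case. A small sketch of this computation (the function name and α handling are our assumptions, not the paper's code):

```python
import numpy as np

def renyi_entropy(probs, alpha):
    """Renyi entropy H_a(p) = log(sum_v p_v^a) / (1 - a), in nats.

    As a -> 1 this converges to Shannon entropy -sum p log p, which is
    returned explicitly here to avoid the 0/0 limit.
    """
    p = np.asarray(probs, dtype=np.float64)
    p = p[p > 0]  # zero-probability tokens contribute nothing
    if abs(alpha - 1.0) < 1e-8:
        return float(-(p * np.log(p)).sum())  # Shannon limit
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))
```

For a uniform distribution H_α equals log V for every α, and in general H_α is non-increasing in α, so sweeping α over the paper's 0.5–2.0 range rescales a profile without reordering it, consistent with the reported stability of profile shapes.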

Overall, Entropy‑Lens provides a model‑agnostic, gradient‑free tool that compresses the complex, high‑dimensional token‑space dynamics of LLMs into an interpretable scalar trajectory. It enables researchers and engineers to visualize, compare, and diagnose model behavior across depth, architecture, task, and output style, and offers new insights into how LLMs balance candidate expansion and pruning during generation.

