AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion


Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query-enhancement mechanism and a reinforcement-learning-based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score over baselines on the CrossCodeEval benchmark. The results show that our framework achieves superior performance and generalizes well across code LLMs and programming languages.


💡 Research Summary

Repository‑level code completion poses a unique challenge for code large language models (LLMs) because the target code often depends on context spread across many files and on domain‑specific knowledge that the model has not seen during pre‑training. Existing retrieval‑augmented generation (RAG) approaches mitigate this by fetching cross‑file snippets, but they suffer from two fundamental shortcomings. First, the query used for retrieval is typically the unfinished code (Cu) itself, which lacks the key tokens that appear only in the target code (Ct). This creates a semantic gap that makes it difficult to retrieve truly relevant snippets. Second, the retrievers (both sparse such as BM25 and dense such as UniXcoder) are not trained to exploit the inference information produced by the LLM; they treat the query as a static string and cannot learn the relationship between a generated candidate completion and the repository.

AlignCoder addresses both problems with a two‑stage framework. In the query‑enhancement stage, a sampler generates multiple candidate completions {Ĉ₁,…,Ĉₙ} for the unfinished code. These candidates are concatenated with Cu to form an enhanced query Q̂ = Cu ⊕ Ĉ₁ ⊕ … ⊕ Ĉₙ. By injecting diverse possible continuations, the enhanced query is far more likely to contain the “key tokens” that bridge the semantic gap to the true target. In the retriever‑training stage, a new retriever called AlignRetriever is trained via reinforcement learning (RL). For each enhanced query Q̂, AlignRetriever retrieves a set of snippets Ŝ = {S₁,…,Sₘ} from the repository. The LLM then generates a final completion Ĉₜ using Cu together with Ŝ as context. The reward is defined as the negative perplexity of Ĉₜ under the LLM (r = –PPL(Ĉₜ | Q̂, Ŝ)). Using the REINFORCE algorithm, the retriever’s parameters are updated to maximize this reward, effectively teaching the retriever to select snippets that most reduce the model’s uncertainty about the target code.
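The query‑enhancement step above can be sketched in a few lines. Note that `sample_candidates` is a hypothetical stand‑in for the paper's sampler (the summary does not specify its interface); a real system would invoke the code LLM with temperature sampling, whereas the placeholder below simply fabricates n variants:

```python
def sample_candidates(unfinished_code, n):
    """Placeholder sampler: a real implementation would query the
    code LLM n times with temperature sampling to obtain diverse
    candidate completions C_hat_1 ... C_hat_n."""
    return [f"{unfinished_code.strip()}  # candidate {i}" for i in range(n)]


def build_enhanced_query(unfinished_code, n=10, sep="\n"):
    """Q_hat = Cu (+) C_hat_1 (+) ... (+) C_hat_n: concatenate the
    unfinished code Cu with n sampled candidate completions, so the
    query is more likely to contain the target's key tokens."""
    candidates = sample_candidates(unfinished_code, n)
    return sep.join([unfinished_code] + candidates)


enhanced = build_enhanced_query("def load_config(path):", n=3)
```

The separator and n=10 default mirror the sensitivity analysis reported below, but the concatenation details are an assumption of this sketch.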

The authors evaluate AlignCoder on two widely‑used benchmarks: CrossCodeEval (Python and Java) and RepoEval (line‑level and API‑level). Five backbone LLMs (e.g., DeepSeekCoder‑1B, StarCoder‑Base) serve as generators. Compared with strong baselines such as BM25‑RAG, ReACC, and RepoCoder, AlignCoder achieves an 18.1 percentage‑point increase in Exact Match (EM) on CrossCodeEval Python, with similarly large gains on Java and on RepoEval tasks. Ablation studies show that removing the query‑enhancement (using a single sample) drops performance by 6–8 pp, while training the retriever without RL reduces it by another 4–5 pp, confirming that both components are essential. Sensitivity analysis indicates that generating around n = 10 candidates and retrieving m = 5 snippets provides the best trade‑off between accuracy and computational cost.

Key insights from the paper include: (1) Multiple sampling dramatically raises the probability that at least one candidate contains the necessary key tokens, effectively shrinking the semantic gap; (2) Perplexity‑based RL rewards align the retriever’s objective with the LLM’s confidence, enabling the retriever to learn how to exploit inference information; (3) The framework is model‑agnostic and language‑agnostic, delivering consistent improvements across different LLM sizes and programming languages, which demonstrates strong generalizability.
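Insight (2) can be made concrete with a minimal sketch of the perplexity‑based reward and the REINFORCE surrogate loss. The function names, and the optional baseline term for variance reduction, are illustrative assumptions rather than the paper's exact formulation:

```python
import math

def perplexity(token_logprobs):
    """PPL = exp(-(1/T) * sum_i log p(t_i)); lower perplexity means
    the LLM is more confident in the generated completion."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def reinforce_loss(retrieval_logprob, token_logprobs, baseline=0.0):
    """REINFORCE surrogate: minimizing -(r - b) * log pi(S_hat | Q_hat)
    pushes the retriever toward snippet sets S_hat whose reward
    r = -PPL(completion) is high, i.e. that reduce LLM uncertainty."""
    reward = -perplexity(token_logprobs)
    return -(reward - baseline) * retrieval_logprob
```

With all token log‑probabilities equal to 0 (probability 1), PPL is 1.0 and the reward is −1.0; less confident completions yield higher perplexity, lower reward, and thus a weaker gradient signal for the retrieval choice that produced them.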

The authors acknowledge limitations: the multiple‑sampling step incurs extra compute, though it can be parallelized; the reward function currently only reflects perplexity and does not directly measure downstream criteria such as compilation success or test‑suite passing. Future work may explore richer multi‑objective rewards, candidate‑quality filtering before query construction, attention‑based weighting of retrieved snippets, and efficient incremental indexing for large, continuously evolving codebases.

In summary, AlignCoder introduces a principled way to align retrieval with the target intent by enriching the query with diverse candidate completions and training the retriever via RL on perplexity rewards. This dual strategy closes the semantic gap and enables the retriever to make use of the LLM’s own inference, leading to substantial and robust performance gains in repository‑level code completion.

