Beyond Blame: Rethinking SZZ with Knowledge Graph Search


Identifying Bug-Inducing Commits (BICs) is fundamental for understanding software defects and enabling downstream tasks such as defect prediction and automated program repair. Yet existing SZZ-based approaches are limited by their reliance on git blame, which restricts the search space to commits that directly modified the fixed lines. Our preliminary study on 2,102 validated bug-fixing commits reveals that this limitation is significant: over 40% of cases cannot be solved by blame alone, as 28% of BICs require traversing commit history beyond blame results and 14% are blameless. We present AgenticSZZ, the first approach to apply Temporal Knowledge Graphs (TKGs) to software evolution analysis. AgenticSZZ reframes BIC identification from a ranking problem over blame commits into a graph search problem, where temporal ordering is fundamental to causal reasoning about bug introduction. The approach operates in two phases: (1) constructing a TKG that encodes commits with temporal and structural relationships, expanding the search space by traversing file history backward from two reference points (blame commits and the BFC); and (2) leveraging an LLM agent to navigate the graph using specialized tools for candidate exploration and causal analysis. Evaluation on three datasets shows that AgenticSZZ achieves F1-scores of 0.48 to 0.74, with statistically significant improvements over state-of-the-art by up to 27%. Our ablation study confirms that both components are essential, reflecting a classic exploration-exploitation trade-off: the TKG expands the search space while the agent provides intelligent selection. By transforming BIC identification into a graph search problem, we open a new research direction for temporal and causal reasoning in software evolution analysis.


💡 Research Summary

The paper tackles a long‑standing limitation of SZZ‑based Bug‑Inducing Commit (BIC) identification: reliance on git blame confines the candidate set to commits that directly modified the lines deleted or changed in a bug‑fixing commit (BFC). By analyzing 2,102 validated BFC‑BIC pairs from three well‑known datasets (Linux, Apache, GitHub), the authors show that only 57% of ground‑truth BICs appear in the blame set, while 28% require traversing the commit graph beyond blame (either as ancestors of blame commits or as ancestors of the BFC), and 14% are “blameless” because the BFC contains only added lines. This empirical study demonstrates that any approach limited to blame candidates will miss a substantial portion of BICs.
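The three-way split described above can be sketched as a small classifier. This is an illustrative sketch only: the names `BicCategory` and `categorize` are hypothetical, and it assumes the blame set and the "fix only adds lines" flag have already been computed elsewhere.

```python
from enum import Enum


class BicCategory(Enum):
    IN_BLAME = "in blame set"
    BEYOND_BLAME = "reachable only beyond blame"
    BLAMELESS = "blameless (fix only adds lines)"


def categorize(bic, blame_commits, fix_touches_old_lines):
    """Place one validated BFC-BIC pair into the categories from the study.

    bic: sha of the ground-truth bug-inducing commit.
    blame_commits: set of shas returned by blame on the fixed lines.
    fix_touches_old_lines: False when the BFC contains only added lines.
    """
    if not fix_touches_old_lines:
        # Blame has no deleted or modified line to start from.
        return BicCategory.BLAMELESS
    if bic in blame_commits:
        return BicCategory.IN_BLAME
    # The BIC exists in history but blame never surfaces it.
    return BicCategory.BEYOND_BLAME
```

Only pairs in the first category are solvable by a method that ranks blame commits; the other two are the cases the summary says such methods miss.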

To overcome this, the authors introduce AgenticSZZ, the first method that frames BIC identification as a search problem over a Temporal Knowledge Graph (TKG) rather than a static ranking problem. AgenticSZZ operates in two phases:

  1. TKG Construction – Commits become nodes; edges encode temporal precedence, shared files, shared functions, authorship, and other structural relationships. Starting from two reference points (the set of blame commits and the BFC itself), the graph is expanded by traversing file history backward, thereby including “Blame Ancestor” and “BFC Ancestor” candidates. The authors limit traversal to commits that touched the same files, dramatically reducing the search space (a median depth of a few hundred commits, versus ≈ 13 k when unrestricted).

  2. Agentic BIC Search – A large language model (LLM) agent navigates the TKG using four specialized tools: candidate enumeration, structural traversal, property queries, and causal analysis. The agent formulates natural‑language questions such as “Could this commit have introduced the bug?” and iteratively refines its belief by examining code changes, temporal ordering, and functional dependencies. By reasoning about causality (a commit can induce a bug only if it precedes the BFC), the agent can prioritize older commits that introduced the faulty logic the fix later corrects.
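The candidate expansion in phase 1 and the temporal constraint used in phase 2 can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the `Commit` record, the `expand_candidates` function, the label strings, and the `max_per_file` cap are all assumptions made for illustration, and a real system would read history from git rather than from a list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    timestamp: int          # commit time, e.g. Unix seconds
    files: frozenset[str]   # paths touched by the commit


def expand_candidates(history, bfc, blame_shas, max_per_file=50):
    """Collect candidate BICs by walking file history backward.

    Returns a dict mapping candidate sha -> set of labels
    ("blame", "blame_ancestor", "bfc_ancestor").
    """
    by_sha = {c.sha: c for c in history}
    labels: dict[str, set[str]] = {}
    for sha in blame_shas:
        if sha in by_sha:
            labels.setdefault(sha, set()).add("blame")

    # Two kinds of reference point: each blame commit, and the BFC itself.
    refs = [(by_sha[s], "blame_ancestor") for s in blame_shas if s in by_sha]
    refs.append((bfc, "bfc_ancestor"))

    for ref, label in refs:
        for path in bfc.files:  # restrict traversal to files the fix touched
            older = sorted(
                (c for c in history
                 if path in c.files and c.timestamp < ref.timestamp),
                key=lambda c: c.timestamp,
                reverse=True,  # newest first, i.e. walking backward in time
            )
            for c in older[:max_per_file]:
                labels.setdefault(c.sha, set()).add(label)
    return labels
```

Because every candidate must be strictly older than its reference point, and every blame commit precedes the BFC, no commit made after the fix can enter the candidate set. In this sketch that timestamp comparison stands in for the temporal edges of the graph.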

The evaluation compares AgenticSZZ against eight baselines, including the recent LLM4SZZ, classic SZZ, refactoring‑aware SZZ, and deep‑learning‑based variants. Across the three datasets, AgenticSZZ achieves F1 scores ranging from 0.48 to 0.74, outperforming the best baseline by up to 27% (statistically significant with p < 0.01). An ablation study shows that removing the agent leads to excessive noise (low precision), while removing the TKG reduces recall dramatically, confirming that both components are essential and embody a classic exploration‑exploitation trade‑off: the TKG expands the search space, and the agent selects within it.

The paper also discusses “blameless” cases, proposing a fallback that applies blame to the two lines surrounding each added hunk, but acknowledges that this heuristic is insufficient for many real‑world bugs. Future work is outlined: integrating dynamic information such as test execution traces, employing multimodal LLMs that jointly process code, issue text, and execution logs, and scaling the approach to continuous integration pipelines for real‑time bug‑origin detection.
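One way to read the blameless fallback is sketched below: for each hunk that only adds lines, find the old-file lines immediately before and after the insertion point, then run blame on those. The function name `neighbor_lines` is hypothetical, and the sketch assumes zero-context unified diff input (as produced by `git diff -U0`), where a hunk header of the form `@@ -N,0 +M,K @@` means the new lines were inserted after old line N.

```python
import re

# Unified-diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def neighbor_lines(diff_u0):
    """Old-file line numbers adjacent to each pure-addition hunk."""
    result = []
    for row in diff_u0.splitlines():
        m = HUNK.match(row)
        if not m:
            continue
        old_start = int(m.group(1))
        # An omitted count means exactly one line.
        old_count = int(m.group(2)) if m.group(2) is not None else 1
        if old_count != 0:
            continue  # hunk deletes or modifies lines; ordinary blame applies
        # Line before the insertion and line after it (skip line 0).
        for n in (old_start, old_start + 1):
            if n >= 1:
                result.append(n)
    return result
```

Each returned line number could then be passed to `git blame -L n,n` on the parent of the BFC. The caller still has to drop numbers past the end of the old file, which occurs when lines are appended at the very end.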

In summary, AgenticSZZ redefines BIC identification as temporal‑graph search, leveraging a TKG to broaden the candidate space and an LLM agent to perform intelligent causal reasoning. This shift not only yields measurable performance gains but also opens a new research direction for temporal and causal reasoning in software evolution, with potential impact on defect prediction, automated program repair, and risk‑aware refactoring.

