Bridging Graph Structure and Knowledge-Guided Editing for Interpretable Temporal Knowledge Graph Reasoning
Temporal knowledge graph reasoning (TKGR) aims to predict future events by inferring missing entities within dynamic knowledge structures. Existing LLM-based reasoning methods prioritize contextual over structural relations and struggle to extract relevant subgraphs from dynamic graphs. This limits their grasp of structural information, leading to unstructured, hallucination-prone inferences, especially in the presence of temporal inconsistencies. To address this problem, we propose IGETR (Integration of Graph and Editing-enhanced Temporal Reasoning), a hybrid reasoning framework that combines the structured temporal modeling capabilities of Graph Neural Networks (GNNs) with the contextual understanding of LLMs. IGETR operates through a three-stage pipeline. The first stage grounds the reasoning process in the actual data by identifying structurally and temporally coherent candidate paths through a temporal GNN, ensuring that inference starts from reliable graph-based evidence. The second stage introduces LLM-guided path editing to address logical and semantic inconsistencies, leveraging external knowledge to refine and enhance the initial paths. The final stage integrates the refined reasoning paths to produce predictions that are both accurate and interpretable. Experiments on standard TKG benchmarks show that IGETR achieves state-of-the-art performance, outperforming strong baselines with relative improvements of up to 5.6% on Hits@1 and 8.1% on Hits@3 on the challenging ICEWS datasets. Additionally, ablation studies and further analyses confirm the effectiveness of each component.
💡 Research Summary
The paper introduces IGETR (Integration of Graph and Editing‑enhanced Temporal Reasoning), a novel hybrid framework for temporal knowledge graph reasoning (TKGR) that simultaneously leverages the structural strengths of graph neural networks (GNNs) and the contextual, world‑knowledge capabilities of large language models (LLMs). Existing approaches fall into two camps: pure GNN‑based methods excel at capturing observed structural and temporal dependencies but are limited to the data they see, lacking mechanisms to correct noisy edges or hypothesize unseen relations; pure LLM‑based methods bring extensive pre‑trained knowledge and flexible reasoning but suffer from hallucinations, temporal inconsistencies, and a lack of controllable, interpretable reasoning paths, especially in dynamic settings where the knowledge graph evolves over time. IGETR bridges this gap through a three‑stage pipeline.
Stage A – GNN‑based candidate extraction: A temporal GNN equipped with attention‑driven edge sampling extracts a set of candidate entities and associated multi‑hop reasoning paths for a given query (subject, relation, future timestamp). The attention scores provide a quantitative measure of each path’s relevance, ensuring that the initial reasoning is grounded in graph‑derived evidence that respects chronological proximity while preserving diversity through stratified attention.
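The paper does not publish Stage A's implementation, but its behavior can be illustrated with a toy sketch. Here, `attention_score` stands in for the learned attention weights of the temporal GNN (a simple recency decay is assumed), and `extract_paths` performs a small beam search over multi-hop paths; all names and the decay form are illustrative assumptions, not the authors' code.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    rel: str
    dst: str
    ts: int  # timestamp of the event

def attention_score(edge: Edge, query_ts: int, decay: float = 0.1) -> float:
    """Toy stand-in for a learned attention weight: favors edges
    chronologically close to (and no later than) the query timestamp."""
    gap = query_ts - edge.ts
    if gap < 0:          # future edges are invisible at query time
        return 0.0
    return math.exp(-decay * gap)

def extract_paths(edges, subject, query_ts, hops=2, beam=3):
    """Beam-search multi-hop paths starting at `subject`, keeping the
    `beam` highest-attention extensions at each hop. Returns a list of
    (edge trace, reached entity, accumulated score) triples."""
    paths = [([], subject, 1.0)]
    for _ in range(hops):
        expanded = []
        for trace, node, score in paths:
            for e in edges:
                if e.src == node:
                    a = attention_score(e, query_ts)
                    if a > 0:
                        expanded.append((trace + [e], e.dst, score * a))
        expanded.sort(key=lambda p: p[2], reverse=True)
        paths = expanded[:beam] if expanded else paths
    return paths
```

A real temporal GNN would learn these scores from message passing and use stratified sampling for diversity; the sketch only shows how attention over chronologically valid edges yields ranked candidate paths.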
Stage B – LLM‑guided path editing: The candidate paths are fed to an LLM via a carefully crafted prompt that asks the model to “refine the path step‑by‑step, considering the query, the candidate score, and external knowledge.” The LLM acts as a logical validator: it detects temporal inversions, contradictory relations, irrelevant entities, and can insert missing but plausible links using its background knowledge. This editing step mitigates the hallucination problem of pure LLM generation and injects semantic richness that pure GNNs cannot provide.
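Stage B's exact prompt template is not reproduced in the paper, but the mechanics can be sketched: a validator that flags temporal inversions (the defect the LLM is asked to repair), and a prompt builder echoing the quoted instruction. Both function names and the prompt layout are assumptions for illustration.

```python
def detect_temporal_inversions(path):
    """Return the indices of hops whose timestamp precedes the previous
    hop's timestamp -- the kind of inconsistency Stage B targets.
    `path` is a list of (subject, relation, object, timestamp) tuples."""
    return [i for i in range(1, len(path)) if path[i][3] < path[i - 1][3]]

def build_edit_prompt(query, path, score):
    """Assemble an editing prompt in the spirit of the paper's quoted
    instruction (the exact template is an assumption)."""
    hops = " -> ".join(f"({s}, {r}, {o}, t={t})" for s, r, o, t in path)
    return (
        f"Query: {query}\n"
        f"Candidate path (score={score:.2f}): {hops}\n"
        "Refine the path step-by-step, considering the query, the "
        "candidate score, and external knowledge. Remove irrelevant "
        "entities, fix temporal inversions, and insert missing but "
        "plausible links."
    )
```

In the full framework the prompt is sent to an LLM, which returns an edited path plus a confidence signal consumed by the next stage; the sketch covers only the deterministic scaffolding around that call.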
Stage C – Graph Transformer integration: The edited paths are processed by a graph transformer that dynamically weights each path based on three factors: (i) structural coherence (how well the path follows graph topology), (ii) temporal relevance (time gaps between consecutive edges), and (iii) the LLM‑assigned edit confidence. The transformer aggregates multi‑hop evidence into a final score for each candidate entity while preserving an interpretable chain of reasoning that can be inspected by users.
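The three weighting factors of Stage C can be illustrated with a minimal aggregation sketch. A convex combination stands in for the graph transformer's learned weighting (the mixing coefficients are illustrative), and per-candidate scores are summed over the edited paths that support each candidate.

```python
def path_weight(struct_coherence, temporal_relevance, edit_confidence,
                alphas=(0.4, 0.3, 0.3)):
    """Convex combination of the three factors described for Stage C:
    structural coherence, temporal relevance, and the LLM-assigned edit
    confidence. The weights are illustrative, not learned."""
    a1, a2, a3 = alphas
    return a1 * struct_coherence + a2 * temporal_relevance + a3 * edit_confidence

def rank_candidates(paths):
    """Aggregate weighted path evidence per candidate entity.
    `paths` is a list of (candidate, struct, temporal, confidence)
    tuples; returns candidates sorted by total score, descending."""
    scores = {}
    for cand, s, t, c in paths:
        scores[cand] = scores.get(cand, 0.0) + path_weight(s, t, c)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Unlike this additive toy, the actual graph transformer attends across paths jointly; the sketch only shows how multi-path evidence is fused into a ranked, inspectable candidate list.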
The authors evaluate IGETR on three ICEWS benchmarks (ICEWS18, ICEWS14, ICEWS05‑15). Compared with strong baselines such as xERTE, CaORG, and RPC, IGETR achieves relative improvements of up to 5.6% on Hits@1 and 8.1% on Hits@3, as well as consistent gains in MRR. Ablation studies demonstrate that each component contributes meaningfully: removing the GNN stage (using only the LLM) drastically drops performance; omitting the LLM editing (using raw GNN paths) reduces temporal consistency; and discarding the transformer aggregation limits the model's ability to fuse multi‑hop evidence. Additional analyses reveal that the LLM editing reduces temporal inversion errors by over 70% and that longer paths benefit more from semantic refinement.
Key contributions are: (1) the first path‑refine reasoning framework that tightly couples temporal GNNs with knowledge‑augmented LLMs; (2) a three‑stage pipeline that grounds reasoning in data, refines it with external knowledge, and aggregates it in an interpretable transformer; (3) extensive empirical validation confirming state‑of‑the‑art performance and interpretability; and (4) a demonstration that such a hybrid approach is especially suitable for high‑stakes domains (e.g., real‑time decision making, risk assessment, policy planning) where both predictive accuracy and transparent justification are essential.