Enhancing LLM-based Specification Generation via Program Slicing and Logical Deletion
Traditional formal specification generation methods are typically tailored to specific specification types, and therefore suffer from limited generality. In recent years, large language model (LLM)-based specification generation approaches have emerged, offering a new direction for improving the universality of automated specification synthesis. However, when dealing with complex control flow, LLMs often struggle to precisely generate complete specifications that cover substructures. Moreover, the distinctive verification pipelines adopted by existing approaches may incorrectly discard logically correct specifications, while verification tools alone cannot reliably identify correct specifications. To address these issues, we propose SLD-Spec, an LLM-based specification generation method that combines program slicing and logical deletion. Specifically, SLD-Spec augments the conventional specification generation framework with two key stages: (1) a program slicing stage that decomposes the target function into several smaller code slices, enabling LLMs to focus on more localized semantic structures and thereby improving specification relevance and completeness; and (2) a logical deletion stage that leverages LLMs to perform logical reasoning and filtering over candidate specifications so as to retain logically correct ones. Experimental results show that SLD-Spec consistently outperforms existing methods on datasets containing programs of varying complexity, verifying more programs and generating specifications that are more relevant and more complete. Further ablation studies indicate that program slicing mainly improves specification relevance and completeness, whereas logical deletion plays a key role in increasing verification success rates.
💡 Research Summary
The paper introduces SLD‑Spec, a novel framework that enhances large‑language‑model (LLM)‑based formal specification generation by integrating program slicing and a “logical deletion” stage. Traditional LLM‑driven approaches such as AutoSpec and SpecGen suffer from two major drawbacks: (1) when a function contains complex control‑flow, feeding the whole function to an LLM leads to ambiguous scope interpretation, causing missing or irrelevant specifications for sub‑structures; (2) verification tools are used as filters, but their strict syntactic checks often discard specifications that are logically correct but lack sufficient contextual information, resulting in over‑pruning.
SLD‑Spec restructures the classic guess‑and‑verify pipeline into four phases: slice → guess → logical‑delete → verify.
- Slicing Phase – A static analysis pass builds a function‑call graph (FCG) and extracts each target function in a bottom‑up order. An automatic variable‑relevance algorithm identifies slicing criteria, and the function is decomposed into a set of independent code slices (typically one per loop, conditional block, or tightly coupled statement group). By reducing the amount of code presented to the LLM, each slice becomes a focused semantic unit, mitigating the "context‑overload" problem.
- Guessing Phase – For each slice, a prompt is constructed that includes a few‑shot example of specifications in the ANSI/ISO C Specification Language (ACSL). The LLM (e.g., GPT‑4) is queried to generate a candidate set of specifications for that slice. Because the prompt contains only the slice's code, the model can reason about the local semantics without distraction from unrelated parts of the function.
- Logical Deletion Phase – Instead of directly feeding the candidates to a verification tool, SLD‑Spec uses the LLM as a judge. The logical deletion pipeline consists of four steps: (i) exclusion of obviously irrelevant candidates, (ii) comprehension of each candidate's meaning, (iii) logical reasoning that checks consistency between the candidate and the slice's code (e.g., loop invariants vs. loop body, pre‑conditions vs. variable ranges), and (iv) output of the retained, logically sound specifications. This stage filters out spurious candidates while preserving those that may be rejected by a purely syntactic verifier due to missing contextual cues.
- Verification Phase – All retained specifications from all slices are aggregated and submitted to a formal verification tool (e.g., Frama‑C, VeriFast). If verification fails, the tool's counter‑example information is fed back to the logical deletion module, which removes the offending specifications and repeats the verification until either the whole function verifies or only harmless assertion violations remain.
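To make the slicing phase concrete, here is a minimal sketch in C. The function `sum_and_max` and its decomposition are illustrative examples of my own, not taken from the paper: its two computations have disjoint data dependences, so a variable‑relevance analysis with criteria `total` and `max` would separate them into two independent slices, each small enough for an LLM to specify in isolation.

```c
#include <assert.h>

/* Original (unsliced) function: two independent computations
 * interleaved in one loop. Feeding this whole body to the LLM is
 * the "context-overload" situation SLD-Spec tries to avoid. */
int sum_and_max(const int *a, int n, int *max_out) {
    int total = 0, max = a[0];
    for (int i = 0; i < n; i++) {
        total += a[i];
        if (a[i] > max)
            max = a[i];
    }
    *max_out = max;
    return total;
}

/* Slice 1: slicing criterion is the variable `total`. */
int slice_sum(const int *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Slice 2: slicing criterion is the variable `max`. */
int slice_max(const int *a, int n) {
    int max = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > max)
            max = a[i];
    return max;
}
```

Each slice preserves exactly the statements its criterion variable depends on, so specifications generated per slice remain valid for the original function.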
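The guessing and logical deletion phases can be illustrated with a hypothetical summation slice annotated with candidate ACSL specifications (the function name and exact annotations below are my sketch, and may need adjustment to pass Frama‑C as written). ACSL lives in special `/*@ ... @*/` comments, so the file still compiles as plain C. Logical deletion would retain `loop invariant 0 <= i <= n`, which is consistent with the loop bounds, while a candidate such as `loop invariant total == 0` contradicts the body's update of `total` and would be discarded before verification.

```c
/* Candidate ACSL specifications for a summation slice.
 * Only a verifier such as Frama-C interprets the annotations;
 * the C compiler treats them as ordinary comments. */

/*@ requires n >= 0;
  @ requires \valid_read(a + (0 .. n - 1));
  @ assigns \nothing;
  @*/
int sum_slice(const int *a, int n) {
    int total = 0;
    /*@ loop invariant 0 <= i <= n;
      @ loop assigns i, total;
      @ loop variant n - i;
      @*/
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Because the slice is small, each retained annotation can be checked against a local semantic fact (loop bounds, modified variables, termination measure) rather than against the whole function's state.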
Experimental Evaluation – Two datasets were used: (a) a “simple” benchmark identical to prior work (51 programs) and (b) a newly constructed “complex control‑flow” benchmark (11 programs) containing longer code and nested loops/conditionals. Compared with AutoSpec and SpecGen, SLD‑Spec verified 37/51 programs on the simple set and 10/11 on the complex set, outperforming the baselines by a large margin. Metrics such as the number of generated specifications, precision (correctness), relevance (semantic alignment with code), and verification success rate were all higher for SLD‑Spec. Ablation studies showed that removing the slicing stage mainly reduced relevance and completeness, while removing logical deletion sharply lowered verification success, confirming the complementary roles of the two components.
Contributions –
- A fully automated pipeline that couples program slicing with LLM‑driven specification generation, improving both quantity and quality of specifications.
- Introduction of a logical deletion mechanism that treats the LLM as a reasoning engine, bridging the gap between LLM output and formal verification tools.
- Creation of a challenging benchmark with complex control flow and an expanded evaluation metric suite.
- Open‑source release of the implementation, datasets, and experimental results to foster reproducibility.
Limitations & Future Work – The approach assumes that slicing does not sever essential cross‑slice dependencies; handling such cases may require inter‑slice information flow analysis. Logical deletion relies heavily on the LLM’s reasoning ability; future work could explore ensemble LLMs or external reasoning modules to increase robustness. Finally, optimizing slice granularity to balance LLM workload and verification overhead remains an open research direction.
In summary, SLD‑Spec demonstrates that carefully structuring the input to LLMs (via slicing) and employing the LLM itself for logical filtering (via logical deletion) can substantially improve the practicality of automated formal specification generation for real‑world C programs.