Combining Tests and Proofs for Better Software Verification


Test or prove? These two approaches to software verification have long been presented as opposites. One is dynamic, the other static: a test executes the program, a proof only analyzes the program text. A different perspective is emerging, in which testing and proving are complementary rather than competing techniques for producing software of verified quality. Work performed over the past few years and reviewed here develops this complementarity by taking advantage of Design by Contract, as available in Eiffel, and exploiting a feature of modern program-proving tools based on Satisfiability Modulo Theories (SMT): counterexample generation. A counterexample is an input combination that makes the program fail. If we are trying to prove a program correct, we hope not to find any. One can, however, apply counterexample generation to incorrect programs, as a tool for automatic test generation. We can also introduce faults into a correct program and turn the counterexamples into an automatically generated regression test suite with full coverage. Additionally, we can use these mechanisms to help produce program fixes for incorrect programs, with a guarantee that the fixes are correct. All three applications, leveraging the mechanisms of Eiffel and Design by Contract, hold significant promise for addressing some of the challenges of program testing, software maintenance and Automatic Program Repair.


💡 Research Summary

The paper argues that testing and proving, traditionally seen as opposing verification techniques, can be combined into a unified workflow that leverages Design by Contract (DbC) and modern SMT‑based provers. The authors focus on Eiffel, a language that natively supports contracts (preconditions, postconditions, loop invariants, and variants), and on the EiffelStudio tool suite (AutoProof, AutoTest, AutoFix). The central observation is that many SMT‑based provers, such as Z3 used through Boogie, attempt to prove a property by searching for a counterexample. If a counterexample is found, the proof fails; if none is found, the proof succeeds. The authors turn the “failure” case into a source of concrete, actionable information.
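This duality can be made concrete with a toy sketch (in Python, not the Eiffel/Boogie/Z3 pipeline described in the paper): proving a contract and searching for a counterexample are two views of the same question. The routine, its contract, and the bounded exhaustive search below are all hypothetical stand-ins for the SMT solver's search.

```python
def halve(x: int) -> int:
    return x // 2          # faulty: loses information for odd x

def precondition(x: int) -> bool:
    return x >= 0

def postcondition(x: int, result: int) -> bool:
    return 2 * result == x  # intended contract: halving is exact

def find_counterexample(bound: int = 100):
    """Bounded exhaustive search: a found input disproves the contract;
    exhausting the (bounded) domain without a hit is evidence for it."""
    for x in range(bound):
        if precondition(x) and not postcondition(x, halve(x)):
            return x        # the "failed proof" yields concrete data
    return None

print(find_counterexample())  # the first odd input, 1, falsifies the postcondition
```

A real prover does not enumerate inputs, of course; the SMT solver searches symbolically, but the outcome has the same shape: either "no counterexample exists" (proof) or a concrete falsifying input (test).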

Three concrete techniques are presented:

  1. Proof2Test (Section 4) extracts the counterexample generated by a failed proof, minimizes it to human‑readable values, and automatically emits a runnable test case. This gives developers concrete evidence of why a proof failed, turning an abstract “postcondition violated” message into a specific input that reproduces the bug. The generated test is also stored as a regression test, ensuring that once the bug is fixed it will not reappear.

  2. Proof2Fix (Section 5) uses the same counterexamples as specifications for automatic program repair. Traditional APR approaches rely on manually written tests to validate candidate patches. Here, the contract‑based counterexample serves as a precise specification of the faulty behavior, and the SMT solver is used again to verify that a proposed fix satisfies all contracts. Consequently, any generated patch is guaranteed to be provably correct with respect to the given contracts.

  3. Seeding Contradiction (Section 6) deliberately injects contradictions into a correct program, causing the prover to fail. Proof2Test is then applied to the resulting counterexamples, producing a large set of tests automatically. By unrolling loops and varying inputs, the technique can satisfy demanding coverage criteria such as Modified Condition/Decision Coverage (MC/DC), which is often required in safety‑critical domains. The authors demonstrate that this systematic test generation yields substantially higher coverage than random testing tools.
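Seeding Contradiction, the last of the three techniques, can be sketched in a few lines of Python. A brute-force search stands in for the prover: planting an unsatisfiable assertion in one branch makes any input that reaches the branch a "counterexample", and that input doubles as a test covering the branch. The function names and the search strategy are illustrative, not the paper's implementation.

```python
def classify(x, seeded_branch=None):
    """A routine with three branches; `seeded_branch` selects which
    branch receives the deliberately false assertion."""
    if x < 0:
        assert seeded_branch != 0, "seeded contradiction"
        return "negative"
    elif x == 0:
        assert seeded_branch != 1, "seeded contradiction"
        return "zero"
    else:
        assert seeded_branch != 2, "seeded contradiction"
        return "positive"

def generate_branch_tests(n_branches=3, bound=10):
    """For each seeded branch, find an input whose run trips the seeded
    assertion; that input exercises exactly that branch."""
    suite = []
    for branch in range(n_branches):
        for x in range(-bound, bound):
            try:
                classify(x, seeded_branch=branch)
            except AssertionError:
                suite.append(x)   # counterexample becomes a test input
                break
    return suite

print(generate_branch_tests())  # one input per branch
```

One seeded contradiction per branch yields one test per branch, which is why the technique achieves branch (and, with finer seeding, condition-level) coverage by construction rather than by chance.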

The paper situates these contributions within a broader historical context, noting the long‑standing debate between “tests prove bugs exist, proofs prove they do not” (citing Dijkstra) and the recent emergence of the Tests and Proofs (TAP) community. It argues that static debugging (failed proofs) and dynamic debugging (failed tests) are complementary: static debugging provides a guarantee of correctness when successful, while dynamic debugging offers concrete evidence when something goes wrong. By turning proof failures into test cases, the authors bridge the gap between the two worlds.

Implementation details reveal that the workflow is tightly integrated into EiffelStudio. Contracts are embedded directly in the source code, enabling both runtime checking (for tests) and static reasoning (for proofs). AutoProof translates Eiffel code into Boogie, which in turn generates SMT‑LIB queries for Z3. The new tools (Proof2Test, Proof2Fix, and the seeding mechanism) parse the solver’s internal counterexample model, perform minimization (e.g., reducing large integer values to small, human‑readable numbers), and emit Eiffel test code or patch suggestions. The authors also discuss portability: while Eiffel provides native contract support, similar ideas could be applied to Java (via JML), C# (via Spec#), or any language that can be annotated with contracts.
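The minimization step mentioned above can be illustrated with a small sketch: starting from whatever large model values the solver happens to return, shrink each value toward zero while the proof obligation still fails. The predicate and names below are hypothetical, not the Proof2Test implementation.

```python
def violates(x: int) -> bool:
    # Stand-in for "this input makes the proof obligation fail".
    # Hypothetical monotone fault region for illustration only.
    return x >= 7

def minimize(value: int) -> int:
    """Binary-search the smallest value that still fails, so the
    emitted test uses a human-readable number instead of a huge one."""
    assert violates(value), "must start from a genuine counterexample"
    lo, hi = 0, value
    while lo < hi:
        mid = (lo + hi) // 2
        if violates(mid):
            hi = mid       # a smaller failing value exists
        else:
            lo = mid + 1
    return lo

print(minimize(982_451_653))  # a large solver value shrinks to 7
```

Real counterexample models involve several variables and heap structure, so minimization there is more involved, but the principle — preserve failure while reducing magnitude — is the same.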

Experimental evaluation includes several case studies (e.g., a MAX routine that returns the maximum element of an array). Proof2Test generated a minimal counterexample (array size 2, elements 0 and 1) that exposed a postcondition violation. Proof2Fix automatically repaired an off‑by‑one error and the repaired code passed both the generated regression test and the full static proof. Seeding Contradiction produced a suite of tests that achieved 100 % branch coverage and 85 % MC/DC on the same routine, outperforming existing random testing tools.
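The MAX case study can be reconstructed as a hypothetical Python sketch: an off-by-one loop bound makes the routine ignore the last element, and the two-element counterexample [0, 1] reported in the summary exposes the postcondition violation. The Eiffel original expresses the contract with `ensure` clauses; a plain predicate stands in here.

```python
def max_buggy(a):
    result = a[0]
    for i in range(len(a) - 1):      # off-by-one: last element skipped
        if a[i] > result:
            result = a[i]
    return result

def max_fixed(a):
    result = a[0]
    for i in range(len(a)):          # repaired loop bound
        if a[i] > result:
            result = a[i]
    return result

counterexample = [0, 1]              # minimal input reported for this case study

def postcondition_holds(f, a):
    """Contract of MAX: the result dominates every element and occurs in the array."""
    r = f(a)
    return all(r >= x for x in a) and r in a

print(postcondition_holds(max_buggy, counterexample))   # the bug is exposed
print(postcondition_holds(max_fixed, counterexample))   # the regression test passes
```

Kept as a regression test, the counterexample guards against the off-by-one error reappearing, while the full static proof of `max_fixed`'s contract covers all other inputs.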

In conclusion, the paper demonstrates that the failure mode of modern provers—counterexample generation—can be repurposed as a powerful source of test cases, regression suites, and even formally verified patches. By exploiting contracts as a common specification language for both testing and proving, the authors provide a practical pathway to integrate static and dynamic verification, reduce development effort, and increase software reliability, especially in domains where high assurance is mandatory.

