Understanding Counterexamples for Relational Properties with DIbugger

Benedikt Wagner (Karlsruhe Institute of Technology (KIT)); Bernhard Beckert (Karlsruhe Institute of Technology (KIT)); Chiara Staudenmaier (Karlsruhe Institute of Technology (KIT)); Etienne Brunner (Karlsruhe Institute of Technology (KIT)); Joana Plewnia (Karlsruhe Institute of Technology (KIT)); Michael Kirsten (Karlsruhe Institute of Technology (KIT)); Mihai Herda (Karlsruhe Institute of Technology (KIT)); Pascal Zwick (Karlsruhe Institute of Technology (KIT)); Ulla Scheler (Karlsruhe Institute of Technology (KIT))

arxiv: 1907.03996 · v1 · pith:O5E3I5CRnew · submitted 2019-07-09 · 💻 cs.SE · cs.CR· cs.PL

Understanding Counterexamples for Relational Properties with DIbugger

Mihai Herda (Karlsruhe Institute of Technology (KIT)) , Michael Kirsten (Karlsruhe Institute of Technology (KIT)) , Etienne Brunner (Karlsruhe Institute of Technology (KIT)) , Joana Plewnia (Karlsruhe Institute of Technology (KIT)) , Ulla Scheler (Karlsruhe Institute of Technology (KIT)) , Chiara Staudenmaier (Karlsruhe Institute of Technology (KIT)) , Benedikt Wagner (Karlsruhe Institute of Technology (KIT)) , Pascal Zwick (Karlsruhe Institute of Technology (KIT))

show 1 more author

Bernhard Beckert (Karlsruhe Institute of Technology (KIT))

This is my paper

Pith reviewed 2026-05-25 00:27 UTC · model grok-4.3

classification 💻 cs.SE cs.CRcs.PL

keywords software verificationcounterexamplesrelational propertiesdebugging toolprogram analysissecurity propertiesfairness verification

0 comments

The pith

DIbugger enables simultaneous debugging of multiple related programs to analyze counterexamples for relational properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Software verification for properties like security or fairness requires comparing multiple related program executions, and failed attempts produce counterexamples that are often hard to interpret. The paper introduces DIbugger to let users debug several such executions at once, making the reasons for violations clearer. This targets a practical bottleneck where verification tools report inputs but leave the link to the property violation opaque. A sympathetic reader would care because it aims to shorten the cycle of failed attempts, manual inspection, and adjustments to code or specifications.

Core claim

The paper presents DIbugger as a tool that enhances the software verification process by allowing the user to debug multiple related programs simultaneously when analyzing counterexamples of relational properties.

What carries the argument

DIbugger, a debugging tool that supports simultaneous inspection of multiple related program executions to clarify violations of relational properties.

If this is right

Verification of complex relational requirements such as security and fairness becomes less tedious through clearer counterexample analysis.
Users can adjust programs or specifications more readily once the link between inputs and property violations is visible across multiple runs.
The overall verification workflow gains support for properties that inherently relate several executions rather than single runs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The simultaneous-debugging idea could be adapted to properties that are not strictly relational if similar multi-execution views prove useful.
Integration with existing verifiers that already produce counterexamples would be a direct next step to test adoption.
Visual diff views across the multiple executions might emerge as a natural extension of the tool's core mechanism.

Load-bearing premise

The primary difficulty when relational verification fails is lack of clarity about why a given counterexample violates the property, and that simultaneous debugging of multiple runs will substantially reduce this difficulty.

What would settle it

A controlled user study in which participants fail to identify violation causes faster or more accurately when using DIbugger than when using standard single-run counterexample outputs would falsify the central claim.

Figures

Figures reproduced from arXiv: 1907.03996 by Benedikt Wagner (Karlsruhe Institute of Technology (KIT)), Bernhard Beckert (Karlsruhe Institute of Technology (KIT)), Chiara Staudenmaier (Karlsruhe Institute of Technology (KIT)), Etienne Brunner (Karlsruhe Institute of Technology (KIT)), Joana Plewnia (Karlsruhe Institute of Technology (KIT)), Michael Kirsten (Karlsruhe Institute of Technology (KIT)), Mihai Herda (Karlsruhe Institute of Technology (KIT)), Pascal Zwick (Karlsruhe Institute of Technology (KIT)), Ulla Scheler (Karlsruhe Institute of Technology (KIT)).

**Figure 2.** Figure 2: The user interface of DIbugger 2.1 Debugging Operations The buttons for the debugging operations are situated in the top right part of the GUI (see [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

read the original abstract

Software verification is a tedious process that involves the analysis of multiple failed verification attempts, and adjustments of the program or specification. This is especially the case for complex requirements, e.g., regarding security or fairness, when one needs to compare multiple related runs of the same software. Verification tools often provide counterexamples consisting of program inputs when a proof attempt fails, however it is often not clear why the reported counterexample leads to a violation of the checked property. In this paper, we enhance this aspect of the software verification process by providing DIbugger, a tool for analyzing counterexamples of relational properties, allowing the user to debug multiple related programs simultaneously.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DIbugger is a new tool for joint debugging of relational counterexamples, but the paper gives no evidence that the simultaneous view actually helps users understand violations.

read the letter

The paper's main point is the introduction of DIbugger, a debugger that steps through multiple related program executions at the same time to inspect why a relational property fails. It builds on existing verification tools by adding this multi-run interface for properties like security or fairness comparisons. The description of the tool's architecture, input handling, and example scenarios is straightforward and shows how the system is meant to work in practice. That part is solid for a tool paper. The motivation section correctly notes that counterexamples from relational verifiers often leave the user unsure about the exact violation, and the tool aims to address visibility across runs. No circular reasoning or unsupported math appears. The soft spot is the complete absence of evaluation. The manuscript describes the system and gives usage examples but supplies no user study, no timed comparison against separate debuggers, and no data on diagnostic accuracy or time saved. The central assumption that joint debugging will substantially reduce the difficulty therefore stays untested. This is a standard tool-description paper in cs.SE. Readers who build or use relational verifiers might pick up practical ideas from the implementation details. It deserves peer review because the tool is a legitimate extension of prior work and the description is coherent on its own terms, even if the usefulness claim would need more support in revision.

Referee Report

1 major / 0 minor

Summary. The paper presents DIbugger, a tool for analyzing counterexamples arising from failed verification attempts on relational properties. It motivates the work by observing that standard counterexamples (program inputs) often leave unclear why a relational property is violated, and claims to address this by enabling simultaneous debugging of multiple related program runs.

Significance. A tool that makes relational counterexample inspection more direct could reduce iteration time in verifying security or fairness properties. The manuscript supplies a concrete description of the tool architecture and input format together with usage scenarios; these elements constitute the primary contribution and are presented at a level appropriate for a tool paper.

major comments (1)

[Abstract] Abstract: the central claim that DIbugger 'enhance[s] this aspect of the software verification process' rests on the untested assumption that simultaneous multi-program debugging substantially clarifies why a counterexample violates a relational property. No user study, controlled comparison against separate debuggers, time-to-insight metrics, or even qualitative case studies are supplied to support the enhancement.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments on our tool paper. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that DIbugger 'enhance[s] this aspect of the software verification process' rests on the untested assumption that simultaneous multi-program debugging substantially clarifies why a counterexample violates a relational property. No user study, controlled comparison against separate debuggers, time-to-insight metrics, or even qualitative case studies are supplied to support the enhancement.

Authors: We agree that the manuscript provides no empirical evaluation (user study, controlled comparison, or metrics) to substantiate a measured improvement in insight. As a tool paper whose primary contribution is the architecture, input format, and usage scenarios, the abstract wording was intended to describe the tool's intended purpose rather than report validated outcomes. We will revise the abstract to use more precise language that describes the functionality (simultaneous debugging of related executions) without claiming an untested enhancement to the verification process. revision: yes

Circularity Check

0 steps flagged

No circularity: tool-description paper with no derivations or fitted claims

full rationale

The manuscript describes the architecture, input format, and usage scenarios of the DIbugger tool for simultaneous debugging of relational counterexamples. It contains no equations, predictions, fitted parameters, or derivation chain that could reduce to its own inputs. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. The central claim is a straightforward description of tool functionality rather than a result derived from prior self-referential content. This is the expected non-finding for a pure tool paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software-tool description paper; no mathematical free parameters, axioms, or invented entities are involved.

pith-pipeline@v0.9.0 · 5740 in / 896 out tokens · 26107 ms · 2026-05-25T00:27:53.897796+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

[1]

LNCS 10001, Springer, doi:10.1007/978-3-319-49812-6

Wolfgang Ahrendt et al., editors (2016): Deductive Software Veriﬁcation - The KeY Book: From Theory to Practice. LNCS 10001, Springer, doi:10.1007/978-3-319-49812-6

work page doi:10.1007/978-3-319-49812-6 2016
[2]

In Umberto Grandi & Jeffrey S

Bernhard Beckert, Thorsten Bormer, Michael Kirsten, Till Neuber & Mattias Ulbrich (2016): Automated Veriﬁcation for Functional and Relational Properties of Voting Rules . In Umberto Grandi & Jeffrey S. Rosenschein, editors: Sixth International Workshop on Computational Social Choice (COMSOC 2016) , doi:10.5445/IR/1000092712

work page doi:10.5445/ir/1000092712 2016
[3]

Bernhard Beckert, Sarah Grebing & Mattias Ulbrich (2017): An Interaction Concept for Program Veriﬁ- cation Systems with Explicit Proof Object . In Ofer Strichman & Rachel Tzoref-Brill, editors: 13th Inter- national Haifa Veriﬁcation Conference on Hardware and Software (HVC 2017) , LNCS 10629, Springer, doi:10.1007/978-3-319-70389-3_11

work page doi:10.1007/978-3-319-70389-3_11 2017
[4]

In Michael J

Bernhard Beckert, Mattias Ulbrich, Birgit V ogel-Heuser & Alexander Weigl (2015): Regression Veriﬁca- tion for Programmable Logic Controller Software . In Michael J. Butler, Sylvain Conchon & Fatiha Za- ïdi, editors: 17th International Conference on Formal Engineering Methods (ICFEM 2015) , LNCS 9407, doi:10.1007/978-3-319-25423-4_15

work page doi:10.1007/978-3-319-25423-4_15 2015
[5]

Clarkson & Fred B

Michael R. Clarkson & Fred B. Schneider (2010): Hyperproperties. Journal of Computer Security 18(6), doi:10.3233/JCS-2009-0393

work page doi:10.3233/jcs-2009-0393 2010
[6]

In Gruia-Catalin Roman, William G

Holger Cleve & Andreas Zeller (2005): Locating Causes of Program Failures . In Gruia-Catalin Roman, William G. Griswold & Bashar Nuseibeh, editors: 27th International Conference on Software Engineering (ICSE 2005), ACM, doi:10.1145/1062455.1062522

work page doi:10.1145/1062455.1062522 2005
[7]

In Rajeev Alur & Doron A

Alex Groce, Daniel Kroening & Flavio Lerda (2004): Understanding Counterexamples with explain. In Rajeev Alur & Doron A. Peled, editors:16th International Conference on Computer Aided Veriﬁcation (CA V 2004), LNCS 3114, Springer, doi:10.1007/978-3-540-27813-9_35

work page doi:10.1007/978-3-540-27813-9_35 2004
[8]

In David Lo, Sven Apel & Sarfraz Khur- M

Martin Hentschel, Reiner Hähnle & Richard Bubel (2016): The Interactive Veriﬁcation Debugger: Effective Understanding of Interactive Proof Attempts . In David Lo, Sven Apel & Sarfraz Khur- M. Herda et al. 13 shid, editors: 31st International Conference on Automated Software Engineering (ASE 2016) , ACM, doi:10.1145/2970276.2970292

work page doi:10.1145/2970276.2970292 2016
[9]

In Gruia-Catalin Roman & André van der Hoek, editors: 18th International Symposium on Foundations of Software Engineering (SIGSOFT FSE 2010), ACM, doi:10.1145/1882291.1882302

Nicholas Jalbert & Koushik Sen (2010): A Trace Simpliﬁcation Technique for Effective Debugging of Con- current Programs. In Gruia-Catalin Roman & André van der Hoek, editors: 18th International Symposium on Foundations of Software Engineering (SIGSOFT FSE 2010), ACM, doi:10.1145/1882291.1882302

work page doi:10.1145/1882291.1882302 2010
[10]

Journal of Automated Reasoning 60(3), doi:10.1007/s10817-017-9433-5

Moritz Kiefer, Vladimir Klebanov & Mattias Ulbrich (2018): Relational Program Reasoning Using Com- piler IR - Combining Static Veriﬁcation and Dynamic Analysis . Journal of Automated Reasoning 60(3), doi:10.1007/s10817-017-9433-5

work page doi:10.1007/s10817-017-9433-5 2018
[11]

In Umberto Grandi & Jeffrey S

Michael Kirsten & Olivier Cailloux (2018): Towards automatic argumentation about voting rules. In San- dra Bringay & Juliette Mattioli, editors: 4ème conférence sur les Applications Pratiques de l’Intelligence Artiﬁcielle (APIA 2018), doi:10.5445/IR/1000092711

work page doi:10.5445/ir/1000092711 2018
[12]

Tian Lan, David T. H. Kao, Mung Chiang & Ashutosh Sabharwal (2010):An Axiomatic Theory of Fairness in Network Resource Allocation. In: 29th International Conference on Computer Communications (INFOCOM 2010), IEEE, doi:10.1109/INFCOM.2010.5461911

work page doi:10.1109/infcom.2010.5461911 2010
[13]

In: Deductive Software Veriﬁcation - The KeY Book: From Theory to Practice , chapter 13, LNCS 10001, Springer, doi:10.1007/978-3-319- 49812-6_13

Christoph Scheben & Simon Greiner (2016): Information Flow Analysis. In: Deductive Software Veriﬁcation - The KeY Book: From Theory to Practice , chapter 13, LNCS 10001, Springer, doi:10.1007/978-3-319- 49812-6_13

work page doi:10.1007/978-3-319- 2016

[1] [1]

LNCS 10001, Springer, doi:10.1007/978-3-319-49812-6

Wolfgang Ahrendt et al., editors (2016): Deductive Software Veriﬁcation - The KeY Book: From Theory to Practice. LNCS 10001, Springer, doi:10.1007/978-3-319-49812-6

work page doi:10.1007/978-3-319-49812-6 2016

[2] [2]

In Umberto Grandi & Jeffrey S

Bernhard Beckert, Thorsten Bormer, Michael Kirsten, Till Neuber & Mattias Ulbrich (2016): Automated Veriﬁcation for Functional and Relational Properties of Voting Rules . In Umberto Grandi & Jeffrey S. Rosenschein, editors: Sixth International Workshop on Computational Social Choice (COMSOC 2016) , doi:10.5445/IR/1000092712

work page doi:10.5445/ir/1000092712 2016

[3] [3]

Bernhard Beckert, Sarah Grebing & Mattias Ulbrich (2017): An Interaction Concept for Program Veriﬁ- cation Systems with Explicit Proof Object . In Ofer Strichman & Rachel Tzoref-Brill, editors: 13th Inter- national Haifa Veriﬁcation Conference on Hardware and Software (HVC 2017) , LNCS 10629, Springer, doi:10.1007/978-3-319-70389-3_11

work page doi:10.1007/978-3-319-70389-3_11 2017

[4] [4]

In Michael J

Bernhard Beckert, Mattias Ulbrich, Birgit V ogel-Heuser & Alexander Weigl (2015): Regression Veriﬁca- tion for Programmable Logic Controller Software . In Michael J. Butler, Sylvain Conchon & Fatiha Za- ïdi, editors: 17th International Conference on Formal Engineering Methods (ICFEM 2015) , LNCS 9407, doi:10.1007/978-3-319-25423-4_15

work page doi:10.1007/978-3-319-25423-4_15 2015

[5] [5]

Clarkson & Fred B

Michael R. Clarkson & Fred B. Schneider (2010): Hyperproperties. Journal of Computer Security 18(6), doi:10.3233/JCS-2009-0393

work page doi:10.3233/jcs-2009-0393 2010

[6] [6]

In Gruia-Catalin Roman, William G

Holger Cleve & Andreas Zeller (2005): Locating Causes of Program Failures . In Gruia-Catalin Roman, William G. Griswold & Bashar Nuseibeh, editors: 27th International Conference on Software Engineering (ICSE 2005), ACM, doi:10.1145/1062455.1062522

work page doi:10.1145/1062455.1062522 2005

[7] [7]

In Rajeev Alur & Doron A

Alex Groce, Daniel Kroening & Flavio Lerda (2004): Understanding Counterexamples with explain. In Rajeev Alur & Doron A. Peled, editors:16th International Conference on Computer Aided Veriﬁcation (CA V 2004), LNCS 3114, Springer, doi:10.1007/978-3-540-27813-9_35

work page doi:10.1007/978-3-540-27813-9_35 2004

[8] [8]

In David Lo, Sven Apel & Sarfraz Khur- M

Martin Hentschel, Reiner Hähnle & Richard Bubel (2016): The Interactive Veriﬁcation Debugger: Effective Understanding of Interactive Proof Attempts . In David Lo, Sven Apel & Sarfraz Khur- M. Herda et al. 13 shid, editors: 31st International Conference on Automated Software Engineering (ASE 2016) , ACM, doi:10.1145/2970276.2970292

work page doi:10.1145/2970276.2970292 2016

[9] [9]

In Gruia-Catalin Roman & André van der Hoek, editors: 18th International Symposium on Foundations of Software Engineering (SIGSOFT FSE 2010), ACM, doi:10.1145/1882291.1882302

Nicholas Jalbert & Koushik Sen (2010): A Trace Simpliﬁcation Technique for Effective Debugging of Con- current Programs. In Gruia-Catalin Roman & André van der Hoek, editors: 18th International Symposium on Foundations of Software Engineering (SIGSOFT FSE 2010), ACM, doi:10.1145/1882291.1882302

work page doi:10.1145/1882291.1882302 2010

[10] [10]

Journal of Automated Reasoning 60(3), doi:10.1007/s10817-017-9433-5

Moritz Kiefer, Vladimir Klebanov & Mattias Ulbrich (2018): Relational Program Reasoning Using Com- piler IR - Combining Static Veriﬁcation and Dynamic Analysis . Journal of Automated Reasoning 60(3), doi:10.1007/s10817-017-9433-5

work page doi:10.1007/s10817-017-9433-5 2018

[11] [11]

In Umberto Grandi & Jeffrey S

Michael Kirsten & Olivier Cailloux (2018): Towards automatic argumentation about voting rules. In San- dra Bringay & Juliette Mattioli, editors: 4ème conférence sur les Applications Pratiques de l’Intelligence Artiﬁcielle (APIA 2018), doi:10.5445/IR/1000092711

work page doi:10.5445/ir/1000092711 2018

[12] [12]

Tian Lan, David T. H. Kao, Mung Chiang & Ashutosh Sabharwal (2010):An Axiomatic Theory of Fairness in Network Resource Allocation. In: 29th International Conference on Computer Communications (INFOCOM 2010), IEEE, doi:10.1109/INFCOM.2010.5461911

work page doi:10.1109/infcom.2010.5461911 2010

[13] [13]

In: Deductive Software Veriﬁcation - The KeY Book: From Theory to Practice , chapter 13, LNCS 10001, Springer, doi:10.1007/978-3-319- 49812-6_13

Christoph Scheben & Simon Greiner (2016): Information Flow Analysis. In: Deductive Software Veriﬁcation - The KeY Book: From Theory to Practice , chapter 13, LNCS 10001, Springer, doi:10.1007/978-3-319- 49812-6_13

work page doi:10.1007/978-3-319- 2016