A Minimal Executable Proof for Multi-Language Contract Traceability

Werner Kasselman

arxiv: 2605.28546 · v1 · pith:G45TQ34Snew · submitted 2026-05-27 · 💻 cs.SE

A Minimal Executable Proof for Multi-Language Contract Traceability

Werner Kasselman This is my paper

Pith reviewed 2026-06-29 10:36 UTC · model grok-4.3

classification 💻 cs.SE

keywords executable proofmulti-language contractDAG-TOMLwitness harnesstraceability chainevidence matrixreadiness gate

0 comments

The pith

A minimal executable proof links six language implementations to one strict output contract via witnesses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a deliberately small executable proof that ties six Hello World programs in Rust, Go, C, Java, TypeScript, and AWK to a single DAG-TOML contract. The contract demands an exact UTF-8 byte sequence of Hello, world! followed by newline, zero bytes on stderr, and exit code zero. A witness harness runs the implementations and reports five passes plus one skip for absent Java tools, with zero failures. Side witnesses further check source-level claims on modified versions in Go and AWK. The result is a compact, falsifiable artifact showing how a contract, implementation graph, traceability file, and evidence matrix can be verified together.

Core claim

The paper reports one observable-output contract, one implementation DAG, one traceability file, one readiness gate, and one evidence matrix, all exercised by a witness harness on six language implementations. The load-bearing contract specifies the precise UTF-8 sequence Hello, world!\n, zero stderr bytes, and exit status zero. On the reported runner the harness records five PASS results, one SKIP for Java due to missing javac/java on PATH, and zero FAIL results. Two narrower sidecar witnesses confirm that a convoluted Go rewrite remains detectable at the AST symbol level and that an indirect AWK rewrite is handled via a declared source profile.

What carries the argument

The DAG-TOML contract together with its witness harness, which executes each implementation and checks the exact required output bytes, stderr count, and exit code.

If this is right

A contract, DAG, traceability file, and evidence matrix can be checked together against executable witnesses.
Source-analysis claims on rewritten implementations remain verifiable at the AST symbol level even when the literal string is obscured.
Languages outside the sqry-backed validator set can still be covered through declared source profiles.
The same harness structure can serve as a readiness gate that records PASS, SKIP, or FAIL for each linked implementation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be applied to contracts that specify more complex observable behaviors beyond simple stdout and exit status.
Traceability chains of this form might integrate with existing build or review pipelines to produce machine-checkable evidence matrices.
Extending the harness to additional languages would require only new implementation entries and matching execution commands.

Load-bearing premise

The supplied implementations and harness correctly realize the stated contract requirements in an execution environment that matches the one described.

What would settle it

Running the witness harness on the six provided implementations and obtaining any result other than exactly five PASS outcomes, one SKIP for Java, and zero FAIL outcomes.

read the original abstract

This paper reports a deliberately small executable proof for a DAG-TOML contract: six "Hello, world!" implementations in Rust, Go, C, Java, TypeScript, and AWK are linked to one observable-output contract, one implementation DAG, one traceability file, one readiness gate, and one evidence matrix. The load-bearing contract requires the exact UTF-8 byte sequence `Hello, world!\n`, zero stderr bytes, and exit code 0. On the runner used for this paper, the witness harness reported five PASS outcomes, one SKIP for Java because `javac/java` was not on `PATH`, and zero FAIL outcomes. Two sidecar witnesses exercise narrower source-analysis claims: a convoluted Go rewrite hides the contiguous greeting literal but remains visible to sqry at the declared AST symbol and simple-edge level, while an indirect AWK rewrite uses a declared source profile because AWK is not in the repository's sqry-backed validator language set. The contribution is not a benchmark, a claim of general semantic equivalence, or a production assurance system. It is a compact, falsifiable artifact that shows how a contract, implementation graph, traceability chain, and review gate can be checked against executable witnesses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a compact runnable demo of multi-language traceability for one hello-world contract, with no broader claims.

read the letter

The main takeaway is that the paper ships a small, checkable artifact: six hello-world programs in Rust, Go, C, Java, TypeScript, and AWK wired to a single DAG-TOML contract that demands exact UTF-8 output, zero stderr, and exit code zero. The harness reports five passes and one Java skip, plus two side witnesses that test source-level visibility on a rewritten Go version and an AWK profile.

What is new is the concrete packaging. The combination of the implementation DAG, traceability file, readiness gate, evidence matrix, and executable witnesses in one falsifiable bundle does not appear in the cited prior work. The paper does well at staying honest about limits and at making the whole thing runnable so others can inspect or break it.

The soft spots are exactly what the narrow scope produces. This tests only trivial output matching on hello-world code, so it does not speak to complex contracts, semantic equivalence, or real-world scale. The Java skip shows environment sensitivity, and the side witnesses are illustrative rather than exhaustive. Those are design choices, not hidden flaws, but they cap how far the result travels.

This is for readers who want a minimal reference example in contract traceability or multi-language verification work. It is not aimed at practitioners needing production tooling.

I would send it to peer review. The executable harness makes the central claim directly testable, and the modest framing avoids overreach.

Referee Report

0 major / 0 minor

Summary. The paper reports a deliberately minimal executable proof consisting of one DAG-TOML contract that enforces the exact UTF-8 byte sequence `Hello, world! `, zero stderr bytes, and exit code 0. This contract is linked to six language implementations (Rust, Go, C, Java, TypeScript, AWK), an implementation DAG, a traceability file, a readiness gate, and an evidence matrix. The witness harness produces five PASS outcomes, one SKIP (Java, missing javac/java on PATH), and zero FAIL outcomes; two sidecar witnesses address narrower source-analysis claims for a convoluted Go rewrite and an indirect AWK rewrite.

Significance. If the reported harness and implementations match the stated contract, the work supplies a compact, falsifiable, and reproducible artifact demonstrating how a contract, DAG, traceability chain, and review gate can be checked by executable witnesses across languages. This is a concrete contribution to software engineering traceability research; credit is due for the executable nature of the demonstration and the explicit disclaimers that no benchmarks, general semantic equivalence, or production claims are made.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and the recommendation to accept. The referee's summary correctly describes the deliberately minimal scope of the executable proof, the harness outcomes, and the explicit disclaimers regarding benchmarks and general claims.

Circularity Check

0 steps flagged

No significant circularity; executable demonstration is self-contained

full rationale

The paper contains no equations, fitted parameters, predictions, or derivation chain. Its central claim is the existence of a compact, falsifiable artifact (DAG-TOML contract + implementations + harness) whose outcomes (5 PASS / 1 SKIP / 0 FAIL) are directly reported from execution. This is externally checkable against the stated contract requirements and does not reduce to any self-referential input by construction. No self-citations or uniqueness theorems are invoked as load-bearing steps. The text explicitly disclaims broader claims, making the result a narrow demonstration rather than a derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be extracted beyond the implicit assumption that the described harness and contract are faithfully implemented.

pith-pipeline@v0.9.1-grok · 5734 in / 1108 out tokens · 32159 ms · 2026-06-29T10:36:25.313413+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

15 extracted references · 8 canonical work pages

[1]

Empirical standards for conducting and evaluating research in software engineering, 2026

ACM SIGSOFT. Empirical standards for conducting and evaluating research in software engineering, 2026. URLhttps://www2.sigsoft.org/EmpiricalStandards/

2026
[2]

Submit TeX/LaTeX, 2026

arXiv. Submit TeX/LaTeX, 2026. URLhttps://info.arxiv.org/help/submit_tex.html

2026
[3]

TeX live at arxiv, 2026

arXiv. TeX live at arxiv, 2026. URLhttps://info.arxiv.org/help/faq/texlive.html

2026
[4]

Artifact review and badging version 1.1, Au- gust 2020

Association for Computing Machinery. Artifact review and badging version 1.1, Au- gust 2020. URL https://www.acm.org/publications/policies/artifact-review-and- badging-current

2020
[5]

Hayden Cheers, Yuqing Lin, and Shamus P. Smith. Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications, 2021. URL https://arxiv.org/abs/2102.03997

work page arXiv 2021
[6]

Fine-grained and accurate source code differencing

Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. Fine-grained and accurate source code differencing. InProceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pages 313–324, 2014. doi: 10.1145/2642937.2642982. 7

work page doi:10.1145/2642937.2642982 2014
[7]

Validity threats in empirical software engineering research: An initial survey

Robert Feldt and Ana Magazinius. Validity threats in empirical software engineering research: An initial survey. InProceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering, pages 374–379, 2010

2010
[8]

Hoffman and Richard T

Daniel M. Hoffman and Richard T. Snodgrass. Trace specifications: Methodology and models. IEEE Transactions on Software Engineering, 14(9):1243–1252, 1988. doi: 10.1109/32.6168

work page doi:10.1109/32.6168 1988
[9]

DECKARD: Scalable and accurate tree-based detection of code clones

Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stéphane Glondu. DECKARD: Scalable and accurate tree-based detection of code clones. InProceedings of the 29th International Conference on Software Engineering, pages 96–105, 2007. doi: 10.1109/ICSE.2007.30

work page doi:10.1109/icse.2007.30 2007
[10]

POPL 2023 artifact evaluation, 2023

POPL 2023 Artifact Evaluation Committee. POPL 2023 artifact evaluation, 2023. URL https://popl23.sigplan.org/track/POPL-2023-artifact-evaluation

2023
[11]

Finding plagiarisms among a set of programs with JPlag.Journal of Universal Computer Science, 8(11):1016–1038, 2002

Lutz Prechelt, Guido Malpohl, and Michael Philippsen. Finding plagiarisms among a set of programs with JPlag.Journal of Universal Computer Science, 8(11):1016–1038, 2002. doi: 10.3217/jucs-008-11-1016

work page doi:10.3217/jucs-008-11-1016 2002
[12]

Furia, Daniel Graziotin, Peng He, et al

Paul Ralph, Sebastian Baltes, Domenico Bianculli, Yvonne Dittrich, Michael Felderer, Robert Feldt, Antonio Filieri, Carlo A. Furia, Daniel Graziotin, Peng He, et al. Empirical standards for software engineering research, 2020. URLhttps://arxiv.org/abs/2010.03525

work page arXiv 2020
[13]

Roy, James R

Chanchal K. Roy, James R. Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach.Science of Computer Programming, 74(7):470–495, 2009. doi: 10.1016/j.scico.2009.02.007

work page doi:10.1016/j.scico.2009.02.007 2009
[14]

Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering, 14(2):131–164, 2009

Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering, 14(2):131–164, 2009. doi: 10.1007/s10664- 008-9102-8

work page doi:10.1007/s10664- 2009
[15]

Wilkerson, and Alex Aiken

Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken. Winnowing: Local algorithms for document fingerprinting. InProceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 76–85, 2003. 8

2003

[1] [1]

Empirical standards for conducting and evaluating research in software engineering, 2026

ACM SIGSOFT. Empirical standards for conducting and evaluating research in software engineering, 2026. URLhttps://www2.sigsoft.org/EmpiricalStandards/

2026

[2] [2]

Submit TeX/LaTeX, 2026

arXiv. Submit TeX/LaTeX, 2026. URLhttps://info.arxiv.org/help/submit_tex.html

2026

[3] [3]

TeX live at arxiv, 2026

arXiv. TeX live at arxiv, 2026. URLhttps://info.arxiv.org/help/faq/texlive.html

2026

[4] [4]

Artifact review and badging version 1.1, Au- gust 2020

Association for Computing Machinery. Artifact review and badging version 1.1, Au- gust 2020. URL https://www.acm.org/publications/policies/artifact-review-and- badging-current

2020

[5] [5]

Hayden Cheers, Yuqing Lin, and Shamus P. Smith. Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications, 2021. URL https://arxiv.org/abs/2102.03997

work page arXiv 2021

[6] [6]

Fine-grained and accurate source code differencing

Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. Fine-grained and accurate source code differencing. InProceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pages 313–324, 2014. doi: 10.1145/2642937.2642982. 7

work page doi:10.1145/2642937.2642982 2014

[7] [7]

Validity threats in empirical software engineering research: An initial survey

Robert Feldt and Ana Magazinius. Validity threats in empirical software engineering research: An initial survey. InProceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering, pages 374–379, 2010

2010

[8] [8]

Hoffman and Richard T

Daniel M. Hoffman and Richard T. Snodgrass. Trace specifications: Methodology and models. IEEE Transactions on Software Engineering, 14(9):1243–1252, 1988. doi: 10.1109/32.6168

work page doi:10.1109/32.6168 1988

[9] [9]

DECKARD: Scalable and accurate tree-based detection of code clones

Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stéphane Glondu. DECKARD: Scalable and accurate tree-based detection of code clones. InProceedings of the 29th International Conference on Software Engineering, pages 96–105, 2007. doi: 10.1109/ICSE.2007.30

work page doi:10.1109/icse.2007.30 2007

[10] [10]

POPL 2023 artifact evaluation, 2023

POPL 2023 Artifact Evaluation Committee. POPL 2023 artifact evaluation, 2023. URL https://popl23.sigplan.org/track/POPL-2023-artifact-evaluation

2023

[11] [11]

Finding plagiarisms among a set of programs with JPlag.Journal of Universal Computer Science, 8(11):1016–1038, 2002

Lutz Prechelt, Guido Malpohl, and Michael Philippsen. Finding plagiarisms among a set of programs with JPlag.Journal of Universal Computer Science, 8(11):1016–1038, 2002. doi: 10.3217/jucs-008-11-1016

work page doi:10.3217/jucs-008-11-1016 2002

[12] [12]

Furia, Daniel Graziotin, Peng He, et al

Paul Ralph, Sebastian Baltes, Domenico Bianculli, Yvonne Dittrich, Michael Felderer, Robert Feldt, Antonio Filieri, Carlo A. Furia, Daniel Graziotin, Peng He, et al. Empirical standards for software engineering research, 2020. URLhttps://arxiv.org/abs/2010.03525

work page arXiv 2020

[13] [13]

Roy, James R

Chanchal K. Roy, James R. Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach.Science of Computer Programming, 74(7):470–495, 2009. doi: 10.1016/j.scico.2009.02.007

work page doi:10.1016/j.scico.2009.02.007 2009

[14] [14]

Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering, 14(2):131–164, 2009

Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering, 14(2):131–164, 2009. doi: 10.1007/s10664- 008-9102-8

work page doi:10.1007/s10664- 2009

[15] [15]

Wilkerson, and Alex Aiken

Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken. Winnowing: Local algorithms for document fingerprinting. InProceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 76–85, 2003. 8

2003