pith. sign in

arxiv: 2605.28546 · v1 · pith:G45TQ34Snew · submitted 2026-05-27 · 💻 cs.SE

A Minimal Executable Proof for Multi-Language Contract Traceability

Pith reviewed 2026-06-29 10:36 UTC · model grok-4.3

classification 💻 cs.SE
keywords executable proofmulti-language contractDAG-TOMLwitness harnesstraceability chainevidence matrixreadiness gate
0
0 comments X

The pith

A minimal executable proof links six language implementations to one strict output contract via witnesses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a deliberately small executable proof that ties six Hello World programs in Rust, Go, C, Java, TypeScript, and AWK to a single DAG-TOML contract. The contract demands an exact UTF-8 byte sequence of Hello, world! followed by newline, zero bytes on stderr, and exit code zero. A witness harness runs the implementations and reports five passes plus one skip for absent Java tools, with zero failures. Side witnesses further check source-level claims on modified versions in Go and AWK. The result is a compact, falsifiable artifact showing how a contract, implementation graph, traceability file, and evidence matrix can be verified together.

Core claim

The paper reports one observable-output contract, one implementation DAG, one traceability file, one readiness gate, and one evidence matrix, all exercised by a witness harness on six language implementations. The load-bearing contract specifies the precise UTF-8 sequence Hello, world!\n, zero stderr bytes, and exit status zero. On the reported runner the harness records five PASS results, one SKIP for Java due to missing javac/java on PATH, and zero FAIL results. Two narrower sidecar witnesses confirm that a convoluted Go rewrite remains detectable at the AST symbol level and that an indirect AWK rewrite is handled via a declared source profile.

What carries the argument

The DAG-TOML contract together with its witness harness, which executes each implementation and checks the exact required output bytes, stderr count, and exit code.

If this is right

  • A contract, DAG, traceability file, and evidence matrix can be checked together against executable witnesses.
  • Source-analysis claims on rewritten implementations remain verifiable at the AST symbol level even when the literal string is obscured.
  • Languages outside the sqry-backed validator set can still be covered through declared source profiles.
  • The same harness structure can serve as a readiness gate that records PASS, SKIP, or FAIL for each linked implementation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be applied to contracts that specify more complex observable behaviors beyond simple stdout and exit status.
  • Traceability chains of this form might integrate with existing build or review pipelines to produce machine-checkable evidence matrices.
  • Extending the harness to additional languages would require only new implementation entries and matching execution commands.

Load-bearing premise

The supplied implementations and harness correctly realize the stated contract requirements in an execution environment that matches the one described.

What would settle it

Running the witness harness on the six provided implementations and obtaining any result other than exactly five PASS outcomes, one SKIP for Java, and zero FAIL outcomes.

read the original abstract

This paper reports a deliberately small executable proof for a DAG-TOML contract: six "Hello, world!" implementations in Rust, Go, C, Java, TypeScript, and AWK are linked to one observable-output contract, one implementation DAG, one traceability file, one readiness gate, and one evidence matrix. The load-bearing contract requires the exact UTF-8 byte sequence `Hello, world!\n`, zero stderr bytes, and exit code 0. On the runner used for this paper, the witness harness reported five PASS outcomes, one SKIP for Java because `javac/java` was not on `PATH`, and zero FAIL outcomes. Two sidecar witnesses exercise narrower source-analysis claims: a convoluted Go rewrite hides the contiguous greeting literal but remains visible to sqry at the declared AST symbol and simple-edge level, while an indirect AWK rewrite uses a declared source profile because AWK is not in the repository's sqry-backed validator language set. The contribution is not a benchmark, a claim of general semantic equivalence, or a production assurance system. It is a compact, falsifiable artifact that shows how a contract, implementation graph, traceability chain, and review gate can be checked against executable witnesses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 0 minor

Summary. The paper reports a deliberately minimal executable proof consisting of one DAG-TOML contract that enforces the exact UTF-8 byte sequence `Hello, world! `, zero stderr bytes, and exit code 0. This contract is linked to six language implementations (Rust, Go, C, Java, TypeScript, AWK), an implementation DAG, a traceability file, a readiness gate, and an evidence matrix. The witness harness produces five PASS outcomes, one SKIP (Java, missing javac/java on PATH), and zero FAIL outcomes; two sidecar witnesses address narrower source-analysis claims for a convoluted Go rewrite and an indirect AWK rewrite.

Significance. If the reported harness and implementations match the stated contract, the work supplies a compact, falsifiable, and reproducible artifact demonstrating how a contract, DAG, traceability chain, and review gate can be checked by executable witnesses across languages. This is a concrete contribution to software engineering traceability research; credit is due for the executable nature of the demonstration and the explicit disclaimers that no benchmarks, general semantic equivalence, or production claims are made.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment and the recommendation to accept. The referee's summary correctly describes the deliberately minimal scope of the executable proof, the harness outcomes, and the explicit disclaimers regarding benchmarks and general claims.

Circularity Check

0 steps flagged

No significant circularity; executable demonstration is self-contained

full rationale

The paper contains no equations, fitted parameters, predictions, or derivation chain. Its central claim is the existence of a compact, falsifiable artifact (DAG-TOML contract + implementations + harness) whose outcomes (5 PASS / 1 SKIP / 0 FAIL) are directly reported from execution. This is externally checkable against the stated contract requirements and does not reduce to any self-referential input by construction. No self-citations or uniqueness theorems are invoked as load-bearing steps. The text explicitly disclaims broader claims, making the result a narrow demonstration rather than a derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be extracted beyond the implicit assumption that the described harness and contract are faithfully implemented.

pith-pipeline@v0.9.1-grok · 5734 in / 1108 out tokens · 32159 ms · 2026-06-29T10:36:25.313413+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

15 extracted references · 8 canonical work pages

  1. [1]

    Empirical standards for conducting and evaluating research in software engineering, 2026

    ACM SIGSOFT. Empirical standards for conducting and evaluating research in software engineering, 2026. URLhttps://www2.sigsoft.org/EmpiricalStandards/

  2. [2]

    Submit TeX/LaTeX, 2026

    arXiv. Submit TeX/LaTeX, 2026. URLhttps://info.arxiv.org/help/submit_tex.html

  3. [3]

    TeX live at arxiv, 2026

    arXiv. TeX live at arxiv, 2026. URLhttps://info.arxiv.org/help/faq/texlive.html

  4. [4]

    Artifact review and badging version 1.1, Au- gust 2020

    Association for Computing Machinery. Artifact review and badging version 1.1, Au- gust 2020. URL https://www.acm.org/publications/policies/artifact-review-and- badging-current

  5. [5]

    Hayden Cheers, Yuqing Lin, and Shamus P. Smith. Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications, 2021. URL https://arxiv.org/abs/2102.03997

  6. [6]

    Fine-grained and accurate source code differencing

    Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. Fine-grained and accurate source code differencing. InProceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, pages 313–324, 2014. doi: 10.1145/2642937.2642982. 7

  7. [7]

    Validity threats in empirical software engineering research: An initial survey

    Robert Feldt and Ana Magazinius. Validity threats in empirical software engineering research: An initial survey. InProceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering, pages 374–379, 2010

  8. [8]

    Hoffman and Richard T

    Daniel M. Hoffman and Richard T. Snodgrass. Trace specifications: Methodology and models. IEEE Transactions on Software Engineering, 14(9):1243–1252, 1988. doi: 10.1109/32.6168

  9. [9]

    DECKARD: Scalable and accurate tree-based detection of code clones

    Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stéphane Glondu. DECKARD: Scalable and accurate tree-based detection of code clones. InProceedings of the 29th International Conference on Software Engineering, pages 96–105, 2007. doi: 10.1109/ICSE.2007.30

  10. [10]

    POPL 2023 artifact evaluation, 2023

    POPL 2023 Artifact Evaluation Committee. POPL 2023 artifact evaluation, 2023. URL https://popl23.sigplan.org/track/POPL-2023-artifact-evaluation

  11. [11]

    Finding plagiarisms among a set of programs with JPlag.Journal of Universal Computer Science, 8(11):1016–1038, 2002

    Lutz Prechelt, Guido Malpohl, and Michael Philippsen. Finding plagiarisms among a set of programs with JPlag.Journal of Universal Computer Science, 8(11):1016–1038, 2002. doi: 10.3217/jucs-008-11-1016

  12. [12]

    Furia, Daniel Graziotin, Peng He, et al

    Paul Ralph, Sebastian Baltes, Domenico Bianculli, Yvonne Dittrich, Michael Felderer, Robert Feldt, Antonio Filieri, Carlo A. Furia, Daniel Graziotin, Peng He, et al. Empirical standards for software engineering research, 2020. URLhttps://arxiv.org/abs/2010.03525

  13. [13]

    Roy, James R

    Chanchal K. Roy, James R. Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach.Science of Computer Programming, 74(7):470–495, 2009. doi: 10.1016/j.scico.2009.02.007

  14. [14]

    Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering, 14(2):131–164, 2009

    Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering.Empirical Software Engineering, 14(2):131–164, 2009. doi: 10.1007/s10664- 008-9102-8

  15. [15]

    Wilkerson, and Alex Aiken

    Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken. Winnowing: Local algorithms for document fingerprinting. InProceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages 76–85, 2003. 8