pith. sign in

arxiv: 2605.12981 · v3 · pith:T5PHJO3Ynew · submitted 2026-05-13 · 💻 cs.SE · cs.AI· cs.LG

Protocol-Driven Development: Governing Generated Software Through Invariants and Continuous Evidence

Pith reviewed 2026-05-20 21:57 UTC · model grok-4.3

classification 💻 cs.SE cs.AIcs.LG
keywords protocol-driven developmentsoftware governanceprogram synthesisinvariant enforcementevidence chainruntime verificationdynamic evidence ledger
0
0 comments X

The pith

Protocols defined as invariant triplets become the primary artifact that admits or rejects generated implementations based on verifiable compliance evidence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Protocol-Driven Development to solve the governance problem created by automated program synthesis. Natural-language specifications remain ambiguous and tests only sample limited behaviors, so neither alone can reliably decide which generated artifacts are safe to use. PDD instead treats a machine-enforceable protocol as the durable control boundary. The protocol is expressed as the triplet of structural, behavioral, and operational invariants whose conjunction defines the admissible space for any implementation. An implementation is accepted only when it satisfies the protocol and produces a signed Evidence Chain that records how compliance was established, with the chain later extended into a live Dynamic Evidence Ledger that continuously attests runtime behavior.

Core claim

Under PDD the primary software artifact is the protocol P = (S, B, O) that encodes structural, behavioral, and operational invariants. Implementations are replaceable realizations found by constrained search; they are admitted solely when they satisfy the protocol and emit a verifiable Evidence Chain of compliance. For running systems the chain extends into a Dynamic Evidence Ledger where runtime verifiers append signed observations and violations, thereby connecting live failures back to the generation loop without granting the generator any ongoing runtime authority.

What carries the argument

The protocol triplet P = (S, B, O) together with the Evidence Chain that records protocol satisfaction and the Dynamic Evidence Ledger that extends the chain with runtime attestations.

If this is right

  • Code becomes transient and replaceable provided it continues to meet the protocol.
  • Runtime failures can be traced directly back to the synthesis process through the ledger without trusting the generator.
  • Governance can combine formal methods, property-based testing, runtime verification, and provenance records under one protocol boundary.
  • Monitorable obligations remain continuously attested after deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model could allow organizations to accept synthesized components in safety-critical settings by maintaining independent evidence rather than vendor trust.
  • Ledger data might later be used to refine or tighten the original protocol invariants in a feedback loop.
  • Similar evidence-ledger techniques could apply to other automated engineering domains where specifications must outlast any particular implementation.

Load-bearing premise

That the three-part protocol can be written completely enough and made machine-enforceable enough to serve as the sole lasting boundary for admissible behavior.

What would settle it

A concrete counter-example would be a generated implementation that passes all protocol invariants and produces a complete Evidence Chain yet still produces runtime behavior outside the intended admissible space that the protocol failed to capture.

read the original abstract

Automated program synthesis lowers the cost of producing implementations but introduces a harder governance problem: determining which generated artifacts are admissible. Natural-language specifications are ambiguous, and example-based tests sample only part of the behavioral space. Used alone, neither provides a sufficient control boundary. We introduce Protocol-Driven Development (PDD), where the primary software artifact is a machine-enforceable protocol rather than code. We define a protocol as the triplet P = (S, B, O), specifying structural, behavioral, and operational invariants. Their conjunction defines the admissible implementation space of a software component. Under PDD, implementations are replaceable realizations discovered through constrained search. An implementation is admitted only if it satisfies the protocol and produces a verifiable Evidence Chain of compliance. Admission is grounded in protocol satisfaction and recorded evidence rather than trust in the generator. For deployed systems, we extend the Evidence Chain into a Dynamic Evidence Ledger. Runtime verifiers append signed observations, invariant checks, and violations to the ledger, allowing monitorable obligations to be continuously attested. This connects live failures back to the generation loop without granting the generator runtime authority. Combining formal methods, property testing, runtime verification, policy-as-code, and software provenance, PDD defines a governance model for automated software engineering. Its organizing principle is that code is transient, while the protocol carries durable authority.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Protocol-Driven Development (PDD) as a governance framework for automated software synthesis. The primary artifact is a machine-enforceable protocol defined as the triplet P = (S, B, O) capturing structural, behavioral, and operational invariants; their conjunction is asserted to define the admissible implementation space. Implementations are discovered via constrained search and admitted only upon protocol satisfaction plus production of a verifiable Evidence Chain. The model extends to runtime via a Dynamic Evidence Ledger that records signed observations and violations, enabling continuous attestation while connecting failures back to the generation process without granting the generator runtime authority. PDD integrates formal methods, property testing, runtime verification, policy-as-code, and provenance techniques, with the organizing principle that code is transient while the protocol carries durable authority.

Significance. If the protocol triplet can be shown to be sufficiently complete and machine-enforceable, PDD would provide a coherent alternative to natural-language specifications and partial test suites for controlling generated code. The emphasis on replaceable implementations, evidence-based admission, and runtime ledgers aligns with emerging needs in AI-assisted software engineering and could reduce reliance on trust in generators. The explicit synthesis of existing verification approaches is a constructive contribution, though the conceptual nature of the proposal limits immediate impact without further elaboration.

major comments (2)
  1. [Abstract] Abstract / Definition of protocol triplet: The claim that the conjunction of S, B, and O invariants suffices to eliminate specification gaps and serves as the sole durable control boundary is central to the argument but rests on definitional assertion. No formal syntax or semantics for the triplet components is supplied, nor is there a worked example demonstrating completeness for a non-trivial component (e.g., a file system or network protocol). This leaves open the possibility of residual underspecification, which directly affects whether implementations can still violate intent while satisfying the protocol.
  2. [Abstract] Abstract / Evidence Chain and Dynamic Evidence Ledger: The mechanism by which an Evidence Chain is produced during constrained search and then extended at runtime into a Dynamic Evidence Ledger is described at a high level but lacks any indication of how violations are formally linked back to the original protocol invariants or generation loop. Without a concrete representation or verification procedure, it is difficult to evaluate whether the ledger actually provides the claimed monitorable obligations.
minor comments (2)
  1. [Abstract] The integration of prior techniques (formal methods, property testing, runtime verification, etc.) is mentioned but would benefit from explicit citations to representative prior work in each area to clarify the novel synthesis.
  2. [Abstract] Notation for the triplet P = (S, B, O) and the Evidence Chain could be introduced with a small illustrative diagram or pseudocode fragment to improve readability for readers unfamiliar with the combination of these concepts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript on Protocol-Driven Development. The comments correctly identify areas where the conceptual presentation would benefit from greater precision and concrete illustration. We respond to each major comment below and indicate the changes we will make in revision.

read point-by-point responses
  1. Referee: [Abstract] Abstract / Definition of protocol triplet: The claim that the conjunction of S, B, and O invariants suffices to eliminate specification gaps and serves as the sole durable control boundary is central to the argument but rests on definitional assertion. No formal syntax or semantics for the triplet components is supplied, nor is there a worked example demonstrating completeness for a non-trivial component (e.g., a file system or network protocol). This leaves open the possibility of residual underspecification, which directly affects whether implementations can still violate intent while satisfying the protocol.

    Authors: We agree that the abstract states the completeness claim at a definitional level. The body of the manuscript motivates the triplet by mapping structural invariants to interface and data constraints, behavioral invariants to property-testable specifications, and operational invariants to runtime policy constraints, drawing on existing techniques from formal methods and policy-as-code. However, the manuscript does not supply a formal syntax or semantics for the components, nor does it include a worked example for a non-trivial component. We will revise the paper to add a formal sketch of the triplet using a simple predicate-logic notation and to include a worked example for a simplified file-system component that shows how the three classes of invariants are instantiated and why their conjunction is intended to define the admissible space. The revision will also discuss the possibility of residual underspecification and the role of the evidence chain in mitigating it. revision: yes

  2. Referee: [Abstract] Abstract / Evidence Chain and Dynamic Evidence Ledger: The mechanism by which an Evidence Chain is produced during constrained search and then extended at runtime into a Dynamic Evidence Ledger is described at a high level but lacks any indication of how violations are formally linked back to the original protocol invariants or generation loop. Without a concrete representation or verification procedure, it is difficult to evaluate whether the ledger actually provides the claimed monitorable obligations.

    Authors: The manuscript describes the Evidence Chain as the verifiable record generated when an implementation satisfies the protocol during constrained search and the Dynamic Evidence Ledger as the runtime extension that appends signed observations and violations. We acknowledge that the current text does not provide a concrete representation of evidence records or a verification procedure that formally links a violation to specific invariants and back to the generation constraints. In the revised manuscript we will specify a record structure that includes cryptographic references to invariant identifiers and generation metadata, and we will outline a verification procedure that traces a reported violation through the ledger to the original protocol triplet and synthesis constraints. This elaboration will make the claimed monitorable obligations more concrete and evaluable. revision: yes

Circularity Check

0 steps flagged

Conceptual governance model without circular derivations or self-referential reductions

full rationale

The paper presents Protocol-Driven Development as a high-level framework in which a protocol is defined as the triplet P = (S, B, O) whose conjunction is asserted to delimit the admissible implementation space, with admission conditioned on protocol satisfaction plus an Evidence Chain. No equations, fitted parameters, predictions, or mathematical derivations appear in the text. The central claims are organizing principles that combine existing techniques (formal methods, property testing, runtime verification, policy-as-code, software provenance) rather than results derived from prior results by the same authors. Because the manuscript supplies no load-bearing self-citation chain, no ansatz smuggled via citation, and no renaming of known results as new derivations, the proposal remains self-contained and does not reduce any claimed outcome to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 3 invented entities

The proposal rests on the domain assumption that invariants can be expressed in a machine-enforceable form that is both complete enough to replace code-level trust and practical to maintain; no free parameters or invented physical entities are introduced.

axioms (1)
  • domain assumption Machine-enforceable protocols can be defined that are sufficiently complete to govern admissible implementations without the ambiguity of natural-language specifications or the sampling limits of tests.
    Invoked in the abstract when stating that the conjunction of structural, behavioral, and operational invariants defines the admissible implementation space.
invented entities (3)
  • Protocol triplet P = (S, B, O) no independent evidence
    purpose: To specify structural, behavioral, and operational invariants as the primary software artifact.
    Introduced as the core definitional construct; no independent evidence provided in abstract.
  • Evidence Chain no independent evidence
    purpose: Verifiable record of protocol compliance for admission of implementations.
    New ledger-style construct for grounding admission decisions.
  • Dynamic Evidence Ledger no independent evidence
    purpose: Runtime extension that appends signed observations and violations for continuous attestation.
    Extends the evidence concept to deployed systems.

pith-pipeline@v0.9.0 · 5764 in / 1498 out tokens · 69365 ms · 2026-05-20T21:57:00.941593+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

  1. [1]

    Beck, Kent , title =

  2. [2]

    Hoare, C. A. R. , title =. Communications of the ACM , volume =. 1969 , doi =

  3. [3]

    and Grumberg, Orna and Peled, Doron A

    Clarke, Edmund M. and Grumberg, Orna and Peled, Doron A. , title =

  4. [4]

    Lamport, Leslie , title =

  5. [5]

    ACM Transactions on Software Engineering and Methodology , volume =

    Jackson, Daniel , title =. ACM Transactions on Software Engineering and Methodology , volume =. 2002 , doi =

  6. [6]

    Meyer, Bertrand , title =

  7. [7]

    Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming , pages =

    Claessen, Koen and Hughes, John , title =. Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming , pages =. 2000 , doi =

  8. [8]

    The Journal of Logic and Algebraic Programming , volume =

    Leucker, Martin and Schallhart, Christian , title =. The Journal of Logic and Algebraic Programming , volume =. 2009 , doi =

  9. [9]

    An Overview of the Runtime Verification Tool Java PathExplorer , journal =

    Havelund, Klaus and Ro. An Overview of the Runtime Verification Tool Java PathExplorer , journal =. 2004 , doi =

  10. [10]

    2020 , doi =

    Rose, Scott and Borchert, Oliver and Mitchell, Stu and Connelly, Sean , title =. 2020 , doi =

  11. [11]

    OpenAPI Specification Version 3.1.0 , year =

  12. [12]

    JSON Schema Draft 2020-12 , year =

  13. [13]

    Protocol Buffers , year =

  14. [14]

    Open Policy Agent , year =

  15. [15]

    Supply-chain Levels for Software Artifacts , year =

  16. [16]

    28th USENIX Security Symposium , pages =

    Torres-Arias, Santiago and Afzali, Hammad and Kuppusamy, Trishank Karthik and Curtmola, Reza and Cappos, Justin , title =. 28th USENIX Security Symposium , pages =. 2019 , publisher =

  17. [17]

    Morris, Kief , title =

  18. [18]

    What Is Terraform? , year =

  19. [19]

    Middleware 2013 , series =

    Hummer, Waldemar and Rosenberg, Florian and Oliveira, Fabio and Eilam, Tamar , title =. Middleware 2013 , series =. 2013 , doi =

  20. [20]

    2021 , eprint =

    Chen, Mark and others , title =. 2021 , eprint =

  21. [21]

    2023 , eprint =

    Peng, Sida and Kalliamvakou, Eirini and Cihon, Peter and Demirer, Mert , title =. 2023 , eprint =

  22. [22]

    and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik R

    Jimenez, Carlos E. and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik R. , title =. The Twelfth International Conference on Learning Representations , year =

  23. [23]

    and Wettig, Alexander and Lieret, Kilian and Yao, Shunyu and Narasimhan, Karthik R

    Yang, John and Jimenez, Carlos E. and Wettig, Alexander and Lieret, Kilian and Yao, Shunyu and Narasimhan, Karthik R. and Press, Ofir , title =. Advances in Neural Information Processing Systems , volume =. 2024 , url =

  24. [24]

    2024 , howpublished =

    Wu, Scott , title =. 2024 , howpublished =

  25. [25]

    2017 , howpublished =

    Karpathy, Andrej , title =. 2017 , howpublished =

  26. [26]

    2025 , howpublished =

    Karpathy, Andrej , title =. 2025 , howpublished =

  27. [27]

    2026 , note =

    He, Jun and Yu, Deying , title =. 2026 , note =

  28. [28]

    and Felleisen, Matthias , title =

    Wright, Andrew K. and Felleisen, Matthias , title =. Information and Computation , volume =. 1994 , doi =

  29. [29]

    Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation , pages =

    Freeman, Tim and Pfenning, Frank , title =. Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation , pages =

  30. [30]

    Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation , pages =

    Rondon, Patrick Maxim and Kawaguchi, Ming and Jhala, Ranjit , title =. Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation , pages =. 2008 , doi =

  31. [31]

    and Jhala, Ranjit and Vytiniotis, Dimitrios and Peyton Jones, Simon L

    Vazou, Niki and Seidel, Eric L. and Jhala, Ranjit and Vytiniotis, Dimitrios and Peyton Jones, Simon L. , title =. Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming , pages =. 2014 , doi =

  32. [32]

    Proceedings of the ACM on Programming Languages , volume =

    Borkowski, Michael and Vazou, Niki and Jhala, Ranjit , title =. Proceedings of the ACM on Programming Languages , volume =. 2024 , doi =

  33. [33]

    Dependent Types and Multi-Monadic Effects in

    Swamy, Nikhil and Hritcu, Catalin and Keller, Chantal and Rastogi, Aseem and Delignat-Lavaud, Antoine and Forest, Simon and Bhargavan, Karthikeyan and Fournet, C. Dependent Types and Multi-Monadic Effects in. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages , pages =. 2016 , doi =

  34. [34]

    Rustan M

    Leino, K. Rustan M. , title =. Logic for Programming, Artificial Intelligence, and Reasoning , series =. 2010 , doi =

  35. [35]

    Proceedings of the ACM on Programming Languages , volume =

    Lattuada, Andrea and Hance, Travis and Cho, Chanhee and Brun, Matthias and Subasinghe, Isitha and Zhou, Yi and Howell, Jon and Parno, Bryan and Hawblitzel, Chris , title =. Proceedings of the ACM on Programming Languages , volume =. 2023 , doi =

  36. [36]

    , title =

    Necula, George C. , title =. Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages , pages =. 1997 , doi =

  37. [37]

    , title =

    Appel, Andrew W. , title =. Proceedings of the 16th Annual IEEE Symposium on Logic in Computer Science , pages =. 2001 , doi =

  38. [38]

    Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus , journal =

    Cai, Yufan and H. Automated Program Refinement: Guide and Verify Code Large Language Model with Refinement Calculus , journal =. 2025 , doi =

  39. [39]

    2025 , eprint =

    Zhong, Sicheng and Zhu, Jiading and Tian, Yifang and Si, Xujie , title =. 2025 , eprint =

  40. [40]

    Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering , pages =

    Mukherjee, Prasita and Lu, Minghai and Delaware, Benjamin , title =. Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering , pages =. 2025 , url =

  41. [41]

    Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

    Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

  42. [42]

    Classification Problem Solving

    Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

  43. [43]

    , title =

    Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

  44. [44]

    New Ways to Make Microcircuits Smaller---Duplicate Entry

    Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

  45. [45]

    Clancey and Glenn Rennels , abstract =

    Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

  46. [46]

    and Rennels, Glenn R

    Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

  47. [47]

    Poligon: A System for Parallel Problem Solving

    Rice, James. Poligon: A System for Parallel Problem Solving

  48. [48]

    Transfer of Rule-Based Expertise through a Tutorial Dialogue

    Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

  49. [49]

    The Engineering of Qualitative Models

    Clancey, William J. The Engineering of Qualitative Models

  50. [50]

    2017 , eprint=

    Attention Is All You Need , author=. 2017 , eprint=

  51. [51]

    Pluto: The 'Other' Red Planet

    NASA. Pluto: The 'Other' Red Planet