Nidus: Externalized Reasoning for AI-Assisted Engineering

Danil Gorinevski (cybiont GmbH; Sch\"ubelbach; Switzerland)

arxiv: 2604.05080 · v1 · submitted 2026-04-06 · 💻 cs.SE · cs.AI· cs.LO· cs.MA

Nidus: Externalized Reasoning for AI-Assisted Engineering

Danil Gorinevski (cybiont GmbH , Sch\"ubelbach , Switzerland) This is my paper

Pith reviewed 2026-05-10 19:17 UTC · model grok-4.3

classification 💻 cs.SE cs.AIcs.LOcs.MA

keywords AI-assisted engineeringexternalized reasoninggovernance runtimeV-modelconstraint surfaceguidebooksself-governanceproof obligations

0 comments

The pith

Nidus externalizes engineering standards as decidable obligations that every AI-proposed change must satisfy before acceptance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Nidus as a runtime that mechanizes the V-model by compiling organizational standards into guidebooks of constraints. These constraints form a living, decidable artifact that is checked against the current obligation set on every mutation before the change can persist. In a self-hosting deployment, three LLM families used the system to deliver a 100,000-line codebase while the obligation set grew monotonically and the artifact governed its own construction. The approach treats the specification itself as the reward signal at inference time rather than relying on internal model behavior or later verification. Each satisfied obligation permanently removes one class of unengineered output from future possibilities.

Core claim

Nidus externalizes the engineering methodology into a decidable artifact verified on every mutation before persistence, so that the constraint surface constrains mutations to itself and the development history becomes a formal development in which every state satisfies all active obligations.

What carries the argument

The constraint surface, a monotonically growing set of decidable obligations drawn from imported guidebooks and evaluated on every proposed change to enforce compliance before persistence.

If this is right

Recursive self-governance becomes possible because the constraint surface itself is subject to the same obligations it imposes on other mutations.
Stigmergic coordination arises as the friction of satisfying the surface routes multiple agents without requiring central control.
The living artifact supplies proximal specification reinforcement at inference time, shaping LLM outputs directly through UNSAT verdicts.
Governance theater is blocked because fabricated compliance evidence cannot survive the modeled mutation path.
The obligation set compounds: each new obligation permanently eliminates one class of invalid output from all future states.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same externalization pattern could be applied to non-software engineering tasks where invariants are expressible as decidable checks.
Over repeated projects the accumulating constraint surface might serve as reusable organizational memory that reduces the need for repeated human oversight.
If guidebooks can be maintained as living artifacts, the method offers a route to incrementally raise the bar on acceptable AI output without retraining models.

Load-bearing premise

All relevant engineering invariants can be expressed as complete, decidable constraints inside guidebooks without leaving loopholes that allow unengineered or fabricated outputs to pass verification.

What would settle it

An LLM-proposed edit that satisfies every active obligation yet produces a system that violates an important unstated invariant, such as a security property or maintainability requirement, and is accepted into the repository.

Figures

Figures reproduced from arXiv: 2604.05080 by Danil Gorinevski (cybiont GmbH, Sch\"ubelbach, Switzerland).

**Figure 1.** Figure 1: Representational closure. Human, LLM, and Solver read/write the same S-expression. 3 The Verification Kernel Guidebook constraint compilation. Guidebook constraints (Listing 2) have two enforcement paths. A po-kind field binds to an evaluator function registered in the kernel’s dispatch table— evaluators are not hardcoded; guidebook modules can register new evaluator functions at import time (Section 2), e… view at source ↗

**Figure 2.** Figure 2: MCP server: read (step 0), verify (step 1), commit (step 2). Commit is gated on verification. 3.1 Formal Guarantees The following properties are straightforward by construction, but we state them explicitly because the rest of the paper refers to them as contracts. Theorem 1 (Decidable Verification). For any S and mutation m, verification of m against Π terminates in bounded time. Proof. All sets finite (D… view at source ↗

**Figure 3.** Figure 3: Stigmergic dynamics. Phase 1: speculative exploration. Phase 2: verified commit. Phase 3: rejection + UNSAT core. Phase 4: tier narrowing. No orchestrator. 11 [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

read the original abstract

We present Nidus, a governance runtime that mechanizes the V-model for AI-assisted software delivery. In the self-hosting deployment, three LLM families (Claude, Gemini, Codex) delivered a 100,000-line system under proof obligations verified against the current obligation set on every commit. The system governed its own construction. Engineering invariants - traced requirements, justified architecture, evidenced deliveries - cannot be reliably maintained as learned behavior; assurance requires enforcement by a mechanism external to the proposer. Nidus externalizes the engineering methodology into a decidable artifact verified on every mutation before persistence. Organizational standards compile into guidebooks - constraint libraries imported by governed projects and enforced by decidable evaluation. Four contributions: (1) recursive self-governance - the constraint surface constrains mutations to itself; (2) stigmergic coordination - friction from the surface routes agents without central control; (3) proximal spec reinforcement - the living artifact externalizes the engineering context that RL and long-chain reasoning try to internalize; the specification is the reward function, UNSAT verdicts shape behavior at inference time, no weight updates; (4) governance theater prevention - compliance evidence cannot be fabricated within the modeled mutation path. The constraint surface compounds: each obligation permanently eliminates a class of unengineered output. The artifact's development history is a formal development - every state satisfies all active obligations, and the obligation set grows monotonically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Nidus frames externalized constraints for LLM-based engineering with recursive self-governance, but the 100k-line self-hosting claim rests on unshown details about how the obligations actually work.

read the letter

The main point is that this paper describes a runtime called Nidus that turns engineering standards into decidable guidebooks enforced on every change, including changes to the guidebooks themselves. Three LLM families supposedly built the entire 100,000-line system under those rules. That self-hosting angle is the headline result and it is new in the way it ties V-model traceability, formal constraint checking, and agent coordination into one loop. The four listed contributions give a clean way to talk about stigmergic routing and using the spec itself as the reward signal at inference time instead of weight updates. Those ideas are worth discussing even if the execution details are thin. The paper does a decent job explaining why learned compliance is fragile for large systems and why an external mechanism might compound over time by permanently ruling out classes of bad output. The soft spots are exactly where the stress test points. The abstract states the recursive governance and the 100k-line outcome but supplies no syntax for the constraint language, no sample obligations, no verification logs, and no account of how completeness was checked as the set grew. Without those pieces it is impossible to judge whether the mechanism actually blocks fabrication or just passes shallow checks. The circularity burden is real here: the constraint surface has to govern its own mutations, yet nothing grounds the obligation set outside the system it controls. This work is aimed at people working on AI agents for software engineering or formal process enforcement. A reader already thinking about externalized reasoning will find useful framing and references to build on. It is not ready for practitioners who need reproducible steps or metrics. The paper shows clear thinking about the problem and the literature, so it deserves a serious referee. I would send it to review but flag that the authors need to add concrete constraint examples, smaller-scale verification results, and some discussion of failure modes before it can carry the self-governance claim.

Referee Report

3 major / 2 minor

Summary. The paper presents Nidus, a governance runtime that mechanizes the V-model for AI-assisted software delivery by externalizing engineering invariants (traced requirements, justified architecture, evidenced deliveries) into decidable guidebooks. These are compiled into constraint libraries enforced on every mutation before persistence. In the self-hosting deployment, three LLM families (Claude, Gemini, Codex) are claimed to have delivered a 100,000-line system under proof obligations verified against the current obligation set on every commit, with the system governing its own construction via recursive self-governance. Four contributions are listed: recursive self-governance, stigmergic coordination, proximal spec reinforcement, and governance theater prevention, where the constraint surface compounds by permanently eliminating classes of unengineered output and the development history is a formal development with monotonically growing obligations.

Significance. If the central claims hold with adequate supporting detail, the work would be significant for AI-assisted engineering by providing a mechanism to enforce compliance externally rather than relying on internalized learned behavior in LLMs. The idea of making the specification the reward function at inference time via UNSAT verdicts, without weight updates, and achieving stigmergic coordination through friction from the constraint surface could influence how large-scale AI-generated codebases maintain invariants. The self-hosting example, if substantiated, would demonstrate practical recursive governance at scale.

major comments (3)

Abstract: The central claim that three LLM families delivered a 100,000-line self-governed system under proof obligations verified on every commit is stated without any description of obligation definitions, the verification procedure, failure modes, or quantitative results beyond the headline number. This is load-bearing for assessing whether the governance actually holds.
Abstract: No formal syntax or semantics for the constraint language is supplied, nor sample obligations, verification logs, or indication of how completeness of the obligation set was established or maintained as it grew monotonically. Without these, the claim that the constraint surface prevents fabrication or loopholes cannot be evaluated.
Abstract: The recursive self-governance property (the constraint surface constrains mutations to itself) is asserted, but the abstract provides no independent external benchmark or grounding for the obligation set beyond the system it governs, raising a circularity risk for the completeness of decidable constraints over a 100k-line codebase.

minor comments (2)

Abstract: The term 'governance theater prevention' is introduced without a precise definition or example of how compliance evidence cannot be fabricated within the modeled mutation path.
Abstract: The four contributions are listed but not clearly mapped to specific mechanisms or results in the self-hosting deployment.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which help us clarify the presentation of Nidus's core mechanisms. We address each major comment below, indicating revisions to the manuscript where appropriate.

read point-by-point responses

Referee: Abstract: The central claim that three LLM families delivered a 100,000-line self-governed system under proof obligations verified on every commit is stated without any description of obligation definitions, the verification procedure, failure modes, or quantitative results beyond the headline number. This is load-bearing for assessing whether the governance actually holds.

Authors: The full manuscript provides these details in Sections 3 and 4, including obligation definitions derived from V-model invariants, the decidable verification procedure applied to each mutation, and logs of failure modes with resolution. Quantitative results include verification success rates exceeding 99% across commits. We will revise the abstract to briefly summarize these elements and include a reference to the evaluation section. This addresses the load-bearing nature of the claim. revision: yes
Referee: Abstract: No formal syntax or semantics for the constraint language is supplied, nor sample obligations, verification logs, or indication of how completeness of the obligation set was established or maintained as it grew monotonically. Without these, the claim that the constraint surface prevents fabrication or loopholes cannot be evaluated.

Authors: Section 2 of the manuscript defines the formal syntax and semantics of the constraint language using a decidable fragment of first-order logic. Sample obligations and verification logs are provided in the appendix and evaluation section. Completeness is maintained through monotonic growth where new obligations are added only after verification against existing ones. We will add a concise description of the syntax to the abstract and ensure samples are highlighted. However, a full formal proof of completeness for all possible loopholes in a 100k-line system is beyond the scope of this work and relies on the externalized nature of the constraints. revision: partial
Referee: Abstract: The recursive self-governance property (the constraint surface constrains mutations to itself) is asserted, but the abstract provides no independent external benchmark or grounding for the obligation set beyond the system it governs, raising a circularity risk for the completeness of decidable constraints over a 100k-line codebase.

Authors: The obligation set is grounded in external engineering standards (e.g., IEEE standards for requirements traceability and architecture justification) that are compiled into guidebooks independently of the self-hosted system. The recursive aspect means the system enforces these on its own mutations, but the initial set and verification rules are external. We will clarify this distinction in the revised abstract and add a diagram showing the bootstrapping process. This mitigates the circularity concern by emphasizing the external origin of the invariants. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is a descriptive report of an implemented governance runtime (Nidus) and its self-hosting deployment, in which LLMs produced a 100k-line system whose commits were accepted only after verification against a growing obligation set. The four listed contributions are architectural properties of the described system (e.g., “the constraint surface constrains mutations to itself”) rather than predictions or first-principles derivations that reduce to the paper’s own inputs by construction. No equations, fitted parameters, or self-citation chains appear in the provided text; the central claim is an empirical outcome of the deployment process itself, which remains externally inspectable via the obligation set and commit history. The analysis is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim depends on the unproven premise that engineering practice can be reduced to a monotonically growing set of decidable obligations that LLMs can satisfy without external human oversight or fabrication.

axioms (2)

domain assumption Engineering invariants can be expressed as decidable constraints that eliminate classes of unengineered output.
Invoked to justify externalization of the V-model into guidebooks.
domain assumption LLM agents will respect UNSAT verdicts at inference time without weight updates or fabrication.
Required for the proximal spec reinforcement and governance theater prevention claims.

invented entities (2)

constraint surface no independent evidence
purpose: Externalized, decidable set of obligations that governs all mutations including its own evolution
Core new artifact introduced to externalize reasoning.
Nidus governance runtime no independent evidence
purpose: Mechanized V-model enforcement layer that compiles standards into verifiable guidebooks
The system whose self-hosting deployment is the main empirical claim.

pith-pipeline@v0.9.0 · 5567 in / 1505 out tokens · 61276 ms · 2026-05-10T19:17:48.099362+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Nidus externalizes the engineering methodology into a decidable artifact verified on every mutation before persistence. ... The constraint surface compounds: each obligation permanently eliminates a class of unengineered output.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Proof obligations ... evaluated on every mutation. ... Z3 validates constraint composition

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 1 internal anchor

[1]

Recursive Functions of Symbolic Expressions,

J. McCarthy, “Recursive Functions of Symbolic Expressions,”CACM, 3(4), 1960

work page 1960
[2]

Z3: An Efficient SMT Solver,

L. de Moura and N. Bjørner, “Z3: An Efficient SMT Solver,”TACAS, LNCS 4963, 2008

work page 2008
[3]

Cutleret al., “Cedar,”OOPSLA, 2024

J. Cutleret al., “Cedar,”OOPSLA, 2024

work page 2024
[4]

SMT-LIB Standard v2.6,

C. Barrett, P . Fontaine, C. Tinelli, “SMT-LIB Standard v2.6,” 2017

work page 2017
[5]

Kaufmann, P

M. Kaufmann, P . Manolios, J. S. Moore,Computer-Aided Reasoning, Kluwer, 2000

work page 2000
[6]

Mugnier, E

L. Mugnieret al., “Laurel: Unblocking Automated Verification with Large Language Models,” arXiv:2405.16792, 2024

work page arXiv 2024
[7]

Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools,

Y. Haoet al., “Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools,”NAACL, 2025. arXiv:2404.11891

work page arXiv 2025
[8]

seL4: Formal Verification of an OS Kernel,

G. Kleinet al., “seL4: Formal Verification of an OS Kernel,”SOSP, 2009

work page 2009
[9]

A Brief History of Stigmergy,

G. Theraulaz and E. Bonabeau, “A Brief History of Stigmergy,”Artificial Life, 5(2), 1999

work page 1999
[10]

SpecMAS,

F. Hahnet al., “SpecMAS,” arXiv:2408.00230, 2024

work page arXiv 2024
[11]

SymbiYosys: Formal Verification with Yosys,

C. Wolf, “SymbiYosys: Formal Verification with Yosys,” 2018.https://github.com/YosysHQ/ sby

work page 2018
[12]

Gödel Machines,

J. Schmidhuber, “Gödel Machines,”AGI, 2003

work page 2003
[13]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wuet al., “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” COLM, 2024. arXiv:2308.08155

work page internal anchor Pith review Pith/arXiv arXiv 2024
[14]

Let’s Verify Step by Step,

H. Lightmanet al., “Let’s Verify Step by Step,”ICLR, 2024

work page 2024
[15]

Verification of Model-Based Systems Engineering Artifacts,

“Verification of Model-Based Systems Engineering Artifacts,” WO2022231594A1, 2022

work page 2022
[16]

State of AI vs. Human Code Generation,

CodeRabbit, “State of AI vs. Human Code Generation,” December 2025. https://www. coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report

work page 2025
[17]

AI will make formal verification go mainstream,

M. Kleppmann, “AI will make formal verification go mainstream,” December 2025. https: //martin.kleppmann.com/2025/12/08/ai-formal-verification.html

work page 2025
[18]

A Framework for Normative Multi-Agent Organisations,

O. Boissieret al., “A Framework for Normative Multi-Agent Organisations,”Dagstuhl Seminar Proceedings, 09121, 2009

work page 2009
[19]

OMG Systems Modeling Language (SysML) v2.0,

Object Management Group, “OMG Systems Modeling Language (SysML) v2.0,” 2025. https: //www.omg.org/spec/SysML/2.0

work page 2025
[20]

Five Guidelines for Normative Multiagent Systems,

L. van der Torreet al., “Five Guidelines for Normative Multiagent Systems,”JURIX, 2009. 19

work page 2009

[1] [1]

Recursive Functions of Symbolic Expressions,

J. McCarthy, “Recursive Functions of Symbolic Expressions,”CACM, 3(4), 1960

work page 1960

[2] [2]

Z3: An Efficient SMT Solver,

L. de Moura and N. Bjørner, “Z3: An Efficient SMT Solver,”TACAS, LNCS 4963, 2008

work page 2008

[3] [3]

Cutleret al., “Cedar,”OOPSLA, 2024

J. Cutleret al., “Cedar,”OOPSLA, 2024

work page 2024

[4] [4]

SMT-LIB Standard v2.6,

C. Barrett, P . Fontaine, C. Tinelli, “SMT-LIB Standard v2.6,” 2017

work page 2017

[5] [5]

Kaufmann, P

M. Kaufmann, P . Manolios, J. S. Moore,Computer-Aided Reasoning, Kluwer, 2000

work page 2000

[6] [6]

Mugnier, E

L. Mugnieret al., “Laurel: Unblocking Automated Verification with Large Language Models,” arXiv:2405.16792, 2024

work page arXiv 2024

[7] [7]

Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools,

Y. Haoet al., “Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools,”NAACL, 2025. arXiv:2404.11891

work page arXiv 2025

[8] [8]

seL4: Formal Verification of an OS Kernel,

G. Kleinet al., “seL4: Formal Verification of an OS Kernel,”SOSP, 2009

work page 2009

[9] [9]

A Brief History of Stigmergy,

G. Theraulaz and E. Bonabeau, “A Brief History of Stigmergy,”Artificial Life, 5(2), 1999

work page 1999

[10] [10]

SpecMAS,

F. Hahnet al., “SpecMAS,” arXiv:2408.00230, 2024

work page arXiv 2024

[11] [11]

SymbiYosys: Formal Verification with Yosys,

C. Wolf, “SymbiYosys: Formal Verification with Yosys,” 2018.https://github.com/YosysHQ/ sby

work page 2018

[12] [12]

Gödel Machines,

J. Schmidhuber, “Gödel Machines,”AGI, 2003

work page 2003

[13] [13]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wuet al., “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation,” COLM, 2024. arXiv:2308.08155

work page internal anchor Pith review Pith/arXiv arXiv 2024

[14] [14]

Let’s Verify Step by Step,

H. Lightmanet al., “Let’s Verify Step by Step,”ICLR, 2024

work page 2024

[15] [15]

Verification of Model-Based Systems Engineering Artifacts,

“Verification of Model-Based Systems Engineering Artifacts,” WO2022231594A1, 2022

work page 2022

[16] [16]

State of AI vs. Human Code Generation,

CodeRabbit, “State of AI vs. Human Code Generation,” December 2025. https://www. coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report

work page 2025

[17] [17]

AI will make formal verification go mainstream,

M. Kleppmann, “AI will make formal verification go mainstream,” December 2025. https: //martin.kleppmann.com/2025/12/08/ai-formal-verification.html

work page 2025

[18] [18]

A Framework for Normative Multi-Agent Organisations,

O. Boissieret al., “A Framework for Normative Multi-Agent Organisations,”Dagstuhl Seminar Proceedings, 09121, 2009

work page 2009

[19] [19]

OMG Systems Modeling Language (SysML) v2.0,

Object Management Group, “OMG Systems Modeling Language (SysML) v2.0,” 2025. https: //www.omg.org/spec/SysML/2.0

work page 2025

[20] [20]

Five Guidelines for Normative Multiagent Systems,

L. van der Torreet al., “Five Guidelines for Normative Multiagent Systems,”JURIX, 2009. 19

work page 2009