pith. sign in

arxiv: 2605.17076 · v1 · pith:2E2PDC3Inew · submitted 2026-05-16 · 💻 cs.LG · cs.AI· cs.DC· cs.MA

S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

Pith reviewed 2026-05-20 15:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DCcs.MA
keywords multi-agent LLMstructural race conditionsread-set reconstructionobservable-read isolationHTTP middlewareconsistency modelsDeliveryLogS-Bus
0
0 comments X

The pith

S-Bus uses a server-side DeliveryLog to automatically reconstruct each agent's read set from HTTP GET operations and enforce Observable-Read Isolation that prevents structural race conditions during shared-shard collaboration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the problem of structural race conditions that arise when multiple LLM agents read and write to the same mutable natural-language state. Its central proposal is an HTTP middleware whose DeliveryLog tracks observable GET requests to rebuild each agent's read set at commit time. This reconstruction supplies Observable-Read Isolation, a partial causal consistency guarantee that blocks both write-write conflicts and cross-shard stale reads. The work supplies three-tier mechanical evidence for the mechanism and shows that it matches the conflict-prevention behavior of PostgreSQL SERIALIZABLE and Redis WATCH/MULTI on large contention workloads. The approach requires no modifications to existing agent frameworks.

Core claim

The DeliveryLog mechanism automatically reconstructs each agent's read set from HTTP GET operations and provides Observable-Read Isolation that prevents structural race conditions when agents collaborate via shared shards, supported by machine-checked proofs in TLAPS, exhaustive TLC model checking at N=3, Dafny lemmas, and empirical parity with PostgreSQL SERIALIZABLE and Redis WATCH/MULTI showing zero Type-I corruptions across 427,308 conflicts.

What carries the argument

The DeliveryLog, a per-agent log of HTTP GET operations that reconstructs the read set at commit time without agent SDK changes under HTTP/1.1.

If this is right

  • Agents can safely share state shards without introducing silent output corruptions.
  • No changes to agent code or frameworks are required to obtain the isolation guarantee.
  • The mechanism achieves zero Type-I errors on contention workloads that produce hundreds of thousands of HTTP-409 conflicts.
  • Observable-Read Isolation remains neutral for dedicated-shard topologies but becomes harmful when preservation of concurrent contradictions occurs in single-shard writing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same read-set reconstruction idea could be applied to other observable request protocols beyond HTTP/1.1.
  • Frameworks that currently rely on explicit locks or version checks might simplify their coordination layer if they adopt an observable-traffic log.
  • Larger-scale empirical runs with agent counts well beyond the N=3 model would be a direct next measurement to check whether the zero-corruption result holds at production scale.

Load-bearing premise

The approach assumes agents communicate exclusively via observable HTTP/1.1 traffic and that the workload uses dedicated shards rather than single-shard collaborative writing.

What would settle it

Observation of even one Type-I corruption during a shared-shard contention run that exceeds the scale of the TLC-checked model or a counter-example to the retained typing axiom in the TLAPS proof.

read the original abstract

Concurrent LLM agents sharing mutable natural-language state produce Structural Race Conditions (SRCs): write-write and cross-shard stale-read conflicts that silently corrupt agent output. Existing multi-agent frameworks (LangGraph, CrewAI, AutoGen) provide no write-ownership semantics over shared state. We present S-Bus, an HTTP middleware whose central mechanism is a server-side DeliveryLog: a per-agent log of HTTP GET operations that automatically reconstructs each agent's read set at commit time without agent SDK changes under HTTP/1.1. The consistency property the DeliveryLog provides -- Observable-Read Isolation (ORI), a partial causal consistency over the HTTP-observable projection of the read set -- prevents structural race conditions when agents collaborate via shared shards. Three contributions: (C1) The DeliveryLog mechanism for automatic HTTP-traffic-based read-set reconstruction, with three-tier mechanised evidence: ReadSetSoundness and ORICommitSafety machine-checked in TLAPS (modulo one retained typing axiom); exhaustive TLC at N=3 (20,763,484 distinct states, zero violations); Dafny discharges 9 inductive soundness lemmas. (C2) Empirical structural-conflict prevention parity against PostgreSQL 17 SERIALIZABLE and Redis 7 WATCH/MULTI on shared-shard contention sweeps with 427,308 active HTTP-409 conflicts: zero Type-I corruptions across all three backends. (C3) ORI's operating envelope is topology-conditional: semantically neutral in dedicated-shard workloads; harmful in single-shard collaborative writing because preservation propagates concurrent contradictions. Source code: https://github.com/sajjadanwar0/sbus

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces S-Bus, an HTTP middleware for multi-agent LLM state coordination. Its central mechanism is a server-side DeliveryLog that automatically reconstructs each agent's read set from HTTP/1.1 GET operations without SDK changes. This enables Observable-Read Isolation (ORI), a partial causal consistency property over the observable read-set projection, which the authors claim prevents structural race conditions (write-write and cross-shard stale reads) on shared shards. Evidence includes TLAPS machine-checked proofs of ReadSetSoundness and ORICommitSafety (modulo one typing axiom), exhaustive TLC model checking at N=3 (20,763,484 states, zero violations), 9 Dafny inductive lemmas, and empirical parity with PostgreSQL SERIALIZABLE and Redis WATCH/MULTI (zero Type-I corruptions across 427,308 conflicts). The paper also analyzes ORI's topology-conditional envelope: neutral for dedicated shards but harmful for single-shard collaborative writing.

Significance. If the formal results and empirical findings hold, the work provides a practical, low-friction approach to state isolation in multi-agent LLM systems that avoids modifying agent code. The combination of machine-checked TLAPS proofs, exhaustive TLC checking, Dafny lemmas, and reproducible empirical comparison to established database mechanisms is a notable strength. The explicit discussion of when ORI becomes harmful adds useful nuance to the operating envelope.

major comments (3)
  1. [TLC Model Checking] TLC Model Checking section: exhaustive checking is performed only at N=3 (20,763,484 states, zero violations). Given the exponential growth of the state space, this leaves open the possibility of undetected violations of ReadSetSoundness or ORICommitSafety for N>3 or in cross-shard interleavings; the central claim that ORI prevents SRCs in general workloads therefore rests on an incomplete coverage argument.
  2. [TLAPS Proofs] TLAPS Proofs section: ReadSetSoundness and ORICommitSafety are machine-checked only modulo one retained typing axiom. The manuscript should either discharge the axiom or provide a concrete argument that it does not affect the isolation guarantees, as this directly impacts the soundness of the mechanized evidence.
  3. [Operating Envelope] Operating Envelope section: the claim that ORI is semantically neutral in dedicated-shard workloads but harmful in single-shard collaborative writing is stated qualitatively; no quantitative boundary conditions or workload characterization are supplied to delineate when preservation of concurrent contradictions becomes load-bearing.
minor comments (2)
  1. [Abstract] The abstract states 'three contributions' but the body does not explicitly label C1–C3 in the same order or with matching headings; adding explicit cross-references would improve readability.
  2. [Evaluation] Figure captions for the empirical conflict sweeps should include the exact number of runs and the definition of 'Type-I corruption' to allow direct replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the strengths of our multi-tier formal evidence and empirical comparisons. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [TLC Model Checking] TLC Model Checking section: exhaustive checking is performed only at N=3 (20,763,484 states, zero violations). Given the exponential growth of the state space, this leaves open the possibility of undetected violations of ReadSetSoundness or ORICommitSafety for N>3 or in cross-shard interleavings; the central claim that ORI prevents SRCs in general workloads therefore rests on an incomplete coverage argument.

    Authors: We agree that exhaustive TLC checking at N=3, while covering 20,763,484 states with zero violations, cannot exhaustively rule out violations at larger N due to state-space explosion. The TLAPS proofs of ReadSetSoundness and ORICommitSafety are parameterized over arbitrary N and are not restricted to small models; the model checking validates the specification at a tractable scale. We will revise the manuscript to explicitly discuss this complementarity and to state the coverage limitations of model checking for N>3. revision: partial

  2. Referee: [TLAPS Proofs] TLAPS Proofs section: ReadSetSoundness and ORICommitSafety are machine-checked only modulo one retained typing axiom. The manuscript should either discharge the axiom or provide a concrete argument that it does not affect the isolation guarantees, as this directly impacts the soundness of the mechanized evidence.

    Authors: The retained typing axiom asserts well-typedness of DeliveryLog operations in the TLA+ encoding. We will add a concrete argument in the revised manuscript showing that the axiom is orthogonal to the isolation properties: it only concerns syntactic validity of read-set reconstruction and does not affect the causal-ordering or commit-safety invariants, which are proved independently of it. This argument preserves the soundness of the mechanized evidence. revision: yes

  3. Referee: [Operating Envelope] Operating Envelope section: the claim that ORI is semantically neutral in dedicated-shard workloads but harmful in single-shard collaborative writing is stated qualitatively; no quantitative boundary conditions or workload characterization are supplied to delineate when preservation of concurrent contradictions becomes load-bearing.

    Authors: We agree the current discussion is qualitative. In the revision we will augment the Operating Envelope section with quantitative indicators drawn from the empirical evaluation (agent counts, conflict rates, and measured overhead) that delineate when preservation of concurrent contradictions becomes load-bearing, e.g., performance impact thresholds observed in single-shard versus dedicated-shard configurations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external formal verification and independent benchmarks

full rationale

The paper presents the DeliveryLog as a novel construction for HTTP-based read-set reconstruction and defines Observable-Read Isolation as its consistency property. Soundness is established via machine-checked proofs in TLAPS (ReadSetSoundness, ORICommitSafety), exhaustive TLC model checking at N=3, and Dafny inductive lemmas, plus direct empirical parity with PostgreSQL SERIALIZABLE and Redis WATCH/MULTI on 427k conflicts. These are independent external artifacts rather than self-citations or fitted parameters. No self-definitional equations, renamed predictions, or load-bearing author citations appear in the derivation chain. The topology-conditional operating envelope is stated explicitly as an assumption, not derived from prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The paper introduces the DeliveryLog and ORI as new constructs; formal verification relies on standard math axioms plus one retained typing axiom in TLAPS; no fitted numerical parameters are mentioned in the abstract.

axioms (1)
  • standard math One retained typing axiom in the TLAPS proof of ReadSetSoundness and ORICommitSafety
    Explicitly noted in the mechanised evidence section of the abstract as the only non-discharged element.
invented entities (2)
  • DeliveryLog no independent evidence
    purpose: Server-side per-agent log of HTTP GET operations for automatic read-set reconstruction
    Central mechanism of S-Bus; no independent evidence outside the paper is provided in the abstract.
  • Observable-Read Isolation (ORI) no independent evidence
    purpose: Partial causal consistency over the HTTP-observable projection of the read set
    Consistency property claimed to prevent structural race conditions; defined within the paper.

pith-pipeline@v0.9.0 · 5836 in / 1630 out tokens · 36917 ms · 2026-05-20T15:30:39.190032+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 4 internal anchors

  1. [1]

    LangGraph,

    LangChain, “LangGraph,” GitHub, 2024

  2. [2]

    Moura, “CrewAI,” GitHub, 2024

    J. Moura, “CrewAI,” GitHub, 2024

  3. [3]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Q. Wu et al., “AutoGen,” arXiv:2308.08155, 2023

  4. [4]

    MetaGPT,

    S. Hong et al., “MetaGPT,” ICLR, 2024

  5. [5]

    Li et al., “CAMEL,” NeurIPS, 2023

    G. Li et al., “CAMEL,” NeurIPS, 2023

  6. [6]

    Swarm / AG2,

    OpenAI, “Swarm / AG2,” GitHub, 2024

  7. [7]

    Agent-to-Agent Protocol,

    Google, “Agent-to-Agent Protocol,” GitHub, 2025

  8. [8]

    Yao et al., “ReAct,” ICLR, 2023

    S. Yao et al., “ReAct,” ICLR, 2023

  9. [9]

    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

    O. Khattab et al., “DSPy,” arXiv:2310.03714, 2023

  10. [10]

    Semantic Kernel,

    Microsoft, “Semantic Kernel,” GitHub, 2023

  11. [11]

    Why do multi-agent LLM systems fail?

    M. Cemri et al., “Why do multi-agent LLM systems fail?” 2025

  12. [12]

    Generative Agents: Interactive Simulacra of Human Behavior,

    J. S. Park et al., “Generative Agents: Interactive Simulacra of Human Behavior,” UIST, 2023

  13. [13]

    Managing Update Conflicts in Bayou,

    D. Terry et al., “Managing Update Conflicts in Bayou,” SOSP, 1995

  14. [14]

    MemGPT: Towards LLMs as Operating Systems

    C. Packer et al., “MemGPT,” arXiv:2310.08560, 2023

  15. [15]

    Weak Consistency,

    A. Adya, “Weak Consistency,” PhD thesis, MIT, 1999

  16. [16]

    Don’t Settle for Eventual,

    W. Lloyd et al., “Don’t Settle for Eventual,” SOSP, 2011

  17. [17]

    Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary,

    C. Li et al., “Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary,” OSDI, 2012

  18. [18]

    https://github.com/sajjadanwar0/abus

  19. [19]

    Transactional Storage for Geo-Replicated Systems,

    Y . Sovran et al., “Transactional Storage for Geo-Replicated Systems,” SOSP, 2011

  20. [20]

    Serializable Snapshot Isolation in PostgreSQL,

    D. Ports & K. Grittner, “Serializable Snapshot Isolation in PostgreSQL,” VLDB, 2012

  21. [21]

    Spanner,

    J. Corbett et al., “Spanner,” OSDI, 2012

  22. [22]

    RAMP Transactions,

    P. Bailis et al., “RAMP Transactions,” VLDB, 2014

  23. [23]

    Coordination Avoidance in Database Systems,

    P. Bailis et al., “Coordination Avoidance in Database Systems,” VLDB, 2014

  24. [24]

    Ongaro & J

    D. Ongaro & J. Ousterhout, “Raft,” USENIX ATC, 2014

  25. [25]

    Time, Clocks, and the Ordering of Events in a Distributed System,

    L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” CACM, 21(7), 1978

  26. [26]

    IronFleet,

    C. Hawblitzel et al., “IronFleet,” SOSP, 2015

  27. [27]

    J. R. Wilcox et al., “Verdi,” PLDI, 2015

  28. [28]

    FoundationDB,

    J. Zhou et al., “FoundationDB,” SIGMOD, 2021

  29. [29]

    TigerBeetle,

    J. Betz, “TigerBeetle,” 2023

  30. [30]

    Speedy Transactions in Multicore In-Memory Databases,

    S. Tu et al., “Speedy Transactions in Multicore In-Memory Databases,” SOSP, 2013

  31. [31]

    Hekaton: SQL Server’s Memory-Optimized OLTP Engine,

    C. Diaconu et al., “Hekaton: SQL Server’s Memory-Optimized OLTP Engine,” SIGMOD, 2013

  32. [32]

    No Compromises,

    A. Dragojevic et al., “No Compromises,” SOSP, 2015

  33. [33]

    Orleans,

    P. A. Bernstein et al., “Orleans,” MSR TR-2014-41, 2014

  34. [34]

    Concurrency Control,

    P. Bernstein & N. Goodman, “Concurrency Control,” ACM Surv., 1981

  35. [35]

    H. T. Kung & J. T. Robinson, “OCC,” ACM TODS, 1981

  36. [36]

    Transactional Memory,

    M. Herlihy & J. E. Moss, “Transactional Memory,” ISCA, 1993

  37. [37]

    Dice et al., “TL2,” DISC, 2006

    D. Dice et al., “TL2,” DISC, 2006

  38. [38]

    Thomson et al., “Calvin,” SIGMOD, 2012

    A. Thomson et al., “Calvin,” SIGMOD, 2012

  39. [39]

    Percolator,

    D. Peng & F. Dabek, “Percolator,” OSDI, 2010

  40. [40]

    CockroachDB,

    R. Taft et al., “CockroachDB,” SIGMOD, 2020

  41. [41]

    SWE-bench,

    C. Jimenez et al., “SWE-bench,” ICLR, 2024

  42. [42]

    LLM-as-a-Judge,

    L. Zheng et al., “LLM-as-a-Judge,” NeurIPS, 2023

  43. [43]

    Loro: Movable Tree CRDT,

    Loro Team, “Loro: Movable Tree CRDT,” https://loro.dev, 2024

  44. [44]

    Automerge,

    M. Kleppmann et al., “Automerge,” https://automerge.org, 2024

  45. [45]

    Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,

    R. Fielding and J. Reschke, “Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,” IETF RFC 7232, June 2014

  46. [46]

    The Measurement of Observer Agreement for Categorical Data,

    J. R. Landis and G. G. Koch, “The Measurement of Observer Agreement for Categorical Data,”Biometrics, vol. 33, no. 1, pp. 159–174, 1977

  47. [47]

    Workflow Update API,

    Temporal Technologies, “Workflow Update API,” Temporal documenta- tion, 2024. https://docs.temporal.io/workflows#update

  48. [48]

    FoundationDB Directory Layer,

    A. Beamer et al., “FoundationDB Directory Layer,” FoundationDB doc- umentation, 2024. https://apple.github.io/foundationdb/developer-guide. html#directories

  49. [49]

    Letta: Stateful Agents Beyond Context Windows,

    C. Packer et al., “Letta: Stateful Agents Beyond Context Windows,” GitHub, 2024. https://github.com/letta-ai/letta

  50. [50]

    Large Language Models are not Fair Evaluators

    P. Wang et al., “Large Language Models are not Fair Evaluators,” arXiv:2305.17926, 2023

  51. [51]

    Scalable Transactions across Heterogeneous NoSQL Key-Value Data Stores,

    A. Dey, A. Fekete, R. Nambiar, U. Röhm, “Scalable Transactions across Heterogeneous NoSQL Key-Value Data Stores,”PVLDB, vol. 6, no. 12, pp. 1434–1439, 2013

  52. [52]

    AgentScope: A Flexible yet Robust Multi-Agent Platform,

    D. Gao et al., “AgentScope: A Flexible yet Robust Multi-Agent Platform,” arXiv:2402.14034, 2024

  53. [53]

    V oyager: An Open-Ended Embodied Agent with Large Language Models,

    G. Wang et al., “V oyager: An Open-Ended Embodied Agent with Large Language Models,”Transactions on Machine Learning Research, 2024

  54. [54]

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,

    J. Yang et al., “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,”NeurIPS, 2024

  55. [55]

    Verus: Verifying Rust Programs using Linear Ghost Types,

    A. Lattuada et al., “Verus: Verifying Rust Programs using Linear Ghost Types,”OOPSLA, 2023

  56. [56]

    Creusot: A Foundry for the Deductive Verification of Rust Programs,

    X. Denis, J.-H. Jourdan, C. Marché, “Creusot: A Foundry for the Deductive Verification of Rust Programs,” inFormal Methods: 25th Intl. Symp., 2023

  57. [57]

    Boki: Stateful Serverless Computing with Shared Logs,

    Z. Jia, E. Witchel, “Boki: Stateful Serverless Computing with Shared Logs,”SOSP, 2021

  58. [58]

    Language Primitives and Type Discipline for Structured Communication-Based Programming,

    K. Honda, V . T. Vasconcelos, M. Kubo, “Language Primitives and Type Discipline for Structured Communication-Based Programming,”ESOP, 1998. APPENDIX The rubric used by both LLM judges in the Exp. PH-3 validation study (§7.9) is reproduced verbatim below. The prompt was frozen before observing inter-judge agreement and was not revised. You are a strict cod...