S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

Sajjad Khan

arxiv: 2605.17076 · v1 · pith:2E2PDC3Inew · submitted 2026-05-16 · 💻 cs.LG · cs.AI· cs.DC· cs.MA

S-Bus: Automatic Read-Set Reconstruction for Multi-Agent LLM State Coordination

Sajjad Khan This is my paper

Pith reviewed 2026-05-20 15:30 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DCcs.MA

keywords multi-agent LLMstructural race conditionsread-set reconstructionobservable-read isolationHTTP middlewareconsistency modelsDeliveryLogS-Bus

0 comments

The pith

S-Bus uses a server-side DeliveryLog to automatically reconstruct each agent's read set from HTTP GET operations and enforce Observable-Read Isolation that prevents structural race conditions during shared-shard collaboration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the problem of structural race conditions that arise when multiple LLM agents read and write to the same mutable natural-language state. Its central proposal is an HTTP middleware whose DeliveryLog tracks observable GET requests to rebuild each agent's read set at commit time. This reconstruction supplies Observable-Read Isolation, a partial causal consistency guarantee that blocks both write-write conflicts and cross-shard stale reads. The work supplies three-tier mechanical evidence for the mechanism and shows that it matches the conflict-prevention behavior of PostgreSQL SERIALIZABLE and Redis WATCH/MULTI on large contention workloads. The approach requires no modifications to existing agent frameworks.

Core claim

The DeliveryLog mechanism automatically reconstructs each agent's read set from HTTP GET operations and provides Observable-Read Isolation that prevents structural race conditions when agents collaborate via shared shards, supported by machine-checked proofs in TLAPS, exhaustive TLC model checking at N=3, Dafny lemmas, and empirical parity with PostgreSQL SERIALIZABLE and Redis WATCH/MULTI showing zero Type-I corruptions across 427,308 conflicts.

What carries the argument

The DeliveryLog, a per-agent log of HTTP GET operations that reconstructs the read set at commit time without agent SDK changes under HTTP/1.1.

If this is right

Agents can safely share state shards without introducing silent output corruptions.
No changes to agent code or frameworks are required to obtain the isolation guarantee.
The mechanism achieves zero Type-I errors on contention workloads that produce hundreds of thousands of HTTP-409 conflicts.
Observable-Read Isolation remains neutral for dedicated-shard topologies but becomes harmful when preservation of concurrent contradictions occurs in single-shard writing.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same read-set reconstruction idea could be applied to other observable request protocols beyond HTTP/1.1.
Frameworks that currently rely on explicit locks or version checks might simplify their coordination layer if they adopt an observable-traffic log.
Larger-scale empirical runs with agent counts well beyond the N=3 model would be a direct next measurement to check whether the zero-corruption result holds at production scale.

Load-bearing premise

The approach assumes agents communicate exclusively via observable HTTP/1.1 traffic and that the workload uses dedicated shards rather than single-shard collaborative writing.

What would settle it

Observation of even one Type-I corruption during a shared-shard contention run that exceeds the scale of the TLC-checked model or a counter-example to the retained typing axiom in the TLAPS proof.

read the original abstract

Concurrent LLM agents sharing mutable natural-language state produce Structural Race Conditions (SRCs): write-write and cross-shard stale-read conflicts that silently corrupt agent output. Existing multi-agent frameworks (LangGraph, CrewAI, AutoGen) provide no write-ownership semantics over shared state. We present S-Bus, an HTTP middleware whose central mechanism is a server-side DeliveryLog: a per-agent log of HTTP GET operations that automatically reconstructs each agent's read set at commit time without agent SDK changes under HTTP/1.1. The consistency property the DeliveryLog provides -- Observable-Read Isolation (ORI), a partial causal consistency over the HTTP-observable projection of the read set -- prevents structural race conditions when agents collaborate via shared shards. Three contributions: (C1) The DeliveryLog mechanism for automatic HTTP-traffic-based read-set reconstruction, with three-tier mechanised evidence: ReadSetSoundness and ORICommitSafety machine-checked in TLAPS (modulo one retained typing axiom); exhaustive TLC at N=3 (20,763,484 distinct states, zero violations); Dafny discharges 9 inductive soundness lemmas. (C2) Empirical structural-conflict prevention parity against PostgreSQL 17 SERIALIZABLE and Redis 7 WATCH/MULTI on shared-shard contention sweeps with 427,308 active HTTP-409 conflicts: zero Type-I corruptions across all three backends. (C3) ORI's operating envelope is topology-conditional: semantically neutral in dedicated-shard workloads; harmful in single-shard collaborative writing because preservation propagates concurrent contradictions. Source code: https://github.com/sajjadanwar0/sbus

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

S-Bus gives a concrete HTTP middleware for automatic read-set reconstruction in multi-agent LLM state sharing, with formal checks and DB comparisons, but the model checking stops at three agents.

read the letter

The paper's core contribution is the DeliveryLog, which sits as server-side middleware and rebuilds each agent's read set from observable HTTP GET traffic. This lets them define Observable-Read Isolation that blocks structural race conditions on shared shards without touching the agent SDKs. The work ships public code and backs the claim with TLAPS proofs for ReadSetSoundness and ORICommitSafety, TLC exhaustive runs at N=3, nine Dafny lemmas, and head-to-head tests against PostgreSQL SERIALIZABLE and Redis WATCH/MULTI that report zero Type-I corruptions across hundreds of thousands of conflicts.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces S-Bus, an HTTP middleware for multi-agent LLM state coordination. Its central mechanism is a server-side DeliveryLog that automatically reconstructs each agent's read set from HTTP/1.1 GET operations without SDK changes. This enables Observable-Read Isolation (ORI), a partial causal consistency property over the observable read-set projection, which the authors claim prevents structural race conditions (write-write and cross-shard stale reads) on shared shards. Evidence includes TLAPS machine-checked proofs of ReadSetSoundness and ORICommitSafety (modulo one typing axiom), exhaustive TLC model checking at N=3 (20,763,484 states, zero violations), 9 Dafny inductive lemmas, and empirical parity with PostgreSQL SERIALIZABLE and Redis WATCH/MULTI (zero Type-I corruptions across 427,308 conflicts). The paper also analyzes ORI's topology-conditional envelope: neutral for dedicated shards but harmful for single-shard collaborative writing.

Significance. If the formal results and empirical findings hold, the work provides a practical, low-friction approach to state isolation in multi-agent LLM systems that avoids modifying agent code. The combination of machine-checked TLAPS proofs, exhaustive TLC checking, Dafny lemmas, and reproducible empirical comparison to established database mechanisms is a notable strength. The explicit discussion of when ORI becomes harmful adds useful nuance to the operating envelope.

major comments (3)

[TLC Model Checking] TLC Model Checking section: exhaustive checking is performed only at N=3 (20,763,484 states, zero violations). Given the exponential growth of the state space, this leaves open the possibility of undetected violations of ReadSetSoundness or ORICommitSafety for N>3 or in cross-shard interleavings; the central claim that ORI prevents SRCs in general workloads therefore rests on an incomplete coverage argument.
[TLAPS Proofs] TLAPS Proofs section: ReadSetSoundness and ORICommitSafety are machine-checked only modulo one retained typing axiom. The manuscript should either discharge the axiom or provide a concrete argument that it does not affect the isolation guarantees, as this directly impacts the soundness of the mechanized evidence.
[Operating Envelope] Operating Envelope section: the claim that ORI is semantically neutral in dedicated-shard workloads but harmful in single-shard collaborative writing is stated qualitatively; no quantitative boundary conditions or workload characterization are supplied to delineate when preservation of concurrent contradictions becomes load-bearing.

minor comments (2)

[Abstract] The abstract states 'three contributions' but the body does not explicitly label C1–C3 in the same order or with matching headings; adding explicit cross-references would improve readability.
[Evaluation] Figure captions for the empirical conflict sweeps should include the exact number of runs and the definition of 'Type-I corruption' to allow direct replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the strengths of our multi-tier formal evidence and empirical comparisons. We address each major comment point by point below.

read point-by-point responses

Referee: [TLC Model Checking] TLC Model Checking section: exhaustive checking is performed only at N=3 (20,763,484 states, zero violations). Given the exponential growth of the state space, this leaves open the possibility of undetected violations of ReadSetSoundness or ORICommitSafety for N>3 or in cross-shard interleavings; the central claim that ORI prevents SRCs in general workloads therefore rests on an incomplete coverage argument.

Authors: We agree that exhaustive TLC checking at N=3, while covering 20,763,484 states with zero violations, cannot exhaustively rule out violations at larger N due to state-space explosion. The TLAPS proofs of ReadSetSoundness and ORICommitSafety are parameterized over arbitrary N and are not restricted to small models; the model checking validates the specification at a tractable scale. We will revise the manuscript to explicitly discuss this complementarity and to state the coverage limitations of model checking for N>3. revision: partial
Referee: [TLAPS Proofs] TLAPS Proofs section: ReadSetSoundness and ORICommitSafety are machine-checked only modulo one retained typing axiom. The manuscript should either discharge the axiom or provide a concrete argument that it does not affect the isolation guarantees, as this directly impacts the soundness of the mechanized evidence.

Authors: The retained typing axiom asserts well-typedness of DeliveryLog operations in the TLA+ encoding. We will add a concrete argument in the revised manuscript showing that the axiom is orthogonal to the isolation properties: it only concerns syntactic validity of read-set reconstruction and does not affect the causal-ordering or commit-safety invariants, which are proved independently of it. This argument preserves the soundness of the mechanized evidence. revision: yes
Referee: [Operating Envelope] Operating Envelope section: the claim that ORI is semantically neutral in dedicated-shard workloads but harmful in single-shard collaborative writing is stated qualitatively; no quantitative boundary conditions or workload characterization are supplied to delineate when preservation of concurrent contradictions becomes load-bearing.

Authors: We agree the current discussion is qualitative. In the revision we will augment the Operating Envelope section with quantitative indicators drawn from the empirical evaluation (agent counts, conflict rates, and measured overhead) that delineate when preservation of concurrent contradictions becomes load-bearing, e.g., performance impact thresholds observed in single-shard versus dedicated-shard configurations. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external formal verification and independent benchmarks

full rationale

The paper presents the DeliveryLog as a novel construction for HTTP-based read-set reconstruction and defines Observable-Read Isolation as its consistency property. Soundness is established via machine-checked proofs in TLAPS (ReadSetSoundness, ORICommitSafety), exhaustive TLC model checking at N=3, and Dafny inductive lemmas, plus direct empirical parity with PostgreSQL SERIALIZABLE and Redis WATCH/MULTI on 427k conflicts. These are independent external artifacts rather than self-citations or fitted parameters. No self-definitional equations, renamed predictions, or load-bearing author citations appear in the derivation chain. The topology-conditional operating envelope is stated explicitly as an assumption, not derived from prior results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The paper introduces the DeliveryLog and ORI as new constructs; formal verification relies on standard math axioms plus one retained typing axiom in TLAPS; no fitted numerical parameters are mentioned in the abstract.

axioms (1)

standard math One retained typing axiom in the TLAPS proof of ReadSetSoundness and ORICommitSafety
Explicitly noted in the mechanised evidence section of the abstract as the only non-discharged element.

invented entities (2)

DeliveryLog no independent evidence
purpose: Server-side per-agent log of HTTP GET operations for automatic read-set reconstruction
Central mechanism of S-Bus; no independent evidence outside the paper is provided in the abstract.
Observable-Read Isolation (ORI) no independent evidence
purpose: Partial causal consistency over the HTTP-observable projection of the read set
Consistency property claimed to prevent structural race conditions; defined within the paper.

pith-pipeline@v0.9.0 · 5836 in / 1630 out tokens · 36917 ms · 2026-05-20T15:30:39.190032+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

DeliveryLog mechanism for automatic HTTP-traffic-based read-set reconstruction... Observable-Read Isolation (ORI)... READSETSOUNDNESS and ORICOMMITSAFETY machine-checked in TLAPS
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ORI prevents G2-item over Robs... partial causal consistency over the HTTP-observable projection

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 4 internal anchors

[1]

LangGraph,

LangChain, “LangGraph,” GitHub, 2024

work page 2024
[2]

Moura, “CrewAI,” GitHub, 2024

J. Moura, “CrewAI,” GitHub, 2024

work page 2024
[3]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wu et al., “AutoGen,” arXiv:2308.08155, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

MetaGPT,

S. Hong et al., “MetaGPT,” ICLR, 2024

work page 2024
[5]

Li et al., “CAMEL,” NeurIPS, 2023

G. Li et al., “CAMEL,” NeurIPS, 2023

work page 2023
[6]

Swarm / AG2,

OpenAI, “Swarm / AG2,” GitHub, 2024

work page 2024
[7]

Agent-to-Agent Protocol,

Google, “Agent-to-Agent Protocol,” GitHub, 2025

work page 2025
[8]

Yao et al., “ReAct,” ICLR, 2023

S. Yao et al., “ReAct,” ICLR, 2023

work page 2023
[9]

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

O. Khattab et al., “DSPy,” arXiv:2310.03714, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Semantic Kernel,

Microsoft, “Semantic Kernel,” GitHub, 2023

work page 2023
[11]

Why do multi-agent LLM systems fail?

M. Cemri et al., “Why do multi-agent LLM systems fail?” 2025

work page 2025
[12]

Generative Agents: Interactive Simulacra of Human Behavior,

J. S. Park et al., “Generative Agents: Interactive Simulacra of Human Behavior,” UIST, 2023

work page 2023
[13]

Managing Update Conflicts in Bayou,

D. Terry et al., “Managing Update Conflicts in Bayou,” SOSP, 1995

work page 1995
[14]

MemGPT: Towards LLMs as Operating Systems

C. Packer et al., “MemGPT,” arXiv:2310.08560, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[15]

Weak Consistency,

A. Adya, “Weak Consistency,” PhD thesis, MIT, 1999

work page 1999
[16]

Don’t Settle for Eventual,

W. Lloyd et al., “Don’t Settle for Eventual,” SOSP, 2011

work page 2011
[17]

Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary,

C. Li et al., “Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary,” OSDI, 2012

work page 2012
[18]

https://github.com/sajjadanwar0/abus

work page
[19]

Transactional Storage for Geo-Replicated Systems,

Y . Sovran et al., “Transactional Storage for Geo-Replicated Systems,” SOSP, 2011

work page 2011
[20]

Serializable Snapshot Isolation in PostgreSQL,

D. Ports & K. Grittner, “Serializable Snapshot Isolation in PostgreSQL,” VLDB, 2012

work page 2012
[21]

Spanner,

J. Corbett et al., “Spanner,” OSDI, 2012

work page 2012
[22]

RAMP Transactions,

P. Bailis et al., “RAMP Transactions,” VLDB, 2014

work page 2014
[23]

Coordination Avoidance in Database Systems,

P. Bailis et al., “Coordination Avoidance in Database Systems,” VLDB, 2014

work page 2014
[24]

Ongaro & J

D. Ongaro & J. Ousterhout, “Raft,” USENIX ATC, 2014

work page 2014
[25]

Time, Clocks, and the Ordering of Events in a Distributed System,

L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” CACM, 21(7), 1978

work page 1978
[26]

IronFleet,

C. Hawblitzel et al., “IronFleet,” SOSP, 2015

work page 2015
[27]

J. R. Wilcox et al., “Verdi,” PLDI, 2015

work page 2015
[28]

FoundationDB,

J. Zhou et al., “FoundationDB,” SIGMOD, 2021

work page 2021
[29]

TigerBeetle,

J. Betz, “TigerBeetle,” 2023

work page 2023
[30]

Speedy Transactions in Multicore In-Memory Databases,

S. Tu et al., “Speedy Transactions in Multicore In-Memory Databases,” SOSP, 2013

work page 2013
[31]

Hekaton: SQL Server’s Memory-Optimized OLTP Engine,

C. Diaconu et al., “Hekaton: SQL Server’s Memory-Optimized OLTP Engine,” SIGMOD, 2013

work page 2013
[32]

No Compromises,

A. Dragojevic et al., “No Compromises,” SOSP, 2015

work page 2015
[33]

Orleans,

P. A. Bernstein et al., “Orleans,” MSR TR-2014-41, 2014

work page 2014
[34]

Concurrency Control,

P. Bernstein & N. Goodman, “Concurrency Control,” ACM Surv., 1981

work page 1981
[35]

H. T. Kung & J. T. Robinson, “OCC,” ACM TODS, 1981

work page 1981
[36]

Transactional Memory,

M. Herlihy & J. E. Moss, “Transactional Memory,” ISCA, 1993

work page 1993
[37]

Dice et al., “TL2,” DISC, 2006

D. Dice et al., “TL2,” DISC, 2006

work page 2006
[38]

Thomson et al., “Calvin,” SIGMOD, 2012

A. Thomson et al., “Calvin,” SIGMOD, 2012

work page 2012
[39]

Percolator,

D. Peng & F. Dabek, “Percolator,” OSDI, 2010

work page 2010
[40]

CockroachDB,

R. Taft et al., “CockroachDB,” SIGMOD, 2020

work page 2020
[41]

SWE-bench,

C. Jimenez et al., “SWE-bench,” ICLR, 2024

work page 2024
[42]

LLM-as-a-Judge,

L. Zheng et al., “LLM-as-a-Judge,” NeurIPS, 2023

work page 2023
[43]

Loro: Movable Tree CRDT,

Loro Team, “Loro: Movable Tree CRDT,” https://loro.dev, 2024

work page 2024
[44]

Automerge,

M. Kleppmann et al., “Automerge,” https://automerge.org, 2024

work page 2024
[45]

Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,

R. Fielding and J. Reschke, “Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,” IETF RFC 7232, June 2014

work page 2014
[46]

The Measurement of Observer Agreement for Categorical Data,

J. R. Landis and G. G. Koch, “The Measurement of Observer Agreement for Categorical Data,”Biometrics, vol. 33, no. 1, pp. 159–174, 1977

work page 1977
[47]

Workflow Update API,

Temporal Technologies, “Workflow Update API,” Temporal documenta- tion, 2024. https://docs.temporal.io/workflows#update

work page 2024
[48]

FoundationDB Directory Layer,

A. Beamer et al., “FoundationDB Directory Layer,” FoundationDB doc- umentation, 2024. https://apple.github.io/foundationdb/developer-guide. html#directories

work page 2024
[49]

Letta: Stateful Agents Beyond Context Windows,

C. Packer et al., “Letta: Stateful Agents Beyond Context Windows,” GitHub, 2024. https://github.com/letta-ai/letta

work page 2024
[50]

Large Language Models are not Fair Evaluators

P. Wang et al., “Large Language Models are not Fair Evaluators,” arXiv:2305.17926, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[51]

Scalable Transactions across Heterogeneous NoSQL Key-Value Data Stores,

A. Dey, A. Fekete, R. Nambiar, U. Röhm, “Scalable Transactions across Heterogeneous NoSQL Key-Value Data Stores,”PVLDB, vol. 6, no. 12, pp. 1434–1439, 2013

work page 2013
[52]

AgentScope: A Flexible yet Robust Multi-Agent Platform,

D. Gao et al., “AgentScope: A Flexible yet Robust Multi-Agent Platform,” arXiv:2402.14034, 2024

work page arXiv 2024
[53]

V oyager: An Open-Ended Embodied Agent with Large Language Models,

G. Wang et al., “V oyager: An Open-Ended Embodied Agent with Large Language Models,”Transactions on Machine Learning Research, 2024

work page 2024
[54]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,

J. Yang et al., “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,”NeurIPS, 2024

work page 2024
[55]

Verus: Verifying Rust Programs using Linear Ghost Types,

A. Lattuada et al., “Verus: Verifying Rust Programs using Linear Ghost Types,”OOPSLA, 2023

work page 2023
[56]

Creusot: A Foundry for the Deductive Verification of Rust Programs,

X. Denis, J.-H. Jourdan, C. Marché, “Creusot: A Foundry for the Deductive Verification of Rust Programs,” inFormal Methods: 25th Intl. Symp., 2023

work page 2023
[57]

Boki: Stateful Serverless Computing with Shared Logs,

Z. Jia, E. Witchel, “Boki: Stateful Serverless Computing with Shared Logs,”SOSP, 2021

work page 2021
[58]

Language Primitives and Type Discipline for Structured Communication-Based Programming,

K. Honda, V . T. Vasconcelos, M. Kubo, “Language Primitives and Type Discipline for Structured Communication-Based Programming,”ESOP, 1998. APPENDIX The rubric used by both LLM judges in the Exp. PH-3 validation study (§7.9) is reproduced verbatim below. The prompt was frozen before observing inter-judge agreement and was not revised. You are a strict cod...

work page 1998

[1] [1]

LangGraph,

LangChain, “LangGraph,” GitHub, 2024

work page 2024

[2] [2]

Moura, “CrewAI,” GitHub, 2024

J. Moura, “CrewAI,” GitHub, 2024

work page 2024

[3] [3]

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Q. Wu et al., “AutoGen,” arXiv:2308.08155, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[4] [4]

MetaGPT,

S. Hong et al., “MetaGPT,” ICLR, 2024

work page 2024

[5] [5]

Li et al., “CAMEL,” NeurIPS, 2023

G. Li et al., “CAMEL,” NeurIPS, 2023

work page 2023

[6] [6]

Swarm / AG2,

OpenAI, “Swarm / AG2,” GitHub, 2024

work page 2024

[7] [7]

Agent-to-Agent Protocol,

Google, “Agent-to-Agent Protocol,” GitHub, 2025

work page 2025

[8] [8]

Yao et al., “ReAct,” ICLR, 2023

S. Yao et al., “ReAct,” ICLR, 2023

work page 2023

[9] [9]

DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

O. Khattab et al., “DSPy,” arXiv:2310.03714, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[10] [10]

Semantic Kernel,

Microsoft, “Semantic Kernel,” GitHub, 2023

work page 2023

[11] [11]

Why do multi-agent LLM systems fail?

M. Cemri et al., “Why do multi-agent LLM systems fail?” 2025

work page 2025

[12] [12]

Generative Agents: Interactive Simulacra of Human Behavior,

J. S. Park et al., “Generative Agents: Interactive Simulacra of Human Behavior,” UIST, 2023

work page 2023

[13] [13]

Managing Update Conflicts in Bayou,

D. Terry et al., “Managing Update Conflicts in Bayou,” SOSP, 1995

work page 1995

[14] [14]

MemGPT: Towards LLMs as Operating Systems

C. Packer et al., “MemGPT,” arXiv:2310.08560, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[15] [15]

Weak Consistency,

A. Adya, “Weak Consistency,” PhD thesis, MIT, 1999

work page 1999

[16] [16]

Don’t Settle for Eventual,

W. Lloyd et al., “Don’t Settle for Eventual,” SOSP, 2011

work page 2011

[17] [17]

Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary,

C. Li et al., “Making Geo-Replicated Systems Fast as Possible, Consistent when Necessary,” OSDI, 2012

work page 2012

[18] [18]

https://github.com/sajjadanwar0/abus

work page

[19] [19]

Transactional Storage for Geo-Replicated Systems,

Y . Sovran et al., “Transactional Storage for Geo-Replicated Systems,” SOSP, 2011

work page 2011

[20] [20]

Serializable Snapshot Isolation in PostgreSQL,

D. Ports & K. Grittner, “Serializable Snapshot Isolation in PostgreSQL,” VLDB, 2012

work page 2012

[21] [21]

Spanner,

J. Corbett et al., “Spanner,” OSDI, 2012

work page 2012

[22] [22]

RAMP Transactions,

P. Bailis et al., “RAMP Transactions,” VLDB, 2014

work page 2014

[23] [23]

Coordination Avoidance in Database Systems,

P. Bailis et al., “Coordination Avoidance in Database Systems,” VLDB, 2014

work page 2014

[24] [24]

Ongaro & J

D. Ongaro & J. Ousterhout, “Raft,” USENIX ATC, 2014

work page 2014

[25] [25]

Time, Clocks, and the Ordering of Events in a Distributed System,

L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System,” CACM, 21(7), 1978

work page 1978

[26] [26]

IronFleet,

C. Hawblitzel et al., “IronFleet,” SOSP, 2015

work page 2015

[27] [27]

J. R. Wilcox et al., “Verdi,” PLDI, 2015

work page 2015

[28] [28]

FoundationDB,

J. Zhou et al., “FoundationDB,” SIGMOD, 2021

work page 2021

[29] [29]

TigerBeetle,

J. Betz, “TigerBeetle,” 2023

work page 2023

[30] [30]

Speedy Transactions in Multicore In-Memory Databases,

S. Tu et al., “Speedy Transactions in Multicore In-Memory Databases,” SOSP, 2013

work page 2013

[31] [31]

Hekaton: SQL Server’s Memory-Optimized OLTP Engine,

C. Diaconu et al., “Hekaton: SQL Server’s Memory-Optimized OLTP Engine,” SIGMOD, 2013

work page 2013

[32] [32]

No Compromises,

A. Dragojevic et al., “No Compromises,” SOSP, 2015

work page 2015

[33] [33]

Orleans,

P. A. Bernstein et al., “Orleans,” MSR TR-2014-41, 2014

work page 2014

[34] [34]

Concurrency Control,

P. Bernstein & N. Goodman, “Concurrency Control,” ACM Surv., 1981

work page 1981

[35] [35]

H. T. Kung & J. T. Robinson, “OCC,” ACM TODS, 1981

work page 1981

[36] [36]

Transactional Memory,

M. Herlihy & J. E. Moss, “Transactional Memory,” ISCA, 1993

work page 1993

[37] [37]

Dice et al., “TL2,” DISC, 2006

D. Dice et al., “TL2,” DISC, 2006

work page 2006

[38] [38]

Thomson et al., “Calvin,” SIGMOD, 2012

A. Thomson et al., “Calvin,” SIGMOD, 2012

work page 2012

[39] [39]

Percolator,

D. Peng & F. Dabek, “Percolator,” OSDI, 2010

work page 2010

[40] [40]

CockroachDB,

R. Taft et al., “CockroachDB,” SIGMOD, 2020

work page 2020

[41] [41]

SWE-bench,

C. Jimenez et al., “SWE-bench,” ICLR, 2024

work page 2024

[42] [42]

LLM-as-a-Judge,

L. Zheng et al., “LLM-as-a-Judge,” NeurIPS, 2023

work page 2023

[43] [43]

Loro: Movable Tree CRDT,

Loro Team, “Loro: Movable Tree CRDT,” https://loro.dev, 2024

work page 2024

[44] [44]

Automerge,

M. Kleppmann et al., “Automerge,” https://automerge.org, 2024

work page 2024

[45] [45]

Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,

R. Fielding and J. Reschke, “Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests,” IETF RFC 7232, June 2014

work page 2014

[46] [46]

The Measurement of Observer Agreement for Categorical Data,

J. R. Landis and G. G. Koch, “The Measurement of Observer Agreement for Categorical Data,”Biometrics, vol. 33, no. 1, pp. 159–174, 1977

work page 1977

[47] [47]

Workflow Update API,

Temporal Technologies, “Workflow Update API,” Temporal documenta- tion, 2024. https://docs.temporal.io/workflows#update

work page 2024

[48] [48]

FoundationDB Directory Layer,

A. Beamer et al., “FoundationDB Directory Layer,” FoundationDB doc- umentation, 2024. https://apple.github.io/foundationdb/developer-guide. html#directories

work page 2024

[49] [49]

Letta: Stateful Agents Beyond Context Windows,

C. Packer et al., “Letta: Stateful Agents Beyond Context Windows,” GitHub, 2024. https://github.com/letta-ai/letta

work page 2024

[50] [50]

Large Language Models are not Fair Evaluators

P. Wang et al., “Large Language Models are not Fair Evaluators,” arXiv:2305.17926, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[51] [51]

Scalable Transactions across Heterogeneous NoSQL Key-Value Data Stores,

A. Dey, A. Fekete, R. Nambiar, U. Röhm, “Scalable Transactions across Heterogeneous NoSQL Key-Value Data Stores,”PVLDB, vol. 6, no. 12, pp. 1434–1439, 2013

work page 2013

[52] [52]

AgentScope: A Flexible yet Robust Multi-Agent Platform,

D. Gao et al., “AgentScope: A Flexible yet Robust Multi-Agent Platform,” arXiv:2402.14034, 2024

work page arXiv 2024

[53] [53]

V oyager: An Open-Ended Embodied Agent with Large Language Models,

G. Wang et al., “V oyager: An Open-Ended Embodied Agent with Large Language Models,”Transactions on Machine Learning Research, 2024

work page 2024

[54] [54]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,

J. Yang et al., “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering,”NeurIPS, 2024

work page 2024

[55] [55]

Verus: Verifying Rust Programs using Linear Ghost Types,

A. Lattuada et al., “Verus: Verifying Rust Programs using Linear Ghost Types,”OOPSLA, 2023

work page 2023

[56] [56]

Creusot: A Foundry for the Deductive Verification of Rust Programs,

X. Denis, J.-H. Jourdan, C. Marché, “Creusot: A Foundry for the Deductive Verification of Rust Programs,” inFormal Methods: 25th Intl. Symp., 2023

work page 2023

[57] [57]

Boki: Stateful Serverless Computing with Shared Logs,

Z. Jia, E. Witchel, “Boki: Stateful Serverless Computing with Shared Logs,”SOSP, 2021

work page 2021

[58] [58]

Language Primitives and Type Discipline for Structured Communication-Based Programming,

K. Honda, V . T. Vasconcelos, M. Kubo, “Language Primitives and Type Discipline for Structured Communication-Based Programming,”ESOP, 1998. APPENDIX The rubric used by both LLM judges in the Exp. PH-3 validation study (§7.9) is reproduced verbatim below. The prompt was frozen before observing inter-judge agreement and was not revised. You are a strict cod...

work page 1998