arxiv: 2604.07988 · v1 · submitted 2026-04-09 · 💻 cs.DC · cs.AI

LogAct: Enabling Agentic Reliability via Shared Logs

Mahesh Balakrishnan, Ashwin Bharambe, Davide Testuggine, David Geraghty, David Mao, Vidhya Venkat, Ilya Mironov, Rithesh Baradi, Gayathri Aiyer, Victoria Dudin

Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3

classification 💻 cs.DC · cs.AI

keywords LLM agents · shared logs · agent reliability · failure recovery · agent introspection · action blocking · multi-agent systems · distributed agents

The pith

LogAct turns each LLM agent into a deconstructed state machine that writes to a shared log, making actions visible and blockable before execution while enabling consistent recovery from failures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LogAct as an abstraction for LLM-driven agents that can mutate environments in arbitrary ways. Agents act as state machines on a shared log so that actions appear before they run, separate voters can halt them, and failures lead to consistent recovery. The same log supports introspection where agents use LLM inference on their history to perform semantic recovery, health checks, and optimization. A reader would care because asynchrony and crashes currently make it hard to extract execution guarantees from agents in production settings, and this approach aims to provide those guarantees without major changes to agent code.

Core claim

By deconstructing agents into state machines that play a shared log, agentic actions become visible before execution, pluggable decoupled voters can stop them prior to running, and recovery remains consistent after agent or environment failure. The log further enables agentic introspection by letting agents analyze their execution history via LLM inference, which supports semantic variants of recovery, health checks, and optimization.

What carries the argument

The shared log carries the argument: each agent operates on it as a deconstructed state machine, so actions are recorded visibly before execution and can be inspected or halted by external components.
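A minimal sketch of that flow, assuming an in-memory Python list as the log and four plain functions standing in for the Driver, Voter, Decider, and Executor roles named in the Figure 3 caption. The entry layout, the `rm -rf` rule, and every function name here are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str  # "intention", "vote", "decision", or "result"
    ref: int   # log position of the intention this entry refers to
    body: str


log: list[Entry] = []  # stand-in for a durable shared log (assumption)


def append(kind: str, ref: int, body: str) -> int:
    log.append(Entry(kind, ref, body))
    return len(log) - 1


def driver(action: str) -> int:
    # The proposed action is only written down here; nothing runs yet.
    pos = len(log)
    return append("intention", pos, action)


def voter(pos: int) -> None:
    # Hypothetical rule-based check on the logged intention.
    verdict = "reject" if "rm -rf" in log[pos].body else "approve"
    append("vote", pos, verdict)


def decider(pos: int) -> None:
    votes = [e.body for e in log if e.kind == "vote" and e.ref == pos]
    ok = bool(votes) and all(v == "approve" for v in votes)
    append("decision", pos, "commit" if ok else "abort")


def executor(pos: int, env: list[str]) -> None:
    decided = [e.body for e in log if e.kind == "decision" and e.ref == pos]
    if decided and decided[-1] == "commit":
        env.append(log[pos].body)  # the only place the environment changes
        append("result", pos, "done")


def step(action: str, env: list[str]) -> int:
    pos = driver(action)
    voter(pos)
    decider(pos)
    executor(pos, env)
    return pos
```

In this sketch the environment is touched only inside `executor`, and only after a `commit` decision is already in the log, which is the ordering the summary above depends on.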

If this is right

Agents recover efficiently and correctly from failures using the shared log.
Agents debug their own performance by analyzing their execution history through LLM inference.
Token usage in agent swarms can be optimized via introspection on the log.
All unwanted actions for a target model can be stopped on a representative benchmark with only a 3% drop in benign utility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The shared log could function as an audit trail for cross-organization agent oversight in production deployments.
Multiple independent voters on the log open the door to distributed governance of agent actions without central control.
Because the approach preserves original agent behavior, it could be layered onto existing agent frameworks with low integration cost.
In high-failure distributed environments the log might provide a foundation for atomicity guarantees similar to database transaction logs.

Load-bearing premise

A practical shared log can be maintained across asynchronous agents and environments such that actions remain reliably visible and blockable before execution while preserving the agents' original behavior.

What would settle it

Run a test with multiple asynchronous agents where at least one action executes without first appearing in the shared log or where recovery after an injected crash produces inconsistent state.
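The first half of that test can be phrased as a small checker. The sketch below assumes the test harness records, on the environment side, which action ran and how long the log was at that moment; the tuple layouts and the `intention` / `commit` entry names are assumptions made for illustration.

```python
def find_violations(log, env_trace):
    """Return ids of actions that ran before being visible and committed.

    log:       list of (kind, action_id) tuples in log order.
    env_trace: list of (action_id, log_len_at_exec) tuples recorded by the
               environment, where log_len_at_exec is the log length observed
               when the action was applied (hypothetical instrumentation).
    """
    bad = []
    for action_id, seen in env_trace:
        prefix = log[:seen]  # what was in the log when the action ran
        intent_pos = next(
            (i for i, (kind, aid) in enumerate(prefix)
             if kind == "intention" and aid == action_id),
            None,
        )
        committed = intent_pos is not None and any(
            kind == "commit" and aid == action_id
            for kind, aid in prefix[intent_pos + 1:]
        )
        if not committed:
            bad.append(action_id)
    return bad
```

An empty result over many randomized, crash-injected runs would be evidence for the premise. A single non-empty result would be a counterexample.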

Figures

Figures reproduced from arXiv: 2604.07988 by Ashwin Bharambe, Davide Testuggine, David Geraghty, David Mao, Gayathri Aiyer, Ilya Mironov, Mahesh Balakrishnan, Rithesh Baradi, Victoria Dudin, Vidhya Venkat.

**Figure 1.** Existing ReAct / CodeAct agents run in an imperative loop; LogAct is a state machine playing a shared log.

… just by using more intelligent models or more capable agentic harnesses: even an omniscient and omnipotent agent is subject to failures and asynchrony. In this paper, we propose the novel abstraction of a LogAct agent: an agent implemented as a state machine over a shared log [3–6, 8, 20, 25, 26, 3…

**Figure 3.** A deconstructed LogAct agent on the AgentBus. Physical nodes – Driver, Voter(s), Decider, and Executor – each append and play a subset of entry types.

… strong notion of types: each entry is tagged with a type (e.g., intention, result, vote, decision, inference output, mailbox message), and the append / read calls take optional type parameters. Second, we add a blocking poll API that returns when an entry wi…

**Figure 4.** Pseudocode for the AgentBus API. The standard shared log operations are append, read, and tail. The AgentBus adds strong types on entries and a blocking poll that waits for entries of specified types.

… machine into multiple components, each of which plays some subset of entry types from the log and appends some other subset of entry types to it. Concretely, an agentic Driver runs the Inferring stage, inter…
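The paper's pseudocode is not reproduced on this page. As a rough illustration of the four operations the caption names (append, read, tail, and a blocking typed poll), here is a single-process Python sketch. The class name, method signatures, and in-memory storage are assumptions, and a real AgentBus would sit on a durable, replicated log.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    position: int
    entry_type: str  # e.g. "intention", "vote", "decision", "result"
    payload: str


class AgentBus:
    """In-memory stand-in for a typed shared log (hypothetical API)."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []
        self._cond = threading.Condition()

    def append(self, entry_type: str, payload: str) -> int:
        with self._cond:
            pos = len(self._entries)
            self._entries.append(Entry(pos, entry_type, payload))
            self._cond.notify_all()  # wake any blocked poll() callers
            return pos

    def tail(self) -> int:
        # Position one past the last entry.
        with self._cond:
            return len(self._entries)

    def read(self, start: int, end: int,
             types: Optional[set[str]] = None) -> list[Entry]:
        with self._cond:
            return [e for e in self._entries[start:end]
                    if types is None or e.entry_type in types]

    def poll(self, start: int, types: set[str],
             timeout: Optional[float] = None) -> list[Entry]:
        # Block until an entry of one of the given types appears at or
        # after `start`, or until the timeout expires.
        with self._cond:
            def matches() -> list[Entry]:
                return [e for e in self._entries[start:]
                        if e.entry_type in types]
            self._cond.wait_for(lambda: bool(matches()), timeout=timeout)
            return matches()
```

A Voter built on this sketch would loop on `poll(cursor, {"intention"})` and append `vote` entries, while a Decider would poll for `vote` entries and append `decision` entries.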

**Figure 5.** LogAct imposes low overhead. For a simple task (write a C file, compile, run): (Top) most time is spent in inference rather than voting / deciding; (Middle) logging imposes low storage overhead (2.6 KB/s); (Bottom) inference continues to dominate even with slower backends.

… 5.2 Pluggable Voters + Semantic Voters. To obtain a representative set of unsafe actions, we used the AgentDojo benchmark [14]. AgentDojo…

**Figure 6.** LogAct stops all unwanted actions with a rule-based Voter, and restores benign Utility with a dual Voter.

**Figure 7.** Voters can be hot-swapped. We first add a rule-based Voter to stop attacks; and later, add a second LLM-based Voter to restore Utility. Each switch involves a dynamic change of Decider policy via the AgentBus.

… In contrast, the older model (Target) was roughly contemporaneous with AgentDojo and unlikely to be trained on it; it provides lower utility at faster speed and lower token usage, but with an unacce…
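The hot-swap in this caption could look roughly like the sketch below, where a policy change is itself a log entry and the Decider is a pure replay over the log. The policy names, the tuple layout, and the rule that an LLM vote may override a rule-based rejection are all assumptions. This page does not say how the paper's dual Voter actually combines votes.

```python
from typing import Callable

# A policy maps {voter_name: approved?} to a commit/abort verdict.
Policy = Callable[[dict[str, bool]], bool]

POLICIES: dict[str, Policy] = {
    "none": lambda votes: True,  # no voters installed yet
    "rule_only": lambda votes: votes.get("rule", False),
    # Hypothetical dual policy: either voter may approve.
    "rule_or_llm": lambda votes: votes.get("rule", False)
    or votes.get("llm", False),
}


def replay_decider(log: list[tuple]) -> dict[str, str]:
    """Replay the log in order and return the decision for each action.

    Entry shapes (assumed):
      ("policy", name)                      switch the active policy
      ("vote", action_id, voter, approved)  one voter's verdict
      ("decide", action_id)                 voting closed for this action
    """
    policy = "none"
    votes: dict[str, dict[str, bool]] = {}
    decisions: dict[str, str] = {}
    for entry in log:
        kind = entry[0]
        if kind == "policy":
            policy = entry[1]
        elif kind == "vote":
            _, action_id, voter, approved = entry
            votes.setdefault(action_id, {})[voter] = approved
        elif kind == "decide":
            action_id = entry[1]
            ok = POLICIES[policy](votes.get(action_id, {}))
            decisions[action_id] = "commit" if ok else "abort"
    return decisions
```

Because the active policy is derived from the log rather than from process memory, a restarted Decider that replays the same entries reaches the same verdicts.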

**Figure 8.** Agentic introspection enables semantic recovery, health check, and optimization.

**Figure 9.** Agentic introspection makes swarms faster.

Original abstract

Agents are LLM-driven components that can mutate environments in powerful, arbitrary ways. Extracting guarantees for the execution of agents in production environments can be challenging due to asynchrony and failures. In this paper, we propose a new abstraction called LogAct, where each agent is a deconstructed state machine playing a shared log. In LogAct, agentic actions are visible in the shared log before they are executed; can be stopped prior to execution by pluggable, decoupled voters; and recovered consistently in the case of agent or environment failure. LogAct enables agentic introspection, allowing the agent to analyze its own execution history using LLM inference, which in turn enables semantic variants of recovery, health check, and optimization. In our evaluation, LogAct agents recover efficiently and correctly from failures; debug their own performance; optimize token usage in swarms; and stop all unwanted actions for a target model on a representative benchmark with just a 3% drop in benign utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

LogAct's shared-log abstraction for agent visibility and control is a clean idea worth checking, but the abstract's strong claims on recovery and 3% utility drop can't be verified without the full evaluation details.

The punchline is that LogAct models agents as state machines that append actions to a shared log before execution. This setup makes actions visible upfront, lets decoupled voters block them, supports consistent replay recovery after failures, and lets the LLM itself analyze the log for debugging or optimization. The abstract says this combination stops unwanted actions on a benchmark with only a 3% benign utility hit while also cutting token use in swarms and recovering efficiently from crashes. If the implementation holds up, it directly tackles a real pain point in running LLM agents in production where asynchrony and failures make guarantees hard.

Referee Report

2 major / 1 minor

Summary. The paper proposes LogAct, an abstraction in which LLM-driven agents are deconstructed into state machines that append actions to a shared log before any environment mutation occurs. This design makes actions visible and blockable by decoupled voters prior to execution, supports consistent recovery on replay after failures, and enables semantic introspection via LLM analysis of the log for debugging, health checks, and optimization. The evaluation claims that LogAct agents recover efficiently and correctly, debug their own performance, optimize token usage in swarms, and stop all unwanted actions on a representative benchmark with only a 3% drop in benign utility.

Significance. If the implementation and evaluation hold, LogAct offers a practical mechanism for adding reliability, safety, and observability to asynchronous agentic systems, which could influence production deployments of LLM agents where failures and unwanted behaviors are common risks.

major comments (2)

[Evaluation] Evaluation section: the claims of efficient recovery, self-debugging, token optimization, and stopping all unwanted actions with a 3% benign-utility drop are stated without any description of the benchmark, target model, number of trials, definitions of 'unwanted actions,' or error analysis, so the quantitative results cannot be assessed or reproduced.
[Design] LogAct design (shared-log abstraction): the central guarantee that actions are appended and visible before environment mutation, enabling pre-execution blocking and consistent recovery, is not accompanied by an argument or implementation detail addressing asynchrony, crashes, or potential lost updates; if eventual consistency or agent-side buffering is used, some actions could execute before becoming blockable, undermining the recovery and stopping results.

minor comments (1)

[Abstract] Abstract: 'a representative benchmark' is referenced but never named or characterized.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We agree that additional details are needed in both the evaluation and design sections to support the claims. We address each major comment below and will revise the manuscript accordingly.

Referee: [Evaluation] Evaluation section: the claims of efficient recovery, self-debugging, token optimization, and stopping all unwanted actions with a 3% benign-utility drop are stated without any description of the benchmark, target model, number of trials, definitions of 'unwanted actions,' or error analysis, so the quantitative results cannot be assessed or reproduced.

Authors: We agree that the current manuscript does not provide sufficient methodological details in the evaluation section. In the revised version, we will expand this section to include a full description of the benchmark, the target models employed, the number of trials, precise definitions of 'unwanted actions,' and an error analysis. These additions will enable assessment and reproduction of the reported results on recovery, self-debugging, token optimization, and action stopping. revision: yes
Referee: [Design] LogAct design (shared-log abstraction): the central guarantee that actions are appended and visible before environment mutation, enabling pre-execution blocking and consistent recovery, is not accompanied by an argument or implementation detail addressing asynchrony, crashes, or potential lost updates; if eventual consistency or agent-side buffering is used, some actions could execute before becoming blockable, undermining the recovery and stopping results.

Authors: We acknowledge that the manuscript lacks an explicit argument and implementation details on how the shared-log abstraction maintains its guarantees under asynchrony, crashes, and lost updates. In the revision, we will add a dedicated subsection that specifies the consistency model, durability mechanisms for log appends, barriers to prevent execution prior to visibility, and recovery protocols that preserve pre-execution blocking even in the presence of failures. This will clarify that the design avoids premature execution. revision: yes
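One way the recovery protocol promised in this response could begin, sketched under assumptions: on restart, a component replays the log and classifies every action by the last entry recorded for it. The entry names `intention`, `commit`, `abort`, and `result`, and the four state labels, are hypothetical.

```python
def classify(log):
    """Replay (kind, action_id) entries and return each action's state.

    States (assumed labels):
      "pending"  intention logged, no decision yet
      "aborted"  decision was abort, so the action must never run
      "in_doubt" committed but no result logged; the crash may have hit
                 before, during, or after the environment mutation
      "done"     result logged
    """
    state = {}
    for kind, action_id in log:
        if kind == "intention":
            state[action_id] = "pending"
        elif kind == "abort":
            state[action_id] = "aborted"
        elif kind == "commit":
            state[action_id] = "in_doubt"
        elif kind == "result":
            state[action_id] = "done"
    return state
```

The `in_doubt` case is the one a consistency argument has to address: rerunning blindly risks applying a mutation twice, so a recovering Executor would need either idempotent actions or a way to inspect the environment before retrying.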

Circularity Check

0 steps flagged

No circularity: independent system design with no self-referential derivations

The paper presents LogAct as a novel architectural abstraction in which agents are modeled as deconstructed state machines that append actions to a shared log before environment mutation. This enables visibility, blocking by voters, and consistent recovery. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claims (recovery, introspection, unwanted-action stopping with 3% utility drop) are presented as consequences of the proposed implementation rather than reductions to prior self-citations or tautological fits. Evaluation results are described as empirical outcomes from the system, not forced by construction. The design is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on modeling agents as state machines that can reliably play a shared log; no free parameters or invented physical entities are introduced, but the log consistency assumption is domain-specific.

axioms (2)

domain assumption: Agents can be deconstructed into state machines whose actions can be recorded in a shared log before execution.
Core modeling choice stated in the abstract.
domain assumption: A shared log can be maintained consistently across asynchronous agents and environments.
Required for visibility, stopping, and recovery claims.

invented entities (1)

LogAct · no independent evidence
purpose: Shared-log abstraction enabling pre-execution control and introspection for agents.
Newly proposed construct; no independent evidence outside the paper.

pith-pipeline@v0.9.0 · 5501 in / 1321 out tokens · 60826 ms · 2026-05-10T17:04:30.165968+00:00 · methodology

Reference graph

Works this paper leans on

40 extracted references · 3 canonical work pages · 2 internal anchors

[1] [n. d.]. Restate: Durable Building Blocks for Code. https://restate.dev. Accessed: 2025.

[2] [n. d.]. Temporal: Durable Execution Platform. https://temporal.io. Accessed: 2025.

[3] Mahesh Balakrishnan, Jason Flinn, Chen Shen, Mihir Dharamshi, Ahmed Jafri, Xiao Shi, Santosh Ghosh, Hazem Hassan, Aaryaman Sagar, Rhed Shi, Jingming Liu, Filip Gruszczynski, Xianan Zhang, Huy Hoang, Ahmed Yossef, Francois Richard, and Yee Jiun Song. 2020. Virtual Consensus in Delos. In Proceedings of the 14th USENIX Symposium on Operating Systems Design ...

[4] Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobber, Michael Wei, and John D. Davis. 2012. CORFU: A Shared Log Design for Flash Clusters. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI).

[5] Mahesh Balakrishnan, Dahlia Malkhi, Ted Wobber, Ming Wu, Vijayan Prabhakaran, Michael Wei, John D. Davis, Sriram Rao, Tao Zou, and Aviad Zuck. 2013. Tango: Distributed Data Structures over a Shared Log. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP).

[6] Mahesh Balakrishnan, Chen Shen, Ahmed Jafri, Suyog Mapara, David Geraghty, Jason Flinn, Vidhya Venkat, Ivailo Nedelchev, Santosh Ghosh, Mihir Dharamshi, Jingming Liu, Filip Gruszczynski, Jun Li, Rounak Tibrewal, Ali Zaveri, Rajeev Nagar, Ahmed Yossef, Francois Richard, and Yee Jiun Song. 2021. Log-structured Protocols in Delos. In Proceedings of the 28th...

[7] Philip A. Bernstein, Colin W. Reid, and Sudipto Das. 2011. Hyder – A Transactional Record Manager for Shared Flash. In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR). 9–20.

[8] Shreesha G. Bhat, Tony Hong, Xuhao Luo, Jiyu Hu, Aishwarya Ganesan, and Ramnatthan Alagappan. 2025. Low End-to-End Latency atop a Speculative Shared Log with Fix-Ante Ordering. In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 465–481.

[9] Brendan Burns, Brian Grant, David Oppenheimer, Eric A. Brewer, and John Wilkes. 2016. Borg, Omega, and Kubernetes. Commun. ACM 59, 5 (2016), 50–57.

[10] K. Mani Chandy and Leslie Lamport. 1985. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems (TOCS) 3, 1 (1985), 63–75.

[11] Sahana Chennabasappa, Cyrus Nikolaidis, Daniel Song, David Molnar, Stephanie Ding, Shengye Wan, Spencer Whitman, Lauren Deason, Nicholas Doucette, Abraham Montilla, Alekhya Gampa, Beto de Paola, Dominik Gabi, James Crnkovich, Jean-Christophe Testud, Kat He, Rashnil Chaturvedi, Wu Zhou, and Joshua Saxe. 2025. LlamaFirewall: An Open Source Guardrail System ...

[12] Byung-Gon Chun, Petros Maniatis, Scott Shenker, and John Kubiatowicz. 2007. Attested Append-Only Memory: Making Adversaries Stick to Their Word. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP). 189–204.

[13] Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. 2025. Defeating Prompt Injections by Design. CoRR abs/2503.18813 (2025).

[14] Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tramèr. 2024. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track.

[15] Patrick Th. Eugster. 2007. Type-Based Publish/Subscribe: Concepts and Experiences. ACM Transactions on Programming Languages and Systems 29, 1 (2007), 6.

[16] Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The Many Faces of Publish/Subscribe. Comput. Surveys 35, 2 (2003), 114–131.

[17] Jim Gray and Leslie Lamport. 2006. Consensus on Transaction Commit. ACM Transactions on Database Systems 31, 1 (2006), 133–160.

[18] Andreas Haeberlen, Petr Kouznetsov, and Peter Druschel. 2007. PeerReview: Practical Accountability for Distributed Systems. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP). 175–188.

[19] Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, and Madian Khabsa. 2023. Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. CoRR abs/2312.06674 (2023).

[20] Zhipeng Jia and Emmett Witchel. 2021. Boki: Stateful Serverless Computing with Shared Logs. In Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP). 691–707.

[21] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP). 611–626.

[22] Leslie Lamport. 1998. The Part-Time Parliament. ACM Transactions on Computer Systems 16, 2 (1998), 133–169.

[23] Leslie Lamport, Robert E. Shostak, and Marshall C. Pease. 1982. The Byzantine Generals Problem. ACM Transactions on Programming Languages and Systems 4, 3 (1982), 382–401.

[24] Dave Levin, John R. Douceur, Jacob R. Lorch, and Thomas Moscibroda. 2009. TrInc: Small Trusted Hardware for Large Distributed Systems. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 1–14.

[25] Joshua Lockerman, Jose M. Faleiro, Juno Kim, Soham Sankaran, Daniel J. Abadi, James Aspnes, Siddhartha Sen, and Mahesh Balakrishnan. 2018. The FuzzyLog: A Partially Ordered Shared Log. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI).

[26] Xuhao Luo, Shreesha G. Bhat, Jiyu Hu, Ramnatthan Alagappan, and Aishwarya Ganesan. 2024. LazyLog: A New Shared Log Abstraction for Low-Latency Applications. In Proceedings of the 30th ACM Symposium on Operating Systems Principles (SOSP). 296–312.

[27] C. Mohan, Don Haderle, Bruce G. Lindsay, Hamid Pirahesh, and Peter M. Schwarz. 1992. ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. ACM Transactions on Database Systems 17, 1 (1992), 94–162.

[28] Jayashree Mohan, Amar Phanishayee, and Vijay Chidambaram. 2021. CheckFreq: Frequent, Fine-Grained DNN Checkpointing. In Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST). 203–216.

[29] Fred B. Schneider. 1990. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. Comput. Surveys 22, 4 (1990), 299–319.

[30] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language Agents with Verbal Reinforcement Learning. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS).

[31] Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. 2024. Executable Code Actions Elicit Better LLM Agents. In Proceedings of the 41st International Conference on Machine Learning (ICML).

[32] Michael Wei, Amy Tai, Christopher J. Rossbach, Ittai Abraham, Maithem Munshed, Medhavi Dhawan, Jim Stabile, Udi Wieder, Scott Fritchie, Steven Swanson, Michael J. Freedman, and Dahlia Malkhi. 2017. vCorfu: A Cloud-Scale Object Store on a Shared Log. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI). 35–49.

[33] Simon Willison. 2023. The Dual LLM Pattern for Building AI Assistants That Can Resist Prompt Injection. https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

[34] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the 11th International Conference on Learning Representations (ICLR).

[35] Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark W. Barrett, and Ying Sheng. 2024. SGLang: Efficient Execution of Structured Language Model Programs. In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS).

[36] Zhiting Zhu, Zhipeng Jia, Newton Ni, Dixin Tang, and Emmett Witchel. 2025. Impeller: Stream Processing on Shared Logs. In Proceedings of the 20th European Conference on Computer Systems (EuroSys). 637–653.