Robust Agent Compensation (RAC): Teaching AI Agents to Compensate

Frank Leymann; Kaviru Hapuarachchi; Rania Khalaf; Srinath Perera

arxiv: 2605.03409 · v2 · pith:WGOLOEUSnew · submitted 2026-05-05 · 💻 cs.AI

Pith reviewed 2026-05-21 00:16 UTC · model grok-4.3

classification 💻 cs.AI

keywords AI agents · error recovery · log-based compensation · agent frameworks · reliability · LangChain · performance evaluation · robust execution

The pith

A log-based recovery extension lets AI agents compensate for errors without rewriting their code and runs 1.5 to 8 times faster with fewer tokens than LLM recovery methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Robust Agent Compensation (RAC), a recovery system that records agent actions in logs and uses them to undo or correct mistakes. It is added as an extension to existing agent frameworks so users keep their current code unchanged. The authors show that this approach avoids side effects during recovery and delivers substantially lower latency and token costs than methods that ask the language model itself to fix errors. A reader would care because agent systems today often fail on complex tasks and current fixes add extra delay and expense.

Core claim

We present Robust Agent Compensation (RAC) as a log-based recovery paradigm that provides a safety net through an architectural extension applicable to most agent frameworks. This enables reliable executions by compensating for errors while avoiding unintended side effects. The implementation can be added to frameworks like LangChain via existing extension points without modifying user agent code, and evaluations on τ-bench and REALM-Bench demonstrate superior latency and token economy over state-of-the-art LLM-based approaches for complex problems.

What carries the argument

Robust Agent Compensation (RAC), a log-based recovery paradigm implemented as an architectural extension that records actions and enables compensation for errors without changes to user code.

If this is right

Agents recover from errors by replaying or compensating logged actions instead of making new LLM calls.
Existing agent code in frameworks such as LangGraph can stay unchanged while gaining a recovery safety net.
The same extension approach can be applied to other agent frameworks through their built-in extension points.
Complex problem solving becomes cheaper and faster because recovery avoids repeated full LLM reasoning steps.
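
The idea of compensating from a log instead of asking the model to repair the situation can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `LogRecord`, `CompensationLog`, and `run_with_recovery`, and the toy booking functions, are all hypothetical. It assumes every action is registered together with a callable that undoes it, and that compensation runs in reverse order of execution.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class LogRecord:
    """One logged action plus the callable that undoes it (hypothetical schema)."""
    action: str
    args: Dict[str, Any]
    compensate: Callable[[], None]


@dataclass
class CompensationLog:
    """Append-only record of completed actions (illustrative, in memory only)."""
    records: List[LogRecord] = field(default_factory=list)

    def add(self, record: LogRecord) -> None:
        self.records.append(record)

    def compensate_all(self) -> List[str]:
        """Undo logged actions in reverse order and return the names undone."""
        undone = []
        while self.records:
            record = self.records.pop()
            record.compensate()
            undone.append(record.action)
        return undone


def run_with_recovery(steps, log):
    """Run (name, args, do, undo) steps; on failure, compensate from the log.

    No model call is made during recovery: only logged actions are undone.
    """
    try:
        for name, args, do, undo in steps:
            do(**args)
            # Log the action only after it succeeded, bound to its own undo.
            log.add(LogRecord(name, args, lambda undo=undo, args=args: undo(**args)))
        return "completed", []
    except Exception:
        return "compensated", log.compensate_all()


# Toy external state standing in for real side effects.
bookings = set()


def book(item):
    bookings.add(item)


def cancel(item):
    bookings.discard(item)


def fail(item):
    raise RuntimeError(f"{item} unavailable")


steps = [
    ("book_flight", {"item": "flight"}, book, cancel),
    ("book_hotel", {"item": "hotel"}, book, cancel),
    ("book_car", {"item": "car"}, fail, cancel),  # third step fails
]
status, undone = run_with_recovery(steps, CompensationLog())
print(status, undone, bookings)
```

In this toy run the failing third step triggers compensation of the two completed bookings, newest first. A real system would also need durable log storage and a policy for compensations that themselves fail, which this sketch leaves out.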

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be tested with non-LLM agents to see whether the same log compensation works outside language-model settings.
Combining RAC with other monitoring tools might further reduce side effects in long-running agent workflows.
If the log records prove sufficient for compensation, future agent designs might default to keeping detailed action histories rather than relying on model memory alone.

Load-bearing premise

That a log-based recovery mechanism can be added through existing extension points in most agent frameworks without requiring any changes to the user's current agent code while still avoiding unintended side effects.
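
One way to picture this premise is a wrapper applied to the tool registry at deployment time, so the agent function itself is never edited. The sketch below is an assumption-laden illustration in plain Python: `with_logging`, `ACTION_LOG`, `send_email`, and `user_agent` are hypothetical names, and it does not use or represent any real LangChain or LangGraph extension point. It also assumes the agent receives its tools as a dictionary, which a given framework may or may not allow.

```python
import functools
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical in-memory action log: (tool name, args, kwargs, result).
ACTION_LOG: List[Tuple[str, tuple, dict, Any]] = []


def with_logging(tool: Callable) -> Callable:
    """Wrap a tool so that each successful call is recorded in ACTION_LOG."""
    @functools.wraps(tool)
    def wrapper(*args, **kwargs):
        result = tool(*args, **kwargs)
        ACTION_LOG.append((tool.__name__, args, kwargs, result))
        return result
    return wrapper


# --- user code, left unchanged -------------------------------------------
def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


def user_agent(tools: Dict[str, Callable]) -> str:
    # Stand-in for an agent loop that decides to call one tool.
    return tools["send_email"]("ops@example.com", body="hi")


# --- deployment code: wrap the registry, not the agent --------------------
tools = {"send_email": send_email}
wrapped = {name: with_logging(fn) for name, fn in tools.items()}
reply = user_agent(wrapped)
print(reply, ACTION_LOG)
```

The sketch only shows that recording can be layered on from outside. Whether the recorded information is enough to compensate without unintended side effects is exactly the part of the premise that this example does not test.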

What would settle it

Run the same complex tasks from τ-bench and REALM-Bench with both RAC and current LLM-based recovery, then measure whether RAC still shows 1.5-8X lower latency and token use.
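
The comparison described above reduces to computing improvement factors per task and checking them against the claimed range. The following is a small sketch of that arithmetic; `RunStats`, `improvement`, and `within_claim` are hypothetical helpers, and the numbers are placeholders chosen for illustration, not measurements from the paper or from either benchmark.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunStats:
    """Aggregate cost of one recovery run (hypothetical record)."""
    latency_s: float
    tokens: int


def improvement(baseline: RunStats, candidate: RunStats) -> Dict[str, float]:
    """Return how many times cheaper the candidate is than the baseline."""
    return {
        "latency_x": baseline.latency_s / candidate.latency_s,
        "tokens_x": baseline.tokens / candidate.tokens,
    }


def within_claim(factors: Dict[str, float], low: float = 1.5) -> bool:
    """True if every factor reaches the lower end of the claimed 1.5-8X range."""
    return all(value >= low for value in factors.values())


# Placeholder values only; NOT results reported in the paper.
llm_recovery = RunStats(latency_s=40.0, tokens=12000)
rac_recovery = RunStats(latency_s=10.0, tokens=3000)

factors = improvement(llm_recovery, rac_recovery)
print(factors, within_claim(factors))
```

For the comparison to be meaningful, both runs would need the same agent, model, task set, and injected failures, which is the point the referee report raises about baselines.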

Figures

Figures reproduced from arXiv: 2605.03409 by Frank Leymann, Kaviru Hapuarachchi, Rania Khalaf, Srinath Perera.

**Figure 1.** RAC Architecture. To understand failure recovery, we need benchmarks. Wang et al. [35] present a benchmark, “Hell or High Water”, for simulating tool failures and prompting the agent to find an alternative tool. They observe that all LLMs struggled to adapt to the errors and find a good alternative, and their performance dropped significantly. However, we could not use the benchmark to study side effects bec…

Original abstract

We present Robust Agent Compensation (RAC), a log-based recovery paradigm (providing a safety net) implemented through an architectural extension that can be applied to most Agent frameworks to support reliable executions (avoiding unintended side effects). Users can choose to enable RAC without changing their current agent code (e.g., LangGraph agents). The proposed approach can be implemented in most existing agent frameworks via their existing extension points. We present an implementation based on LangChain, demonstrate its viability through the $\tau$-bench and REALM-Bench, and show that when solving complex problems, RAC is 1.5-8X or more better in both latency and token economy compared to state-of-the-art LLM-based recovery approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

RAC adds a practical log-based recovery layer to existing agent frameworks without code changes and reports efficiency gains, but the scale of those gains needs clearer baseline details to hold up.

The main point here is that the authors built a log-based compensation system called RAC that slots into most agent frameworks through their existing extension points. Users can turn it on for LangGraph or similar setups without rewriting any agent code, and it uses logs to recover from issues instead of firing off more LLM calls. On the two benchmarks they ran, this gives 1.5-8X better latency and token counts for complex problems compared to LLM recovery methods.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Robust Agent Compensation (RAC), a log-based recovery paradigm implemented as an architectural extension to existing agent frameworks (e.g., LangChain/LangGraph). It claims that users can enable RAC without modifying their current agent code, that the mechanism avoids unintended side effects, and that on the τ-bench and REALM-Bench it delivers 1.5-8X or greater gains in both latency and token economy versus state-of-the-art LLM-based recovery methods when solving complex problems.

Significance. If the reported performance advantages are shown to hold under controlled, reproducible conditions with clearly documented baselines, RAC could offer a practical, low-overhead safety net for agent reliability. The log-based approach would constitute a useful alternative to purely LLM-driven recovery, with potential impact on production deployments where token cost and latency matter.

major comments (2)

[Evaluation section] Evaluation section (τ-bench and REALM-Bench results): The headline claim of 1.5-8X or greater improvement in latency and token economy is presented without naming the specific LLM-based recovery baselines, their exact implementations, hyper-parameters, failure-injection protocols, or confirmation that the underlying agent, model, and task distribution were held constant. This information is load-bearing for the central quantitative claim and must be supplied before the superiority result can be assessed.
[Implementation section] Implementation and integration description: The assertion that RAC can be added through existing extension points 'without requiring any changes to the user's current agent code' and 'while still avoiding unintended side effects' is stated but not accompanied by concrete integration examples, side-effect analysis, or failure cases across frameworks beyond the single LangChain demonstration. This directly affects the practicality claim.

minor comments (2)

[Abstract] Abstract: The phrase 'state-of-the-art LLM-based recovery approaches' should be replaced by the concrete method names used in the experiments so readers can immediately contextualize the comparison.
[Introduction] Notation and terminology: The term 'log-based recovery' is used without a precise definition or pseudocode early in the paper; a short formal description or diagram would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to provide the requested details on baselines and integration.

Referee: [Evaluation section] Evaluation section (τ-bench and REALM-Bench results): The headline claim of 1.5-8X or greater improvement in latency and token economy is presented without naming the specific LLM-based recovery baselines, their exact implementations, hyper-parameters, failure-injection protocols, or confirmation that the underlying agent, model, and task distribution were held constant. This information is load-bearing for the central quantitative claim and must be supplied before the superiority result can be assessed.

Authors: We agree that explicit naming and documentation of the baselines is necessary for assessing the central claim. In the revised manuscript we have added a new subsection 4.1 that names the specific LLM-based recovery baselines (standard ReAct retry, Reflexion, and LLM error-correction variants from prior literature), provides their exact implementations and hyper-parameters, describes the failure-injection protocols, and confirms that the agent framework, model, and task distribution were held constant. Updated Tables 2 and 3 now include these details to support reproducibility.
revision: yes

Referee: [Implementation section] Implementation and integration description: The assertion that RAC can be added through existing extension points 'without requiring any changes to the user's current agent code' and 'while still avoiding unintended side effects' is stated but not accompanied by concrete integration examples, side-effect analysis, or failure cases across frameworks beyond the single LangChain demonstration. This directly affects the practicality claim.

Authors: We acknowledge that the original text would benefit from additional concrete examples. The revised Implementation section now includes code snippets demonstrating integration via standard extension points in both LangGraph and AutoGen, a side-effect analysis confirming that RAC reads only execution logs without modifying agent state or logic, and a discussion of failure cases (e.g., log corruption) in Section 3.3. While the primary empirical demonstration uses LangChain, the architectural description applies to other frameworks through their documented hooks.
revision: yes

Circularity Check

0 steps flagged

No circularity; empirical implementation and benchmark results are self-contained

The paper describes an architectural extension (RAC) for existing agent frameworks, implemented via LangChain extension points, and reports empirical results from τ-bench and REALM-Bench showing latency and token improvements. No equations, fitted parameters, self-citations as load-bearing premises, uniqueness theorems, or ansatzes appear in the provided text. The performance claims are presented as direct outcomes of experiments rather than predictions derived from self-referential definitions or prior self-results. The derivation chain is absent; the work is an implementation demonstration whose viability is externally testable via the named benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that agent frameworks expose sufficient extension points for log-based recovery and that logging can reliably capture and compensate for side effects without code changes. No free parameters or invented physical entities are described.

axioms (1)

domain assumption: Most existing agent frameworks provide extension points that allow adding recovery mechanisms without modifying user agent code.
Directly stated in the abstract as the basis for broad applicability.

invented entities (1)

Robust Agent Compensation (RAC) · no independent evidence
purpose: Log-based safety net for reliable agent execution and side-effect avoidance.
New named method introduced by the paper.

pith-pipeline@v0.9.0 · 5653 in / 1308 out tokens · 42108 ms · 2026-05-21T00:16:50.764512+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: RAC is a log-based recovery paradigm... TransactionLog.add(record); topological sort and compensation pairs

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat recovery theorem · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: decoupling recovery from the ReAct Agents... adds outcome of failure handling to the context

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 4 internal anchors

[1] LangChain AI. 2024. LangGraph: Building stateful, multi-agent applications with LLMs. https://github.com/langchain-ai/langgraph

[2] Muhammad Arslan, Hussam Ghanem, Saba Munawar, and Christophe Cruz. 2024. A Survey on RAG with LLMs. Procedia computer science 246 (2024), 3781–3790.

[3] Victor Barres, Honghua Dong, Soham Ray, Xujie Si, and Karthik Narasimhan. 2025. τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment. arXiv:2506.07982 [cs.AI] https://arxiv.org/abs/2506.07982

[4] Stuart Russell and Peter Norvig. 2021. Artificial Intelligence: A Modern Approach, 4th Edition. Pearson, Hoboken, NJ, USA.

[5] Edward Y Chang and Longling Geng. 2025. SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning. Proceedings of the VLDB Endowment 18, 12 (2025), 4874–4886.

[6] Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, et al. 2024. A survey on evaluation of large language models. ACM transactions on intelligent systems and technology 15, 3 (2024), 1–45.

[7] Christian Colombo and Gordon J Pace. 2013. Recovery within long-running transactions. ACM Computing Surveys (CSUR) 45, 3 (2013), 1–35.

ACM CAIS ’26, May 26–29, 2026, San Jose, CA, USA · Perera et al.

[8] Eman Daraghmi, Cheng-Pu Zhang, and Shyan-Ming Yuan. 2022. Enhancing saga pattern for distributed transactions within a microservices architecture. Applied Sciences 12, 12 (2022), 6242.

[9] Charles T Davies. 1978. Data processing spheres of control. IBM Systems Journal 17, 2 (1978), 179–198.

[10] deepset GmbH. 2024. Haystack: The open source NLP framework for composable AI. https://github.com/deepset-ai/haystack

[11] Ahmed K. Elmagarmid. 1992. Database Transaction Models for Advanced Applications. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[12] Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, and Jingren Zhou. 2024. AgentScope: A Flexible yet Robust Multi-Agent Platform. arXiv:2402.14034 [cs.MA] https://arxiv.org/abs/2402.14034

[13] Hector Garcia-Molina and Kenneth Salem. 1987. Sagas. ACM Sigmod Record 16, 3 (1987), 249–259.

[14] Longling Geng and Edward Y. Chang. 2025. REALM-Bench: A Benchmark for Evaluating Multi-Agent Systems on Real-world, Dynamic Planning and Scheduling Tasks. arXiv:2502.18836 [cs.AI] https://arxiv.org/abs/2502.18836

[15] Longling Geng and Edward Y. Chang. 2025. SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning. https://github.com/genglongling/SagaLLM

[16] Jim Gray. 1981. The transaction concept: virtues and limitations (invited paper). In Proceedings of the Seventh International Conference on Very Large Data Bases - Volume 7 (VLDB ’81). VLDB Endowment, Cannes, France, 144–154.

[17] Jim Gray and Andreas Reuter. 1992. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[18] Griptape Team. 2024. Griptape: Python framework for AI workflows and pipelines. https://github.com/griptape-ai/griptape

[19] Theo Haerder and Andreas Reuter. 1983. Principles of transaction-oriented database recovery. ACM computing surveys (CSUR) 15, 4 (1983), 287–317.

[20] Junda He, Christoph Treude, and David Lo. 2025. Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead. ACM Transactions on Software Engineering and Methodology 34, 5 (2025), 1–30.

[21] Pat Helland. 2016. Life beyond distributed transactions: an apostate’s opinion. Queue 14, 5 (2016), 69–98.

[22] Hugging Face Team. 2024. smolagents: A tiny library to build agents that write python code. https://github.com/huggingface/smolagents

[23] Rania Khalaf, Dieter Roller, and Frank Leymann. 2009. Revisiting the behavior of fault and compensation handlers in WS-BPEL. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer, Rhodes, Greece, 286–303.

[24] LangChain AI. 2025. Model Context Protocol (MCP) Documentation. https://docs.langchain.com/oss/python/langchain/mcp Accessed: 12 January 2026.

[25] Frank Leymann. 1995. Supporting business transactions via partial backward recovery in workflow management systems. In Datenbanksysteme in Büro, Technik und Wissenschaft: GI-Fachtagung, Dresden, 22.–24. März 1995. Springer, Dresden, Germany, 51–70.

[26] Frank Leymann and Dieter Roller. 1999. Production Workflow-Concepts and Techniques. Prentice Hall, Upper Saddle River, NJ, USA.

[27] LlamaIndex Team. 2024. LlamaIndex: Data framework for LLM applications. https://github.com/run-llama/llama_index

[28] Microsoft Semantic Kernel Team. 2024. Semantic Kernel: Integrate LLMs into your applications. https://github.com/microsoft/semantic-kernel

[29] João Moura. 2024. CrewAI: Orchestrating Role-Playing, Autonomous AI Agents. https://github.com/crewAIInc/crewAI

[30] OASIS WS-TX Technical Committee. 2006. Web Services Atomic Transaction (WS-AtomicTransaction) Version 1.1. OASIS Standard. OASIS. https://docs.oasis-open.org/ws-tx/wstx-wsat-1.1-spec-cd-01.pdf

[31] OpenAI. 2024. OpenAI Agents SDK. https://github.com/openai/openai-agents-python

[32] Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Joseph E. Gonzalez, Koushik Sen, Dawn Song, Ion Stoica, Matei Zaharia, and ...

[33] Pydantic Team. 2024. PydanticAI: Agent Framework for Production-Grade Generative AI. https://github.com/pydantic/pydantic-ai

[34] Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, and Niranjan Balasubramanian. Appworld: A controllable world of apps and people for benchmarking interactive coding agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 16022–16076.

[35] Andrew Wang, Sophia Hager, Adi Asija, Daniel Khashabi, and Nicholas Andrews. Hell or High Water: Evaluating Agentic Recovery from External Failures. arXiv:2508.11027 [cs.CL] https://arxiv.org/abs/2508.11027

[36] Sanjiva Weerawarana, Francisco Curbera, Frank Leymann, Tony Storey, and Donald F Ferguson. 2005. Web services platform architecture: SOAP, WSDL, WS-policy, WS-addressing, WS-BPEL, WS-reliable messaging and more. Prentice Hall, Upper Saddle River, NJ, USA.

[37] WSO2. 2026. Source Code and Data for RAC. https://github.com/wso2-incubator/research-rac

[38] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. 2023. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155 [cs.AI] https://arxiv.org/abs/2308.08155

[39] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL] https://arxiv.org/abs/2210.03629

[40] Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, and Chenglin Wu. 2025. AFlow: Automating Agentic Workflow Generation. arXiv:2410.10762 [cs.AI] https://arxiv.org/abs/2410.10762
