Taming "Zombie" Agents: A Markov State-Aware Framework for Resilient Multi-Agent Evolution
Pith reviewed 2026-05-20 13:45 UTC · model grok-4.3
The pith
AgentRevive manages LLM multi-agent teams with Markov states that keep recovering agents on standby rather than pruning them after temporary hallucinations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgentRevive dynamically manages agent collaboration through soft state transitions implemented via State-Aware Policy Learning and State-Aware Edge Optimization. Agent states are divided into Active, Standby, and Terminated, with a risk estimator optimizing transitions by assessing hallucination risk to minimize unreliable nodes while protecting valuable ones. Subgraph edges are pruned according to learned states, permanently removing Terminated nodes and retaining Standby nodes to evaluate their future contributions.
What carries the argument
The Markov state-aware framework that divides agents into Active, Standby, and Terminated states and uses a risk estimator to guide soft transitions and selective edge pruning.
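The three-state scheme can be sketched as a small state machine. This is a minimal illustration, not the paper's method: AgentRevive learns its transition policy, whereas the sketch below uses two fixed thresholds (`STANDBY_THRESHOLD` and `TERMINATE_THRESHOLD`, both hypothetical) on a risk score assumed to lie in [0, 1]. The only properties taken from the summary above are that Standby agents can return to Active and that Terminated is permanent.

```python
from enum import Enum


class AgentState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    TERMINATED = "terminated"


# Hypothetical cut-offs for illustration; the paper learns transitions instead.
STANDBY_THRESHOLD = 0.5
TERMINATE_THRESHOLD = 0.9


def next_state(state: AgentState, risk: float) -> AgentState:
    """Pick the next state from the current state and the current risk only."""
    if state is AgentState.TERMINATED:
        return AgentState.TERMINATED  # absorbing: terminated agents never return
    if risk >= TERMINATE_THRESHOLD:
        return AgentState.TERMINATED
    if risk >= STANDBY_THRESHOLD:
        return AgentState.STANDBY  # soft transition: kept for later rounds
    return AgentState.ACTIVE


def run_rounds(risks):
    """Apply next_state once per round, starting from ACTIVE."""
    state = AgentState.ACTIVE
    history = []
    for risk in risks:
        state = next_state(state, risk)
        history.append(state)
    return history
```

Because the next state depends only on the current state and the current risk score, the sequence is Markov in the usual sense. An agent with one bad round, for example `run_rounds([0.2, 0.7, 0.1])`, passes through Standby and comes back to Active rather than being dropped.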
If this is right
- Outperforms strong baselines on general reasoning, domain-specific, and hallucination challenge tasks.
- Reduces token consumption through state-aware agent scheduling that avoids unnecessary computation on unreliable nodes.
- Preserves potentially valuable agents by avoiding premature termination due to transient issues like hallucinations or knowledge gaps.
- Enables selective message propagation based on agent memory and state to improve collaboration efficiency.
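The scheduling and pruning consequences listed above can be sketched on a plain adjacency matrix. The sketch assumes that only Active agents are invoked in a given round and that Standby agents keep their edges but cost no tokens until they are reactivated; the summary does not spell out these details, so treat both as assumptions. The function names `prune_edges` and `agents_to_call` are hypothetical.

```python
def prune_edges(adjacency, states):
    """Return a copy of the adjacency matrix with every edge that touches
    a terminated agent removed. Standby agents keep their edges."""
    n = len(adjacency)
    pruned = [row[:] for row in adjacency]  # copy so the input is not mutated
    for i in range(n):
        for j in range(n):
            if states[i] == "terminated" or states[j] == "terminated":
                pruned[i][j] = 0
    return pruned


def agents_to_call(states):
    """Indices of agents invoked this round (assumption: active agents only)."""
    return [i for i, state in enumerate(states) if state == "active"]


adjacency = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
states = ["active", "standby", "terminated"]
pruned = prune_edges(adjacency, states)
scheduled = agents_to_call(states)
```

In this toy graph the terminated agent loses all of its edges, the standby agent stays connected to the active one, and only one of three agents is called in the round, which is where the token saving would come from under these assumptions.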
Where Pith is reading between the lines
- The standby mechanism could reduce waste in agent systems handling long multi-turn dialogues where temporary errors are common.
- Extending the risk estimator to incorporate external verification tools might further stabilize state transitions on factual tasks.
- The approach suggests a general pattern for resilient scheduling that neighboring multi-agent frameworks could adopt to handle noisy agent outputs.
Load-bearing premise
The risk estimator can reliably assess hallucination risk to optimize agent state transitions, and retaining Standby nodes will allow them to contribute meaningfully in subsequent rounds without excessive overhead.
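A toy version of a divergence-based risk score may make this premise easier to reason about. It follows the shape of the risk expression quoted in the Lean-theorem section of this page (a negated KL divergence between an agent-level quantity and an averaged one), but the page does not define those symbols. The sketch therefore assumes that each agent's output can be summarized as a probability distribution over the same answer options and that the reference is the plain mean over all agents. Those assumptions and the function names are illustrative, not the paper's definitions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_distribution(dists):
    """Element-wise mean of several distributions (assumed reference)."""
    n = len(dists)
    return [sum(d[k] for d in dists) / n for k in range(len(dists[0]))]


def risk_scores(dists):
    """score_v = -KL(p_v || mean). Values near 0 mean agent v agrees with
    the group; more negative values mean it deviates more."""
    mean = mean_distribution(dists)
    return [-kl_divergence(p, mean) for p in dists]


# Two agents agree, one disagrees.
scores = risk_scores([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])
```

Under this sign convention the dissenting third agent receives the most negative score. A score like this measures disagreement with the group, not factual error, which is exactly why the premise above is load-bearing: a confidently wrong majority would look low-risk.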
What would settle it
A test set of tasks where the risk estimator consistently fails to flag hallucinating agents, resulting in Standby nodes that degrade overall performance instead of recovering.
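Such a test could be scored with two simple rates, sketched below under the assumption that each agent output has a risk score in [0, 1] and a human label saying whether it was a hallucination. The threshold of 0.5 and the function names are hypothetical.

```python
def flag_recall(risks, labels, threshold=0.5):
    """Fraction of truly hallucinating outputs whose risk reaches the threshold."""
    positives = sum(1 for y in labels if y)
    hits = sum(1 for r, y in zip(risks, labels) if y and r >= threshold)
    return hits / positives if positives else float("nan")


def false_alarm_rate(risks, labels, threshold=0.5):
    """Fraction of non-hallucinating outputs that are flagged anyway."""
    negatives = sum(1 for y in labels if not y)
    alarms = sum(1 for r, y in zip(risks, labels) if not y and r >= threshold)
    return alarms / negatives if negatives else float("nan")


risks = [0.9, 0.2, 0.7, 0.1]
labels = [True, True, False, False]  # True = labeled hallucination
recall = flag_recall(risks, labels)
alarms = false_alarm_rate(risks, labels)
```

A persistently low recall on such a set would be the failure case described above: hallucinating agents would stay Active or Standby and keep feeding messages into the group.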
Original abstract
Recent advancements in LLM-based multi-agent systems have demonstrated remarkable collaborative capabilities across complex tasks. To improve overall efficiency, existing methods often rely on aggressive graph evolution among agents (e.g., node or edge pruning), which risks prematurely discarding valuable agents due to transient issues such as hallucinations or temporary knowledge gaps. However, such hard pruning overlooks the potential for "zombie" agents to recover and contribute in subsequent discussion rounds. In this paper, we propose AgentRevive, a Markov state-aware framework for resilient multi-agent evolution. Our approach dynamically manages agent collaboration through soft state transitions, implemented via two key components: (1) State-Aware Policy Learning: Agent states are divided into "Active", "Standby", and "Terminated" states, selectively propagating messages based on agent memory. The policy employs a risk estimator to optimize agent state transitions by assessing hallucination risk, minimizing the influence of unreliable nodes while safeguarding valuable ones. (2) State-Aware Edge Optimization: Subgraph edges are pruned according to states learned from the policy, permanently removing "Terminated" nodes and retaining "Standby" nodes for subsequent rounds to assess their potential future contributions. Extensive experiments on general reasoning, domain-specific, and hallucination challenge tasks show that our method consistently outperforms strong baselines and significantly reduces token consumption through state-aware agent scheduling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes AgentRevive, a Markov state-aware framework for resilient multi-agent evolution in LLM-based systems. It introduces soft state transitions among Active, Standby, and Terminated states to avoid hard pruning of potentially recoverable 'zombie' agents. The framework includes State-Aware Policy Learning, which uses a risk estimator to optimize transitions based on hallucination risk, and State-Aware Edge Optimization to prune edges accordingly. Experiments on general reasoning, domain-specific, and hallucination tasks demonstrate consistent outperformance over baselines and reduced token consumption.
Significance. If the experimental results hold under rigorous validation, this work offers a practical advance in multi-agent LLM systems by enabling recovery from transient failures like hallucinations instead of permanent agent loss. The structured use of Markov states for dynamic scheduling could improve both performance and efficiency in collaborative setups, addressing a real limitation in existing graph-evolution approaches.
major comments (1)
- [State-Aware Policy Learning (abstract and framework description)] State-Aware Policy Learning component: the risk estimator used to assess hallucination risk and drive transitions among Active/Standby/Terminated states is load-bearing for the token-reduction and outperformance claims. The description does not specify implementation details (e.g., whether it is LLM-prompted from the same model family) or provide independent validation against labeled hallucination datasets or ablations; without this, circularity risks remain unaddressed and could undermine the reliability of the soft-pruning mechanism.
minor comments (2)
- [Abstract] The abstract refers to 'strong baselines' without naming them or the specific metrics/statistical tests used; adding this information would improve immediate readability.
- [Framework overview] Notation for state transitions and the risk estimator could be formalized with a diagram or pseudocode early in the paper to clarify how messages are selectively propagated based on agent memory.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for recognizing the potential of the AgentRevive framework to address limitations in existing multi-agent LLM systems. We address the major comment below and commit to revisions that strengthen the presentation of the State-Aware Policy Learning component.
- Referee: State-Aware Policy Learning component: the risk estimator used to assess hallucination risk and drive transitions among Active/Standby/Terminated states is load-bearing for the token-reduction and outperformance claims. The description does not specify implementation details (e.g., whether it is LLM-prompted from the same model family) or provide independent validation against labeled hallucination datasets or ablations; without this, circularity risks remain unaddressed and could undermine the reliability of the soft-pruning mechanism.
  Authors: We acknowledge that the current manuscript description of the risk estimator is high-level and does not include the requested implementation specifics or supporting experiments. This is a valid observation that could leave readers concerned about circularity. In the revised manuscript we will expand the relevant section to explicitly describe the risk estimator as an independent LLM-based module prompted from a model family distinct from the primary agents, include validation results against labeled hallucination datasets, and add ablation studies that isolate the contribution of the risk estimator to both task performance and token reduction.
  Revision: yes
Circularity Check
No significant circularity; framework components defined independently
The paper presents AgentRevive as a Markov state-aware framework with two components: State-Aware Policy Learning (using a risk estimator for hallucination assessment to manage Active/Standby/Terminated transitions) and State-Aware Edge Optimization (pruning edges based on learned states). No equations, derivations, or self-referential definitions are present that reduce performance claims or token reductions to fitted inputs by construction. The risk estimator and state transitions are described as independent mechanisms, with experimental validation on reasoning and hallucination tasks reported separately. No self-citation chains, uniqueness theorems, or ansatz smuggling are invoked in the abstract or framework description to load-bear the central claims.
Axiom & Free-Parameter Ledger
invented entities (1)
- Zombie agents: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Agent states are divided into "Active", "Standby", and "Terminated" states... risk estimator to optimize agent state transitions by assessing hallucination risk
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: $f^{(t)}_{\mathrm{risk}} = -\mathbb{E}\left[ D_{\mathrm{KL}}\left( M^{(t)}_{v} \,\|\, \bar{M}^{(t)}_{V_A} \right) \right]$ ... nuclear norm on adjacency matrices
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.