arxiv: 2605.10057 · v3 · pith:CZGDPYCFnew · submitted 2026-05-11 · 💻 cs.AI · cs.MA

STAR: Failure-Aware Markovian Routing for Multi-Agent Spatiotemporal Reasoning

Pith reviewed 2026-05-19 14:40 UTC · model grok-4.3

classification 💻 cs.AI cs.MA
keywords failure-aware routing · multi-agent systems · spatiotemporal reasoning · Markovian routing · recovery transitions · execution traces · LLM tool augmentation · agent routing matrix

The pith

STAR models inter-agent routing as a Markovian transition policy conditioned on typed failure states to learn specific recovery transitions from unsuccessful traces.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents STAR as a framework that externalizes routing decisions among heterogeneous specialist agents in spatiotemporal reasoning tasks. Instead of leaving recovery implicit in language generation, it uses a routing matrix that blends expert-defined nominal paths with transitions learned from both successful and failed executions. The matrix distinguishes failure categories such as malformed outputs, missing dependencies, and tool mismatches, so the system can respond differently rather than issuing generic retries. Retaining unsuccessful traces during training expands the policy's coverage of error states, which the authors show produces measurable gains on queries that deviate from expected routes. This approach is tested across three benchmarks and eight backbone models, with the largest benefits appearing precisely where nominal routing breaks.

Core claim

STAR externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At its center is an agent routing matrix that fuses expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states rather than collapsing them, the router can select different recoveries for malformed outputs, missing dependencies, and tool-query mismatches. Retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. This yields improvements over prior

What carries the argument

The agent routing matrix, a state-conditioned transition policy that mixes nominal routes with learned recoveries conditioned on typed failure categories.
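
The idea of a routing matrix keyed on (current agent, execution status, task type) can be sketched with a small lookup table. This is a minimal illustration, not the paper's implementation: the blending weight `lam`, the count-based estimate, the fallback to the nominal row when no traces exist, and the task name `distance_query` are all assumptions. The agent names follow the specialists listed in the abstract, and the status labels follow the Figure 6 caption.

```python
from collections import defaultdict


def normalize(row):
    """Scale a {next_agent: mass} row to sum to 1; return {} if it has no mass."""
    total = sum(row.values())
    return {k: v / total for k, v in row.items()} if total > 0 else {}


def build_routing_matrix(nominal, transitions, lam=0.5):
    """Blend expert nominal routes with transitions counted from traces.

    nominal:     {(agent, status, task): {next_agent: prob}}
    transitions: iterable of (agent, status, task, next_agent)
    lam:         hypothetical mixing weight on the nominal row
    """
    counts = defaultdict(lambda: defaultdict(float))
    for agent, status, task, nxt in transitions:
        counts[(agent, status, task)][nxt] += 1.0

    matrix = {}
    for state in set(nominal) | set(counts):
        prior = nominal.get(state, {})
        learned = normalize(counts[state]) if state in counts else {}
        if not learned:          # no trace evidence: keep the nominal route
            matrix[state] = dict(prior)
        elif not prior:          # no nominal route: use the learned recovery
            matrix[state] = learned
        else:                    # both exist: convex mix, then renormalize
            keys = set(prior) | set(learned)
            matrix[state] = normalize(
                {k: lam * prior.get(k, 0.0) + (1 - lam) * learned.get(k, 0.0)
                 for k in keys}
            )
    return matrix


def route(matrix, agent, status, task):
    """Pick the most probable next agent, or None if the state is unsupported."""
    row = matrix.get((agent, status, task))
    return max(row, key=row.get) if row else None


TASK = "distance_query"  # hypothetical task type
nominal = {("geometric", "SUCCESS", TASK): {"temporal": 1.0}}
transitions = (
    [("geometric", "FAIL", TASK, "topological")] * 3
    + [("geometric", "FAIL", TASK, "trajectory")]
)
M = build_routing_matrix(nominal, transitions)
print(route(M, "geometric", "SUCCESS", TASK))
print(route(M, "geometric", "FAIL", TASK))
print(route(M, "geometric", "INFO_MISSING", TASK))
```

In this toy setup the SUCCESS row comes only from the nominal route, the FAIL row comes only from traces, and the INFO_MISSING row has no support at all, which is the situation a recovery-aware training set is meant to reduce.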

If this is right

  • The routing policy acquires explicit support on error states, allowing recovery transitions absent from success-only training.
  • Improvements appear most clearly on queries whose execution deviates from the nominal routing path.
  • Typed failure-aware routing, rather than specialist composition alone, drives the observed gains across benchmarks.
  • The blackboard protocol for intermediate results supports downstream fusion once recovery transitions restore valid state.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same matrix structure could be applied to other multi-agent domains where execution paths have qualitatively different failure modes.
  • Explicit state tracking may reduce reliance on prompt-based recovery heuristics in tool-augmented LLM systems.
  • Retaining failure traces suggests a general training principle for policy learning in environments with sparse success signals.

Load-bearing premise

Failure states can be accurately and consistently typed into distinct categories such as malformed outputs or missing dependencies during execution, so the matrix can learn type-specific recoveries instead of treating all errors as one signal.

What would settle it

A controlled run on the same benchmarks where failure types are deliberately collapsed into a single generic error signal or where unsuccessful traces are discarded, showing that the reported gains on deviated queries disappear.
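
The first half of that control, collapsing typed failures into one generic signal, can be sketched on toy data. The status labels below are descriptive stand-ins for the three failure categories named in the abstract, the traces are invented for illustration, and the majority-vote policy is an assumption rather than the paper's training procedure.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the three failure categories named in the abstract.
FAILURE_TYPES = {"MALFORMED_OUTPUT", "MISSING_DEPENDENCY", "TOOL_MISMATCH"}


def collapse(status):
    """Map every typed failure onto one generic error signal."""
    return "ERROR" if status in FAILURE_TYPES else status


def fit_policy(transitions, status_map=lambda s: s):
    """Majority-vote next agent per (agent, status) state."""
    counts = defaultdict(Counter)
    for agent, status, nxt in transitions:
        counts[(agent, status_map(status))][nxt] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Invented toy transitions: each failure type has its own best recovery.
transitions = (
    [("geometric", "MALFORMED_OUTPUT", "geometric")] * 3     # retry same agent
    + [("geometric", "MISSING_DEPENDENCY", "temporal")] * 2  # fetch the missing input
    + [("geometric", "TOOL_MISMATCH", "topological")]        # switch specialist
)

typed = fit_policy(transitions)
untyped = fit_policy(transitions, status_map=collapse)

print(typed)
print(untyped)
```

With typed states the policy keeps three distinct recoveries; once collapsed, the single ERROR state can only carry the majority recovery, so the other two are lost. Whether this effect explains the benchmark gains is exactly what the proposed control would test.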

Figures

Figures reproduced from arXiv: 2605.10057 by Flora D. Salim, Hao Xue, Lihuan Li, Ruiyi Yang.

Figure 1: STAR architecture. Queries are parsed into a task profile; the failure-aware routing matrix selects specialists conditioned on the current agent, task type, and execution status; and specialists execute through an extract-compute-deposit protocol over a shared blackboard before final fusion.

Figure 2: State-conditioned routing matrix slices for a representative task type. Each panel visualizes…

Figure 3: Failure-aware routing and execution feedback in…

Figure 4: Routing example through the dual-system kernel. System 1 nominal routes (green arrows)…

Figure 5: Empirical precision–coverage trade-off of…

Figure 6: Learned transition matrix M averaged across task types, rendered as four status-conditional heatmaps (SUCCESS / FAIL / INFO_MISSING / BLOCKED). Rows index the from_agent, columns index the to_agent, and each cell is P(next | from,status). Each benchmark produces a qualitatively different SUCCESS map (because task taxonomies differ), but every benchmark exhibits the same structural property: error-state rows…

Figure 7: Top-3 successors per failure status, grouped by originating agent. For every…

Original abstract

Compositional spatiotemporal reasoning often requires a system to invoke multiple heterogeneous specialists, such as geometric, temporal, topological, and trajectory agents. A central question is how such a system should route among specialists when execution does not simply succeed or fail, but fails in qualitatively different ways. Existing tool-augmented and multi-agent LLM systems typically leave this routing decision implicit in language generation, making recovery ad hoc, difficult to interpret, and hard to optimize. This paper presents STAR (Spatio-Temporal Agent Router), a failure-aware routing framework that externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At the center of STAR is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool-query mismatches, rather than collapsing them into a generic retry signal. Specialists execute through a tool-grounded extract-compute-deposit protocol and write intermediate results to a shared blackboard for downstream fusion. Results prove that retaining unsuccessful traces during training enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent. Across three spatiotemporal benchmarks and eight backbone LLMs, STAR improves over multiple baselines with the clearest gains on queries whose execution deviates from the nominal routing path. Router-specific ablations and recovery analyses further show that typed failure-aware routing, rather than specialist composition alone, is a key factor for these improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces STAR, a failure-aware Markovian routing framework for multi-agent spatiotemporal reasoning. It externalizes inter-agent control as a state-conditioned transition policy over the current agent, task type, and typed execution status. At its core is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces (including unsuccessful ones). The central claim is that conditioning on distinct failure types (malformed outputs, missing dependencies, tool-query mismatches) allows type-specific recoveries that enlarge policy support on error states, unlike success-only training or generic signals. Empirical results across three spatiotemporal benchmarks and eight backbone LLMs show improvements over baselines, with clearest gains on queries whose execution deviates from the nominal path.

Significance. If the empirical claims and the role of typed failure-aware routing hold after proper controls, this would be a meaningful contribution to multi-agent LLM systems. It provides an explicit, optimizable mechanism for handling qualitatively different failures rather than ad-hoc language-based recovery, and the use of unsuccessful traces to learn recovery transitions is a potentially useful idea for enlarging policy support on error states.

major comments (2)
  1. [Abstract / Results] Abstract and Results: the claim that retaining unsuccessful traces 'enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent' is load-bearing for the central contribution, yet the abstract provides no details on experimental setup, statistical significance, controls, or how success-only baselines were constructed. Without these, it is impossible to verify whether the reported gains on deviated paths are attributable to the typed recovery transitions rather than other factors.
  2. [Routing matrix / failure typing] Routing matrix description (central mechanism): the framework assumes failure states can be accurately and consistently typed into distinct categories during trace collection so that the matrix can learn type-specific recoveries. The paper should provide evidence (e.g., typing accuracy, inter-rater agreement, or an ablation on untyped vs. typed failures) because systematic misclassification would cause the learned transitions to collapse, rendering the gains on deviated paths indistinguishable from a generic-retry or success-only baseline.
minor comments (2)
  1. [Abstract] The abstract is dense; splitting the description of the routing matrix from the empirical claims would improve readability.
  2. [Method] Notation for the transition policy and routing matrix should be introduced with a clear equation or diagram early in the method section to avoid ambiguity when discussing nominal vs. recovery transitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. The comments highlight important aspects of clarity and validation that will strengthen the manuscript. We address each major comment below and commit to revisions that incorporate the requested details and evidence.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results: the claim that retaining unsuccessful traces 'enlarges the support of the routing policy on error states, enabling recovery transitions that success-only training cannot represent' is load-bearing for the central contribution, yet the abstract provides no details on experimental setup, statistical significance, controls, or how success-only baselines were constructed. Without these, it is impossible to verify whether the reported gains on deviated paths are attributable to the typed recovery transitions rather than other factors.

    Authors: We agree that the abstract and results presentation would benefit from greater specificity to support the central claim. In the revised manuscript we will expand the abstract to summarize the experimental setup (three spatiotemporal benchmarks, eight backbone LLMs, multiple independent runs), note that statistical significance was evaluated with paired tests across runs, and briefly describe the success-only baseline construction (routing matrix trained exclusively on successful traces). The results section will be augmented with an explicit subsection on controls, including direct comparison to a generic-retry baseline and reporting of standard deviations and p-values. These additions will make the attribution of gains on deviated paths to typed recovery transitions explicit and verifiable. revision: yes

  2. Referee: [Routing matrix / failure typing] Routing matrix description (central mechanism): the framework assumes failure states can be accurately and consistently typed into distinct categories during trace collection so that the matrix can learn type-specific recoveries. The paper should provide evidence (e.g., typing accuracy, inter-rater agreement, or an ablation on untyped vs. typed failures) because systematic misclassification would cause the learned transitions to collapse, rendering the gains on deviated paths indistinguishable from a generic-retry or success-only baseline.

    Authors: We accept that explicit validation of the failure-typing process is required. The manuscript currently defines three failure categories from execution traces (malformed outputs, missing dependencies, tool-query mismatches) via a combination of deterministic rules and LLM-assisted labeling. In revision we will add (1) an ablation that trains and evaluates an untyped variant in which all failure states share a single recovery transition, and (2) quantitative evidence of typing reliability: accuracy on a held-out manually annotated subset together with inter-annotator agreement (Cohen’s kappa). The ablation will directly test whether type-specific transitions provide benefit beyond a collapsed generic-retry policy; if misclassification were dominant, the typed and untyped curves would be statistically indistinguishable. revision: yes
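
The inter-annotator agreement statistic mentioned in the second response, Cohen's kappa, is the observed agreement corrected for the agreement expected by chance from each annotator's label frequencies. A minimal sketch follows; the label names and the two annotation lists are invented for illustration and are not data from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()
    ) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented annotations over six failure traces (hypothetical label names).
annotator_1 = ["MALFORMED", "MALFORMED", "MISSING", "MISSING", "MISMATCH", "MISMATCH"]
annotator_2 = ["MALFORMED", "MALFORMED", "MISSING", "MISMATCH", "MISMATCH", "MISMATCH"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on five of six items (observed 5/6) and chance agreement is 1/3, so kappa is 0.75. A low value on real traces would suggest the failure categories are not being typed consistently.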

Circularity Check

0 steps flagged

No significant circularity; derivation is empirical and self-contained

full rationale

The paper presents STAR as an empirical framework that learns recovery transitions from execution traces including failures and evaluates improvements on separate spatiotemporal benchmarks against baselines. The abstract describes the routing matrix as combining expert nominal routes with trace-learned recoveries conditioned on typed failures, but this is a modeling choice whose performance gains are measured externally rather than defined into existence. No equations, self-citations, or fitted quantities are shown reducing the reported results to the inputs by construction. The derivation chain therefore remains independent of the evaluation data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the ability to identify and categorize execution failures into distinct types and on the existence of execution traces that capture both nominal and error states for training the recovery transitions.

axioms (1)
  • Domain assumption: Specialists execute through a tool-grounded extract-compute-deposit protocol and write results to a shared blackboard.
    Invoked as the execution mechanism for heterogeneous agents in the routing framework.
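
One way to picture an extract-compute-deposit step over a shared blackboard is sketched below. The class names, the field names, and the rule that a missing input yields INFO_MISSING while a tool exception yields FAIL are all assumptions made for illustration; the excerpt names the protocol and the status labels but does not specify how they connect.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Blackboard:
    """Shared key-value store for intermediate results (hypothetical design)."""
    entries: Dict[str, Any] = field(default_factory=dict)

    def deposit(self, key: str, value: Any) -> None:
        self.entries[key] = value

    def read(self, key: str) -> Any:
        return self.entries.get(key)


@dataclass
class Specialist:
    name: str
    requires: List[str]          # blackboard keys this agent needs
    tool: Callable[..., Any]     # grounded tool that does the computation
    output_key: str              # where the result is deposited

    def run(self, board: Blackboard) -> str:
        # Extract: pull required inputs from the blackboard.
        args = {key: board.read(key) for key in self.requires}
        if any(value is None for value in args.values()):
            return "INFO_MISSING"   # assumed mapping for a missing dependency
        # Compute: call the tool; any exception is reported as a failure.
        try:
            result = self.tool(**args)
        except Exception:
            return "FAIL"
        # Deposit: write the result back for downstream agents.
        board.deposit(self.output_key, result)
        return "SUCCESS"


board = Blackboard()
distance = Specialist(
    name="geometric",
    requires=["point_a", "point_b"],
    tool=lambda point_a, point_b: math.dist(point_a, point_b),
    output_key="distance",
)

first = distance.run(board)            # inputs not yet on the blackboard
board.deposit("point_a", (0.0, 0.0))
board.deposit("point_b", (3.0, 4.0))
second = distance.run(board)
print(first, second, board.read("distance"))
```

The returned status string is what a router would condition on when choosing the next agent.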



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    At the center of STAR is an agent routing matrix that combines expert-specified nominal routes with recovery transitions learned from execution traces. Because the matrix conditions on distinct failure states, the router can respond differently to malformed outputs, missing dependencies, and tool–query mismatches

  • IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery and orbit embedding unclear

    Relation between the paper passage and the cited Recognition theorem.

    Theorem 1 (Recovery Reachability Dominance). Let Mα be the transition matrix trained with w(r)=r+α(1−r) for α>0, and let M0 be the success-only matrix. ... supp Mα[a,s,t,·] ≥ supp M0[a,s,t,·]
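
The trace weighting in the quoted passage, w(r)=r+α(1−r), can be illustrated with a toy count-based fit. The sketch assumes that r is a per-trace success indicator in {0,1}, that the matrix is estimated from weighted transition counts, and that the "≥" between supports denotes set inclusion; none of this is confirmed by the excerpt, and the agent, status, and task names are hypothetical.

```python
from collections import defaultdict


def trace_weight(r, alpha):
    """w(r) = r + alpha * (1 - r): failed traces (r=0) get weight alpha."""
    return r + alpha * (1.0 - r)


def fit_support(transitions, alpha):
    """Return {state: set of next agents with positive weighted mass}.

    transitions: iterable of (agent, status, task, next_agent, r), where r is
    the (assumed binary) outcome of the trace the transition came from.
    """
    mass = defaultdict(lambda: defaultdict(float))
    for agent, status, task, nxt, r in transitions:
        mass[(agent, status, task)][nxt] += trace_weight(r, alpha)
    return {
        state: {nxt for nxt, m in row.items() if m > 0}
        for state, row in mass.items()
    }


T = "toy_task"  # hypothetical task type
transitions = [
    ("geometric", "FAIL", T, "topological", 1.0),      # from a successful trace
    ("geometric", "FAIL", T, "trajectory", 0.0),       # from a failed trace
    ("geometric", "INFO_MISSING", T, "temporal", 0.0), # from a failed trace
]

support_0 = fit_support(transitions, alpha=0.0)      # success-only training
support_a = fit_support(transitions, alpha=0.3)      # failures retained

dominates = all(support_0[s] <= support_a[s] for s in support_0)
print(support_0[("geometric", "FAIL", T)])
print(support_a[("geometric", "FAIL", T)])
print(dominates)
```

With α=0 the transitions seen only in failed traces carry zero mass, so the INFO_MISSING state has empty support; with any α>0 they become reachable, and every state's support under α contains its support under success-only training.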

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
