NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration
Pith reviewed 2026-05-19 08:08 UTC · model grok-4.3
The pith
NaviAgent uses bilevel planning on a tool navigation graph to orchestrate thousands of interdependent tools without error buildup.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that modeling the tool set as a navigation graph and maintaining a continuously evolving Tool World Navigation Model that encodes structural and behavioral relations among tools allows the agent to generate scalable invocation sequences. At the planning level the model decides among direct answers, clarification, toolchain use, or output execution; at the execution level the navigation model guides concrete calls. Experiments show this architecture attains the highest success rates across models and tasks, with the navigation model adding gains of up to 17 points on complex tasks.
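The planning-level choice described above can be sketched as a small dispatch loop. This is a minimal illustration under stated assumptions, not the paper's implementation: `PlanAction`, `Decision`, `run_turn`, and the toy helpers are hypothetical names, and treating "output execution" as returning text built from already-available tool results is an interpretation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class PlanAction(Enum):
    # Hypothetical identifiers for the four planning-level outcomes.
    DIRECT_ANSWER = "direct_answer"
    CLARIFY = "clarify"
    INVOKE_TOOLCHAIN = "invoke_toolchain"
    EXECUTE_OUTPUT = "execute_output"


@dataclass
class Decision:
    action: PlanAction
    payload: str  # answer text, clarifying question, goal, or prior tool output


def run_turn(
    query: str,
    decide: Callable[[str], Decision],
    navigate: Callable[[str], List[str]],
    call_tool: Callable[[str], str],
) -> str:
    """One turn of a bilevel loop: plan first, then delegate to the execution level."""
    decision = decide(query)
    if decision.action is PlanAction.INVOKE_TOOLCHAIN:
        # Execution level: the navigator proposes the whole chain up front.
        chain = navigate(decision.payload)
        return "; ".join(call_tool(name) for name in chain)
    # The other three actions need no new tool calls in this sketch:
    # the payload already holds the text to return.
    return decision.payload


# Toy stand-ins for the LLM planner, the navigation model, and tool calls.
def toy_decide(query: str) -> Decision:
    if "weather" in query.lower():
        return Decision(PlanAction.INVOKE_TOOLCHAIN, "weather")
    return Decision(PlanAction.DIRECT_ANSWER, "Yes.")


def toy_navigate(goal: str) -> List[str]:
    return ["geocode", "forecast"]


def toy_call_tool(name: str) -> str:
    return f"{name}:ok"
```

The point of the split is that `decide` never has to reason about individual tool dependencies; it only picks a branch, and `navigate` owns the sequencing.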
What carries the argument
The Tool World Navigation Model (TWNM), a continuously updated graph encoding that captures how tools relate to one another structurally and behaviorally so the agent can plan sequences without stepping through calls one at a time.
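One way to picture planning over such a graph is a shortest-path search in which each edge carries an estimated success probability. The sketch below is an assumption made for illustration: the paper's actual graph encoding and search strategy are not given in this summary, and `ToolGraph`, `most_reliable_chain`, and the toy tool names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical adjacency map: graph[a][b] is an estimated probability that
# calling tool b right after tool a succeeds.
ToolGraph = Dict[str, Dict[str, float]]


def most_reliable_chain(
    graph: ToolGraph, start: str, goal: str
) -> Optional[Tuple[List[str], float]]:
    """Dijkstra over -log(p) costs, so the cheapest path maximizes the
    product of edge probabilities along the chain."""
    best = {start: 0.0}
    parent: Dict[str, str] = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        if node == goal:
            path = [goal]
            while path[-1] != start:
                path.append(parent[path[-1]])
            path.reverse()
            return path, math.exp(-cost)
        for nxt, p in graph.get(node, {}).items():
            if not 0.0 < p <= 1.0:
                continue  # ignore edges without a usable probability
            new_cost = cost - math.log(p)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(frontier, (new_cost, nxt))
    return None  # goal not reachable from start


# Toy graph with made-up weights, for illustration only.
toy_graph: ToolGraph = {
    "geocode": {"forecast": 0.9, "air_quality": 0.5},
    "air_quality": {"forecast": 0.5},
    "forecast": {"summarize": 0.8},
}
```

Because the whole chain is chosen before any call is made, a weak link is visible at planning time instead of surfacing as an error several calls in.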
If this is right
- NaviAgent attains the highest task success rates across different language models and task difficulties.
- Complex multi-tool workflows show measurable gains once the navigation model is active.
- Closed-loop updates from real executions improve both planning and execution over time.
- Agent behavior shifts from isolated tool calls to adaptive navigation of an entire tool ecosystem.
Where Pith is reading between the lines
- The same bilevel split could be applied to other settings where many components must be composed, such as library selection in code generation or service chaining in cloud workflows.
- Maintaining an explicit relation graph may reduce the cognitive load placed on the language model itself during long-horizon planning.
- If the model can be kept accurate at scale, the approach suggests a route toward agents that treat tool use as graph search rather than sequential guessing.
Load-bearing premise
Feedback from actual tool runs can keep updating the navigation model so that it continues to represent relations among thousands of tools accurately and without adding new sources of error or hitting scaling limits.
What would settle it
A controlled test that increases the tool count from hundreds to several thousand while tracking whether success rates stay above step-by-step baselines or begin to fall once the navigation model has received the same volume of real feedback.
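The bookkeeping for such a test is simple enough to sketch. The helpers below are hypothetical (`success_gap_by_scale`, `first_breakdown`), and the numbers in the usage lines are placeholders chosen for arithmetic convenience, not measurements from the paper.

```python
from statistics import mean
from typing import Dict, List, Optional


def success_gap_by_scale(
    navigator: Dict[int, List[float]],
    baseline: Dict[int, List[float]],
) -> Dict[int, float]:
    """Mean success-rate gap (navigator minus baseline) for each tool
    count that appears in both result sets."""
    return {
        n: mean(navigator[n]) - mean(baseline[n])
        for n in sorted(navigator.keys() & baseline.keys())
    }


def first_breakdown(gaps: Dict[int, float], margin: float = 0.0) -> Optional[int]:
    """Smallest tool count at which the gap is no longer above `margin`,
    or None if the navigator stays ahead at every scale tested."""
    for n in sorted(gaps):
        if gaps[n] <= margin:
            return n
    return None


# Placeholder per-seed success rates keyed by tool count (NOT real results).
placeholder_navigator = {100: [0.75, 0.75], 1000: [0.5, 0.75], 5000: [0.25, 0.25]}
placeholder_baseline = {100: [0.5, 0.5], 1000: [0.5, 0.5], 5000: [0.5, 0.5]}
placeholder_gaps = success_gap_by_scale(placeholder_navigator, placeholder_baseline)
```

A `None` from `first_breakdown` over the full range of tool counts would support the premise; a finite value marks the scale at which it stops holding.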
Original abstract
Large language models (LLMs) have recently demonstrated the ability to act as function call agents by invoking external tools, enabling them to solve tasks beyond their static knowledge. However, existing agents typically call tools step by step at a time without a global view of task structure. As tools depend on each other, this leads to error accumulation and limited scalability, particularly when scaling to thousands of tools. To address these limitations, we propose NaviAgent, a novel bilevel architecture that decouples task planning from tool execution through graph-based modeling of the tool ecosystem. At the task-planning level, the LLM-based agent decides whether to respond directly, clarify user intent, invoke a toolchain, or execute tool outputs, ensuring broad coverage of interaction scenarios independent of inter-tool complexity. At the execution level, a continuously evolving Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools, guiding the agent to generate scalable and robust invocation sequences. By incorporating feedback from real tool interactions, NaviAgent supports closed-loop optimization of planning and execution, moving beyond tool calling toward adaptive navigation of large-scale tool ecosystems. Experiments show that NaviAgent achieves the best task success rates across models and tasks, and integrating TWMN further boosts performance by up to 17 points on complex tasks, underscoring its key role in toolchain orchestration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes NaviAgent, a bilevel architecture for LLM tool agents that decouples high-level task planning (deciding to respond, clarify, invoke a toolchain, or execute outputs) from low-level execution. A continuously evolving Tool World Navigation Model (TWNM) encodes structural and behavioral relations among tools via real-interaction feedback to enable scalable, robust invocation sequences on large tool graphs. Experiments are claimed to show that NaviAgent achieves the best task success rates across models and tasks, with TWNM integration yielding up to 17-point gains on complex tasks.
Significance. If the reported performance improvements hold under rigorous evaluation, the bilevel graph-navigation approach could meaningfully advance scalable tool orchestration for LLM agents by mitigating error accumulation and providing closed-loop adaptation, addressing a recognized bottleneck when tool counts reach thousands.
Major comments (2)
- [Experiments] Experiments section: the central claim of 'best task success rates' and 'up to 17-point boosts' is load-bearing yet unsupported by any reported baselines, metrics, task definitions, number of runs, or statistical tests in the provided text, preventing verification that the data actually supports superiority over prior agents.
- [§3] §3 (TWNM description): the claim that feedback from real tool interactions allows the model to 'accurately encode structural and behavioral relations among thousands of interdependent tools without introducing new error sources' lacks a concrete update rule, graph construction algorithm, or scalability analysis, making the weakest assumption untestable from the manuscript.
Minor comments (2)
- [Abstract] Abstract, final sentence: 'TWMN' appears to be a typo for 'TWNM'.
- [§2] Notation: the distinction between 'toolchain' and 'tool invocation sequence' is used without a formal definition or diagram, which could be clarified for readers.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the potential of the bilevel graph-navigation approach to address scalability in large tool ecosystems. We address each major comment below and will perform a major revision to strengthen the manuscript.
Point-by-point responses
- Referee: [Experiments] Experiments section: the central claim of 'best task success rates' and 'up to 17-point boosts' is load-bearing yet unsupported by any reported baselines, metrics, task definitions, number of runs, or statistical tests in the provided text, preventing verification that the data actually supports superiority over prior agents.
  Authors: We agree that the Experiments section in the current manuscript lacks sufficient detail to allow full verification of the claims. In the revised version, we will expand this section to include: explicit descriptions of all baselines (ReAct, Plan-and-Execute, Toolformer, and other relevant agents), precise definitions of metrics (task success rate as primary, with secondary metrics such as average tool calls and error rate), task definitions and datasets (ToolBench, API-Bank, and our custom large-scale tool graph with 1000+ tools), number of runs (5 independent runs with different random seeds), and statistical analysis (paired t-tests with p-values and confidence intervals). We will also include tables reporting raw success rates with standard deviations to substantiate the gains of up to 17 points from TWNM integration.
  Revision: yes
- Referee: [§3] §3 (TWNM description): the claim that feedback from real tool interactions allows the model to 'accurately encode structural and behavioral relations among thousands of interdependent tools without introducing new error sources' lacks a concrete update rule, graph construction algorithm, or scalability analysis, making the weakest assumption untestable from the manuscript.
  Authors: We acknowledge that Section 3 would benefit from greater concreteness. In the revision, we will add: (1) the precise update rule for real-interaction feedback (an incremental edge-weight update formula based on observed success/failure and co-invocation frequency), (2) the graph construction algorithm (nodes as tools with feature vectors, directed edges initialized from API documentation and refined via execution traces using a thresholded dependency score), and (3) a scalability analysis (O(n log n) update complexity per interaction batch with empirical curves for tool counts from 100 to 5000, plus memory footprint measurements). These additions will make the claim that the closed-loop mechanism avoids new error sources directly testable.
  Revision: yes
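Since the rebuttal describes the update rule only in words, a stand-in may help show what an "incremental edge-weight update" based on success/failure and co-invocation frequency could look like. The exponential moving average below is an assumption chosen for simplicity, not the authors' formula; `EdgeStats`, `NavigationGraph`, the 0.5 prior, and the learning rate are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class EdgeStats:
    weight: float = 0.5  # assumed prior belief that the transition succeeds
    count: int = 0       # how often this tool pair was invoked back to back


@dataclass
class NavigationGraph:
    learning_rate: float = 0.2  # assumed step size for the moving average
    edges: Dict[Tuple[str, str], EdgeStats] = field(default_factory=dict)

    def record(self, src: str, dst: str, success: bool) -> float:
        """Move the edge weight toward 1.0 on success and toward 0.0 on
        failure, and bump the co-invocation counter."""
        stats = self.edges.setdefault((src, dst), EdgeStats())
        target = 1.0 if success else 0.0
        stats.weight += self.learning_rate * (target - stats.weight)
        stats.count += 1
        return stats.weight
```

Each update touches a single edge, so the cost per observed transition is constant in this sketch; whether that matches the O(n log n) per-batch figure promised in the rebuttal depends on details the manuscript has yet to supply.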
Circularity Check
No significant circularity; architecture and experiments are self-contained
Full rationale
The paper proposes NaviAgent as a bilevel architecture decoupling task planning from tool execution via a graph-based Tool World Navigation Model (TWNM) that evolves from real-interaction feedback. Central claims rest on this design choice and reported experimental success rates (including up to 17-point gains on complex tasks), without any equations, fitted parameters renamed as predictions, self-citations invoked for uniqueness theorems, or ansatzes smuggled in. No derivation step reduces by construction to its own inputs; the work is an architectural proposal validated externally by experiments rather than a closed mathematical reduction.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: LLMs can reliably decide among broad interaction scenarios (respond, clarify, invoke toolchain, execute outputs) independent of inter-tool complexity.
- Domain assumption: Structural and behavioral relations among tools can be encoded in a continuously evolving graph model that guides scalable invocation sequences.
Invented entities (1)
- Tool World Navigation Model (TWNM): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "NaviAgent features a bilevel planning architecture that integrates a Multi-Path Decider and a Graph-Encoded Navigator... constructs and navigates a Tool Dependency Heterogeneous Graph (TDHG)"
- IndisputableMonolith/Foundation/BranchSelection.lean, branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "The Graph-Encoded Navigator... hybrid loss... heuristic search strategy"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.