Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents
Pith reviewed 2026-05-08 15:42 UTC · model grok-4.3
The pith
A lightweight meta-planner separates planning from execution in Earth observation agents by grounding decisions in images, task semantics, and remote-sensing expertise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Lightweight Multimodal Meta-Planner (LMMP) decouples strategic planning from low-level execution, grounds plans through dual awareness of image features and task semantics, and injects domain logic via a Meta Task Library so that generated plans remain physically feasible. The planner is first initialized by supervised fine-tuning on expert trajectories and then aligned by Direct Preference Optimization on execution outcomes, yielding measurable gains in tool-calling accuracy and mission completion across diverse backbones and unseen Earth-observation tasks.
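The two-stage alignment in this claim (SFT initialization, then DPO on execution outcomes) can be pictured with a toy DPO objective. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation; the function name and the beta value are hypothetical.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Here the 'chosen' plan would be one whose execution succeeded and the
    'rejected' plan one that failed; log-probs are summed token
    log-likelihoods under the planner and a frozen SFT reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# The loss shrinks as the planner prefers the successful plan more
# strongly than the reference model does.
easy = dpo_loss(-10.0, -30.0, -20.0, -20.0)  # planner already prefers chosen
hard = dpo_loss(-30.0, -10.0, -20.0, -20.0)  # planner prefers rejected
assert easy < hard
```

When the policy and reference agree exactly, the margin is zero and the loss reduces to log 2, which is the usual sanity check for this objective.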
What carries the argument
The Meta Task Library, which injects remote-sensing expert knowledge to standardize domain logic and produce physically feasible plans.
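One concrete reading of this premise is a registry of tool-sequence templates guarded by feasibility predicates. The sketch below is entirely hypothetical — the task name, tool names, and rules are invented for illustration and are not the paper's actual library.

```python
# Hypothetical Meta Task Library: each meta task pairs a tool template
# with domain-rule predicates that any candidate plan must satisfy.
META_TASK_LIBRARY = {
    "flood_extent_mapping": {
        "tools": ["load_sar_image", "water_segmentation", "change_detection"],
        "constraints": [
            # SAR is required because optical sensors fail under cloud cover.
            lambda plan: "load_sar_image" in plan,
            # Water must be segmented before change detection can run.
            lambda plan: plan.index("water_segmentation")
                         < plan.index("change_detection"),
        ],
    },
}

def feasible(task_name, plan):
    """Return True if a candidate tool plan satisfies the task's domain rules."""
    entry = META_TASK_LIBRARY[task_name]
    return all(check(plan) for check in entry["constraints"])

assert feasible("flood_extent_mapping",
                ["load_sar_image", "water_segmentation", "change_detection"])
assert not feasible("flood_extent_mapping",
                    ["load_sar_image", "change_detection", "water_segmentation"])
```

Under this reading, "physically feasible" means a plan is rejected at generation time if any predicate fails, rather than being corrected after execution.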
If this is right
- Tool-calling accuracy and overall task success rates rise on EarthBench- and ThinkGeo-derived datasets.
- The same planner module improves performance when attached to multiple different executor backbones.
- Gains persist on Earth-observation missions that were not seen during training.
- The two-stage training pipeline first distills expert plans, then refines them from execution feedback.
Where Pith is reading between the lines
- The clean separation of planning from execution may limit error accumulation across long action sequences in other robotic or autonomous systems.
- Adding real-time sensor feedback loops into the Meta Task Library could further tighten the link between perception and feasible action.
- Deployment on physical platforms such as drones or satellites would expose whether the expert-injected plans remain robust under unmodeled environmental noise.
Load-bearing premise
The Meta Task Library successfully injects remote-sensing expert knowledge to standardize domain logic and guarantee physically feasible plans, and the two-stage training pipeline generalizes beyond the specific datasets used.
What would settle it
A controlled test on a fresh collection of Earth-observation missions in which LMMP produces no measurable increase in tool-calling accuracy or task success compared with an integrated single-model baseline would falsify the central claim.
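The proposed falsification test amounts to a paired comparison of per-mission metrics. A minimal sketch, with invented numbers standing in for real results:

```python
def tool_call_accuracy(predicted, reference):
    """Fraction of plan steps whose predicted tool call matches the reference."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)

# Toy per-mission accuracies for LMMP vs. an integrated single-model
# baseline on the same missions (hypothetical values).
lmmp_acc = [0.9, 0.8, 1.0, 0.7]
baseline_acc = [0.8, 0.8, 0.9, 0.6]
mean_gain = sum(a - b for a, b in zip(lmmp_acc, baseline_acc)) / len(lmmp_acc)
# The central claim fails if mean_gain is statistically indistinguishable
# from zero on a fresh collection of Earth-observation missions.
```

The pairing matters: comparing the two systems on the same missions removes per-mission difficulty as a confound.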
Original abstract
Autonomous Earth Observation (EO) agents are transitioning from passive perception to complex, multi-step task execution. However, current architectures that integrate planning and execution within a single model often struggle with combinatorial complexity and reasoning errors in dynamic EO scenarios. To resolve these challenges, we propose the Lightweight Multimodal Meta-Planner (LMMP) framework. LMMP incorporates a dual-awareness mechanism that grounds strategic plans in both multimodal image features and high-level task semantics. Crucially, we introduce a Meta Task Library to inject remote sensing expert knowledge directly into the workflow, which standardizes domain logic and ensures plans are physically feasible. We further implement a two-stage training pipeline, initializing the Meta-Planner via expert-distilled Supervised Fine-Tuning and refining it through Direct Preference Optimization based on execution feedback. Extensive experiments on a dataset derived from EarthBench and ThinkGeo demonstrate that LMMP significantly improves tool-calling accuracy and task success rates. Moreover, the framework exhibits strong "plug-and-play" versatility, consistently enhancing the performance of diverse executor backbones across previously unseen EO missions.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Lightweight Multimodal Meta-Planner (LMMP) framework for autonomous Earth Observation agents. It introduces a dual-awareness mechanism to ground strategic plans in multimodal image features and high-level task semantics, a Meta Task Library that directly injects remote sensing expert knowledge to standardize domain logic and guarantee physically feasible plans, and a two-stage training pipeline (expert-distilled supervised fine-tuning followed by execution-feedback direct preference optimization). The central claims are that LMMP yields significant gains in tool-calling accuracy and task success rates on datasets derived from EarthBench and ThinkGeo, while exhibiting plug-and-play versatility that improves diverse executor backbones on previously unseen EO missions.
Significance. If the experimental results and feasibility guarantees are substantiated, the separation of a lightweight meta-planner from execution, combined with explicit domain-knowledge injection and preference-based refinement, could offer a practical route to more robust multimodal planning in dynamic EO settings. The plug-and-play design would be particularly valuable for integrating new backbones without retraining the planner. At present, however, the absence of concrete validation for the Meta Task Library and the experimental protocol limits any assessment of broader impact.
Major comments (3)
- [Meta Task Library description] Framework description (Meta Task Library subsection): The central claim that the Meta Task Library 'standardizes domain logic and ensures plans are physically feasible' is load-bearing for the entire contribution, yet the manuscript supplies no construction details, explicit rule set, constraint checker, or reference to physical models (e.g., orbital mechanics or sensor coverage). Without these elements it is impossible to determine whether feasibility is enforced at generation time or merely approximated post-hoc by the DPO stage.
- [Experiments and evaluation] Experiments and evaluation section: The abstract asserts 'significant improvements' in tool-calling accuracy and task success rates together with generalization to unseen missions, but reports neither baselines, concrete metrics, statistical tests, dataset construction procedure, nor ablation isolating the Meta Task Library's contribution. This evidentiary gap directly undermines verification of the strongest empirical claims.
- [Two-stage training pipeline] Training pipeline description: The two-stage pipeline is presented as enabling generalization beyond the training distribution, yet no cross-validation, out-of-distribution mission splits, or failure-case analysis is described that would distinguish library-driven feasibility from dataset-specific artifacts or backbone improvements.
Minor comments (1)
- [Framework overview] The dual-awareness mechanism would benefit from an explicit diagram or pseudocode showing how multimodal features and task semantics are fused before plan generation.
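In lieu of the requested pseudocode, one plausible fusion scheme (invented here, not taken from the paper) mean-pools image patch features and concatenates the result with the task-semantics embedding before plan generation:

```python
def fuse_dual_awareness(image_feats, task_embedding, w_img=0.5, w_task=0.5):
    """Toy dual-awareness fusion: mean-pool patch features over the image,
    then concatenate with the task embedding so the planner conditions on
    both visual content and task semantics. Weights are illustrative."""
    dim = len(image_feats[0])
    pooled = [sum(patch[i] for patch in image_feats) / len(image_feats)
              for i in range(dim)]
    return [w_img * v for v in pooled] + [w_task * v for v in task_embedding]

patches = [[1.0, 2.0], [3.0, 4.0]]  # two image patch feature vectors
task = [0.5, -0.5]                  # e.g. an embedding of "map flood extent"
fused = fuse_dual_awareness(patches, task)
assert len(fused) == 4              # pooled image dims + task dims
```

A real implementation would more likely use learned projections and cross-attention, but the concatenation sketch makes the "dual" in dual awareness explicit.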
Simulated Author's Rebuttal
We thank the referee for the thorough and constructive review of our manuscript on the LMMP framework. The comments have helped us identify areas where additional clarity and detail are needed to strengthen the presentation. We address each major comment below and have made corresponding revisions to the manuscript.
Point-by-point responses
Referee: [Meta Task Library description] Framework description (Meta Task Library subsection): The central claim that the Meta Task Library 'standardizes domain logic and ensures plans are physically feasible' is load-bearing for the entire contribution, yet the manuscript supplies no construction details, explicit rule set, constraint checker, or reference to physical models (e.g., orbital mechanics or sensor coverage). Without these elements it is impossible to determine whether feasibility is enforced at generation time or merely approximated post-hoc by the DPO stage.
Authors: We acknowledge that the original manuscript did not provide sufficient construction details for the Meta Task Library. In the revised version, we have substantially expanded the relevant subsection to describe the library's construction from remote sensing expert knowledge, including the explicit rule set for standardizing domain logic, the constraint checker implementation, and references to physical models such as orbital mechanics and sensor coverage constraints. These elements are integrated to enforce physical feasibility directly at plan generation time within the meta-planner, prior to any refinement in the DPO stage. We have added pseudocode, examples, and a diagram to illustrate the process. revision: yes
Referee: [Experiments and evaluation] Experiments and evaluation section: The abstract asserts 'significant improvements' in tool-calling accuracy and task success rates together with generalization to unseen missions, but reports neither baselines, concrete metrics, statistical tests, dataset construction procedure, nor ablation isolating the Meta Task Library's contribution. This evidentiary gap directly undermines verification of the strongest empirical claims.
Authors: We agree that the experimental reporting in the initial submission required greater explicitness to allow full verification of the claims. The revised manuscript expands the Experiments and evaluation section to clearly list all baselines, report concrete metrics and statistical test results, provide a detailed account of the dataset construction procedure derived from EarthBench and ThinkGeo, and include an ablation study that isolates the Meta Task Library's specific contribution to the observed gains in tool-calling accuracy, task success rates, and generalization performance. revision: yes
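One standard way to back the promised "statistical test results" is a percentile bootstrap over per-mission gains. The sketch below is a generic illustration with made-up numbers, not the authors' protocol.

```python
import random

def bootstrap_gain_ci(gains, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-mission gain of LMMP over
    a baseline; a CI that excludes zero supports 'significant improvement'."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(gains) for _ in gains) / len(gains)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-mission gains in task success rate.
gains = [0.10, 0.05, 0.12, 0.08, 0.02, 0.07, 0.09, 0.04]
lo, hi = bootstrap_gain_ci(gains)
```

With only eight missions the interval would be wide; the point of reporting it is that readers can see how wide.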
Referee: [Two-stage training pipeline] Training pipeline description: The two-stage pipeline is presented as enabling generalization beyond the training distribution, yet no cross-validation, out-of-distribution mission splits, or failure-case analysis is described that would distinguish library-driven feasibility from dataset-specific artifacts or backbone improvements.
Authors: We appreciate the referee's point regarding the need for stronger evidence of generalization. In the revised manuscript, we have augmented the Training pipeline description with details on the cross-validation procedure, the explicit out-of-distribution mission splits drawn from the ThinkGeo dataset for testing on previously unseen EO missions, and a failure-case analysis. This analysis supports that the feasibility guarantees and performance improvements arise primarily from the Meta Task Library and dual-awareness mechanism rather than dataset artifacts or backbone-specific effects. revision: yes
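The out-of-distribution splits described in this response can be made concrete by holding out entire mission types rather than individual samples. A minimal sketch with hypothetical mission records:

```python
def ood_split(missions, held_out_types):
    """Split missions so whole mission types are unseen during training,
    separating genuine generalization from per-type memorization."""
    train = [m for m in missions if m["type"] not in held_out_types]
    test = [m for m in missions if m["type"] in held_out_types]
    return train, test

# Hypothetical EO mission records (ids and types invented for illustration).
missions = [
    {"id": 1, "type": "flood_mapping"},
    {"id": 2, "type": "ship_detection"},
    {"id": 3, "type": "flood_mapping"},
    {"id": 4, "type": "crop_classification"},
]
train, test = ood_split(missions, {"crop_classification"})
assert all(m["type"] != "crop_classification" for m in train)
```

A sample-level random split would leak mission-type regularities into training and overstate generalization; the type-level hold-out avoids that.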
Circularity Check
No circularity: framework and results are independently constructed and empirically validated
Full rationale
The paper introduces a new LMMP architecture, Meta Task Library, and two-stage training pipeline as explicit contributions, then validates them via experiments on derived external datasets (EarthBench/ThinkGeo). No equations, parameters, or predictions reduce by construction to the inputs; the Meta Task Library is presented as an injected knowledge source rather than a self-defined output, and performance gains are measured against baselines rather than fitted to the same quantities. The derivation chain is self-contained against external benchmarks with no load-bearing self-citation or renaming of known results.
Axiom & Free-Parameter Ledger
Invented entities (1)
- Meta Task Library: no independent evidence