TerraBench is a new benchmark with 403 tasks across Earth-science domains that evaluates LLM agents on coordinating heterogeneous data using executable ReAct-style workflows and process-level metrics.
Hvr-met: A hypothesis-verification-replanning agentic system for extreme weather diagnosis. arXiv preprint arXiv:2603.01121, 2026.
1 Pith paper cites this work.
TerraBench: Can Agents Reason Over Heterogeneous Earth-System Data?