pith. machine review for the scientific record.
Pith Number

pith:FGMIIDEU

pith:2026:FGMIIDEUJ2UNHPZFZEFFYXBVTJ
not attested · not anchored · not stored · refs resolved

GeoVista: Visually Grounded Active Perception for Ultra-High-Resolution Remote Sensing Understanding

Bo Yang, Haoran Liu, Jiasen Hu, Jiashun Zhu, Lang Sun, Nachuan Xing, Ronghao Fu, Weijie Zhang, Weipeng Zhang, Xiao Yang, Xu Na, Zhiheng Xue, Zhiwen Lin

GeoVista builds a global exploration plan, then performs branch-wise inspections while tracking evidence, in order to interpret ultra-high-resolution remote sensing images.

arxiv:2605.14475 v1 · 2026-05-14 · cs.CV

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open

Claims

C1 strongest claim

Experiments on RSHR-Bench, XLRS-Bench, and LRS-VQA show that GeoVista achieves state-of-the-art performance.

C2 weakest assumption

The approach assumes that building a global exploration plan, followed by branch-wise local inspection with explicit evidence-state maintenance, will reliably handle sparse, tiny evidence across large scenes without losing context or duplicating inspections. This in turn depends on the effectiveness of the APEX-GRO trajectory corpus and GRPO alignment.

C3 one line summary

GeoVista introduces a planning-driven active perception framework with global exploration plans, branch-wise local inspection, and explicit evidence tracking to achieve state-of-the-art results on ultra-high-resolution remote sensing benchmarks.

References

94 extracted · 94 resolved · 15 Pith anchors

[1] Towards large-scale small object detection: Survey and benchmarks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(11):13467–13488, 2023
[2] Star: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery. IEEE Trans 2025
[3] When large vision-language model meets large remote sensing imagery: Coarse-to-fine text-guided token pruning. ArXiv, abs/2503.07588, 2025
[4] Geoeyes: On-demand visual focusing for evidence-grounded understanding of ultra-high-resolution remote sensing imagery 2026
[5] GeoLLaVA-8K: Scaling remote-sensing multimodal large language models to 8K resolution 2025

Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-17T23:39:06.613503Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

2998840c944ea8d3bf25c90a5c5c359a4e35b4ba42001f216cd9a39f9611bedd
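In this record, the Pith Number appears to be a Base32 rendering of the canonical hash: encoding the digest bytes with the standard RFC 4648 alphabet and keeping the first 26 characters reproduces the long identifier, and the first 8 characters reproduce the short one. The sketch below illustrates that observation. The function name `pith_id_from_hash` and the truncation lengths are assumptions inferred from this single record, not a documented rule of the schema.

```python
import base64


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Hypothetical helper: the 26-character length is inferred from this
    record only and is not taken from any published specification.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


# Canonical hash copied from this record.
canonical_hash = "2998840c944ea8d3bf25c90a5c5c359a4e35b4ba42001f216cd9a39f9611bedd"

full_id = pith_id_from_hash(canonical_hash)
short_id = full_id[:8]  # assumed short form: first 8 characters

print(full_id)   # FGMIIDEUJ2UNHPZFZEFFYXBVTJ
print(short_id)  # FGMIIDEU
```

If this relationship holds in general, the identifier could be checked against the canonical hash offline. That remains an assumption until the schema documentation confirms it.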

Aliases

arxiv: 2605.14475 · arxiv_version: 2605.14475v1 · doi: 10.48550/arxiv.2605.14475
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FGMIIDEUJ2UNHPZFZEFFYXBVTJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2998840c944ea8d3bf25c90a5c5c359a4e35b4ba42001f216cd9a39f9611bedd
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f424f81b6896725b2cd07e3f3c675dc0ac60ae47379e05b40bb3e201ed41c12b",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T07:15:46Z",
    "title_canon_sha256": "5f29947e84643c048af1daf5e431c2efd97953599b092800f79034dc76140eb5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14475",
    "kind": "arxiv",
    "version": 1
  }
}
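The hashing step of the verification one-liner above can also be run offline against the canonical record shown here. The following is a minimal sketch that uses the same serialization settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`). The function name `canonical_sha256` is illustrative, and the sketch prints the comparison with the published hash rather than assuming its result.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    blob = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f424f81b6896725b2cd07e3f3c675dc0ac60ae47379e05b40bb3e201ed41c12b",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T07:15:46Z",
        "title_canon_sha256": "5f29947e84643c048af1daf5e431c2efd97953599b092800f79034dc76140eb5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14475", "kind": "arxiv", "version": 1},
}

# Hash published on this page under "Canonical hash".
expected = "2998840c944ea8d3bf25c90a5c5c359a4e35b4ba42001f216cd9a39f9611bedd"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == expected)
```

Because the keys are sorted before hashing, the digest does not depend on the order in which fields appear in the fetched JSON. Any change to a field value, however small, yields a different digest.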