Pith Number

pith:OU5Q6JBC

pith:2026:OU5Q6JBCNNHHG6LMFKX3LAG45J
Status: not attested · not anchored · not stored · refs resolved

ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence

Baoqin Sun, Haiyang Shen, Peilun Jia, Sixiong Xie, Xiang Jing, Yun Ma, Zhuofan Shi

Treating source figures as verifiable evidence objects improves the quality and verifiability of multimodal deep research reports.

arxiv:2605.13034 v1 · 2026-05-13 · cs.CV · cs.IR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OU5Q6JBCNNHHG6LMFKX3LAG45J}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
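One way a merge can be made deterministic is to deduplicate events by identifier, sort them into a total order, and fold them into a state. The sketch below illustrates that idea only. The event fields (`id`, `ts`, `field`, `value`), the last-writer-wins rule, and the function name are all assumptions for illustration; the actual Pith bundle format and merge algorithm are not described on this page, and signature checking is omitted.

```python
from typing import Any, Iterable

# Hypothetical event shape: {"id": str, "ts": str, "field": str, "value": Any}.
# None of these field names come from the Pith bundle format.


def merge_events(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a state that does not depend on arrival order."""
    # Drop duplicate deliveries. Assumes an id always names identical content
    # (for example, an id derived from a hash of the event).
    unique = {e["id"]: e for e in events}
    # Impose a total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict[str, Any] = {}
    for e in ordered:
        state[e["field"]] = e["value"]  # later events overwrite earlier ones
    return state


# Illustrative events only, not real bundle data.
events = [
    {"id": "e1", "ts": "2026-01-01T00:00:00Z", "field": "status", "value": "a"},
    {"id": "e2", "ts": "2026-01-03T00:00:00Z", "field": "status", "value": "b"},
    {"id": "e3", "ts": "2026-01-02T00:00:00Z", "field": "note", "value": "c"},
]
state = merge_events(events)
```

Because the result depends only on the set of events and not on the order in which a mirror received them, two mirrors holding the same events would compute the same state under these assumptions.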

Claims

C1 · strongest claim

Experiments show that ViDR improves overall report quality, source-figure integration, and verifiability over strong commercial and open-source baselines.

C2 · weakest assumption

That context-aware filtering, outline-aware reranking, and VLM-based visual analysis can reliably turn noisy web images into accurate, non-hallucinated evidence atoms without introducing new errors that affect report claims.

C3 · one line summary

ViDR treats source figures as retrievable and verifiable evidence objects in multimodal deep research reports and introduces MMR Bench+ to measure improvements in visual integration and verifiability.

References

36 extracted · 36 resolved · 7 Pith anchors


Formal links

2 machine-checked theorem links

Receipt and verification
First computed: 2026-05-18T03:08:59.668216Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

753b0f24226b4e73796c2aafb580dcea5ec446ce6a16d81fdd4bc30892f79c64

Aliases

arxiv: 2605.13034 · arxiv_version: 2605.13034v1 · doi: 10.48550/arxiv.2605.13034 · pith_short_12: OU5Q6JBCNNHH · pith_short_16: OU5Q6JBCNNHHG6LM · pith_short_8: OU5Q6JBC
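The identifiers on this page appear to be related to the canonical hash: base32-encoding the hash bytes (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number, and the short aliases are its 8, 12, and 16 character prefixes. This relationship is inferred from the values shown here, not from a published specification, so treat the sketch below as an observation rather than the defined derivation. The function name is hypothetical.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep a prefix of the given length."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


canonical_hash = "753b0f24226b4e73796c2aafb580dcea5ec446ce6a16d81fdd4bc30892f79c64"

full = pith_number_from_hash(canonical_hash)
print(full)                                      # OU5Q6JBCNNHHG6LMFKX3LAG45J
print(pith_number_from_hash(canonical_hash, 8))  # OU5Q6JBC
```

If the relationship holds in general, any alias can be checked against the canonical hash offline, without contacting the service.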
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OU5Q6JBCNNHHG6LMFKX3LAG45J \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 753b0f24226b4e73796c2aafb580dcea5ec446ce6a16d81fdd4bc30892f79c64
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5c2192600db12b1070f8f5e9612cf07396e4e555763c9704013b3dd30493def5",
    "cross_cats_sorted": [
      "cs.IR"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-13T05:39:38Z",
    "title_canon_sha256": "9c917cbe3ab626a3c6082f975b679434e6e71edd7eca2c6dd2e85f1627553f49"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13034",
    "kind": "arxiv",
    "version": 1
  }
}
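
The verification command earlier on this page serialises the record with sorted keys, compact separators, and unescaped non-ASCII characters, then takes the SHA-256 of the UTF-8 bytes. The same steps can be written as a standalone Python function that works on a saved copy of the record, with no network access. The function name `canonical_sha256` is an assumption for illustration; the serialisation settings are copied from the command above.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON settings as the verification command."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5c2192600db12b1070f8f5e9612cf07396e4e555763c9704013b3dd30493def5",
        "cross_cats_sorted": ["cs.IR"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-13T05:39:38Z",
        "title_canon_sha256": "9c917cbe3ab626a3c6082f975b679434e6e71edd7eca2c6dd2e85f1627553f49",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13034", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Sorting keys and fixing the separators is what makes the hash reproducible: two copies of the record that differ only in key order or whitespace serialise to the same bytes.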