Pith Number

pith:YJE53WYM

pith:2026:YJE53WYMWR4XMLM4RBPF7WZSPW

not attested · not anchored · not stored · refs resolved

VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

Chenhao Qiu (1), Shien Song (1), Xin Luo (1), Xusheng Liu (1), Yechao Zhang (2) ((1) Mango TV, (2) Nanyang Technological University)

Separating planning from answer authority in video agents reduces evidence misalignment.

arxiv:2605.12571 v1 · 2026-05-12 · cs.CV · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{YJE53WYMWR4XMLM4RBPF7WZSPW}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live · download bundle · merged state

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

the decoupled planner-inspector framework, which separates planning from answer authority and gates final answering on pixel-level verification... improves both answer accuracy and evidence alignment, achieving 55.1% on LVBench and 62.0% on LongVideoBench

C2 · weakest assumption

that gating final answers on pixel-level verification will reliably eliminate evidence misalignment without introducing new failure modes in long-horizon search

C3 · one-line summary

Decoupling planning from answer authority in long-video agents reduces evidence misalignment and raises accuracy to 55.1% on LVBench and 62.0% on LongVideoBench.

References

14 extracted · 14 resolved · 4 Pith anchors

Receipt and verification

First computed	2026-05-18T03:10:01.734727Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

c249dddb0cb479762d9c885e5fdb327d82229865bf0fc961c4d5421e138d1c55

Aliases

arxiv: 2605.12571 · arxiv_version: 2605.12571v1 · doi: 10.48550/arxiv.2605.12571 · pith_short_12: YJE53WYMWR4X · pith_short_16: YJE53WYMWR4XMLM4 · pith_short_8: YJE53WYM
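
The aliases above appear to be derived from the canonical hash: for this record, Base32-encoding the SHA-256 digest (RFC 4648 alphabet) and keeping the first 26 characters gives the full Pith Number, and the short aliases are prefixes of it. The sketch below checks that relationship for this one record. It is an observation about the values shown on this page, not a documented rule of the scheme, and the function name `pith_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "c249dddb0cb479762d9c885e5fdb327d82229865bf0fc961c4d5421e138d1c55"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: the identifier is a plain prefix of the RFC 4648 Base32
    encoding of the digest. This matches the record shown here but is not
    confirmed as the general rule.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


pith_full = pith_from_hash(CANONICAL_HASH)
aliases = {
    "pith_short_8": pith_full[:8],
    "pith_short_12": pith_full[:12],
    "pith_short_16": pith_full[:16],
}

print(pith_full)
for name, value in aliases.items():
    print(name, value)
```

If the assumption holds for other records, a mirror could recompute an identifier from a recomputed hash rather than trusting the displayed one.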

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/YJE53WYMWR4XMLM4RBPF7WZSPW \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c249dddb0cb479762d9c885e5fdb327d82229865bf0fc961c4d5421e138d1c55

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "f36b82b034614d57dcad0b14f24af3bb8d8a699ef7a06125bd99148e245f9ae6",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-12T10:37:49Z",
    "title_canon_sha256": "4ae8fecac9b83544b61d4148257a242fc54d03d2217e43451520b5542fce0f5e"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12571",
    "kind": "arxiv",
    "version": 1
  }
}
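
The same canonicalization used in the verification command can be run offline against the record printed above. This is a minimal sketch, assuming the JSON shown on this page is exactly what the resolver returns under `canonical_record`. The helper name `canonical_sha256` is hypothetical, and the result is not checked against the published hash here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the way the page's one-liner does: sorted keys,
    compact separators, non-ASCII characters left unescaped, UTF-8 bytes."""
    encoded = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "f36b82b034614d57dcad0b14f24af3bb8d8a699ef7a06125bd99148e245f9ae6",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-12T10:37:49Z",
        "title_canon_sha256": "4ae8fecac9b83544b61d4148257a242fc54d03d2217e43451520b5542fce0f5e",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12571", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)

# Key order in the input does not matter, because keys are sorted on output.
reordered = dict(reversed(list(record.items())))
print(canonical_sha256(reordered) == digest)
```

Compare the printed digest with the value under "Canonical hash". A mismatch would suggest either a transcription difference in the record or a different canonicalization on the builder side.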