Pith Number

pith:JB5TOC7Q

pith:2026:JB5TOC7QDDQ53QAVZVMBXETMM7
not attested · not anchored · not stored · refs resolved

TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation

Abdul Wasi, Akhil Gorugantu, David Doermann, Mahesh Bhosale, Pengyu Yan, Vishvesh Trivedi

TRACE grounds evidence in text-searchable timelines before visual reasoning for multi-video events.

arxiv:2605.16740 v1 · 2026-05-16 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{JB5TOC7QDDQ53QAVZVMBXETMM7}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

On the MAGMaR validation split, TRACE raises macro-average MiRAGE F1 from 0.705 for an unguided Qwen3-VL-30B baseline to 0.811, with an especially strong improvement in citation recall, from 0.440 to 0.628.
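To put the reported numbers in perspective, the absolute and relative gains can be worked out directly from the four values in the claim. This is only arithmetic on the figures quoted above; the dictionary keys and the `gains` helper are names chosen for this illustration.

```python
# Figures quoted in claim C1 (MAGMaR validation split).
baseline = {"mirage_f1": 0.705, "citation_recall": 0.440}
trace = {"mirage_f1": 0.811, "citation_recall": 0.628}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent), rounded for display."""
    absolute = after - before
    relative_pct = 100.0 * absolute / before
    return round(absolute, 3), round(relative_pct, 1)


for metric in baseline:
    abs_gain, rel_gain = gains(baseline[metric], trace[metric])
    print(f"{metric}: +{abs_gain:.3f} absolute, +{rel_gain:.1f}% relative")
```

The F1 gain is 0.106 absolute (about 15.0% relative), while citation recall gains 0.188 absolute (about 42.7% relative), which is why the claim singles it out.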

C2 · weakest assumption

The method assumes that OCR and object detection produce sufficiently accurate and complete structured timelines, and that a text-only LLM can reliably select query-relevant moments without missing critical visual cues not captured in text (abstract, method description paragraph).

C3 · one line summary

TRACE builds structured text timelines from videos via OCR and detection, then applies text-only LLM evidence localization before LVLM claim generation, raising MiRAGE F1 from 0.705 to 0.811 on MAGMaR.
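The ordering described in the summary (build a text timeline first, localize evidence on text, and only then hand the selected moments to a vision-language model) can be illustrated with a toy sketch. Everything here is hypothetical: the `TimelineEntry` fields, the sample entries, and the word-overlap scorer, which merely stands in for the text-only LLM. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEntry:
    """One text-searchable moment (hypothetical schema)."""
    video_id: str
    t_sec: float
    source: str  # "ocr" or "detection"
    text: str


def select_evidence(query: str, timeline: list[TimelineEntry], top_k: int = 2) -> list[TimelineEntry]:
    """Rank entries by word overlap with the query.

    ASSUMPTION: this keyword scorer is a placeholder for the text-only LLM
    that performs evidence localization in the paper.
    """
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(e.text.lower().split())), e) for e in timeline]
    hits = [(score, e) for score, e in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1].video_id, pair[1].t_sec))
    return [e for _, e in hits[:top_k]]


# Made-up timeline entries spanning two videos.
timeline = [
    TimelineEntry("vid_a", 12.0, "ocr", "bridge closed due to flood"),
    TimelineEntry("vid_a", 40.5, "detection", "person car"),
    TimelineEntry("vid_b", 3.0, "ocr", "flood warning issued for river"),
]

selected = select_evidence("flood bridge closed", timeline)
for entry in selected:
    # In the described pipeline, only these moments would reach the LVLM.
    print(entry.video_id, entry.t_sec, entry.text)
```

The point of the sketch is the ordering: visual reasoning is only spent on moments that a cheap text search has already flagged, which is also where the weakest assumption (C2) bites if OCR or detection misses a cue.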

References

14 extracted · 14 resolved · 8 Pith anchors

[1] PP-OCR: A Practical Ultra Lightweight OCR System. CoRR, abs/2009.09941, 2020
[2] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · arXiv:2405.21075
[3] Verify exact arXiv ID and author list on Scholar
[4] VideoChat: Chat-Centric Video Understanding · arXiv:2305.06355
[5] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection · arXiv:2303.05499
Receipt and verification
First computed: 2026-05-20T00:02:39.251465Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

487b370bf018e1ddc015cd581b926c67e63e2ed656d596b8bef951a76f70e7c4

Aliases

arxiv: 2605.16740 · arxiv_version: 2605.16740v1 · doi: 10.48550/arxiv.2605.16740 · pith_short_12: JB5TOC7QDDQ5 · pith_short_16: JB5TOC7QDDQ53QAV · pith_short_8: JB5TOC7Q
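The short aliases are prefixes of the full Pith Number, and the full number itself matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That relationship is inferred from the values on this page, not taken from a published specification, so treat the sketch below as an observation rather than the official derivation. The function name `pith_number_from_hash` is made up for this example.

```python
import base64

CANONICAL_HASH = "487b370bf018e1ddc015cd581b926c67e63e2ed656d596b8bef951a76f70e7c4"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters.

    ASSUMPTION: inferred from this record's values, not a documented rule.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)
print(aliases)
```

For this record the result reproduces `JB5TOC7QDDQ53QAVZVMBXETMM7` and the three `pith_short_*` aliases listed above.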
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/JB5TOC7QDDQ53QAVZVMBXETMM7 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 487b370bf018e1ddc015cd581b926c67e63e2ed656d596b8bef951a76f70e7c4
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9a910ccaaf755069dddeda69068f9352cbc58bd13c4b3a9a96e7083cc93d7a23",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T01:37:10Z",
    "title_canon_sha256": "30973130461cc3c7190208868ff913d5e109498b56748e2eb099c2e3e66a2b38"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16740",
    "kind": "arxiv",
    "version": 1
  }
}
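The verification command above can also be reproduced offline from the record shown here. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, non-ASCII preserved, UTF-8 bytes) and hashes the result with SHA-256. It assumes the JSON printed on this page is exactly what the API returns under `canonical_record`; if it is, the printed digest should equal the canonical hash listed earlier. The helper name `canonical_sha256` is chosen for this example.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "9a910ccaaf755069dddeda69068f9352cbc58bd13c4b3a9a96e7083cc93d7a23",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-16T01:37:10Z",
        "title_canon_sha256": "30973130461cc3c7190208868ff913d5e109498b56748e2eb099c2e3e66a2b38",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16740", "kind": "arxiv", "version": 1},
}

print(canonical_sha256(record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror stores or emits the fields, which is what makes independent recomputation possible.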