pith. machine review for the scientific record.
Pith Number

pith:LWVTT7QD

pith:2025:LWVTT7QDM7U2MZKY67CFNOK2OO
not attested · not anchored · not stored · refs resolved

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Guangxuan Xiao, Kelly Peng, Liuning He, Ruyi Xu, Song Han, Yao Lu, Yukang Chen

A vision-language model achieves stable real-time understanding of arbitrarily long video streams through a streaming attention cache aligned with training on short clips.

arxiv:2510.09608 v1 · 2025-10-10 · cs.CV · cs.AI · cs.CL

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
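The page does not spell out the merge algorithm. As a hypothetical sketch only (the `Event` fields, the `(timestamp, event_id)` ordering, last-writer-wins, and the field name `example_field` are all assumptions, not Pith's documented behavior), determinism can come from folding events onto the canonical record in one fixed total order, so every mirror reaches the same state no matter what order it received the events in:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real bundle schema is not described on this page.
    timestamp: str  # fixed-format ISO-8601 UTC string, so string order == time order
    event_id: str   # unique id, used only to break timestamp ties
    field: str      # which field of the current state this event sets
    value: str


def merge(record: dict, events: list) -> dict:
    """Fold events onto a copy of the canonical record in a fixed total order.

    Sorting by (timestamp, event_id) makes the result independent of the
    order in which a mirror happened to receive the events.
    Signature verification is deliberately omitted from this sketch.
    """
    state = dict(record)
    for ev in sorted(events, key=lambda e: (e.timestamp, e.event_id)):
        state[ev.field] = ev.value  # last writer in the total order wins
    return state


# Illustrative events only; values are placeholders, not real Pith data.
events = [
    Event("2026-05-18T10:00:00Z", "evt-b", "example_field", "second"),
    Event("2026-05-18T09:00:00Z", "evt-a", "example_field", "first"),
]
base = {"source_id": "2510.09608"}
state = merge(base, events)
print(state)
```

Sorting on a total order is what makes the result order-independent; a real implementation would also verify each event's signature before applying it.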

Claims

C1 · strongest claim

On Inf-Streams-Eval, StreamingVLM achieves a 66.18% win rate against GPT-4O mini and maintains stable, real-time performance at up to 8 FPS on a single NVIDIA H100.

C2 · weakest assumption

That supervised fine-tuning with full attention on short overlapped video chunks will produce stable coherence and performance when the same model is later run with the streaming KV cache on arbitrarily long, non-overlapped video streams.

C3 · one-line summary

StreamingVLM enables stable real-time understanding of infinite video streams at up to 8 FPS using a streaming KV cache and aligned SFT on overlapped chunks, with a 66.18% win rate over GPT-4O mini on a new two-hour video benchmark.
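C2 and C3 name a streaming KV cache but not its eviction policy. A minimal sketch, assuming a StreamingLLM-style policy (a few permanent "attention sink" entries plus a sliding window of the most recent entries), shows why memory stays bounded on an arbitrarily long stream. `StreamingKVCache`, `num_sink`, and `window` are hypothetical names and values chosen for illustration, and a single window is used even though the actual system may treat vision and text tokens differently:

```python
from collections import deque


class StreamingKVCache:
    """Bounded cache (hypothetical): the first `num_sink` entries are kept
    forever as attention sinks; after that only the most recent `window`
    entries are kept."""

    def __init__(self, num_sink: int, window: int) -> None:
        self.num_sink = num_sink
        self.sinks: list = []
        # deque with maxlen silently drops the oldest entry when full
        self.recent: deque = deque(maxlen=window)

    def append(self, kv) -> None:
        if len(self.sinks) < self.num_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)

    def entries(self) -> list:
        # What attention would see at this step: sinks first, then the window.
        return self.sinks + list(self.recent)

    def __len__(self) -> int:
        return len(self.sinks) + len(self.recent)


cache = StreamingKVCache(num_sink=4, window=8)
for step in range(100):  # integers stand in for per-token (key, value) states
    cache.append(step)
print(cache.entries())
```

Whatever the stream length, attention sees at most `num_sink + window` entries, which is what keeps per-step cost flat on unbounded input.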

References

12 extracted · 12 resolved · 8 Pith anchors

[1] Qwen2.5-VL Technical Report · arXiv:2502.13923
[2] VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · arXiv:2406.07476
[3] arXiv preprint · arXiv:2503.00540
[4] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens · arXiv:2402.13753
[5] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · arXiv:2405.21075

Formal links

2 machine-checked theorem links

Cited by

17 papers in Pith

Receipt and verification
First computed 2026-05-17T23:38:14.195787Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

5dab39fe0367e9a66558f7c456b95a738cc15ff70419b293c2e5ec8f7245c54c

Aliases

arxiv: 2510.09608 · arxiv_version: 2510.09608v1 · doi: 10.48550/arxiv.2510.09608 · pith_short_12: LWVTT7QDM7U2 · pith_short_16: LWVTT7QDM7U2MZKY · pith_short_8: LWVTT7QD
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LWVTT7QDM7U2MZKY67CFNOK2OO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5dab39fe0367e9a66558f7c456b95a738cc15ff70419b293c2e5ec8f7245c54c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "87a6d0d45f33b663733e1d3ccab4840f56fea1a808eabf19e20b21ee3d318aa3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-10T17:59:58Z",
    "title_canon_sha256": "b46c56aa5c0f7276e4ddc1686d851d012fd82e8fe2abf4e0a3fb109d49914448"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.09608",
    "kind": "arxiv",
    "version": 1
  }
}
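For readers without `curl` or `jq`, the same canonicalization as the verification pipeline can be reproduced in plain Python: sort keys, drop whitespace, keep non-ASCII characters, encode as UTF-8, then take SHA-256. `canonical_hash` is a name chosen for this sketch. It hashes the record exactly as printed above, so it should print the canonical hash only if that printout is the complete canonical record; that is not checked here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 of the record serialized like the shell pipeline does:
    keys sorted, compact separators, non-ASCII kept, UTF-8 encoded."""
    canon = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


# Copied from the canonical record JSON shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "87a6d0d45f33b663733e1d3ccab4840f56fea1a808eabf19e20b21ee3d318aa3",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-10-10T17:59:58Z",
        "title_canon_sha256": "b46c56aa5c0f7276e4ddc1686d851d012fd82e8fe2abf4e0a3fb109d49914448",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.09608", "kind": "arxiv", "version": 1},
}
print(canonical_hash(record))
```

Because keys are sorted before hashing, the digest does not depend on the key order in which the record was fetched or written.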