Pith Number

pith:LWVTT7QD

pith:2025:LWVTT7QDM7U2MZKY67CFNOK2OO

not attested · not anchored · not stored · refs resolved

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Guangxuan Xiao, Kelly Peng, Liuning He, Ruyi Xu, Song Han, Yao Lu, Yukang Chen

A vision-language model achieves stable real-time understanding of arbitrarily long video streams through a streaming attention cache aligned with training on short clips.

arxiv:2510.09608 v1 · 2025-10-10 · cs.CV · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{LWVTT7QDM7U2MZKY67CFNOK2OO}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

On Inf-Streams-Eval, StreamingVLM achieves a 66.18% win rate against GPT-4o mini and maintains stable, real-time performance at up to 8 FPS on a single NVIDIA H100.

C2 · weakest assumption

That supervised fine-tuning with full attention on short overlapped video chunks will produce stable coherence and performance when the same model is later run with the streaming KV cache on arbitrarily long, non-overlapped video streams.

C3 · one-line summary

StreamingVLM enables stable real-time understanding of infinite video streams at up to 8 FPS using a streaming KV cache and aligned SFT on overlapped chunks, with a 66.18% win rate over GPT-4o mini on a new two-hour video benchmark.

References

12 extracted · 12 resolved · 8 Pith anchors

[1] Qwen2.5-VL Technical Report · arXiv:2502.13923

[2] VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · arXiv:2406.07476

[3] arXiv:2503.00540

[4] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens · arXiv:2402.13753

[5] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis · arXiv:2405.21075

Formal links

2 machine-checked theorem links

Cited by

31 papers in Pith

EgoSAT: A Comprehensive Benchmark of Egocentric Streaming Interaction Understanding

Kamera: Unified Position-Invariant Multimodal KV Cache for Training-Free Reuse

Streaming Interventions: Can Video Large Language Models Correct Mistakes as They Occur?

Harnessing Streaming Video in the Wild

OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs

Receipt and verification

First computed	2026-05-17T23:38:14.195787Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

5dab39fe0367e9a66558f7c456b95a738cc15ff70419b293c2e5ec8f7245c54c

Aliases

arxiv: 2510.09608 · arxiv_version: 2510.09608v1 · doi: 10.48550/arxiv.2510.09608 · pith_short_12: LWVTT7QDM7U2 · pith_short_16: LWVTT7QDM7U2MZKY · pith_short_8: LWVTT7QD
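
In this record, the identifier and its short aliases appear to be derived from the canonical hash. Base32-encoding the SHA-256 digest (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number above, and each `pith_short_*` alias is a prefix of it. The Python sketch below checks that relationship for this record only. The derivation rule is an inference from the values on this page, not a documented specification, and the function names are hypothetical.

```python
import base64

# Canonical hash copied from this record.
CANONICAL_HASH = "5dab39fe0367e9a66558f7c456b95a738cc15ff70419b293c2e5ec8f7245c54c"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex SHA-256 digest and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash. It is inferred from the record, not taken from a spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id: str) -> dict:
    """Build the 8-, 12- and 16-character prefixes listed under Aliases."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(pith_id)
print(pith_id)
print(aliases)
```

For this record the derived value equals `LWVTT7QDM7U2MZKY67CFNOK2OO`, and the three prefixes equal the aliases listed above. Whether the same rule holds for other records is not established here.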

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/LWVTT7QDM7U2MZKY67CFNOK2OO \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 5dab39fe0367e9a66558f7c456b95a738cc15ff70419b293c2e5ec8f7245c54c
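
The same check can be written as a small Python module, which may be easier to reuse than the one-liner. This is a minimal sketch that follows the serialization used in the command above (sorted keys, compact separators, UTF-8, no ASCII escaping). The function names are hypothetical, and the resolver URL and `canonical_record` field are taken from the command as given.

```python
import hashlib
import json
import urllib.request

RESOLVER_URL = "https://pith.science/pith/LWVTT7QDM7U2MZKY67CFNOK2OO"
EXPECTED_HASH = "5dab39fe0367e9a66558f7c456b95a738cc15ff70419b293c2e5ec8f7245c54c"


def canonical_sha256(record) -> str:
    """Hash a JSON value after serializing it with sorted keys and no whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fetch_canonical_record(url: str = RESOLVER_URL) -> dict:
    """Fetch the resolver document and return its `canonical_record` field.

    Requires network access; the Accept header matches the curl command.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["canonical_record"]


def verify(url: str = RESOLVER_URL, expected: str = EXPECTED_HASH) -> bool:
    """Return True when the fetched record hashes to the expected value."""
    return canonical_sha256(fetch_canonical_record(url)) == expected
```

Calling `verify()` performs the network request. `canonical_sha256` can also be applied offline to a saved copy of the record, and because keys are sorted before hashing, the key order in the saved file does not affect the result.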

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "87a6d0d45f33b663733e1d3ccab4840f56fea1a808eabf19e20b21ee3d318aa3",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-10T17:59:58Z",
    "title_canon_sha256": "b46c56aa5c0f7276e4ddc1686d851d012fd82e8fe2abf4e0a3fb109d49914448"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.09608",
    "kind": "arxiv",
    "version": 1
  }
}