pith:EXNEXHS3
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
LongVU adaptively compresses long videos by removing redundant frames and tokens to fit hour-long clips into limited LLM context.
arxiv:2410.17434 v1 · 2024-10-22 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EXNEXHS34HBJHXDIZFWV4V4ATB}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
LongVU consistently surpasses existing methods across a variety of video understanding benchmarks, especially on hour-long video understanding tasks such as VideoMME and MLVU. With a lightweight LLM, LongVU also scales down effectively to a smaller size while keeping state-of-the-art video understanding performance.
The method rests on two assumptions: that DINOv2 similarity reliably identifies redundant frames without discarding task-relevant visual information, and that text-guided cross-modal queries plus temporal dependency reduction preserve all details needed for downstream understanding.
LongVU adaptively compresses long video tokens using DINOv2-based frame deduplication, text-guided cross-modal selection, and temporal spatial reduction to improve video-language understanding in MLLMs with minimal detail loss.
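The frame deduplication step named above can be illustrated with a minimal sketch. It assumes each frame has already been mapped to a feature vector (for example by DINOv2, which is not invoked here) and uses a simplified greedy rule: keep a frame only if its cosine similarity to the last kept frame falls below a threshold. The function names, the toy feature vectors, and the `threshold` value are hypothetical, and this is not the authors' exact procedure.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_frames(features: Sequence[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Return indices of frames kept after greedy similarity-based deduplication.

    Assumption: `features` holds one precomputed vector per frame; the
    threshold of 0.9 is illustrative, not a value taken from the paper.
    """
    kept: List[int] = []
    for index, feature in enumerate(features):
        # Always keep the first frame; afterwards compare with the last kept one.
        if not kept or cosine(features[kept[-1]], feature) < threshold:
            kept.append(index)
    return kept


# Toy 2-D "features": frames 1 and 3 are near-duplicates of frames 0 and 2.
toy_features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0], [1.0, 0.0]]
kept_indices = select_frames(toy_features)
print(kept_indices)  # [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift of small changes from being discarded entirely.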
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:47.688103Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
25da4b9e5be1c293dc68c96d5e5780985e68ac4cf4cd275df3a443c98744cefc
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EXNEXHS34HBJHXDIZFWV4V4ATB \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 25da4b9e5be1c293dc68c96d5e5780985e68ac4cf4cd275df3a443c98744cefc
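The same canonicalization can be written as a small Python function, which may be easier to read than the one-liner. This is a sketch that mirrors the serialization options used in the command above (sorted keys, compact separators, non-ASCII characters left unescaped, UTF-8 encoding); the function names and the toy record are hypothetical and are not part of any Pith API.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no spaces after commas or colons
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return text.encode()         # str.encode() defaults to UTF-8


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Toy record (hypothetical): two dicts with the same content but different
# insertion order serialize to the same bytes and therefore the same hash.
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}
print(canonical_bytes(first))  # b'{"a":2,"b":1}'
print(canonical_hash(first) == canonical_hash(second))  # True
```

To check a real record, pass the parsed `canonical_record` object to `canonical_hash` and compare the result with the canonical hash listed on this page.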
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2c655dcf5b26292ac4b16b56aefe6dbd68a6c412c51af52fcb02b16e3e68c63d",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-10-22T21:21:37Z",
    "title_canon_sha256": "1b6ac9fd9476f5260c2a24fde0b0a0761b95e10915c781dc05fadc0f7ab7e229"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2410.17434",
    "kind": "arxiv",
    "version": 1
  }
}