pith:BUZ7W4MF
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
A parameter-free temporal pooling strategy lets image-language models extend directly to video dense captioning and question answering without heavy retraining.
arxiv:2404.16994 v2 · 2024-04-25 · cs.CV
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{BUZ7W4MF2MBT2C65V4HB3BKU6K}
```
This prints a linked badge after your title and injects PDF metadata; it compiles on arXiv.
Claims
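PLLaVA achieves an average score of 3.48/5 across the five VideoChatGPT evaluation dimensions (9% above GPT-4V with IG-VLM) and an average accuracy of 58.1% across the 20 MVBench sub-tasks (14.5% above GPT-4V with IG-VLM). It does so by applying a parameter-free temporal pooling strategy that mitigates the bias toward high-norm visual features.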
The claims rest on the premise that the performance drop seen when multiple frames are fed directly into the model is caused primarily by a bias toward high-norm visual features, rather than by other factors such as limited temporal modeling capacity or a mismatch with the training data.
A temporal pooling layer added to LLaVA smooths the distribution of video features and lifts performance on dense video captioning and video question answering to new state-of-the-art levels without extra parameters.
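The pooling idea in these claims can be illustrated with a small sketch. This is a minimal, hypothetical implementation, not PLLaVA's code: it assumes each frame is a list of N visual token vectors of dimension D (plain Python lists), and it applies adaptive average pooling along the temporal axis only, using the start/end bin rule of PyTorch's adaptive average pooling. The function names (`adaptive_bins`, `temporal_pool`, `l2_norm`) and the toy input with one high-norm token are illustrative assumptions; the pooled output sizes PLLaVA actually uses are not stated here.

```python
import math
from typing import List

Vector = List[float]   # one visual token embedding of dimension D
Frame = List[Vector]   # N token embeddings for a single video frame


def adaptive_bins(length: int, out_size: int) -> List[range]:
    """Split `length` input steps into `out_size` bins with
    start = floor(i * length / out_size), end = ceil((i + 1) * length / out_size)."""
    return [
        range((i * length) // out_size, -(-((i + 1) * length) // out_size))
        for i in range(out_size)
    ]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def temporal_pool(frames: List[Frame], out_frames: int) -> List[Frame]:
    """Average each token position over the frames in its temporal bin.

    Only averaging is used, so the operation has no learnable parameters.
    """
    num_tokens = len(frames[0])
    return [
        [mean_vectors([frames[t][k] for t in bin_]) for k in range(num_tokens)]
        for bin_ in adaptive_bins(len(frames), out_frames)
    ]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


# Hypothetical toy input: 4 frames, 2 tokens per frame, D = 2.
# The third frame (index 2) contains one high-norm token.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[40.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
pooled = temporal_pool(frames, out_frames=2)
print(pooled)  # [[[1.0, 0.0], [0.0, 1.0]], [[20.5, 0.0], [0.0, 1.0]]]
print(max(l2_norm(v) for f in frames for v in f))  # 40.0
print(max(l2_norm(v) for f in pooled for v in f))  # 20.5
```

In the toy input, the single high-norm token is averaged with its neighboring frame, so the largest token norm drops from 40.0 to 20.5. The sketch only shows the mechanics of parameter-free averaging; that this smoothing explains PLLaVA's benchmark gains is the paper's claim, not something the example demonstrates.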
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:50.286434Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
0d33fb7185d3033d0bddaf0e1d8554f28a4ebede14e0f23f82c674abe7cb32e0
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/BUZ7W4MF2MBT2C65V4HB3BKU6K \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 0d33fb7185d3033d0bddaf0e1d8554f28a4ebede14e0f23f82c674abe7cb32e0
```
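The `python3` step of that pipeline canonicalizes the record (sorted keys, compact separators, non-ASCII kept as UTF-8) before taking its SHA-256 digest. The following is a minimal standard-library sketch of the same steps in standalone Python. The helper names (`fetch_canonical_record`, `canonical_bytes`, `canonical_hash`, `matches`) are hypothetical, and the fetch helper assumes the endpoint returns the same JSON document that the `jq` filter reads.

```python
import hashlib
import json
import urllib.request
from typing import Any


def fetch_canonical_record(url: str) -> Any:
    """Fetch the JSON-LD record and return its `canonical_record` field,
    mirroring the curl + jq steps (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["canonical_record"]


def canonical_bytes(record: Any) -> bytes:
    """Serialize like the one-liner: sorted keys, compact separators,
    non-ASCII characters kept as-is, then UTF-8 encoded."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: Any) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def matches(record: Any, expected_hex: str) -> bool:
    """True if the record hashes to the published canonical hash."""
    return canonical_hash(record) == expected_hex.strip().lower()


# Key order in the input does not affect the result, because keys are sorted first.
print(canonical_bytes({"b": 1, "a": [1, 2]}))  # b'{"a":[1,2],"b":1}'
print(canonical_hash({"b": 1, "a": 2}) == canonical_hash({"a": 2, "b": 1}))  # True
```

To check the live record, pass the URL and the expected hash from the `curl` command to `matches(fetch_canonical_record(url), expected)`. Sorting keys and removing whitespace makes the digest depend only on the record's content, not on how the server happens to format it.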
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "00e0c64dde30021d2f834453a2d09e667f67fe7f2f1ae48439d07281cc292fe5",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-nd/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2024-04-25T19:29:55Z",
    "title_canon_sha256": "f6dc7cafb3ca23a25cca7272ff45a3eee92caa94cdd7e103c0d0d8552bddf719"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2404.16994",
    "kind": "arxiv",
    "version": 2
  }
}
```