pith:4KXQSEKC
SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
SceneGraphVLM generates complete scene graphs from videos in about one second using compact vision-language models and targeted reinforcement learning.
arxiv:2605.13667 v1 · 2026-05-13 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{4KXQSEKCACNIJPCAS2CCXQUMFW}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
With compact VLMs and vLLM-accelerated decoding, SceneGraphVLM achieves a strong quality-speed trade-off, improves precision-oriented SGG metrics while preserving reasonable recall, and generates complete scene graphs with approximately one-second latency.
This rests on the assumption that the hallucination-aware RL rewards balance coverage and precision on the target benchmarks without introducing new failure modes or requiring dataset-specific tuning that fails to generalize.
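A hallucination-aware reward of the kind described could score a generated scene graph against the annotation at the triplet level: correct triplets earn credit through precision and recall (coverage), and triplets absent from the annotation are penalized. The sketch below is hypothetical. The function name `hallucination_aware_reward`, the F-beta form, the `beta` and `penalty` values, and the `(frame, subject, predicate, object)` triplet layout are illustrative assumptions, not the paper's reward definition.

```python
from typing import Iterable, Tuple

# (frame index, subject, predicate, object); a hypothetical triplet layout
Triplet = Tuple[int, str, str, str]


def hallucination_aware_reward(
    pred: Iterable[Triplet],
    gold: Iterable[Triplet],
    beta: float = 0.5,
    penalty: float = 0.1,
) -> float:
    """Hypothetical reward: F-beta over triplets minus a hallucination-rate penalty.

    beta < 1 weights precision above recall (coverage); `penalty` scales the
    fraction of predicted triplets that do not appear in the annotation.
    """
    pred_set, gold_set = set(pred), set(gold)
    if not pred_set:
        return 0.0  # an empty graph earns nothing
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set)
    recall = hits / len(gold_set) if gold_set else 0.0
    b2 = beta * beta
    f_beta = 0.0 if hits == 0 else (1 + b2) * precision * recall / (b2 * precision + recall)
    hallucination_rate = len(pred_set - gold_set) / len(pred_set)
    return f_beta - penalty * hallucination_rate


gold = {(0, "person", "holding", "cup"), (0, "person", "sitting_on", "chair")}
precise_but_partial = {(0, "person", "holding", "cup")}
complete_but_padded = gold | {(0, "person", "holding", "phone")}

print(round(hallucination_aware_reward(precise_but_partial, gold), 3))  # 0.833
print(round(hallucination_aware_reward(complete_but_padded, gold), 3))  # 0.681
```

With `beta=0.5`, a graph that is correct but incomplete scores above one that reaches full recall by adding a guessed triplet, which is the precision-over-coverage balance the assumption refers to. Whether such a reward avoids new failure modes (for example, the model learning to emit very short graphs) is exactly what the assumption leaves open.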
SceneGraphVLM generates dynamic scene graphs from video using compact VLMs, TOON serialization, and hallucination-aware RL to improve precision and reach approximately one-second latency.
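TOON (Token-Oriented Object Notation) encodes a uniform array of objects as a header that declares the array length and field names once, followed by one comma-separated row per item, so the keys and quotes that JSON repeats for every relationship are written only once. The sketch below serializes per-frame triplets in that tabular style and compares its length with compact JSON. The `triplets` key, the field names, and the helper `to_toon` are assumptions for illustration, not the paper's output schema, and the helper skips TOON's quoting rules for values containing delimiters.

```python
import json


def to_toon(key, rows, fields):
    """Render a uniform list of dicts as a TOON-style tabular array.

    Assumes values contain no commas, colons, or newlines (no quoting handled).
    """
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


triplets = [
    {"frame": 0, "subject": "person", "predicate": "holding", "object": "cup"},
    {"frame": 0, "subject": "person", "predicate": "sitting_on", "object": "chair"},
    {"frame": 1, "subject": "person", "predicate": "drinking_from", "object": "cup"},
]
fields = ["frame", "subject", "predicate", "object"]

toon = to_toon("triplets", triplets, fields)
as_json = json.dumps({"triplets": triplets}, separators=(",", ":"))

print(toon)
print(len(toon), "chars as TOON vs", len(as_json), "chars as compact JSON")
```

Character count is only a rough proxy for token count, but the saving grows with the number of triplets, since JSON repeats the four keys per row while the tabular form does not. Shorter outputs mean fewer decoding steps, which is one plausible way a compact serialization helps the latency claim.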
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T02:44:17.232036Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
e2af091142009a84bc4096842bc28c2db62a356946ac6d3cd57d974549e11f2f
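The values on this page are consistent with the Pith Number being the first 26 characters (130 bits) of the RFC 4648 base32 encoding of the canonical hash, and the short form `pith:4KXQSEKC` being its first 8 characters. This rule is inferred from the values shown here rather than documented on the page, so the check below is a sketch; `pith_from_hash` is a hypothetical helper name.

```python
import base64

CANONICAL_HASH = "e2af091142009a84bc4096842bc28c2db62a356946ac6d3cd57d974549e11f2f"
PITH_NUMBER = "4KXQSEKCACNIJPCAS2CCXQUMFW"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Inferred (undocumented) rule: base32-encode the digest bytes, then truncate."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


print(pith_from_hash(CANONICAL_HASH) == PITH_NUMBER)  # True
print(pith_from_hash(CANONICAL_HASH, 8))              # 4KXQSEKC
```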
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/4KXQSEKCACNIJPCAS2CCXQUMFW \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: e2af091142009a84bc4096842bc28c2db62a356946ac6d3cd57d974549e11f2f
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "d07ffdb47a3c956d7a68e5aaa76b132c5c7815a7172b31fcc35853cc1d19a574",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.CV",
"submitted_at": "2026-05-13T15:27:41Z",
"title_canon_sha256": "f04ec2be006a9833106289e9936921d6238aa3883a63199fe5bc8268919a8873"
},
"schema_version": "1.0",
"source": {
"id": "2605.13667",
"kind": "arxiv",
"version": 1
}
}
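The `python3 -c` step of the verification command canonicalizes JSON by sorting keys, dropping whitespace (`separators=(',',':')`), keeping non-ASCII characters as UTF-8 (`ensure_ascii=False`), and hashing the resulting bytes with SHA-256. The same canonicalization as standalone functions, applied to the record shown above, is sketched below. It skips the network fetch, so it reproduces the expected digest only if the displayed record is identical to the served `canonical_record`, which is an assumption; `canonical_bytes` and `canonical_sha256` are hypothetical names.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    # Mirrors the one-liner: sorted keys, no whitespace, raw UTF-8 instead of \u escapes.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "d07ffdb47a3c956d7a68e5aaa76b132c5c7815a7172b31fcc35853cc1d19a574",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-13T15:27:41Z",
        "title_canon_sha256": "f04ec2be006a9833106289e9936921d6238aa3883a63199fe5bc8268919a8873",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13667", "kind": "arxiv", "version": 1},
}

EXPECTED = "e2af091142009a84bc4096842bc28c2db62a356946ac6d3cd57d974549e11f2f"
digest = canonical_sha256(record)
print(digest)
print(digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the server or a client happens to emit fields, which is what makes an independently recomputed hash comparable to the published one.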