pith:IB4G5DUO
Sound Sparks Motion: Audio and Text Tuning for Video Editing
Tuning an audio latent and text residual at test time lets video generation models realize specific motions that text prompts alone cannot produce.
arxiv:2605.15307 v1 · 2026-05-14 · cs.GR · cs.CV · cs.MM · cs.SD
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IB4G5DUOJOVCBW3T6TKNGHGLQM}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
this combination can encourage motion edits that the underlying model often struggles to realize under prompt-only control
the vision-language model provides reliable feedback indicating whether the intended motion appears in the generated video, allowing effective guidance of the tuning process despite no direct temporal alignment metric
Sound Sparks Motion is a test-time tuning approach that adjusts audio and text conditioning signals in multimodal video models using VLM feedback to produce specific motion edits while preserving content.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:00:51.817862Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IB4G5DUOJOVCBW3T6TKNGHGLQM \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "16d0ca5f3239f79d22ae7a149a0d28cbf4c1751c20f2fa08f5e1f94be2aae262",
    "cross_cats_sorted": [
      "cs.CV",
      "cs.MM",
      "cs.SD"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.GR",
    "submitted_at": "2026-05-14T18:20:50Z",
    "title_canon_sha256": "8304416d8dda5c658ff26a12dba906436e2ab0455965da9f8a0b7df3c464e15a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15307",
    "kind": "arxiv",
    "version": 1
  }
}
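Offline check (sketch)
The one-line verifier above can also be run without network access by hashing the record as listed here. The following is a minimal sketch, assuming the canonical hash is the SHA-256 of the record serialized exactly as in that command (sorted keys, compact separators, non-ASCII preserved, UTF-8). The names `RECORD_JSON`, `canonical_bytes`, `record_digest`, and `EXPECTED` are illustrative and not part of any Pith API. Whether the digest matches depends on this listing being byte-faithful to the served record, so the script reports the comparison rather than assuming it.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" listing on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "16d0ca5f3239f79d22ae7a149a0d28cbf4c1751c20f2fa08f5e1f94be2aae262",
    "cross_cats_sorted": ["cs.CV", "cs.MM", "cs.SD"],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.GR",
    "submitted_at": "2026-05-14T18:20:50Z",
    "title_canon_sha256": "8304416d8dda5c658ff26a12dba906436e2ab0455965da9f8a0b7df3c464e15a"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.15307", "kind": "arxiv", "version": 1}
}
"""

# Hash stated on this page under "Canonical hash".
EXPECTED = "40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same options as the page's python3 one-liner."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return text.encode("utf-8")


def record_digest(record: dict) -> str:
    """Return the hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = record_digest(json.loads(RECORD_JSON))
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted and whitespace is stripped before hashing, the indentation and key order of the listing do not affect the result; only the keys and values themselves do.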