Pith Number

pith:IB4G5DUO

pith:2026:IB4G5DUOJOVCBW3T6TKNGHGLQM
not attested · not anchored · not stored · refs resolved

Sound Sparks Motion: Audio and Text Tuning for Video Editing

Ali Mahdavi-Amiri, AmirHossein Naghi Razlighi, Aryan Mikaeili, Daniel Cohen-Or, Yiorgos Chrysanthou

Tuning an audio latent and text residual at test time lets video generation models realize specific motions that text prompts alone cannot produce.

arxiv:2605.15307 v1 · 2026-05-14 · cs.GR · cs.CV · cs.MM · cs.SD

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{IB4G5DUOJOVCBW3T6TKNGHGLQM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

this combination can encourage motion edits that the underlying model often struggles to realize under prompt-only control

C2 · weakest assumption

the vision-language model provides reliable feedback indicating whether the intended motion appears in the generated video, allowing effective guidance of the tuning process despite no direct temporal alignment metric

C3 · one-line summary

Sound Sparks Motion is a test-time tuning approach that adjusts audio and text conditioning signals in multimodal video models using VLM feedback to produce specific motion edits while preserving content.

References

12 extracted · 12 resolved · 4 Pith anchors

[1] Chenjian Gao, Lihe Ding, Xin Cai, Zhanpeng Huang, Zibin Wang, and Tianfan Xue 2025
[2] In The Fourteenth International Conference on Learning Representations
[3] Gemma 3 Technical Report 2024 · arXiv:2503.19786
[4] ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control 2026 · doi:10.48550/arxiv.2503.10592
[5] arXiv preprint arXiv:2602.08068 2026 · doi:10.48550/arxiv.2602.08068

Formal links

1 machine-checked theorem link

Receipt and verification
First computed: 2026-05-20T00:00:51.817862Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c

Aliases

arxiv: 2605.15307 · arxiv_version: 2605.15307v1 · doi: 10.48550/arxiv.2605.15307 · pith_short_12: IB4G5DUOJOVC · pith_short_16: IB4G5DUOJOVCBW3T · pith_short_8: IB4G5DUO
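The aliases above appear to be tied to the canonical hash. For this record, Base32-encoding the first 16 bytes of the SHA-256 digest (RFC 4648 alphabet, padding stripped) reproduces the 26-character Pith Number, and the short forms are its 8-, 12-, and 16-character prefixes. The page does not state this rule, so the sketch below treats it as an observation about this one record rather than a documented scheme; the function names are illustrative.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: Base32 of the first 16 digest bytes, '=' padding removed."""
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith_number: str) -> dict:
    """Short aliases are taken here to be plain prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


pith = pith_number_from_hash(CANONICAL_HASH)
print(pith)  # IB4G5DUOJOVCBW3T6TKNGHGLQM
print(short_aliases(pith))
```

Whether the same derivation holds for other records is not confirmed by anything on this page.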
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/IB4G5DUOJOVCBW3T6TKNGHGLQM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "16d0ca5f3239f79d22ae7a149a0d28cbf4c1751c20f2fa08f5e1f94be2aae262",
    "cross_cats_sorted": [
      "cs.CV",
      "cs.MM",
      "cs.SD"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.GR",
    "submitted_at": "2026-05-14T18:20:50Z",
    "title_canon_sha256": "8304416d8dda5c658ff26a12dba906436e2ab0455965da9f8a0b7df3c464e15a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15307",
    "kind": "arxiv",
    "version": 1
  }
}
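The shell one-liner above can also be reproduced offline. This is a minimal Python sketch, assuming the canonical form is exactly what that one-liner serializes (sorted keys, compact separators, no ASCII escaping, UTF-8) and that the JSON shown here is the complete `canonical_record`. The name `canonical_sha256` is illustrative and not part of any Pith API.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "16d0ca5f3239f79d22ae7a149a0d28cbf4c1751c20f2fa08f5e1f94be2aae262",
        "cross_cats_sorted": ["cs.CV", "cs.MM", "cs.SD"],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.GR",
        "submitted_at": "2026-05-14T18:20:50Z",
        "title_canon_sha256": "8304416d8dda5c658ff26a12dba906436e2ab0455965da9f8a0b7df3c464e15a",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15307", "kind": "arxiv", "version": 1},
}

# Hash the page says to expect.
EXPECTED = "40786e8e8e4baa20db73f4d4d31ccb8329c37fcfea3c2113b5102a7d904a243c"


def canonical_sha256(record: dict) -> str:
    """Serialize as in the shell snippet, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches expected:", digest == EXPECTED)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest. List order does, which is presumably why the cross-listed categories are stored pre-sorted.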