pith:KCXG5LP5
CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
CALM integrates speaker embeddings for target extraction with dynamic vocabulary biasing to halve biased error rates in overlapping multi-speaker ASR.
arxiv:2601.22792 v2 · 2026-01-30 · eess.AS · cs.CL · cs.SD
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KCXG5LP5BUZSBAOZEMVTN3Z5PU}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
On two-speaker mixtures, CALM reduces biased word error rate (B-WER) from 12.7 to 4.7 on LibriSpeech2Mix and biased character error rate (B-CER) from 16.6 to 8.4 on CSJMix2 (eval3), demonstrating the effectiveness of joint acoustic-linguistic modeling across languages.
The evaluation assumes that the simulated two-speaker mixtures (LibriSpeechMix, CSJMix) and the IHM-mix condition of AMI sufficiently represent the acoustic and linguistic statistics of real overlapping conversations, where speaker turns, noise, and context vary more widely.
CALM jointly models acoustic speaker identity and linguistic context, cutting biased error rates by roughly half or more on two-speaker English and Japanese mixtures.
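The size of the improvement follows directly from the figures quoted in the claims above. A minimal sketch of that arithmetic, where the helper name `relative_reduction` is hypothetical and only the four error rates come from this record:

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative error-rate reduction as a percentage."""
    return 100.0 * (before - after) / before


# Error rates quoted in the claims (percent); the labels are descriptive only.
results = {
    "LibriSpeech2Mix B-WER": (12.7, 4.7),
    "CSJMix2 (eval3) B-CER": (16.6, 8.4),
}

for name, (before, after) in results.items():
    pct = relative_reduction(before, after)
    print(f"{name}: {before} -> {after} ({pct:.1f}% relative reduction)")
# LibriSpeech2Mix B-WER: 12.7 -> 4.7 (63.0% relative reduction)
# CSJMix2 (eval3) B-CER: 16.6 -> 8.4 (49.4% relative reduction)
```

On these two numbers, the English result is a reduction of well over half, while the Japanese eval3 result lands just under half.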
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T02:45:05.698741Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
50ae6eadfd0d332081d9232b36ef3d7d15a57379dc8f8b4303a5fe05bf752bb9
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KCXG5LP5BUZSBAOZEMVTN3Z5PU \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 50ae6eadfd0d332081d9232b36ef3d7d15a57379dc8f8b4303a5fe05bf752bb9
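The one-liner above can be unpacked into a short script for readers who prefer to check the hash offline. This is a minimal sketch, assuming the canonical form is exactly what the command shows (keys sorted, compact separators, no ASCII escaping, UTF-8 bytes). The function name `canonical_sha256` is hypothetical, and the sample record is copied from the canonical record JSON on this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record as the shell one-liner does, then SHA-256 it."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record as published on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "cd2790a0db990c662e3544ea8ad77ba01427a8f6d561f01a89a1d7d1454cfb20",
        "cross_cats_sorted": ["cs.CL", "cs.SD"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "eess.AS",
        "submitted_at": "2026-01-30T10:12:16Z",
        "title_canon_sha256": "539e452adb9e8cad4111a9bbe956f431d5145aca67b5f57c3d0557c80e80ca06",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.22792", "kind": "arxiv", "version": 2},
}

# Compare the printed digest with the "# expect:" value in the command above.
print(canonical_sha256(record))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the server happens to emit fields; any change to a value, however small, produces a different digest.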
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "cd2790a0db990c662e3544ea8ad77ba01427a8f6d561f01a89a1d7d1454cfb20",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.SD"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2026-01-30T10:12:16Z",
    "title_canon_sha256": "539e452adb9e8cad4111a9bbe956f431d5145aca67b5f57c3d0557c80e80ca06"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.22792",
    "kind": "arxiv",
    "version": 2
  }
}