pith:OV4SSAHP
Improving Automatic Speech Recognition for Speakers Treated for Oral Cancer using Data Augmentation and LLM Error Correction
Combining data augmentation and LLM error correction cuts word error rates by 40-50% for oral cancer speech recognition.
arxiv:2605.15854 v1 · 2026-05-15 · eess.AS
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OV4SSAHPIPZCVVMNDPQR7ZCQB4}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Overall, we achieve a 40% relative WER decrease for Whisper and a 50% relative WER decrease for MMS, indicating that a combination of data augmentation and LLM correction is a viable strategy for the recognition of OC speech.
The synthetic data produced by the augmentation techniques sufficiently captures the acoustic variability of real oral cancer speech and the LLM corrections do not systematically alter medically relevant content.
TTS data augmentation and LLM error correction together cut relative WER by 40-50% on ASR models for oral cancer speech.
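The claims above are stated as relative WER decreases. As a minimal sketch of what that metric means, the snippet below computes word error rate from a word-level edit distance and then the relative reduction between two WER values. The function names and the example numbers are hypothetical illustrations, not the paper's evaluation code or its reported absolute WERs.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline WER that was removed, e.g. 0.4 for a 40% decrease."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical values, chosen only to illustrate the arithmetic.
baseline, improved = 0.50, 0.30
print(f"relative WER decrease: {relative_wer_reduction(baseline, improved):.0%}")
```

Under this definition, a 40% relative decrease means the improved system makes 40% fewer word errors than the baseline, regardless of the absolute WER level.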
Receipt and verification
| First computed | 2026-05-20T00:01:22.053845Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
75792900ef43f22ad58d1be11fe4500f304138f1b50c94237e0dcbc974d9c60d
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OV4SSAHPIPZCVVMNDPQR7ZCQB4 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 75792900ef43f22ad58d1be11fe4500f304138f1b50c94237e0dcbc974d9c60d
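The same check can be written as a small Python module, which may be easier to reuse than the one-liner. This is a sketch that mirrors the serialization settings of the command above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the function names are hypothetical, and `fetch_canonical_record` assumes the endpoint returns a JSON object with a `canonical_record` key, as the `jq` filter above implies.

```python
import hashlib
import json
import urllib.request


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record the way the one-liner does: sorted keys, no whitespace, UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url: str) -> dict:
    """Fetch the record as JSON-LD and return its 'canonical_record' field.

    Assumes the response body is a JSON object containing that key.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["canonical_record"]


def verify(url: str, expected_hash: str) -> bool:
    """Return True when the recomputed hash matches the published one."""
    return canonical_sha256(fetch_canonical_record(url)) == expected_hash
```

To use it, call `verify("https://pith.science/pith/OV4SSAHPIPZCVVMNDPQR7ZCQB4", "75792900ef43f22ad58d1be11fe4500f304138f1b50c94237e0dcbc974d9c60d")`. Because keys are sorted before hashing, the key order of the fetched JSON does not affect the result.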
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "0d49769322e83b9d2455fc223f7be6c52ad783ee4640cdf2c10f840f962254a2",
"cross_cats_sorted": [],
"license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
"primary_cat": "eess.AS",
"submitted_at": "2026-05-15T11:13:25Z",
"title_canon_sha256": "97dbdc4f0101ffe7ae96c8cc0b137102f5762d08b8a8445ad45215b0e78ff8d6"
},
"schema_version": "1.0",
"source": {
"id": "2605.15854",
"kind": "arxiv",
"version": 1
}
}