Pith Number

pith:SOH5VECQ

pith:2026:SOH5VECQLZ76DNVTH6JJPW3OZ6
not attested · not anchored · not stored · refs resolved

Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization

Alexander Polok, Ivan Medennikov, Jan Černocký, Lukáš Burget, Samuele Cornell, Shinji Watanabe

Synthetic conversational data approaches real-data baselines and mixing both yields substantial gains for multi-talker ASR and speaker diarization.

arxiv:2605.15442 v1 · 2026-05-14 · eess.AS

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{SOH5VECQLZ76DNVTH6JJPW3OZ6}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Synthetic-only training approaches real-data baselines, and combining simulated data with real recordings yields substantial gains over real-only training across both tasks.

C2 weakest assumption

The specific simulation choices and acoustic augmentations in FastMSS produce mixtures whose statistical properties are close enough to real conversational recordings that performance trends observed on synthetic data will transfer to real-world use.

C3 one-line summary

Task-dependent simulation strategies for synthetic conversational data allow synthetic-only training to approach real-data baselines for multi-talker ASR and diarization, with mixing yielding further gains.

References

72 extracted · 72 resolved · 3 Pith anchors

[1] Introduction Multi-talker conversational speech processing is undergoing a rapid transformation, driven largely by the shift from highly specialized pipelines to less data-hungry methods built on pret
[2] Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization 2026 · arXiv:2605.15442
[3] Multi-Speaker Conversation Simulation To enable controlled and fast experimentation along the axes described above, we developed FastMSS, an open-source multi-speaker conversation simulator focused o
[4] Experimental Setup 4.1. Datasets As source domains for synthetic generation, we use: LibriSpeech [49] (read speech, 960h), VoxPopuli [50] (semi-spontaneous parliamentary speech, 543h), otoSpeech [
[5] All datasets were re-aligned using the Montreal Forced Aligner [52] to ensure consistent word-level timestamps

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed 2026-05-20T00:00:58.798834Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

938fda90505e7fe1b6b33f9297db6ecfb4f833dcfcd2f4ee65238276b5fdcec2

Aliases

arxiv: 2605.15442 · arxiv_version: 2605.15442v1 · doi: 10.48550/arxiv.2605.15442 · pith_short_12: SOH5VECQLZ76 · pith_short_16: SOH5VECQLZ76DNVT · pith_short_8: SOH5VECQ
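
The short aliases listed here are fixed-length prefixes of the full Pith Number. The full number also coincides with the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. Both relationships are inferred from the values on this page rather than from a published specification, so the sketch below is an observation about this record only. The function names are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "938fda90505e7fe1b6b33f9297db6ecfb4f833dcfcd2f4ee65238276b5fdcec2"
PITH_NUMBER = "SOH5VECQLZ76DNVTH6JJPW3OZ6"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: Base32-encode the raw digest bytes, keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict:
    """Build the 8-, 12-, and 16-character aliases as plain prefixes."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    print(pith_number_from_hash(CANONICAL_HASH))
    print(short_aliases(PITH_NUMBER))
```

If the assumed derivation holds for other records, a short alias can be checked against a canonical hash without contacting the service.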
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SOH5VECQLZ76DNVTH6JJPW3OZ6 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 938fda90505e7fe1b6b33f9297db6ecfb4f833dcfcd2f4ee65238276b5fdcec2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c7db74dcf0436e3d25462141da0e6e42e0e66a74898fd36161926e839a4da411",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "eess.AS",
    "submitted_at": "2026-05-14T21:53:10Z",
    "title_canon_sha256": "e32003fb5efc1f3058cacc7ca202331703c5124bc925962b434adeddbd4d0377"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.15442",
    "kind": "arxiv",
    "version": 1
  }
}
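
The same check can be run offline against the record shown above. This is a minimal sketch that assumes the canonical form is exactly what the one-liner in the Agent API section produces: keys sorted, compact separators, `ensure_ascii=False`, UTF-8 bytes. The names `RECORD`, `canonical_bytes`, and `canonical_sha256` are illustrative and not part of any Pith API.

```python
import hashlib
import json

# Canonical record as displayed on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "c7db74dcf0436e3d25462141da0e6e42e0e66a74898fd36161926e839a4da411",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "eess.AS",
        "submitted_at": "2026-05-14T21:53:10Z",
        "title_canon_sha256": "e32003fb5efc1f3058cacc7ca202331703c5124bc925962b434adeddbd4d0377",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.15442", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    # Compare the printed digest with the value under "Canonical hash".
    print(canonical_sha256(RECORD))
```

A mismatch would indicate either a modified record or a canonicalization rule that differs from the one assumed here.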