The zero resource speech challenge 2021: Spoken language modelling

URL: https://doi · 2021 · DOI 10.21437/interspeech.2021-1755

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Moshi: a speech-text foundation model for real-time dialogue

eess.AS · 2024-09-17 · accept · novelty 7.0

Moshi is the first real-time full-duplex spoken large language model that casts dialogue as speech-to-speech generation using parallel audio streams and an inner monologue of time-aligned text tokens.

SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

cs.CL · 2025-12-24 · unverdicted · novelty 6.0

SpidR-Adapt uses meta-learning with a first-order bi-level optimization heuristic to adapt speech representations to new languages with less than 1 hour of data, achieving 100x better efficiency than standard training.

PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech

cs.CL · 2026-05-26 · unverdicted · novelty 5.0

Introduces the INSV-A automated screening benchmark for Pashto TTS systems, reporting WER, script fidelity, and LID results across five systems on FLEURS and Common Voice prompts.

citing papers explorer

Showing 3 of 3 citing papers.

Moshi: a speech-text foundation model for real-time dialogue · eess.AS · 2024-09-17 · accept · none · ref 26
SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation · cs.CL · 2025-12-24 · unverdicted · none · ref 12
PashtoTTS-Bench: automated screening for low-resource non-Latin-script text-to-speech · cs.CL · 2026-05-26 · unverdicted · none · ref 4
