Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use

· 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

cs.SD · 2026-01-28 · unverdicted · novelty 5.0

Prioritizing longest utterances in SSL speech pre-training data outperforms random or diversity-based sampling for ASR performance while using half the data volume.

citing papers explorer

A Study of Data Selection Strategies for Pre-training Self-Supervised Speech Models

cs.SD · 2026-01-28 · unverdicted · none · ref 21

Prioritizing longest utterances in SSL speech pre-training data outperforms random or diversity-based sampling for ASR performance while using half the data volume.