pith. machine review for the scientific record.

Dolphin-CN-Dialect: Where Chinese Dialects Matter

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We present Dolphin-CN-Dialect, a streaming-capable automatic speech recognition (ASR) model focused on Chinese and dialect-rich scenarios. Compared with its predecessor, Dolphin, Dolphin-CN-Dialect introduces substantial improvements in data processing, tokenization, training stability, and data sampling. To address the severe imbalance in dialect training data, we propose a temperature-based sampling strategy that balances standard Mandarin against low-resource dialects, yielding significant gains in dialect recognition. We also redesign the tokenizer to better match the linguistic characteristics of each language: Chinese is modeled at the character level and English with subwords, and extensible dialect tokens are introduced. Experiments show that Dolphin-CN-Dialect improves dialect recognition accuracy and reduces character error rate (CER) relative to Dolphin. It also achieves performance competitive with recent state-of-the-art (SOTA) open-source ASR models while being significantly smaller. Dolphin-CN-Dialect supports both streaming and non-streaming inference, offering a practical trade-off between latency and accuracy, and provides flexible customization through hotword support as well as efficient deployment optimized for specialized hardware. These improvements make Dolphin-CN-Dialect a strong, practical solution for real-world multi-dialect ASR applications.
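The abstract does not give the exact form of the temperature-based sampling. A common formulation, assumed here and not confirmed as the authors' method, samples each data source i with probability proportional to (n_i / N)^(1/T), where n_i is that source's amount of data and N the total. The function names and the per-dialect hour counts below are hypothetical and serve only as illustration.

```python
import random


def temperature_sampling_probs(hours: dict[str, float], temperature: float) -> dict[str, float]:
    """Per-source sampling probabilities p_i proportional to (n_i / N) ** (1 / T).

    T = 1 reproduces the natural data distribution; larger T flattens it
    toward uniform, raising the share of low-resource dialects.
    (Assumed formulation, not taken from the paper.)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(hours.values())
    # Raise each source's natural share to the power 1/T ...
    weights = {name: (n / total) ** (1.0 / temperature) for name, n in hours.items()}
    # ... then renormalize so the probabilities sum to 1.
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


def sample_sources(probs: dict[str, float], k: int, seed: int = 0) -> list[str]:
    """Draw k source labels (e.g. one per utterance slot in a batch) from probs."""
    rng = random.Random(seed)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=k)


if __name__ == "__main__":
    # Hypothetical hour counts, for illustration only.
    hours = {"mandarin": 9000.0, "cantonese": 800.0, "minnan": 150.0, "wu": 50.0}
    for t in (1.0, 3.0):
        probs = temperature_sampling_probs(hours, t)
        print(t, {name: round(p, 3) for name, p in probs.items()})
```

With these made-up counts, T = 1 keeps Mandarin at 90% of draws, while T = 3 lowers it to roughly 53% and lifts the smallest source from 0.5% to about 9%. The temperature is the single knob that trades fidelity to the natural distribution against exposure to rare dialects.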

fields: cs.CL (1)

years: 2026 (1)

verdicts: UNVERDICTED (1)

representative citing papers

Dolphin-CN-Dialect: Where Chinese Dialects Matter

cs.CL · 2026-05-09 · unverdicted · novelty 4.0

Dolphin-CN-Dialect is a compact ASR model that improves recognition accuracy on Chinese dialects through balanced sampling of rare dialects and character-level tokenization, while remaining smaller than recent open-source competitors.
