Subword tokenization impairs the encoding of phonological knowledge in language models (LMs), but an IPA-based fine-tuning method restores it with minimal impact on other capabilities.
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study
2 Pith papers cite this work, both in cs.CL.
Representative citing papers:
OLaPh: Optimal Language Phonemizer
The hybrid OLaPh framework outperforms prior grapheme-to-phoneme (G2P) baselines on WikiPron. It also generates synthetic data for training an LLM that generalizes well to out-of-vocabulary terms.