4 Pith papers cite this work.
Fields: cs.CL (4)
Years: 2026 (4)
citing papers explorer
- OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages
  OpenBibleTTS supplies speech data and alignments for 37 underrepresented languages and shows that no single TTS system leads on all metrics: Gemini-TTS scores highest in listener ratings, but monolingual EveryVoice models are strongest on intelligibility for several African languages.
- Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation
  Activation steering on early layers improves the diversity of synthetic data for low-resource languages and often boosts downstream classifier performance compared to non-steered prompting.
- Beyond "To whom it may concern": Tailoring Machine Translation to Audience and Intent
  Explicit purpose instructions improve LLM translation adaptedness across 50 languages and 8 domains, with larger gains on informal text, while standard metrics often penalize the adapted outputs.
- Model-Based Quality Assessment for Massively Multilingual Parallel Data
  Large-scale benchmarks of multilingual embeddings and QE models show no universal best performer; the authors recommend direction-aware routing and calibration for parallel data assessment.