COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
On the Cross-lingual Transferability of Monolingual Representations
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 6roles
dataset 1polarities
use dataset 1representative citing papers
The study filters non-English Wikipedia, reveals quality problems, proposes a 4-level ranking, and shows filtered data matches or beats raw data in language modeling with largest gains for lower-quality editions.
Selective replacement of the worst 20-30% of text-only subtitle segments with visual-enhanced outputs raises COMET scores for Indic languages, but full visual grounding is ineffective because of temporal misalignment between subtitles and frames.
Incidental multilingualism from uneven web training makes LLMs unequal, brittle, and opaque across languages.
A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over strong baselines.
Activation steering with FLORES-derived language vectors produces modest, layer-sensitive and language-dependent gains on cultural awareness tasks, with some settings degrading performance and strong interaction with prompt design.
citing papers explorer
-
COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
-
How Good is Your Wikipedia? Auditing Data Quality for Low-resource and Multilingual NLP
The study filters non-English Wikipedia, reveals quality problems, proposes a 4-level ranking, and shows filtered data matches or beats raw data in language modeling with largest gains for lower-quality editions.
-
Towards Visually-Guided Movie Subtitle Translation for Indic Languages
Selective replacement of the worst 20-30% of text-only subtitle segments with visual-enhanced outputs raises COMET scores for Indic languages, but full visual grounding is ineffective because of temporal misalignment between subtitles and frames.
-
Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs
Incidental multilingualism from uneven web training makes LLMs unequal, brittle, and opaque across languages.
-
Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance
A new pre-training task that maps languages bidirectionally in embedding space improves machine translation by up to 11.9 BLEU, cross-lingual QA by 6.72 BERTScore points, and understanding accuracy by over 5% over strong baselines.
-
DFKI-MLT at SemEval-2026 TASK 7: Steering Multilingual Models Towards Cultural Knowledge
Activation steering with FLORES-derived language vectors produces modest, layer-sensitive and language-dependent gains on cultural awareness tasks, with some settings degrading performance and strong interaction with prompt design.