LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.
The F lores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 5representative citing papers
Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.
LQM introduces a six-level linguistically motivated error taxonomy for MT evaluation and applies it via expert annotation to LLM outputs on a new 3,850-sentence multi-dialect Arabic corpus.
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.
citing papers explorer
-
LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics
LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.
-
Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages
Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.
-
LQM: Linguistically Motivated Multidimensional Quality Metrics for Machine Translation
LQM introduces a six-level linguistically motivated error taxonomy for MT evaluation and applies it via expert annotation to LLM outputs on a new 3,850-sentence multi-dialect Arabic corpus.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
-
MMoA: An AI-Agent framework with recurrence for Memoried Mixure-of-Agent
MMoA adds LSTM recurrence to Mixture-of-Agents routing, reaching 58.0% win rate on AlpacaEval 2.0 versus 59.8% for baseline MoA while cutting runtime by up to 4.6%.