Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.
CoRR , volume =
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CL 9roles
background 2polarities
background 2representative citing papers
This systematic survey organizes prompt engineering into a taxonomy of 58 LLM techniques and 40 others, supplies a shared vocabulary, and offers guidelines for state-of-the-art models.
Introduces the CHA-Gen dataset for Chinese ambiguity based on Potential Ambiguity Theory and shows LLMs struggle to detect ambiguity, exhibiting specific failure modes and overconfidence after instruction tuning.
RouteLMT learns to route MT requests to large or small LLMs by predicting marginal quality gain from small-model token representations, yielding a better quality-budget Pareto frontier than baselines.
Translating unsafe inputs to low-resource languages jailbreaks GPT-4 at rates on par with or exceeding state-of-the-art attacks.
Data augmentation via LLMs and back-translation produces task-specific effects on NER and POS tagging for Hausa and Fongbe, with no consistent gains over baseline and opposite outcomes across tasks for the same synthetic data.
GPT-4o Mini extracts 6-41 times more usable Hausa and Fongbe text per API call than Gemini 2.5 Flash, with optimal elicitation strategies differing by language.
Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
citing papers explorer
-
Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages
Nsanku benchmark shows current LLMs achieve only modest zero-shot translation scores on 43 Ghanaian languages, with no model reaching both high average performance and high cross-language consistency.
-
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques
This systematic survey organizes prompt engineering into a taxonomy of 58 LLM techniques and 40 others, supplies a shared vocabulary, and offers guidelines for state-of-the-art models.
-
Evaluating Chinese Ambiguity Understanding in Large Language Models
Introduces the CHA-Gen dataset for Chinese ambiguity based on Potential Ambiguity Theory and shows LLMs struggle to detect ambiguity, exhibiting specific failure modes and overconfidence after instruction tuning.
-
RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment
RouteLMT learns to route MT requests to large or small LLMs by predicting marginal quality gain from small-model token representations, yielding a better quality-budget Pareto frontier than baselines.
-
Low-Resource Languages Jailbreak GPT-4
Translating unsafe inputs to low-resource languages jailbreaks GPT-4 at rates on par with or exceeding state-of-the-art attacks.
-
When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP
Data augmentation via LLMs and back-translation produces task-specific effects on NER and POS tagging for Hausa and Fongbe, with no consistent gains over baseline and opposite outcomes across tasks for the same synthetic data.
-
Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe
GPT-4o Mini extracts 6-41 times more usable Hausa and Fongbe text per API call than Gemini 2.5 Flash, with optimal elicitation strategies differing by language.
-
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
-
Benchmark Data Contamination of Large Language Models: A Survey
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.