MEGA : Multilingual Evaluation of Generative AI

Ahuja, K · 2023 · DOI 10.18653/v1/2023.emnlp-main.258

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open at publisher browse 5 citing papers

representative citing papers

Evaluating Commercial AI Chatbots as News Intermediaries

cs.CL · 2026-05-21 · conditional · novelty 7.0

Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics

cs.CL · 2026-05-13 · accept · novelty 7.0

LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.

Do LLMs Use Cultural Knowledge Without Being Told? A Multilingual Evaluation of Implicit Pragmatic Adaptation

cs.CL · 2026-04-20 · conditional · novelty 7.0

LLMs recover only ~20% of explicit pragmatic shifts under implicit cultural cues across five languages, responding mainly to linguistic structure rather than cultural associations as shown by Hindi-Urdu controls.

Tracing the ongoing emergence of human-like reasoning in Large Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 4.0

LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

cs.CL · 2026-05-16 · unverdicted · novelty 2.0

A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

citing papers explorer

Showing 5 of 5 citing papers.

Evaluating Commercial AI Chatbots as News Intermediaries cs.CL · 2026-05-21 · conditional · none · ref 1
Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.
LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics cs.CL · 2026-05-13 · accept · none · ref 50
LLMs can provide cost-effective annotation of credibility in Danish asylum texts but produce inconsistent errors that vary by model and prompt, requiring checks beyond single-model accuracy.
Do LLMs Use Cultural Knowledge Without Being Told? A Multilingual Evaluation of Implicit Pragmatic Adaptation cs.CL · 2026-04-20 · conditional · none · ref 16
LLMs recover only ~20% of explicit pragmatic shifts under implicit cultural cues across five languages, responding mainly to linguistic structure rather than cultural associations as shown by Hindi-Urdu controls.
Tracing the ongoing emergence of human-like reasoning in Large Language Models cs.CL · 2026-05-20 · unverdicted · none · ref 93
LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human reasoning.
Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages cs.CL · 2026-05-16 · unverdicted · none · ref 123
A tutorial synthesizing foundations, recent models such as PALO and Maya, and low-cost methods for tri-modal multilingual AI in resource-constrained settings.

MEGA : Multilingual Evaluation of Generative AI

fields

years

verdicts

representative citing papers

citing papers explorer