MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.
IndicGenBench : A multilingual benchmark to evaluate generation capabilities of LLM s on indic languages
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
Localizing judge prompts to five languages shows that LLM backbones interact with language in agent-as-a-judge evaluations, inverting rankings and revealing no universal best model with low inter-judge agreement.
citing papers explorer
-
MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation
MORPHOGEN is a new multilingual benchmark for testing LLMs on gender-aware morphological generation via rewriting first-person sentences to the opposite gender in French, Arabic, and Hindi.
-
Multilingual Prompt Localization for Agent-as-a-Judge: Language and Backbone Sensitivity in Requirement-Level Evaluation
Localizing judge prompts to five languages shows that LLM backbones interact with language in agent-as-a-judge evaluations, inverting rankings and revealing no universal best model with low inter-judge agreement.