arXiv preprint arXiv:2302.14520
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
citing papers explorer
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
  A panel of smaller diverse LLMs outperforms a single large model as an evaluator of generations, showing less intra-model bias and over 7x lower cost.
- Agentic AI Translate: An Agentic Translator Prototype for Translation as Communication Design
  Describes a conceptual agentic prototype for AI translation that operationalizes skopos theory and GEMBA-MQM verification into a four-stage cycle with user dialogue and memory for coherence.
- An Explainable Approach to Document-level Translation Evaluation with Topic Modeling
  A topic-modeling framework measures document-level thematic consistency in translations by aligning key tokens across languages with a bilingual dictionary and scoring via cosine similarity, providing explainable insights beyond sentence-level metrics.
- Cross-Lingual Attention Distillation with Personality-Informed Generative Augmentation for Multilingual Personality Recognition
  ADAM uses personality-guided LLM augmentation and cross-lingual attention distillation to raise balanced accuracy on multilingual personality recognition to 0.6332 on Essays and 0.7448 on Kaggle, outperforming standard BCE loss.