Computational Linguistics, volume =
2 Pith papers cite this work.
Fields: cs.CL (2)
Years: 2026 (2)
Citing papers
- Prosa: Rubric-Based Evaluation of LLMs on Real User Chats in Brazilian Portuguese
  Prosa demonstrates that rubric-based binary scoring with multi-judge filtering yields full agreement on 16 LLM rankings across judges on Brazilian Portuguese chats, compared to only 7/16 under holistic scoring, while widening score gaps by 47%.
- StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
  300 high-quality Stoic examples align small LLMs with inward virtues via preference optimization but leave outward cosmopolitan duties unlearned.