Holistic evaluation of language models. Transactions on Machine Learning Research
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- Decoding-Time Debiasing via Process Reward Models: From Controlled Fill-in to Open-Ended Generation
Decoding-time use of process reward models for bias mitigation raises fairness scores by up to 0.40 on a bilingual benchmark while preserving fluency across four LLMs and extends to open-ended generation with low overhead.
- Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents
PSA-Eval reframes evaluation of trilingual public-space agents around traceable failures and regression testing, revealing cross-language score drift in a pilot despite high average performance.