CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
Interpretable preferences via multi-objective reward modeling and mixture-of-experts
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
MNPO extends NLHF to multiplayer Nash games, inheriting equilibrium guarantees while showing empirical gains on instruction-following benchmarks under diverse preferences.
citing papers explorer
-
Common-agency Games for Multi-Objective Test-Time Alignment
CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
-
Multiplayer Nash Preference Optimization
MNPO extends NLHF to multiplayer Nash games, inheriting equilibrium guarantees while showing empirical gains on instruction-following benchmarks under diverse preferences.