CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
arXiv preprint arXiv:2412.14922
3 Pith papers cite this work.
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
REALM learns per-annotator expertise scalars unsupervised by modeling each label as an expertise-weighted mixture of the model's prediction and a uniform random guess, delivering up to 50% accuracy gains over naive noisy supervised fine-tuning on question-answering benchmarks.
Label noise hurts fine-tuning performance most while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers.
citing papers explorer
- Common-agency Games for Multi-Objective Test-Time Alignment
  CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
- REALM: Reliable Expertise-Aware Language Model Fine-Tuning from Noisy Annotations
  REALM learns per-annotator expertise scalars unsupervised by modeling each label as an expertise-weighted mixture of the model's prediction and a uniform random guess, delivering up to 50% accuracy gains over naive noisy supervised fine-tuning on question-answering benchmarks.
- Analyzing the Effect of Noise in LLM Fine-tuning
  Label noise hurts fine-tuning performance most while grammatical and typographical noise sometimes act as mild regularizers, with changes concentrated in task-specific layers.