Simultaneous multi-objective alignment across verifiable and non-verifiable rewards

Yiran Shen, Yu Xia, Jonathan Chang, Prithviraj Ammanabrolu · 2025 · arXiv 2510.01167

2 Pith papers cite this work.



representative citing papers

Many Preferences, Few Policies: Towards Scalable Language Model Personalization

cs.CL · 2026-04-05 · unverdicted · novelty 7.0

PALM produces a small portfolio of LLMs that contains a near-optimal model for any user preference weight vector, with theoretical bounds on portfolio size and approximation quality.
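The portfolio criterion in this summary can be made concrete with a small sketch. It assumes, purely for illustration, that each candidate model is summarized by a vector of per-objective rewards and that a user's preference is a weight vector w that scores a model by the weighted sum w·r. The function names, the greedy selection rule, and any reward numbers are hypothetical; this is not PALM's algorithm. The worst-case gap measures how far the best portfolio member falls below the best candidate overall, across a grid of weight vectors.

```python
import numpy as np

def worst_case_gap(rewards, portfolio, weights):
    """Largest shortfall of the portfolio versus all candidates.

    rewards:   (n_models, n_obj) per-objective reward of each candidate (hypothetical)
    portfolio: list of model indices kept in the portfolio
    weights:   (n_weights, n_obj) preference weight vectors to check
    """
    scores = weights @ rewards.T                  # (n_weights, n_models): w . r_m
    best_all = scores.max(axis=1)                 # best achievable per weight vector
    best_port = scores[:, portfolio].max(axis=1)  # best portfolio member per weight vector
    return float((best_all - best_port).max())

def greedy_portfolio(rewards, weights, k):
    """Hypothetical greedy rule: add the model that most reduces the worst-case gap."""
    chosen = []
    for _ in range(k):
        candidates = [m for m in range(len(rewards)) if m not in chosen]
        best = min(candidates,
                   key=lambda m: worst_case_gap(rewards, chosen + [m], weights))
        chosen.append(best)
    return chosen
```

Linear scalarization is used here only because it is the simplest preference model; a different utility would change only the scoring line.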

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

SURF derives weight sampling rules from the arc-length CDF of the scalarization path to uniformly traverse the Pareto front in multi-objective optimization.
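A minimal sketch of arc-length reparameterization for a two-objective problem follows. It assumes a toy weighted-sum problem with f1(x) = x² and f2(x) = (x - 1)², whose minimizer is x* = 1 - w, so the front point for weight w is ((1 - w)², w²). The function names and the toy problem are hypothetical, and the code is not SURF's sampling rule, only the general idea of inverting an arc-length CDF so that consecutive front points are evenly spaced in arc length rather than in w.

```python
import numpy as np

def front_point(w):
    # Hypothetical toy problem: minimize w*f1(x) + (1 - w)*f2(x) with
    # f1(x) = x**2 and f2(x) = (x - 1)**2; the minimizer is x* = 1 - w.
    x = 1.0 - w
    return np.array([x**2, (x - 1.0) ** 2])

def arc_length_weights(front, n_samples, grid_size=2001):
    """Weights whose front points are (approximately) evenly spaced in arc length."""
    grid = np.linspace(0.0, 1.0, grid_size)
    points = np.array([front(w) for w in grid])             # (grid_size, 2)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)   # polyline segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    cdf = cum / cum[-1]                                     # arc-length CDF over w
    targets = np.linspace(0.0, 1.0, n_samples)
    # Invert the CDF by linear interpolation: find w with cdf(w) = target.
    return np.interp(targets, cdf, grid)
```

On this toy front, five uniformly spaced weights give consecutive points roughly 0.44 apart near the ends but about 0.36 apart in the middle; the arc-length weights make that spacing nearly equal.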
