SURF derives weight sampling rules from the arc-length CDF of the scalarization path to uniformly traverse the Pareto front in multi-objective optimization.
arXiv preprint arXiv:2505.20925 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5verdicts
UNVERDICTED 5representative citing papers
DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.
CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
RVPO penalizes variance across multiple reward signals during RLHF advantage aggregation, using a LogSumExp operator as a smooth variance penalty to reduce constraint neglect in LLM alignment.
C-BPO personalizes LLMs via preference-calibrated binary signals and PU learning theory to isolate inter-user differences from shared task knowledge.
citing papers explorer
-
SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front
SURF derives weight sampling rules from the arc-length CDF of the scalarization path to uniformly traverse the Pareto front in multi-objective optimization.
-
Dynamic Model Merging Made Slim
DiDi-Merging achieves dynamic model merging performance matching or exceeding prior methods while using only 1.24x to 1.4x the parameters of a single fine-tuned model.
-
Common-agency Games for Multi-Objective Test-Time Alignment
CAGE uses common-agency games and an EPEC algorithm to compute equilibrium policies that balance multiple conflicting objectives for test-time LLM alignment.
-
RVPO: Risk-Sensitive Alignment via Variance Regularization
RVPO penalizes variance across multiple reward signals during RLHF advantage aggregation, using a LogSumExp operator as a smooth variance penalty to reduce constraint neglect in LLM alignment.
-
Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework
C-BPO personalizes LLMs via preference-calibrated binary signals and PU learning theory to isolate inter-user differences from shared task knowledge.