Rank analysis of incomplete block designs: I. the method of paired comparisons,
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
citing papers explorer
- SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning
  SENIOR improves feedback efficiency and policy learning speed in PbRL by combining motion-distinction query selection via kernel density estimation with preference-guided intrinsic rewards, showing gains on simulated and real robot tasks.
- Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR
  Rubric changes shift implicit-commitment labels by 70-83% agreement and make some metrics uninformative, so model rankings only stabilize after a metric-identifiability audit.
- BalancedDPO: Adaptive Multi-Metric Alignment
  BalancedDPO applies majority-vote consensus from multiple preference scorers and dynamic reference model updates within DPO to achieve multi-metric alignment for text-to-image diffusion models, reporting improved win rates on Pick-a-Pic, PartiPrompt, and HPD datasets across SD 1.5, 2.1, and SDXL.