PAWS: Preference Learning with Advantage-Weighted Segments

Aleksandar Taranovic; Ge Li; Gerhard Neumann; Huy Le; Niklas Freymuth; Onur Celik; Rania Rayyes; Serge Thilges; Tai Hoang

arxiv: 2606.11982 · v1 · pith:LYJ6LMLGnew · submitted 2026-06-10 · 💻 cs.LG

Keywords: learning, PAWS, policy, preference, utility, existing, functions, optimization

Abstract

Preference-based reinforcement learning (PbRL) learns policies from human trajectory-level comparisons, avoiding explicit reward design and expert demonstrations. Existing methods typically train utility functions on trajectory- or segment-level preferences but rely on per-step utility estimates during policy optimization. This mismatch between training and inference induces a distribution shift that severely degrades temporal credit assignment and limits policy learning. We analyze this issue and propose PAWS, a segment-based preference learning method that performs policy updates directly with segment-level advantage functions. By aligning utility training with policy optimization, PAWS preserves trajectory-level preference information and avoids unreliable per-step learning signals. Experiments on simulated robotic manipulation and locomotion tasks show that PAWS consistently outperforms existing PbRL approaches, highlighting the importance of distribution-consistent preference learning.
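The abstract does not state PAWS's exact objectives. As an illustration only, here is a minimal Python sketch of the two ingredients it names, built on our own assumptions. The first is a Bradley-Terry likelihood over scalar segment utilities, a common PbRL choice that is assumed here. The second is an advantage-weighted (AWR-style) policy loss. In that loss, each whole segment's log-likelihood is scaled by exp(A / beta), with the segment-level advantage A taken as the segment utility minus a batch-mean baseline. All function names, `beta`, and the weight cap `max_weight` are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bradley_terry_prob(u_a: float, u_b: float) -> float:
    """P(segment a preferred over segment b) = sigmoid(u_a - u_b),
    assuming a Bradley-Terry model on scalar segment utilities."""
    return math.exp(-softplus(-(u_a - u_b)))


def preference_loss(u_a: float, u_b: float, a_preferred: bool) -> float:
    """Negative log-likelihood of one segment-pair preference label."""
    d = u_a - u_b
    return softplus(-d) if a_preferred else softplus(d)


def segment_advantage_weights(utilities: Sequence[float], beta: float = 1.0,
                              max_weight: float = 20.0) -> List[float]:
    """Hypothetical segment-level advantage A = U(segment) - batch mean,
    mapped to clipped exponential weights exp(A / beta)."""
    baseline = sum(utilities) / len(utilities)
    log_cap = math.log(max_weight)
    # Clip in log space before exponentiating so large advantages cannot overflow.
    return [math.exp(min((u - baseline) / beta, log_cap)) for u in utilities]


def weighted_policy_loss(segment_logprobs: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> float:
    """Advantage-weighted negative log-likelihood. Each segment's summed
    per-step log pi(a_t | s_t) is scaled by a single weight for the whole
    segment, so no per-step utility estimate is used."""
    total = sum(weights)
    return -sum(w * sum(lps) for w, lps in zip(weights, segment_logprobs)) / total


if __name__ == "__main__":
    # Hypothetical utility-model outputs for three segments and their per-step log-probs.
    utils = [0.3, 1.2, -0.5]
    w = segment_advantage_weights(utils, beta=0.5)
    loss = weighted_policy_loss([[-1.0, -0.8], [-0.6, -0.4], [-1.5, -1.2]], w)
    print(w, loss)
```

In this sketch every step in a segment shares one weight. The policy update therefore consumes only segment-level quantities, the same granularity the utility is trained on, which is the alignment the abstract argues for. How PAWS actually defines its baseline and temperature is not specified in the abstract.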
