arxiv: 2605.12288 · 2 revisions
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching