Personalizing reinforcement learning from human feedback with varia- tional preference learning.Advances in Neural Information Processing Systems, 37:52516–52544, 2024

Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

POPI: Personalizing LLMs via Optimized Natural Language Preference Inference

cs.CL · 2025-10-17 · unverdicted · novelty 5.0

POPI distills user preferences into reusable natural-language summaries via a shared inference model and conditions a generator on them, trained jointly with RL to improve personalization quality while cutting context length by up to 10x on benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

POPI: Personalizing LLMs via Optimized Natural Language Preference Inference cs.CL · 2025-10-17 · unverdicted · none · ref 35
POPI distills user preferences into reusable natural-language summaries via a shared inference model and conditions a generator on them, trained jointly with RL to improve personalization quality while cutting context length by up to 10x on benchmarks.

Personalizing reinforcement learning from human feedback with varia- tional preference learning.Advances in Neural Information Processing Systems, 37:52516–52544, 2024

fields

years

verdicts

representative citing papers

citing papers explorer