SMTPO uses multi-task SFT to improve simulator feedback quality and RL with fine-grained rewards to optimize multi-turn preference reasoning in LLM-based conversational recommendation.
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.IR (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
PFA adds a trainable fairness adapter to frozen recommenders and uses hierarchical exposure alignment to balance inter- and intra-group provider visibility, delivering substantial fairness gains with negligible accuracy loss on three public datasets.
citing papers explorer
- User Simulator-Guided Multi-Turn Preference Optimization for Reasoning LLM-based Conversational Recommendation
  SMTPO uses multi-task SFT to improve simulator feedback quality and RL with fine-grained rewards to optimize multi-turn preference reasoning in LLM-based conversational recommendation.
- Post-hoc Provider Fairness Adaptation via Hierarchical Exposure Alignment
  PFA adds a trainable fairness adapter to frozen recommenders and uses hierarchical exposure alignment to balance inter- and intra-group provider visibility, delivering substantial fairness gains with negligible accuracy loss on three public datasets.