RAD-DPO adds token-level gradient detachment, similarity-based dynamic reward weighting, and a multi-label global contrastive objective to DPO for better handling of hierarchical Semantic IDs and noisy feedback in e-commerce generative retrieval.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.IR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization for Generative Retrieval in E-commerce
RAD-DPO adds token-level gradient detachment, similarity-based dynamic reward weighting, and a multi-label global contrastive objective to DPO for better handling of hierarchical Semantic IDs and noisy feedback in e-commerce generative retrieval.