arXiv preprint arXiv:2310.20487 (2023)
3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Verdicts: UNVERDICTED 3
Roles: background 1
Polarities: background 1
citing papers explorer
- DynamicPO: Dynamic Preference Optimization for Recommendation
  DynamicPO prevents preference optimization collapse in multi-negative DPO by adaptively selecting boundary-critical negatives and calibrating per-sample optimization strength, yielding higher recommendation accuracy on three public datasets.
- Can Explanations Improve Recommendations? Evidence from Prediction-Informed Explanations
  RecPIE jointly optimizes recommendation predictions and LLM-generated natural-language explanations via alternating training and reinforcement learning, yielding 3-4% accuracy gains and higher human preference on Google Maps POI data.
- Efficient LLM-based Advertising via Model Compression and Parallel Verification
  An Efficient Generative Targeting framework accelerates LLM inference in advertising via adaptive group quantization, layer-adaptive hierarchical sparsification, and prefix-tree parallel verification while accepting limited quality degradation.