Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
arXiv preprint arXiv:2212.09611 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
method 1polarities
use method 1representative citing papers
DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.
citing papers explorer
-
Response Time Enhances Alignment with Heterogeneous Preferences
Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
-
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.
-
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT aligns generative models by ranking samples with a reward model and fine-tuning only on the top-ranked outputs, reporting gains on reward scores and automated metrics for LLMs and diffusion models.