Direct judgement preference optimization

Peifeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty · 2024 · arXiv 2409.14664

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization

cs.CL · 2025-09-28 · unverdicted · novelty 6.0

Fine-tuned LLM judges struggle with future-proofing to newer generators but maintain backward-compatibility more easily; DPO training and continual learning improve adaptation while all models degrade on unseen questions.

Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs

cs.AI · 2024-10-24 · unverdicted · novelty 4.0

Data-centric filtering yields an 80K preference dataset and reward models that lead RewardBench while boosting other top entries.

citing papers explorer

Showing 2 of 2 citing papers.

On the Shelf Life of Fine-Tuned LLM-Judges: Future-Proofing, Backward-Compatibility, and Question Generalization · cs.CL · 2025-09-28 · unverdicted · none · ref 42
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs · cs.AI · 2024-10-24 · unverdicted · none · ref 21
