Manning, and Chelsea Finn

Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

DDO-RM: Distribution-Level Policy Improvement after Reward Learning

stat.ML · 2026-04-13 · unverdicted · novelty 7.0

DDO-RM turns reward scores into a target distribution and applies KL-regularized mirror-descent projection on finite candidates to improve policies, outperforming DPO on Pythia-410M.

Pioneer Agent: Continual Improvement of Small Language Models in Production

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

citing papers explorer

Showing 2 of 2 citing papers.

DDO-RM: Distribution-Level Policy Improvement after Reward Learning stat.ML · 2026-04-13 · unverdicted · none · ref 3
DDO-RM turns reward scores into a target distribution and applies KL-regularized mirror-descent projection on finite candidates to improve policies, outperforming DPO on Pythia-410M.
Pioneer Agent: Continual Improvement of Small Language Models in Production cs.AI · 2026-04-10 · unverdicted · none · ref 75
Pioneer Agent automates the full lifecycle of adapting and continually improving small language models via diagnosis-driven data synthesis and regression-constrained retraining, delivering gains of 1.6-83.8 points on benchmarks and large lifts in production-style tasks.

Manning, and Chelsea Finn

fields

years

verdicts

representative citing papers

citing papers explorer