Reinforcement learning against a reward model trained on human comparisons of summaries produces summaries that humans prefer to those from supervised fine-tuning and to the human-written references on TL;DR, and the resulting models transfer to CNN/DM.
arXiv preprint arXiv:1910.00292
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background (2)
citation-polarity summary: background (2)
citing papers explorer
- Learning to summarize from human feedback
Reinforcement learning against a reward model trained on human comparisons of summaries produces summaries that humans prefer to those from supervised fine-tuning and to the human-written references on TL;DR, and the resulting models transfer to CNN/DM.
- DSA: Dynamic Step Allocation for Fast Autoregressive Video Generation
DSA adds a jointly trained confidence head to autoregressive video diffusion models that dynamically allocates fewer or more denoising steps per frame, achieving real-time generation at 22.63 FPS on an H100 while matching VBench quality.
- Flow map learning in nonlinear vector autoregressive models: influence of the feature-library structure on the training error
NVAR models exhibit training error scaling laws tied to feature library representation of Lie-series coefficients, with delays reducing one-step error but aiding long-horizon forecasts only under sufficient nonlinearity.
- Assisted Counterspeech Writing at the Crossroads of Hate Speech and Misinformation
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
- Mechanisms of Misgeneralization in Physical Sequence Modeling
Generative sequence models for physical tasks exhibit physical misgeneralization, in which local prediction errors propagate through physical measurements and distort aggregate distributions over quantities such as distance or energy; a data deviation kernel explains and predicts the shifts and supports a …
- SDFlow: Similarity-Driven Flow Matching for Time Series Generation
SDFlow learns a global transport map via similarity-driven flow matching in VQ latent space, using low-rank manifold decomposition and a categorical posterior to handle discreteness, yielding SOTA long-horizon performance and inference speedups.
- Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
- Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
Rolling Forcing generates multi-minute videos in real time by jointly denoising frames at increasing noise levels, anchoring attention to early frames, and using windowed distillation to limit error accumulation.
- MT-EditFlow: Reinforcement Learning for Multi-Turn Image Editing with Flow Matching
MT-EditFlow applies flow-matching RL with multi-reward aggregation to improve multi-turn image editing performance on models like FLUX.1-Kontext-dev by 6.85 points at turn 3.