hub

Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663

Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy · 2023

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

browse 16 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 3 dataset 1

citation-polarity summary

background 3 use dataset 1

representative citing papers

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

cs.CV · 2026-05-07 · unverdicted · novelty 8.0

CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.

Arena-T2I Hard: Benchmarking and Improving Faithfulness with Dependency-Aware Checklist

cs.AI · 2026-06-30 · unverdicted · novelty 7.0

Arena-T2I Hard benchmark with ~30 decomposed constraints per prompt and a dependency-aware checklist reward yields better faithfulness-aesthetics trade-off than single-reward or weighted-sum baselines on SD3.5-Medium and FLUX.1-dev.

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment

cs.AI · 2026-05-17 · unverdicted · novelty 7.0 · 2 refs

AutoRubric-T2I learns and selects explicit rubrics from preference pairs to guide VLM judges, producing high-quality interpretable rewards for T2I alignment with far less data than traditional Bradley-Terry models.

DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DiffusionOPD applies online policy distillation from per-task teachers to a unified diffusion student, with a derived closed-form per-step KL objective that unifies SDE and ODE sampling via mean matching.

CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating

cs.CV · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

CaC presents a new spatiotemporal concentrating reward model for video anomalies, built on a novel large-scale dataset and three-stage training with RL and IoU rewards, claiming 25.7% accuracy gains and 11.7% anomaly reduction.

LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.

Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.

Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

cs.LG · 2025-09-26 · conditional · novelty 7.0

Derives exact guidance transition rates for discrete flow matching models that require only one model evaluation per sampling step and unify prior approximation-based methods.

Sequentially-Controlled Interactive Multi-Particle Flow-Maps for Online Feedback-Driven Search

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

IMPFM is a multi-particle flow-map sampling method with sequential posterior sharing and interaction-aware correction that targets a KL-tilted distribution for global exploration in online feedback search.

Hierarchical Variational Policies for Reward-Guided Diffusion

cs.LG · 2026-05-20 · conditional · novelty 6.0

A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Auto-Rubric as Reward externalizes VLM preferences into structured rubrics and applies Rubric Policy Optimization to create more reliable binary rewards for multimodal generation, outperforming pairwise models on text-to-image and editing benchmarks.

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

cs.LG · 2026-04-08 · unverdicted · novelty 6.0

Sol-RL decouples FP4-based candidate exploration from BF16 policy optimization in diffusion RL, delivering up to 4.64x faster convergence with maintained or superior alignment performance on models like FLUX.1 and SD3.5.

Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence

cs.CV · 2026-07-08 · conditional · novelty 5.0

LingBot-Video is an open-source MoE video foundation model for embodied intelligence that scales to 120B parameters, integrates robot data, and uses multi-dimensional RL to improve physical plausibility.

A Systematic Post-Train Framework for Video Generation

cs.CV · 2026-04-28 · unverdicted · novelty 5.0

A post-training pipeline for video generation models combines SFT, RLHF with novel GRPO, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion

cs.LG · 2026-04-27 · unverdicted · novelty 5.0

Diffusion Templates is a unified plugin framework that allows injecting various controllable capabilities into diffusion models through a standardized interface.

citing papers explorer

Showing 16 of 16 citing papers.

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation cs.CV · 2026-05-07 · unverdicted · none · ref 19
CDM migrates distribution matching distillation to continuous time via dynamic random-length schedules and active off-trajectory latent alignment, yielding competitive few-step image fidelity on SD3 and Longcat-Image.
Arena-T2I Hard: Benchmarking and Improving Faithfulness with Dependency-Aware Checklist cs.AI · 2026-06-30 · unverdicted · none · ref 27
Arena-T2I Hard benchmark with ~30 decomposed constraints per prompt and a dependency-aware checklist reward yields better faithfulness-aesthetics trade-off than single-reward or weighted-sum baselines on SD3.5-Medium and FLUX.1-dev.
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment cs.AI · 2026-05-17 · unverdicted · none · ref 15 · 2 links
AutoRubric-T2I learns and selects explicit rubrics from preference pairs to guide VLM judges, producing high-quality interpretable rewards for T2I alignment with far less data than traditional Bradley-Terry models.
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models cs.LG · 2026-05-14 · unverdicted · none · ref 7
DiffusionOPD applies online policy distillation from per-task teachers to a unified diffusion student, with a derived closed-form per-step KL objective that unifies SDE and ODE sampling via mean matching.
CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating cs.CV · 2026-05-12 · unverdicted · none · ref 21 · 2 links
CaC presents a new spatiotemporal concentrating reward model for video anomalies, built on a novel large-scale dataset and three-stage training with RL and IoU rewards, claiming 25.7% accuracy gains and 11.7% anomaly reduction.
LENS: Low-Frequency Eigen Noise Shaping for Efficient Diffusion Sampling cs.CV · 2026-05-08 · unverdicted · none · ref 18
LENS shapes low-frequency eigen noise with a lightweight network to enable efficient, high-quality sampling in distilled diffusion models.
Arena as Offline Reward: Efficient Fine-Grained Preference Optimization for Diffusion Models cs.CV · 2026-05-07 · unverdicted · none · ref 15
ArenaPO infers Gaussian capability distributions from pairwise preferences and applies truncated-normal latent inference to derive fine-grained offline rewards for preference optimization of text-to-image diffusion models.
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching cs.LG · 2025-09-26 · conditional · none · ref 34
Derives exact guidance transition rates for discrete flow matching models that require only one model evaluation per sampling step and unify prior approximation-based methods.
Sequentially-Controlled Interactive Multi-Particle Flow-Maps for Online Feedback-Driven Search cs.LG · 2026-07-01 · unverdicted · none · ref 18
IMPFM is a multi-particle flow-map sampling method with sequential posterior sharing and interaction-aware correction that targets a KL-tilted distribution for global exploration in online feedback search.
Hierarchical Variational Policies for Reward-Guided Diffusion cs.LG · 2026-05-20 · conditional · none · ref 27
A hierarchical variational formulation amortizes test-time guidance in diffusion models to achieve strong quality-speed tradeoffs with significantly reduced inference compute.
Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping cs.CV · 2026-05-11 · unverdicted · none · ref 27
Super-Linear Advantage Shaping (SLAS) introduces a non-linear geometric policy update for RL post-training of text-to-image models that reshapes the local policy space via advantage-dependent Fisher-Rao weighting to reduce reward hacking and improve performance over GRPO baselines.
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria cs.AI · 2026-05-08 · unverdicted · none · ref 19
Auto-Rubric as Reward externalizes VLM preferences into structured rubrics and applies Rubric Policy Optimization to create more reliable binary rewards for multimodal generation, outperforming pairwise models on text-to-image and editing benchmarks.
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling cs.LG · 2026-04-08 · unverdicted · none · ref 25
Sol-RL decouples FP4-based candidate exploration from BF16 policy optimization in diffusion RL, delivering up to 4.64x faster convergence with maintained or superior alignment performance on models like FLUX.1 and SD3.5.
Scaling Mixture-of-Experts Video Pretraining for Embodied Intelligence cs.CV · 2026-07-08 · conditional · none · ref 48
LingBot-Video is an open-source MoE video foundation model for embodied intelligence that scales to 120B parameters, integrates robot data, and uses multi-dimensional RL to improve physical plausibility.
A Systematic Post-Train Framework for Video Generation cs.CV · 2026-04-28 · unverdicted · none · ref 42
A post-training pipeline for video generation models combines SFT, RLHF with novel GRPO, prompt enhancement, and inference optimization to improve visual quality, temporal coherence, and instruction following.
Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion cs.LG · 2026-04-27 · unverdicted · none · ref 19
Diffusion Templates is a unified plugin framework that allows injecting various controllable capabilities into diffusion models through a standardized interface.

Pick-a-pic: An open dataset of user preferences for text-to-image generation.Advances in neural information processing systems, 36:36652–36663

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer