pith. sign in

hub

Diffusion model alignment using direct preference optimization.arXiv preprint arXiv:2311.12908

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

hub tools

citation-role summary

background 2 baseline 1

citation-polarity summary

representative citing papers

Collective Recourse for Generative Urban Visualizations

cs.HC · 2025-09-15 · unverdicted · novelty 7.0

Collective recourse formalizes community reports to fix group harms in diffusion models for urban visualizations via a report-triage-fix-verify pipeline, four primitives, a mandate score, and synthetic evaluation of 240 reports.

VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion

cs.AI · 2026-04-08 · unverdicted · novelty 6.0 · 2 refs

VASR separates continuation and residual variance in reward-guided diffusion SMC, using optimal mass allocation and systematic resampling to achieve up to 26% better FID scores and faster runtimes than prior SMC and MCTS methods.

IdGlow: Dynamic Identity Modulation for Multi-Subject Generation

cs.CV · 2026-02-28 · unverdicted · novelty 6.0

IdGlow is a progressive two-stage diffusion framework that uses task-adaptive timestep scheduling, temporal gating, VLM prompt synthesis, and group-level DPO to balance identity preservation and scene coherence in multi-subject image generation.

Listener-Rewarded Thinking in VLMs for Image Preferences

cs.CV · 2025-06-28 · unverdicted · novelty 6.0

Listener-augmented GRPO uses an independent frozen VLM to provide dense confidence scores on reasoning traces, yielding 67.4% accuracy on ImageReward, up to +6% OOD gains on 1.2M-vote human data, and fewer reasoning contradictions.

Diffusion Policy Policy Optimization

cs.RO · 2024-09-01 · unverdicted · novelty 6.0

DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

D2 Actor Critic: Diffusion Actor Meets Distributional Critic

cs.LG · 2025-10-03 · unverdicted · novelty 5.0

D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.

citing papers explorer

Showing 12 of 12 citing papers.

  • $Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models cs.CV · 2026-04-26 · unverdicted · none · ref 44

    Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.

  • Collective Recourse for Generative Urban Visualizations cs.HC · 2025-09-15 · unverdicted · none · ref 19

    Collective recourse formalizes community reports to fix group harms in diffusion models for urban visualizations via a report-triage-fix-verify pipeline, four primitives, a mandate score, and synthetic evaluation of 240 reports.

  • Diffusion Domain Expansion: Learning to Coordinate Pre-trained Diffusion Models cs.LG · 2026-05-22 · unverdicted · none · ref 54

    DDE introduces a compact coordinator network that combines denoised outputs from pre-trained diffusion models to enable generation in larger domains and complex conditioning settings.

  • VASR: Variance-Aware Systematic Resampling for Reward-Guided Diffusion cs.AI · 2026-04-08 · unverdicted · none · ref 12 · 2 links

    VASR separates continuation and residual variance in reward-guided diffusion SMC, using optimal mass allocation and systematic resampling to achieve up to 26% better FID scores and faster runtimes than prior SMC and MCTS methods.

  • IdGlow: Dynamic Identity Modulation for Multi-Subject Generation cs.CV · 2026-02-28 · unverdicted · none · ref 30

    IdGlow is a progressive two-stage diffusion framework that uses task-adaptive timestep scheduling, temporal gating, VLM prompt synthesis, and group-level DPO to balance identity preservation and scene coherence in multi-subject image generation.

  • Listener-Rewarded Thinking in VLMs for Image Preferences cs.CV · 2025-06-28 · unverdicted · none · ref 27

    Listener-augmented GRPO uses an independent frozen VLM to provide dense confidence scores on reasoning traces, yielding 67.4% accuracy on ImageReward, up to +6% OOD gains on 1.2M-vote human data, and fewer reasoning contradictions.

  • Diffusion Policy Policy Optimization cs.RO · 2024-09-01 · unverdicted · none · ref 97

    DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

  • VideoPhy: Evaluating Physical Commonsense for Video Generation cs.CV · 2024-06-05 · conditional · none · ref 101

    VideoPhy benchmark shows state-of-the-art text-to-video models follow physical commonsense and text prompts in only 39.6% of cases for the best model.

  • Seed-TTS: A Family of High-Quality Versatile Speech Generation Models eess.AS · 2024-06-04 · unverdicted · none · ref 38

    Seed-TTS models produce speech matching human naturalness and speaker similarity, with added controllability via self-distillation and reinforcement learning.

  • Scaling Rectified Flow Transformers for High-Resolution Image Synthesis cs.CV · 2024-03-05 · conditional · none · ref 191

    Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

  • D2 Actor Critic: Diffusion Actor Meets Distributional Critic cs.LG · 2025-10-03 · unverdicted · none · ref 34

    D2AC combines a diffusion actor with a distributional critic via fused distributional RL and clipped double Q-learning to reach state-of-the-art results on 18 hard control benchmarks including Humanoid, Dog, and Shadow Hand.

  • Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey cs.CV · 2025-05-23 · accept · none · ref 9

    A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.