Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models
3 Pith papers cite this work. Polarity classification is still indexing.
Citation-role and citation-polarity summary
Years: 2026 (3)
Verdicts: unverdicted (3)
Roles: background (2)
Polarities: background (2)
Citing papers
- Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
  OTCA improves GRPO training for visual generation by estimating step importance in trajectories and adaptively weighting multiple reward objectives.
- FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
  Sol-RL decouples FP4-based candidate exploration from BF16 policy optimization in diffusion RL, delivering up to 4.64x faster convergence with maintained or superior alignment performance on models like FLUX.1 and SD3.5.
- RSEdit: Text-Guided Image Editing for Remote Sensing
  RSEdit adapts off-the-shelf text-to-image models into a collection of editing systems that follow text instructions while keeping geospatial structure intact in remote sensing images.