RubricRL: Simple Generalizable Rewards for Text-to-Image Generation

Chunming Qiao; Dongdong Chen; Junsong Yuan; Xuelu Feng; Yunsheng Li; Zixuan Gao; Ziyu Wan

arxiv: 2511.20651 · v3 · pith:7VCN6UPUnew · submitted 2025-11-25 · 💻 cs.CV

Xuelu Feng, Yunsheng Li, Ziyu Wan, Zixuan Gao, Junsong Yuan, Dongdong Chen, Chunming Qiao

classification 💻 cs.CV

keywords rubricrl · text-to-image · interpretable · reward · design · human · interpretability · models

Reinforcement learning (RL) has recently emerged as a promising approach for aligning text-to-image generative models with human preferences. A key challenge, however, lies in designing effective and interpretable rewards. Existing methods often rely on either composite metrics (e.g., CLIP, OCR, and realism scores) with fixed weights or a single scalar reward distilled from human preference models, which can limit interpretability and flexibility. We propose RubricRL, a simple and general framework for rubric-based reward design that offers greater interpretability, composability, and user control. Instead of using a black-box scalar signal, RubricRL dynamically constructs a structured rubric for each prompt--a decomposable checklist of fine-grained visual criteria such as object correctness, attribute accuracy, OCR fidelity, and realism--tailored to the input text. Each criterion is independently evaluated by a multimodal judge (e.g., o4-mini), and a prompt-adaptive weighting mechanism emphasizes the most relevant dimensions. This design not only produces interpretable and modular supervision signals for policy optimization (e.g., GRPO or PPO), but also enables users to directly adjust which aspects to reward or penalize. Experiments with an autoregressive text-to-image model demonstrate that RubricRL improves prompt faithfulness, visual detail, and generalizability, while offering a flexible and extensible foundation for interpretable RL alignment across text-to-image architectures.
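
The rubric-based reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give the aggregation formula, so the weighted average below, the `Criterion` class, the function names, the weights, and the judge scores are all assumptions made for illustration, not the authors' implementation. The multimodal judge is replaced by hard-coded scores, and the advantage step uses the group-wise standardization commonly associated with GRPO.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Criterion:
    name: str      # e.g. "object correctness"
    weight: float  # assumed prompt-adaptive weight, non-negative


def rubric_reward(rubric, scores):
    """Weighted average of per-criterion judge scores (assumed to lie in [0, 1])."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric needs at least one positive weight")
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric for a single prompt; the weights are illustrative only.
rubric = [
    Criterion("object correctness", 2.0),
    Criterion("OCR fidelity", 1.0),
    Criterion("realism", 1.0),
]

# Hypothetical per-criterion judge outputs for two sampled images.
judge_scores = [
    {"object correctness": 1.0, "OCR fidelity": 0.0, "realism": 1.0},
    {"object correctness": 0.0, "OCR fidelity": 1.0, "realism": 1.0},
]

rewards = [rubric_reward(rubric, s) for s in judge_scores]
advantages = group_advantages(rewards)
print(rewards)  # [0.75, 0.5]
```

Under these assumptions, a user could stop rewarding an aspect by setting its weight to zero, and the per-criterion scores remain available for inspection alongside the scalar reward.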

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
cs.AI 2026-05 unverdicted novelty 7.0

AutoRubric-T2I learns a small set of interpretable rubrics for VLM judges that outperform scalar reward models on T2I benchmarks while using far less preference data.
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
cs.AI 2026-05 unverdicted novelty 7.0

AutoRubric-T2I learns and selects explicit rubrics from preference pairs to guide VLM judges, producing high-quality interpretable rewards for T2I alignment with far less data than traditional Bradley-Terry models.
C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences
cs.CL 2026-04 unverdicted novelty 6.0

C2 synthesizes contrastive helpful/misleading rubric pairs from binary preferences to train cooperative generators and critical verifiers, yielding up to 6.5-point gains on RM-Bench and enabling smaller models to matc...
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
cs.CV 2026-05 unverdicted novelty 5.0

Lens is a 3.8B-parameter text-to-image model that reaches competitive or superior performance to >6B-parameter systems using 19.3% of the training compute of Z-Image through a densely captioned 800M dataset, multi-res...