hub

GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

Dhruba Ghosh, Hannaneh Hajishirzi, Luke Zettlemoyer · 2023 · arXiv 2310.11513

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

cs.CV · 2026-05-20 · conditional · novelty 7.0

RankE co-evolves AR policy and decoder via alternating ranking optimization, improving both FID and CLIP scores on LlamaGen-XL and Janus-Pro where policy-only RL degrades FID.

Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

XTC-Bench reveals that strong performance on generation or understanding tasks in unified multimodal models does not guarantee cross-task semantic consistency, which instead depends on how tightly coupled the learning objectives are across modalities.

Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Process-driven image generation decomposes text-to-image synthesis into interleaved cycles of textual planning, visual drafting, textual reflection, and visual refinement with dense consistency supervision.

Reflective Flow Sampling Enhancement

cs.CV · 2026-03-06 · unverdicted · novelty 7.0

RF-Sampling enhances flow matching models by implicitly performing gradient ascent on text-image alignment scores via linear textual combinations and flow inversion.

Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

cs.LG · 2025-09-26 · conditional · novelty 7.0

Derives exact guidance transition rates for discrete flow matching models that require only one model evaluation per sampling step and unify prior approximation-based methods.

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

CLVR framework adds closed-loop visual verification, proxy prompt reinforcement learning, and delta-space weight merge to improve complex text-to-image generation over single-step or unverified multi-step baselines.

How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

cs.LG · 2026-04-29 · unverdicted · novelty 6.0 · 2 refs

FMRG is a training-free single-trajectory guidance framework for flow-based models that matches or exceeds baselines on reward-guided tasks and inverse problems using as few as 3 NFEs.

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

cs.CV · 2024-03-05 · conditional · novelty 6.0

Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding

cs.CV · 2026-04-15 · unverdicted · novelty 5.0

UniRect-CoT is a training-free rectification chain-of-thought framework that treats diffusion denoising as visual reasoning and uses the model's inherent understanding to align and correct intermediate generation results.

citing papers explorer

Showing 10 of 10 citing papers.

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution cs.CV · 2026-05-20 · conditional · none · ref 18
RankE co-evolves AR policy and decoder via alternating ranking optimization, improving both FID and CLIP scores on LlamaGen-XL and Janus-Pro where policy-only RL degrades FID.
Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models cs.CV · 2026-04-27 · unverdicted · none · ref 12
XTC-Bench reveals that strong performance on generation or understanding tasks in unified multimodal models does not guarantee cross-task semantic consistency, which instead depends on how tightly coupled the learning objectives are across modalities.
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization cs.CV · 2026-04-26 · unverdicted · none · ref 11
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning cs.CV · 2026-04-06 · unverdicted · none · ref 4
Process-driven image generation decomposes text-to-image synthesis into interleaved cycles of textual planning, visual drafting, textual reflection, and visual refinement with dense consistency supervision.
Reflective Flow Sampling Enhancement cs.CV · 2026-03-06 · unverdicted · none · ref 36
RF-Sampling enhances flow matching models by implicitly performing gradient ascent on text-image alignment scores via linear textual combinations and flow inversion.
Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching cs.LG · 2025-09-26 · conditional · none · ref 24
Derives exact guidance transition rates for discrete flow matching models that require only one model evaluation per sampling step and unify prior approximation-based methods.
Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning cs.CV · 2026-05-14 · unverdicted · none · ref 14
CLVR framework adds closed-loop visual verification, proxy prompt reinforcement learning, and delta-space weight merge to improve complex text-to-image generation over single-step or unverified multi-step baselines.
How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance cs.LG · 2026-04-29 · unverdicted · none · ref 61 · 2 links
FMRG is a training-free single-trajectory guidance framework for flow-based models that matches or exceeds baselines on reward-guided tasks and inverse problems using as few as 3 NFEs.
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis cs.CV · 2024-03-05 · conditional · none · ref 134
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.
Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding cs.CV · 2026-04-15 · unverdicted · none · ref 4
UniRect-CoT is a training-free rectification chain-of-thought framework that treats diffusion denoising as visual reasoning and uses the model's inherent understanding to align and correct intermediate generation results.

GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer