FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
12 Pith papers cite this work.
citing papers explorer
- ParetoSlider: Diffusion Models Post-Training for Continuous Reward Control
  ParetoSlider conditions diffusion models on continuous preference weights to approximate the full Pareto front, providing dynamic control over multi-objective rewards at inference time.
- ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control
  ChArtist generates pictorial charts via a Diffusion Transformer using skeleton-based spatial control and reference-image subject control, supported by a new 30,000-triplet dataset and data accuracy metric.
- LooseRoPE: Content-aware Attention Manipulation for Semantic Harmonization
  LooseRoPE modulates RoPE in diffusion attention maps to continuously trade off between preserving a pasted object's identity and harmonizing it with its new surroundings.
- Do-Undo Bench: Reversibility for Action Understanding in Image Generation
  Do-Undo Bench is a new evaluation task and dataset that forces models to simulate forward action effects and then undo them to measure genuine action understanding in image generation.
- MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition
  MICo-150K is a new 150K-image dataset with 7 tasks, a De&Re real-image subset, MICo-Bench, and Weighted-Ref-VIEScore metric that improves AI models for generating consistent composites from arbitrary numbers of reference images.
- A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
  The paper introduces a framework of four complementary analyses to evaluate the faithfulness of synthetic concept images from zero-shot T2I models versus real images for concept-based XAI.
- Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas
  Stepper uses stepwise panoramic expansion with a multi-view 360-degree diffusion model and geometry reconstruction to produce high-fidelity, structurally consistent immersive 3D scenes from text.
- RenderFlow: Single-Step Neural Rendering via Flow Matching
  RenderFlow replaces iterative diffusion with flow matching for deterministic single-step neural rendering that achieves near real-time photorealistic quality and extends to inverse rendering via an adapter module.
- Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
  Scone unifies subject understanding and generation in a two-stage trained model to improve both composition and distinction in multi-subject image generation, outperforming prior open-source models on new benchmarks.
- Refracting Reality: Generating Images with Realistic Transparent Objects
  The method warps pixels inside object boundaries with Snell's Law during generation and synchronizes with a second panorama image to produce optically plausible refraction in text-to-image outputs.
- SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design
  SkyReels-Text enables simultaneous fine-grained editing of multiple text regions in posters using arbitrary glyph patches for font control without labels or test-time fine-tuning.
- GrOCE: Graph-Guided Online Concept Erasure for Text-to-Image Diffusion Models
  GrOCE uses dynamic semantic graphs for online, training-free erasure of target concepts from diffusion model prompts via cluster identification and selective severing.