Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
hub Baseline reference
Pick-a-pic: An open dataset of user preferences for text-to-image generation
Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.
hub tools
citation-role summary
citation-polarity summary
representative citing papers
A new adjoint matching framework formulates flow model alignment as optimal control, enabling direct regression training and terminal-trajectory truncation for efficiency gains on models like SiT-XL and FLUX.
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.
SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.
MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention to sparsify irrelevant token-to-location bindings and improve region-level controllability without retraining or inference edits.
DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
HPD v2 is the largest human preference dataset for text-to-image images with 798k choices, and HPS v2 is the resulting CLIP-based scorer that better predicts human judgments and responds to model improvements.
Diff.-NPO frames diffusion alignment as a self-play game reaching Nash equilibrium and reports better text-to-image results than prior DPO-style methods.
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image model outputs.
A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.
citing papers explorer
-
Functionalization via Structure Completion and Motion Rectification
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
-
Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline
A new adjoint matching framework formulates flow model alignment as optimal control, enabling direct regression training and terminal-trajectory truncation for efficiency gains on models like SiT-XL and FLUX.
-
Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization
Oracle Noise optimizes diffusion model noise on a Riemannian hypersphere guided by key prompt words to preserve the Gaussian prior, eliminate norm inflation, and achieve faster semantic alignment than Euclidean methods.
-
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
-
Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
Matched benchmarking reveals FID misleads in few-step regimes under CFG, prompting CLIP-scaled and PickScore-scaled FID and IS variants for better semantic evaluation of one-step image generators.
-
Delta Score Matters! Spatial Adaptive Multi Guidance in Diffusion Models
SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.
-
MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation
MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention to sparsify irrelevant token-to-location bindings and improve region-level controllability without retraining or inference edits.
-
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
DRaFT fine-tunes diffusion models by differentiating through sampling to maximize rewards, outperforming RL baselines and improving aesthetics on Stable Diffusion 1.4.
-
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.
-
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
HPD v2 is the largest human preference dataset for text-to-image images with 798k choices, and HPS v2 is the resulting CLIP-based scorer that better predicts human judgments and responds to model improvements.
-
Towards General Preference Alignment: Diffusion Models at Nash Equilibrium
Diff.-NPO frames diffusion alignment as a self-play game reaching Nash equilibrium and reports better text-to-image results than prior DPO-style methods.
-
ACPO: Anchor-Constrained Perceptual Optimization for Diffusion Models with No-Reference Quality Guidance
ACPO uses anchor-based regularization with NR-IQA guidance to enable stable perceptual quality improvements in diffusion model fine-tuning.
-
Evaluating AI-Generated Images of Cultural Artifacts with Community-Informed Rubrics
Case studies with blind UK residents and people from Kerala and Tamil Nadu demonstrate that community input at the systematization stage produces culturally grounded definitions of appropriateness for text-to-image model outputs.
-
Alignment and Safety of Diffusion Models via Reinforcement Learning and Reward Modeling: A Survey
A literature survey that organizes diffusion model alignment methods along five axes (feedback source, reward form, optimization mechanism, distribution shift handling, and explicit safety constraints) and identifies open challenges for reliable deployment.