DiSI disentangles stochastic interpolants into separate generation and regression paths, allowing controllable transitions between regression and generative image restoration with a unified few-step sampler.
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
9 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (9)
roles: background (1)
polarities: background (1)
citing papers explorer
- Disentangling Generation and Regression in Stochastic Interpolants for Controllable Image Restoration
  DiSI disentangles stochastic interpolants into separate generation and regression paths, allowing controllable transitions between regression and generative image restoration with a unified few-step sampler.
- KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
  KVPO aligns streaming autoregressive video generators with human preferences via ODE-native GRPO, using KV cache for semantic exploration and TVE for velocity-based policy modeling, yielding gains in quality and alignment.
- Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification
  Rule-VLN is the first large-scale benchmark injecting 177 regulatory categories into an urban environment, and the proposed SNRM module equips pre-trained VLN agents with zero-shot semantic reasoning and detour planning to reduce constraint violations by 19.26% and improve task completion.
- Human Cognition in Machines: A Unified Perspective of World Models
  The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.
- DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer
  RTR-DiT distills a bidirectional DiT teacher into an autoregressive few-step model using Self Forcing and Distribution Matching Distillation, plus a reference-preserving KV cache, to enable stable real-time text- and reference-guided video stylization.
- ODE-free Neural Flow Matching for One-Step Generative Modeling
  OT-NFM parameterizes the flow map directly with neural flows and uses optimal transport for consistent noise-data couplings to achieve ODE-free one-step generation while avoiding mean collapse.
- Understanding Latent Diffusability via Fisher Geometry
  Latent diffusability is quantified by decomposing the MMSE rate along diffusion trajectories into Fisher Information and Fisher Information Rate, with three geometric penalties (dimensional compression, tangential distortion, curvature injection) identified as sources of failure.
- Structured State-Space Regularization for Generation-Friendly Image Tokenization
  Structured state-space regularization induces spectral structure in image tokenizer latent spaces via an SSM-derived objective, improving generative performance with minimal reconstruction loss.
- D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models