IPR improves valid solution rates on MNIST Sudoku from 55.8% to 75.0% by iteratively refining partial regions in sequential diffusion models without external verifiers or reward models.
arXiv preprint arXiv:2408.08252
12 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
MSDDA derives a closed-form optimal reverse denoising distribution for multi-objective diffusion alignment that is exactly equivalent to step-level RL fine-tuning with no approximation error.
CliqueFlowmer combines clique-based model-based optimization with transformer and flow models to generate materials that optimize target properties better than generative baselines.
Derives exact guidance transition rates for discrete flow matching models that require only one model evaluation per sampling step and unify prior approximation-based methods.
PG-DLM applies particle Gibbs sampling over full trajectories in diffusion language models to enable iterative refinement, yielding higher accuracy on reward-guided generation with theoretical convergence guarantees.
URGE performs unbiased inference-time scaling for diffusion models by attaching multiplicative path weights from Girsanov estimation and resampling trajectories, with a proven equivalence to prior particle-wise SMC schemes.
LPDP adds a local re-solving operator to edit-flow DNA generators so that reward signals can guide insertions, deletions, and substitutions without retraining.
dFlowGRPO is a new rate-aware RL method for discrete flow models that outperforms prior GRPO approaches on image generation and matches continuous flow models while supporting broad probability paths.
DAG-STL breaks long-horizon STL planning into three stages (decomposition, timed waypoint allocation, and diffusion-based trajectory generation) to enable zero-shot planning under unknown dynamics.
VASR separates continuation and residual variance in reward-guided diffusion SMC, using optimal mass allocation and systematic resampling to achieve up to 26% better FID scores and faster runtimes than prior SMC and MCTS methods.
Proposes Latent Interacting Particle Systems with an efficient parameterization of twist potentials to enable approximate posterior inference for coupled continuous-time hidden Markov models via twisted sequential Monte Carlo, demonstrated on a latent SIRS graph model and real wildfire data.
RAPIDDS unifies task-level and motion-level adaptation in human-robot teaming by modeling individualized spatial and temporal behaviors across multiple cycles and jointly optimizing schedules and diffusion-based motions.
citing papers explorer
LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows