{"total":39,"items":[{"citing_arxiv_id":"2606.27094","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning Climate Variability from Scarce Data with Diffusion Models: A Test Case for ENSO","primary_cat":"physics.ao-ph","submitted_at":"2026-06-25T14:31:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Diffusion models recover known ENSO variability structure from synthetic LIM data when given enough samples, but require pre-training on CMIP6 plus fine-tuning to match observations with the ~700 samples available in ERSSTv5.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.08448","ref_index":21,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Multiscale Fourier Neural Operator for Inverse Wave Scattering in Highly Oscillatory Media","primary_cat":"math.NA","submitted_at":"2026-06-07T04:36:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MscaleFNO learns mappings from oscillatory media to wavefields for Helmholtz inverse problems and pairs it with diffusion regularization for partial-aperture 2D reconstructions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04165","ref_index":76,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CaloTrilogy: Toward a Breakthrough in One-Step, End-to-End, Physics-Guided Shower Generation for Modern Calorimeters","primary_cat":"hep-ex","submitted_at":"2026-06-02T19:27:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Presents CaloTrilogy, a unified one-step generative model for high-granularity calorimeter showers that combines velocity field integration, learned priors, 
and physics losses to match SOTA quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28200","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Geometry-First Generative Spatial Single-Cell Reconstruction","primary_cat":"cs.LG","submitted_at":"2026-05-27T09:24:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GEARS is a geometry-first generative framework that learns domain-invariant encoders and permutation-equivariant diffusion generators to reconstruct intrinsic 2D cell coordinates and distance matrices from unpaired scRNA-seq guided by ST.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16520","ref_index":118,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Global Convergence of Sampling-Based Nonconvex Optimization through Diffusion-Style Smoothing","primary_cat":"cs.LG","submitted_at":"2026-05-15T18:14:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Recasts sampling-based nonconvex optimization as smoothed gradient descent to obtain non-asymptotic convergence guarantees and introduces the DIDA annealed algorithm that converges to the global optimum.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"Then the following bounds hold for the raw moments and central moments: E[f(x) 2n]≤C(f) n L2ntn (114) m2n[f(x)] =E[|f(x)−E[f(x)]|2n]≤C(f) n L2ntn (115) E[(xf(x)) 2n]≤C(xf) n tn +C (xf) 2n L2nt2n (116) m2n[xf(x)] =E[|xf(x)−E[xf(x)]|2n]≤C(xf) n tn +C (xf) 2n L2nt2n (117) whereC (f) n andC (xf) n are constants depending only onn. 
Proof.We begin with the triangle inequality: for anyx,y∈Randn∈N, (x+y) n≤2n−1(xn +yn)(118) Applying this to the central moment ofxf(x): E[|xf(x)−E[xf(x)]|2n]≤22n−1(E[|xf(x)|2n] +|E[xf(x)]|2n)(119) For the first term, we use the fact thatfisL-Lipschitz, which means|f(x)−f(0)|≤L|x|. This implies: |f(x)|≤|f(0)|+L|x|(120) |xf(x)|≤|x|·|f(x)|≤|x|·(|f(0)|+L|x|) =|f(0)||x|+Lx2 (121) Therefore: (xf(x))2n≤22n−1(|f(0)|2n|x|2n +L 2nx4n)(122) E[|xf(x)|2n]≤22n−1(|f(0)|2nE[|x|2n] +L 2nE[x4n])(123)"},{"citing_arxiv_id":"2605.13910","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Covariance-aware sampling for Diffusion Models","primary_cat":"stat.ML","submitted_at":"2026-05-13T07:46:06+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A covariance-aware extension of DDIM sampling for pixel-space diffusion models that uses Tweedie's formula and Fourier decomposition to model reverse-process covariance and improves sample quality at low NFE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12836","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Discrete Stochastic Localization for Non-autoregressive Generation","primary_cat":"cs.LG","submitted_at":"2026-05-13T00:12:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DSL provides a continuous embedding framework where one denoiser supports a family of SNR paths for discrete sequences, improving MAUVE scores on OpenWebText and allowing random-order and hybrid sampling from a fine-tuned MDLM checkpoint.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18829","ref_index":93,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Lossless Anti-Distillation 
Sampling","primary_cat":"cs.LG","submitted_at":"2026-05-12T21:34:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LADS is a sampling method that keeps benign user generations statistically identical to the original model while forcing correlated samples across a distiller's multiple accounts, provably worsening their generalization via uniform convergence bounds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12011","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation","primary_cat":"physics.ins-det","submitted_at":"2026-05-12T12:00:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditional flow matching.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The resulting generation cost therefore depends not only on the Gflops of a single backbone forward pass, but also on the solver step count and the num- ber of backbone evaluations required per step. For a solver withSsteps, the fourth-order Runge-Kutta (RK4) method requires4Sbackbone evaluations, whereas Heun's method requires2S−1evaluations when the final correction step is omitted [59, 60]. Accordingly, backbone Gflops alone do not fully determine inference cost, and comparisons between published models should also take solver dependent evaluation counts into account. 
4 CCD2 Experiments and Results ExperimentsinthisworkarecarriedoutonCaloChallengeDataset-2(CCD2)andDataset-3 (CCD3), two public regular-grid calorimeter benchmarks from the Fast Calorimeter Simula-"},{"citing_arxiv_id":"2605.09433","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Offline Preference Optimization for Rectified Flow with Noise-Tracked Pairs","primary_cat":"cs.CV","submitted_at":"2026-05-10T09:13:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"PNAPO augments preference data with prior noise pairs and uses straight-line interpolation to create a tighter surrogate objective for offline alignment of rectified flow models.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"However,with positive margins (indicating good training), increasing β conversely reduces the margin, yielding smaller updates.As training progresses, strong regularization grad- ually pulls the model back toward the reference model.This motivates our dynamic regularizationβ(δr, n): β(δr, n) =β·f(δr)·g(n).(14) Here training sample controller f must increase monoton- ically to 1, where δr∈[0,+∞) and training process con- troller g decays as a annealing factor. These are defined as: f(δr) = 2·σ(δr)−1 g(n) =    1,ifn≤n 1, 1 2 + 1 2 ·cos( 1 2 · n−n1 n2−n1 π),ifn 1 < n < n 2, 1 2 ,ifn≥n 2. (15) Here σ denotes the sigmoid function, n represents the train- ing step, and n1, n2 are user-defined thresholds. The func- tion f(δr) links β(δr, n) to reward difference δr: when the margin is negative, increasing δr raises β(δr, n) to acceler- ate training; otherwise, the opposite effect occurs. Mean- while, g(n) starts high in early training, then gradually de- creases forn > n 1, halving byn=n 2. 5. Experiments 5.1. 
Experimental Setup Implementation Details.We employ FLUX.1-dev (FLUX) and Stable Diffusion 3 Medium (SD3-M) as our rectified 5 Submission and Formatting Instructions for ICML 2026 Four cats surrounding a dog. A pineapple with one beer to its left and two beers on its right. A high-contrast photo of a panda riding a horse. ⋯ and the word \\\"PEACE\\\" are painted on the wall. ⋯ FLUX DPO-FLUX PNAPO-FLUX SD3 PNAPO-SD3DPO-SD3 Professional portrait of a Nepali woman sipping red wine, ultra-high resolution, photorealistic, intricate details ⋯ Figure 4.User Study and Qualitative Comparison.Top, human evaluations show PNAPO-FLUX significantly outperforming DPO- FLUX and the base FLUX model. Bottom, we present qualitative comparisons between PNAPO and Diffusion-DPO when applied to the FLUX and SD3-M. The results demonstrate that our model achieves superior image generation quality. flow models for T2I generation. For each model, we utilize 20,000 prompts from Diffus"},{"citing_arxiv_id":"2605.07907","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Consistency Regularised Gradient Flows for Inverse Problems","primary_cat":"stat.ML","submitted_at":"2026-05-08T15:45:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A consistency-regularized Euclidean-Wasserstein-2 gradient flow performs joint posterior sampling and prompt optimization in latent space for efficient low-NFE inverse problem solving with diffusion models.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"[2026] also derived a similar objective for the likelihood gradient, albeit using heuristic arguments. We summarise the updates in Algorithm 1. 6 4 Experiments Algorithm 1. Consistency-regularised Wasserstein Gradient Flow (CWGF) 1: Inputs: y, particles {z(n) 0 }N n=1, prompt c0 2: for k = 0, . . . 
, K − 1 do 3: Sample t = t(k) and ε(n) ∼ N (0, I ) 4: z(n) t ← √α tz(n) k + σtε(n) 5: ck+1 ← ck − ηc [∇cR (21) 6: πnm(t) ∝ q(z(n) t | z(m) k ), 7: ˆmµ,N t (z(n) t ) ← ∑ m πnm(t)z(m) k 8: ¯z(n) k ← z(n) k + ηR ( gθ(z(n) t , t, c k) − z(n) k ) + ηR ( z(n) k − ˆmµ,N t (z(n) t ) ) (19) 9: Compute m( ¯z(n) k ) as in ( 24) 10: ˆg( ¯z(n) k ; y) ← λΣ −1 ϕ− ( Eϕ− (m( ¯z(n) k )) − ¯z(n) k ) + λ ¯z(n) k (25) 11: z(n) k+1 ← ¯z(n) k + ηL ˆg( ¯z(n) k ; y) 12: end for 13: Outputs: {Dϕ− (z(n) K )}N n=1, cK Datasets and prior model. We evaluate our method on two high- quality datasets at resolution 512×512: FFHQ [ Karras et al. , 2019] and Ima- geNet [ Deng et al. , 2009]. For FFHQ, we use the ﬁrst 1k test images, as in Chung et al. [2023]; for ImageNet, we use the cval1k validation set introduced by Larsson et al. [2016]. Our prior is the LCM-LoRA [ Luo et al. , 2023b] dis- tilled from Stable Diﬀusion 1.5. This is the same latent diﬀusion backbone used to evaluate diﬀusion-based base-"},{"citing_arxiv_id":"2605.06829","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A Unified Measure-Theoretic View of Diffusion, Score-Based, and Flow Matching Generative Models","primary_cat":"cs.LG","submitted_at":"2026-05-07T18:32:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Diffusion, score-based, and flow matching models are unified as instances of learning time-dependent vector fields inducing marginal distributions governed by continuity and Fokker-Planck equations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"densityq(x t |x 0) is Gaussian and its score with respect tox t is available in closed form: ∇xt logq(x t |x 0) =− 1 s(t)2 \u0010 xt −m(t)x 0 \u0011 .(15) DSM then regresses the model scores θ(xt, t) toward (15) in expectation over (x 0, t, ε). 
5.2 DDPM training as noise prediction (and score prediction) DDPMs are often trained by predicting the noiseεin the reparameterization (9): Lε(θ) =E t,x0,ε h ε−ε θ(xt, t) 2i , x t = √¯αtx0 + √ 1−¯αtε,(16) possibly with time-dependent weights (Ho et al., 2020). This objective is equivalent to score regression under a change of variables: for the Gaussian kernel (8), the conditional score (15) can be expressed in terms ofε, and a predictorε θ induces a score model via sθ(xt, t)≈ − 1√1−¯αt εθ(xt, t) (up to known scalings). (17) Thus \"noise prediction\" and \"score prediction\" are largely different parameterizations of the"},{"citing_arxiv_id":"2605.04569","ref_index":193,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention","primary_cat":"cs.CV","submitted_at":"2026-05-06T07:15:29+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03712","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Tempered Guided Diffusion","primary_cat":"stat.ML","submitted_at":"2026-05-05T13:00:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Tempered Guided Diffusion uses annealed SMC to produce consistent particle approximations to the posterior for training-free conditional diffusion sampling, outperforming independent guided trajectories in experiments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26503","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Delta Score Matters! 
Spatial Adaptive Multi Guidance in Diffusion Models","primary_cat":"cs.CV","submitted_at":"2026-04-29T10:08:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SAMG uses spatially adaptive guidance scales derived from a geometric analysis of classifier-free guidance to resolve the detail-artifact dilemma in diffusion-based image and video generation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ISBN 978-1-4612-0919-5. doi: 10.1007/ 978-1-4612-0919-5 25. URLhttps://doi.org/10.1007/978-1-4612-0919-5_25. [9] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance, 2022. URLhttps://arxiv. org/abs/2207.12598. [10] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models, 2020. URLhttps://arxiv.org/abs/2006.11239. [11] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models, 2022. URLhttps://arxiv.org/abs/2206.00364. [12] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation, 2023. 
URL"},{"citing_arxiv_id":"2604.25608","ref_index":1,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The Physical Limit of Neural Hypoxia Detection in the Black Sea from Satellite Observations","primary_cat":"physics.ao-ph","submitted_at":"2026-04-28T13:13:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Neural networks can detect 38% of summer hypoxic events shelf-wide from satellites with 47% precision, but only within the homogeneous mixed layer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23536","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models","primary_cat":"cs.CV","submitted_at":"2026-04-26T05:16:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Wu, Songning Lai, Bowen Tian, and Yutao Yue. Physics-informed representation alignment for sparse radio-map reconstruction, 2025. URLhttps://arxiv.org/abs/2501.19160. [14] Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion-based editing with 3 lines of code, 2023. URLhttps://arxiv.org/abs/2310.01506. 23 [15] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models, 2022. URLhttps://arxiv.org/abs/2206.00364. [16] Diederik P. Kingma, Tim Salimans, Ben Poole, and Jonathan Ho. 
Variational diffusion models, 2023. URLhttps://arxiv.org/abs/2107.00630. [17] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy."},{"citing_arxiv_id":"2604.17734","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Score-Based Matching with Target Guidance for Cryo-EM Denoising","primary_cat":"cs.CV","submitted_at":"2026-04-20T02:46:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Score-based denoising with reference-density guidance improves particle-background separability and downstream 3D reconstruction consistency on cryo-EM datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13028","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Conflated Inverse Modeling to Generate Diverse and Temperature-Change Inducing Urban Vegetation Patterns","primary_cat":"cs.CV","submitted_at":"2026-04-14T17:58:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A diffusion generative inverse model conditioned on temperature targets produces diverse, physically plausible urban vegetation patterns that achieve specified regional temperature shifts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10465","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Rethinking the Diffusion Model from a Langevin Perspective","primary_cat":"cs.LG","submitted_at":"2026-04-12T05:18:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Diffusion models are reorganized under a Langevin perspective that unifies ODE and SDE formulations and shows flow matching is equivalent to denoising 
under maximum likelihood.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09041","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster","primary_cat":"cs.LG","submitted_at":"2026-04-10T07:02:20+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03303","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Downscaling weather forecasts from Low- to High-Resolution with Diffusion Models","primary_cat":"physics.ao-ph","submitted_at":"2026-03-30T09:38:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A conditional diffusion model downscales global atmospheric forecasts from 100 km to 30 km resolution while improving probabilistic skill, matching power spectra, and preserving physical relationships.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.26571","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow","primary_cat":"cs.CV","submitted_at":"2026-03-27T16:33:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GVCC achieves the lowest LPIPS on UVG at bitrates down to 0.003 bpp by encoding stochastic innovations in a marginal-preserving stochastic process derived from a pretrained rectified-flow video model, with 65% LPIPS reduction over 
DCVC-RT.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2603.13419","ref_index":36,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Diffusion Models Memorize in Training -- and Generalize in Inference","primary_cat":"cs.LG","submitted_at":"2026-03-12T21:02:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.23748","ref_index":49,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios","primary_cat":"cs.LG","submitted_at":"2025-12-26T18:18:25+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A synthesis of diffusion-based simulation-based inference methods that address model misspecification, irregular observations, and missing data in scientific applications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.13255","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"B\\'ezierFlow: Learning B\\'ezier Stochastic Interpolant Schedulers for Few-Step Generation","primary_cat":"cs.LG","submitted_at":"2025-12-15T12:09:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"BézierFlow parameterizes stochastic interpolant schedulers as Bézier functions to learn optimal sampling trajectories, achieving 2-3x better few-step performance than 
prior timestep optimization methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.03015","ref_index":16,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Discrete Bayesian Sample Inference for Graph Generation","primary_cat":"cs.LG","submitted_at":"2025-11-04T21:25:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GraphBSI uses Bayesian Sample Inference as noise-controlled SDEs to generate discrete graphs in one shot, achieving state-of-the-art results on molecular benchmarks Moses and GuacaMol.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.16344","ref_index":63,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Diff-ANO: Towards Fast High-Resolution Ultrasound Computed Tomography via Conditional Consistency Models and Adjoint Neural Operators","primary_cat":"math.NA","submitted_at":"2025-07-22T08:24:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Diff-ANO uses conditional consistency models and adjoint neural operator surrogates to enable fast, high-quality USCT reconstructions under sparse and partial views by replacing slow PDE solvers and enabling few-step sampling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.16827","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Blur: A Fluid Perspective on Generative Diffusion Models","primary_cat":"cs.GR","submitted_at":"2025-06-20T08:31:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Proposes an advection-diffusion PDE corruption process with stochastic velocity fields and Lattice 
Boltzmann solver for diffusion models, generalizing prior PDE methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.03206","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models","primary_cat":"cs.LG","submitted_at":"2025-03-05T05:50:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Analytic solution of full-batch gradient flow for linear and convolutional denoisers in diffusion models yields a universal inverse-variance spectral law for learning times of eigenmodes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2403.03206","ref_index":145,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis","primary_cat":"cs.CV","submitted_at":"2024-03-05T18:45:39+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2311.15127","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets","primary_cat":"cs.CV","submitted_at":"2023-11-25T22:28:38+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Stable Video Diffusion scales latent video diffusion models via text-to-image pretraining, video 
pretraining on curated data, and high-quality finetuning to produce competitive text-to-video and image-to-video results while enabling motion LoRA and multi-view 3D applications.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"tion the model on a text prompt [9, 97] or make use of an additional text-to-image prior [23, 82]. In our work, we follow the former approach and show that the resulting model is a strong general motion prior, which can easily be finetuned into an image-to-video or multi-view synthesis model. Additionally, we introduce micro-conditioning [64] on frame rate. We also employ the EDM-framework [51] and significantly shift the noise schedule towards higher noise values, which we find to be essential for high-resolution finetuning. See Section 4 for a detailed discussion of the latter. Data Curation Pretraining on large-scale datasets [80] is an essential ingredient for powerful models in several tasks such as discriminative text-image [66, 105] and lan-"},{"citing_arxiv_id":"2307.01952","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","primary_cat":"cs.CV","submitted_at":"2023-07-04T23:04:57+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SDXL improves upon prior Stable Diffusion versions through a larger UNet backbone, dual text encoders, novel conditioning, and a refinement model, producing higher-fidelity images competitive with black-box state-of-the-art generators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.02463","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Shap-E: Generating Conditional 3D Implicit 
Functions","primary_cat":"cs.CV","submitted_at":"2023-05-03T23:59:13+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Shap-E encodes 3D assets into implicit function parameters then uses a conditional diffusion model to generate new ones from text, enabling fast multi-representation 3D asset creation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2304.12906","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Score-Difference Flow for Implicit Generative Modeling","primary_cat":"cs.LG","submitted_at":"2023-04-25T15:21:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Score-difference flow reduces KL divergence between distributions and is formally equivalent to denoising diffusion models and a hidden subproblem in optimal GAN training under stated conditions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2303.04137","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Diffusion Policy: Visuomotor Policy Learning via Action Diffusion","primary_cat":"cs.RO","submitted_at":"2023-03-07T18:50:03+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Diffusion Policy models robot actions as a conditional diffusion process, outperforming prior state-of-the-art methods by 46.9% on average across 12 manipulation tasks from four benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2211.15089","ref_index":41,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Continuous diffusion for categorical 
data","primary_cat":"cs.CL","submitted_at":"2022-11-28T06:08:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper proposes CDCD, a continuous-time and continuous-space diffusion framework for categorical data, and reports results on language modeling tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2210.02303","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Imagen Video: High Definition Video Generation with Diffusion Models","primary_cat":"cs.CV","submitted_at":"2022-10-05T14:41:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Imagen Video generates high-definition text-conditional videos via a cascade of base and super-resolution diffusion models, achieving high fidelity and controllability.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2209.03003","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","primary_cat":"cs.LG","submitted_at":"2022-09-07T08:59:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Rectified flow learns straight-path neural ODEs for distribution transport, yielding efficient generative models and domain transfers that work well even with a single simulation step.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"times models is the high computational cost in inference time: drawing a single point (e.g., image) requires to solve the ODE/SDE with a numerical solver that needs to repeatedly call the expensive neural drift function. 
In addition, the existing denoising diffusion techniques require substantial hyper-parameter search in an involved design space and are still poorly understood both empirically and theoretically [29]. In existing approaches, generative modeling and domain transfer are typically treated separately. It often requires to extend or customize a generative learning techniques to solve domain transfer problems; see e.g., Cycle GAN [100] and diffusion-based image-to-image translation [e.g., 75, 97]. One framework that naturally uniﬁes both domains is optimal transport (OT) [e."}],"limit":50,"offset":0}