arxiv: 2604.24959 · v1 · submitted 2026-04-27 · 💻 cs.LG · stat.ML

Recognition: unknown

CoreFlow: Low-Rank Matrix Generative Models

Dongze Wu , Linglingzhi Zhu , Yao Xie

Authors on Pith no claims yet

Pith reviewed 2026-05-08 04:04 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords low-rank matricesgenerative modelingcontinuous normalizing flowsincomplete datamatrix distributionssubspace separationfew-sample learning

0 comments

The pith

CoreFlow improves matrix generation quality in few-sample regimes by flowing only on shared low-rank cores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

High-dimensional matrix distributions are difficult to model when the number of samples is small because full-dimensional approaches become both computationally costly and statistically unreliable. CoreFlow first identifies the shared row and column subspaces that encode the common geometry across the distribution, then trains a continuous normalizing flow exclusively on the much smaller core space induced by those subspaces. This separation keeps the matrix structure intact, reduces the effective dimension dramatically, and produces better spectral and moment matching in limited-data settings while also supporting training on incomplete matrices through masked Riemannian updates. The same model stays competitive when abundant data is available and continues to work under compression to only 9 percent of the original dimension.

Core claim

CoreFlow learns shared row and column subspaces across the matrix distribution and restricts a continuous normalizing flow to the low-dimensional core they induce, using masked Riemannian updates to accommodate incomplete training matrices; this yields substantially better spectral and moment-level generation quality in few-sample regimes and remains competitive when data is plentiful.

What carries the argument

The low-rank core obtained by projecting matrices onto learned shared row and column subspaces, on which a continuous normalizing flow is trained while preserving matrix geometry.

If this is right

Substantially improves spectral and moment-level generation quality in few-sample regimes
Remains competitive in data-rich settings
Supports compression to 9% of ambient dimension
Handles up to 40% missing training entries via masked updates
Preserves matrix structure throughout the generative process

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to other high-dimensional structured data with latent low-rank factors, such as tensors or covariance matrices.
It implies that generative modeling can often exploit shared geometry to avoid the curse of dimensionality in matrix spaces.
One could test extensions by applying the core-flow idea to diffusion models instead of normalizing flows.
The method suggests practical scaling benefits for applications like image or video generation where data can be viewed as matrices.

Load-bearing premise

The matrices in the distribution share common low-rank row and column subspaces that capture the essential geometry separate from sample-specific details.

What would settle it

A benchmark dataset of high-dimensional matrices drawn from a distribution without shared low-rank subspaces, on which CoreFlow shows no improvement or degradation relative to standard ambient-space generative models in the few-sample regime.

Figures

Figures reproduced from arXiv: 2604.24959 by Dongze Wu, Linglingzhi Zhu, Yao Xie.

**Figure 1.** Figure 1: Overview of CoreFlow. Left: learning shared row/column subspaces (U ∗ , V ∗ ) and constructing induced low-dimensional cores {Si}. Middle: learning a continuous normalizing flow in the induced core space, matching the Stage-I cores {Si} ∼ p ∗ S to a Gaussian base. Right: generation starts from Gaussian noise, reverses the learned flow to obtain a generated core Sˆ i , and decodes it with (U ∗ , V ∗ ) to pr… view at source ↗

**Figure 2.** Figure 2: True samples (left) and CoreFlow-generated samples (right) on real and synthetic benchmarks. The close visual agreement shows that CoreFlow captures the matrix distributions well even under aggressive dimensionality reduction and substantial training missingness. LSPF (80×80, data-rich). On LSPF, where training data are abundant, the gap naturally narrows. Nevertheless, view at source ↗

**Figure 3.** Figure 3: Performance–efficiency tradeoff on Solar among generative models. The horizontal axis is the learneddimension ratio as a proxy for generative-model cost, and the vertical axis is test singular-value discrepancy (lower is better). Full training times are reported in Appendix C.4. 10 view at source ↗

**Figure 4.** Figure 4: Patchified vs. original matrices (Solar). Patchification (Eq. (24)) reshapes local patches into rows of a patch-matrix. In the patchified view (left), many rows/columns become visually near-repetitions due to recurring local textures, while the original matrices (right) remain globally diverse. 1 2 3 4 5 6 7 8 9 10 Rank index i 10 3 10 2 10 1 10 0 N orm aliz e d eig e n v alu e i/ 1 Solar: Normalized eigen… view at source ↗

**Figure 5.** Figure 5: Consistent rank reduction via patchification. Normalized spectra (λi/λ1) show that the patchified representation decays faster: for i ≥ 2, the patchified curve lies consistently below the original, indicating that more energy concentrates in the leading components and the representation has a lower effective rank. Each matrix is cropped to Hc × Wc, divided into non-overlapping p × p patches, and rearranged… view at source ↗

**Figure 6.** Figure 6: Case Blobs view at source ↗

**Figure 8.** Figure 8: 26 view at source ↗

**Figure 8.** Figure 8: Case Waves view at source ↗

**Figure 10.** Figure 10: Comparisons of generated samples on Solar Flare with view at source ↗

**Figure 11.** Figure 11: Comparison of generated samples across methods. view at source ↗

read the original abstract

Learning matrix-valued distributions from high-dimensional and possibly incomplete training data is challenging: ambient-space generative modeling is computationally expensive and statistically fragile when the matrix dimension is large but the sample size is limited. We propose CoreFlow, a geometry-preserving low-rank flow model that learns shared row/column subspaces across the matrix distribution, and then trains a continuous normalizing flow only on the induced low-dimensional core. CoreFlow is designed for settings where shared low-rank matrix geometry is present, especially in high-dimensional limited-sample regimes. This separates shared matrix geometry from sample-specific variation, preserves matrix structure, and substantially improves training efficiency. The same framework also handles incomplete training matrices through masked Riemannian updates and iterative completion. Across real and synthetic benchmarks, CoreFlow substantially improves spectral and moment-level generation quality in few-sample regimes while remaining competitive in data-rich settings, even under compression to 9% of the ambient dimension and with up to 40% missing training entries.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CoreFlow's low-rank core flow for matrix generation is a coherent idea that targets real efficiency issues in low-sample regimes, but the gains hinge on an unquantified shared-geometry assumption that lacks ablation support.

read the letter

CoreFlow learns shared row and column subspaces across a matrix distribution, then runs a continuous normalizing flow only on the resulting low-dimensional core while using masked Riemannian updates for incomplete entries. This setup aims to preserve matrix structure and cut computational costs when ambient dimensions are high but samples are few. The abstract positions it specifically for cases where that shared low-rank geometry exists. What looks new is the explicit separation of shared subspaces from sample-specific variation combined with the flow on the induced core, plus the incomplete-data extension. I have not seen this exact pairing framed this way in prior matrix generative work. The approach does address a practical bottleneck: ambient-space flows become expensive and statistically unreliable under sample constraints, and the claimed improvements in spectral and moment quality under compression to 9% dimension or 40% missing entries would be useful if they hold. The paper also notes competitiveness in data-rich settings, which suggests the method does not sacrifice performance when samples are plentiful. The central soft spot is the untested necessity of the shared low-rank geometry. The strongest claims rest on substantial gains in few-sample regimes, yet there is no reported ablation or counter-example showing how much those gains degrade when the geometry is weak or absent. Without that, it remains possible that any dimensionality reduction would deliver similar benefits and that the flow component itself is not the driver. The abstract supplies no derivation details, error bars, or baseline comparisons, so the experimental support cannot be fully assessed from what is visible. This paper is for researchers working on generative models for structured matrix data in statistics or machine learning, especially those facing high-dimensional limited-sample problems or missing entries. A reader interested in efficient flows or Riemannian handling of incomplete matrices would extract usable ideas even if they adapt the method. The work shows clear thinking on its own terms and engages honestly with the limitations of ambient-space modeling, so it deserves a serious referee. I recommend sending it to peer review, with the expectation that the authors add ablations on the low-rank assumption and fuller experimental controls.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CoreFlow, a geometry-preserving low-rank generative model for matrix-valued distributions. It learns shared row/column subspaces across the training matrices, projects to an induced low-dimensional core, and trains a continuous normalizing flow only on that core; the framework also incorporates masked Riemannian updates to handle up to 40% missing entries. The central empirical claim is that CoreFlow yields substantial gains in spectral and moment-level generation quality under few-sample regimes and strong compression (down to 9% ambient dimension), while remaining competitive in data-rich settings.

Significance. If the empirical claims are robustly supported, the work would be significant for generative modeling of structured high-dimensional data (e.g., images, covariance matrices, recommender data) where sample size is limited relative to matrix dimension. By explicitly separating shared low-rank geometry from sample-specific variation and restricting the flow to the core, it offers a principled route to both statistical efficiency and computational tractability that standard ambient-space flows lack.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the central claim of 'substantial' spectral/moment improvements in few-sample regimes rests on the design assumption that shared low-rank matrix geometry is present and separable. No ablation, counter-example, or quantitative sensitivity analysis is described that measures how much the reported gains degrade when this geometry is weak or absent; without such evidence it remains unclear whether the core-flow construction itself drives the edge or whether any dimensionality reduction would suffice.
[Method] Method section (presumably §3): the description of how the shared subspaces are estimated and how the continuous normalizing flow is defined on the core is not accompanied by any derivation showing that the overall procedure is parameter-free or that performance metrics are independent of the subspace estimation step. If the reported metrics are computed after fitting the subspaces to the same data used for generation, the circularity concern noted in the reader report applies directly to the load-bearing claims.

minor comments (2)

[Abstract] The abstract states improvements 'across real and synthetic benchmarks' but supplies no table or figure references; the full manuscript should include explicit baseline comparisons, error bars, and ablation tables so that the magnitude of the gains can be assessed.
[Method] Notation for the core projection and the masked Riemannian update should be introduced with a short equation or diagram early in the method section to improve readability for readers unfamiliar with Riemannian optimization on matrix manifolds.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and outline revisions to strengthen the manuscript's clarity and empirical support.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and Experiments section: the central claim of 'substantial' spectral/moment improvements in few-sample regimes rests on the design assumption that shared low-rank matrix geometry is present and separable. No ablation, counter-example, or quantitative sensitivity analysis is described that measures how much the reported gains degrade when this geometry is weak or absent; without such evidence it remains unclear whether the core-flow construction itself drives the edge or whether any dimensionality reduction would suffice.

Authors: The CoreFlow framework is explicitly designed for distributions exhibiting shared low-rank matrix geometry, as stated in the abstract and introduction. We agree that an explicit sensitivity analysis would strengthen the claims. In the revised manuscript we will add a controlled synthetic experiment in which we vary the strength of the shared row/column subspace alignment (by modulating subspace overlap and injecting isotropic noise) and report the resulting degradation in spectral and moment metrics. We will also include a baseline that applies generic dimensionality reduction (e.g., PCA on vectorized matrices) followed by an ambient-space flow, thereby isolating the contribution of the geometry-preserving core construction from generic compression. revision: yes
Referee: [Method] Method section (presumably §3): the description of how the shared subspaces are estimated and how the continuous normalizing flow is defined on the core is not accompanied by any derivation showing that the overall procedure is parameter-free or that performance metrics are independent of the subspace estimation step. If the reported metrics are computed after fitting the subspaces to the same data used for generation, the circularity concern noted in the reader report applies directly to the load-bearing claims.

Authors: We do not claim the procedure is parameter-free; the core dimension is a tunable hyperparameter selected via explained-variance heuristics or cross-validation on the training set, as described in the experimental protocol. Subspace estimation is performed on the training matrices and the flow is subsequently trained on the resulting cores; this is standard unsupervised practice. Generation quality is assessed on held-out test matrices using distribution-level metrics (spectral norms and moment distances) that do not reuse the fitted subspaces directly. In revision we will expand §3 with an explicit derivation of the composite training objective, clarifying the separation between subspace learning and the core flow, and we will restate the evaluation protocol to address any perceived circularity. revision: partial

Circularity Check

0 steps flagged

No circularity identified; derivation chain not inspectable from provided text

full rationale

The abstract and description present CoreFlow at a conceptual level—learning shared row/column subspaces then applying a continuous normalizing flow on the induced low-dimensional core—without any equations, parameter-fitting steps, self-citations, or derivation chains. No load-bearing claim reduces by construction to a fitted input or self-referential definition, as no mathematical steps are visible to analyze. The method's design target (presence of shared low-rank geometry) is stated explicitly but not derived from prior results within the text, leaving the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit equations or sections, so free parameters, axioms, and invented entities cannot be extracted; the central claim rests on the unverified assumption of shared low-rank geometry.

pith-pipeline@v0.9.0 · 5457 in / 1123 out tokens · 56264 ms · 2026-05-08T04:04:46.262214+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 11 canonical work pages · 5 internal anchors

[1]

Princeton University Press, 2008

P-A Absil, Robert Mahony, and Rodolphe Sepulchre.Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008

2008
[2]

, Cottet, V

Pierre Alquier, Vincent Cottet, Nicolas Chopin, and Judith Rousseau. Bayesian matrix comple- tion: prior specification.arXiv preprint arXiv:1406.1440, 2014. 11

work page arXiv 2014
[3]

A non-local algorithm for image denoising

Antoni Buades, Bartomeu Coll, and J-M Morel. A non-local algorithm for image denoising. In2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 2, pages 60–65. Ieee, 2005

2005
[4]

Exact matrix completion via convex optimization

Emmanuel Candes and Benjamin Recht. Exact matrix completion via convex optimization. Communications of the ACM, 55(6):111–119, 2012

2012
[5]

The power of convex relaxation: Near-optimal matrix completion.IEEE Transactions on Information Theory, 56(5):2053–2080, 2010

Emmanuel J Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion.IEEE Transactions on Information Theory, 56(5):2053–2080, 2010

2053
[6]

Neural ordinary differential equations.Advances in Neural Information Processing Systems, 31, 2018

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations.Advances in Neural Information Processing Systems, 31, 2018

2018
[7]

Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019

Yuejie Chi, Yue M Lu, and Yuxin Chen. Nonconvex optimization meets low-rank matrix factorization: An overview.IEEE Transactions on Signal Processing, 67(20):5239–5269, 2019

2019
[8]

Inference for low-rank completion without sample splitting with application to treatment effect estimation.Journal of Econometrics, 240 (1):105682, 2024

Jungjun Choi, Hyukjun Kwon, and Yuan Liao. Inference for low-rank completion without sample splitting with application to treatment effect estimation.Journal of Econometrics, 240 (1):105682, 2024

2024
[9]

Era5 hourly data on single levels from 1940 to present.Datos recuperados entre noviembre, 2024

Copernicus Climate Data Store. Era5 hourly data on single levels from 1940 to present.Datos recuperados entre noviembre, 2024

1940
[10]

Image denoising by sparse 3-d transform-domain collaborative filtering.IEEE Transactions on Image Processing, 16(8):2080–2095, 2007

Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering.IEEE Transactions on Image Processing, 16(8):2080–2095, 2007

2080
[11]

An overview of low-rank matrix recovery from incomplete observations.IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016

Mark A Davenport and Justin Romberg. An overview of low-rank matrix recovery from incomplete observations.IEEE Journal of Selected Topics in Signal Processing, 10(4):608–622, 2016

2016
[12]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review arXiv 2010
[13]

The geometry of algorithms with orthogonality constraints.SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998

Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints.SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998

1998
[14]

Compressive sensing in medical imaging.Applied optics, 54(8):C23–C44, 2015

Christian G Graff and Emil Y Sidky. Compressive sensing in medical imaging.Applied optics, 54(8):C23–C44, 2015

2015
[15]

Low rank matrix completion via robust alternating minimization in nearly linear time

Yuzhou Gu, Zhao Song, Junze Yin, and Lichen Zhang. Low rank matrix completion via robust alternating minimization in nearly linear time. InThe Twelfth International Conference on Learning Representations, 2024

2024
[16]

Accelerating diffusion via compressed sensing: Applications to imaging and finance

Zhengyi Guo, Jiatu Li, Wenpin Tang, and David Yao. Accelerating diffusion via compressed sensing: Applications to imaging and finance. InNeurIPS 2025 Workshop MLxOR: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision- Making, 2025. 12

2025
[17]

Chapman and Hall/CRC, 2018

Arjun K Gupta and Daya K Nagar.Matrix variate distributions. Chapman and Hall/CRC, 2018

2018
[18]

Applications of matrix factorization methods to climate data.Nonlinear Processes in Geophysics Discussions, 2020:1–27, 2020

Dylan Harries and Terence J O’Kane. Applications of matrix factorization methods to climate data.Nonlinear Processes in Geophysics Discussions, 2020:1–27, 2020

2020
[19]

Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.Advances in Neural Information Processing Systems, 33:6840–6851, 2020

2020
[20]

Cambridge University Press, 2012

Roger A Horn and Charles R Johnson.Matrix Analysis. Cambridge University Press, 2012

2012
[21]

Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.The Annals of Statistics, 39(5): 2302–2329, 2011

Vladimir Koltchinskii, Karim Lounici, and Alexandre B Tsybakov. Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.The Annals of Statistics, 39(5): 2302–2329, 2011

2011
[22]

Back to Basics: Let Denoising Generative Models Denoise

Tianhong Li and Kaiming He. Back to basics: Let denoising generative models denoise.arXiv preprint arXiv:2511.13720, 2025

work page internal anchor Pith review arXiv 2025
[23]

Diffusion models for time- series applications: a survey.Frontiers of Information Technology & Electronic Engineering, 25 (1):19–41, 2024

Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, and Junbin Gao. Diffusion models for time- series applications: a survey.Frontiers of Information Technology & Electronic Engineering, 25 (1):19–41, 2024

2024
[24]

Flow matching for generative modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations
[25]

Flow matching for generative modeling

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023

2023
[26]

Maximum entropy low-rank matrix recovery.IEEE Journal of Selected Topics in Signal Processing, 12(5):886–901, 2018

Simon Mak and Yao Xie. Maximum entropy low-rank matrix recovery.IEEE Journal of Selected Topics in Signal Processing, 12(5):886–901, 2018

2018
[27]

Information-guided sampling for low-rank matrix completion.arXiv preprint arXiv:1706.08037, 2017

Simon Mak, Henry Shaowu Yushi, and Yao Xie. Information-guided sampling for low-rank matrix completion.arXiv preprint arXiv:1706.08037, 2017

work page arXiv 2017
[28]

Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.Journal of Functional Analysis, 173(2):361–400, 2000

Felix Otto and Cédric Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality.Journal of Functional Analysis, 173(2):361–400, 2000

2000
[29]

Missdiff: Training diffusion models on tabular data with missing values.arXiv preprint arXiv:2307.00467, 2023

Yidong Ouyang, Liyan Xie, Chongxuan Li, and Guang Cheng. Missdiff: Training diffusion models on tabular data with missing values.arXiv preprint arXiv:2307.00467, 2023

work page arXiv 2023
[30]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023

2023
[31]

Tensor based missing traffic data com- pletion with spatial–temporal correlation.Physica A: Statistical Mechanics and its Applications, 446:54–63, 2016

Bin Ran, Huachun Tan, Yuankai Wu, and Peter J Jin. Tensor based missing traffic data com- pletion with spatial–temporal correlation.Physica A: Statistical Mechanics and its Applications, 446:54–63, 2016

2016
[32]

Mcflow: Monte carlo flow models for data imputation

Trevor W Richardson, Wencheng Wu, Lei Lin, Beilei Xu, and Edgar A Bernal. Mcflow: Monte carlo flow models for data imputation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14205–14214, 2020. 13

2020
[33]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022

2022
[34]

Denoising Diffusion Implicit Models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models.arXiv preprint arXiv:2010.02502, 2020

work page internal anchor Pith review arXiv 2010
[35]

Cambridge University Press, 2017

Paul Suetens.Fundamentals of Medical Imaging. Cambridge University Press, 2017

2017
[36]

Csdi: Conditional score-based diffusion models for probabilistic time series imputation.Advances in Neural Information Processing Systems, 34:24804–24816, 2021

Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. Csdi: Conditional score-based diffusion models for probabilistic time series imputation.Advances in Neural Information Processing Systems, 34:24804–24816, 2021

2021
[37]

Liouville flow importance sampler.arXiv preprint arXiv:2405.06672, 2024

Yifeng Tian, Nishant Panda, and Yen Ting Lin. Liouville flow importance sampler.arXiv preprint arXiv:2405.06672, 2024

work page arXiv 2024
[38]

An introduction to matrix concentration inequalities.Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015

Joel A Tropp. An introduction to matrix concentration inequalities.Foundations and Trends® in Machine Learning, 8(1-2):1–230, 2015

2015
[39]

Springer, 2009

Cédric Villani.Optimal Transport: Old and New, volume 338 ofGrundlehren der mathematischen Wissenschaften. Springer, 2009

2009
[40]

Spatial modelling and prediction with the spatio-temporal matrix: A study on predicting future settlement growth.Land, 11(8):1174, 2022

Zhiyuan Wang, Felix Bachofer, Jonas Koehler, Juliane Huth, Thorsten Hoeser, Mattia Mar- concini, Thomas Esch, and Claudia Kuenzer. Spatial modelling and prediction with the spatio-temporal matrix: A study on predicting future settlement growth.Land, 11(8):1174, 2022

2022
[41]

Annealing flow generative models towards sampling high-dimensional and multi-modal distributions

Dongze Wu and Yao Xie. Annealing flow generative models towards sampling high-dimensional and multi-modal distributions. InForty-second International Conference on Machine Learning, 2025

2025
[42]

Flow-based Generative Modeling of Potential Outcomes and Counterfactuals

Dongze Wu, David I Inouye, and Yao Xie. Po-flow: Flow-based generative models for sampling potential outcomes and counterfactuals.arXiv preprint arXiv:2505.16051, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

Doflow: Flow-based generative models for interven- tional and counterfactual forecasting on time series

Dongze Wu, Feng Qiu, and Yao Xie. Doflow: Flow-based generative models for interven- tional and counterfactual forecasting on time series. InInternational Conference on Learning Representations, 2026

2026
[44]

Local flow matching generative models.IEEE Transactions on Information Theory, 2026

Chen Xu, Xiuyuan Cheng, and Yao Xie. Local flow matching generative models.IEEE Transactions on Information Theory, 2026. Accepted

2026
[45]

Bayesian uncertainty quantification for low-rank matrix completion.Bayesian Analysis, 18(2):491–518, 2023

Henry Shaowu Yuchi, Simon Mak, and Yao Xie. Bayesian uncertainty quantification for low-rank matrix completion.Bayesian Analysis, 18(2):491–518, 2023

2023
[46]

Diffputer: Empowering diffusion models for missing data imputation.arXiv preprint arXiv:2405.20690, 2024

Hengrui Zhang, Liancheng Fang, Qitian Wu, and Philip S Yu. Diffputer: Empowering diffusion models for missing data imputation.arXiv preprint arXiv:2405.20690, 2024

work page arXiv 2024
[47]

Diffusion Transformers with Representation Autoencoders

Boyang Zheng, Nanye Ma, Shengbang Tong, and Saining Xie. Diffusion transformers with representation autoencoders.arXiv preprint arXiv:2510.11690, 2025

work page internal anchor Pith review arXiv 2025
[48]

Zheng and N

Shuhan Zheng and Nontawat Charoenphakdee. Diffusion models for missing value imputation in tabular data.arXiv preprint arXiv:2210.17128, 2022. 14 A Proofs Proof of Proposition 4.2.Let U∈R m1×R and V∈R m2×R satisfy U ⊤U = IR and V ⊤V = IR. Define the orthogonal projectors PU :=U U ⊤, P V :=V V ⊤. Note thatP ⊤ U PU =P U andP ⊤ V PV =P V. The population loss...

work page arXiv 2022