STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation

Daeky Jeong; Hongjun Yoon; Hyeongyeol Lim; Won June Cho

arxiv: 2606.07036 · v1 · pith:DCKQATYYnew · submitted 2026-06-05 · 💻 cs.CV · cs.AI· cs.CE· cs.LG

STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation

Won June Cho , Daeky Jeong , Hyeongyeol Lim , Hongjun Yoon This is my paper

Pith reviewed 2026-06-27 22:37 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CEcs.LG

keywords histopathology image generationRiemannian flow matchinganisotropic decodervision foundation modelsconditioning collapsesynthetic datapatch-token featureslatent diffusion

0 comments

The pith

STREAM uses Riemannian flow matching on hypersphere patch features from vision models to generate synthetic histopathology images without conditioning collapse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that patch-token features extracted from pretrained histopathology vision foundation models are l2-normalized and sit on the unit hypersphere with angular dominance. Treating these features as the latent space rather than as conditioning signals prevents the conditioning from overwhelming generation. A bridge-type stochastic perturbation creates per-token rectifiability on the sphere so a Diffusion Transformer can be trained directly in that space. An anisotropic decoder then modulates the velocity-field Jacobian to maintain fidelity along high-energy directions while adding robustness along low-energy ones. If correct, this produces higher-quality and more diverse synthetic images for breast and colorectal cancer data than prior conditioning-based latent diffusion approaches.

Core claim

The central claim is that pretrained histopathology VFM patch-token features are l2-normalized and lie on the unit hypersphere S^{d-1} with strong angular dominance and intrinsic curvature, making them naturally suited for Riemannian formulation; STREAM therefore applies a bridge-type stochastic perturbation to establish per-token rectifiability on S^{d-1} for training a DiT in latent space together with a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions, yielding state-of-the-art reconstruction and generation on breast and colorectal cancer datasets.

What carries the argument

Stochastic Riemannian flow matching on unit-hypersphere patch-token features, using a bridge-type perturbation for rectifiability plus an anisotropic decoder that weights the velocity-field Jacobian by energy direction.

If this is right

Avoids conditioning collapse by using VFM features as the latent space itself instead of as external conditioning.
Achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer histopathology datasets.
The stochastic perturbation establishes per-token rectifiability on the hypersphere for stable DiT training.
The anisotropic decoder preserves high-energy fidelity while adding robustness in low-energy Jacobian directions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same hypersphere formulation may apply to other medical imaging domains where VFM patch features show comparable angular structure.
Manifold-aware flow matching could reduce reliance on elaborate conditioning schemes across latent generative models.
Higher diversity from this approach might improve downstream training of pathology foundation models on synthetic data.
Direct tests on additional tissue types or non-cancer histopathology would clarify the range of the spherical assumption.

Load-bearing premise

The l2-normalized patch-token features possess strong angular dominance and intrinsic curvature that make Riemannian geometry on the hypersphere superior to Euclidean alternatives for this generation task.

What would settle it

Training the identical DiT and decoder pipeline with standard Euclidean flow matching on the same VFM features and observing whether reconstruction FID rises or sample diversity falls compared with the Riemannian version.

Figures

Figures reproduced from arXiv: 2606.07036 by Daeky Jeong, Hongjun Yoon, Hyeongyeol Lim, Won June Cho.

**Figure 1.** Figure 1: Overview of STREAM. A frozen pathology VFM encoder maps each input patch to N=256 tokens on the unit hypersphere S d−1 ; a Diffusion Transformer learns to transport a uniform source distribution on (S d−1 ) N to the data distribution along bridge-perturbed geodesics. A separately-trained anisotropic decoder reconstructs histopathology images from generated features, with directional noise injection guided… view at source ↗

**Figure 2.** Figure 2: Generation samples from each trained model on TCGA-BRCA and TCGA-COADREAD. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

**Figure 3.** Figure 3: Intrinsic geometry of histopathology VFM feature manifolds on TCGA-BRCA. (a) Eigen [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

**Figure 4.** Figure 4: Reconstruction comparison on TCGA-BRCA: ground truth (GT) and reconstruction [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗

**Figure 5.** Figure 5: Reconstruction comparison on TCGA-COADREAD: ground truth (GT) and reconstruc [PITH_FULL_IMAGE:figures/full_fig_p025_5.png] view at source ↗

**Figure 6.** Figure 6: Generation samples on TCGA-BRCA: unconditional samples (used for gFID evaluation) [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗

**Figure 7.** Figure 7: Generation samples on TCGA-COADREAD: unconditional samples (used for gFID eval [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗

read the original abstract

Synthetic histopathology image generation addresses critical challenges in computational pathology, including patient privacy and the growing need for large-scale training data for foundation models. Latent diffusion models have dominated the image generation domain, with recent works emphasizing that the choice of latent space is critical to the quality of generated images. Existing state-of-the-art generative models in histopathology use pretrained Vision Foundation Models (VFMs) as conditioning signals, and we observe that this leads to "conditioning collapse," where the conditioning signal dominates the latent space and lowers the quality and diversity of generated samples. Therefore, we instead use pretrained histopathology VFMs as the latent space itself, leveraging their patch-token features that encode rich semantic information. We empirically show that these features are $\ell_2$-normalized and lie on the unit hypersphere $\mathcal{S}^{d-1}$ with strong angular dominance and intrinsic curvature, making them naturally suited for a Riemannian formulation. We therefore present STREAM, the first framework to apply Riemannian flow matching in the pathology domain. STREAM consists of two stages: 1) a bridge-type stochastic perturbation that establishes per-token rectifiability on $\mathcal{S}^{d-1}$ for training a Diffusion Transformer (DiT) in latent space, and 2) a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions. Together, STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets. The code will be publicly released upon acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

STREAM applies Riemannian flow matching to VFM patch tokens in histopathology with a new anisotropic decoder, but the abstract gives no numbers and the need for the manifold step over Euclidean is not isolated.

read the letter

The core of this paper is using pretrained histopathology VFMs as the latent space itself rather than as conditioning, then running stochastic Riemannian flow matching on the resulting l2-normalized patch tokens that sit on the hypersphere. They add a bridge-type perturbation for rectifiability and an anisotropic decoder that tries to protect high-energy directions in the velocity Jacobian while allowing more flexibility in low-energy ones. That combination is positioned as the first Riemannian flow matching setup in this domain and is claimed to beat prior work on breast and colorectal cancer image generation and reconstruction.

The observation that the tokens show strong angular dominance is a reasonable starting point, and treating the latent space as a manifold rather than forcing it into Euclidean flow matching is a clear departure from the usual latent diffusion setups in pathology. The decoder design is the most concrete technical addition; if it actually improves Jacobian stability without hurting fidelity, that piece could travel to other flow-matching work.

The main weakness is that the abstract supplies zero quantitative results, no baseline tables, no dataset sizes, and no error bars. Without those, the SOTA claim cannot be checked, and the stress-test point lands: there is no evidence yet that the Riemannian geometry itself is required versus running ordinary Euclidean flow matching on the same normalized features. The paper would need to show that ablation clearly.

This is aimed at computational pathology groups that already use VFMs and need synthetic images for privacy or data-scarce settings. Readers working on manifold-based generative models might also pick up the decoder idea. The work shows clear thinking about the latent-space choice and is coherent on its own terms, so it deserves a serious referee even if the experiments turn out to need strengthening.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes STREAM, a two-stage framework for histopathology image generation that treats ℓ_{2}-normalized patch-token features from pretrained VFMs as points on the unit hypersphere S^{d-1}. It introduces a bridge-type stochastic perturbation to enable rectifiable training of a DiT via Riemannian flow matching, paired with a novel anisotropic decoder that modulates Jacobian robustness along low- versus high-energy directions, claiming this avoids conditioning collapse and yields SOTA reconstruction and generation on breast and colorectal cancer datasets.

Significance. If the performance claims hold with proper controls, the work could meaningfully advance generative modeling in computational pathology by showing that Riemannian geometry on VFM latents can outperform conditioning-based baselines, while the public code release would aid reproducibility.

major comments (2)

[Abstract] Abstract: the SOTA reconstruction and generation claims are unsupported by any quantitative results, error bars, baseline comparisons, dataset sizes, or metric values. This is load-bearing for the central empirical claim.
[Abstract] Abstract (paragraph on empirical observation and method description): the assertion that the observed ℓ_{2}-normalization and angular dominance make the features 'naturally suited' for a Riemannian formulation is not accompanied by an ablation or comparison isolating the manifold geometry from the bridge perturbation or anisotropic decoder; without this, the necessity of the Riemannian component over Euclidean flow matching on the same normalized tokens remains unverified.

minor comments (1)

[Abstract] Abstract: consider specifying the exact datasets (e.g., names and splits) and metrics used to support the SOTA statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that the abstract must be revised to include quantitative support for the SOTA claims. On the second point, we will strengthen the presentation by adding a targeted ablation that isolates the contribution of the Riemannian geometry.

read point-by-point responses

Referee: [Abstract] Abstract: the SOTA reconstruction and generation claims are unsupported by any quantitative results, error bars, baseline comparisons, dataset sizes, or metric values. This is load-bearing for the central empirical claim.

Authors: We agree that the abstract, as currently written, does not contain the specific metric values, error bars, baseline names, or dataset sizes needed to substantiate the SOTA claim on first reading. The full manuscript reports these quantities (FID, reconstruction MSE, diversity metrics, dataset cardinalities for the breast and colorectal cohorts, and statistical significance) in the experimental sections and tables. We will revise the abstract to incorporate the key quantitative results and error bars so that the central empirical claim is self-contained and verifiable. revision: yes
Referee: [Abstract] Abstract (paragraph on empirical observation and method description): the assertion that the observed ℓ₂-normalization and angular dominance make the features 'naturally suited' for a Riemannian formulation is not accompanied by an ablation or comparison isolating the manifold geometry from the bridge perturbation or anisotropic decoder; without this, the necessity of the Riemannian component over Euclidean flow matching on the same normalized tokens remains unverified.

Authors: The abstract condenses the empirical observation that VFM patch tokens are ℓ₂-normalized and exhibit angular dominance. The full manuscript motivates the Riemannian formulation from these properties and evaluates the complete STREAM pipeline against Euclidean baselines. To directly address the referee’s request for isolation, we will add an explicit ablation that trains an otherwise identical Euclidean flow-matching model on the same normalized tokens and compares it to the Riemannian version (keeping the bridge perturbation and anisotropic decoder fixed where applicable). This will be included in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation self-contained via empirical observation and new components

full rationale

The paper's chain starts from an empirical observation (ℓ2-normalized VFM patch tokens on S^{d-1}) that is presented as data-driven motivation rather than a definitional input. It then introduces independent elements (bridge-type stochastic perturbation for rectifiability and anisotropic decoder for Jacobian robustness) to build the Riemannian flow matching framework. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz is smuggled via prior author work. The SOTA claim is tied to reported performance on external datasets, not internal redefinition. This matches the default non-circular case.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

Review based solely on abstract; full parameter counts, training details, and assumption justifications unavailable.

axioms (2)

domain assumption Pretrained histopathology VFMs produce ℓ2-normalized patch-token features lying on the unit hypersphere S^{d-1}
Abstract states this is empirically shown and forms the basis for the Riemannian formulation.
domain assumption These features exhibit strong angular dominance and intrinsic curvature making them naturally suited for Riemannian flow matching
Abstract invokes this property to justify the choice of geometry over Euclidean latent diffusion.

invented entities (1)

Anisotropic decoder no independent evidence
purpose: Allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along high-energy directions
Novel component introduced to handle the Riemannian velocity field; no independent evidence provided outside the paper.

pith-pipeline@v0.9.1-grok · 5821 in / 1462 out tokens · 30344 ms · 2026-06-27T22:37:02.235474+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

72 extracted references · 21 canonical work pages · 7 internal anchors

[1]

Albergo and Eric Vanden-Eijnden

Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. InICLR, 2023

2023
[2]

Diffusions hypercontractives

Dominique Bakry and Michel Émery. Diffusions hypercontractives. InSéminaire de Proba- bilités XIX. Springer, 1985

1985
[3]

On the Convergence and Straightness of Rectified Flow

Vansh Bansal, Saptarshi Roy, Purnamrita Sarkar, and Alessandro Rinaldo. On the Wasserstein convergence and straightness of rectified flows.arXiv preprint arXiv:2410.14949, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[4]

A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem.Numerische Mathematik, 84(3):375–393, 2000

Jean-David Benamou and Yves Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem.Numerische Mathematik, 84(3):375–393, 2000

2000
[5]

Sutherland, Michael Arbel, and Arthur Gretton

Mikołaj Bi ´nkowski, Dougal J. Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. InInternational Conference on Learning Representations (ICLR), 2018

2018
[6]

Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J

Gabriele Campanella, Matthew G. Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J. Busam, Edi Brogi, Victor E. Reuter, David S. Klimstra, and Thomas J. Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.Nature Medicine, 25:1301–1309, 2019

2019
[7]

Hyperspherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Hun Chang, Byunghee Cha, and Jong Chul Ye. DINO-SAE: DINO spherical autoencoder for high-fidelity image reconstruction and generation.arXiv preprint arXiv:2601.22904, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[8]

Aligning visual foundation encoders to tokenizers for diffusion models

Bowei Chen, Sai Bi, Hao Tan, He Zhang, Tianyuan Zhang, Zhengqi Li, Yuanjun Xiong, Jian- ming Zhang, and Kai Zhang. Aligning visual foundation encoders to tokenizers for diffusion models. InICLR, 2026

2026
[9]

Masked autoencoders are effective tokenizers for diffusion models

Hao Chen, Yujin Han, Fangyi Chen, et al. Masked autoencoders are effective tokenizers for diffusion models. InICML, 2025

2025
[10]

Chen, Tong Ding, Ming Y

Richard J. Chen, Tong Ding, Ming Y . Lu, Drew F. K. Williamson, et al. Towards a general- purpose foundation model for computational pathology.Nature Medicine, 30(3):850–862, 2024

2024
[11]

Ricky T. Q. Chen and Yaron Lipman. Flow matching on general geometries. InICLR, 2024

2024
[12]

do Carmo.Riemannian Geometry

Manfredo P. do Carmo.Riemannian Geometry. Birkhäuser, 1992

1992
[13]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021

2021
[14]

Taming transformers for high-resolution image synthesis

Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

2021
[15]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorber, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. InICML, 2024

2024
[16]

The Vendi score: A diversity evaluation metric for machine learning.Transactions on Machine Learning Research, 2023

Dan Friedman and Adji Bousso Dieng. The Vendi score: A diversity evaluation metric for machine learning.Transactions on Machine Learning Research, 2023

2023
[17]

Learned representation-guided diffusion models for large- image generation

Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, and Dimitris Samaras. Learned representation-guided diffusion models for large- image generation. InCVPR, 2024

2024
[18]

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.SIAM Review, 53(2):217–288, 2011

2011
[19]

On the relation between rectified flows and optimal transport.arXiv preprint arXiv:2505.19712, 2025

Johannes Hertrich, Antonin Chambolle, and Julie Delon. On the relation between rectified flows and optimal transport.arXiv preprint arXiv:2505.19712, 2025

work page arXiv 2025
[20]

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochre- iter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017. 10

2017
[21]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020

2020
[22]

Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs

Mohammad Jalali, Azim Ospanov, Amin Gohari, and Farzan Farnia. Conditional vendi score: An information-theoretic approach to diversity evaluation of prompt-based generative models. arXiv preprint arXiv:2411.02817, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

Towards an explainable comparison and alignment of feature embeddings.arXiv preprint arXiv:2506.06231, 2025

Mohammad Jalali, Bahar Dibaei Nia, and Farzan Farnia. Towards an explainable comparison and alignment of feature embeddings.arXiv preprint arXiv:2506.06231, 2025

work page arXiv 2025
[24]

Rethinking FID: Towards a better evaluation metric for image generation

Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking FID: Towards a better evaluation metric for image generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2024
[25]

de Jong, Ioannis Gatopoulos, Nicolas Känzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, and Fei Tang

kaiko.ai, Nanne Aben, Edwin D. de Jong, Ioannis Gatopoulos, Nicolas Känzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, and Fei Tang. Towards large-scale training of pathology foundation models.arXiv preprint arXiv:2404.15217, 2024

work page arXiv 2024
[26]

EQ-V AE: Equivariance regularized latent space for improved generative image modeling

Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, and Nikos Komodakis. EQ-V AE: Equivariance regularized latent space for improved generative image modeling. InICML, 2025

2025
[27]

Amandeep Kumar and Vishal M. Patel. Learning on the manifold: Unlocking standard diffu- sion transformers with representation encoders.arXiv preprint arXiv:2602.10099, 2026

work page arXiv 2026
[28]

InECCV, 2024

Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, and Dim- itris Samaras.∞-Brush: Controllable large image synthesis with diffusion models in infinite dimensions. InECCV, 2024

2024
[29]

Lee.Introduction to Riemannian Manifolds

John M. Lee.Introduction to Riemannian Manifolds. Springer, 2nd edition, 2018

2018
[30]

Geometry-aware image flow matching

Junho Lee, Kwanseok Kim, and Joonseok Lee. Geometry-aware image flow matching. In International Conference on Machine Learning (ICML), 2026

2026
[31]

Improving the training of rectified flows

Sangyun Lee, Zinan Lin, and Giulia Fanti. Improving the training of rectified flows. In NeurIPS, 2024

2024
[32]

REPA-E: Unlocking V AE for end-to-end tuning with latent diffusion transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, and Liang Zheng. REPA-E: Unlocking V AE for end-to-end tuning with latent diffusion transformers. InICCV, 2025

2025
[33]

Back to Basics: Let Denoising Generative Models Denoise

Tianhong Li and Kaiming He. Back to basics: Let denoising generative models denoise.arXiv preprint arXiv:2511.13720, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[34]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. InICLR, 2023

2023
[35]

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A. W. M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis.Medical Image Analysis, 42:60–88, 2017

2017
[36]

Rectified Flow: A Marginal Preserving Approach to Optimal Transport

Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport.arXiv preprint arXiv:2209.14577, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[37]

Improving reconstruction of representation autoencoder.arXiv preprint arXiv:2602.08620, 2026

Siyu Liu, Chujie Qin, Hubery Yin, Qixin Yan, Zheng-Peng Duan, Chen Li, Jing Lyu, Chun-Le Guo, and Chongyi Li. Improving reconstruction of representation autoencoder.arXiv preprint arXiv:2602.08620, 2026

work page arXiv 2026
[38]

Flow straight and fast: Learning to generate and transfer data with rectified flows

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flows. InICLR, 2023

2023
[39]

On the regularity of solutions of optimal transportation problems.Acta Mathematica, 202(2):241–283, 2009

Grégoire Loeper. On the regularity of solutions of optimal transportation problems.Acta Mathematica, 202(2):241–283, 2009

2009
[40]

Lu, Bowen Chen, Drew F

Ming Y . Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Long Phi Le, Georg Gerber, Anil V . Sahai, and Faisal Mah- mood. A visual-language foundation model for computational pathology.Nature Medicine, 2024

2024
[41]

Albergo, Nicholas M

Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, and Saining Xie. SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. InECCV, 2024. 11

2024
[42]

Trudinger, and Xu-Jia Wang

Xi-Nan Ma, Neil S. Trudinger, and Xu-Jia Wang. Regularity of potential functions of the optimal transportation problem.Archive for Rational Mechanics and Analysis, 177(2):151– 183, 2005

2005
[43]

Co-synthesis of histopathology nuclei image- label pairs using a context-conditioned joint diffusion model

Seonghui Min, Hyun-Jic Oh, and Won-Ki Jeong. Co-synthesis of histopathology nuclei image- label pairs using a context-conditioned joint diffusion model. InECCV, 2024. doi: 10.1007/ 978-3-031-72624-8_9

2024
[44]

Spider: A com- prehensive multi-organ supervised pathology dataset and baseline models

Dmitry Nechaev, Alexey Pchelnikov, and Ekaterina Ivanova. SPIDER: A compre- hensive multi-organ supervised pathology dataset and baseline models.arXiv preprint arXiv:2503.02876, 2025

work page arXiv 2025
[45]

DINOv2: Learning robust visual fea- tures without supervision.TMLR, 2024

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khali- dov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud As- sran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patric...

2024
[46]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InICCV, 2023

2023
[47]

Nonparametric regression estimation on closed Riemannian manifolds.Jour- nal of Nonparametric Statistics, 18(1):57–67, 2006

Bruno Pelletier. Nonparametric regression estimation on closed Riemannian manifolds.Jour- nal of Nonparametric Statistics, 18(1):57–67, 2006

2006
[48]

Image tokenizer needs post-training.arXiv preprint arXiv:2509.12474, 2025

Kai Qiu, Xiang Li, Hao Chen, Jason Kuen, Xiaohao Xu, Jiuxiang Gu, Yinyi Luo, Bhiksha Raj, Zhe Lin, and Marios Savvides. Image tokenizer needs post-training.arXiv preprint arXiv:2509.12474, 2025

work page arXiv 2025
[49]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InCVPR, 2022

2022
[50]

Latent diffusion model without variational autoencoder.arXiv preprint arXiv:2510.15301, 2025

Minglei Shi, Haolin Wang, Wenzhao Zheng, Ziyang Yuan, Xiaoshi Wu, Xintao Wang, Pengfei Wan, Jie Zhou, and Jiwen Lu. Latent diffusion model without variational autoencoder.arXiv preprint arXiv:2510.15301, 2025

work page arXiv 2025
[51]

RecTok: Reconstruction distillation along rectified flow

Qingyu Shi, Size Wu, Jinbin Bai, Kaidong Yu, Yujing Wang, Yunhai Tong, Xiangtai Li, and Xuelong Li. RecTok: Reconstruction distillation along rectified flow. InCVPR, 2026

2026
[52]

DINOv3

Oriane Simeoni et al. DINOv3.arXiv preprint arXiv:2508.10104, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[53]

UniLiP: Adapting clip for unified multimodal understanding, generation and editing.arXiv preprint arXiv:2507.23278, 2025

Hao Tang, Chenwei Xie, Xiaoyi Bao, Tingyu Weng, Pandeng Li, Yun Zheng, and Liwei Wang. UniLiP: Adapting clip for unified multimodal understanding, generation and editing.arXiv preprint arXiv:2507.23278, 2025

work page arXiv 2025
[54]

Comprehensive molecular characterization of human colon and rectal cancer.Nature, 487(7407):330–337, 2012

The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer.Nature, 487(7407):330–337, 2012. doi: 10.1038/nature11252

work page doi:10.1038/nature11252 2012
[55]

Comprehensive molecular portraits of human breast tumours.Nature, 490:61–70, 2012

The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours.Nature, 490:61–70, 2012

2012
[56]

von Renesse and Karl-Theodor Sturm

Max-K. von Renesse and Karl-Theodor Sturm. Transport inequalities, gradient estimates, en- tropy and Ricci curvature.Communications on Pure and Applied Mathematics, 58(7):923–940, 2005

2005
[57]

Self-improving generative foundation model for synthetic medical image generation and clinical applications.Nature Medicine, 31 (2):609–617, 2025

Jinzhuo Wang, Kai Wang, Yunfang Yu, Yuxing Lu, et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications.Nature Medicine, 31 (2):609–617, 2025

2025
[58]

Diffuse and Disperse: Im- age Generation with Representation Regularization, 2025

Runqian Wang and Kaiming He. Diffuse and disperse: Image generation with representation regularization.arXiv preprint arXiv:2506.09027, 2025

work page arXiv 2025
[59]

Olguin, Jeffrey J

Jinxi Xiang, Xiyue Wang, Xiaoming Zhang, Yinghua Xi, Feyisope Eweje, Yuchen Chen, Yuan- feng Li, Colin Bergstrom, Matthew Gopaulchan, Ted Kim, Kun-Hsing Yu, Sierra Willens, Francesca M. Olguin, Jeffrey J. Nirschl, Joel Neal, Maximilian Diehn, Sen Yang, and Ruijiang Li. A vision-language foundation model for precision oncology.Nature, 638(8051):769–778, 2025

2025
[60]

Conghao Xiong, Zhengrui Guo, Zhe Xu, Yifei Zhang, Raymond Kai-yu Tong, Si Yong Yeo, Hao Chen, Joseph J. Y . Sung, and Irwin King. Exploiting low-dimensional manifold of features for few-shot whole slide image classification. InICLR, 2026. 12

2026
[61]

A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, et al. A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

2024
[62]

TopoCellGen: Generating histopathology cell topology with a diffusion model

Meilong Xu, Saumya Gupta, Xiaoling Hu, Chen Li, Shahira Abousamra, Dimitris Samaras, Prateek Prasanna, and Chao Chen. TopoCellGen: Generating histopathology cell topology with a diffusion model. InCVPR, 2025

2025
[63]

Latent denoising makes good visual tokenizers.arXiv preprint arXiv:2507.15856, 2025

Jiawei Yang, Tianhong Li, Lijie Fan, Yonglong Tian, and Yue Wang. Latent denoising makes good visual tokenizers.arXiv preprint arXiv:2507.15856, 2025

work page arXiv 2025
[64]

Reconstruction vs

Jingfeng Yao, Bin Yang, and Xinggang Wang. Reconstruction vs. generation: Taming opti- mization dilemma in latent diffusion models. InCVPR, 2025

2025
[65]

PathLDM: Text conditioned latent diffusion model for histopathology

Srikar Yellapragada, Alexandros Graikos, Prateek Prasanna, Tahsin Kurc, Joel Saltz, and Dim- itris Samaras. PathLDM: Text conditioned latent diffusion model for histopathology. In IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024

2024
[66]

Knudsen, Tahsin Kurc, Rajarsi R

Srikar Yellapragada, Alexandros Graikos, Zilinghan Li, Kostas Triaridis, Varun Belagali, Tarak Nath Nandi, Karen Bai, Beatrice S. Knudsen, Tahsin Kurc, Rajarsi R. Gupta, Prateek Prasanna, Ravi K. Madduri, Joel Saltz, and Dimitris Samaras. PixCell: A generative founda- tion model for digital histopathology images.arXiv preprint arXiv:2506.05127, 2025

work page arXiv 2025
[67]

Representation alignment for generation: Training diffusion transformers is easier than you think

Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, and Saining Xie. Representation alignment for generation: Training diffusion transformers is easier than you think. InICLR, 2025

2025
[68]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InCVPR, 2018

2018
[69]

Vision foundation models as effective visual tokenizers for autore- gressive image generation

Anlin Zheng, Xin Wen, Xuanyang Zhang, Chuofan Ma, Tiancai Wang, Gang Yu, Xiangyu Zhang, and Xiaojuan Qi. Vision foundation models as effective visual tokenizers for autore- gressive image generation. InNeurIPS, 2025

2025
[70]

Diffusion Transformers with Representation Autoencoders

Boyang Zheng, Nanye Ma, Shengbang Tong, and Saining Xie. Diffusion transformers with representation autoencoders.arXiv preprint arXiv:2510.11690, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[71]

Stabilize the latent space for image autoregressive modeling: A unified perspective

Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, and Lidong Bing. Stabilize the latent space for image autoregressive modeling: A unified perspective. InNeurIPS, 2024

2024
[72]

Virchow2:Scalingself-supervisedmixedmagnification models in pathology

Eric Zimmermann, Eugene V orontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi, Thomas Fuchs, Nicolo Fusi, Siqi Liu, and Kristen Severson. Virchow2: Scaling self-supervised mixed magnification models in pathology.arXiv preprint arXiv:2408.00738, 2024. 13 A Mathematical Prelim...

work page arXiv 2024

[1] [1]

Albergo and Eric Vanden-Eijnden

Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. InICLR, 2023

2023

[2] [2]

Diffusions hypercontractives

Dominique Bakry and Michel Émery. Diffusions hypercontractives. InSéminaire de Proba- bilités XIX. Springer, 1985

1985

[3] [3]

On the Convergence and Straightness of Rectified Flow

Vansh Bansal, Saptarshi Roy, Purnamrita Sarkar, and Alessandro Rinaldo. On the Wasserstein convergence and straightness of rectified flows.arXiv preprint arXiv:2410.14949, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[4] [4]

A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem.Numerische Mathematik, 84(3):375–393, 2000

Jean-David Benamou and Yves Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem.Numerische Mathematik, 84(3):375–393, 2000

2000

[5] [5]

Sutherland, Michael Arbel, and Arthur Gretton

Mikołaj Bi ´nkowski, Dougal J. Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. InInternational Conference on Learning Representations (ICLR), 2018

2018

[6] [6]

Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J

Gabriele Campanella, Matthew G. Hanna, Luke Geneslaw, Allen Miraflor, Vitor Werneck Krauss Silva, Klaus J. Busam, Edi Brogi, Victor E. Reuter, David S. Klimstra, and Thomas J. Fuchs. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images.Nature Medicine, 25:1301–1309, 2019

2019

[7] [7]

Hyperspherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Hun Chang, Byunghee Cha, and Jong Chul Ye. DINO-SAE: DINO spherical autoencoder for high-fidelity image reconstruction and generation.arXiv preprint arXiv:2601.22904, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[8] [8]

Aligning visual foundation encoders to tokenizers for diffusion models

Bowei Chen, Sai Bi, Hao Tan, He Zhang, Tianyuan Zhang, Zhengqi Li, Yuanjun Xiong, Jian- ming Zhang, and Kai Zhang. Aligning visual foundation encoders to tokenizers for diffusion models. InICLR, 2026

2026

[9] [9]

Masked autoencoders are effective tokenizers for diffusion models

Hao Chen, Yujin Han, Fangyi Chen, et al. Masked autoencoders are effective tokenizers for diffusion models. InICML, 2025

2025

[10] [10]

Chen, Tong Ding, Ming Y

Richard J. Chen, Tong Ding, Ming Y . Lu, Drew F. K. Williamson, et al. Towards a general- purpose foundation model for computational pathology.Nature Medicine, 30(3):850–862, 2024

2024

[11] [11]

Ricky T. Q. Chen and Yaron Lipman. Flow matching on general geometries. InICLR, 2024

2024

[12] [12]

do Carmo.Riemannian Geometry

Manfredo P. do Carmo.Riemannian Geometry. Birkhäuser, 1992

1992

[13] [13]

An image is worth 16x16 words: Transformers for image recognition at scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. InICLR, 2021

2021

[14] [14]

Taming transformers for high-resolution image synthesis

Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

2021

[15] [15]

Scaling rectified flow transformers for high-resolution image synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorber, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis. InICML, 2024

2024

[16] [16]

The Vendi score: A diversity evaluation metric for machine learning.Transactions on Machine Learning Research, 2023

Dan Friedman and Adji Bousso Dieng. The Vendi score: A diversity evaluation metric for machine learning.Transactions on Machine Learning Research, 2023

2023

[17] [17]

Learned representation-guided diffusion models for large- image generation

Alexandros Graikos, Srikar Yellapragada, Minh-Quan Le, Saarthak Kapse, Prateek Prasanna, Joel Saltz, and Dimitris Samaras. Learned representation-guided diffusion models for large- image generation. InCVPR, 2024

2024

[18] [18]

Nathan Halko, Per-Gunnar Martinsson, and Joel A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions.SIAM Review, 53(2):217–288, 2011

2011

[19] [19]

On the relation between rectified flows and optimal transport.arXiv preprint arXiv:2505.19712, 2025

Johannes Hertrich, Antonin Chambolle, and Julie Delon. On the relation between rectified flows and optimal transport.arXiv preprint arXiv:2505.19712, 2025

work page arXiv 2025

[20] [20]

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochre- iter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017. 10

2017

[21] [21]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020

2020

[22] [22]

Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs

Mohammad Jalali, Azim Ospanov, Amin Gohari, and Farzan Farnia. Conditional vendi score: An information-theoretic approach to diversity evaluation of prompt-based generative models. arXiv preprint arXiv:2411.02817, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[23] [23]

Towards an explainable comparison and alignment of feature embeddings.arXiv preprint arXiv:2506.06231, 2025

Mohammad Jalali, Bahar Dibaei Nia, and Farzan Farnia. Towards an explainable comparison and alignment of feature embeddings.arXiv preprint arXiv:2506.06231, 2025

work page arXiv 2025

[24] [24]

Rethinking FID: Towards a better evaluation metric for image generation

Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking FID: Towards a better evaluation metric for image generation. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

2024

[25] [25]

de Jong, Ioannis Gatopoulos, Nicolas Känzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, and Fei Tang

kaiko.ai, Nanne Aben, Edwin D. de Jong, Ioannis Gatopoulos, Nicolas Känzig, Mikhail Karasikov, Axel Lagré, Roman Moser, Joost van Doorn, and Fei Tang. Towards large-scale training of pathology foundation models.arXiv preprint arXiv:2404.15217, 2024

work page arXiv 2024

[26] [26]

EQ-V AE: Equivariance regularized latent space for improved generative image modeling

Theodoros Kouzelis, Ioannis Kakogeorgiou, Spyros Gidaris, and Nikos Komodakis. EQ-V AE: Equivariance regularized latent space for improved generative image modeling. InICML, 2025

2025

[27] [27]

Amandeep Kumar and Vishal M. Patel. Learning on the manifold: Unlocking standard diffu- sion transformers with representation encoders.arXiv preprint arXiv:2602.10099, 2026

work page arXiv 2026

[28] [28]

InECCV, 2024

Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, and Dim- itris Samaras.∞-Brush: Controllable large image synthesis with diffusion models in infinite dimensions. InECCV, 2024

2024

[29] [29]

Lee.Introduction to Riemannian Manifolds

John M. Lee.Introduction to Riemannian Manifolds. Springer, 2nd edition, 2018

2018

[30] [30]

Geometry-aware image flow matching

Junho Lee, Kwanseok Kim, and Joonseok Lee. Geometry-aware image flow matching. In International Conference on Machine Learning (ICML), 2026

2026

[31] [31]

Improving the training of rectified flows

Sangyun Lee, Zinan Lin, and Giulia Fanti. Improving the training of rectified flows. In NeurIPS, 2024

2024

[32] [32]

REPA-E: Unlocking V AE for end-to-end tuning with latent diffusion transformers

Xingjian Leng, Jaskirat Singh, Yunzhong Hou, Zhenchang Xing, Saining Xie, and Liang Zheng. REPA-E: Unlocking V AE for end-to-end tuning with latent diffusion transformers. InICCV, 2025

2025

[33] [33]

Back to Basics: Let Denoising Generative Models Denoise

Tianhong Li and Kaiming He. Back to basics: Let denoising generative models denoise.arXiv preprint arXiv:2511.13720, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[34] [34]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. InICLR, 2023

2023

[35] [35]

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen A. W. M. van der Laak, Bram van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis.Medical Image Analysis, 42:60–88, 2017

2017

[36] [36]

Rectified Flow: A Marginal Preserving Approach to Optimal Transport

Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport.arXiv preprint arXiv:2209.14577, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[37] [37]

Improving reconstruction of representation autoencoder.arXiv preprint arXiv:2602.08620, 2026

Siyu Liu, Chujie Qin, Hubery Yin, Qixin Yan, Zheng-Peng Duan, Chen Li, Jing Lyu, Chun-Le Guo, and Chongyi Li. Improving reconstruction of representation autoencoder.arXiv preprint arXiv:2602.08620, 2026

work page arXiv 2026

[38] [38]

Flow straight and fast: Learning to generate and transfer data with rectified flows

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flows. InICLR, 2023

2023

[39] [39]

On the regularity of solutions of optimal transportation problems.Acta Mathematica, 202(2):241–283, 2009

Grégoire Loeper. On the regularity of solutions of optimal transportation problems.Acta Mathematica, 202(2):241–283, 2009

2009

[40] [40]

Lu, Bowen Chen, Drew F

Ming Y . Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Ivy Liang, Tong Ding, Guillaume Jaume, Igor Odintsov, Long Phi Le, Georg Gerber, Anil V . Sahai, and Faisal Mah- mood. A visual-language foundation model for computational pathology.Nature Medicine, 2024

2024

[41] [41]

Albergo, Nicholas M

Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, and Saining Xie. SiT: Exploring flow and diffusion-based generative models with scalable interpolant transformers. InECCV, 2024. 11

2024

[42] [42]

Trudinger, and Xu-Jia Wang

Xi-Nan Ma, Neil S. Trudinger, and Xu-Jia Wang. Regularity of potential functions of the optimal transportation problem.Archive for Rational Mechanics and Analysis, 177(2):151– 183, 2005

2005

[43] [43]

Co-synthesis of histopathology nuclei image- label pairs using a context-conditioned joint diffusion model

Seonghui Min, Hyun-Jic Oh, and Won-Ki Jeong. Co-synthesis of histopathology nuclei image- label pairs using a context-conditioned joint diffusion model. InECCV, 2024. doi: 10.1007/ 978-3-031-72624-8_9

2024

[44] [44]

Spider: A com- prehensive multi-organ supervised pathology dataset and baseline models

Dmitry Nechaev, Alexey Pchelnikov, and Ekaterina Ivanova. SPIDER: A compre- hensive multi-organ supervised pathology dataset and baseline models.arXiv preprint arXiv:2503.02876, 2025

work page arXiv 2025

[45] [45]

DINOv2: Learning robust visual fea- tures without supervision.TMLR, 2024

Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khali- dov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud As- sran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patric...

2024

[46] [46]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InICCV, 2023

2023

[47] [47]

Nonparametric regression estimation on closed Riemannian manifolds.Jour- nal of Nonparametric Statistics, 18(1):57–67, 2006

Bruno Pelletier. Nonparametric regression estimation on closed Riemannian manifolds.Jour- nal of Nonparametric Statistics, 18(1):57–67, 2006

2006

[48] [48]

Image tokenizer needs post-training.arXiv preprint arXiv:2509.12474, 2025

Kai Qiu, Xiang Li, Hao Chen, Jason Kuen, Xiaohao Xu, Jiuxiang Gu, Yinyi Luo, Bhiksha Raj, Zhe Lin, and Marios Savvides. Image tokenizer needs post-training.arXiv preprint arXiv:2509.12474, 2025

work page arXiv 2025

[49] [49]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InCVPR, 2022

2022

[50] [50]

Latent diffusion model without variational autoencoder.arXiv preprint arXiv:2510.15301, 2025

Minglei Shi, Haolin Wang, Wenzhao Zheng, Ziyang Yuan, Xiaoshi Wu, Xintao Wang, Pengfei Wan, Jie Zhou, and Jiwen Lu. Latent diffusion model without variational autoencoder.arXiv preprint arXiv:2510.15301, 2025

work page arXiv 2025

[51] [51]

RecTok: Reconstruction distillation along rectified flow

Qingyu Shi, Size Wu, Jinbin Bai, Kaidong Yu, Yujing Wang, Yunhai Tong, Xiangtai Li, and Xuelong Li. RecTok: Reconstruction distillation along rectified flow. InCVPR, 2026

2026

[52] [52]

DINOv3

Oriane Simeoni et al. DINOv3.arXiv preprint arXiv:2508.10104, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[53] [53]

UniLiP: Adapting clip for unified multimodal understanding, generation and editing.arXiv preprint arXiv:2507.23278, 2025

Hao Tang, Chenwei Xie, Xiaoyi Bao, Tingyu Weng, Pandeng Li, Yun Zheng, and Liwei Wang. UniLiP: Adapting clip for unified multimodal understanding, generation and editing.arXiv preprint arXiv:2507.23278, 2025

work page arXiv 2025

[54] [54]

Comprehensive molecular characterization of human colon and rectal cancer.Nature, 487(7407):330–337, 2012

The Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer.Nature, 487(7407):330–337, 2012. doi: 10.1038/nature11252

work page doi:10.1038/nature11252 2012

[55] [55]

Comprehensive molecular portraits of human breast tumours.Nature, 490:61–70, 2012

The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours.Nature, 490:61–70, 2012

2012

[56] [56]

von Renesse and Karl-Theodor Sturm

Max-K. von Renesse and Karl-Theodor Sturm. Transport inequalities, gradient estimates, en- tropy and Ricci curvature.Communications on Pure and Applied Mathematics, 58(7):923–940, 2005

2005

[57] [57]

Self-improving generative foundation model for synthetic medical image generation and clinical applications.Nature Medicine, 31 (2):609–617, 2025

Jinzhuo Wang, Kai Wang, Yunfang Yu, Yuxing Lu, et al. Self-improving generative foundation model for synthetic medical image generation and clinical applications.Nature Medicine, 31 (2):609–617, 2025

2025

[58] [58]

Diffuse and Disperse: Im- age Generation with Representation Regularization, 2025

Runqian Wang and Kaiming He. Diffuse and disperse: Image generation with representation regularization.arXiv preprint arXiv:2506.09027, 2025

work page arXiv 2025

[59] [59]

Olguin, Jeffrey J

Jinxi Xiang, Xiyue Wang, Xiaoming Zhang, Yinghua Xi, Feyisope Eweje, Yuchen Chen, Yuan- feng Li, Colin Bergstrom, Matthew Gopaulchan, Ted Kim, Kun-Hsing Yu, Sierra Willens, Francesca M. Olguin, Jeffrey J. Nirschl, Joel Neal, Maximilian Diehn, Sen Yang, and Ruijiang Li. A vision-language foundation model for precision oncology.Nature, 638(8051):769–778, 2025

2025

[60] [60]

Conghao Xiong, Zhengrui Guo, Zhe Xu, Yifei Zhang, Raymond Kai-yu Tong, Si Yong Yeo, Hao Chen, Joseph J. Y . Sung, and Irwin King. Exploiting low-dimensional manifold of features for few-shot whole slide image classification. InICLR, 2026. 12

2026

[61] [61]

A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, et al. A whole-slide foundation model for digital pathology from real-world data.Nature, 630(8015):181–188, 2024

2024

[62] [62]

TopoCellGen: Generating histopathology cell topology with a diffusion model

Meilong Xu, Saumya Gupta, Xiaoling Hu, Chen Li, Shahira Abousamra, Dimitris Samaras, Prateek Prasanna, and Chao Chen. TopoCellGen: Generating histopathology cell topology with a diffusion model. InCVPR, 2025

2025

[63] [63]

Latent denoising makes good visual tokenizers.arXiv preprint arXiv:2507.15856, 2025

Jiawei Yang, Tianhong Li, Lijie Fan, Yonglong Tian, and Yue Wang. Latent denoising makes good visual tokenizers.arXiv preprint arXiv:2507.15856, 2025

work page arXiv 2025

[64] [64]

Reconstruction vs

Jingfeng Yao, Bin Yang, and Xinggang Wang. Reconstruction vs. generation: Taming opti- mization dilemma in latent diffusion models. InCVPR, 2025

2025

[65] [65]

PathLDM: Text conditioned latent diffusion model for histopathology

Srikar Yellapragada, Alexandros Graikos, Prateek Prasanna, Tahsin Kurc, Joel Saltz, and Dim- itris Samaras. PathLDM: Text conditioned latent diffusion model for histopathology. In IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024

2024

[66] [66]

Knudsen, Tahsin Kurc, Rajarsi R

Srikar Yellapragada, Alexandros Graikos, Zilinghan Li, Kostas Triaridis, Varun Belagali, Tarak Nath Nandi, Karen Bai, Beatrice S. Knudsen, Tahsin Kurc, Rajarsi R. Gupta, Prateek Prasanna, Ravi K. Madduri, Joel Saltz, and Dimitris Samaras. PixCell: A generative founda- tion model for digital histopathology images.arXiv preprint arXiv:2506.05127, 2025

work page arXiv 2025

[67] [67]

Representation alignment for generation: Training diffusion transformers is easier than you think

Sihyun Yu, Sangkyung Kwak, Huiwon Jang, Jongheon Jeong, Jonathan Huang, Jinwoo Shin, and Saining Xie. Representation alignment for generation: Training diffusion transformers is easier than you think. InICLR, 2025

2025

[68] [68]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unrea- sonable effectiveness of deep features as a perceptual metric. InCVPR, 2018

2018

[69] [69]

Vision foundation models as effective visual tokenizers for autore- gressive image generation

Anlin Zheng, Xin Wen, Xuanyang Zhang, Chuofan Ma, Tiancai Wang, Gang Yu, Xiangyu Zhang, and Xiaojuan Qi. Vision foundation models as effective visual tokenizers for autore- gressive image generation. InNeurIPS, 2025

2025

[70] [70]

Diffusion Transformers with Representation Autoencoders

Boyang Zheng, Nanye Ma, Shengbang Tong, and Saining Xie. Diffusion transformers with representation autoencoders.arXiv preprint arXiv:2510.11690, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[71] [71]

Stabilize the latent space for image autoregressive modeling: A unified perspective

Yongxin Zhu, Bocheng Li, Hang Zhang, Xin Li, Linli Xu, and Lidong Bing. Stabilize the latent space for image autoregressive modeling: A unified perspective. InNeurIPS, 2024

2024

[72] [72]

Virchow2:Scalingself-supervisedmixedmagnification models in pathology

Eric Zimmermann, Eugene V orontsov, Julian Viret, Adam Casson, Michal Zelechowski, George Shaikovski, Neil Tenenholtz, James Hall, David Klimstra, Razik Yousfi, Thomas Fuchs, Nicolo Fusi, Siqi Liu, and Kristen Severson. Virchow2: Scaling self-supervised mixed magnification models in pathology.arXiv preprint arXiv:2408.00738, 2024. 13 A Mathematical Prelim...

work page arXiv 2024