
arXiv: 2606.07036 · v1 · pith:DCKQATYYnew · submitted 2026-06-05 · cs.CV · cs.AI · cs.CE · cs.LG

STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation

Pith reviewed 2026-06-27 22:37 UTC · model grok-4.3

classification cs.CV · cs.AI · cs.CE · cs.LG
keywords histopathology image generation · Riemannian flow matching · anisotropic decoder · vision foundation models · conditioning collapse · synthetic data · patch-token features · latent diffusion

The pith

STREAM uses Riemannian flow matching on hypersphere patch features from vision models to generate synthetic histopathology images without conditioning collapse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that patch-token features extracted from pretrained histopathology vision foundation models (VFMs) are ℓ2-normalized and sit on the unit hypersphere with angular dominance. Treating these features as the latent space rather than as conditioning signals prevents the conditioning from overwhelming generation. A bridge-type stochastic perturbation creates per-token rectifiability on the sphere so a Diffusion Transformer (DiT) can be trained directly in that space. An anisotropic decoder then modulates the velocity-field Jacobian to maintain fidelity along high-energy directions while adding robustness along low-energy ones. If correct, this produces higher-quality and more diverse synthetic images for breast and colorectal cancer data than prior conditioning-based latent diffusion approaches.
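To make the spherical latent space concrete, the following is a minimal NumPy sketch of the standard exponential and logarithm maps on the unit hypersphere and the geodesic interpolant they induce. It is not the authors' code. The token count N=256 comes from the Figure 1 caption; the feature dimension d=16 and the random stand-in "features" are assumptions chosen only for illustration.

```python
import numpy as np

def normalize(x, eps=1e-12):
    """Project each row of x onto the unit hypersphere S^{d-1}."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_map(x, y, eps=1e-7):
    """Tangent vector at x that points to y along the geodesic."""
    cos = np.clip(np.sum(x * y, axis=-1, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos)                      # geodesic distance
    u = y - cos * x                             # part of y orthogonal to x
    u_norm = np.linalg.norm(u, axis=-1, keepdims=True)
    return theta * u / np.maximum(u_norm, eps)

def exp_map(x, v, eps=1e-7):
    """Point reached from x by following tangent vector v."""
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    return np.cos(n) * x + np.sin(n) * v / np.maximum(n, eps)

def geodesic(x0, x1, t):
    """Geodesic interpolant between x0 (t=0) and x1 (t=1)."""
    return exp_map(x0, t * log_map(x0, x1))

# Hypothetical data: N tokens of dimension d (d is an assumed value).
rng = np.random.default_rng(0)
N, d = 256, 16
x1 = normalize(rng.standard_normal((N, d)))    # stand-in for VFM tokens
x0 = normalize(rng.standard_normal((N, d)))    # uniform samples on the sphere
xt = geodesic(x0, x1, 0.5)                     # per-token midpoint
print(np.allclose(np.linalg.norm(xt, axis=-1), 1.0))
```

Normalizing isotropic Gaussian samples yields the uniform distribution on the sphere, which is why `x0` is drawn that way. The interpolant is applied independently to each token, matching the per-token framing in the summary above.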

Core claim

The central claim is that patch-token features from pretrained histopathology VFMs are ℓ2-normalized and lie on the unit hypersphere S^{d-1} with strong angular dominance and intrinsic curvature, which makes them naturally suited to a Riemannian formulation. STREAM therefore combines two components: (1) a bridge-type stochastic perturbation that establishes per-token rectifiability on S^{d-1}, so that a DiT can be trained in latent space, and (2) a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions. The paper reports state-of-the-art reconstruction and generation on breast and colorectal cancer datasets.

What carries the argument

Stochastic Riemannian flow matching on unit-hypersphere patch-token features, using a bridge-type perturbation for rectifiability plus an anisotropic decoder that weights the velocity-field Jacobian by energy direction.
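For orientation, the generic ingredients of flow matching on the sphere can be written out as below. The exponential map, logarithm map, geodesic interpolant, and regression loss are the standard textbook forms. The last line is only an assumed illustration of what a "bridge-type" perturbation could look like (tangent-space noise that vanishes at both endpoints); the abstract does not give the authors' exact definition, and σ, P_x, and ε are symbols introduced here.

```latex
% Exponential and logarithm maps on the unit hypersphere S^{d-1}
\exp_x(v) = \cos(\lVert v\rVert)\,x + \sin(\lVert v\rVert)\,\frac{v}{\lVert v\rVert},
\qquad
\log_x(y) = \theta\,\frac{y - \cos\theta\,x}{\lVert y - \cos\theta\,x\rVert},
\quad \theta = \arccos\langle x, y\rangle

% Geodesic interpolant between a source sample x_0 and a data token x_1
x_t = \exp_{x_0}\!\big(t\,\log_{x_0}(x_1)\big), \qquad t \in [0, 1]

% Flow matching loss: regress the model velocity onto the geodesic velocity
\mathcal{L}(\phi) = \mathbb{E}_{t,\,x_0,\,x_1}
  \big\lVert v_\phi(x_t, t) - \dot{x}_t \big\rVert_2^2,
\qquad \dot{x}_t \in T_{x_t}\mathcal{S}^{d-1}

% ASSUMED bridge-type perturbation (illustrative form, not from the source)
\tilde{x}_t = \exp_{x_t}\!\big(\sigma\sqrt{t(1-t)}\;P_{x_t}\,\varepsilon\big),
\qquad P_x = I - x x^\top,\quad \varepsilon \sim \mathcal{N}(0, I)
```

Here P_x projects ambient noise onto the tangent space at x, so the perturbed point stays on the sphere after the exponential map.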

If this is right

  • Avoids conditioning collapse by using VFM features as the latent space itself instead of as external conditioning.
  • Achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer histopathology datasets.
  • The stochastic perturbation establishes per-token rectifiability on the hypersphere for stable DiT training.
  • The anisotropic decoder preserves high-energy fidelity while adding robustness in low-energy Jacobian directions.
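The last bullet can be illustrated with a small sketch of direction-dependent noise injection. This is a hypothetical reading, not the paper's decoder: the matrix `J` is a stand-in for the velocity-field Jacobian, "energy" is taken to mean squared singular values, and the linear mapping from energy to noise scale (with the made-up parameter `sigma_max`) is an assumption.

```python
import numpy as np

def noise_scales(J, sigma_max=0.1):
    """Per-direction noise scales: zero for the highest-energy direction,
    approaching sigma_max for the lowest-energy ones (assumed schedule)."""
    _, s, vt = np.linalg.svd(J)                 # s is sorted in descending order
    energy = s**2 / np.sum(s**2)                # normalized directional energy
    scales = sigma_max * (1.0 - energy / energy.max())
    return scales, vt

def anisotropic_noise(z, J, sigma_max=0.1, rng=None):
    """Perturb a unit vector z mostly along low-energy directions of J."""
    rng = np.random.default_rng() if rng is None else rng
    scales, vt = noise_scales(J, sigma_max)
    coeffs = rng.standard_normal(z.shape[-1]) * scales
    noise = coeffs @ vt                         # sum_i coeffs[i] * v_i
    noise -= (noise @ z) * z                    # keep only the tangent part at z
    out = z + noise
    return out / np.linalg.norm(out)            # return to the unit sphere

# Hypothetical 4-dimensional example with a diagonal stand-in Jacobian.
rng = np.random.default_rng(0)
J = np.diag([10.0, 1.0, 0.1, 0.01])
z = np.ones(4) / 2.0                            # unit vector
scales, _ = noise_scales(J)
z_noisy = anisotropic_noise(z, J, rng=rng)
print(scales)
print(np.linalg.norm(z_noisy))
```

A decoder trained on inputs perturbed this way would see little variation along high-energy directions and more along low-energy ones, which is one plausible way to obtain the fidelity and robustness trade-off described in the bullets.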

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same hypersphere formulation may apply to other medical imaging domains where VFM patch features show comparable angular structure.
  • Manifold-aware flow matching could reduce reliance on elaborate conditioning schemes across latent generative models.
  • Higher diversity from this approach might improve downstream training of pathology foundation models on synthetic data.
  • Direct tests on additional tissue types or non-cancer histopathology would clarify the range of the spherical assumption.

Load-bearing premise

The l2-normalized patch-token features possess strong angular dominance and intrinsic curvature that make Riemannian geometry on the hypersphere superior to Euclidean alternatives for this generation task.

What would settle it

Training the identical DiT and decoder pipeline with standard Euclidean flow matching on the same VFM features and observing whether reconstruction FID rises or sample diversity falls compared with the Riemannian version.
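The geometric difference that such an ablation would probe can be seen in a few lines. The sketch below, which is illustrative and uses two made-up orthogonal unit vectors, compares the straight-line interpolant used by Euclidean flow matching with the geodesic interpolant on the sphere.

```python
import numpy as np

def euclidean_interp(x0, x1, t):
    """Straight-line path; intermediate points leave the unit sphere."""
    return (1.0 - t) * x0 + t * x1

def geodesic_interp(x0, x1, t):
    """Great-circle path (slerp); assumes x0 and x1 are unit vectors
    that are neither identical nor antipodal, so sin(theta) != 0."""
    theta = np.arccos(np.clip(np.dot(x0, x1), -1.0, 1.0))
    return (np.sin((1.0 - t) * theta) * x0
            + np.sin(t * theta) * x1) / np.sin(theta)

# Two orthogonal unit vectors as a toy example.
x0 = np.array([1.0, 0.0, 0.0])
x1 = np.array([0.0, 1.0, 0.0])

euc_norm = np.linalg.norm(euclidean_interp(x0, x1, 0.5))  # sqrt(0.5), off the sphere
geo_norm = np.linalg.norm(geodesic_interp(x0, x1, 0.5))   # stays at unit norm
print(euc_norm, geo_norm)
```

The Euclidean midpoint has norm sqrt(0.5), so a model trained on straight-line paths sees intermediate states that do not look like normalized VFM tokens, while the geodesic path stays on the sphere throughout. Whether this difference changes reconstruction FID or sample diversity is the empirical question the proposed test would answer.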

Figures

Figures reproduced from arXiv: 2606.07036 by Daeky Jeong, Hongjun Yoon, Hyeongyeol Lim, Won June Cho.

Figure 1: Overview of STREAM. A frozen pathology VFM encoder maps each input patch to N=256 tokens on the unit hypersphere S^{d-1}; a Diffusion Transformer learns to transport a uniform source distribution on (S^{d-1})^N to the data distribution along bridge-perturbed geodesics. A separately trained anisotropic decoder reconstructs histopathology images from generated features, with directional noise injection guided…

Figure 2: Generation samples from each trained model on TCGA-BRCA and TCGA-COADREAD.

Figure 3: Intrinsic geometry of histopathology VFM feature manifolds on TCGA-BRCA. (a) Eigen…

Figure 4: Reconstruction comparison on TCGA-BRCA: ground truth (GT) and reconstruction…

Figure 5: Reconstruction comparison on TCGA-COADREAD: ground truth (GT) and reconstruc…

Figure 6: Generation samples on TCGA-BRCA: unconditional samples (used for gFID evaluation)…

Figure 7: Generation samples on TCGA-COADREAD: unconditional samples (used for gFID eval…
Original abstract

Synthetic histopathology image generation addresses critical challenges in computational pathology, including patient privacy and the growing need for large-scale training data for foundation models. Latent diffusion models have dominated the image generation domain, with recent works emphasizing that the choice of latent space is critical to the quality of generated images. Existing state-of-the-art generative models in histopathology use pretrained Vision Foundation Models (VFMs) as conditioning signals, and we observe that this leads to "conditioning collapse," where the conditioning signal dominates the latent space and lowers the quality and diversity of generated samples. Therefore, we instead use pretrained histopathology VFMs as the latent space itself, leveraging their patch-token features that encode rich semantic information. We empirically show that these features are $\ell_2$-normalized and lie on the unit hypersphere $\mathcal{S}^{d-1}$ with strong angular dominance and intrinsic curvature, making them naturally suited for a Riemannian formulation. We therefore present STREAM, the first framework to apply Riemannian flow matching in the pathology domain. STREAM consists of two stages: 1) a bridge-type stochastic perturbation that establishes per-token rectifiability on $\mathcal{S}^{d-1}$ for training a Diffusion Transformer (DiT) in latent space, and 2) a novel anisotropic decoder that allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along its high-energy directions. Together, STREAM achieves state-of-the-art reconstruction and generation performance on breast and colorectal cancer datasets. The code will be publicly released upon acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes STREAM, a two-stage framework for histopathology image generation that treats ℓ2-normalized patch-token features from pretrained VFMs as points on the unit hypersphere S^{d-1}. It introduces a bridge-type stochastic perturbation to enable rectifiable training of a DiT via Riemannian flow matching, paired with a novel anisotropic decoder that modulates Jacobian robustness along low- versus high-energy directions. The authors claim that this avoids conditioning collapse and yields state-of-the-art (SOTA) reconstruction and generation on breast and colorectal cancer datasets.

Significance. If the performance claims hold with proper controls, the work could meaningfully advance generative modeling in computational pathology by showing that Riemannian geometry on VFM latents can outperform conditioning-based baselines, while the public code release would aid reproducibility.

major comments (2)
  1. [Abstract] The SOTA reconstruction and generation claims are unsupported by any quantitative results, error bars, baseline comparisons, dataset sizes, or metric values. This is load-bearing for the central empirical claim.
  2. [Abstract] (paragraph on empirical observation and method description) The assertion that the observed ℓ2-normalization and angular dominance make the features 'naturally suited' for a Riemannian formulation is not accompanied by an ablation or comparison isolating the manifold geometry from the bridge perturbation or anisotropic decoder; without this, the necessity of the Riemannian component over Euclidean flow matching on the same normalized tokens remains unverified.
minor comments (1)
  1. [Abstract] Consider specifying the exact datasets (e.g., names and splits) and metrics used to support the SOTA statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that the abstract must be revised to include quantitative support for the SOTA claims. On the second point, we will strengthen the presentation by adding a targeted ablation that isolates the contribution of the Riemannian geometry.

Point-by-point responses
  1. Referee: [Abstract] The SOTA reconstruction and generation claims are unsupported by any quantitative results, error bars, baseline comparisons, dataset sizes, or metric values. This is load-bearing for the central empirical claim.

    Authors: We agree that the abstract, as currently written, does not contain the specific metric values, error bars, baseline names, or dataset sizes needed to substantiate the SOTA claim on first reading. The full manuscript reports these quantities (FID, reconstruction MSE, diversity metrics, dataset cardinalities for the breast and colorectal cohorts, and statistical significance) in the experimental sections and tables. We will revise the abstract to incorporate the key quantitative results and error bars so that the central empirical claim is self-contained and verifiable.
    Revision: yes

  2. Referee: [Abstract] (paragraph on empirical observation and method description) The assertion that the observed ℓ2-normalization and angular dominance make the features 'naturally suited' for a Riemannian formulation is not accompanied by an ablation or comparison isolating the manifold geometry from the bridge perturbation or anisotropic decoder; without this, the necessity of the Riemannian component over Euclidean flow matching on the same normalized tokens remains unverified.

    Authors: The abstract condenses the empirical observation that VFM patch tokens are ℓ2-normalized and exhibit angular dominance. The full manuscript motivates the Riemannian formulation from these properties and evaluates the complete STREAM pipeline against Euclidean baselines. To directly address the referee's request for isolation, we will add an explicit ablation that trains an otherwise identical Euclidean flow-matching model on the same normalized tokens and compares it to the Riemannian version (keeping the bridge perturbation and anisotropic decoder fixed where applicable). This will be included in the revised manuscript.
    Revision: yes

Circularity Check

0 steps flagged

No circularity; derivation self-contained via empirical observation and new components

Full rationale

The paper's chain starts from an empirical observation (ℓ2-normalized VFM patch tokens on S^{d-1}) that is presented as data-driven motivation rather than a definitional input. It then introduces independent elements (bridge-type stochastic perturbation for rectifiability and anisotropic decoder for Jacobian robustness) to build the Riemannian flow matching framework. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz is smuggled via prior author work. The SOTA claim is tied to reported performance on external datasets, not internal redefinition. This matches the default non-circular case.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

Review based solely on abstract; full parameter counts, training details, and assumption justifications unavailable.

axioms (2)
  • [domain assumption] Pretrained histopathology VFMs produce ℓ2-normalized patch-token features lying on the unit hypersphere S^{d-1}.
    Abstract states this is empirically shown and forms the basis for the Riemannian formulation.
  • [domain assumption] These features exhibit strong angular dominance and intrinsic curvature, making them naturally suited for Riemannian flow matching.
    Abstract invokes this property to justify the choice of geometry over Euclidean latent diffusion.
invented entities (1)
  • Anisotropic decoder (no independent evidence)
    Purpose: allocates robustness to low-energy directions of the velocity-field Jacobian while preserving fidelity along high-energy directions.
    Novel component introduced to handle the Riemannian velocity field; no independent evidence provided outside the paper.


