PathAR: Structure-First Autoregressive Synthesis of Multimodal Pathology Images

Feng Chen; Guanyu Yang; Huazhu Fu; Jiahao Xia; Junzhang Huang; Meng Wang; Yuan Zhang

arxiv: 2606.01543 · v1 · pith:P5ICPQYLnew · submitted 2026-06-01 · 💻 cs.CV

Yuan Zhang, Jiahao Xia, Junzhang Huang, Meng Wang, Feng Chen, Guanyu Yang, Huazhu Fu

Pith reviewed 2026-06-28 15:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords pathology image synthesis · autoregressive modeling · multimodal generation · structure factorization · vector quantization · medical image generation · dual tokenizer

The pith

PathAR explicitly factorizes structure and appearance tokens to generate anatomically coherent multimodal pathology images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes PathAR to address data scarcity in multimodal pathology by synthesizing images that preserve morphological structure while varying appearance across modalities. It claims that current methods implicitly couple structure with appearance in a single token stream, reducing controllability when modalities shift. PathAR instead uses a dual vector quantization tokenizer to separate mask-grounded structure tokens from appearance tokens, then applies an interleaved autoregressive transformer with asymmetric attention to enforce structure-first generation conditioned on modality labels. A sympathetic reader would care because reliable synthetic data with preserved anatomy could support training of downstream models like segmenters in low-data settings without introducing structural artifacts.

Core claim

PathAR employs a dual vector quantization tokenizer to decompose samples into mask-grounded structure and appearance tokens, and an interleaved autoregressive transformer with asymmetric attention visibility to enforce structure-to-appearance dependence, stabilizing morphology under heterogeneous modality-specific appearances and enabling spatially aligned image-mask pair generation.

What carries the argument

Dual-VQ tokenizer that produces separate structure and appearance tokens, combined with an IAR transformer using asymmetric attention to enforce structure-to-appearance dependence.
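
One way to picture the asymmetric attention is as a boolean visibility mask over an interleaved sequence. The sketch below is illustrative only: the function name `asymmetric_mask`, the layout `[cond, s1, a1, s2, a2, ...]`, and the exact visibility rule (structure tokens never read appearance tokens) are assumptions, not details confirmed by the abstract.

```python
def asymmetric_mask(num_pairs, num_cond=1):
    """Build a visibility mask for the sequence [cond..., s1, a1, s2, a2, ...].

    mask[i][j] is True when query position i may attend to key position j.
    Assumed rule (illustrative, not taken from the paper):
      - attention is causal, so no position looks ahead;
      - structure tokens see the condition and earlier structure tokens only;
      - appearance tokens see everything at or before their own position.
    """
    kinds = ["c"] * num_cond + ["s", "a"] * num_pairs
    n = len(kinds)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only positions up to i
            if kinds[i] == "s" and kinds[j] == "a":
                continue  # structure never reads appearance
            mask[i][j] = True
    return kinds, mask


kinds, mask = asymmetric_mask(num_pairs=2)
for kind, row in zip(kinds, mask):
    print(kind, "".join("1" if visible else "0" for visible in row))
```

Under this assumed rule, the second structure token skips the first appearance token, while every appearance token sees the structure tokens that precede it. That is the sense in which structure would be generated first and appearance conditioned on it.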

If this is right

PathAR improves structural consistency and modality fidelity over baselines.
The method maintains sample diversity while supporting spatially aligned outputs.
It enables downstream segmentation training in data-scarce regimes.
The framework extends to finer-grained intra-modality organ-label variation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The structure-first approach could apply to other medical imaging domains where anatomical features remain stable across different acquisition methods.
Conditional generation on modality labels might allow efficient adaptation without full model retraining for new imaging protocols.
Providing explicit structure tokens could support controllable editing of pathology images for augmentation tasks.

Load-bearing premise

Morphological structures such as cellular topology and tissue boundaries are largely preserved across acquisition protocols, allowing explicit factorization of structure and appearance without loss of anatomical coherence.

What would settle it

Generating images from a modality pair where cellular topology or tissue boundaries visibly differ would produce misaligned masks or anatomically incoherent outputs if the factorization premise fails.

Figures

Figures reproduced from arXiv: 2606.01543 by Feng Chen, Guanyu Yang, Huazhu Fu, Jiahao Xia, Junzhang Huang, Meng Wang, Yuan Zhang.

**Figure 1.** Motivation of PathAR for pathology image generation conditioned on the modality label. (a) Entangled stochastic generation can match modality-specific appearance but often disrupts morphology under multimodal heterogeneity. (b) PathAR uses factorized generation: it splits tokens into structure and appearance streams and enforces a structure-first autoregressive dependency to better maintain morphological c…

**Figure 2.** Overview of PathAR. Stage 1: Dual-VQ factorizes pathological images into structure and appearance token streams, where the former is anchored by mask-guided reconstruction to yield discrete, semantically rich tokens. Stage 2: The interleaved autoregressive transformer (IAR) models the modality-conditional joint token distribution via next-token prediction under asymmetric attention visibility, enforcing st…

**Figure 3.** Qualitative comparison of multimodal pathology generation at 256 × 256. Top: (L+M)2I. Bottom: L2I. Generally, (L+M)2I stabilizes structure but may distort appearance across modalities, whereas L2I preserves modality appearance but often degrades structure without spatial anchors. PathAR (ours) maintains both structure and appearance in L2I and additionally generates an auxiliary mask. Representative defect…

**Figure 4.** Qualitative comparison on PanNuke. Generated samples from representative methods are shown across the 19 organ labels of PanNuke. At inference, (L+M)2I methods are given the organ label and the ground-truth mask to generate an image, whereas L2I methods are given only the organ label to generate an image. PathAR maintains stable cellular morphology and fine-grained histological texture across diverse categ…

**Figure 5.** Visualization of original and Dual-VQ reconstruction examples.

**Figure 6.** Cross-sample swapping of structure and appearance tokens across modalities. Each row presents one token-swapping example constructed from two real samples of different modalities. (a) and (b) denote two real images indexed by p and q, which are encoded into structure–appearance token pairs (Sp, Ap) and (Sq, Aq), respectively, where S and A denote the structure and appearance token streams. After token swap…

**Figure 7.** t-SNE analysis of generated images and sampling hyperparameters. (a–d) Feature t-SNE. t-SNE visualizations of UNI embeddings for real and generated images. (a–c) Per-modality alignment: PathAR (orange) closely overlaps with Real (green) distributions. (d) Joint embedding of PathAR samples demonstrates distinct inter-modality separation and rich intra-modality diversity. (e–f) Hyperparameter tuning. Impact …

Abstract

Data scarcity in multimodal pathology motivates unified generative models that synthesize modality-specific appearance while preserving anatomically coherent structure. Although modalities differ in appearance statistics, morphological structures such as cellular topology and tissue boundaries are largely preserved across acquisition protocols. However, existing methods often model these factors within a homogeneous token stream, implicitly coupling structure with appearance and weakening structural controllability under modality shifts. To address this, we propose pathology Autoregressive modeling (PathAR), a structure-first autoregressive synthesis framework that explicitly factorizes structure and appearance for modality-label-conditioned pathology generation. PathAR employs a dual vector quantization (Dual-VQ) tokenizer to decompose samples into mask-grounded structure and appearance tokens, and an interleaved autoregressive (IAR) transformer with asymmetric attention visibility to enforce structure-to-appearance dependence. PathAR stabilizes morphology under heterogeneous modality-specific appearances and enables spatially aligned image-mask pair generation. Extensive experiments show that PathAR improves structural consistency and modality fidelity over baselines, maintains sample diversity, supports downstream segmentation in data-scarce regimes, and demonstrates extensibility to finer-grained intra-modality organ-label variation.
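
The structure-to-appearance dependence described in the abstract can be read as a factorization of the joint token distribution. The form below is a plausible sketch, not a formula quoted from the paper: the symbols $s_t$ and $a_t$ (structure and appearance tokens at step $t$), $c$ (modality label), and $T$ (number of token pairs) are assumed notation, and the choice to hide earlier appearance tokens from structure prediction is an assumption about what "asymmetric attention visibility" means.

```latex
p(S, A \mid c) \;=\; \prod_{t=1}^{T}
  \underbrace{p\!\left(s_t \mid s_{<t},\, c\right)}_{\text{structure first}}
  \;\cdot\;
  \underbrace{p\!\left(a_t \mid s_{\le t},\, a_{<t},\, c\right)}_{\text{appearance conditioned on structure}}
```

Under this reading, appearance can vary with the modality label $c$ without feeding back into the structure stream.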

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PathAR adds Dual-VQ tokenization and asymmetric-attention IAR to enforce structure-before-appearance in pathology synthesis, which is a clean engineering move but incremental.

The paper's main contribution is a structure-first autoregressive setup for multimodal pathology images. It uses a Dual-VQ tokenizer to pull out mask-grounded structure tokens separately from appearance tokens, then feeds them into an interleaved autoregressive transformer that uses asymmetric attention to keep structure visible while generating appearance. This directly targets the coupling issue in standard homogeneous token streams.
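
To make the tokenizer idea concrete, here is a minimal sketch of nearest-codebook vector quantization applied to two separate feature streams. Everything in it is hypothetical: the codebooks, the feature values, and the function name `quantize` are toy placeholders, and the paper's actual Dual-VQ (including its mask-guided reconstruction objective) is not specified in the abstract.

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))
        for vec in vectors
    ]


# Hypothetical toy codebooks; in a real tokenizer these would be learned.
structure_codebook = [(0.0, 0.0), (1.0, 1.0)]
appearance_codebook = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]

# Hypothetical per-patch features from two separate encoder branches.
structure_feats = [(0.1, 0.2), (0.9, 0.8)]
appearance_feats = [(0.4, 0.6), (0.9, 0.1)]

structure_tokens = quantize(structure_feats, structure_codebook)
appearance_tokens = quantize(appearance_feats, appearance_codebook)
print(structure_tokens, appearance_tokens)  # [0, 1] [2, 1]
```

Keeping two codebooks means the structure indices can be reused unchanged while the appearance indices are resampled, which is the kind of control the letter attributes to the factorization.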

It does a few things cleanly. The factorization lets them generate spatially aligned image-mask pairs, which is useful for data-scarce medical settings. The abstract reports gains in structural consistency and modality fidelity over baselines, plus some downstream segmentation benefit. Those are the kinds of practical outcomes that matter for pathology work.

The soft spot is the central assumption that morphological structures stay largely preserved across modalities. The paper treats this as an empirical precondition and claims the experiments back it, but the strength of that evidence is hard to judge from the description alone. If the gains shrink once you control for stronger baselines or different datasets, the factorization benefit would look smaller. No obvious circularity or fitting issues show up.

This is for people building generative models for medical imaging who need controllable structure. A reader working on data augmentation for segmentation or cross-modality synthesis would get the most out of it. The proposal is concrete enough and the application relevant enough that it deserves a serious referee rather than a desk reject.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes PathAR, a structure-first autoregressive framework for synthesizing modality-specific pathology images while preserving anatomical coherence. It introduces a Dual-VQ tokenizer that decomposes inputs into mask-grounded structure tokens and appearance tokens, paired with an interleaved autoregressive (IAR) transformer using asymmetric attention to enforce strict structure-to-appearance ordering. The approach is motivated by data scarcity in multimodal pathology and claims to stabilize morphology under heterogeneous appearances, enable spatially aligned image-mask generation, and improve structural consistency, modality fidelity, sample diversity, and downstream segmentation performance over baselines.

Significance. If the empirical results hold, the explicit factorization of structure and appearance via Dual-VQ and ordered autoregressive modeling represents a targeted advance for controllable generation in medical imaging. The ability to produce aligned multimodal pairs could directly aid data augmentation in data-scarce regimes and support tasks such as segmentation. The paper positions its results as direct validation of the factorization without loss of coherence.

major comments (1)

[Abstract / Introduction] The central claim that Dual-VQ produces mask-grounded structure tokens that remain anatomically coherent across modality shifts rests on the precondition that morphological structures (cellular topology, tissue boundaries) are largely preserved across acquisition protocols. This assumption is stated in the abstract but requires a concrete quantitative test (e.g., cross-modality mask overlap or topology metrics on paired samples) to confirm it does not introduce coherence loss; without such evidence the factorization benefit cannot be isolated from the assumption.
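
The cross-modality mask overlap mentioned in this comment could be measured with the Dice coefficient. The sketch below assumes that spatially registered binary masks of the same tissue region are available from two modalities; the masks shown are made-up toy data, and the function name `dice` is illustrative.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks (nested lists)."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same shape")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical registered masks of one region under two modalities.
mask_mod1 = [[1, 1, 0],
             [0, 1, 0]]
mask_mod2 = [[1, 0, 0],
             [0, 1, 1]]

score = dice(mask_mod1, mask_mod2)
print(round(score, 3))  # 0.667
```

A score near 1 on paired samples would support the preservation premise, while low scores would point to modality-dependent structure.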

minor comments (1)

[Abstract] The abstract refers to 'extensive experiments' and 'improved structural consistency' without naming datasets, metrics (e.g., FID, Dice, structural similarity), or baseline methods; adding one or two quantitative highlights would improve readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. Below we provide a point-by-point response to the single major comment.

Referee: [Abstract / Introduction] The central claim that Dual-VQ produces mask-grounded structure tokens that remain anatomically coherent across modality shifts rests on the precondition that morphological structures (cellular topology, tissue boundaries) are largely preserved across acquisition protocols. This assumption is stated in the abstract but requires a concrete quantitative test (e.g., cross-modality mask overlap or topology metrics on paired samples) to confirm it does not introduce coherence loss; without such evidence the factorization benefit cannot be isolated from the assumption.

Authors: We agree that a direct quantitative validation of cross-modality structural preservation would strengthen the presentation. The assumption is standard in the pathology literature because different modalities (e.g., H&E versus IHC) target the same underlying cellular and tissue morphology. Our Dual-VQ explicitly grounds structure tokens in segmentation masks that are intended to be modality-agnostic, and the reported gains in structural consistency metrics plus downstream segmentation performance provide indirect empirical support for the factorization. However, the training datasets used in the manuscript are unpaired across modalities, precluding the exact overlap or topology metrics suggested. We will add a dedicated paragraph in the revised introduction and methods sections that (i) cites supporting pathology references for the assumption and (ii) explicitly notes the lack of paired cross-modality evaluation as a limitation. This change will better isolate the contribution of Dual-VQ while remaining faithful to the available data. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces PathAR as a structure-first autoregressive model that uses a Dual-VQ tokenizer to produce mask-grounded structure tokens and appearance tokens, followed by an IAR transformer with asymmetric attention to enforce ordering. No equations, fitted parameters, or self-citations are shown in the abstract or description that reduce any claimed prediction or uniqueness result to the inputs by construction. The factorization is presented as an explicit design choice whose success is evaluated empirically on structural consistency and modality fidelity, rather than assumed via prior self-referential theorems or renaming of known patterns. The morphological preservation assumption is acknowledged as an empirical precondition, not a derived axiom. The derivation chain is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities; all such elements are unknown.

Reference graph

Works this paper leans on

54 extracted references · 12 canonical work pages · 8 internal anchors

[1]

Synthetic data and its utility in pathology and laboratory medicine,

J. Pantanowitz, C. D. Manko, L. Pantanowitz, and H. H. Rashidi, “Synthetic data and its utility in pathology and laboratory medicine,” Laboratory Investigation, vol. 104, no. 8, p. 102095, 2024

2024
[2]

Content generation models in computational pathology: A comprehen- sive survey on methods, applications, and challenges,

Y . Zhang, X. Zhang, X. Qi, X. Wu, F. Chen, G. Yang, and H. Fu, “Content generation models in computational pathology: A comprehen- sive survey on methods, applications, and challenges,”IEEE Reviews in Biomedical Engineering, pp. 1–22, 2025

2025
[3]

Clusterseg: A crowd cluster pinpointed nucleus segmentation framework with cross-modality datasets,

J. Ke, Y . Lu, Y . Shen, J. Zhu, Y . Zhou, J. Huang, J. Yao, X. Liang, Y . Guo, Z. Weiet al., “Clusterseg: A crowd cluster pinpointed nucleus segmentation framework with cross-modality datasets,”Medical Image Analysis, vol. 85, p. 102758, 2023

2023
[4]

Structure- preserving color normalization and sparse stain separation for histologi- cal images,

A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust, K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab, “Structure- preserving color normalization and sparse stain separation for histologi- cal images,”IEEE Transactions on Medical Imaging, vol. 35, no. 8, pp. 1962–1971, 2016

1962
[5]

Fluorescence confocal microscopy for pathologists,

M. Ragazzi, S. Piana, C. Longo, F. Castagnetti, M. Foroni, G. Ferrari, G. Gardini, and G. Pellacani, “Fluorescence confocal microscopy for pathologists,”Modern Pathology, vol. 27, no. 3, pp. 460–471, 2014

2014
[6]

Maskfactory: Towards high-quality synthetic data generation for dichotomous image segmentation,

H. Qian, Y . Chen, S. Lou, F. Shahbaz Khan, X. Jin, and D.-P. Fan, “Maskfactory: Towards high-quality synthetic data generation for dichotomous image segmentation,”Advances in Neural Information Processing Systems, vol. 37, pp. 66 455–66 478, 2024

2024
[7]

Distribution matching losses can hallucinate features in medical image translation,

J. P. Cohen, M. Luck, and S. Honari, “Distribution matching losses can hallucinate features in medical image translation,” inInternational con- ference on medical image computing and computer-assisted intervention. Springer, 2018, pp. 529–536

2018
[8]

On hallucinations in tomographic image reconstruction,

S. Bhadra, V . A. Kelkar, F. J. Brooks, and M. A. Anastasio, “On hallucinations in tomographic image reconstruction,”IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 3249–3260, 2021

2021
[9]

Imagenet-trained cnns are biased towards texture; in- creasing shape bias improves accuracy and robustness,

R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “Imagenet-trained cnns are biased towards texture; in- creasing shape bias improves accuracy and robustness,” inInternational conference on learning representations, 2018

2018
[10]

GPT-4 Technical Report

J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkatet al., “Gpt-4 technical report,”arXiv preprint arXiv:2303.08774, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[11]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, F. Azharet al., “Llama: Open and efficient foundation language models,”arXiv preprint arXiv:2302.13971, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[12]

Llama 2: Open Foundation and Fine-Tuned Chat Models

H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y . Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosaleet al., “Llama 2: Open foundation and fine-tuned chat models,”arXiv preprint arXiv:2307.09288, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[13]

Autoregressive visual tracking,

X. Wei, Y . Bai, Y . Zheng, D. Shi, and Y . Gong, “Autoregressive visual tracking,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 9697–9706

2023
[14]

Autoregressive models in vision: A survey,

J. Xiong, G. Liu, L. Huang, C. Wu, T. Wu, Y . Mu, Y . Yao, H. Shen, Z. Wan, J. Huanget al., “Autoregressive models in vision: A survey,” arXiv preprint arXiv:2411.05902, 2024

work page arXiv 2024
[15]

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

P. Sun, Y . Jiang, S. Chen, S. Zhang, B. Peng, P. Luo, and Z. Yuan, “Autoregressive model beats diffusion: Llama for scalable image gener- ation,”arXiv preprint arXiv:2406.06525, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[16]

Emu3: Next-Token Prediction is All You Need

X. Wang, X. Zhang, Z. Luo, Q. Sun, Y . Cui, J. Wang, F. Zhang, Y . Wang, Z. Li, Q. Yuet al., “Emu3: Next-token prediction is all you need,”arXiv preprint arXiv:2409.18869, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[17]

Janus: Decoupling visual encoding for unified multi- modal understanding and generation,

C. Wu, X. Chen, Z. Wu, Y . Ma, X. Liu, Z. Pan, W. Liu, Z. Xie, X. Yu, C. Ruanet al., “Janus: Decoupling visual encoding for unified multi- modal understanding and generation,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 12 966–12 977

2025
[18]

Visual autore- gressive modeling: Scalable image generation via next-scale prediction,

K. Tian, Y . Jiang, Z. Yuan, B. Peng, and L. Wang, “Visual autore- gressive modeling: Scalable image generation via next-scale prediction,” Advances in neural information processing systems, vol. 37, pp. 84 839– 84 865, 2024

2024
[19]

Imagefolder: Autoregressive image generation with folded tokens,

X. Li, K. Qiu, H. Chen, J. Kuen, J. Gu, B. Raj, and Z. Lin, “Imagefolder: Autoregressive image generation with folded tokens,” inThe Thirteenth International Conference on Learning Representations, 2025

2025
[20]

Controlar: Controllable image generation with autoregressive models,

Z. Li, T. Cheng, S. Chen, P. Sun, H. Shen, L. Ran, X. Chen, W. Liu, and X. Wang, “Controlar: Controllable image generation with autoregressive models,” inThe Thirteenth International Conference on Learning Rep- resentations, 2025

2025
[21]

Nuclear morphological abnormalities in cancer: a search for unifying mechanisms,

I. Singh and T. P. Lele, “Nuclear morphological abnormalities in cancer: a search for unifying mechanisms,” inNuclear, chromosomal, and genomic architecture in biology and medicine. Springer, 2022, pp. 443–467

2022
[22]

Pathdiff: Histopathology image synthesis with unpaired text and mask conditions,

M. Bhosale, A. Wasi, Y . Zhai, Y . Tian, S. Border, N. Xi, P. Sarder, J. Yuan, D. Doermann, and X. Gong, “Pathdiff: Histopathology image synthesis with unpaired text and mask conditions,” 2025. [Online]. Available: https://arxiv.org/abs/2506.23440

work page arXiv 2025
[23]

Nasdm: Nuclei-aware semantic histopathology image generation using diffusion models,

A. Shrivastava and P. T. Fletcher, “Nasdm: Nuclei-aware semantic histopathology image generation using diffusion models,” ininterna- tional conference on medical image computing and computer-assisted intervention. Springer, 2023, pp. 786–796

2023
[24]

A robust image segmentation and synthesis pipeline for histopathology,

M. Jehanzaib, Y . Almalioglu, K. B. Ozyoruk, D. F. Williamson, T. Ab- dullah, K. Basak, D. Demir, G. E. Keles, K. Zafar, and M. Turan, “A robust image segmentation and synthesis pipeline for histopathology,” Medical Image Analysis, vol. 99, p. 103344, 2025

2025
[25]

Accelerating histopathology workflows with generative ai- based virtually multiplexed tumour profiling,

P. Pati, S. Karkampouna, F. Bonollo, E. Comp ´erat, M. Radi ´c, M. Spahn, A. Martinelli, M. Wartenberg, M. Kruithof-de Julio, and M. Rap- somaniki, “Accelerating histopathology workflows with generative ai- based virtually multiplexed tumour profiling,”Nature machine intelli- gence, vol. 6, no. 9, pp. 1077–1093, 2024

2024
[26]

Virtual staining for histology by deep learning,

L. Latonen, S. Koivukoski, U. Khan, and P. Ruusuvuori, “Virtual staining for histology by deep learning,”Trends in Biotechnology, vol. 42, no. 9, pp. 1177–1191, 2024

2024
[27]

Deep learning- based transformation of h&e stained tissues into special stains,

K. De Haan, Y . Zhang, J. E. Zuckerman, T. Liu, A. E. Sisk, M. F. Diaz, K.-Y . Jen, A. Nobori, S. Liou, S. Zhanget al., “Deep learning- based transformation of h&e stained tissues into special stains,”Nature communications, vol. 12, no. 1, p. 4884, 2021

2021
[28]

Next token prediction towards multimodal in- telligence: A comprehensive survey,

L. Chen, Z. Wang, S. Ren, L. Li, H. Zhao, Y . Li, Z. Cai, H. Guo, L. Zhang, Y . Xionget al., “Next token prediction towards multimodal in- telligence: A comprehensive survey,”arXiv preprint arXiv:2412.18619, 2024

work page arXiv 2024
[29]

Neural discrete representation learning,

A. Van Den Oord, O. Vinyalset al., “Neural discrete representation learning,”Advances in neural information processing systems, vol. 30, 2017

2017
[30]

Taming transformers for high- resolution image synthesis,

P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high- resolution image synthesis,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), June 2021, pp. 12 873–12 883

2021
[31]

Maskgit: Masked generative image transformer,

H. Chang, H. Zhang, L. Jiang, C. Liu, and W. T. Freeman, “Maskgit: Masked generative image transformer,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11 315–11 325

2022
[32]

Structure and intensity unbiased translation for 2d medical image segmentation,

T. Zhang, S. Zheng, J. Cheng, X. Jia, J. Bartlett, X. Cheng, Z. Qiu, H. Fu, J. Liu, A. Leonardiset al., “Structure and intensity unbiased translation for 2d medical image segmentation,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10 060– 10 075, 2024

2024
[33]

Semantic image synthesis with spatially-adaptive normalization,

T. Park, M.-Y . Liu, T.-C. Wang, and J.-Y . Zhu, “Semantic image synthesis with spatially-adaptive normalization,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 2337–2346

2019
[34]

Efficient semantic image synthesis via class-adaptive normalization,

Z. Tan, D. Chen, Q. Chu, M. Chai, J. Liao, M. He, L. Yuan, G. Hua, and N. Yu, “Efficient semantic image synthesis via class-adaptive normalization,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 4852–4866, 2021

2021
[35]

Edge guided gans with multi-scale contrastive learning for semantic image synthesis,

H. Tang, G. Sun, N. Sebe, L. Van Goolet al., “Edge guided gans with multi-scale contrastive learning for semantic image synthesis,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 12, pp. 14 435–14 452, 2023

2023
[36]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840– 6851, 2020

2020
[37]

Scalable diffusion models with transformers,

W. Peebles and S. Xie, “Scalable diffusion models with transformers,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4195–4205

2023
[38]

Adding conditional control to text-to-image diffusion models,

L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 3836–3847

2023
[39]

T2i- adapter: Learning adapters to dig out more controllable ability for text- to-image diffusion models,

C. Mou, X. Wang, L. Xie, Y . Wu, J. Zhang, Z. Qi, and Y . Shan, “T2i- adapter: Learning adapters to dig out more controllable ability for text- to-image diffusion models,” inProceedings of the AAAI conference on artificial intelligence, vol. 38, no. 5, 2024, pp. 4296–4304

2024
[40]

Roformer: En- hanced transformer with rotary position embedding,

J. Su, M. Ahmed, Y . Lu, S. Pan, W. Bo, and Y . Liu, “Roformer: En- hanced transformer with rotary position embedding,”Neurocomputing, vol. 568, p. 127063, 2024

2024
[41]

Classifier-Free Diffusion Guidance

J. Ho and T. Salimans, “Classifier-free diffusion guidance,” 2022. [Online]. Available: https://arxiv.org/abs/2207.12598 14

work page internal anchor Pith review Pith/arXiv arXiv 2022
[42]

arXiv preprint arXiv:2003.10778 (2020)

J. Gamper, N. A. Koohbanani, K. Benes, S. Graham, M. Jahanifar, S. A. Khurram, A. Azam, K. Hewitt, and N. Rajpoot, “Pannuke dataset extension, insights and baselines,”arXiv preprint arXiv:2003.10778, 2020

work page arXiv 2003
[43]

Gans trained by a two time-scale update rule converge to a local nash equilibrium,

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,”Advances in neural information processing systems, vol. 30, 2017

2017
[44]

Demystifying MMD GANs

M. Bi ´nkowski, D. J. Sutherland, M. Arbel, and A. Gretton, “Demysti- fying mmd gans,”arXiv preprint arXiv:1801.01401, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[45]

DINOv2: Learning Robust Visual Features without Supervision

M. Oquab, T. Darcet, T. Moutakanni, H. V o, M. Szafraniec, V . Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Noubyet al., “Dinov2: Learning robust visual features without supervision,”arXiv preprint arXiv:2304.07193, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[46]

Towards a general-purpose foundation model for computational pathology,

R. J. Chen, T. Ding, M. Y . Lu, D. F. Williamson, G. Jaume, A. H. Song, B. Chen, A. Zhang, D. Shao, M. Shabanet al., “Towards a general-purpose foundation model for computational pathology,”Nature medicine, vol. 30, no. 3, pp. 850–862, 2024

2024
[47]

Measures of the amount of ecologic association between species,

L. R. Dice, “Measures of the amount of ecologic association between species,”Ecology, vol. 26, no. 3, pp. 297–302, 1945

1945
[48]

The unreasonable effectiveness of deep features as a perceptual metric,

R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 586–595

2018
[49]

Im- proved precision and recall metric for assessing generative models,

T. Kynk ¨a¨anniemi, T. Karras, S. Laine, J. Lehtinen, and T. Aila, “Im- proved precision and recall metric for assessing generative models,” Advances in neural information processing systems, vol. 32, 2019

2019
[50]

Diffinfinite: Large mask-image synthesis via parallel random patch diffusion in histopathology,

M. Aversa, G. Nobis, M. H ¨agele, K. Standvoss, M. Chirica, R. Murray- Smith, A. M. Alaa, L. Ruff, D. Ivanova, W. Sameket al., “Diffinfinite: Large mask-image synthesis via parallel random patch diffusion in histopathology,”Advances in Neural Information Processing Systems, vol. 36, pp. 78 126–78 141, 2023

2023
[51]

Training generative adversarial networks with limited data,

T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila, “Training generative adversarial networks with limited data,”Advances in neural information processing systems, vol. 33, pp. 12 104–12 114, 2020

2020
[52]

High- resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 684–10 695

2022
[53]

Efficientnet: Rethinking model scaling for con- volutional neural networks,

M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for con- volutional neural networks,” inInternational conference on machine learning. PMLR, 2019, pp. 6105–6114

2019
[54]

nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,

F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnu-net: a self-configuring method for deep learning-based biomedical image segmentation,” Nature methods, vol. 18, no. 2, pp. 203–211, 2021

2021

VI. BIOGRAPHY SECTION

Yuan Zhang received the B.S. degree in 2019 from Northwestern Polytechnical University, Xi’an, China, and the M...
