Geometry-Aware Uncertainty Coresets for Robust Visual In-Context Learning in Histopathology

Bernhard Kainz; Franciskus Xaverius Erick; Johanna Paula M\"uller

arxiv: 2605.18419 · v1 · pith:7X5PQBOYnew · submitted 2026-05-18 · 💻 cs.CV · cs.AI

Geometry-Aware Uncertainty Coresets for Robust Visual In-Context Learning in Histopathology

Franciskus Xaverius Erick , Johanna Paula M\"uller , Bernhard Kainz This is my paper

Pith reviewed 2026-05-20 10:37 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords coreset selectionin-context learningvision-language modelshistopathologyprompt robustnessuncertainty estimationmultimodal embeddingstraining-free selection

0 comments

The pith

A training-free coreset method selects small sets of image-text pairs that make vision-language models more accurate, better calibrated, and more robust to rephrased prompts on pathology images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GAUC, a method that picks a compact subset of examples directly in the embedding space of a pre-trained vision-language model. It balances three goals at once: keeping the subset representative of the full dataset's distribution, limiting how much performance drops when a query prompt is worded differently, and avoiding overly confident but unstable predictions. This happens with no gradient updates or fine-tuning at all. The authors test it on two histopathology datasets and several open-source models, where it beats standard selection and distillation baselines on accuracy, calibration, and prompt stability. A sympathetic reader would care because it offers a lightweight way to make large VLMs usable for medical image reasoning when expert labels are scarce and retraining is impractical.

Core claim

GAUC jointly optimises a Maximum Mean Discrepancy term that enforces distributional fidelity between the coreset and the full dataset, an Effective Mutual Information Difference regulariser that bounds degradation under prompt paraphrases by using the model's joint vision-text alignment, and a predictive-variance penalty that suppresses overconfident outputs, all inside the fixed pre-trained multimodal embedding space, yielding coresets that improve accuracy, calibration, and prompt robustness for in-context learning on CRC-100K and MHIST across multiple VLM architectures without any parameter updates.

What carries the argument

The GAUC coreset selector, which jointly optimises three objectives (MMD distributional fidelity, Effective Mutual Information Difference for paraphrase robustness, and predictive-variance penalty) directly in the pre-trained multimodal embedding space to produce compact, geometry-aware example sets for in-context learning.

If this is right

Selected coresets raise diagnostic accuracy on CRC-100K and MHIST without any model updates.
The same coresets improve output calibration and reduce sensitivity to how queries are phrased.
Gains hold across several open-source vision-language model architectures.
All benefits come from geometry-aware selection in embedding space rather than parameter changes or large-scale distillation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar joint objectives could be applied to select examples for in-context learning in other medical imaging tasks where labeled data is limited.
The emphasis on embedding geometry suggests that preserving alignment between vision and text features is central to stable performance in domain-specific ICL.
The method may lower the cost of prompt engineering by making performance less dependent on exact wording.
One could test whether the same selection criteria improve few-shot performance when the underlying model is a vision-only encoder rather than a multimodal VLM.

Load-bearing premise

The three objectives of distributional fidelity, paraphrase robustness via mutual information, and predictive variance can be jointly optimised in the fixed pre-trained multimodal embedding space to produce coresets that reliably improve downstream in-context learning performance on held-out pathology queries.

What would settle it

An experiment on held-out images from CRC-100K or MHIST in which coresets chosen by GAUC fail to show higher accuracy, better calibration, or greater stability under prompt paraphrases than query-dependent nearest-neighbour retrieval or random selection baselines.

Figures

Figures reproduced from arXiv: 2605.18419 by Bernhard Kainz, Franciskus Xaverius Erick, Johanna Paula M\"uller.

**Figure 2.** Figure 2: Qualitative comparison. Top: kNN selects morphologically redundant demonstrations, leading to a misclassification. Bottom: GAUC provides diverse, globally representative demonstrations yielding the correct diagnosis [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

Vision-language models (VLMs) can couple visual perception with open-ended clinical reasoning, making them attractive for computational histopathology. However, fine-tuning billions of parameters on scarce, expert-annotated pathology data is prohibitive, while in-context learning (ICL), which conditions the VLM on demonstrative image-text pairs without parameter updates, suffers from high sensitivity to which examples are selected and how the query is phrased, producing unreliable diagnostics. Existing selection strategies rely on query-dependent nearest-neighbour retrieval that ignores global data structure, require costly parameter updates, or disregard the joint vision-text embedding geometry of VLMs. We propose GAUC, a training-free coreset selection method operating directly in the pre-trained multimodal embedding space. GAUC jointly optimises three objectives: (1) a Maximum Mean Discrepancy term enforcing distributional fidelity between coreset and full dataset, (2) an Effective Mutual Information Difference regulariser bounding performance degradation under prompt paraphrases by exploiting the VLM's joint vision-text alignment, and (3) a predictive-variance penalty suppressing overconfident, unstable outputs. On CRC-100K and MHIST across multiple open-source VLM architectures, GAUC consistently improves accuracy, calibration, and prompt robustness over recent ICL selection methods and dataset-distillation baselines, all without a single gradient update.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GAUC gives a training-free coreset method for ICL in pathology VLMs by jointly optimizing MMD, mutual-info robustness, and variance in the embedding space, but the gains from the full combination still need clear verification.

read the letter

The main point is a training-free coreset selector called GAUC for in-context learning with VLMs on histopathology images. It stays inside the frozen multimodal embedding and optimizes three terms at once: MMD to keep the selected set distributionally close to the full data, an effective mutual information difference to limit drops when the query prompt is rephrased, and a predictive variance penalty to reduce unstable or overconfident outputs. The authors test this on CRC-100K and MHIST across several open VLMs and report better accuracy, calibration, and prompt robustness than nearest-neighbor retrieval or dataset-distillation baselines, all without any gradient steps. That training-free property is a practical fit for pathology, where labeled data is scarce and fine-tuning is costly. Working directly on the joint vision-text geometry to handle paraphrase sensitivity is a reasonable move given how VLMs behave in clinical settings. The evaluation across multiple models is also a plus for showing the idea is not tied to one architecture. The softer part is whether the three terms really need to be optimized together. The abstract claims consistent gains, yet without numbers, error bars, or ablations it is difficult to judge effect sizes or to see if one term dominates and the others add little. If the embedding space does not align well with the actual query distributions in pathology, the joint optimum could still produce suboptimal coresets. The stress-test concern about possible trade-offs between global matching and local uncertainty measures is worth checking directly in the experiments. This is aimed at researchers who want more reliable prompt-based methods for medical VLMs without heavy compute. A reader focused on dataset selection or robust ICL would get something concrete from the formulation. It shows honest engagement with the problem and the prior selection literature, so it deserves a serious referee to examine the full results and sensitivity to the balancing weights.

Referee Report

2 major / 2 minor

Summary. The paper proposes GAUC, a training-free coreset selection algorithm for visual in-context learning with vision-language models in histopathology. Operating in the frozen multimodal embedding space, GAUC jointly optimizes a Maximum Mean Discrepancy term for distributional fidelity to the full dataset, an Effective Mutual Information Difference regularizer for robustness to prompt paraphrases, and a predictive-variance penalty to suppress unstable outputs. The central claim is that this geometry-aware selection yields consistent gains in accuracy, calibration, and prompt robustness on CRC-100K and MHIST across multiple open-source VLMs, outperforming recent ICL selection and dataset-distillation baselines without any gradient updates or fine-tuning.

Significance. If the empirical results hold after addressing the points below, the work would be a useful contribution to reliable deployment of VLMs in computational pathology. The training-free nature and direct use of joint vision-text geometry address practical constraints of scarce expert annotations and prohibitive fine-tuning costs. Credit is due for the explicit multi-objective formulation that combines global distributional matching with local uncertainty and paraphrase sensitivity, and for evaluating across multiple VLM architectures on two pathology datasets.

major comments (2)

[§4 (Experiments)] §4 (Experiments) and the associated ablation tables: the manuscript must include a sensitivity analysis or ablation on the balancing weights among the MMD, EMID, and variance terms. The axiom ledger identifies these weights as free parameters; without showing that the joint optimum is not dominated by any single term and that downstream ICL gains on held-out CRC-100K/MHIST queries remain stable, the claim that the three objectives can be reliably co-optimized in the fixed embedding space is not yet load-bearing.
[Table 2 / §5.2] Table 2 (or equivalent main-results table) and §5.2: reported accuracy and calibration improvements lack error bars, statistical significance tests, or multiple random seeds. Given that the skeptic note highlights potential misalignment between global MMD and local uncertainty/paraphrase terms, the absence of these controls leaves open whether the observed gains over nearest-neighbor and distillation baselines are reproducible or attributable to post-hoc choices.

minor comments (2)

[§3.2] The definition of Effective Mutual Information Difference (EMID) in §3.2 uses notation that could be clarified with an explicit equation relating it to the VLM's joint vision-text alignment; a short derivation or pseudocode would help readers verify it does not reduce to a fitted quantity on the target task.
[Figure 3] Figure 3 (qualitative coreset visualizations) would benefit from side-by-side comparison with the nearest-neighbor baseline to illustrate how the geometry-aware selection differs in embedding space.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our work. We address each major comment in detail below and have made revisions to the manuscript to incorporate the suggested improvements.

read point-by-point responses

Referee: [§4 (Experiments)] §4 (Experiments) and the associated ablation tables: the manuscript must include a sensitivity analysis or ablation on the balancing weights among the MMD, EMID, and variance terms. The axiom ledger identifies these weights as free parameters; without showing that the joint optimum is not dominated by any single term and that downstream ICL gains on held-out CRC-100K/MHIST queries remain stable, the claim that the three objectives can be reliably co-optimized in the fixed embedding space is not yet load-bearing.

Authors: We appreciate the referee's emphasis on validating the multi-objective optimization. The manuscript currently uses fixed weights determined through limited validation experiments. To address this, we have added a new sensitivity analysis subsection in §4. This includes ablations where we vary the weights λ1 for MMD, λ2 for EMID, and λ3 for variance penalty individually and in combination. Results show that the ICL performance on CRC-100K and MHIST remains stable for weights in [0.1, 10] range, with the full joint objective yielding the best or near-best results without dominance by any term. We believe this strengthens the claim of reliable co-optimization in the embedding space. revision: yes
Referee: [Table 2 / §5.2] Table 2 (or equivalent main-results table) and §5.2: reported accuracy and calibration improvements lack error bars, statistical significance tests, or multiple random seeds. Given that the skeptic note highlights potential misalignment between global MMD and local uncertainty/paraphrase terms, the absence of these controls leaves open whether the observed gains over nearest-neighbor and distillation baselines are reproducible or attributable to post-hoc choices.

Authors: We concur that statistical controls are essential for reproducibility. The initial results were reported from single runs due to the computational cost of VLM inference on large datasets. In the revised manuscript, we have conducted experiments with 5 different random seeds for the coreset selection process and query evaluation. Table 2 now includes mean values with standard error bars. We have also added statistical significance testing using Wilcoxon signed-rank tests or t-tests as appropriate, showing that GAUC's improvements are statistically significant (p<0.01) over the baselines. This mitigates concerns regarding post-hoc choices and potential term misalignments by demonstrating consistent performance across seeds. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is a new joint optimization over independent objectives

full rationale

The paper introduces GAUC as a training-free coreset selector that jointly optimizes three distinct terms (MMD for distributional match, Effective Mutual Information Difference for paraphrase robustness, and predictive-variance penalty) directly inside a frozen multimodal embedding space. No equations or claims reduce any of these objectives to a fitted parameter that is then renamed as a prediction on the same data. No load-bearing self-citation chain is invoked to justify uniqueness or to forbid alternatives; the central claim rests on empirical gains versus baselines on held-out CRC-100K and MHIST queries. The derivation therefore remains self-contained and externally falsifiable.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the pre-trained VLM embedding geometry is sufficiently informative for histopathology data and that the three custom objectives can be balanced without task-specific fitting; no explicit free parameters or invented entities are named in the abstract.

free parameters (1)

balancing weights among MMD, EMID, and variance terms
The abstract implies the three objectives are jointly optimized, which typically requires weighting coefficients chosen or tuned for the datasets.

axioms (1)

domain assumption The joint vision-text embedding space of a pre-trained VLM preserves geometry relevant to histopathology classification and prompt robustness
Invoked when the method operates directly in this space without further adaptation.

pith-pipeline@v0.9.0 · 5777 in / 1355 out tokens · 39365 ms · 2026-05-20T10:37:16.626400+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

[1]

Achiam, J., et al.: GPT-4 technical report. Tech. rep., OpenAI (2024), arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

Awadalla, A., Gao, I., Gardner, J., Hessel, J., Hanafy, Y., Zhu, W., Marathe, K., Bitton, Y., Gadre, S., Sagawa, S., Jitsev, J., Kornblith, S., Koh, P.W., Ilharco, G., Wortsman, M., Schmidt, L.: OpenFlamingo: An open-source framework for training large autoregressive vision-language models (2023), arXiv:2308.01390

work page internal anchor Pith review Pith/arXiv arXiv 2023
[3]

In: NeurIPS’20

Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Nee- lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Win- ter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford...

work page 1901
[4]

In: MICCAI’23

Bungert, T.J., Kobelke, L., Jaeger, P.F.: Understanding silent failures in medical image classification. In: MICCAI’23. pp. 400–410 (2023)

work page 2023
[5]

Nature Medicine25(8), 1301–1309 (2019).https://doi.org/10.1038/ s41591-019-0508-1

Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade com- putational pathology using weakly supervised deep learning on whole slide images. Nature Medicine25(8), 1301–1309 (2019).https://doi.org/10.1038/ s41591-019-0508-1

work page 2019
[6]

In: CVPR’22

Cazenavette, G., Wang, T., Torralba, A., Efros, A.A., Zhu, J.Y.: Dataset distilla- tion by matching training trajectories. In: CVPR’22. pp. 4750–4759 (2022)

work page 2022
[7]

In: MICCAI’24

Cechnicka, S., Ball, J., Baugh, M., Reynaud, H., Simmonds, N., Smith, A.P., Hors- field, C., Roufosse, C., Kainz, B.: URCDM: Ultra-resolution image synthesis in histopathology. In: MICCAI’24. pp. 535–545 (2024)

work page 2024
[8]

In: MICCAI’23 Workshop on Domain Adaptation and Representation Transfer

Cechnicka, S., Ball, J., Reynaud, H., Arthurs, C., Roufosse, C., Kainz, B.: Realistic data enrichment for robust image segmentation in histopathology. In: MICCAI’23 Workshop on Domain Adaptation and Representation Transfer. pp. 63–72 (2023)

work page 2023
[9]

Nature Communications15(1), 10104 (2024).https://doi

Ferber, D., Wölflein, G., Wiest, I.C., Ligero, M., Sainath, S., Ghaffari Laleh, N., El Nahhas, O.S.M., Müller-Franzes, G., Jäger, D., Truhn, D., Kather, J.N.: In-context learning enables multimodal large language models to classify cancer pathology images. Nature Communications15(1), 10104 (2024).https://doi. org/10.1038/s41467-024-51465-9

work page doi:10.1038/s41467-024-51465-9 2024
[10]

Ferlay, J., Colombet, M., Soerjomataram, I., Parkin, D.M., Piñeros, M., Znaor, A., Bray, F.: Cancer statistics for the year 2020: An overview. International Journal 10 Franciskus Xaverius Erick , Johanna Paula Müller, and Bernhard Kainz of Cancer149(4), 778–789 (2021).https://doi.org/https://doi.org/10.1002/ ijc.33588,https://onlinelibrary.wiley.com/doi/a...

work page doi:10.1002/ijc.33588 2020
[11]

medRxiv (2023).https://doi.org/10.1101/2023

Filiot, A., Ghermi, R., Olivier, A., Jacob, P., Fidon, L., Camara, A., Mac Kain, A., Saillard, C., Schiratti, J.B.: Scaling self-supervised learning for histopathology with masked image modeling. medRxiv (2023).https://doi.org/10.1101/2023. 07.21.23292757

work page doi:10.1101/2023 2023
[12]

In: CVPR’25

Jiang, Y., Fu, J., Hao, C., Hu, X., Peng, Y., Geng, X., Yang, X.: Mimic in-context learning for multimodal tasks. In: CVPR’25. pp. 29825–29834 (2025)

work page 2025
[13]

PLOS Medicine16(1), e1002730 (2019).https: //doi.org/10.1371/journal.pmed.1002730

Kather, J.N., Krisam, J., Charoentong, P., Luedde, T., Herpel, E., Weis, C.A., Gaiser, T., Marx, A., Valous, N.A., Ferber, D., Jansen, L., Reyes-Aldasoro, C.C., Zörnig, I., Jäger, D., Brenner, H., Chang-Claude, J., Hoffmeister, M., Halama, N.: Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter stud...

work page doi:10.1371/journal.pmed.1002730 2019
[14]

npj Digital Medicine8(1), 423 (2025).https://doi.org/10.1038/ s41746-025-01837-2

Kurz, C.F., Merzhevich, T., Eskofier, B.M., Kather, J.N., Gmeiner, B.: Bench- marking vision-language models for diagnostics in emergency and critical care settings. npj Digital Medicine8(1), 423 (2025).https://doi.org/10.1038/ s41746-025-01837-2

work page 2025
[15]

Laurençon, H., Tronchon, L., Cord, M., Sanh, V.: What matters when building vision-language models? In: NeurIPS’24. vol. 37 (2024)

work page 2024
[16]

In: NeurIPS’23

Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H., Yang, J., Naumann, T., Poon, H., Gao, J.: LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. In: NeurIPS’23. vol. 36, pp. 28541–28564 (2023)

work page 2023
[17]

In: CVPR’24

Li, L., Peng, J., Chen, H., Gao, C., Yang, X.: How to configure good in-context sequence for visual question answering. In: CVPR’24. pp. 26710–26720 (2024)

work page 2024
[18]

In: ICML’24 (2024)

Liu, S., Ye, H., Xing, L., Zou, J.: In-context vectors: Making in context learning more effective and controllable through latent space steering. In: ICML’24 (2024)

work page 2024
[19]

In: ACL’22

Lu, Y., Bartolo, M., Moore, A., Riedel, S., Stenetorp, P.: Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In: ACL’22. pp. 8086–8098 (2022).https://doi.org/10.18653/v1/2022.acl-long. 556

work page doi:10.18653/v1/2022.acl-long 2022
[20]

org/abs/2410.11199

McIntosh-Smith, S., Alam, S.R., Woods, C.: Isambard-ai: a leadership class super- computer optimised specifically for artificial intelligence. arXiv.2410.11199 (2024)

work page arXiv 2024
[21]

In: ICML’25 (2025)

Oh, C., Fang, Z., Im, S., Du, X., Li, Y.: Understanding multimodal LLMs under distribution shifts: An information-theoretic approach. In: ICML’25 (2025)

work page 2025
[22]

Sanogo, K., Ardiccioni, R.: Toward more reliable artificial intelligence: Reducing hallucinations in vision-language models (2025), arXiv:2512.07564

work page arXiv 2025
[23]

In: NeurIPS’16

Tolstikhin, I.O., Sriperumbudur, B.K., Schölkopf, B.: Minimax estimation of max- imum mean discrepancy with radial kernels. In: NeurIPS’16. vol. 29 (2016)

work page 2016
[24]

In: MICCAI’21

Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: TransPath: Transformer-based self-supervised learning for histopathological image classification. In: MICCAI’21. Lecture Notes in Computer Science, vol. 12908, pp. 186–195 (2021).https://doi.org/10.1007/978-3-030-87237-3_18

work page doi:10.1007/978-3-030-87237-3_18 2021
[25]

Proceedings of the AAAI Conference on Artificial Intelligence40(13), 10458–10466 (Mar 2026).https://doi.org/10.1609/aaai.v40i13.38017,https://ojs.aaai

Wang, Z., Wang, J., Xu, H., Yan, M., Huang, F., Yang, X., Wei, X.S., Mi, S., Zhang, Y.: Efficient and effective in-context demonstration selection with coreset. Proceedings of the AAAI Conference on Artificial Intelligence40(13), 10458–10466 (Mar 2026).https://doi.org/10.1609/aaai.v40i13.38017,https://ojs.aaai. org/index.php/AAAI/article/view/38017

work page doi:10.1609/aaai.v40i13.38017 2026
[26]

In: AIME’21

Wei, J., Suriawinata, A., Ren, B., Liu, X., Lisovsky, M., Vaickus, L., Brown, C., Baker, M., Tomita, N., Torresani, L., Wei, J., Hassanpour, S.: A petri dish for Geometry-Aware Uncertainty Coresets 11 histopathology image analysis. In: AIME’21. Lecture Notes in Computer Science, vol. 12721, pp. 11–24 (2021).https://doi.org/10.1007/978-3-030-77211-6_2

work page doi:10.1007/978-3-030-77211-6_2 2021
[27]

In: WACV’23

Zhao, B., Bilen, H.: Dataset condensation with distribution matching. In: WACV’23. pp. 6514–6523 (2023)

work page 2023
[28]

In: ICML’25 (2025)

Zhao, L., Wu, Y., Jiang, X., Gu, J., Wang, Y., Xu, X., Zhao, P., Lin, X.: Taming diffusion for dataset distillation with high representativeness. In: ICML’25 (2025)

work page 2025
[29]

In: ICML’21

Zhao, T.Z., Wallace, E., Feng, S., Klein, D., Singh, S.: Calibrate before use: Im- proving few-shot performance of language models. In: ICML’21. pp. 12697–12706 (2021)

work page 2021

[1] [1]

Achiam, J., et al.: GPT-4 technical report. Tech. rep., OpenAI (2024), arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Awadalla, A., Gao, I., Gardner, J., Hessel, J., Hanafy, Y., Zhu, W., Marathe, K., Bitton, Y., Gadre, S., Sagawa, S., Jitsev, J., Kornblith, S., Koh, P.W., Ilharco, G., Wortsman, M., Schmidt, L.: OpenFlamingo: An open-source framework for training large autoregressive vision-language models (2023), arXiv:2308.01390

work page internal anchor Pith review Pith/arXiv arXiv 2023

[3] [3]

In: NeurIPS’20

Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Nee- lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Win- ter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford...

work page 1901

[4] [4]

In: MICCAI’23

Bungert, T.J., Kobelke, L., Jaeger, P.F.: Understanding silent failures in medical image classification. In: MICCAI’23. pp. 400–410 (2023)

work page 2023

[5] [5]

Nature Medicine25(8), 1301–1309 (2019).https://doi.org/10.1038/ s41591-019-0508-1

Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade com- putational pathology using weakly supervised deep learning on whole slide images. Nature Medicine25(8), 1301–1309 (2019).https://doi.org/10.1038/ s41591-019-0508-1

work page 2019

[6] [6]

In: CVPR’22

Cazenavette, G., Wang, T., Torralba, A., Efros, A.A., Zhu, J.Y.: Dataset distilla- tion by matching training trajectories. In: CVPR’22. pp. 4750–4759 (2022)

work page 2022

[7] [7]

In: MICCAI’24

Cechnicka, S., Ball, J., Baugh, M., Reynaud, H., Simmonds, N., Smith, A.P., Hors- field, C., Roufosse, C., Kainz, B.: URCDM: Ultra-resolution image synthesis in histopathology. In: MICCAI’24. pp. 535–545 (2024)

work page 2024

[8] [8]

In: MICCAI’23 Workshop on Domain Adaptation and Representation Transfer

Cechnicka, S., Ball, J., Reynaud, H., Arthurs, C., Roufosse, C., Kainz, B.: Realistic data enrichment for robust image segmentation in histopathology. In: MICCAI’23 Workshop on Domain Adaptation and Representation Transfer. pp. 63–72 (2023)

work page 2023

[9] [9]

Nature Communications15(1), 10104 (2024).https://doi

Ferber, D., Wölflein, G., Wiest, I.C., Ligero, M., Sainath, S., Ghaffari Laleh, N., El Nahhas, O.S.M., Müller-Franzes, G., Jäger, D., Truhn, D., Kather, J.N.: In-context learning enables multimodal large language models to classify cancer pathology images. Nature Communications15(1), 10104 (2024).https://doi. org/10.1038/s41467-024-51465-9

work page doi:10.1038/s41467-024-51465-9 2024

[10] [10]

Ferlay, J., Colombet, M., Soerjomataram, I., Parkin, D.M., Piñeros, M., Znaor, A., Bray, F.: Cancer statistics for the year 2020: An overview. International Journal 10 Franciskus Xaverius Erick , Johanna Paula Müller, and Bernhard Kainz of Cancer149(4), 778–789 (2021).https://doi.org/https://doi.org/10.1002/ ijc.33588,https://onlinelibrary.wiley.com/doi/a...

work page doi:10.1002/ijc.33588 2020

[11] [11]

medRxiv (2023).https://doi.org/10.1101/2023

Filiot, A., Ghermi, R., Olivier, A., Jacob, P., Fidon, L., Camara, A., Mac Kain, A., Saillard, C., Schiratti, J.B.: Scaling self-supervised learning for histopathology with masked image modeling. medRxiv (2023).https://doi.org/10.1101/2023. 07.21.23292757

work page doi:10.1101/2023 2023

[12] [12]

In: CVPR’25

Jiang, Y., Fu, J., Hao, C., Hu, X., Peng, Y., Geng, X., Yang, X.: Mimic in-context learning for multimodal tasks. In: CVPR’25. pp. 29825–29834 (2025)

work page 2025

[13] [13]

PLOS Medicine16(1), e1002730 (2019).https: //doi.org/10.1371/journal.pmed.1002730

Kather, J.N., Krisam, J., Charoentong, P., Luedde, T., Herpel, E., Weis, C.A., Gaiser, T., Marx, A., Valous, N.A., Ferber, D., Jansen, L., Reyes-Aldasoro, C.C., Zörnig, I., Jäger, D., Brenner, H., Chang-Claude, J., Hoffmeister, M., Halama, N.: Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter stud...

work page doi:10.1371/journal.pmed.1002730 2019

[14] [14]

npj Digital Medicine8(1), 423 (2025).https://doi.org/10.1038/ s41746-025-01837-2

Kurz, C.F., Merzhevich, T., Eskofier, B.M., Kather, J.N., Gmeiner, B.: Bench- marking vision-language models for diagnostics in emergency and critical care settings. npj Digital Medicine8(1), 423 (2025).https://doi.org/10.1038/ s41746-025-01837-2

work page 2025

[15] [15]

Laurençon, H., Tronchon, L., Cord, M., Sanh, V.: What matters when building vision-language models? In: NeurIPS’24. vol. 37 (2024)

work page 2024

[16] [16]

In: NeurIPS’23

Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H., Yang, J., Naumann, T., Poon, H., Gao, J.: LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day. In: NeurIPS’23. vol. 36, pp. 28541–28564 (2023)

work page 2023

[17] [17]

In: CVPR’24

Li, L., Peng, J., Chen, H., Gao, C., Yang, X.: How to configure good in-context sequence for visual question answering. In: CVPR’24. pp. 26710–26720 (2024)

work page 2024

[18] [18]

In: ICML’24 (2024)

Liu, S., Ye, H., Xing, L., Zou, J.: In-context vectors: Making in context learning more effective and controllable through latent space steering. In: ICML’24 (2024)

work page 2024

[19] [19]

In: ACL’22

Lu, Y., Bartolo, M., Moore, A., Riedel, S., Stenetorp, P.: Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In: ACL’22. pp. 8086–8098 (2022).https://doi.org/10.18653/v1/2022.acl-long. 556

work page doi:10.18653/v1/2022.acl-long 2022

[20] [20]

org/abs/2410.11199

McIntosh-Smith, S., Alam, S.R., Woods, C.: Isambard-ai: a leadership class super- computer optimised specifically for artificial intelligence. arXiv.2410.11199 (2024)

work page arXiv 2024

[21] [21]

In: ICML’25 (2025)

Oh, C., Fang, Z., Im, S., Du, X., Li, Y.: Understanding multimodal LLMs under distribution shifts: An information-theoretic approach. In: ICML’25 (2025)

work page 2025

[22] [22]

Sanogo, K., Ardiccioni, R.: Toward more reliable artificial intelligence: Reducing hallucinations in vision-language models (2025), arXiv:2512.07564

work page arXiv 2025

[23] [23]

In: NeurIPS’16

Tolstikhin, I.O., Sriperumbudur, B.K., Schölkopf, B.: Minimax estimation of max- imum mean discrepancy with radial kernels. In: NeurIPS’16. vol. 29 (2016)

work page 2016

[24] [24]

In: MICCAI’21

Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: TransPath: Transformer-based self-supervised learning for histopathological image classification. In: MICCAI’21. Lecture Notes in Computer Science, vol. 12908, pp. 186–195 (2021).https://doi.org/10.1007/978-3-030-87237-3_18

work page doi:10.1007/978-3-030-87237-3_18 2021

[25] [25]

Proceedings of the AAAI Conference on Artificial Intelligence40(13), 10458–10466 (Mar 2026).https://doi.org/10.1609/aaai.v40i13.38017,https://ojs.aaai

Wang, Z., Wang, J., Xu, H., Yan, M., Huang, F., Yang, X., Wei, X.S., Mi, S., Zhang, Y.: Efficient and effective in-context demonstration selection with coreset. Proceedings of the AAAI Conference on Artificial Intelligence40(13), 10458–10466 (Mar 2026).https://doi.org/10.1609/aaai.v40i13.38017,https://ojs.aaai. org/index.php/AAAI/article/view/38017

work page doi:10.1609/aaai.v40i13.38017 2026

[26] [26]

In: AIME’21

Wei, J., Suriawinata, A., Ren, B., Liu, X., Lisovsky, M., Vaickus, L., Brown, C., Baker, M., Tomita, N., Torresani, L., Wei, J., Hassanpour, S.: A petri dish for Geometry-Aware Uncertainty Coresets 11 histopathology image analysis. In: AIME’21. Lecture Notes in Computer Science, vol. 12721, pp. 11–24 (2021).https://doi.org/10.1007/978-3-030-77211-6_2

work page doi:10.1007/978-3-030-77211-6_2 2021

[27] [27]

In: WACV’23

Zhao, B., Bilen, H.: Dataset condensation with distribution matching. In: WACV’23. pp. 6514–6523 (2023)

work page 2023

[28] [28]

In: ICML’25 (2025)

Zhao, L., Wu, Y., Jiang, X., Gu, J., Wang, Y., Xu, X., Zhao, P., Lin, X.: Taming diffusion for dataset distillation with high representativeness. In: ICML’25 (2025)

work page 2025

[29] [29]

In: ICML’21

Zhao, T.Z., Wallace, E., Feng, S., Klein, D., Singh, S.: Calibrate before use: Im- proving few-shot performance of language models. In: ICML’21. pp. 12697–12706 (2021)

work page 2021