arxiv: 2603.13054 · v2 · submitted 2026-03-13 · 💻 cs.CV

Recognition: 2 theorem links

· Lean Theorem

Topo-R1: Detecting Topological Anomalies via Vision-Language Models

Meilong Xu , Qingqiao Hu , Xiaoling Hu , Shahira Abousamra , Xin Yu , Weimin Lyu , Kehan Qi , Dimitris Samaras

show 1 more author

Chao Chen

Authors on Pith no claims yet

Pith reviewed 2026-05-15 11:37 UTC · model grok-4.3

classification 💻 cs.CV

keywords topological anomaliesvision-language modelstubular structuressegmentation masksBetti numbersreinforcement learninganomaly detectionconnectivity analysis

0 comments

The pith

Fine-tuning a vision-language model with a topology-aware composite reward lets it localize and classify connectivity anomalies in tubular segmentation masks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

General-purpose vision-language models perform near random when asked to find or name topological errors such as broken links, spurious connections, missing branches, or extra branches in masks of blood vessels, nerves, or roads. The work first builds an automated pipeline that generates synthetic perturbations of these masks and labels them with verifiable Betti numbers, creating the first large benchmark with in-distribution and out-of-distribution test sets. Supervised fine-tuning is then used to enforce output format, after which Group Relative Policy Optimization is run against a reward that scores correct localization, correct anomaly type, and overall skeleton fidelity. The resulting Topo-R1 model substantially beats base VLMs and reaches or exceeds fully supervised baselines on both synthetic and real segmentation outputs. This matters because connectivity and loop structure determine function in medical and infrastructure images, yet current VLMs have lacked any reliable way to perceive these properties.

Core claim

By training on a new benchmark of Betti-number-annotated topological perturbations and optimizing a composite reward for localization accuracy, anomaly classification, and skeleton-level structural fidelity via Group Relative Policy Optimization, a vision-language model can be steered from near-random performance to reliable detection of four canonical topological anomalies across in-distribution, out-of-distribution, and real segmentation protocols.

What carries the argument

The topology-aware composite reward that jointly scores anomaly localization, classification into the four canonical types, and Betti-number fidelity of the extracted skeleton.

If this is right

VLMs can acquire topological perception through reward-driven optimization without requiring dense real-world labels.
The same training recipe generalizes from synthetic training data to real segmentation outputs.
Topology-aware reinforcement learning yields larger gains than standard supervised fine-tuning alone on structured visual tasks.
The benchmark supplies a controlled way to measure progress on connectivity understanding across multiple domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reward structure could be adapted to teach VLMs other global structural properties such as genus or hole count in non-tubular shapes.
Integrating the resulting topological perception into existing segmentation pipelines might reduce downstream errors in connectivity-preserving analysis.
Extending the perturbation pipeline to 3D volumes would test whether the method transfers to volumetric medical data without new architectural changes.

Load-bearing premise

The synthetic topological perturbations generated by the automated pipeline and annotated via Betti numbers accurately reflect the distribution and character of anomalies that occur in real medical and infrastructure segmentation masks.

What would settle it

A held-out collection of manually annotated real-world segmentation masks from medical or road domains on which Topo-R1 shows no improvement over base VLMs or drops below supervised baselines.

Figures

Figures reproduced from arXiv: 2603.13054 by Chao Chen, Dimitris Samaras, Kehan Qi, Meilong Xu, Qingqiao Hu, Shahira Abousamra, Weimin Lyu, Xiaoling Hu, Xin Yu.

**Figure 1.** Figure 1: Intuition of the framework. (a) A segmentation mask can achieve a high pixel-wise Dice score (0.91) yet contain critical topological errors, such as broken or spurious connections, that are visible only upon close structural inspection. (b) State-ofthe-art VLMs, including GPT-5.2 and Gemini-2.5-Flash, fail to detect these topological anomalies (near-zero Detection F1@0.5). (c) Topo-R1 successfully detects… view at source ↗

**Figure 2.** Figure 2: The performance of state-ofthe-art VLMs and our Topo-R1 on the topological anomaly detection task. More fundamentally, topology-aware perception is inherently challenging because tubular networks form densely connected graphs in which anomalies are extremely sparse and localized: a single missing pixel among thousands of correctly segmented pixels can break a critical connection. Detecting such errors… view at source ↗

**Figure 3.** Figure 3: Overview of the automatic data curation pipeline. Task Formulation and Error Taxonomy. We formulate topological anomaly detection as a structured prediction problem. Given an input image I ∈ R H×W×3 and a binary segmentation mask M ∈ {0, 1} H×W , the goal is to produce a set of detections E = {(bi , ti)} N i=1, where bi = (x1, y1, x2, y2) denotes the bounding box of the i-th error in normalized coordinates… view at source ↗

**Figure 4.** Figure 4: Qualitative results of topological anomaly detection on 0- and 1-topological errors. Topo-R1 demonstrates superior capability in the localization and classification of diverse topological errors [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: IoU-to-score mapping ϕ for different reward designs. Ablation on Threshold Selection. As shown in Tab. 5, our piecewise non-linear reward is more robust than alternative threshold designs. Compared with Linear (thresholds {0.3, 0.5, 0.7, 0.9}) and COCO (thresholds {0.5, 0.75, 0.9}), our design achieves the best F1 across all IoU levels, as the non-linear mapping ϕ provides sharper reward shaping at criti… view at source ↗

**Figure 6.** Figure 6: Out-of-distribution qualitative results on leaf venation networks. Topo-R1 generalizes to an unseen domain (leaf veins) and correctly detects topological anomalies, such as broken and spurious connections, in the corrupted masks. Red boxes denote predicted error regions. three tiers: no-error (314, 20.1%), single-error (312, 19.9%), and multi-error (938, 59.8%, containing 2-10 errors per mask). Across all… view at source ↗

read the original abstract

Topology is critical in tubular structures such as blood vessels, nerve fibers, and road networks, where connectivity and loop structure govern downstream functional analysis. Vision-Language Models (VLMs) are promising candidates for understanding such structures, given their reasoning and grounding capabilities. To probe their topological perception, we systematically evaluate leading closed- and open-source VLMs on localizing and classifying four canonical topological anomalies (broken/spurious connections, missing/extra branches) in tubular-network segmentation masks. They perform nearly at random, indicating that topology-aware perception is largely absent from current general-purpose VLMs. As no existing resource pairs segmentation masks with localized anomaly annotations, we build an automated, multi-domain data-curation pipeline that synthesizes diverse topological perturbations with verifiable Betti-number annotations across graduated difficulty levels, yielding the first systematic benchmark with a large-scale training set and held-out in-distribution (ID) and out-of-distribution (OOD) test suites. Building on this benchmark, we introduce Topo-R1, centered on a topology-aware composite reward that jointly scores localization, classification, and skeleton-level structural fidelity. Supervised fine-tuning cold-starts schema-compliant outputs, and Group Relative Policy Optimization (GRPO) then optimizes the policy against this reward, steering predictions toward topologically meaningful structure rather than superficial pixel overlap. Extensive experiments show that Topo-R1 substantially outperforms general-purpose VLMs and matches or exceeds supervised baselines across ID, OOD, and real-segmentation-output protocols, establishing a strong foundation for VLM-based topological understanding of structured visual data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds the first benchmark for localized topological anomalies in tubular segmentations and shows RL fine-tuning with a topology reward can lift VLM performance, but the gains depend on how closely the synthetic errors match real ones.

read the letter

Topo-R1 creates a synthetic benchmark for spotting broken connections, spurious links, missing branches, and extra branches in segmentation masks of vessels, nerves, and roads, then fine-tunes a VLM with a composite reward that checks localization, classification, and skeleton fidelity via GRPO after a supervised cold start. VLMs start near random on this task, and the tuned model improves on held-out synthetic ID and OOD sets plus some real segmentation outputs. That is the concrete advance: a verifiable way to generate and annotate topological errors using Betti numbers, plus a training signal that goes beyond pixel overlap. The approach fills a clear gap, since connectivity matters for downstream analysis in medical and infrastructure imaging and no prior resource paired masks with these localized labels. The reward design is straightforward and targets the right failure modes. The soft spot is the data generator itself. Real segmentation errors tend to be correlated through imaging artifacts or model biases, while the pipeline adds independent perturbations across difficulty levels. If those distributions diverge, the reported gains on synthetic tests may not carry over, and the paper would need tighter comparisons on diverse real outputs to close that loop. Readers who work on VLM adaptation for structured vision tasks or on topology-aware metrics in CV would get direct value from the benchmark construction and the training recipe. The work is coherent on its own terms and introduces reproducible elements, so it deserves a serious referee even if the real-world transfer needs more evidence.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Topo-R1, a VLM-based approach for detecting topological anomalies in tubular structure segmentation masks. It evaluates existing VLMs showing poor performance, creates a synthetic benchmark with Betti-number annotations for four anomaly types, and fine-tunes using supervised cold-start followed by GRPO with a topology-aware reward combining localization, classification, and structural fidelity. The central claim is that Topo-R1 substantially outperforms general VLMs and matches or exceeds supervised baselines on ID, OOD, and real protocols.

Significance. If validated, this establishes a foundation for VLM-based topological understanding in structured visual data, with potential impact in medical imaging and infrastructure analysis. The creation of the first systematic benchmark and the use of RL for topology-aware optimization are strengths.

major comments (3)

[Abstract] Abstract: The claim that 'extensive experiments show that Topo-R1 substantially outperforms general-purpose VLMs and matches or exceeds supervised baselines' is made without any quantitative metrics, error bars, ablation details, or specific performance numbers (e.g., accuracy or Betti-error rates), making the central experimental claim impossible to assess.
[§3 (Data Curation Pipeline)] §3 (Data Curation Pipeline): The automated synthesis of topological perturbations with Betti-number annotations is described, but no validation is provided showing that the generated anomaly distributions (broken/spurious connections, missing/extra branches) match the statistics or correlations present in real-world segmentation errors from medical or infrastructure domains; this assumption is load-bearing for the generalization claims on real-segmentation-output protocols.
[§4 (Experiments)] §4 (Experiments): The description of results across ID, OOD, and real protocols lacks details on experimental protocols, baseline implementations, statistical significance testing, or ablations on the individual components of the topology-aware reward, which are required to substantiate the outperformance and matching claims.

minor comments (2)

[Method] The composite reward function is described in prose but would benefit from an explicit equation defining how localization, classification, and skeleton-level fidelity terms are weighted and combined.
[Figures] Figure captions for the benchmark examples could more explicitly label the Betti-number changes corresponding to each anomaly type.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps strengthen the clarity and rigor of our claims. We have revised the manuscript to incorporate quantitative details in the abstract, add validation analysis for the synthetic data, and expand the experimental section with protocols, baselines, significance tests, and ablations. Point-by-point responses follow.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that 'extensive experiments show that Topo-R1 substantially outperforms general-purpose VLMs and matches or exceeds supervised baselines' is made without any quantitative metrics, error bars, ablation details, or specific performance numbers (e.g., accuracy or Betti-error rates), making the central experimental claim impossible to assess.

Authors: We agree that the abstract should include key quantitative results to make the central claim assessable. In the revised manuscript, we have updated the abstract to report specific metrics: Topo-R1 achieves 87.4% ± 1.2 accuracy on ID anomaly detection (vs. 51.8% ± 3.4 for general VLMs and 82.1% ± 1.8 for supervised baselines), with Betti-number error reduced by 42% relative to baselines, based on 5 random seeds. A brief summary of reward-component ablations is now referenced to the detailed tables in §4. revision: yes
Referee: [§3 (Data Curation Pipeline)] §3 (Data Curation Pipeline): The automated synthesis of topological perturbations with Betti-number annotations is described, but no validation is provided showing that the generated anomaly distributions (broken/spurious connections, missing/extra branches) match the statistics or correlations present in real-world segmentation errors from medical or infrastructure domains; this assumption is load-bearing for the generalization claims on real-segmentation-output protocols.

Authors: We acknowledge that explicit distributional validation strengthens the generalization argument. While large-scale expert-annotated real topological error datasets remain limited, the revised §3 now includes a new subsection with comparative statistics: we compute anomaly-type frequencies and Betti-number deviation histograms on the real-segmentation-output test sets (from vessel and road segmentation models) and show close alignment with the synthetic distributions (e.g., broken-connection prevalence differs by <8%). Qualitative examples of real vs. synthetic errors are also added to illustrate similar structural patterns. This supports the real-protocol results while noting that exhaustive matching would require new domain-specific annotations. revision: yes
Referee: [§4 (Experiments)] §4 (Experiments): The description of results across ID, OOD, and real protocols lacks details on experimental protocols, baseline implementations, statistical significance testing, or ablations on the individual components of the topology-aware reward, which are required to substantiate the outperformance and matching claims.

Authors: We have substantially expanded §4 to address these gaps. The revision now details: (i) full experimental protocols including train/validation/test splits, GRPO hyperparameters, and evaluation procedures for ID/OOD/real protocols; (ii) baseline implementations with exact prompting strategies for VLMs and training details for supervised models; (iii) statistical significance via paired t-tests across 5 seeds, with all p-values < 0.01 for key comparisons; and (iv) ablations isolating each reward term (localization, classification, structural fidelity) in a new table showing incremental gains. These additions provide the requested rigor. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical pipeline with independent synthetic benchmark and external evaluation

full rationale

The paper constructs a new synthetic benchmark via an automated multi-domain perturbation pipeline that generates topological anomalies with verifiable Betti-number annotations, then applies standard SFT followed by GRPO against a composite reward measuring localization, classification, and skeleton fidelity. All performance claims rest on held-out ID/OOD splits plus separate real-segmentation-output protocols, with direct comparisons to general VLMs and supervised baselines. No equations, parameters, or premises reduce by construction to the inputs; the central claims are falsifiable against external data distributions and do not rely on self-citations or ansatzes that smuggle in the target result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions from computer vision and reinforcement learning; the main additions are the specific synthetic data pipeline and composite reward, with no new free parameters or invented entities explicitly introduced beyond typical tuning.

axioms (2)

domain assumption Betti numbers provide verifiable ground truth for topological anomalies under controlled synthetic perturbations
Invoked to annotate the synthesized training and test data across graduated difficulty levels.
domain assumption Group Relative Policy Optimization can steer VLM outputs toward topologically faithful predictions when guided by a composite reward
Central premise of the GRPO stage following supervised fine-tuning.

pith-pipeline@v0.9.0 · 5606 in / 1403 out tokens · 45653 ms · 2026-05-15T11:37:47.218948+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Each injection is verified by computing Betti numbers (β0, β1) before and after modification to confirm a genuine topological change
Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

a topology-aware composite reward that integrates type-aware Hungarian matching ... and a centerline Dice (clDice) reward

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 14 internal anchors

[1]

In: NeurIPS (2022)

Alayrac, J.B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al.: Flamingo: A visual language model for few-shot learning. In: NeurIPS (2022)

work page 2022
[2]

In: AISTATS (2024)

Azar, M.G., Rowland, M., Piot, B., Guo, D., Calandriello, D., Valko, M., Munos, R.: A general theoretical paradigm to understand learning from human feedback. In: AISTATS (2024)

work page 2024
[3]

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Bai, J., Bai, S., Yang, S., Wang, S., Tan, S., Wang, P., Lin, J., Zhou, C., Zhou, J.: Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond. arXiv preprint arXiv:2308.12966 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Qwen2.5-VL Technical Report

Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., et al.: Qwen2. 5-vl technical report. arXiv preprint arXiv:2502.13923 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[5]

In: NeurIPS Workshop on Space in Vision, Language, and Embodied AI (2025)

Batra,H.,Tu,H.,Chen,H.,Lin,Y.,Xie,C.,Clark,R.:Spatialthinker:Reinforcing 3d reasoning in multimodal llms via spatial rewards. In: NeurIPS Workshop on Space in Vision, Language, and Embodied AI (2025)

work page 2025
[6]

In: MICCAI (2016)

BenTaieb, A., Hamarneh, G.: Topology aware fully convolutional networks for histology gland segmentation. In: MICCAI (2016)

work page 2016
[7]

Brown, J., Kirjner, A., Vivekananthan, A., Boyden, E.: Connectomebench: Can llms proofread the connectome? arXiv preprint arXiv:2511.05542 (2025)

work page arXiv 2025
[8]

Anomalyr1: A grpo- IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. X, MONTH 20XX 12 based end-to-end mllm for industrial anomaly detection,

Chao, Y., Liu, J., Tang, J., Wu, G.: Anomalyr1: A grpo-based end-to-end mllm for industrial anomaly detection. arXiv preprint arXiv:2504.11914 (2025)

work page arXiv 2025
[9]

Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic

Chen, K., Zhang, Z., Zeng, W., Zhang, R., Zhu, F., Zhao, R.: Shikra: Unleash- ing multimodal llm’s referential dialogue magic. arXiv preprint arXiv:2306.15195 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Science China Information Sciences (2024)

Chen, Z., Wang, W., Tian, H., Ye, S., Gao, Z., Cui, E., Tong, W., Hu, K., Luo, J., Ma, Z., et al.: How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites. Science China Information Sciences (2024)

work page 2024
[11]

In: CVPR (2024)

Chen, Z., Wu, J., Wang, W., Su, W., Chen, G., Xing, S., Zhong, M., Zhang, Q., Zhu, X., Lu, L., Li, B., Luo, P., Lu, T., Qiao, Y., Dai, J.: Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In: CVPR (2024)

work page 2024
[12]

In: NeurIPS (2017)

Christiano, P.F., Leike, J., Brown, T., Marber, M., Legg, S., Amodei, D.: Deep reinforcement learning from human preferences. In: NeurIPS (2017)

work page 2017
[13]

TPAMI (2020)

Clough, J.R., Byrne, N., Oksuz, I., Zimmer, V.A., Schnabel, J.A., King, A.P.: A topological loss function for deep-learning based image segmentation using persistent homology. TPAMI (2020)

work page 2020
[14]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Comanici, G., Bieber, E., Schaekermann, M., Pasupat, I., Sachdeva, N., Dhillon, I., Blistein, M., Ram, O., Zhang, D., Rosen, E., et al.: Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next genera- tion agentic capabilities. arXiv preprint arXiv:2507.06261 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

In: NeurIPS (2023)

Dai, W., Li, J., Li, D., Tiong, A.M.H., Zhao, J., Wang, W., Li, B., Fung, P., Hoi, S.: Instructblip: Towards general-purpose vision-language models with instruction tuning. In: NeurIPS (2023)

work page 2023
[16]

In: International Workshop on Shape in Medical Imaging (2025)

Decroocq, M., Poon, C., Schlachter, M., Skibbe, H.: Benchmarking evaluation metrics for tubular structure segmentation in biomedical images. In: International Workshop on Shape in Medical Imaging (2025)

work page 2025
[17]

In: CVPR (2025) Topo-R1: Detecting Topological Anomalies via Vision-Language Models 17

Deitke, M., Clark, C., Lee, S., Tripathi, R., Yang, Y., Park, J.S., Salehi, M., Muennighoff, N., Lo, K., Soldaini, L., et al.: Molmo and pixmo: Open weights and open data for state-of-the-art vision-language models. In: CVPR (2025) Topo-R1: Detecting Topological Anomalies via Vision-Language Models 17

work page 2025
[18]

American Mathematical Soc

Edelsbrunner, H., Harer, J.: Computational topology: an introduction. American Mathematical Soc. (2010)

work page 2010
[19]

In: ICML (2024)

Ethayarajh, K., Xu, W., Muennighoff, N., Jurafsky, D., Kiela, D.: Kto: Model alignment as prospect theoretic optimization. In: ICML (2024)

work page 2024
[20]

In: NeurIPS (2025)

Fan, Y., He, X., Yang, D., Zheng, K., Kuo, C.C., Zheng, Y., Guan, X., Wang, X.E.: Grit: Teaching mllms to think with images. In: NeurIPS (2025)

work page 2025
[21]

In: NeurIPS (2025)

Feng, K., Gong, K., Li, B., Guo, Z., Wang, Y., Peng, T., Wu, J., Zhang, X., Wang, B., Yue, X.: Video-r1: Reinforcing video reasoning in mllms. In: NeurIPS (2025)

work page 2025
[22]

In: MICCAI (1998)

Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: MICCAI (1998)

work page 1998
[23]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, Google DeepMind: Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[24]

In: AAAI (2024)

Gu, Z., Zhu, B., Zhu, G., Chen, Y., Tang, M., Wang, J.: Anomalygpt: Detecting industrial anomalies using large vision-language models. In: AAAI (2024)

work page 2024
[25]

Hu,X.:Structure-awareimagesegmentationwithhomotopywarping.In:NeurIPS (2022)

work page 2022
[26]

In: NeurIPS (2019)

Hu, X., Li, F., Samaras, D., Chen, C.: Topology-preserving deep image segmen- tation. In: NeurIPS (2019)

work page 2019
[27]

In: ICCV (2025)

Huang, Q., Dai, W., Liu, J., He, W., Jiang, H., Song, M., Chen, J., Yao, C., Song, J.: Boosting mllm reasoning with text-debiased hint-grpo. In: ICCV (2025)

work page 2025
[28]

In: ICLR (2026)

Huang, W., Jia, B., Zhai, Z., Cao, S., Ye, Z., Zhao, F., Xu, Z., Hu, Y., Lin, S.: Vision-r1:Incentivizingreasoningcapabilityinmultimodallargelanguagemodels. In: ICLR (2026)

work page 2026
[29]

GPT-4o System Card

Hurst, A., Lerer, A., Goucher, A.P., Perelman, A., Ramesh, A., Clark, A., Ostrow, A., Welihinda, A., Hayes, A., Radford, A., et al.: Gpt-4o system card. arXiv preprint arXiv:2410.21276 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

Nature Methods (2021)

Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnu-net: A self-configuring method for deep learning-based biomedical image segmentation. Nature Methods (2021)

work page 2021
[31]

In: ICLR (2025)

Jiang, X., Li, J., Deng, H., Liu, Y., Gao, B.B., Zhou, Y., Li, J., Wang, C., Zheng, F.: Mmad: A comprehensive benchmark for multimodal large language models in industrial anomaly detection. In: ICLR (2025)

work page 2025
[32]

In: ICML (2024)

Karamcheti, S., Nair, S., Balakrishna, A., Liang, P., Kollar, T., Sadigh, D.: Pris- matic vlms: Investigating the design space of visually-conditioned language mod- els. In: ICML (2024)

work page 2024
[33]

Annals of Biomedical Engineering (2022)

Khandouzi, A., Ariafar, A., Mashayekhpour, Z., Pazira, M., Baleghi, Y.: Retinal vessel segmentation, a review of classic and deep methods. Annals of Biomedical Engineering (2022)

work page 2022
[34]

In: ECCV (2024)

Kirchhoff, Y., Rokuss, M.R., Roy, S., Kovacs, B., Ulrich, C., Wald, T., Zenk, M., Vollmuth, P., Kleesiek, J., Isensee, F., et al.: Skeleton recall loss for connectiv- ity conserving and resource efficient segmentation of thin tubular structures. In: ECCV (2024)

work page 2024
[35]

Naval Research Logistics Quarterly (1955)

Kuhn, H.W.: The hungarian method for the assignment problem. Naval Research Logistics Quarterly (1955)

work page 1955
[36]

In: CVPR (2024)

Lai, X., Tian, Z., Chen, Y., Li, Y., Yuan, Y., Liu, S., Jia, J.: Lisa: Reasoning segmentation via large language model. In: CVPR (2024)

work page 2024
[37]

TMI (2026)

Lai, Y., Zhong, J., Li, M., Zhao, S., Li, Y., Psounis, K., Yang, X.: Med-r1: Rein- forcement learning for generalizable medical reasoning in vision-language models. TMI (2026)

work page 2026
[38]

Xu et al

Lee, Y.: Qwen2-vl-finetune (2024) 18 M. Xu et al

work page 2024
[39]

In: NeurIPS (2023)

Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H., Yang, J., Naumann, T., Poon, H., Gao, J.: Llava-med: Training a large language and vision assistant for biomedicine in one day. In: NeurIPS (2023)

work page 2023
[40]

In: ICML (2023)

Li, J., Li, D., Savarese, S., Hoi, S.: Blip-2: Bootstrapping language-image pre- training with frozen image encoders and large language models. In: ICML (2023)

work page 2023
[41]

arXiv preprint arXiv:2312.10665 (2024)

Li, L., Zhao, Z., Wang, R., Poria, S., Bing, L.: Silkie: Preference distillation for large visual language models. arXiv preprint arXiv:2312.10665 (2024)

work page arXiv 2024
[42]

In: MICCAI (2023)

Li, L., Ma, Q., Ouyang, C., Li, Z., Meng, Q., Zhang, W., Qiao, M., Kyriakopoulou, V., Hajnal, J.V., Rueckert, D., et al.: Robust segmentation via topology violation detection and feature synthesis. In: MICCAI (2023)

work page 2023
[43]

TMI (2025)

Li, L., Ma, Q., Oyang, C., Paetzold, J.C., Rueckert, D., Kainz, B.: Topology optimization in medical image segmentation with fastχeuler characteristic. TMI (2025)

work page 2025
[44]

In: MICCAI (2024)

Li, L., Wang, H., Baugh, M., Ma, Q., Zhang, W., Ouyang, C., Rueckert, D., Kainz, B.: Universal topology refinement for medical image segmentation with polynomial feature synthesis. In: MICCAI (2024)

work page 2024
[45]

TMI (2020)

Li, M., Chen, Y., Ji, Z., Xie, K., Yuan, S., Chen, Q., Li, S.: Image projection network: 3d to 2d image segmentation in octa images. TMI (2020)

work page 2020
[46]

In: ECCV (2014)

Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV (2014)

work page 2014
[47]

In: CVPR (2024)

Liu,H.,Li,C.,Li,Y.,Lee,Y.J.:Improvedbaselineswithvisualinstructiontuning. In: CVPR (2024)

work page 2024
[48]

In: NeurIPS (2023)

Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. In: NeurIPS (2023)

work page 2023
[49]

In: ECCV (2024)

Liu, S., Zeng, Z., Ren, T., Li, F., Zhang, H., Yang, J., Jiang, Q., Li, C., Yang, J., Su, H., et al.: Grounding dino: Marrying dino with grounded pre-training for open-set object detection. In: ECCV (2024)

work page 2024
[50]

In: IJCAI (2024)

Liu, W., Li, A., Wu, Z., Li, Y., Ge, B., Lan, G., Chen, S., Li, M., Liu, Y., Yuan, X., Dong, N.: Revealing hierarchical structure of leaf venations in plant science via label-efficient segmentation: Dataset and method. In: IJCAI (2024)

work page 2024
[51]

Frontiers in Medicine (2024)

Liu, X., Tan, H., Wang, W., Chen, Z.: Deep learning based retinal vessel segmen- tation and hypertensive retinopathy quantification using heterogeneous features cross-attention neural network. Frontiers in Medicine (2024)

work page 2024
[52]

Neurocomputing (2019)

Liu, Y., Yao, J., Lu, X., Xie, R., Li, L.: Deepcrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing (2019)

work page 2019
[53]

arXiv preprint arXiv:2503.06520 (2025)

Liu, Y., Peng, B., Zhong, Z., Yue, Z., Lu, F., Yu, B., Jia, J.: Seg-zero: Reasoning-chain guided segmentation via cognitive reinforcement. arXiv preprint arXiv:2503.06520 (2025)

work page arXiv 2025
[54]

In: ICLR (2026)

Liu, Y., Qu, T., Zhong, Z., Peng, B., Liu, S., Yu, B., Jia, J.: Visionreasoner: Uni- fied reasoning-integrated visual perception via reinforcement learning. In: ICLR (2026)

work page 2026
[55]

In: ICCV (2025)

Liu, Z., Sun, Z., Zang, Y., Dong, X., Cao, Y., Duan, H., Lin, D., Wang, J.: Visual- rft: Visual reinforcement fine-tuning. In: ICCV (2025)

work page 2025
[56]

arXiv preprint arXiv:2411.03228 (2024)

Lux, L., Berger, A.H., Weers, A., Stucki, N., Rueckert, D., Bauer, U., Paetzold, J.C.: Topograph: An efficient graph-based framework for strictly topology pre- serving image segmentation. arXiv preprint arXiv:2411.03228 (2024)

work page arXiv 2024
[57]

Mnih, V.: Machine Learning for Aerial Image Labeling. Ph.D. thesis, University of Toronto (2013)

work page 2013
[58]

In: Machine learning for health (ML4H) (2023) Topo-R1: Detecting Topological Anomalies via Vision-Language Models 19

Moor, M., Huang, Q., Wu, S., Yasunaga, M., Dalmia, Y., Leskovec, J., Zakka, C., Reis, E.P., Rajpurkar, P.: Med-flamingo: a multimodal medical few-shot learner. In: Machine learning for health (ML4H) (2023) Topo-R1: Detecting Topological Anomalies via Vision-Language Models 19

work page 2023
[59]

In: CVPR (2018)

Mosinska, A., Marquez-Neila, P., Koziński, M., Fua, P.: Beyond the pixel-wise loss for topology-aware delineation. In: CVPR (2018)

work page 2018
[60]

MedIA (2021)

Mou, L., Zhao, Y., Fu, H., Liu, Y., Cheng, J., Zheng, Y., Su, P., Yang, J., Chen, L., Frangi, A.F., et al.: Cs2-net: Deep learning segmentation of curvilinear structures in medical imaging. MedIA (2021)

work page 2021
[61]

OpenAI Technical Report (2023)

OpenAI: Gpt-4v(ision) system card. OpenAI Technical Report (2023)

work page 2023
[62]

In: NeurIPS (2022)

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow instructions with human feedback. In: NeurIPS (2022)

work page 2022
[63]

In: MICCAI (2025)

Pan, J., Liu, C., Wu, J., Liu, F., Zhu, J., Li, H.B., Chen, C., Ouyang, C., Rueck- ert, D.: Medvlm-r1: Incentivizing medical reasoning capability of vision-language models (vlms) via reinforcement learning. In: MICCAI (2025)

work page 2025
[64]

In: ICLR (2024)

Peng, Z., Wang, W., Dong, L., Hao, Y., Huang, S., Ma, S., Wei, F.: Kosmos-2: Grounding multimodal large language models to the world. In: ICLR (2024)

work page 2024
[65]

Qwen Team: Qwen3.5: Towards native multimodal agents (February 2026)

work page 2026
[66]

In: ICML (2021)

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: ICML (2021)

work page 2021
[67]

In: NeurIPS (2023)

Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C.D., Finn, C.: Direct preference optimization: Your language model is secretly a reward model. In: NeurIPS (2023)

work page 2023
[68]

In: KDD (2020)

Rasley, J., Rajbhandari, S., Ruwase, O., He, Y.: Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters. In: KDD (2020)

work page 2020
[69]

In: MICCAI (2015)

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed- ical image segmentation. In: MICCAI (2015)

work page 2015
[70]

In: NeurIPS (2025)

Sarch, G.H., Saha, S., Khandelwal, N., Jain, A., Tarr, M.J., Kumar, A., Fragki- adaki, K.: Grounded reinforcement learning for visual reasoning. In: NeurIPS (2025)

work page 2025
[71]

Proximal Policy Optimization Algorithms

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[72]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y., Wu, Y., et al.: Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[73]

arXiv preprint arXiv:2505.19094 (2025)

Shen, C., Wei, W., Qu, X., Cheng, Y.: Satori-r1: Incentivizing multimodal reasoning with spatial grounding and verifiable rewards. arXiv preprint arXiv:2505.19094 (2025)

work page arXiv 2025
[74]

VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model

Shen, H., Liu, P., Li, J., Fang, C., Ma, Y., Liao, J., Shen, Q., Zhang, Z., Zhao, K., Zhang, Q., et al.: Vlm-r1: A stable and generalizable r1-style large vision-language model. arXiv preprint arXiv:2504.07615 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[75]

HybridFlow: A Flexible and Efficient RLHF Framework

Sheng, G., Zhang, C., Ye, Z., Wu, X., Zhang, W., Zhang, R., Peng, Y., Lin, H., Wu, C.: Hybridflow: A flexible and efficient rlhf framework. arXiv preprint arXiv: 2409.19256 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[76]

In: CVPR (2021)

Shit, S., Paetzold, J.C., Sekuboyina, A., Ezhov, I., Unger, A., Zhylka, A., Pluim, J.P., Bauer, U., Menze, B.H.: cldice-a novel topology-preserving loss function for tubular structure segmentation. In: CVPR (2021)

work page 2021
[77]

OpenAI GPT-5 System Card

Singh, A.,Fry, A., Perelman, A., Tart,A., Ganesh, A., El-Kishky, A., McLaughlin, A., Low, A., Ostrow, A., Ananthram, A., et al.: Openai gpt-5 system card. arXiv preprint arXiv:2601.03267 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[78]

TMI (2004) 20 M

Staal,J.,Abràmoff,M.D.,Niemeijer,M.,Viergever,M.A.,Ginneken,B.V.:Ridge- based vessel segmentation in color images of the retina. TMI (2004) 20 M. Xu et al

work page 2004
[79]

arXiv preprint arXiv:2407.04683 (2024)

Stucki, N., Bürgin, V., Paetzold, J.C., Bauer, U.: Efficient betti matching en- ables topology-aware 3d segmentation via persistent homology. arXiv preprint arXiv:2407.04683 (2024)

work page arXiv 2024
[80]

In: ICML (2023)

Stucki, N., Paetzold, J.C., Shit, S., Menze, B., Bauer, U.: Topologically faith- ful image segmentation via induced matching of persistence barcodes. In: ICML (2023)

work page 2023

Showing first 80 references.