{"paper":{"title":"Topo-R1: Detecting Topological Anomalies via Vision-Language Models","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"Fine-tuning a vision-language model with a topology-aware composite reward lets it localize and classify connectivity anomalies in tubular segmentation masks.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Chao Chen, Dimitris Samaras, Kehan Qi, Meilong Xu, Qingqiao Hu, Shahira Abousamra, Weimin Lyu, Xiaoling Hu, Xin Yu","submitted_at":"2026-03-13T15:05:04Z","abstract_excerpt":"Topology is critical in tubular structures such as blood vessels, nerve fibers, and road networks, where connectivity and loop structure govern downstream functional analysis. Vision-Language Models (VLMs) are promising candidates for understanding such structures, given their reasoning and grounding capabilities. To probe their topological perception, we systematically evaluate leading closed- and open-source VLMs on localizing and classifying four canonical topological anomalies (broken/spurious connections, missing/extra branches) in tubular-network segmentation masks. They perform nearly a"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Extensive experiments show that Topo-R1 substantially outperforms general-purpose VLMs and matches or exceeds supervised baselines across ID, OOD, and real-segmentation-output protocols, establishing a strong foundation for VLM-based topological understanding of structured visual data.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The synthetic topological perturbations generated by the automated pipeline, annotated via Betti numbers, accurately capture the distribution and nature of topological anomalies present in real-world segmentation masks from medical and other domains.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Topo-R1 fine-tunes a vision-language model using a topology-aware reward and GRPO to detect anomalies such as broken or spurious connections in tubular segmentation masks, outperforming standard VLMs.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Fine-tuning a vision-language model with a topology-aware composite reward lets it localize and classify connectivity anomalies in tubular segmentation masks.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c36b9f8fb60f5b0a724ec75da6e2023f0e42ee18dc083578a42929d678866154"},"source":{"id":"2603.13054","kind":"arxiv","version":2},"verdict":{"id":"89629f05-4755-4128-baf7-f652326c6b13","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T11:37:47.218948Z","strongest_claim":"Extensive experiments show that Topo-R1 substantially outperforms general-purpose VLMs and matches or exceeds supervised baselines across ID, OOD, and real-segmentation-output protocols, establishing a strong foundation for VLM-based topological understanding of structured visual data.","one_line_summary":"Topo-R1 fine-tunes a vision-language model using a topology-aware reward and GRPO to detect anomalies such as broken or spurious connections in tubular segmentation masks, outperforming standard VLMs.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The synthetic topological perturbations generated by the automated pipeline, annotated via Betti numbers, accurately capture the distribution and nature of topological anomalies present in real-world segmentation masks from medical and other domains.","pith_extraction_headline":"Fine-tuning a vision-language model with a topology-aware composite reward lets it localize and classify connectivity anomalies in tubular segmentation masks."},"references":{"count":105,"sample":[{"doi":"","year":2022,"title":"In: NeurIPS (2022)","work_id":"62a88233-e86b-4a99-ba2c-62189f70a394","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"In: AISTATS (2024)","work_id":"c2329114-0876-46eb-bfc1-88bed9cd7bd2","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond","work_id":"cbc2bb21-b6bb-46c0-80bf-107e195ffe10","ref_index":3,"cited_arxiv_id":"2308.12966","is_internal_anchor":true},{"doi":"","year":2025,"title":"Qwen2.5-VL Technical Report","work_id":"69dffacb-bfe8-442d-be86-48624c60426f","ref_index":4,"cited_arxiv_id":"2502.13923","is_internal_anchor":true},{"doi":"","year":2025,"title":"In: NeurIPS Workshop on Space in Vision, Language, and Embodied AI (2025)","work_id":"59ee22c3-fe4c-4d6c-bb03-2b0191c9cd5f","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":105,"snapshot_sha256":"182c2220d493e36d0c9669d84ae38210f6600d577c2aa86a6e2954dcf48e48ba","internal_anchors":15},"formal_canon":{"evidence_count":2,"snapshot_sha256":"393ac8289564e9bd4021d36f53d56c11a638f00bf8c9fdb1d7523aebb6a0dd99"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}