CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves
Pith reviewed 2026-05-20 20:34 UTC · model grok-4.3
The pith
The strongest evaluated vision-language model recovers the correct containment tree for only 71.1% of CurveBench-Easy images and 19.1% of CurveBench-Hard images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CurveBench formulates exact topological reasoning as the recovery of a rooted tree that encodes the containment relations among regions defined by non-intersecting Jordan curves in an image, and demonstrates through model evaluations that current vision-language models do not yet possess reliable capability for this structured prediction task.
What carries the argument
The rooted containment tree induced by the curves, which represents the full hierarchy of nesting and serves as the exact target output for the visual reasoning task.
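To make the target object concrete, here is a minimal sketch of how a containment tree can be derived from curve geometry. It assumes each curve is available as a simple polygon (a list of vertices) and that curves never intersect; the function names and the parent-map representation are illustrative assumptions, not the paper's code. Because the curves are disjoint, testing a single vertex of one curve against another is enough to decide enclosure, and the direct parent is the smallest enclosing curve.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def area(poly: Polygon) -> float:
    """Absolute polygon area via the shoelace formula."""
    n = len(poly)
    twice = sum(
        poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def containment_tree(curves: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Map each curve to its direct parent; None stands for the unbounded outer region."""
    parent: Dict[str, Optional[str]] = {}
    for name, poly in curves.items():
        # Curves are assumed disjoint, so one vertex decides enclosure.
        enclosing = [
            other
            for other, q in curves.items()
            if other != name and point_in_polygon(poly[0], q)
        ]
        # Enclosing curves form a nested chain; the smallest one is the parent.
        parent[name] = min(enclosing, key=lambda o: area(curves[o]), default=None)
    return parent


def square(x0: float, y0: float, x1: float, y1: float) -> Polygon:
    """Hypothetical helper: axis-aligned square as a polygon."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```

For example, with an outer square `A`, a square `B` inside it, a square `C` inside `B`, and a square `D` inside `A` but beside `B`, the function returns `A` as the parent of `B` and `D`, and `B` as the parent of `C`. A vision model has to reach the same structure from pixels alone.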
If this is right
- Targeted training on containment tree prediction improves model performance on simpler instances of the task.
- Accuracy declines sharply when moving from easy polygonal configurations to dense or maze-like ones.
- The benchmark provides a quantitative measure for tracking progress in topology-aware visual reasoning.
- Remaining performance gaps indicate the need for advances in models' ability to handle exact nesting relations.
Where Pith is reading between the lines
- Similar benchmarks could be created for other topological properties such as connectivity or genus to broaden evaluation of spatial reasoning.
- Models might be improved by incorporating explicit geometric priors for curve nesting rather than relying solely on learned patterns.
- Applications in fields like map interpretation or circuit design could benefit from better performance on this type of reasoning.
- Failure modes on hard cases may reveal specific weaknesses in handling high numbers of nested elements.
Load-bearing premise
The supplied ground-truth trees correctly capture the containment relations without errors, and measuring tree generation accuracy truly reflects understanding of topology instead of reliance on superficial image features.
What would settle it
A vision model that produces the correct containment tree for a large fraction of CurveBench-Hard images, and keeps doing so on counterfactual images where the topology is altered but low-level visual statistics are preserved, would indicate genuine topological reasoning. A model that succeeds on the former but fails on the latter would indicate reliance on shortcuts.
Original abstract
We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen-3-VL-8B-Thinking from 2.8% to 33.3% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.
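The abstract does not spell out how tree-generation accuracy is scored. One plausible reading, offered here only as an assumption, is exact match up to reordering of siblings: a prediction counts as correct when it is isomorphic to the ground truth as an unordered rooted tree. The sketch below encodes a tree as nested lists (a node is the list of its children) and compares canonical strings; the names `canonical` and `tree_accuracy` are hypothetical.

```python
from typing import List, Sequence

Tree = List["Tree"]  # a node is the list of its children; [] is a leaf


def canonical(tree: Tree) -> str:
    """Canonical parenthesis string of an unordered rooted tree.

    Sorting the children's encodings makes the result independent of
    sibling order, so two isomorphic trees get the same string.
    """
    return "(" + "".join(sorted(canonical(child) for child in tree)) + ")"


def tree_accuracy(predictions: Sequence[Tree], golds: Sequence[Tree]) -> float:
    """Fraction of predictions that exactly match the gold tree (assumed metric)."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(canonical(p) == canonical(g) for p, g in zip(predictions, golds))
    return hits / len(golds)
```

Under this reading there is no partial credit: a tree with a single misplaced leaf scores the same as an unrelated tree, which is one reason accuracy can fall steeply as the number of curves grows.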
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CurveBench, a benchmark of 756 images depicting nested, non-intersecting Jordan curves in easy, polygonal, topographic, maze-like, and dense configurations. Each image is paired with a ground-truth rooted tree encoding containment relations among the induced planar regions. The central task is structured prediction: given an image, recover the exact containment tree. The authors report that Gemini 3.1 Pro reaches 71.1% tree-generation accuracy on the Easy subset and 19.1% on the Hard subset; they further show that RLVR-style fine-tuning lifts Qwen3-VL-8B from 2.8% to 33.3% on Easy, exceeding several closed models under their protocol. The work concludes that exact topology-aware visual reasoning remains far from solved.
Significance. If the supplied containment trees are verifiably error-free and the tree-generation metric isolates genuine topological understanding from low-level visual shortcuts or rendering artifacts, the benchmark would usefully quantify current VLM limitations on hierarchical spatial reasoning and supply a concrete target for future progress. The fine-tuning result demonstrates that modest gains are achievable with targeted training, lending the benchmark immediate utility for model development.
major comments (2)
- [Abstract and Dataset Construction] The headline performance numbers (71.1% Easy, 19.1% Hard) and the claim that exact topological reasoning remains unsolved rest on the assumption that the 756 ground-truth containment trees are free of annotation errors. The manuscript supplies no inter-annotator agreement statistics, no automated consistency checks (e.g., verification that containment is transitive and that every curve appears exactly once), and no release of raw SVG coordinates that would permit external re-derivation of the trees. Even a modest fraction of inverted parents or missing leaves on the Hard subset would render the 19.1% figure uninterpretable as a pure measure of model capability.
- [Experiments and Evaluation] The evaluation protocol does not report controls for visual shortcuts (e.g., curve thickness, fill color, or counting heuristics) that could allow models to achieve non-zero accuracy without recovering the true containment topology. Because the central claim concerns exact topological reasoning rather than pattern matching, the absence of such controls weakens the interpretation of the reported gaps.
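The automated consistency checks requested in the first major comment can be sketched as follows. This is an illustrative check under an assumed label format (a map from each curve id to its parent id, with `None` for the outer region), not the benchmark's actual tooling. It verifies that every curve appears exactly once, that every referenced parent exists, and that following parents always terminates at the outer region.

```python
from typing import Dict, List, Optional


def check_containment_labels(
    curve_ids: List[str], parent: Dict[str, Optional[str]]
) -> List[str]:
    """Return a sorted list of problems; an empty list means the labels are consistent."""
    problems = set()
    if len(curve_ids) != len(set(curve_ids)):
        problems.add("duplicate curve id")
    for cid in set(curve_ids) - set(parent):
        problems.add(f"curve {cid} has no parent entry")
    for cid in set(parent) - set(curve_ids):
        problems.add(f"parent entry for unknown curve {cid}")
    # Walk up from every curve; the walk must end at None (the outer region).
    for start in parent:
        seen = set()
        cur: Optional[str] = start
        while cur is not None:
            if cur in seen:
                problems.add(f"cycle reachable from {start}")
                break
            if cur not in parent:
                problems.add(f"undefined parent {cur}")
                break
            seen.add(cur)
            cur = parent[cur]
    return sorted(problems)
```

With a parent map, "X contains Y" means X is an ancestor of Y, so transitivity holds automatically once the map is acyclic. Checking the labels against the geometry itself would additionally require the raw curve coordinates that the referee asks the authors to release.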
minor comments (1)
- [Abstract] The abstract states that curves are 'pairwise non-intersecting' yet the Hard subset includes 'dense counting configurations'; a brief clarification of how non-intersection is enforced in the densest cases would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. The comments identify important areas for strengthening claims about dataset reliability and isolating topological reasoning. We respond to each major comment below and will revise the manuscript to address them.
Point-by-point responses
- Referee: [Abstract and Dataset Construction] The headline performance numbers (71.1% Easy, 19.1% Hard) and the claim that exact topological reasoning remains unsolved rest on the assumption that the 756 ground-truth containment trees are free of annotation errors. The manuscript supplies no inter-annotator agreement statistics, no automated consistency checks (e.g., verification that containment is transitive and that every curve appears exactly once), and no release of raw SVG coordinates that would permit external re-derivation of the trees. Even a modest fraction of inverted parents or missing leaves on the Hard subset would render the 19.1% figure uninterpretable as a pure measure of model capability.
  Authors: CurveBench is generated via a fully procedural pipeline in which a target containment tree is first sampled and then realized as non-intersecting Jordan curves; the ground-truth tree is therefore exact by construction rather than the product of manual annotation. We will add an expanded Dataset Construction section that documents the generation algorithm, the automated verification steps (transitivity, uniqueness, and planarity), and the release of raw SVG files with the benchmark to enable external re-derivation.
  Revision: yes
- Referee: [Experiments and Evaluation] The evaluation protocol does not report controls for visual shortcuts (e.g., curve thickness, fill color, or counting heuristics) that could allow models to achieve non-zero accuracy without recovering the true containment topology. Because the central claim concerns exact topological reasoning rather than pattern matching, the absence of such controls weakens the interpretation of the reported gaps.
  Authors: We agree that additional controls would strengthen the interpretation. In the revised manuscript we will include a new ablation subsection that reports model performance under randomized curve thickness and fill colors, as well as on specially constructed subsets that neutralize simple counting or boundary-length heuristics. These results will be used to argue that the observed gaps reflect limitations in hierarchical topological reasoning.
  Revision: yes
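The first response describes sampling a target tree and then realizing it as non-intersecting curves. The page gives no details of that pipeline, so the following is only a toy sketch of the idea under loud assumptions: the tree is a random recursive tree, every curve is a circle, and siblings are laid out side by side along the parent's horizontal diameter. All names and layout constants are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def sample_parent_map(n: int, rng: random.Random) -> Dict[int, int]:
    """Random recursive tree: curve i picks a parent in 0..i-1; 0 is the outer region."""
    return {i: rng.randrange(i) for i in range(1, n + 1)}


def realize_as_circles(parent: Dict[int, int]) -> Dict[int, Circle]:
    """Lay out each curve as a circle strictly inside its parent's circle."""
    children: Dict[int, List[int]] = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)
    circles: Dict[int, Circle] = {}

    def place(node: int, cx: float, cy: float, r: float) -> None:
        kids = children.get(node, [])
        if not kids:
            return
        slot = 1.8 * r / len(kids)  # slots tile 90% of the parent's diameter
        for i, kid in enumerate(kids):
            kx = cx - 0.9 * r + slot * (i + 0.5)
            kr = 0.45 * slot  # 90% of half a slot, so neighbours never touch
            circles[kid] = (kx, cy, kr)
            place(kid, kx, cy, kr)

    place(0, 0.0, 0.0, 1.0)  # node 0 is a virtual frame and is never drawn
    return circles


def strictly_inside(inner: Circle, outer: Circle) -> bool:
    """True when the inner circle lies entirely within the outer one."""
    return math.hypot(inner[0] - outer[0], inner[1] - outer[1]) + inner[2] < outer[2]
```

Because the layout is derived from the sampled tree, the label is correct by construction, which is the property the authors claim. A real pipeline would also need to deform the circles into the polygonal, topographic, and maze-like shapes the benchmark uses while preserving disjointness.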
Circularity Check
No circularity: empirical benchmark with no derivations or self-referential predictions
full rationale
This paper presents CurveBench as an empirical dataset and evaluation benchmark for visual topological reasoning, consisting of 756 images paired with ground-truth containment trees. No derivation chain, first-principles result, or prediction is claimed; performance numbers (e.g., 71.1% on Easy, 19.1% on Hard) are direct empirical measurements against the supplied ground-truth trees rather than outputs of any fitted model or self-referential equation. The fine-tuning experiment is likewise a standard RLVR-style training run whose results are reported as observed improvements, not as predictions derived from the benchmark itself. Because the work contains neither mathematical derivations nor load-bearing self-citations that reduce to the paper's own inputs, the circularity score is 0.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves... recover the full rooted containment tree"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.