arxiv: 2605.14068 · v2 · pith:U2GPRN34new · submitted 2026-05-13 · 💻 cs.CV · cs.LG

CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

Pith reviewed 2026-05-20 20:34 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords Jordan curves · containment hierarchy · topological reasoning · vision language models · benchmark dataset · structured prediction · nested structures · visual topology

The pith

The strongest evaluated vision-language model recovers the exact containment tree for only 71.1 percent of CurveBench-Easy images and 19.1 percent of CurveBench-Hard images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CurveBench, a benchmark that tests how well AI systems recover hierarchical containment from images of simple closed curves. It supplies 756 images across several visual styles, each paired with a rooted tree recording which regions lie inside which curves. The strongest evaluated model, Gemini 3.1 Pro, recovers the exact tree for 71.1 percent of the Easy images and only 19.1 percent of the Hard ones. RLVR-style fine-tuning lifts an open-weight Qwen3-VL-8B model from 2.8 to 33.3 percent on Easy, yet a large gap remains on the harder cases. This matters because tasks such as interpreting diagrams or analyzing spatial arrangements depend on getting nesting right, and a single containment mistake can lead to a wrong conclusion.

Core claim

CurveBench formulates exact topological reasoning as the recovery of a rooted tree that encodes the containment relations among regions defined by non-intersecting Jordan curves in an image, and demonstrates through model evaluations that current vision-language models do not yet possess reliable capability for this structured prediction task.

What carries the argument

The rooted containment tree induced by the curves, which represents the full hierarchy of nesting and serves as the exact target output for the visual reasoning task.
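
To make the target concrete, here is a minimal Python sketch of how a containment tree could be encoded and compared. The child-list encoding, the use of node 0 for the unbounded outer region, and the names `canonical_form` and `exact_match_accuracy` are assumptions made for illustration; the paper's actual output format and scoring code are not described on this page. The sketch treats two trees as equal when they match up to node relabeling and sibling order, which is one plausible reading of "exact" recovery.

```python
from typing import Dict, List

# Hypothetical encoding: each node maps to the list of its children.
# Node 0 stands for the unbounded outer region (the root).
Tree = Dict[int, List[int]]


def canonical_form(tree: Tree, node: int = 0) -> str:
    """Return a string that is identical for trees equal up to labels and sibling order."""
    child_forms = sorted(canonical_form(tree, c) for c in tree.get(node, []))
    return "(" + "".join(child_forms) + ")"


def exact_match_accuracy(predictions: List[Tree], targets: List[Tree]) -> float:
    """Fraction of images whose predicted tree matches the target exactly."""
    hits = sum(canonical_form(p) == canonical_form(t)
               for p, t in zip(predictions, targets))
    return hits / len(targets)


# Two top-level curves, one of which contains a third curve.
target = {0: [1, 2], 1: [3]}
same_shape = {0: [5, 7], 7: [9]}   # different labels and sibling order
flattened = {0: [1, 2, 3]}         # three curves, none nested

print(canonical_form(target))                                          # ((())())
print(exact_match_accuracy([same_shape, flattened], [target, target]))  # 0.5
```

Under this all-or-nothing comparison a single misplaced curve makes the whole prediction count as wrong, which helps explain why accuracy can fall quickly as the number of nested curves grows.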

If this is right

  • Targeted training on containment tree prediction improves model performance on simpler instances of the task.
  • Accuracy declines sharply when moving from easy polygonal configurations to dense or maze-like ones.
  • The benchmark provides a quantitative measure for tracking progress in topology-aware visual reasoning.
  • Remaining performance gaps indicate the need for advances in models' ability to handle exact nesting relations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar benchmarks could be created for other topological properties such as connectivity or genus to broaden evaluation of spatial reasoning.
  • Models might be improved by incorporating explicit geometric priors for curve nesting rather than relying solely on learned patterns.
  • Applications in fields like map interpretation or circuit design could benefit from better performance on this type of reasoning.
  • Failure modes on hard cases may reveal specific weaknesses in handling high numbers of nested elements.

Load-bearing premise

The supplied ground-truth trees correctly capture the containment relations without errors, and measuring tree generation accuracy truly reflects understanding of topology instead of reliance on superficial image features.

What would settle it

A control set in which the topology is altered while low-level visual statistics are preserved would settle the question. A model that produces the correct containment tree for a large fraction of CurveBench-Hard images but fails on the altered images would be relying on shortcuts; a model that stays accurate on both would be showing genuine topological reasoning.

Figures

Figures reproduced from arXiv: 2605.14068 by Amirreza Mohseni, Mona Mohammadi, Morteza Saghafian, Naser Talebizadeh Sardari.

Figure 1. Examples of images consisting of disjoint Jordan curves. Understanding the nesting …
Figure 2. Representative examples from each category within the CurveBench dataset.
Figure 3. Tree-reward learning dynamics for trained models. Left: training set. Right: eval set.
Figure 4. CurveBench Dataset: Hierarchical Distribution.
Figure 5. Per-category success rates.
Figure 6. Stacked reward breakdown for CurveBench-Hard. Darkest, medium, and lightest color …
Original abstract

We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen-3-VL-8B-Thinking from 2.8% to 33.3% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces CurveBench, a benchmark of 756 images depicting nested, non-intersecting Jordan curves in easy, polygonal, topographic, maze-like, and dense configurations. Each image is paired with a ground-truth rooted tree encoding containment relations among the induced planar regions. The central task is structured prediction: given an image, recover the exact containment tree. The authors report that Gemini 3.1 Pro reaches 71.1% tree-generation accuracy on the Easy subset and 19.1% on the Hard subset; they further show that RLVR-style fine-tuning lifts Qwen3-VL-8B from 2.8% to 33.3% on Easy, exceeding several closed models under their protocol. The work concludes that exact topology-aware visual reasoning remains far from solved.

Significance. If the supplied containment trees are verifiably error-free and the tree-generation metric isolates genuine topological understanding from low-level visual shortcuts or rendering artifacts, the benchmark would usefully quantify current VLM limitations on hierarchical spatial reasoning and supply a concrete target for future progress. The fine-tuning result demonstrates that modest gains are achievable with targeted training, lending the benchmark immediate utility for model development.

major comments (2)
  1. [Abstract and Dataset Construction] The headline performance numbers (71.1% Easy, 19.1% Hard) and the claim that exact topological reasoning remains unsolved rest on the assumption that the 756 ground-truth containment trees are free of annotation errors. The manuscript supplies no inter-annotator agreement statistics, no automated consistency checks (e.g., verification that containment is transitive and that every curve appears exactly once), and no release of raw SVG coordinates that would permit external re-derivation of the trees. Even a modest fraction of inverted parents or missing leaves on the Hard subset would render the 19.1% figure uninterpretable as a pure measure of model capability.
  2. [Experiments and Evaluation] The evaluation protocol does not report controls for visual shortcuts (e.g., curve thickness, fill color, or counting heuristics) that could allow models to achieve non-zero accuracy without recovering the true containment topology. Because the central claim concerns exact topological reasoning rather than pattern matching, the absence of such controls weakens the interpretation of the reported gaps.
minor comments (1)
  1. [Abstract] The abstract states that curves are 'pairwise non-intersecting' yet the Hard subset includes 'dense counting configurations'; a brief clarification of how non-intersection is enforced in the densest cases would aid reproducibility.
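
The automated consistency checks requested in the first major comment can be sketched briefly. The snippet below is illustrative only: the parent-map annotation format and the function name `check_annotation` are hypothetical, not taken from the paper. It verifies that every curve appears exactly once, that every parent is a known curve, and that parent links never form a cycle. When the annotation is a valid forest of this kind, the "is inside" relation obtained by following parent links is transitive by construction, so acyclicity is the property that needs an explicit test.

```python
from typing import Dict, List, Optional

# Hypothetical annotation format: curve id -> id of the curve that directly
# encloses it, or None when the curve lies directly in the outer region.
ParentMap = Dict[int, Optional[int]]


def check_annotation(parents: ParentMap, curve_ids: List[int]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # Every curve in the image must appear exactly once.
    if len(curve_ids) != len(set(curve_ids)):
        problems.append("duplicate curve ids in image")
    missing = sorted(set(curve_ids) - set(parents))
    extra = sorted(set(parents) - set(curve_ids))
    if missing:
        problems.append(f"curves without a parent entry: {missing}")
    if extra:
        problems.append(f"entries for unknown curves: {extra}")
    # Every parent must itself be an annotated curve.
    for child, parent in parents.items():
        if parent is not None and parent not in parents:
            problems.append(f"curve {child} has unknown parent {parent}")
    # Following parent links must reach the outer region without a repeat.
    for start in parents:
        seen = set()
        node: Optional[int] = start
        while node is not None and node in parents:
            if node in seen:
                problems.append(f"cycle reachable from curve {start}")
                break
            seen.add(node)
            node = parents[node]
    return problems


good = {1: None, 2: 1, 3: 2}
bad = {1: 2, 2: 1, 3: None}          # curves 1 and 2 enclose each other
print(check_annotation(good, [1, 2, 3]))       # []
print(check_annotation(bad, [1, 2, 3, 4]))
```

A check like this catches structural errors such as mutual enclosure or a missing curve, but it cannot tell whether a structurally valid tree matches the image; that requires re-deriving containment from the curve geometry.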

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. The comments identify important areas for strengthening claims about dataset reliability and isolating topological reasoning. We respond to each major comment below and will revise the manuscript to address them.

  1. Referee: [Abstract and Dataset Construction] The headline performance numbers (71.1% Easy, 19.1% Hard) and the claim that exact topological reasoning remains unsolved rest on the assumption that the 756 ground-truth containment trees are free of annotation errors. The manuscript supplies no inter-annotator agreement statistics, no automated consistency checks (e.g., verification that containment is transitive and that every curve appears exactly once), and no release of raw SVG coordinates that would permit external re-derivation of the trees. Even a modest fraction of inverted parents or missing leaves on the Hard subset would render the 19.1% figure uninterpretable as a pure measure of model capability.

    Authors: CurveBench is generated via a fully procedural pipeline in which a target containment tree is first sampled and then realized as non-intersecting Jordan curves; the ground-truth tree is therefore exact by construction rather than the product of manual annotation. We will add an expanded Dataset Construction section that documents the generation algorithm, the automated verification steps (transitivity, uniqueness, and planarity), and the release of raw SVG files with the benchmark to enable external re-derivation. revision: yes

  2. Referee: [Experiments and Evaluation] The evaluation protocol does not report controls for visual shortcuts (e.g., curve thickness, fill color, or counting heuristics) that could allow models to achieve non-zero accuracy without recovering the true containment topology. Because the central claim concerns exact topological reasoning rather than pattern matching, the absence of such controls weakens the interpretation of the reported gaps.

    Authors: We agree that additional controls would strengthen the interpretation. In the revised manuscript we will include a new ablation subsection that reports model performance under randomized curve thickness and fill colors, as well as on specially constructed subsets that neutralize simple counting or boundary-length heuristics. These results will be used to argue that the observed gaps reflect limitations in hierarchical topological reasoning. revision: yes
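
The external re-derivation mentioned in the first response could look roughly like the following sketch. It assumes each curve is available as a polygon, for example sampled from an SVG path; the input format and the function names are hypothetical, and this is not the authors' pipeline. Because the curves are pairwise non-intersecting, testing one vertex of a curve against another curve decides containment, and the direct parent is the enclosing curve that is itself nested most deeply.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def containment_parents(curves: Dict[int, Polygon]) -> Dict[int, Optional[int]]:
    """Map each curve id to its direct encloser, or None for the outer region."""
    # Curves do not intersect, so a single vertex decides containment.
    enclosers = {
        a: [b for b in curves
            if b != a and point_in_polygon(curves[a][0], curves[b])]
        for a in curves
    }
    parents: Dict[int, Optional[int]] = {}
    for a, outer in enclosers.items():
        # The direct parent is the encloser with the most enclosers of its own.
        parents[a] = max(outer, key=lambda b: len(enclosers[b])) if outer else None
    return parents


def square(cx: float, cy: float, half: float) -> Polygon:
    """Axis-aligned square used as a stand-in for a closed curve."""
    return [(cx - half, cy - half), (cx + half, cy - half),
            (cx + half, cy + half), (cx - half, cy + half)]


# Three concentric squares plus one separate square.
curves = {1: square(0, 0, 10), 2: square(0, 0, 5),
          3: square(0, 0, 1), 4: square(30, 0, 2)}
print(containment_parents(curves))  # {1: None, 2: 1, 3: 2, 4: None}
```

Comparing the output of such a routine with the released trees would give an independent check on the ground truth, provided the raw coordinates are published as the response promises.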

Circularity Check

0 steps flagged

No circularity: empirical benchmark with no derivations or self-referential predictions

This paper presents CurveBench as an empirical dataset and evaluation benchmark for visual topological reasoning, consisting of 756 images paired with annotated containment trees. No derivation chain, first-principles result, or prediction is claimed; performance numbers (e.g., 71.1% on Easy, 19.1% on Hard) are direct empirical measurements against the supplied ground-truth trees rather than outputs of any fitted model or self-referential equation. The fine-tuning experiment is likewise a standard RLVR-style training run whose results are reported as observed improvements, not as predictions derived from the benchmark itself. Because the work contains neither mathematical derivations nor load-bearing self-citations that reduce to the paper's own inputs, the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no free parameters, mathematical axioms, or invented entities; it relies on the standard topological definition of Jordan curves and containment relations.

pith-pipeline@v0.9.0 · 5777 in / 1167 out tokens · 155537 ms · 2026-05-20T20:34:49.049782+00:00 · methodology
