GeoSym127K: Scalable Symbolically-verifiable Synthesis for Multimodal Geometric Reasoning
Pith reviewed 2026-05-20 22:39 UTC · model grok-4.3
The pith
A neuro-symbolic engine generates 127K symbolically verified geometric questions and diagrams to train multimodal models on precise diagram reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose the GeoSym Engine, an automated and scalable neuro-symbolic framework that leverages a type-conditional grammar and an analytic SymGT Solver to derive exact symbolic ground truths and seamlessly integrates with a robust rendering pipeline to produce high-precision geometric diagrams. Using this engine we construct GeoSym127K, a difficulty-stratified dataset featuring 51K high-resolution images, 127K questions with symbolic ground truths, and 55K answer-verified CoT QA pairs, and demonstrate through supervised fine-tuning that the data drives concentrated improvements on diagram-dependent and multi-step geometry tasks.
What carries the argument
The GeoSym Engine, which uses a type-conditional grammar to generate geometric problem instances and an analytic SymGT Solver to compute exact symbolic ground truths together with verified Chain-of-Thought sequences.
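The quoted passage from the paper further down this page mentions a "Generalized Symbolic Shoelace Algorithm" and an exact-match check of the form Simplify(A_pred − A_GT) ≡ 0. The sketch below illustrates that idea only; it is not the paper's SymGT Solver. It assumes rational coordinates and uses Python's `fractions.Fraction` as a stand-in for a full symbolic algebra system, so it cannot represent radicals or trigonometric values. The names `shoelace_area` and `exact_match` are hypothetical.

```python
from fractions import Fraction

def shoelace_area(vertices):
    """Exact area of a simple polygon given as an ordered list of (x, y) vertices."""
    twice_area = Fraction(0)
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += Fraction(x1) * Fraction(y2) - Fraction(x2) * Fraction(y1)
    return abs(twice_area) / 2

def exact_match(predicted, ground_truth):
    """Rational stand-in for Simplify(A_pred - A_GT) == 0: no tolerance, no rounding."""
    return Fraction(predicted) - Fraction(ground_truth) == 0

# Hypothetical trapezoid with parallel sides 6 and 4 and height 3.
trapezoid = [(0, 0), (6, 0), (5, 3), (1, 3)]
area_gt = shoelace_area(trapezoid)

print(area_gt)                        # exact ground-truth area as a Fraction
print(exact_match("15", area_gt))     # a correct answer passes
print(exact_match("14.99", area_gt))  # a near miss is rejected
```

Because the comparison is exact rather than tolerance-based, a numerically close but wrong answer fails, which is the property a deterministic reward signal relies on.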
If this is right
- Supervised fine-tuning of Qwen3-VL-8B on the dataset produces absolute gains of 22.21 percentage points on the MathVerse Vision-Only subset and 6.19 percentage points on WeMath (reaching 61.52%).
- The improvements concentrate on diagram-dependent and multi-step geometry tasks while reducing long-horizon logic fragmentation.
- Initializing reinforcement learning with verifiable rewards from the structural SFT checkpoints raises the performance ceiling relative to zero-shot RL.
- The approach yields verifiable exact-match signals that support robust scaling of reasoning synthesis.
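To make the RLVR point concrete, here is a minimal sketch of how binary exact-match rewards can be turned into group-relative advantages in the style of GRPO. It is an illustration under stated assumptions, not the paper's training code: the string-equality reward is a simplification of a symbolic check, the population standard deviation is one possible normalization choice, and the names `exact_match_reward` and `group_relative_advantages` and the sample answers are hypothetical.

```python
from statistics import mean, pstdev

def exact_match_reward(answer, ground_truth):
    """Binary verifiable reward: 1.0 only if the answer matches exactly."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples drawn for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four sampled answers to one question.
ground_truth = "6.5"
sampled_answers = ["6.5", "7.0", "6.5 ", "8.5"]
rewards = [exact_match_reward(a, ground_truth) for a in sampled_answers]
advantages = group_relative_advantages(rewards)

print(rewards)
print(advantages)
# A group with identical rewards gives zero advantages, so it carries no signal.
print(group_relative_advantages([0.0, 0.0, 0.0, 0.0]))
```

The last line shows a design consequence of this normalization: when every sample in a group is wrong, all advantages are zero. That is one plausible reading, not confirmed by the source, of why starting from an SFT checkpoint that already solves some problems could raise the ceiling.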
Where Pith is reading between the lines
- The same grammar-plus-solver pattern could be applied to other visual reasoning domains such as physics diagrams or algebraic geometry where exact symbolic answers are computable.
- Larger datasets produced by the same engine might further close the gap between open and closed models on complex diagram tasks.
- The verifiable reward structure could be reused for online data filtering or active learning loops that prioritize hard multi-step examples.
Load-bearing premise
The generated symbolic ground truths and CoT pairs contain no systematic errors from the grammar or solver and transfer to real-world diagrams without distribution shift or overfitting to the synthetic rendering style.
What would settle it
A test set of real photographed or hand-drawn geometry diagrams, paired with the same questions, on which the fine-tuned model shows no accuracy gain over the baseline or exhibits new systematic errors.
Original abstract
Large Multimodal Models (LMMs) often struggle with geometric reasoning due to visual hallucinations and a lack of mathematically precise Chain-of-Thought (CoT) data. To address this, we propose the GeoSym Engine, an automated and scalable neuro-symbolic framework. By leveraging a type-conditional grammar and an analytic SymGT Solver, it derives exact symbolic ground truths and seamlessly integrates with a robust rendering pipeline to produce high-precision geometric diagrams. Using this engine, we construct GeoSym127K, a difficulty-stratified dataset featuring 51K high-resolution images, 127K questions with symbolic ground truths, and 55K answer-verified CoT QA pairs. We also introduce GeoSym-Bench, an expert-curated suite of 511 complex samples for rigorous evaluation. Through extensive supervised fine-tuning (SFT), we demonstrate that GeoSym drives concentrated improvements specifically on diagram-dependent and multi-step geometry tasks. Our Qwen3-VL-8B model gains an absolute +22.21% on the MathVerse Vision-Only subset and reaches 61.52% (+6.19% improvement) on WeMath, mitigating long-horizon logic fragmentation and outperforming advanced closed-source models like Doubao-1.8. Furthermore, applying Reinforcement Learning with Verifiable Rewards (RLVR) via GRPO reveals that initializing from structural SFT checkpoints substantially elevates the performance ceiling over zero-shot RL. Driven by deterministic exact-match signals, this showcases the robust scaling potential of our verifiable reasoning synthesis. Datasets and code are available at https://huggingface.co/datasets/Tomie0506/GeoSym127K and https://github.com/Tomie56/GeoSym127K.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the GeoSym Engine, a neuro-symbolic framework leveraging a type-conditional grammar and analytic SymGT Solver to generate the GeoSym127K dataset (51K high-resolution images, 127K questions with symbolic ground truths, 55K answer-verified CoT pairs) along with the expert-curated GeoSym-Bench (511 samples). Experiments demonstrate that SFT on this data yields +22.21% absolute gain on MathVerse Vision-Only for Qwen3-VL-8B and +6.19% on WeMath, with further gains from GRPO-based RLVR initialized from SFT checkpoints; code and data are released.
Significance. If the symbolic ground truths are verifiably correct and transfer without substantial distribution shift, the work provides a scalable, reproducible pipeline for creating mathematically precise training data that targets visual hallucinations and long-horizon reasoning failures in LMMs. The public release of datasets and code is a clear strength supporting reproducibility and follow-on research. The reported concentration of gains on diagram-dependent tasks is consistent with the motivating hypothesis.
major comments (2)
- [§3.2] §3.2 (SymGT Solver description): No error rates, sample-wise human verification, or cross-validation against independent geometry libraries (e.g., SymPy or GeoGebra) are reported for the analytic solver outputs across the 127K questions. Because the central performance attribution rests on the claim of 'exact symbolic ground truths' and 'answer-verified CoT pairs,' even modest systematic solver failures would mean SFT reinforces incorrect reasoning rather than mitigating fragmentation.
- [§5] §5 (Experimental results): The reported benchmark improvements lack details on train/validation/test splits of GeoSym127K, statistical significance testing (e.g., multiple random seeds or confidence intervals), or controls for synthetic-style artifacts versus real diagram distribution shift. These omissions make it difficult to confirm that the +22.21% and +6.19% gains are robust and attributable to the symbolic supervision rather than dataset-specific overfitting.
minor comments (2)
- [Abstract] Abstract and §1: The phrase 'long-horizon logic fragmentation' is used without a precise definition or citation; a short formalization would improve clarity for readers outside the immediate subfield.
- [Figure 1] Figure 1 and rendering pipeline description: The caption and text could more explicitly state the resolution and anti-aliasing settings used for the 51K images to allow exact reproduction.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on verification of the SymGT Solver and robustness of the reported gains. We respond to each major comment below and will incorporate revisions to address the concerns.
- Referee: [§3.2] §3.2 (SymGT Solver description): No error rates, sample-wise human verification, or cross-validation against independent geometry libraries (e.g., SymPy or GeoGebra) are reported for the analytic solver outputs across the 127K questions. Because the central performance attribution rests on the claim of 'exact symbolic ground truths' and 'answer-verified CoT pairs,' even modest systematic solver failures would mean SFT reinforces incorrect reasoning rather than mitigating fragmentation.
  Authors: The SymGT Solver relies on deterministic analytic procedures grounded in Euclidean geometry axioms and exact symbolic algebra, ensuring correctness by construction for all problems generated from the type-conditional grammar. We acknowledge that the original manuscript did not include quantitative error analysis or external cross-validation. In the revision we will add a dedicated validation subsection reporting results on a 1,000-question random sample: (i) expert manual verification of 200 cases, (ii) consistency checks against SymPy for all algebraic sub-expressions, and (iii) a statement of the observed zero-error rate on the sampled set. This directly mitigates the risk of reinforcing incorrect reasoning.
  Revision: yes
- Referee: [§5] §5 (Experimental results): The reported benchmark improvements lack details on train/validation/test splits of GeoSym127K, statistical significance testing (e.g., multiple random seeds or confidence intervals), or controls for synthetic-style artifacts versus real diagram distribution shift. These omissions make it difficult to confirm that the +22.21% and +6.19% gains are robust and attributable to the symbolic supervision rather than dataset-specific overfitting.
  Authors: GeoSym127K was used in its entirety for SFT; no internal train/validation split was held out because evaluation targeted generalization to the fixed external benchmarks MathVerse and WeMath. We agree that statistical significance and distribution-shift controls strengthen the claims. In the revision we will (i) rerun SFT with three random seeds and report mean ± std, (ii) add a short analysis showing that gains remain concentrated on diagram-dependent subsets even after controlling for question length, and (iii) include a qualitative comparison of model behavior on synthetic versus real diagrams from the benchmarks. These additions will clarify that the improvements arise from the symbolic supervision rather than overfitting to synthetic artifacts.
  Revision: yes
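The "mean ± std over three seeds" report promised in the rebuttal could be computed along the following lines. This is a sketch only: the scores are placeholder values chosen for easy arithmetic and are not results from the paper, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

def summarize(scores):
    """Return (mean, sample standard deviation) over per-seed benchmark scores."""
    return mean(scores), stdev(scores)

# PLACEHOLDER per-seed accuracies (percent); not taken from the paper.
placeholder_scores = {
    "baseline": [50.0, 51.0, 49.0],
    "sft": [60.0, 62.0, 61.0],
}

summary = {name: summarize(scores) for name, scores in placeholder_scores.items()}
for name, (mu, sd) in summary.items():
    print(f"{name}: {mu:.2f} ± {sd:.2f}")

# The gain is read as robust only if it is large relative to the seed-to-seed spread.
gain = summary["sft"][0] - summary["baseline"][0]
print(f"gain: {gain:.2f} points")
```

With only three seeds the standard deviation is itself a rough estimate, so a report of this kind supports a sanity check on stability rather than a formal significance claim.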
Circularity Check
No significant circularity; pipeline uses independent external benchmarks
The derivation chain consists of a type-conditional grammar and SymGT Solver producing symbolic ground truths, followed by dataset construction and SFT/RLVR training, with gains reported on external benchmarks (MathVerse Vision-Only, WeMath). These benchmarks are independent of the synthetic generation process and not fitted or redefined within the paper. No equations, self-citations, or ansatzes reduce the central claims to inputs by construction. The approach is self-contained against external evaluation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The type-conditional grammar and analytic SymGT Solver produce exact symbolic ground truths without systematic errors.
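The audit proposed in the rebuttal (zero observed errors on a random sample) bears on this assumption, but a zero-error sample does not by itself imply a zero error rate. The sketch below computes the one-sided upper confidence bound on the true error rate when no errors are found in n samples, by solving (1 - p)^n = 1 - confidence for p. It assumes independent, uniformly random sampling; the function name `zero_error_upper_bound` is hypothetical.

```python
def zero_error_upper_bound(n_samples, confidence=0.95):
    """Upper bound on the error rate p given 0 errors in n_samples.

    Solves (1 - p) ** n_samples = 1 - confidence for p.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_samples)

# Sample sizes mentioned in the rebuttal: 1,000 sampled questions, 200 checked by experts.
bound_1000 = zero_error_upper_bound(1000)
bound_200 = zero_error_upper_bound(200)

print(f"n=1000: error rate below {bound_1000:.4%} at 95% confidence")
print(f"n=200:  error rate below {bound_200:.4%} at 95% confidence")
```

For 1,000 samples the bound is roughly 0.3%, in line with the familiar "rule of three" approximation 3/n. Applied to 127K questions, that still leaves room for a few hundred undetected errors, so an audit of this size limits the error rate without establishing correctness by construction.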
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We define the geometric environment as an arbitrary-precision state space $G = \langle P, E, \Phi, L, T \rangle$. ... Type-Conditional Topological Evolution ... Generalized Symbolic Shoelace Algorithm ... $\mathrm{Simplify}(A_{\mathrm{pred}} - A_{\mathrm{GT}}) \equiv 0$
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: analytic SymGT Solver ... deterministic exact-match signals
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.