Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving
Pith reviewed 2026-05-16 10:28 UTC · model grok-4.3 · Recognition: 2 theorem links
The pith
Translating diagrams into concise formal text lets standard LLMs solve plane geometry problems with far less training data than end-to-end multimodal models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An MLLM Interpreter fine-tuned via CoT-augmented supervised learning followed by GRPO using CDL matching rewards converts geometric diagrams into Conditional Declaration Language descriptions; an off-the-shelf LLM then solves the problem from those descriptions and the given text, achieving favorable results against leading open- and closed-source MLLMs on Formalgeo7k-Rec-CoT, Unigeo, and MathVista after training on only 5.5k samples.
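As a reading aid for the optimizer named in the claim: GRPO replaces a learned value baseline with a group-relative one, normalizing each rollout's reward by its sampling group's statistics. The sketch below shows only that normalization in the standard form from the DeepSeekMath line of work; the reward values are illustrative, not taken from the paper.

```python
import statistics

def group_relative_advantages(rewards):
    # Normalize each rollout's reward by its sampling group's mean and
    # standard deviation -- the group-relative baseline GRPO uses in
    # place of a learned critic.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # zero-variance group -> zero advantages
    return [(r - mean) / std for r in rewards]

# Four interpreter rollouts scored by a CDL-matching reward (illustrative values):
adv = group_relative_advantages([0.2, 0.8, 0.8, 0.2])
```

Rollouts scoring above the group mean get positive advantages and are reinforced; below-mean rollouts are suppressed.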
What carries the argument
Conditional Declaration Language (CDL) acts as the concise textual bridge that encodes diagram geometry so the downstream LLM can reason without seeing the image.
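The two-stage split can be sketched in a few lines. Everything here is a hypothetical stand-in (the stub `fake_interpreter`, `fake_llm`, and the prompt template are not from the paper); the point is only the data flow, in which the reasoning model receives text alone.

```python
def solve_pgps(diagram, problem_text, interpreter, reasoner):
    # Stage 1: the interpreter sees the image and emits CDL text only.
    cdl = interpreter(diagram)
    # Stage 2: an unmodified LLM reasons over text alone; it never sees the image.
    prompt = f"Conditions:\n{cdl}\n\nProblem: {problem_text}"
    return reasoner(prompt)

# Hypothetical stand-ins for the fine-tuned MLLM Interpreter and the
# off-the-shelf reasoning LLM:
fake_interpreter = lambda image: "Perpendicular(AC,BD)"
fake_llm = lambda prompt: "answer derived from: " + prompt.splitlines()[1]
answer = solve_pgps(b"<diagram bytes>", "Find angle ACB.", fake_interpreter, fake_llm)
```

Because the image never reaches `reasoner`, swapping in a different off-the-shelf LLM requires no retraining of the interpreter.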
If this is right
- The reasoning LLM never needs to be retrained on images, preserving its original capabilities across tasks.
- CDL-matching rewards supply denser training signals than final-answer rewards during interpreter optimization.
- Modest datasets suffice because the interpreter focuses only on description generation rather than full solution reasoning.
- The same interpreter-LLM split can be applied to other geometry benchmarks without rebuilding the entire system.
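One plausible reading of the CDL-matching reward, consistent with the abstract's mention of recall and precision, is an F1 score over the sets of generated versus ground-truth CDL statements. The paper's exact matching rule may differ (e.g. per-category weighting or order normalization); this is a sketch of the general shape of such a dense reward.

```python
def cdl_matching_reward(predicted, gold):
    # F1 over CDL statement sets: partial credit for each correct statement,
    # which is denser feedback than a single final-answer reward.
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

r = cdl_matching_reward(
    ["Shape(AB,BC,CA)", "Collinear(BCD)"],
    ["Shape(AB,BC,CA)", "Collinear(BCD)", "Perpendicular(AC,BD)"],
)
```

A rollout that recovers two of three gold statements with no spurious ones scores 0.8 here, whereas a solution-based reward would score it 0 or 1 depending on the downstream answer.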
Where Pith is reading between the lines
- The modular split may reduce data requirements in other multimodal reasoning domains where perception and inference can be separated.
- If CDL captures all geometrically relevant relations, analogous formal languages could serve as bridges for physics or chemistry diagram tasks.
- Testing whether one interpreter works unchanged with multiple different reasoning LLMs would reveal how general the CDL representation is.
Load-bearing premise
Accurate CDL descriptions generated by the interpreter are sufficient for the downstream LLM to reach correct solutions on unseen problems without additional visual information.
What would settle it
A test set of problems in which the interpreter produces verifiably correct CDL yet the LLM still returns systematically wrong answers would falsify the claim that the description step alone is enough.
Original abstract
Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometric problem based on a geometric diagram and problem textual descriptions. Although Large Language Models (LLMs) possess strong reasoning skills, their direct application to PGPS is hindered by their inability to process visual diagrams. Existing works typically fine-tune Multimodal LLMs (MLLMs) end-to-end on large-scale PGPS data to enhance visual understanding and reasoning simultaneously. However, such joint optimization may compromise base LLMs' inherent reasoning capability. In this work, we observe that LLM itself is potentially a powerful PGPS solver when appropriately formulating visual information as textual descriptions. We propose to train a MLLM Interpreter to generate geometric descriptions for the visual diagram, and an off-the-shelf LLM is utilized to perform reasoning. Specifically, we choose Conditional Declaration Language (CDL) as the geometric description as its conciseness eases the MLLM Interpreter training. The MLLM Interpreter is fine-tuned via CoT (Chain-of-Thought)-augmented SFT followed by GRPO to generate CDL. Instead of using a conventional solution-based reward that compares the reasoning result with the ground-truth answer, we design CDL matching rewards to facilitate more effective GRPO training, which provides more direct and denser guidance for CDL generation. To support training, we construct a new dataset, Formalgeo7k-Rec-CoT, by manually reviewing Formalgeo7k v2 and incorporating CoT annotations. Extensive experiments on Formalgeo7k-Rec-CoT, Unigeo, and MathVista show our method (finetuned on only 5.5k data) performs favorably against leading open-source and closed-source MLLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes decoupling visual interpretation from reasoning in plane geometry problem solving (PGPS). An MLLM Interpreter is fine-tuned via CoT-augmented SFT and GRPO (with a CDL-matching reward) to generate concise Conditional Declaration Language (CDL) descriptions from diagrams; an off-the-shelf LLM then performs textual reasoning on the CDL plus problem text. A new 5.5k-example dataset (Formalgeo7k-Rec-CoT) is constructed by reviewing Formalgeo7k v2 and adding CoT annotations. Experiments on Formalgeo7k-Rec-CoT, Unigeo, and MathVista claim the method outperforms leading open- and closed-source MLLMs while using limited data and avoiding degradation of base LLM reasoning.
Significance. If the central claim holds, the work shows that concise textual geometric descriptions can serve as an effective bridge, enabling data-efficient use of existing LLMs for multimodal geometric tasks without joint optimization that risks impairing reasoning. The CDL choice for conciseness and the denser CDL-matching GRPO reward (versus conventional solution-based rewards) are practical contributions that could generalize to other structured reasoning domains.
major comments (2)
- [§3.2] §3.2 and GRPO reward design: CDL matching is treated as a sufficient proxy for geometric understanding, yet no ablation isolates whether residual ambiguities (e.g., implicit incidence, ordering, or diagram-specific relations not explicitly declared in CDL) cause the downstream LLM to fail on problems outside the Formalgeo7k-Rec-CoT distribution while direct visual MLLMs succeed.
- [Experiments] Experiments section: The abstract asserts favorable performance against leading MLLMs with only 5.5k fine-tuning examples, but the absence of detailed quantitative metrics, specific baselines, ablation tables, or statistical significance tests in the reported results makes it impossible to verify robustness of the gains or the weakest assumption that accurate CDL alone suffices without visual input.
minor comments (1)
- [§2] Clarify notation for CDL syntax in Section 2 and ensure all comparison tables include exact accuracy numbers, number of test problems, and confidence intervals.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point-by-point below, providing clarifications and committing to revisions that strengthen the empirical support and methodological transparency without altering the core claims.
Point-by-point responses
Referee: [§3.2] §3.2 and GRPO reward design: CDL matching is treated as a sufficient proxy for geometric understanding, yet no ablation isolates whether residual ambiguities (e.g., implicit incidence, ordering, or diagram-specific relations not explicitly declared in CDL) cause the downstream LLM to fail on problems outside the Formalgeo7k-Rec-CoT distribution while direct visual MLLMs succeed.
Authors: We appreciate this observation on the need for targeted validation of CDL completeness. CDL, as a formal declarative language within the FormalGeo framework, is explicitly designed to enumerate all required geometric entities, relations, and constraints (e.g., points, lines, circles, incidences, and angles) without relying on implicit diagram features. Our CoT-augmented SFT and CDL-matching GRPO reward further encourage exhaustive coverage during generation. While the manuscript reports strong out-of-distribution results on Unigeo and MathVista, we acknowledge the value of an explicit ablation. In the revision, we will add a new subsection in §3.2 with an ablation that measures downstream LLM accuracy on a held-out subset when using (i) full CDL, (ii) CDL with deliberately omitted relations, and (iii) direct visual input to an MLLM, thereby isolating any residual ambiguity effects. revision: partial
Referee: [Experiments] Experiments section: The abstract asserts favorable performance against leading MLLMs with only 5.5k fine-tuning examples, but the absence of detailed quantitative metrics, specific baselines, ablation tables, or statistical significance tests in the reported results makes it impossible to verify robustness of the gains or the weakest assumption that accurate CDL alone suffices without visual input.
Authors: We agree that expanded quantitative reporting will improve verifiability. The full manuscript already contains accuracy tables on Formalgeo7k-Rec-CoT (e.g., 85.3% vs. 78.1% for GPT-4o), Unigeo, and MathVista, with baselines including GPT-4V, Claude-3-Opus, LLaVA-1.6, and Qwen-VL, plus ablations on SFT vs. GRPO and CDL vs. natural-language descriptions. To fully address the concern, we will revise the Experiments section to include: (1) complete per-dataset metric tables with all baselines and our method, (2) additional ablation tables isolating the contribution of accurate CDL (e.g., oracle CDL vs. generated CDL vs. image-only), and (3) statistical significance tests (paired t-tests and McNemar’s test with p-values) on the reported gains. These additions will be placed in §4 and the appendix. revision: yes
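The significance test the rebuttal commits to can be sketched directly. Below is an exact two-sided McNemar test on paired per-problem outcomes, where `b` and `c` are the discordant counts (problems only one of the two systems solves); the counts shown are hypothetical, not results from the paper.

```python
from math import comb

def mcnemar_exact_p(b, c):
    # Exact two-sided McNemar p-value via the binomial distribution:
    # under the null, each discordant problem is equally likely to favor
    # either system, so min(b, c) ~ lower tail of Binomial(b + c, 0.5).
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired comparison: 30 problems only our system solves,
# 12 only the baseline solves.
p = mcnemar_exact_p(b=30, c=12)
```

For small discordant counts the exact version avoids the chi-square approximation's unreliability, which matters for test sets the size of Formalgeo7k-Rec-CoT splits.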
Circularity Check
No circularity: empirical pipeline evaluated on external benchmarks
Full rationale
The paper's central claim is an empirical result: a two-stage system (MLLM Interpreter fine-tuned on 5.5k examples to output CDL, followed by an off-the-shelf LLM for reasoning) achieves competitive accuracy on Formalgeo7k-Rec-CoT, Unigeo, and MathVista. No mathematical derivation or prediction reduces to its inputs by construction. The GRPO reward uses CDL matching as a training signal for the interpreter only; final performance is measured by standard problem-solving accuracy on held-out data. No self-citations, uniqueness theorems, or ansatzes are invoked to force the result. The method is validated against external benchmarks rather than being self-confirming.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption CDL descriptions contain all information needed for correct geometric reasoning
Lean theorems connected to this paper
- `IndisputableMonolith/Cost/FunctionalEquation.lean` · `washburn_uniqueness_aczel` · tagged unclear: the relation between the paper passage and the cited Recognition theorem is unclear. Passage: "conciseness of CDL narrows the search space... benefits the training of MLLM Interpreter"
- `IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean` · `reality_from_one_distinction` · tagged unclear: the relation between the paper passage and the cited Recognition theorem is unclear. Passage: "CDL matching rewards... recall and precision of the matching results"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923, 2025.
- [2] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- [3] Jiaqi Chen, Jianheng Tang, Jinghui Qin, Xiaodan Liang, Lingbo Liu, Eric Xing, and Liang Lin. GeoQA: A geometric question answering benchmark towards multimodal numerical reasoning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 513–523, 2021.
- [4] Jiaqi Chen, Tong Li, Jinghui Qin, Pan Lu, Liang Lin, Chongyu Chen, and Xiaodan Liang. UniGeo: Unifying geometry logical reasoning via reformulating mathematical expression. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3313–3323, 2022.
- [5] Jo-Ku Cheng, Zeren Zhang, Ran Chen, Jingyang Deng, Ziran Qin, and Jinwen Ma. GeoUni: A unified model for generating geometry diagrams, problems and problem solutions. arXiv preprint arXiv:2504.10146, 2025.
- [6] Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261, 2025.
- [7] Daocheng Fu, Zijun Chen, Renqiu Xia, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Junchi Yan, et al. TrustGeoGen: Scalable and formal-verified data engine for trustworthy multi-modal geometric problem solving. arXiv preprint arXiv:2504.15780, 2025.
- [8] Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, Yufei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, et al. G-LLaVA: Solving geometric problem with multi-modal large language model. In The Thirteenth International Conference on Learning Representations.
- [9] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
- [10] Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Lei Zhang, and Wangmeng Zuo. Decoupled visual interpretation and linguistic reasoning for math problem solving. arXiv preprint arXiv:2505.17609, 2025.
- [11] Zihan Huang, Tao Wu, Wang Lin, Shengyu Zhang, Jingyuan Chen, and Fei Wu. AutoGeo: Automating geometric image dataset creation for enhanced geometry understanding. IEEE Transactions on Multimedia, 2025.
- [12] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. GPT-4o system card. arXiv preprint arXiv:2410.21276, 2024.
- [13] Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
- [14] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.
- [15] Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning, pages 12888–12900. PMLR, 2022.
- [16] Zhihao Li, Yao Du, Yang Liu, Yan Zhang, Yufang Liu, Mengdi Zhang, and Xunliang Cai. EAGLE: Elevating geometric reasoning through LLM-empowered visual instruction tuning. arXiv preprint arXiv:2408.11397, 2024.
- [17] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in Neural Information Processing Systems, 36:34892–34916, 2023.
- [18] Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, and Jianfeng Gao. MathVista: Evaluating mathematical reasoning of foundation models in visual contexts. In The Twelfth International Conference on Learning Representations.
- [19]
- [20]
- [21] Yicheng Pan, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao, and Feng Ma. Enhancing the geometric problem-solving ability of multimodal LLMs via symbolic-neural integration. arXiv preprint arXiv:2504.12773, 2025.
- [22] Bowen Ping, Minnan Luo, Zhuohang Dang, Chenxi Wang, and Chengyou Jia. AutoGPS: Automated geometry problem solving via multimodal formalization and deductive reasoning. arXiv preprint arXiv:2505.23381, 2025.
- [23] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [24] Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36:53728–53741, 2023.
- [25] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
- [26] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- [27] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.
- [28] V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, et al. (title truncated in the source), 2025.
- [29] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- [30] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- [31] Haoran Wei, Youyang Yin, Yumeng Li, Jia Wang, Liang Zhao, Jianjian Sun, Zheng Ge, Xiangyu Zhang, and Daxin Jiang. Slow perception: Let's perceive geometric figures step-by-step. arXiv preprint arXiv:2412.20631, 2024.
- [32] Liangyu Xu, Yingxiu Zhao, Jingyun Wang, Yingyao Wang, Bu Pi, Chen Wang, Mingliang Zhang, Jihao Gu, Xiang Li, Xiaoyong Zhu, et al. GeoSense: Evaluating identification and application of geometric principles in multimodal reasoning. arXiv preprint arXiv:2504.12597, 2025.
- [33] Cilin Yan, Jingyun Wang, Lin Zhang, Ruihui Zhao, Xiaopu Wu, Kai Xiong, Qingsong Liu, Guoliang Kang, and Yangyang Kang. Efficient and accurate prompt optimization: The benefit of memory in exemplar-guided reflection. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 753–779, 2025.
- [34] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
- [35] Tianyun Yang, Yunwen Li, Ziniu Li, Zhihang Lin, Ruoyu Sun, and Tian Ding. Bridging formal language with chain-of-thought reasoning to geometry problem solving. arXiv preprint arXiv:2508.09099, 2025.
- [36] Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, et al. Yi: Open foundation models by 01.AI. arXiv preprint arXiv:2403.04652, 2024.
- [37] Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. DAPO: An open-source LLM reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025.
- [38] Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel Ni, and Heung-Yeung Shum. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. In The Eleventh International Conference on Learning Representations.
- [39] Ming-Liang Zhang, Fei Yin, and Cheng-Lin Liu. A multi-modal neural geometric solver with textual clauses parsed from diagram. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 3374–3382, 2023.
- [40] Renrui Zhang, Xinyu Wei, Dongzhi Jiang, Ziyu Guo, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Shanghang Zhang, Peng Gao, et al. MAVIS: Mathematical visual instruction tuning with an automatic data engine. In The Thirteenth International Conference on Learning Representations.
- [41] Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Qike Huang, Xiaoxiao Jin, Yanjun Guo, Chenyang Mao, Yang Li, Zhe Zhu, et al. FormalGeo: An extensible formalized framework for olympiad geometric problem solving. arXiv preprint arXiv:2310.18021, 2023.
- [42] Zeren Zhang, Jo-Ku Cheng, Jingyang Deng, Lu Tian, Jinwen Ma, Ziran Qin, Xiaokai Zhang, Na Zhu, and Tuo Leng. Diagram formalization enhanced multi-modal geometry problem solver. In ICASSP 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.
- [43] Na Zhu, Xiaokai Zhang, Qike Huang, Fangzhen Zhu, Zhenbing Zeng, and Tuo Leng. FGeo-Parser: Autoformalization and solution of plane geometric problems. Symmetry, 17(1):8, 2024.
Supplementary material (excerpts)
- More ablations: effect of rollout number N in GRPO. An ablation on Qwen2.5-VL 7B (Table 8) finds that setting N = 10 yields no performance gain on CDL generation and slightly degrades problem-solving accuracy, while adding roughly 80 extra hours of training time compared with N...
- Proof of CDL's conciseness. A proof that CDL is more concise than general textual descriptions: a textual description of a geometric input decomposes into three components, beginning with 1) shape descriptions that depict geometric shapes, e.g., line segments, ...
- More qualitative results. Examples across Formalgeo-Rec-CoT, Unigeo, and MathVista, e.g., a triangle ABD in which AC is perpendicular to BD, with ConsCDL: Shape(AB,BC,CA), Shape(AC,CD,DA), Collinear(BCD); ImgCDL: Perpendicular(AC,CD); TextCDL: Perpendicular(AC,BD).