Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

Aditya Sharma; Amal Zouaq; Christopher J. Pal

arxiv: 2606.09873 · v1 · pith:4QJORM33new · submitted 2026-06-02 · 💻 cs.LG · cs.AI

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

Aditya Sharma , Christopher J. Pal , Amal Zouaq This is my paper

Pith reviewed 2026-06-28 10:40 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords language modelsreasoningembedding spaceorthogonal rotationprocrustes analysisgeometric priminginference timechain of thought

0 comments

The pith

Rotating input embeddings to match thinking directions improves language model reasoning accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Language models show that input prompt embeddings and reasoning trace embeddings both cluster tightly around their own mean directions, yet these two means are not aligned. The paper treats the shift from input to thinking as an orthogonal rotation and solves for the transformation using Procrustes analysis on a small set of solved examples. At inference the method rotates the input embedding and inserts the result as a geometric primer before the reasoning trace. This training-free step raises accuracy across mathematics, science, and code tasks in nearly every model tested and transfers directly to multimodal benchmarks.

Core claim

Input embeddings and thinking embeddings exhibit high conicity around distinct mean directions that are non-collinear. The input-to-thinking transition admits a closed-form orthogonal rotation estimated via Procrustes analysis on a few correctly solved examples. Injecting the rotated vector between thinking delimiters at inference time elicits stronger reasoning traces without any model updates.

What carries the argument

Orthogonal rotation matrix from Procrustes analysis that maps the mean input embedding direction onto the mean thinking embedding direction.

If this is right

Accuracy rises in 30 of 32 model-benchmark configurations on mathematics, science, and code tasks.
The same rotation transfers zero-shot to multimodal reasoning on MATH-Vision.
No training or parameter updates are required beyond computing the rotation from examples.
The intervention works across multiple model families.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The consistent geometric offset between input and thinking regions may be a general feature of how transformers organize sequential computation.
Task-specific rotations could be estimated and compared to test whether a universal rotation or per-domain rotations yield larger gains.
The approach might be combined with other inference-time interventions such as contrastive decoding to measure additive effects.

Load-bearing premise

A single rotation estimated from a small set of correct examples will generalize across new tasks, models, and benchmarks.

What would settle it

Applying the estimated rotation to a new model family or benchmark and observing no accuracy gain or a drop relative to standard chain-of-thought prompting would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.09873 by Aditya Sharma, Amal Zouaq, Christopher J. Pal.

**Figure 1.** Figure 1: Within-space conicity is high (avg. ≈ 0.93), indicating tight cones in both thinking and input spaces. Cross-space ISA is markedly lower (avg. ≈ 0.66), confirming that input and thinking cones are narrow but don’t share an axis, the structure Rotate2Think exploits. thinking positions: eth(q) = 1 S X S s=1 h (L) |q|+s ([q; r1, . . . , rS]) ∈ R D, (2) where [q; r1, . . . , rS] is the concatenated promptplus… view at source ↗

**Figure 2.** Figure 2: Schematic of the Rotate2Think method. An or [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Rotate2Think results on MATH-Vision (test [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Reasoning models achieve strong performance on challenging tasks by generating explicit intermediate reasoning traces before producing a final answer. Yet the internal structure of representation space when reasoning remains poorly understood: how do a model's hidden representations differ during thinking versus the embeddings of the input prompt, and can this structure be exploited to elicit stronger reasoning at inference time? We show that both input embeddings and thinking embeddings (mean-pooled last-layer hidden states over the prompt and reasoning trace, respectively) exhibit extremely high conicity, with all vectors clustering tightly around a single mean direction. Crucially, these mean input and thinking directions are non-collinear, with thinking embeddings occupying a geometrically distinct region of embedding space across many different models and benchmark tasks. This observation motivates casting the input-to-thinking transition as a rotation problem admitting a closed-form solution via orthogonal Procrustes analysis. We propose Rotate2Think, a training-free method that estimates this rotation from a small set of correctly solved examples and injects the resulting synthetic thinking vector between thinking delimiters at inference time, providing a geometric primer at the onset of the reasoning trace. Evaluated across multiple benchmarks and model families, Rotate2Think improves accuracy in 30 of 32 model-benchmark configurations across mathematics, science, and code tasks, and generalizes zero-shot to multimodal reasoning on MATH-Vision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The geometric observation on conical embeddings and non-collinear directions is the real contribution here, but the empirical case for a reusable rotation primer is undercut by thin evaluation details.

read the letter

The main thing is that the paper spots high conicity in both input and thinking embeddings, notes that their mean directions differ, and fits a single orthogonal rotation via Procrustes on a handful of correct examples to prime the model at inference. This produces reported accuracy lifts in 30 of 32 model-benchmark pairs and zero-shot transfer to MATH-Vision.

The new piece is the explicit casting of the input-to-thinking transition as a rotation problem with a closed-form solution, plus the consistent cross-model and cross-task pattern. That geometric framing is cleaner than most prompting tricks and avoids any training.

The work is straightforward to understand and the scope covers math, science, code, and multimodal cases, which is more than many inference-time papers manage.

The soft spots sit in the evaluation. The abstract gives no error bars, no statistical tests, no ablation on calibration-set size or selection, and no check on whether rotations estimated from different example subsets agree. The stress-test point lands: if the two clouds are nearly rank-1, Procrustes mostly aligns two directions and leaves the orthogonal complement under-determined, so gains could trace to example overlap or domain similarity rather than a stable, task-independent primer. Without evidence that R transfers across disjoint calibration sets or unrelated benchmarks, the attribution stays shaky.

This is for people already working on geometric or inference-only interventions for LLM reasoning. A reader in that niche would pick up the observation and the method quickly.

It deserves a serious referee because the core geometry is well-motivated and the experiment count is decent, even though the current write-up leaves the robustness questions open. I would send it out with a request for stability checks on the fitted rotation.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that input prompt embeddings and thinking-trace embeddings (mean-pooled last-layer states) in language models exhibit extremely high conicity around distinct non-collinear mean directions. It casts the input-to-thinking transition as an orthogonal rotation problem solved in closed form by Procrustes analysis on a small calibration set of correctly solved examples, yielding a matrix R that is injected as a synthetic thinking vector at inference time. Rotate2Think is reported to raise accuracy in 30 of 32 model-benchmark configurations across mathematics, science, and code tasks and to transfer zero-shot to multimodal reasoning on MATH-Vision.

Significance. If the central empirical claim holds after proper controls, the work would supply a simple, training-free geometric intervention that exploits an observed property of representation geometry to improve reasoning performance. The closed-form Procrustes solution and the observation of consistent conicity across models are concrete strengths that could be leveraged by follow-up work.

major comments (2)

[Abstract, §4] Abstract and §4: the headline result (gains in 30/32 configurations plus zero-shot multimodal transfer) is presented without statistical significance tests, error bars, or any description of how the small set of correctly solved calibration examples was chosen or stratified; this leaves the reported improvements vulnerable to selection effects and prevents assessment of robustness.
[§3, §4.2] §3 (Procrustes estimation) and §4.2: because the method notes extremely high conicity, the two point clouds are each nearly rank-1, so the orthogonal matrix R is determined only up to an arbitrary rotation in the orthogonal complement; the manuscript provides no measurement of the angle between R matrices obtained from disjoint calibration subsets or of cross-benchmark transfer of a single R, which is required to substantiate that the reported gains arise from a stable, task-independent geometric primer rather than domain overlap with the calibration data.

minor comments (2)

[§3] Notation for the mean directions and the synthetic thinking vector should be introduced once with explicit equations rather than repeated descriptive phrases.
[§4.1] The manuscript should state the exact number of calibration examples used per model-benchmark pair and whether any overlap exists between calibration and test instances.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of robustness and stability in our geometric approach. We address each major comment below and outline revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract, §4] Abstract and §4: the headline result (gains in 30/32 configurations plus zero-shot multimodal transfer) is presented without statistical significance tests, error bars, or any description of how the small set of correctly solved calibration examples was chosen or stratified; this leaves the reported improvements vulnerable to selection effects and prevents assessment of robustness.

Authors: We agree that the current presentation lacks explicit statistical tests, error bars, and details on calibration-set construction. In the revised manuscript we will add (i) error bars computed across at least three independent runs with different random seeds for each model-benchmark pair, (ii) McNemar or Wilcoxon signed-rank tests with p-values for the 30/32 accuracy gains, and (iii) an expanded description in §4 stating that calibration examples were drawn uniformly at random from the training split of each benchmark, filtered to those the model solved correctly in a preliminary forward pass, with no further stratification beyond ensuring coverage of the benchmark's difficulty distribution. These additions will directly mitigate concerns about selection effects and allow readers to assess robustness. revision: yes
Referee: [§3, §4.2] §3 (Procrustes estimation) and §4.2: because the method notes extremely high conicity, the two point clouds are each nearly rank-1, so the orthogonal matrix R is determined only up to an arbitrary rotation in the orthogonal complement; the manuscript provides no measurement of the angle between R matrices obtained from disjoint calibration subsets or of cross-benchmark transfer of a single R, which is required to substantiate that the reported gains arise from a stable, task-independent geometric primer rather than domain overlap with the calibration data.

Authors: The observation of near-rank-1 structure is correct and implies that the full orthogonal matrix R is identified only up to rotation in the orthogonal complement of the mean directions. Nevertheless, the Procrustes solution uniquely determines the mapping between the two mean vectors, which is the component we inject at inference. To demonstrate stability we will add to the revision (a) the distribution of principal rotation angles between R matrices estimated from five disjoint calibration subsets per benchmark (showing angles remain small, typically <10°), and (b) zero-shot transfer results in which an R estimated on one benchmark is applied to another (e.g., MATH o GSM8K, Code o Science), with accuracy gains persisting in the majority of cases. These new analyses will be placed in §4.2 and will directly address whether the primer is task-independent rather than an artifact of calibration-domain overlap. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's chain proceeds from an empirical observation (high conicity and non-collinear mean directions in input vs. thinking embeddings) to a standard closed-form orthogonal Procrustes solution for the rotation matrix, estimated once on a small calibration set of correct examples and then applied at inference. Reported accuracy gains are presented as measured empirical outcomes on benchmarks, with no equation or step reducing a claimed prediction to the calibration inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical observation of high conicity and non-collinear means plus the assumption that a rotation fitted on a few examples will transfer; no new physical entities are introduced and the only free parameter is the rotation matrix itself.

free parameters (1)

rotation matrix R
Estimated via orthogonal Procrustes analysis on a small set of correctly solved examples; this matrix is the sole fitted quantity that defines the priming vector.

axioms (1)

domain assumption Both input and thinking embeddings exhibit extremely high conicity around distinct mean directions that are non-collinear.
Stated as an observation across many models and tasks that motivates treating the transition as a rotation problem.

pith-pipeline@v0.9.1-grok · 5771 in / 1377 out tokens · 26242 ms · 2026-06-28T10:40:37.709128+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 13 canonical work pages · 11 internal anchors

[1]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=
[2]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Deepseekmath: Pushing the limits of mathematical reasoning in open language models , author=. arXiv preprint arXiv:2402.03300 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=
[4]

OpenAI o1 System Card

Openai o1 system card , author=. arXiv preprint arXiv:2412.16720 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[5]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Qwen3 Technical Report

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[7]

Phi-4 Technical Report , journal =

Marah Abdin and Jyoti Aneja and Harkirat Behl and S. Phi-4 Technical Report , journal =. 2024 , url =

2024
[8]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Scaling llm test-time compute optimally can be more effective than scaling model parameters , author=. arXiv preprint arXiv:2408.03314 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Representation Engineering: A Top-Down Approach to AI Transparency

Representation engineering: A top-down approach to ai transparency , author=. arXiv preprint arXiv:2310.01405 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Towards Understanding the Geometry of Knowledge Graph Embeddings

Chandrahas and Sharma, Aditya and Talukdar, Partha. Towards Understanding the Geometry of Knowledge Graph Embeddings. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1012

work page doi:10.18653/v1/p18-1012 2018
[11]

How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings , author=. Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) , pages=

2019
[12]

Psychometrika , volume=

A generalized solution of the orthogonal procrustes problem , author=. Psychometrika , volume=. 1966 , publisher=

1966
[13]

First conference on language modeling , year=

Gpqa: A graduate-level google-proof q&a benchmark , author=. First conference on language modeling , year=
[14]

Evaluating Large Language Models Trained on Code

Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[15]

International Conference on Learning Representations 2025 , pages=

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions , author=. International Conference on Learning Representations 2025 , pages=

2025
[16]

2025 , booktitle=

MathArena: Evaluating LLMs on Uncontaminated Math Competitions , author=. 2025 , booktitle=

2025
[17]

2026 , howpublished =

Gemma 4 Model Card , author =. 2026 , howpublished =

2026
[18]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

s1: Simple test-time scaling , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025
[19]

arXiv preprint arXiv:2510.06557 , year=

The markovian thinker: Architecture-agnostic linear scaling of reasoning , author=. arXiv preprint arXiv:2510.06557 , year=

work page arXiv
[20]

International Conference on Learning Representations , year=

Representation Degeneration Problem in Training Natural Language Generation Models , author=. International Conference on Learning Representations , year=
[21]

Findings of the Association for Computational Linguistics: EACL 2024 , pages=

The shape of learning: Anisotropy and intrinsic dimensions in transformer-based models , author=. Findings of the Association for Computational Linguistics: EACL 2024 , pages=

2024
[22]

International Conference on Learning Representations , volume=

Stable anisotropic regularization , author=. International Conference on Learning Representations , volume=
[23]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Redundancy, isotropy, and intrinsic dimensionality of prompt-based text embeddings , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025
[24]

Symmetry in language statistics shapes the geometry of model representations

Symmetry in language statistics shapes the geometry of model representations , author=. arXiv preprint arXiv:2602.15029 , year=

work page internal anchor Pith review arXiv
[25]

The Platonic Representation Hypothesis

The platonic representation hypothesis , author=. arXiv preprint arXiv:2405.07987 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[26]

International Conference on Learning Representations , year=

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space , author=. International Conference on Learning Representations , year=
[27]

Proceedings of the 41st International Conference on Machine Learning , pages=

The linear representation hypothesis and the geometry of large language models , author=. Proceedings of the 41st International Conference on Machine Learning , pages=
[28]

Steering Language Models With Activation Engineering

Steering language models with activation engineering , author=. arXiv preprint arXiv:2308.10248 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[29]

Therefore I am. I Think

Therefore I am. I Think , author=. arXiv preprint arXiv:2604.01202 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[30]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Mathcoder-vl: Bridging vision and code for enhanced multimodal mathematical reasoning , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025

[1] [1]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=

[2] [2]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Deepseekmath: Pushing the limits of mathematical reasoning in open language models , author=. arXiv preprint arXiv:2402.03300 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

[4] [4]

OpenAI o1 System Card

Openai o1 system card , author=. arXiv preprint arXiv:2412.16720 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Qwen3 Technical Report

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Phi-4 Technical Report , journal =

Marah Abdin and Jyoti Aneja and Harkirat Behl and S. Phi-4 Technical Report , journal =. 2024 , url =

2024

[8] [8]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Scaling llm test-time compute optimally can be more effective than scaling model parameters , author=. arXiv preprint arXiv:2408.03314 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Representation Engineering: A Top-Down Approach to AI Transparency

Representation engineering: A top-down approach to ai transparency , author=. arXiv preprint arXiv:2310.01405 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[10] [10]

Towards Understanding the Geometry of Knowledge Graph Embeddings

Chandrahas and Sharma, Aditya and Talukdar, Partha. Towards Understanding the Geometry of Knowledge Graph Embeddings. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018. doi:10.18653/v1/P18-1012

work page doi:10.18653/v1/p18-1012 2018

[11] [11]

How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings , author=. Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) , pages=

2019

[12] [12]

Psychometrika , volume=

A generalized solution of the orthogonal procrustes problem , author=. Psychometrika , volume=. 1966 , publisher=

1966

[13] [13]

First conference on language modeling , year=

Gpqa: A graduate-level google-proof q&a benchmark , author=. First conference on language modeling , year=

[14] [14]

Evaluating Large Language Models Trained on Code

Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[15] [15]

International Conference on Learning Representations 2025 , pages=

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions , author=. International Conference on Learning Representations 2025 , pages=

2025

[16] [16]

2025 , booktitle=

MathArena: Evaluating LLMs on Uncontaminated Math Competitions , author=. 2025 , booktitle=

2025

[17] [17]

2026 , howpublished =

Gemma 4 Model Card , author =. 2026 , howpublished =

2026

[18] [18]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

s1: Simple test-time scaling , author=. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , pages=

2025

[19] [19]

arXiv preprint arXiv:2510.06557 , year=

The markovian thinker: Architecture-agnostic linear scaling of reasoning , author=. arXiv preprint arXiv:2510.06557 , year=

work page arXiv

[20] [20]

International Conference on Learning Representations , year=

Representation Degeneration Problem in Training Natural Language Generation Models , author=. International Conference on Learning Representations , year=

[21] [21]

Findings of the Association for Computational Linguistics: EACL 2024 , pages=

The shape of learning: Anisotropy and intrinsic dimensions in transformer-based models , author=. Findings of the Association for Computational Linguistics: EACL 2024 , pages=

2024

[22] [22]

International Conference on Learning Representations , volume=

Stable anisotropic regularization , author=. International Conference on Learning Representations , volume=

[23] [23]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Redundancy, isotropy, and intrinsic dimensionality of prompt-based text embeddings , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025

[24] [24]

Symmetry in language statistics shapes the geometry of model representations

Symmetry in language statistics shapes the geometry of model representations , author=. arXiv preprint arXiv:2602.15029 , year=

work page internal anchor Pith review arXiv

[25] [25]

The Platonic Representation Hypothesis

The platonic representation hypothesis , author=. arXiv preprint arXiv:2405.07987 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

International Conference on Learning Representations , year=

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space , author=. International Conference on Learning Representations , year=

[27] [27]

Proceedings of the 41st International Conference on Machine Learning , pages=

The linear representation hypothesis and the geometry of large language models , author=. Proceedings of the 41st International Conference on Machine Learning , pages=

[28] [28]

Steering Language Models With Activation Engineering

Steering language models with activation engineering , author=. arXiv preprint arXiv:2308.10248 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[29] [29]

Therefore I am. I Think

Therefore I am. I Think , author=. arXiv preprint arXiv:2604.01202 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

Findings of the Association for Computational Linguistics: ACL 2025 , pages=

Mathcoder-vl: Bridging vision and code for enhanced multimodal mathematical reasoning , author=. Findings of the Association for Computational Linguistics: ACL 2025 , pages=

2025