
arxiv: 2509.23292 · v4 · pith:5WCIEIX5 · submitted 2025-09-27 · 💻 cs.AI · cs.CL

Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning

keywords: reasoning, code, pattern, pattern-aware, tool-integrated, tools, when, approach

Tool-integrated reasoning (TIR) has become a key approach for improving large reasoning models (LRMs) on complex problems. Prior work has mainly studied when to invoke tools, while overlooking how tools are applied. We identify two common patterns: a calculator pattern that uses code for direct computation, and an algorithmic pattern that encodes problems as programs. Misaligned choices often cause failures even when reasoning is sound. We propose a two-stage framework that first builds code competence from both patterns and then aligns pattern selection with teacher preferences. Across challenging math datasets, our pattern-aware method substantially improves both code usage and accuracy, for instance raising Code@1 on MATH500 from 64.0% to 70.5% and on AIME24 from 26.7% to 50.0%. These gains highlight the effectiveness of a pattern-aware approach for tool-integrated reasoning.
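To make the distinction between the two patterns concrete, here is a minimal sketch on a toy problem of our own choosing (it is not taken from the paper, and the function names `calculator_pattern` and `algorithmic_pattern` are hypothetical). It shows what code generated under each pattern might look like, not the authors' prompts, training data, or method. In the calculator pattern the model reasons analytically and uses code only to evaluate the final expression; in the algorithmic pattern it encodes the problem statement itself as a program and lets execution find the answer.

```python
# Hypothetical illustration of the two tool-use patterns described above.
# Toy problem (our own, not from the paper):
#   How many integers n with 1 <= n <= limit are divisible by 3 or by 5?


def calculator_pattern(limit: int = 1000) -> int:
    # Calculator pattern: the reasoning (inclusion-exclusion) happens in text;
    # code is used only to evaluate the resulting closed-form expression.
    return limit // 3 + limit // 5 - limit // 15


def algorithmic_pattern(limit: int = 1000) -> int:
    # Algorithmic pattern: the problem statement is translated directly into
    # a program, and execution (here, brute-force enumeration) does the work.
    return sum(1 for n in range(1, limit + 1) if n % 3 == 0 or n % 5 == 0)


if __name__ == "__main__":
    print(calculator_pattern())   # 467
    print(algorithmic_pattern())  # 467
```

Both routes agree here, but they fail in different ways: the calculator pattern is only as good as the derivation behind the expression, while the algorithmic pattern can become infeasible when the search space is large. Which pattern suits a given problem therefore matters, which is the choice the paper's framework aims to align.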


Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction

    cs.CV 2026-05 unverdicted novelty 7.0

    Draw2Think recasts geometric reasoning as agentic interaction with a constraint engine, achieving 95.9% predicate-level construction fidelity and up to 16.4% accuracy gains on solid geometry tasks.

  2. Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

    cs.CL 2026-06 unverdicted novelty 6.0

    The first end-to-end RAG pipeline on a mobile NPU delivers 18.1x faster prefilling and 4x lower latency and energy use than the CPU on Snapdragon X Elite, with equivalent quality.

  3. When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

    cs.SE 2026-05 unverdicted novelty 6.0

    About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.

  4. RankVR: Low-Rank Structure Perception and Value Recalibration for Robust Composed Image Retrieval

    cs.CV 2026-06 unverdicted novelty 4.0

    RankVR introduces GSCP and ASVC modules to improve CIR robustness by decoupling clean samples via low-rank structure and dynamically scoring triplet value in noisy datasets.

  5. IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval

    cs.CV 2026-06 unverdicted novelty 4.0

    IMAGINE uses adaptive schema-imagery via dynamic multimodal prototypes to incorporate implicit semantics into composed video retrieval, claiming SOTA results on CVR and CIR benchmarks.