PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning

Chao Lu; Guannan He; Hengbo Xiao; Jingyuan Fan; Jingzhao Zhang; Xin Tong

arXiv: 2509.18169 · v3 · submitted 2025-09-17 · 💻 cs.LG · cs.CE · cs.CL

Hengbo Xiao, Jingyuan Fan, Xin Tong, Jingzhao Zhang, Chao Lu, Guannan He

Pith reviewed 2026-05-18 16:01 UTC · model grok-4.3

classification 💻 cs.LG · cs.CE · cs.CL

keywords token-level routing · computation integration · language models · expert networks · efficient reasoning · high-precision computation · multi-agent alternatives


The pith

PiERN integrates high-precision computation into LLMs by routing at the token level within a single chain of thought.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an architecture called PiERN that trains computation experts, a text-to-computation module, and a router separately before combining them. At inference time, the router makes decisions at the token level to alternate between reasoning and precise computation inside one continuous response from the model. This avoids the communication costs of multi-agent systems while outperforming simple fine-tuning of language models on tasks that mix language reasoning with exact numerical calculations. If successful, it provides a way to make language models more reliable for scientific and technical applications that demand both understanding and accuracy.

Core claim

PiERN endogenously integrates computational capabilities into neural networks by separately training experts, a text-to-computation module, and a router, then using the router to direct computation and reasoning at the token level for iterative alternation within a single chain of thought. This yields higher accuracy than direct fine-tuning of LLMs, and lower latency, token usage, and GPU energy consumption than multi-agent approaches, on linear and nonlinear computation-reasoning tasks.
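A toy accounting model, not taken from the paper, suggests why handoffs can cost tokens. If every agent call re-reads the conversation history plus a fixed protocol overhead, the tokens processed grow roughly quadratically with the number of handoffs, whereas a single chain appends each segment once. The segment lengths and the `prompt_overhead` parameter below are made-up assumptions, and real systems with prompt caching or summarization behave differently.

```python
from typing import List


def single_chain_tokens(segments: List[int]) -> int:
    """Tokens processed when all segments share one growing context.

    Toy assumption: a KV cache means each token is processed once,
    so the cost is just the sum of the segment lengths.
    """
    return sum(segments)


def multi_agent_tokens(segments: List[int], prompt_overhead: int) -> int:
    """Tokens processed when each segment comes from a separate agent call.

    Toy assumption: every call re-reads the whole history plus a fixed
    instruction/protocol overhead, then emits its own segment.
    """
    total = 0
    history = 0
    for seg in segments:
        total += prompt_overhead + history + seg  # re-read context, then generate
        history += seg
    return total


if __name__ == "__main__":
    # Alternating reasoning (50 tokens) and computation (10 tokens) segments.
    segs = [50, 10, 50, 10, 50]
    print(single_chain_tokens(segs), multi_agent_tokens(segs, prompt_overhead=100))
```

With these made-up numbers the single chain processes 170 tokens and the multi-agent pipeline 1010; the gap widens as the number of alternations grows.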

What carries the argument

The router that performs token-level decisions to switch between computation experts and reasoning within the model's output sequence.
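As a rough illustration only, the router can be pictured as a small classifier that scores each decoding position's hidden state and sends the token to one destination. The sketch below assumes a linear score per destination and hard top-1 (argmax) selection over two hypothetical destinations, `"reasoning"` and `"computation"`; this page does not describe PiERN's actual router inputs, architecture, or label set.

```python
from typing import Dict, List

# Hypothetical destinations; the label set used by PiERN is not given here.
DESTINATIONS = ["reasoning", "computation"]


def route_token(hidden: List[float], weights: Dict[str, List[float]],
                bias: Dict[str, float]) -> str:
    """Score one token's hidden state per destination and return the argmax.

    A minimal linear-gating sketch, not the paper's router.
    """
    scores = {}
    for dest in DESTINATIONS:
        # Dot product of the hidden state with this destination's weight vector.
        scores[dest] = sum(h * w for h, w in zip(hidden, weights[dest])) + bias[dest]
    # Hard (top-1) routing: exactly one destination per token.
    return max(DESTINATIONS, key=lambda d: scores[d])


if __name__ == "__main__":
    # Toy 3-dimensional hidden states and hand-picked weights.
    weights = {"reasoning": [1.0, 0.0, 0.0], "computation": [0.0, 1.0, 0.0]}
    bias = {"reasoning": 0.0, "computation": 0.0}
    print(route_token([0.9, 0.1, 0.0], weights, bias))
    print(route_token([0.2, 0.8, 0.0], weights, bias))
```

In this sketch, hard top-1 selection keeps exactly one path active per token, so reasoning and computation alternate rather than blend. A learned router would fit `weights` and `bias` on labeled routing data instead of using hand-picked values.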

Load-bearing premise

That training the components separately will result in a combined system where the router's token-level decisions stay stable and accurate during inference without creating new errors.
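To make the premise concrete: if each routing decision were independently correct with probability 1 - ε, a response containing n routing decisions would be routed entirely correctly with probability (1 - ε)^n. Independence is a simplifying assumption introduced here, not something the paper states; correlated errors or recovery mechanisms would change the numbers.

```python
def all_routes_correct(eps: float, n: int) -> float:
    """Probability that n independent routing decisions are all correct."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must be a probability")
    return (1.0 - eps) ** n


if __name__ == "__main__":
    # Illustrative error rates and trace lengths; not measurements from the paper.
    for eps in (0.001, 0.01, 0.05):
        for n in (10, 100):
            print(f"eps={eps:<5} n={n:<3} P(all correct)={all_routes_correct(eps, n):.3f}")
```

Under these assumptions, a 1% per-decision error rate over 100 decisions leaves about a 37% chance of a fully correct routing trace. Treating any single misroute as fatal is a worst case; a misroute inside an unimportant reasoning span may do no harm.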

What would settle it

A test on complex computation-reasoning tasks in which PiERN fails to show accuracy gains over fine-tuned LLMs, shows higher latency than multi-agent baselines, or makes unstable token-routing decisions.

Figures

Figures reproduced from arXiv: 2509.18169 by Chao Lu, Guannan He, Hengbo Xiao, Jingyuan Fan, Jingzhao Zhang, Xin Tong.

**Figure 2.** (a) Training of the expert model for specific tasks. (b) Training the Text-to-Computation Module for text-computation alignment. (c) Training the Token Router to determine experts for each token. Middle: the overall architecture of PiERN.

**Figure 3.** Token routing for the reasoning-computation inference paradigm in PiERN.

**Figure 4.** Token usage comparison among PiERN and multi-agent systems with LLMs: …

**Figure 5.** Token usage decomposition for PiERN and QwQ-32B-based multi-agent systems on the Non-Linear Task.

**Figure 6.** GPU energy consumption comparison between PiERN and multi-agent systems: …

**Figure 7.** Success rate of PiERN and multi-agent systems on inference tasks.

**Figure 8.** The combination of fine-tuning data for non-linear tasks, including time-series current data, time, battery…

**Figure 9.** The combination of fine-tuning data for linear tasks, including calculation data related to profit and language…

Original abstract

Tasks on complex systems require high-precision numerical computation to support decisions, but current large language models (LLMs) cannot integrate such computations as an intrinsic and interpretable capability with existing architectures. Multi-agent approaches can leverage external experts, but inevitably introduce communication overhead and suffer from inefficiency caused by limited scalability. To this end, we propose Physically-isolated Experts Routing Network (PiERN), an architecture for integrating computation and reasoning. Instead of the tool-use workflows or function-calling, PiERN endogenously integrates computational capabilities into neural networks after separately training experts, a text-to-computation module, and a router. At inference, the router directs computation and reasoning at the token level, thereby enabling iterative alternation within a single chain of thought. We evaluate PiERN on representative linear and nonlinear computation-reasoning tasks against LLM finetuning and the multi-agent system approaches. Results show that the PiERN architecture achieves not only higher accuracy than directly finetuning LLMs but also significant improvements in response latency, token usage, and GPU energy consumption compared with mainstream multi-agent approaches. PiERN offers an efficient, interpretable, and scalable paradigm for interfacing language models with scientific systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

PiERN routes at the token level between separately trained reasoning and high-precision computation experts to avoid multi-agent overhead, but the integration stability looks under-tested.


PiERN's core move is to train computation experts, a text-to-computation converter, and a router independently, then let the router switch between them at the token level inside one chain of thought. This is meant to give LLMs native high-precision numerical steps without external tool calls or agent handoffs. The abstract positions it as more accurate than plain fine-tuning and faster, lighter on tokens, and less power-hungry than multi-agent baselines on linear and nonlinear tasks. That framing is useful because it directly targets the communication cost that usually kills these hybrids in practice. If the efficiency numbers hold up, the approach could matter for anyone wiring models into simulation or control loops. The token-level alternation also keeps the reasoning trace more interpretable than black-box function calls. The paper earns credit for spelling out a concrete alternative to the usual tool-use and mixture-of-experts patterns.

The soft spots sit in the handoff between training and inference. Separate training keeps things modular, but it leaves the router without direct feedback on how its decisions affect downstream computation accuracy. A modest routing error rate could compound across iterative steps and erase the claimed gains; the abstract gives no sign of joint fine-tuning, routing-error ablations, or recovery mechanisms. The results section is also thin on specifics: no dataset sizes, no error bars, and no breakdown of where the accuracy lift comes from versus where the latency savings come from. Without those, the efficiency story is hard to take at face value.

This is the kind of paper that belongs in a reading group focused on practical hybrids of LLMs and scientific computing. Readers who already work on routing or hybrid architectures will get the most out of the design choices. It is coherent enough, and the problem live enough, that a serious editor should send it to referees rather than desk-reject it.
The experiments will need work, but the architecture itself is worth a proper look.

Referee Report

2 major / 2 minor

Summary. The paper proposes the Physically-isolated Experts Routing Network (PiERN), an architecture for integrating high-precision numerical computation with reasoning in LLMs. Computation experts, a text-to-computation module, and a router are trained separately; at inference the router performs token-level routing to enable iterative alternation between computation and reasoning inside a single chain of thought. The authors evaluate the approach on linear and nonlinear computation-reasoning tasks and claim higher accuracy than direct LLM finetuning together with lower latency, token usage, and GPU energy consumption than mainstream multi-agent systems.

Significance. If the empirical results prove robust, PiERN would offer a practical, lower-overhead alternative to multi-agent tool-use pipelines for tasks that require both symbolic reasoning and precise numerical computation. The token-level endogenous routing is a distinctive design choice that could improve interpretability and scalability; the separate-training strategy is a pragmatic engineering decision whose stability must still be demonstrated.

major comments (2)

§3 (Architecture): The central claim that separately trained experts, text-to-computation module, and router integrate into a stable system at inference rests on an unverified assumption. No analysis of routing-error propagation, joint fine-tuning, or post-training alignment is provided, yet even modest mis-routing could cascade into incorrect high-precision results and undermine both the accuracy and efficiency claims.
§4 (Experiments): The reported accuracy and efficiency gains are presented without error bars, number of random seeds, dataset sizes, or statistical significance tests. This information is load-bearing for the claim that PiERN outperforms both finetuned LLMs and multi-agent baselines; its absence prevents assessment of whether the improvements are reliable or sensitive to post-hoc experimental choices.
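Reporting per-seed variability, as this comment requests, can be as simple as the following sketch built on Python's standard `statistics` module. The scores are illustrative placeholders, not results from the paper.

```python
import statistics
from typing import Sequence


def summarize(scores: Sequence[float]) -> str:
    """Format per-seed scores as 'mean ± sample std (n=...)'."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds for a sample standard deviation")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample (n - 1) standard deviation
    return f"{mean:.3f} ± {std:.3f} (n={len(scores)})"


if __name__ == "__main__":
    # Placeholder accuracies from three hypothetical seeds.
    print(summarize([0.80, 0.82, 0.84]))
```

Stating `n` next to the spread matters: with only a handful of seeds, the sample standard deviation is itself a noisy estimate.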

minor comments (2)

Abstract: The phrase 'significant improvements' is used without any numerical values or effect sizes, which would help readers gauge the practical magnitude of the reported gains.
§3 (Notation): The distinction between 'computation experts' and the 'text-to-computation module' is introduced without a clear diagram or pseudocode, making the token-level routing flow harder to follow on first reading.
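A minimal sketch of the kind of flow this comment asks to see spelled out follows. Every name in it is hypothetical: a `<compute>` marker token stands in for the learned router's decision, `text_to_computation` stands in for the text-to-computation module, and `sum_expert` stands in for a high-precision computation expert. The reasoning tokens are scripted rather than generated, so later reasoning does not actually condition on earlier results; the sketch shows only the control flow of alternating reasoning and computation inside one growing sequence.

```python
from typing import Callable, Iterable, List


def text_to_computation(text: str) -> List[float]:
    """Hypothetical stand-in for the text-to-computation module.

    Here it simply extracts the numbers mentioned in the recent reasoning text.
    """
    values: List[float] = []
    for word in text.replace(",", " ").split():
        try:
            values.append(float(word))
        except ValueError:
            continue  # not a number; skip it
    return values


def sum_expert(args: List[float]) -> float:
    """Hypothetical computation expert: an exact sum stands in for a precise solver."""
    return sum(args)


def run_chain(tokens: Iterable[str],
              route: Callable[[str], str],
              expert: Callable[[List[float]], float]) -> List[str]:
    """Alternate reasoning and computation inside one output sequence."""
    chain: List[str] = []
    pending: List[str] = []  # reasoning tokens since the last computation
    for tok in tokens:
        if route(tok) == "computation":
            args = text_to_computation(" ".join(pending))
            chain.append(repr(expert(args)))  # write the result back into the chain
            pending = []
        else:
            chain.append(tok)
            pending.append(tok)
    return chain


if __name__ == "__main__":
    def router(tok: str) -> str:
        return "computation" if tok == "<compute>" else "reasoning"

    scripted = ["add", "1.5", "and", "2.25", "<compute>", "then", "stop"]
    print(" ".join(run_chain(scripted, router, sum_expert)))
```

The separation mirrors the distinction the comment raises: the module maps text to expert inputs, while the expert only ever sees numbers.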

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment point by point below, indicating the revisions we will make to strengthen the presentation of the architecture and experimental results.


Referee: §3 (Architecture): The central claim that separately trained experts, text-to-computation module, and router integrate into a stable system at inference rests on an unverified assumption. No analysis of routing-error propagation, joint fine-tuning, or post-training alignment is provided, yet even modest mis-routing could cascade into incorrect high-precision results and undermine both the accuracy and efficiency claims.

Authors: We acknowledge that the manuscript does not contain an explicit analysis of routing-error propagation, joint fine-tuning, or post-training alignment. The separate-training strategy was deliberately chosen to preserve modularity, allowing each component (experts, text-to-computation module, and router) to be optimized independently before integration at inference. The empirical results across linear and nonlinear tasks show that PiERN achieves higher accuracy than fine-tuned baselines without observable cascading failures, providing indirect evidence of practical stability. To directly address the concern, we will add a dedicated discussion subsection in §3 on potential error propagation pathways and include a new ablation study that measures routing accuracy and its downstream effect on final computation-reasoning outcomes. revision: yes
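The ablation promised in this response could take roughly the following form. The sketch assumes per-token reference routes exist for each test example (the `gold_routes` field is a hypothetical annotation), then reports token-level routing accuracy alongside final-answer accuracy split by whether the routing trace was fully correct. A large gap between the two answer accuracies would indicate that misroutes propagate into wrong results.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Example:
    predicted_routes: List[str]  # router decision per token
    gold_routes: List[str]       # hypothetical reference decision per token
    answer_correct: bool         # whether the final answer was right


def _answer_accuracy(group: List[Example]) -> Optional[float]:
    return sum(e.answer_correct for e in group) / len(group) if group else None


def routing_ablation(examples: List[Example]) -> Dict[str, Optional[float]]:
    for e in examples:
        if len(e.predicted_routes) != len(e.gold_routes):
            raise ValueError("predicted and gold routes must align token by token")
    total = sum(len(e.gold_routes) for e in examples)
    hits = sum(p == g for e in examples
               for p, g in zip(e.predicted_routes, e.gold_routes))
    clean = [e for e in examples if e.predicted_routes == e.gold_routes]
    misrouted = [e for e in examples if e.predicted_routes != e.gold_routes]
    return {
        "token_routing_accuracy": hits / total if total else None,
        "answer_acc_clean_routing": _answer_accuracy(clean),
        "answer_acc_with_misroutes": _answer_accuracy(misrouted),
    }


if __name__ == "__main__":
    data = [
        Example(["r", "c", "r"], ["r", "c", "r"], answer_correct=True),
        Example(["r", "r", "r"], ["r", "c", "r"], answer_correct=False),
    ]
    print(routing_ablation(data))
```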
Referee: §4 (Experiments): The reported accuracy and efficiency gains are presented without error bars, number of random seeds, dataset sizes, or statistical significance tests. This information is load-bearing for the claim that PiERN outperforms both finetuned LLMs and multi-agent baselines; its absence prevents assessment of whether the improvements are reliable or sensitive to post-hoc experimental choices.

Authors: We agree that the absence of these statistical details limits the ability to evaluate result reliability. The current manuscript omitted error bars, seed counts, exact dataset sizes, and significance tests, which was an oversight. In the revised version we will report all results with standard deviations computed over multiple random seeds, explicitly state the dataset sizes used for each task, and include statistical significance tests (e.g., paired t-tests with p-values) comparing PiERN against the fine-tuning and multi-agent baselines. revision: yes
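For the paired tests mentioned in this response, the statistic itself is short to compute. The sketch below pairs scores by shared test item or shared seed (an assumption about how pairs would be formed) and returns t = mean(d) / (sd(d) / sqrt(n)) over the differences d. The p-value then comes from a t distribution with n - 1 degrees of freedom; SciPy's `scipy.stats.ttest_rel(a, b)` returns both the statistic and the p-value.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Paired-sample t statistic: mean(a - b) / (sd(a - b) / sqrt(n))."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Placeholder per-item scores for two systems evaluated on the same items.
    system_a = [3.0, 4.0, 5.0]
    system_b = [1.0, 1.0, 1.0]
    print(paired_t_statistic(system_a, system_b))  # compare against t with n - 1 df
```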

Circularity Check

0 steps flagged

No significant circularity in PiERN empirical architecture evaluation


The paper introduces the PiERN architecture for token-level routing between computation experts and reasoning in LLMs, and its claims rest on empirical comparisons of accuracy, latency, token usage, and energy against fine-tuning and multi-agent baselines. The abstract and description detail separate training of the experts, the text-to-computation module, and the router, followed by inference-time alternation within a single chain of thought, but they present no equations, derivations, or fitted parameters that would make the reported gains true by construction from the same inputs. No self-citation chains, uniqueness theorems, or ansatzes are invoked to force the central results; performance metrics are treated as experimental outcomes rather than as consequences of the architecture's definition. The claims therefore stand or fall on external benchmarks rather than on any self-referential derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The proposal rests on the unstated premise that separately training the experts and the router yields stable token-level switching at inference. The abstract quantifies no free parameters or axioms, and it offers no independent evidence for its one invented entity, PiERN itself.

invented entities (1)

Physically-isolated Experts Routing Network (PiERN) · no independent evidence
purpose: Endogenous integration of computation and reasoning via token-level routing
New architecture introduced in the abstract; no independent evidence supplied.

pith-pipeline@v0.9.0 · 5753 in / 1209 out tokens · 43093 ms · 2026-05-18T16:01:35.654442+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "PiERN endogenously integrates computational capabilities into neural networks after separately training experts, a text-to-computation module, and a router. At inference, the router directs computation and reasoning at the token level"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · reality_from_one_distinction · unclear

Paper passage: "stepwise training method that decouples the training processes of the high-precision scientific computation experts, the text-to-computation module, and the token router"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Appendix A.1.1: Data Description of Battery Capacity Prediction Task

The data input includes two main pieces of information. First is the time-series current data, which refers to 11 current values collected over a 2-hour period with a sampling interval of 12 minutes. Second is the time of the to-be-predicte...
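The sampling numbers in this appendix excerpt (11 current values, a 2-hour window, a 12-minute interval) are consistent if both endpoints of the window are sampled: 120 min / 12 min gives 10 intervals and therefore 11 samples. Endpoint inclusion is a reading of the numbers, not something the excerpt states explicitly; the quick check below assumes it.

```python
from typing import List


def sample_times(duration_min: int, interval_min: int) -> List[int]:
    """Sampling instants (in minutes) on a uniform grid including both endpoints."""
    if interval_min <= 0 or duration_min % interval_min != 0:
        raise ValueError("duration must be a positive whole number of intervals")
    return list(range(0, duration_min + 1, interval_min))


if __name__ == "__main__":
    times = sample_times(120, 12)  # 2-hour window, 12-minute interval
    print(len(times), times)
```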
