pith. machine review for the scientific record.

arxiv: 2605.08975 · v1 · submitted 2026-05-09 · 💻 cs.AI

Recognition: 2 theorem links

Latency Analysis and Optimization of Alpamayo 1 via Efficient Trajectory Generation

Pith reviewed 2026-05-12 01:47 UTC · model grok-4.3

classification 💻 cs.AI
keywords autonomous driving · end-to-end learning · trajectory prediction · reasoning models · latency optimization · diffusion models · Alpamayo 1

The pith

Redesigning Alpamayo 1 from multi-reasoning to single-reasoning plus kernel fixes cuts inference latency by 69.23 percent with no loss in trajectory diversity or quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines Alpamayo 1, a reasoning-based end-to-end autonomous driving system that generates multiple trajectories each paired with its own reasoning sequence. It shows that switching to a single shared reasoning sequence across all trajectories preserves the needed diversity of future behaviors while removing redundant computation. A second change removes copy operations and inefficient kernel launches inside the diffusion process used for action generation. Closed-loop and open-loop tests confirm that both changes together deliver the reported latency drop while trajectory quality metrics remain essentially unchanged.
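The architectural trade can be made concrete with a toy latency model. This is an illustration only: the component costs below are hypothetical placeholders, not measurements from the paper.

```python
# Toy latency model contrasting the two reasoning architectures.
# In multi-reasoning, each trajectory pays for its own reasoning
# sequence; in single-reasoning, one shared sequence is amortized
# across all trajectories. All numbers are made up for illustration.

def multi_reasoning_latency(n_traj, t_reason, t_action):
    """One reasoning sequence decoded per trajectory."""
    return n_traj * (t_reason + t_action)

def single_reasoning_latency(n_traj, t_reason, t_action):
    """One shared reasoning sequence for all trajectories."""
    return t_reason + n_traj * t_action

# Example: 6 trajectories, reasoning dominating per-trajectory cost.
n, t_r, t_a = 6, 1.5, 0.25
print(multi_reasoning_latency(n, t_r, t_a))   # 10.5
print(single_reasoning_latency(n, t_r, t_a))  # 3.0
```

The model shows why the redundancy scales with the number of trajectories: the reasoning term is paid once instead of N times, while the per-trajectory action-generation term is unchanged.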

Core claim

Alpamayo 1's multi-reasoning design can be replaced by single-reasoning without meaningful degradation of trajectory diversity, and its diffusion-based action generator can be accelerated by eliminating inter-block copy overhead and inefficient kernel execution, yielding a measured 69.23 percent reduction in inference latency while trajectory diversity and prediction quality stay intact.
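As a sanity check, the end-to-end latencies quoted in the paper's Figure 12 caption (13.33 s baseline, 4.10 s optimized) are consistent with the headline percentage up to rounding:

```python
# Recompute the headline reduction from the end-to-end latencies
# given in the Figure 12 caption (13.33 s -> 4.10 s).
baseline_s = 13.33
optimized_s = 4.10
reduction_pct = (1 - optimized_s / baseline_s) * 100
print(f"{reduction_pct:.2f}%")  # prints 69.24% -- consistent with the
# reported 69.23%, which presumably uses unrounded measurements
```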

What carries the argument

Single-reasoning architecture that shares one reasoning sequence across multiple trajectories together with diffusion kernel changes that remove unnecessary copy operations and inter-block overhead.

If this is right

  • Single-reasoning maintains measured trajectory diversity and prediction quality across the evaluated open- and closed-loop scenarios.
  • Eliminating inter-block copy operations and kernel inefficiencies directly reduces diffusion inference time.
  • The combined optimizations produce a 69.23 percent end-to-end latency reduction without altering output quality.
  • Reasoning-based E2E systems can jointly optimize architecture and runtime execution for lower latency.
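The runtime half of the claim rests on cutting fixed per-kernel overhead inside the diffusion loop. A minimal cost model (with hypothetical constants; the real gains come from CUDA-level changes such as removing copies and replaying many kernels from one captured launch) shows why this matters when each denoising step issues many small kernels:

```python
# Illustrative overhead model for a diffusion sampler. Each denoising
# step launches several small kernels; every eager launch (plus any
# inter-block copy) carries a fixed CPU-side cost. Constants are
# invented for illustration, not measured from Alpamayo 1.

def eager_latency(steps, kernels_per_step, t_launch, t_copy, t_compute):
    """Every kernel pays launch overhead; each step pays a copy cost."""
    per_step = kernels_per_step * (t_launch + t_compute) + t_copy
    return steps * per_step

def graphed_latency(steps, kernels_per_step, t_replay, t_compute):
    """A captured graph replays all kernels with one launch, no copies."""
    per_step = t_replay + kernels_per_step * t_compute
    return steps * per_step

args = dict(steps=20, kernels_per_step=12, t_compute=0.05)
eager = eager_latency(t_launch=0.02, t_copy=0.10, **args)
graphed = graphed_latency(t_replay=0.02, **args)
print(eager, graphed)  # overhead-bound eager path vs. graphed path
```

Under this model, the compute term is identical in both paths; the entire difference is launch and copy overhead, which is why removing it changes latency without changing outputs.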

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Other multi-reasoning E2E driving models may achieve similar efficiency gains by adopting shared reasoning if their scenario coverage matches the tested set.
  • The latency reduction could enable higher-frequency replanning loops in deployed autonomous vehicles.
  • Extending the single-reasoning approach to longer-horizon or multi-agent prediction would test whether the diversity preservation generalizes.

Load-bearing premise

That the tested driving scenarios and diversity metrics fully represent the range of future behaviors needed for safe real-world decisions.

What would settle it

An open- or closed-loop test case in which single-reasoning trajectories miss a critical future behavior that multi-reasoning would have included, producing a measurably unsafe control decision.

Figures

Figures reproduced from arXiv: 2605.08975 by Jangwoon Park, Jong-Chan Kim, Namcheol Lee, Seongsoo Hong, Sol Ahn, Yoonsu Lee, Yunseong Jeon.

Figure 1: Runtime architecture of Alpamayo 1 (hereinafter Alpamayo)
Figure 2: Comparison of multi-reasoning and single-reasoning
Figure 4: Latency analysis for single-reasoning architecture
Figure 5: Relation between number of CoT tokens and decoding
Figure 6: Optimizing action generation
Figure 7: As shown on the left-hand side, a CPU thread launches
Figure 7: CUDA graph capture
Figure 8: Trajectory diversity comparison between multi-reasoning and single-reasoning with six trajectories with CoT messages
Figure 9: Proportion of action generation latency with varying
Figure 10: Action generation latency optimization results
Figure 12: Overall latency optimization result (13.33 s to 4.10 s)
Original abstract

Reasoning-based end-to-end (E2E) autonomous driving has recently emerged as a promising approach to improving the interpretability of driving decisions as it can generate human-readable reasoning together with predicted trajectories. Such approaches commonly generate multiple trajectories to capture diverse future behaviors, and they fall into two categories: (1) multi-reasoning, where one reasoning sequence is generated per trajectory, and (2) single-reasoning, where a single reasoning is shared across all trajectories. The former offers richer diversity at the cost of redundant computation, while the latter is more efficient but is often assumed to sacrifice diversity. Alpamayo 1, a representative system, adopts the multi-reasoning approach and achieves competitive trajectory prediction performance. However, the efficiency of this design remains largely unexplored, making it a well-motivated subject for investigation. In this paper, we systematically analyze and improve Alpamayo 1 in two ways. First, we reduce inference latency while preserving trajectory diversity by redesigning Alpamayo 1 into a single-reasoning system. Through extensive experiments, we find that replacing multi-reasoning with single-reasoning does not meaningfully degrade trajectory diversity. Second, we accelerate diffusion-based action generation by eliminating inter-block overhead arising from unnecessary copy operations and inefficient kernel execution. Through closed-loop and open-loop experiments, we validate both optimizations, demonstrating a 69.23% reduction in inference latency while maintaining trajectory diversity and prediction quality. These results highlight the importance of jointly analyzing system architecture and runtime execution to improve the efficiency of reasoning-based E2E AD systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes latency in Alpamayo 1, a reasoning-based E2E autonomous driving system that generates multiple trajectories. It redesigns the system from multi-reasoning (one reasoning per trajectory) to single-reasoning (shared reasoning) to cut redundant computation while claiming to preserve diversity, and optimizes diffusion-based action generation by removing inter-block copy overhead and inefficient kernels. Closed-loop and open-loop experiments are reported to show a 69.23% inference latency reduction with no meaningful loss in trajectory diversity or prediction quality.

Significance. If the empirical claims hold with adequate validation, the work offers practical value for real-time deployment of interpretable E2E AD systems by showing that architectural simplification and runtime tuning can be combined without performance trade-offs. The absence of circular reasoning or invented parameters is a strength. However, the low level of detail on metrics, baselines, and diversity evaluation in the abstract limits assessment of whether the diversity preservation is robust enough for safety-critical decisions.

major comments (2)
  1. [Abstract] The central 69.23% latency-reduction claim, with preserved diversity and quality, is presented without specific metrics, baselines, diversity measures (e.g., variance, entropy), error bars, or implementation details, preventing verification of the load-bearing experimental validation.
  2. [Experiments] The claim that single-reasoning does not meaningfully degrade trajectory diversity rests on the unproven adequacy of the chosen quantitative metrics and tested scenarios. Standard diversity scores can remain high while missing coverage of rare safety-critical futures (e.g., specific collision-avoidance maneuvers); the manuscript should include adversarial or failure-mode analysis to substantiate this.
minor comments (2)
  1. [Abstract] Adding at least one key quantitative result (with standard deviation or confidence interval) would improve informativeness without substantially altering length.
  2. [Throughout] Ensure all tables and figures include clear captions, axis labels, and legends; consider adding a reproducibility statement regarding code or data availability.
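The referee's second major point can be made concrete: an aggregate dispersion score can be identical for two trajectory sets even when one omits a qualitatively distinct maneuver. A toy example using mean pairwise endpoint distance, one common (and coarse) diversity proxy, not necessarily the metric used in the paper:

```python
import math

def endpoint_spread(trajs):
    """Mean pairwise distance between trajectory endpoints --
    a simple, coarse diversity proxy."""
    ends = [t[-1] for t in trajs]
    dists = [math.dist(a, b) for i, a in enumerate(ends)
             for b in ends[i + 1:]]
    return sum(dists) / len(dists)

# Set A fans out laterally: it includes left/right swerve options.
set_a = [[(0, 0), (-4, 10)], [(0, 0), (0, 10)], [(0, 0), (4, 10)]]
# Set B only varies speed along one lane: no swerve option at all.
set_b = [[(0, 0), (0, 6)], [(0, 0), (0, 10)], [(0, 0), (0, 14)]]

print(endpoint_spread(set_a))  # 5.333...
print(endpoint_spread(set_b))  # 5.333... -- same score, but set B has
# no lateral (collision-avoidance) coverage
```

Both sets score identically under this metric, which is exactly why failure-mode analysis is needed on top of aggregate diversity numbers.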

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We agree that the abstract would benefit from greater specificity to aid verification of our claims. Regarding the experiments, we will strengthen the discussion of our diversity metrics and their coverage of relevant scenarios. Our point-by-point responses to the major comments are provided below.

Point-by-point responses
  1. Referee: [Abstract] The central 69.23% latency-reduction claim, with preserved diversity and quality, is presented without specific metrics, baselines, diversity measures (e.g., variance, entropy), error bars, or implementation details, preventing verification of the load-bearing experimental validation.

    Authors: We concur that the abstract is currently too concise and omits key quantitative details that would allow readers to assess the claims more readily. In the revised version, we will expand the abstract to report the baseline and optimized latency values, specify the diversity measures used in our evaluation, identify the primary baselines (the original multi-reasoning Alpamayo 1), and indicate that results are reported with error bars from repeated runs. These additions will improve verifiability while preserving the abstract's length and focus. revision: yes

  2. Referee: [Experiments] The claim that single-reasoning does not meaningfully degrade trajectory diversity rests on the unproven adequacy of the chosen quantitative metrics and tested scenarios. Standard diversity scores can remain high while missing coverage of rare safety-critical futures (e.g., specific collision-avoidance maneuvers); the manuscript should include adversarial or failure-mode analysis to substantiate this.

    Authors: We appreciate the referee's emphasis on the limitations of aggregate diversity metrics. Our evaluation employs quantitative diversity metrics across a broad collection of open-loop and closed-loop scenarios drawn from standard autonomous driving datasets, which encompass varied urban, highway, and edge-case conditions. The absence of degradation in both diversity scores and closed-loop prediction quality provides evidence that critical behaviors are retained. We acknowledge that dedicated adversarial testing of rare failure modes was not performed. In revision, we will add a dedicated paragraph discussing the strengths and potential gaps of our metrics, include qualitative examples of preserved safety-critical maneuvers, and present a brief analysis of performance on the most challenging subsets of our test data. This will be a partial revision that clarifies and extends the existing experimental evidence. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical system redesign and measurement

full rationale

The paper contains no mathematical derivations, equations, fitted parameters, or predictive models. Its central results (69.23% latency reduction and preserved diversity) are obtained directly from runtime measurements on redesigned single-reasoning architecture and kernel optimizations, validated in closed- and open-loop experiments. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled, and no quantity is renamed as a prediction after being fitted to itself. The work is self-contained empirical engineering with external benchmarks (latency, diversity metrics, prediction quality) that do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical systems optimization paper with no mathematical derivations or new theoretical constructs. No free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5610 in / 1300 out tokens · 74625 ms · 2026-05-12T01:47:03.606230+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 8 internal anchors

[1] F. Codevilla et al., "End-to-end driving via conditional imitation learning," arXiv preprint, 2018. Available: https://arxiv.org/abs/1710.02410 [Accessed: March 25, 2026]
[2] B. Jiang et al., "VAD: Vectorized scene representation for efficient autonomous driving," arXiv preprint, 2023. Available: https://arxiv.org/abs/2303.12077 [Accessed: March 25, 2026]
[3] K. Chitta et al., "TransFuser: Imitation with transformer-based sensor fusion for autonomous driving," arXiv preprint, 2022. Available: https://arxiv.org/abs/2205.15997 [Accessed: March 25, 2026]
[4] X. Jia et al., "DriveTransformer: Unified transformer for scalable end-to-end autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2503.07656 [Accessed: March 25, 2026]
[5] Y. Hu et al., "Planning-oriented autonomous driving," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[6] B. Liao et al., "DiffusionDrive: Truncated diffusion model for end-to-end autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2411.15139 [Accessed: March 25, 2026]
[7] Y. Wang et al., "Alpamayo-R1: Bridging reasoning and action prediction for generalizable autonomous driving in the long tail," arXiv preprint. Available: https://arxiv.org/abs/2511.00088 [Accessed: March 25, 2026]
[9] J.-J. Hwang et al., "EMMA: End-to-end multimodal model for autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2410.2326 [Accessed: March 25, 2026]
[10] Z. Xu et al., "DriveGPT4: Interpretable end-to-end autonomous driving via large language model," IEEE Robotics and Automation Letters, 2024.
[11] J. Mao et al., "GPT-Driver: Learning to drive with GPT," arXiv preprint. Available: https://arxiv.org/abs/2310.01415 [Accessed: March 25, 2026]
[13] X. Tian et al., "DriveVLM: The convergence of autonomous driving and large vision-language models," arXiv preprint, 2024. Available: https://arxiv.org/abs/2402.12289 [Accessed: March 25, 2026]
[14] B. Jiang et al., "Senna: Bridging large vision-language models and end-to-end autonomous driving," arXiv preprint, 2024. Available: https://arxiv.org/abs/2410.22313 [Accessed: March 25, 2026]
[15] Z. Zhou et al., "AutoVLA: A vision-language-action model for end-to-end autonomous driving with adaptive reasoning and reinforcement fine-tuning," in Advances in Neural Information Processing Systems (NeurIPS), 2025.
[16] Z. Yuan et al., "AutoDrive-R2: Incentivizing reasoning and self-reflection capacity for VLA model in autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2509.01944 [Accessed: March 25, 2026]
[18] Y. Luo et al., "AdaThinkDrive: Adaptive thinking via reinforcement learning for autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2509.13769 [Accessed: March 25, 2026]
[19] S. Zeng et al., "FutureSightDrive: Thinking visually with spatio-temporal CoT for autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2505.17685 [Accessed: March 25, 2026]
[20] X. Liu et al., "ReasonPlan: Unified scene prediction and decision reasoning for closed-loop autonomous driving," in 9th Conference on Robot Learning, 2025.
[21] H. Fu et al., "ORION: A holistic end-to-end autonomous driving framework by vision-language instructed action generation," arXiv preprint. Available: https://arxiv.org/abs/2503.19755 [Accessed: March 25, 2026]
[23] A. Jiang et al., "DiffVLA: Vision-language guided diffusion planning for autonomous driving," arXiv preprint, 2025. Available: https://arxiv.org/abs/2505.19381 [Accessed: March 25, 2026]
[24] A. Jiang et al., "IRL-VLA: Training an vision-language-action policy via reward world model," arXiv preprint, 2025. Available: https://arxiv.org/abs/2508.06571 [Accessed: March 25, 2026]
[25] Q. Peng et al., "ColaVLA: Leveraging cognitive latent reasoning for hierarchical parallel trajectory planning in autonomous driving," arXiv preprint, 2026. Available: https://arxiv.org/abs/2512.22939 [Accessed: March 25, 2026]
[26] NVIDIA, "PhysicalAI autonomous vehicles dataset." Available: https://huggingface.co/datasets/nvidia/PhysicalAI-Autonomous-Vehicles [Accessed: March 25, 2026]
[27] NVlabs, "AlpaSim: A modular, lightweight, and data-driven research simulator for autonomous driving," 2025. Available: https://github.com/NVlabs/alpasim [Accessed: March 25, 2026]
[28] NVlabs, "Alpamayo 1," 2025. Available: https://github.com/NVlabs/alpamayo [Accessed: March 25, 2026]
[29] A. Paszke et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, vol. 32, 2019. Available: https://arxiv.org/abs/1912.01703
[30] T. Wolf et al., "Transformers: State-of-the-art natural language processing," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45. Available: https://arxiv.org/abs/1910.03771
[31] S. Bai et al., "Qwen3-VL technical report," arXiv preprint, 2025. Available: https://arxiv.org/abs/2511.21631 [Accessed: March 25, 2026]
[32] A. Azzolini et al., "Cosmos-Reason1: From physical common sense to embodied reasoning," arXiv preprint, 2025. Available: https://arxiv.org/abs/2503.15558 [Accessed: March 25, 2026]
[33] A. Yang et al., "Qwen2.5 technical report," arXiv preprint, 2025. Available: https://arxiv.org/abs/2412.15115 [Accessed: March 25, 2026]
[34] L. Meng et al., "DeepStack: Deeply stacking visual tokens is surprisingly simple and effective for LMMs," in NeurIPS, 2024.
[35] N. Muennighoff et al., "s1: Simple test-time scaling," arXiv preprint. Available: https://arxiv.org/abs/2501.19393 [Accessed: March 25, 2026]
[37] C. Snell et al., "Scaling LLM test-time compute optimally can be more effective than scaling model parameters," arXiv preprint, 2024. Available: https://arxiv.org/abs/2408.03314 [Accessed: March 25, 2026]
[38] NVlabs, "Alpamayo 1.5," 2026. Available: https://github.com/NVlabs/alpamayo1.5 [Accessed: March 25, 2026]