pith. machine review for the scientific record.

arXiv preprint arXiv:2509.26626

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

years: 2026 (4)

verdicts: unverdicted (4)


citing papers explorer

Showing 4 of 4 citing papers.

  • Test-Time Learning with an Evolving Library cs.LG · 2026-05-14 · unverdicted · polarity none · novelty 7.0 · ref 6

    EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision. (A toy sketch of this loop appears after the list.)

  • ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · polarity none · novelty 6.0 · ref 152

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  • QED-Nano: Teaching a Tiny Model to Prove Hard Theorems cs.AI · 2026-04-06 · unverdicted · polarity none · ref 3

    A 4B model post-trained with SFT, RL, and a reasoning cache surpasses larger open models and approaches proprietary ones on Olympiad proof generation.

  • Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models cs.CL · 2026-04-07 · unverdicted · polarity none · ref 25

    Sequential sampling conditions each new sample on prior answers, and the resulting lack of exploration is the primary reason parallel sampling outperforms it in large reasoning models. (A minimal sketch contrasting the two regimes appears after the list.)
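The EvoLib summary above names a concrete loop: reuse stored abstractions at inference time, distill a new one from each trajectory, and let the collection evolve with no weight updates. A minimal sketch of that general pattern follows; every function name, prompt string, and the 8-item retrieval window are invented for illustration and are not the paper's actual method or API. `llm` stands for any text-in, text-out model call.

```python
# Illustrative sketch only: a guess at the "evolving library" pattern named in
# the EvoLib summary, not the paper's implementation. All names are invented.

def solve_with_library(problem: str, library: list[str], llm) -> str:
    """Reuse: prepend stored abstractions to the prompt, then solve."""
    hints = "\n".join(f"- {a}" for a in library[-8:])  # most recent abstractions
    prompt = f"Known strategies:\n{hints}\n\nProblem: {problem}\nSolution:"
    return llm(prompt)  # one inference trajectory

def distill_abstraction(trajectory: str, llm) -> str:
    """Accumulate: compress a solved trajectory into a reusable one-line rule."""
    return llm(f"State the key reusable strategy in one sentence:\n{trajectory}")

def test_time_loop(problems: list[str], llm) -> list[str]:
    """Evolve: the library grows across queries; model weights never change."""
    library: list[str] = []
    answers = []
    for p in problems:
        trajectory = solve_with_library(p, library, llm)
        answers.append(trajectory)
        library.append(distill_abstraction(trajectory, llm))
    return answers
```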
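The last entry's finding is easy to state in code: the two regimes differ only in what each sample is conditioned on. A minimal sketch, with `sample` standing in for any stochastic model call and the prompt format invented for this illustration:

```python
# Illustrative sketch only: the contrast the cited paper studies, reduced to
# what each draw is conditioned on.

def parallel_sampling(problem: str, sample, k: int) -> list[str]:
    # k independent draws: each one explores the answer space from scratch.
    return [sample(problem) for _ in range(k)]

def sequential_sampling(problem: str, sample, k: int) -> list[str]:
    # Each draw sees all prior answers, so later samples tend to cluster near
    # earlier ones; per the cited finding, this reduced exploration is the
    # main source of the performance gap.
    answers: list[str] = []
    for _ in range(k):
        context = problem + "\nPrevious attempts:\n" + "\n".join(answers)
        answers.append(sample(context))
    return answers
```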