pith. machine review for the scientific record.

arxiv: 2512.07461 · v3 · submitted 2025-12-08 · 💻 cs.CL

Recognition: unknown

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Authors on Pith: no claims yet
classification 💻 cs.CL
keywords: parallel reasoning · native execution · genuine · model · reasoner · self-distilled
Abstract

We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors the memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups of up to 4.6x. Unlike prior baselines that often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.
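The abstract does not spell out PAPO's update rule, so the following is only a minimal sketch of the general shape of branch-level credit assignment in a parallel execution graph: spawn several reasoning branches at a decomposition point, score each with a verifier, and reinforce branches that beat the group average (a group-relative baseline, in the style of GRPO-like RL). Every name here (`Branch`, `spawn_branches`, `branch_advantages`) is a hypothetical stand-in, not NPR's actual API.

```python
import random
import statistics
from dataclasses import dataclass

@dataclass
class Branch:
    """One reasoning branch spawned at a decomposition point (hypothetical)."""
    trace: list
    reward: float = 0.0

def spawn_branches(prompt, n):
    """Stand-in for the policy proposing n parallel sub-derivations."""
    return [Branch(trace=[f"{prompt} :: branch {i}"]) for i in range(n)]

def verify(branch):
    """Stand-in verifier reward (e.g. answer correctness); random here."""
    return random.random()

def branch_advantages(branches):
    """Group-relative advantage: each branch's reward minus the group
    mean, so better-than-average branches are reinforced and worse
    ones are penalized."""
    for b in branches:
        b.reward = verify(b)
    mean = statistics.mean(b.reward for b in branches)
    return [b.reward - mean for b in branches]

branches = spawn_branches("decompose and solve the problem", n=4)
for b, adv in zip(branches, branch_advantages(branches)):
    print(b.trace[0], f"advantage={adv:+.3f}")
```

In a real trainer these advantages would weight the log-probabilities of the tokens each branch generated; the point of the sketch is only that credit is assigned per branch within the graph, rather than to one flat autoregressive trace.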

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LACE: Lattice Attention for Cross-thread Exploration

    cs.AI · 2026-04 · unverdicted · novelty 6.0

    LACE enables parallel reasoning paths in LLMs to communicate via lattice attention and error-correct using synthetic training data, improving accuracy by over 7 points over standard parallel search.

  2. LACE: Lattice Attention for Cross-thread Exploration

    cs.AI · 2026-04 · unverdicted · novelty 5.0

    LACE adds lattice attention to let parallel LLM reasoning threads interact and correct errors, raising accuracy over 7 points versus standard independent sampling.

  3. LACE: Lattice Attention for Cross-thread Exploration

    cs.AI · 2026-04 · unverdicted · novelty 5.0

    LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.
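None of the LACE summaries above define "lattice attention", so the sketch below is only a guess at the core idea: parallel decoding threads periodically attend over one another's hidden states, so an error in one path can be corrected by information from the others. The function name and tensor layout are assumptions, not LACE's actual design.

```python
import torch
import torch.nn.functional as F

def cross_thread_attention(h: torch.Tensor) -> torch.Tensor:
    """Hypothetical cross-thread mixing step.

    h: (threads, seq, dim) hidden states of parallel reasoning threads.
    Each thread's last-token state attends over the last-token states
    of all threads, then the mixed state is written back in place.
    """
    q = h[:, -1, :]                                  # (threads, dim) queries
    k = v = q                                        # attention across threads
    att = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)
    mixed = att @ v                                  # (threads, dim)
    out = h.clone()
    out[:, -1, :] = mixed
    return out

h = torch.randn(3, 5, 8)                 # 3 threads, 5 tokens, dim 8
print(cross_thread_attention(h).shape)   # torch.Size([3, 5, 8])
```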