pith. sign in

hub Canonical reference

arXiv preprint arXiv:2506.19767 , year=

Canonical reference. 100% of citing Pith papers cite this work as background.

12 Pith papers citing it
Background 100% of classified citations

hub tools

citation-role summary

background 5

citation-polarity summary

years

2026 7 2025 5

roles

background 5

polarities

background 5

representative citing papers

Learning Agentic Policy from Action Guidance

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

ActGuide-RL uses human action data as plan-style guidance in mixed-policy RL to overcome exploration barriers in LLM agents, matching SFT+RL performance on search benchmarks without cold-start training.

Near-Future Policy Optimization

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

NPO uses a policy's own near-future checkpoint as auxiliary trajectories to maximize effective learning signal S = Q/V, improving performance from 57.88 to 63.15 on Qwen3-VL-8B-Instruct with GRPO while accelerating convergence.

AIPO: Learning to Reason from Active Interaction

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

AIPO adds active multi-agent consultation (Verify, Knowledge, Reasoning agents) plus custom importance sampling to RLVR training so LLMs expand their reasoning boundary and then operate without the agents.

citing papers explorer

Showing 12 of 12 citing papers.