DCPO: Dynamic Clipping Policy Optimization

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · other: 1

citation-polarity summary

years

2026: 7 · 2025: 1

polarities

background: 2 · unclear: 1

representative citing papers

Revisiting DAgger in the Era of LLM-Agents

cs.LG · 2026-05-13 · conditional · novelty 6.0

DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.

SSPO: Subsentence-level Policy Optimization

cs.CL · 2025-11-06 · unverdicted · novelty 6.0

SSPO computes policy importance ratios at the subsentence level with entropy-adjusted clipping bounds, yielding higher average scores than GRPO and GSPO on math reasoning benchmarks with Qwen models.
