
Canonical reference

Process reward models for LLM agents: Practical framework and directions

Canonical reference. 83% of citing Pith papers cite this work as background.

11 Pith papers citing it
Background 83% of classified citations

citation-role summary

background 6

citation-polarity summary

years

2026: 7 · 2025: 4

roles

background 6

polarities

background: 5 · unclear: 1


representative citing papers

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.

Self-evolving LLM agents with in-distribution Optimization

cs.LG · 2026-06-05 · unverdicted · novelty 5.0

Q-Evolve unifies automatic process-reward labeling via advantage estimation and behavior-proximal policy optimization inside an in-distribution RL loop to enable self-evolving LLM agents on interactive tasks.
