Canonical reference

Process reward models for LLM agents: Practical framework and directions

Canonical reference. 83% of citing Pith papers cite this work as background.

8 Pith papers citing it
Background 83% of classified citations

citation-role summary

background 6

citation-polarity summary

years

2026: 4, 2025: 4

polarities

background: 5, unclear: 1

representative citing papers

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.
