The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

Canonical reference. 80% of citing Pith papers cite this work as background.

18 Pith papers citing it
Background: 80% of classified citations

Citation-role summary: background 5

Citation-polarity summary: background 4, unclear 1

Citing papers by year: 2026 (15), 2025 (3)

Representative citing papers

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.

What Is Preference Optimization Doing, and Why?

cs.LG · 2025-11-30 · unverdicted · novelty 5.0

Gradient analysis and ablations show DPO and PPO have different target directions and component roles in preference optimization for LLMs.
