R-prm: Reasoning-driven pro- cess reward modeling.arXiv preprint arXiv:2503.21295

R-prm: Reasoning-driven process reward modeling , author= · 2025 · arXiv 2503.21295

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

cs.AI · 2025-10-16 · unverdicted · novelty 7.0

ToolPRM provides fine-grained intra-call process supervision via a new dataset and reward model, outperforming outcome and coarse-grained alternatives on function-calling benchmarks.

Reinforcement Learning without Ground-Truth Solutions can Improve LLMs

cs.LG · 2026-06-25 · unverdicted · novelty 6.0

RiVER applies calibrated ranking rewards from execution scores to train LLMs on score-based tasks without ground-truth, producing gains on both heuristic contests and exact-solution coding benchmarks.

MARD: Mirror-Augmented Reasoning Distillation for Mechanism-Level Drug-Drug Interaction Prediction

cs.CL · 2026-06-10 · unverdicted · novelty 6.0

MARD-7B outperforms baselines and GPT-4o on novel drug pairs for mechanism-level DDI prediction via a new distillation pipeline with verifiable process rewards and releases all resources.

From Failure to Feedback: Group Revision Unlocks Hard Cases in Object-Level Grounding

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

A group-revision paradigm for GRPO-based RL fine-tuning of VLMs converts failure responses into improvement signals that refine rewards and advantages, yielding gains on referring segmentation, REC, and counting benchmarks.

Pause or Fabricate? Training Language Models for Grounded Reasoning

cs.CL · 2026-04-21 · conditional · novelty 6.0

GRIL uses stage-specific RL rewards to train LLMs to detect missing premises, pause proactively, and resume grounded reasoning after clarification, yielding up to 45% better premise detection and 30% higher task success on insufficient math datasets.

Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing

cs.LG · 2026-04-17 · unverdicted · novelty 6.0

PRJA achieves 83.6% average success injecting harmful content into LRM reasoning chains on five QA datasets without altering final answers.

From System 1 to System 2: A Survey of Reasoning Large Language Models

cs.AI · 2025-02-24 · accept · novelty 3.0

The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.

citing papers explorer

Showing 1 of 1 citing paper after filters.

From System 1 to System 2: A Survey of Reasoning Large Language Models cs.AI · 2025-02-24 · accept · none · ref 166
The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.

R-prm: Reasoning-driven pro- cess reward modeling.arXiv preprint arXiv:2503.21295

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer