S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning. arXiv preprint arXiv:2502.12853

Ruotian Ma, Peisong Wang, Cheng Liu, Xingyan Liu, Jiaqi Chen, Bang Zhang, Xin Zhou, Nan Du, Jia Li · 2025 · arXiv 2502.12853

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering

cs.LG · 2026-05-20 · conditional · novelty 7.0

VerifySteer selectively steers hidden states at paragraph boundaries using latent correctness signals to control verifier strictness and outperform baselines on ProcessBench and Hard2Verify with lower compute.

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

SAri-RFT applies GRPO-based reinforcement fine-tuning to LVLMs on novel two-term and three-term visual semantic arithmetic tasks, reaching SOTA on the new IRPD dataset and Visual7W-Telling.

From System 1 to System 2: A Survey of Reasoning Large Language Models

cs.AI · 2025-02-24 · accept · novelty 3.0

The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

cs.LG · 2026-04-19

SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning

cs.AI · 2026-04-11

citing papers explorer

Showing 5 of 5 citing papers.

The Hidden Signal of Verifier Strictness: Controlling and Improving Step-Wise Verification via Selective Latent Steering
cs.LG · 2026-05-20 · conditional · none · ref 11

Multi-modal Reasoning with LLMs for Visual Semantic Arithmetic
cs.AI · 2026-04-21 · unverdicted · none · ref 14

From System 1 to System 2: A Survey of Reasoning Large Language Models
cs.AI · 2025-02-24 · accept · none · ref 200

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning
cs.LG · 2026-04-19 · unreviewed · ref 13

SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning
cs.AI · 2026-04-11 · unreviewed · ref 12
