John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov. CoRR.

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

TRN-R1-Zero: Text-rich Network Reasoning via LLMs with Reinforcement Learning Only

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

TRN-R1-Zero is an RL-only post-training method that lets LLMs perform zero-shot node, edge, and graph reasoning on text-rich networks without supervised data or larger-model distillation.

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

citing papers explorer


TRN-R1-Zero: Text-rich Network Reasoning via LLMs with Reinforcement Learning Only

cs.CL · 2026-04-21 · unverdicted · none · ref 16

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · none · ref 279
