DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 2025

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. · 2025

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

ContactPrompt uses part-wise vertex grids and multi-stage part-conditioned reasoning in MLLMs to achieve training-free dense hand contact estimation that outperforms prior supervised methods.

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning

cs.LG · 2026-05-17 · conditional · novelty 6.0

Mu-GRPO enables substantially more off-policy GRPO training for LLMs via relaxed clipping and negative-advantage veto in large staged batches, matching standard GRPO performance at ~2x training speed.

citing papers explorer

Showing 2 of 2 citing papers.

Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models
cs.CV · 2026-05-07 · unverdicted · none · ref 11
How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
cs.LG · 2026-05-17 · conditional · none · ref 5
