Transactions on Machine Learning Research

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models · 2024

2 Pith papers cite this work.

representative citing papers

Multi-Rollout On-Policy Distillation via Peer Successes and Failures

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · ref 24

MOPD improves on-policy distillation for LLMs by using peer successes as positive patterns and peer failures as negative examples, producing more informative teacher signals.

GRLO: Towards Generalizable Reinforcement Learning in Open-Ended Environments from Zero

cs.LG · 2026-05-14 · unverdicted · novelty 6.0 · ref 38

GRLO shows that RLHF from scratch on 5K open-ended prompts raises average performance across domains from 24.1 to 63.1 on Qwen3-4B-Base. It uses 46x less data and 68x less compute than in-domain RLVR while remaining competitive with heavily post-trained models.