Reinforcement learning with sparse rewards using guidance from offline demonstration. arXiv preprint arXiv:2202.04628

Rengarajan, D · arXiv 2202.04628

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering

cs.LG · 2026-02-08 · unverdicted · novelty 5.0

AceGRPO trains 30B-parameter LLM agents to achieve 100% valid submissions and competitive performance on MLE-Bench-Lite through evolving data buffers and adaptive task sampling.

citing papers explorer

Showing 1 of 1 citing paper.

AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering

cs.LG · 2026-02-08 · unverdicted · none · ref 12

AceGRPO trains 30B-parameter LLM agents to achieve 100% valid submissions and competitive performance on MLE-Bench-Lite through evolving data buffers and adaptive task sampling.

