Title: not yet resolved
Two Pith papers cite this work. Polarity classification is still being indexed.
Citation roles: background (1)
Citation polarities: background (1)
Citing-paper years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- sGPO: Trading Inference FLOPs for Training Efficiency in RLVR
sGPO uses an initial-policy success-rate profiling pass to adaptively set rollout group sizes, filter data, and build a curriculum, cutting total RLVR training compute by 3x while matching baseline performance.
- Multilevel neural networks with dual-stage feature fusion for human activity recognition
Multilevel CNN-LSTM architectures using both late and intermediate feature fusion achieve higher accuracy in human activity recognition than late fusion alone on two benchmark datasets.