D 6-Batch Pipelined Overlapping Execution

In this appendix we present the algorithm of the fine-grained pipeline orchestration

The forward pass of batch (i+2)’s dense model to overlap with the backward pass of batch (i+1)’s sparse model

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation

cs.DC · 2026-05-13 · unverdicted · novelty 6.0

TurboGR trains up to 0.2B-parameter generative recommendation models on Ascend NPUs at 54.71% MFU with 0.97 near-linear scalability via jagged acceleration, hierarchical parallelism, and negative sampling optimizations.



