High-performance, distributed training of large-scale deep learning recommendation models

· 2021 · arXiv 2104.05158

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

cs.DC · 2023-04-21 · unverdicted · novelty 6.0

PyTorch Fully Sharded Data Parallel enables training of significantly larger models than Distributed Data Parallel with comparable speed and near-linear TFLOPS scaling.

citing papers explorer

Showing 2 of 2 citing papers.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale cs.LG · 2026-05-11 · unverdicted · none · ref 53 · 2 links
LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel cs.DC · 2023-04-21 · unverdicted · none · ref 20
PyTorch Fully Sharded Data Parallel enables training of significantly larger models than Distributed Data Parallel with comparable speed and near-linear TFLOPS scaling.

High-performance, distributed training of large-scale deep learning recommendation models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer