Deep Learning Recommendation Model for Personalization and Recommendation Systems
With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
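The architecture sketched in the abstract (embedding tables for categorical features, a bottom MLP for dense features, explicit pairwise feature interactions, and a top MLP) can be illustrated with a minimal NumPy forward pass. This is a simplified sketch, not the paper's PyTorch/Caffe2 implementation; all sizes, weights, and names below are illustrative, and the final sigmoid/loss is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): 3 categorical features,
# dense input of width 4, embedding dimension D = 8.
NUM_TABLES, VOCAB, DENSE_IN, D = 3, 100, 4, 8

# One embedding table per categorical feature; in the paper's scheme these
# are the model-parallel components (sharded across devices).
tables = [rng.normal(size=(VOCAB, D)) for _ in range(NUM_TABLES)]

def mlp(x, sizes):
    """Tiny fully connected stack with ReLU; weights drawn ad hoc."""
    for out_dim in sizes:
        w = rng.normal(size=(x.shape[-1], out_dim)) / np.sqrt(x.shape[-1])
        x = np.maximum(x @ w, 0.0)
    return x

def dlrm_forward(dense, sparse_ids):
    # Bottom MLP projects dense features to the embedding dimension.
    z = mlp(dense, [16, D])                                # (B, D)
    # Embedding lookups, one per categorical feature.
    embs = [tables[i][sparse_ids[:, i]] for i in range(NUM_TABLES)]
    feats = np.stack([z] + embs, axis=1)                   # (B, 1+T, D)
    # Pairwise dot-product interactions between all feature vectors.
    gram = np.einsum('bid,bjd->bij', feats, feats)         # (B, 1+T, 1+T)
    iu, ju = np.triu_indices(feats.shape[1], k=1)
    inter = gram[:, iu, ju]                                # (B, num_pairs)
    # Top MLP over the dense projection concatenated with interactions.
    top_in = np.concatenate([z, inter], axis=1)
    return mlp(top_in, [16, 1])                            # (B, 1)

batch = 5
dense = rng.normal(size=(batch, DENSE_IN))
ids = rng.integers(0, VOCAB, size=(batch, NUM_TABLES))
out = dlrm_forward(dense, ids)
print(out.shape)  # → (5, 1)
```

The data-parallel part of the paper's parallelization scheme corresponds to replicating the two MLPs across devices, while each embedding table lives on a single device (model parallelism) to fit in memory.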
Forward citations
Cited by 17 Pith papers
- Agentic Recommender System with Hierarchical Belief-State Memory
  MARS uses hierarchical memory and LLM planning to achieve 26.4% higher HR@1 on InstructRec benchmarks compared to prior methods.
- TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals
  TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.
- Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
  Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.
- Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation
  Releases the TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.
- MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces
  Chakra introduces a portable, interoperable graph-based execution trace format for distributed ML workloads, along with supporting tools, to standardize performance benchmarking and software-hardware co-design.
- LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
  LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
- LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
  LoKA enables practical FP8 use in numerically sensitive large recommendation models via profiling, model adaptations, and runtime kernel orchestration.
- LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries
  LLM agents enable users to integrate cross-platform and offline data for personalization that outperforms single-platform baselines in proof-of-concept tests.
- One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving
  HELM adaptively partitions HBM between EMB and KV caches via a three-layer PPO controller and EMB-KV-aware scheduling, reducing P99 latency by 24-38% while achieving 93.5-99.6% SLO satisfaction on production workloads.
- RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping
  RecFlash uses frequency-based data remapping in NAND-flash in-storage computing to reduce recommendation inference latency by up to 81% and energy consumption by 91.9% compared with prior ISC architectures.
- Recommender Systems as Control Systems
  Modeling recommender systems as control systems shows that time-optimized fairness interventions can improve overall long-term performance rather than merely trading off against utility.
- SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data
  SURGE achieves fixed-batch throughput for GPU embedding generation on 800M texts across 40k partitions using 12.6x less memory, 68x faster time-to-first-output, and fault tolerance via a streaming two-threshold policy...
- Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale
  IEFF enables retrain-free feature efficiency rollouts in ranking systems by elastically controlling feature coverage at serving time, achieving 5x faster rollouts, zero retraining GPU cost, and 50-55% less performance...
- SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling
  SOLARIS speculatively precomputes user-item latent representations to decouple large-model inference from real-time serving, delivering a 0.67% revenue gain when deployed in Meta's ad system.
- Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
  SSR uses static random filters and iterative competitive sparse mechanisms to explicitly enforce sparsity in recommendation models, outperforming dense baselines on public and billion-scale industrial datasets.
- Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators
  Sparse neural networks achieve better area and energy efficiency when executed on dense matrix multiplication accelerators using a Sparse-on-Dense approach than on dedicated sparse accelerators.
- A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems
  A framework integrates MM-LLMs into recommendation systems via caption generation as categorical features, reporting a 0.35% offline AUC lift and a 0.02% online metric improvement.