Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaram, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, et al. · 2019 · cs.IR · arXiv 1906.00091

Canonical reference. 24 Pith papers cite this work; 86% of the classified citations cite it as background.

abstract

With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
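
The hybrid parallelization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the round-robin table placement, and the even batch split are all assumptions made for illustration. It shows embedding tables being assigned to single devices (model parallelism), the batch being sliced across devices for the fully-connected layers (data parallelism), and the cross-device transfers that such a layout would imply.

```python
# Hypothetical sketch: all names and the round-robin placement are
# illustrative assumptions, not taken from the DLRM code base.

def place_tables(num_tables, num_devices):
    """Model parallelism: each embedding table lives on exactly one device."""
    placement = {d: [] for d in range(num_devices)}
    for t in range(num_tables):
        placement[t % num_devices].append(t)  # assumed round-robin policy
    return placement


def split_batch(batch_size, num_devices):
    """Data parallelism: each device runs the dense layers on its own slice."""
    base, extra = divmod(batch_size, num_devices)
    slices, start = {}, 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        slices[d] = range(start, start + size)
        start += size
    return slices


def exchange_plan(placement, slices):
    """List (src, dst, table, samples) transfers so that every device ends up
    with the lookups from all tables for its own batch slice."""
    plan = []
    for src, tables in placement.items():
        for t in tables:
            for dst, samples in slices.items():
                if src != dst and len(samples) > 0:
                    plan.append((src, dst, t, samples))
    return plan


placement = place_tables(num_tables=5, num_devices=2)
slices = split_batch(batch_size=8, num_devices=2)
plan = exchange_plan(placement, slices)

print(placement)  # which tables each device owns
print(len(plan))  # number of table-to-device transfers in this toy layout
```

In this toy layout, memory for the tables is divided across devices while each device still sees only its share of the batch in the dense layers, which is the trade-off the abstract describes.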

citation-role summary

background: 6 · method: 1

citation-polarity summary

background: 6 · use method: 1

representative citing papers

TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals

cs.CR · 2026-05-08 · unverdicted · novelty 7.0

TENNOR enables efficient private training of wide neural networks in TEEs by recasting sparsification as doubly oblivious LSH retrievals and introducing MP-WTA to cut hash table memory by 50x while preserving accuracy.

Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading

cs.CR · 2026-04-19 · unverdicted · novelty 7.0

Privatar uses horizontal frequency partitioning and distribution-aware minimal perturbation to enable private offloading of VR avatar reconstruction, supporting 2.37x more users with modest overhead.

Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation

cs.IR · 2026-04-04 · accept · novelty 7.0

Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.

Agentic Recommender System with Hierarchical Belief-State Memory

cs.CL · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

MARS uses hierarchical event-preference-profile memory with an LLM-scheduled lifecycle of six operations to achieve state-of-the-art results on InstructRec benchmarks.

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

cs.DC · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Chakra introduces a standardized graph-based execution trace representation for distributed ML workloads along with supporting tools to enable benchmarking, analysis, generation, and co-design across simulators and hardware.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.

LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks

cs.LG · 2025-12-09 · unverdicted · novelty 6.0

LayerPipe2 derives per-layer delay assignments for multistage pipelined training and uses an improved moving average to recompute past weights without explicit storage.

SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs

cs.IR · 2025-11-18 · unverdicted · novelty 6.0

SilverTorch replaces standalone ANN indexing and filtering with a unified GPU model using a model-based Bloom index and fused Int8 ANN kernel, delivering up to 23.7x throughput and 13.35x cost efficiency gains on industry data.

Learning from Natural Language Feedback for Personalized Question Answering

cs.CL · 2025-08-14 · unverdicted · novelty 6.0

VAC replaces scalar rewards with natural language feedback in an alternating training loop between a feedback model and a policy model, yielding better personalized QA on the LaMP-QA benchmark.

TrainMover: An Interruption-Resilient Runtime for ML Training

cs.DC · 2024-12-17 · unverdicted · novelty 6.0

TrainMover achieves ~20s downtime for interruptions in 1024-GPU LLM training via two-phase delta-based communication setup, communication-free sandboxed warmup, and general standby design, projecting 55% reduction in wasted GPU hours.

LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries

cs.IR · 2026-05-10 · unverdicted · novelty 6.0

LLM agents enable users to integrate cross-platform and offline data for personalization that outperforms single-platform baselines in proof-of-concept tests.

One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving

cs.DC · 2026-05-06 · unverdicted · novelty 6.0

HELM adaptively partitions HBM between EMB and KV caches via a three-layer PPO controller and EMB-KV-aware scheduling, reducing P99 latency by 24-38% while achieving 93.5-99.6% SLO satisfaction on production workloads.

RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping

cs.AR · 2026-04-28 · unverdicted · novelty 6.0

RecFlash uses frequency-based data remapping in NAND flash in-storage computing to improve recommendation inference latency by up to 81% and energy consumption by 91.9% over prior ISC architectures.

LLM Retrieval for Stable and Predictable Ad Recommendations

cs.IR · 2026-05-21 · unverdicted · novelty 5.0

LLM-based semantic retrieval with hierarchical attributes and graph expansion improves stability and predictability in industrial ad recommendation systems.

Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation

cs.LG · 2025-11-08 · unverdicted · novelty 5.0 · 2 refs

Introduces STCA for linear-complexity target-to-history attention, RLB for shared user encoding across targets, and length-extrapolative training to enable end-to-end 10K sequence modeling with observed scaling-law gains and production deployment improvements.

Recommender Systems as Control Systems

eess.SY · 2026-05-02 · unverdicted · novelty 5.0

Modeling recommender systems as control systems shows that time-optimized fairness interventions can improve overall long-term performance rather than merely trading off against utility.

SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data

cs.DC · 2026-05-01 · unverdicted · novelty 5.0

SURGE achieves fixed-batch throughput for GPU embedding generation on 800M texts across 40k partitions using 12.6x less memory, 68x faster time-to-first-output, and fault tolerance via a streaming two-threshold policy with an analytical cost model accurate to 2%.

Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale

cs.IR · 2026-05-01 · unverdicted · novelty 5.0

IEFF enables retrain-free feature efficiency rollouts in ranking systems by elastically controlling feature coverage at serving time, achieving 5x faster rollouts, zero retraining GPU cost, and 50-55% less performance degradation than abrupt feature removal.

SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling

cs.LG · 2026-04-13 · unverdicted · novelty 5.0

SOLARIS speculatively precomputes user-item latent representations to decouple large-model inference from real-time serving, delivering 0.67% revenue gain when deployed in Meta's ad system.

Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation

cs.IR · 2026-04-09 · unverdicted · novelty 5.0

SSR uses static random filters and iterative competitive sparse mechanisms to explicitly enforce sparsity in recommendation models, outperforming dense baselines on public and billion-scale industrial datasets.

Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking

cs.IR · 2026-03-25 · unverdicted · novelty 4.0

UniScale couples entire-space data construction with a hierarchical fusion transformer to improve scaling behavior and deliver 1.70% purchase and 2.04% GMV lifts in large-scale e-commerce search A/B tests.

Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators

cs.AR · 2026-04-29 · unverdicted · novelty 4.0

Sparse neural networks achieve better area and energy efficiency when executed on dense matrix multiplication accelerators using a Sparse-on-Dense approach than on dedicated sparse accelerators.

A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems

cs.IR · 2026-05-10 · unverdicted · novelty 3.0

A framework integrates MM-LLMs into recommendation systems via caption generation as categorical features, reporting 0.35% offline AUC lift and 0.02% online metric improvement.

FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation

cs.AI · 2026-05-20

citing papers explorer

Showing 24 of 24 citing papers.

TENNOR: Trustworthy Execution for Neural Networks through Obliviousness and Retrievals · cs.CR · 2026-05-08 · unverdicted · none · ref 89
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading · cs.CR · 2026-04-19 · unverdicted · none · ref 177
Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation · cs.IR · 2026-04-04 · accept · none · ref 40
Agentic Recommender System with Hierarchical Belief-State Memory · cs.CL · 2026-05-14 · unverdicted · none · ref 14 · 2 links · internal anchor
MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces · cs.DC · 2026-05-11 · unverdicted · none · ref 82 · 2 links · internal anchor
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale · cs.LG · 2026-05-11 · unverdicted · none · ref 57 · 2 links · internal anchor
LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks · cs.LG · 2025-12-09 · unverdicted · none · ref 5 · internal anchor
SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs · cs.IR · 2025-11-18 · unverdicted · none · ref 27 · internal anchor
Learning from Natural Language Feedback for Personalized Question Answering · cs.CL · 2025-08-14 · unverdicted · none · ref 24 · internal anchor
TrainMover: An Interruption-Resilient Runtime for ML Training · cs.DC · 2024-12-17 · unverdicted · none · ref 27 · internal anchor
LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries · cs.IR · 2026-05-10 · unverdicted · none · ref 31
One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving · cs.DC · 2026-05-06 · unverdicted · none · ref 34
RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping · cs.AR · 2026-04-28 · unverdicted · none · ref 4
LLM Retrieval for Stable and Predictable Ad Recommendations · cs.IR · 2026-05-21 · unverdicted · none · ref 1 · internal anchor
Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation · cs.LG · 2025-11-08 · unverdicted · none · ref 25 · 2 links · internal anchor
Recommender Systems as Control Systems · eess.SY · 2026-05-02 · unverdicted · none · ref 26
SURGE: SuperBatch Unified Resource-efficient GPU Encoding for Heterogeneous Partitioned Data · cs.DC · 2026-05-01 · unverdicted · none · ref 31
Intelligent Elastic Feature Fading: Enabling Model Retrain-Free Feature Efficiency Rollouts at Scale · cs.IR · 2026-05-01 · unverdicted · none · ref 16
SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling · cs.LG · 2026-04-13 · unverdicted · none · ref 31
Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation · cs.IR · 2026-04-09 · unverdicted · none · ref 22
Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking · cs.IR · 2026-03-25 · unverdicted · none · ref 20 · internal anchor
Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators · cs.AR · 2026-04-29 · unverdicted · none · ref 5
A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems · cs.IR · 2026-05-10 · unverdicted · none · ref 13
FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation · cs.AI · 2026-05-20 · unreviewed · ref 24 · internal anchor
