Is Sliding Window All You Need? An Open Framework for Long-Sequence Recommendation

Sayak Chakrabarty; Souradip Pal

arXiv: 2604.12372 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.IR

Pith reviewed 2026-05-10 15:44 UTC · model grok-4.3

keywords: sliding windows · long-sequence recommendation · recommender systems · k-shift embedding · open framework · training efficiency · retrieval quality · user interaction histories

The pith

An open framework shows sliding windows make long-sequence recommendation training practical on modest hardware with competitive accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that long user interaction histories, often dismissed as too expensive to train on, can be handled effectively with sliding windows in a fully open pipeline. It releases complete code for data processing, training, and evaluation so that academic labs can run industrial-style long-sequence models without special resources. It also adds a new k-shift embedding layer that supports million-scale vocabularies on ordinary GPUs with negligible accuracy loss. The work reports concrete gains on Retailrocket alongside measured training-time costs, turning a closed technique into something the broader community can extend.

Core claim

We release a complete end-to-end framework that implements industrial-style long-sequence training with sliding windows, including all data processing, training, and evaluation scripts. The framework delivers up to +6.04% MRR and +6.34% Recall@10 on Retailrocket with roughly 4× training-time overhead while running reliably on modest university clusters. A novel k-shift embedding layer enables million-scale vocabularies on commodity GPUs with negligible accuracy loss, and runtime-aware ablations quantify the accuracy-compute frontier across window sizes and strides.

What carries the argument

The sliding-window training pipeline combined with the k-shift embedding layer that packs large vocabularies onto limited GPU memory.
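Here is a minimal sketch of the windowing idea. It assumes each user's history is a list of item IDs, cut from oldest to newest into fixed-size windows at a fixed stride. The function names `sliding_windows` and `token_multiplier` are illustrative assumptions, not code from the released framework. So is the rule that adds one extra window to cover the most recent interactions. The multiplier is only a token-count proxy for how window size and stride trade coverage for compute. It is not the runtime the paper measures.

```python
from typing import List, Sequence


def sliding_windows(history: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Cut one user's interaction history into overlapping training windows.

    Assumption (illustrative): windows start every `stride` items from the
    oldest interaction; if the last regular window stops short of the end,
    one extra window aligned to the newest interaction is added.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(history) <= window:
        return [list(history)]  # a short history fits in a single window
    starts = list(range(0, len(history) - window + 1, stride))
    if starts[-1] + window < len(history):
        starts.append(len(history) - window)  # make sure the tail is covered
    return [list(history[s:s + window]) for s in starts]


def token_multiplier(history_len: int, window: int, stride: int) -> float:
    """Tokens seen across all windows divided by tokens in the raw history."""
    if history_len <= 0:
        raise ValueError("history_len must be positive")
    windows = sliding_windows(range(history_len), window, stride)
    return sum(len(w) for w in windows) / history_len


if __name__ == "__main__":
    print(sliding_windows(list(range(10)), window=4, stride=2))
    print(token_multiplier(10, window=4, stride=2))
```

Consider a 10-item history with a window of 4 and a stride of 2. This gives four overlapping windows and a multiplier of 1.6. Shrinking the stride to 1 raises the multiplier to 2.8, which is why stride sits alongside window size on the accuracy-compute frontier.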

If this is right

Long user histories become usable for training without requiring industrial-scale compute clusters.
Ablation results map clear trade-offs between window size, stride, and retrieval quality.
The k-shift embedding lets models handle item vocabularies of a million or more on standard GPUs.
Training-time overhead stays bounded at approximately four times the cost of shorter-sequence baselines.
The full pipeline turns long-sequence methods into an open, reproducible methodology.
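Some back-of-the-envelope arithmetic shows why million-scale vocabularies strain commodity GPUs. The sketch below computes the memory of a plain dense embedding table. It does not model the k-shift layer, whose details are not given here. The embedding dimension, fp32 storage, and the count of gradient and optimizer buffers are all illustrative assumptions.

```python
def embedding_table_bytes(vocab_size: int, dim: int, bytes_per_value: int = 4,
                          extra_copies: int = 0) -> int:
    """Memory for a plain dense vocab_size x dim embedding table.

    bytes_per_value: 4 for fp32, 2 for fp16/bf16.
    extra_copies: additional full-size buffers held during training, e.g. 3
    for a dense gradient plus Adam's two moment estimates (an illustrative
    assumption, not a measurement of any particular framework).
    """
    return vocab_size * dim * bytes_per_value * (1 + extra_copies)


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    for vocab in (100_000, 1_000_000, 10_000_000):
        weights = embedding_table_bytes(vocab, dim=256)
        training = embedding_table_bytes(vocab, dim=256, extra_copies=3)
        print(f"{vocab:>10,} items: {to_gib(weights):6.2f} GiB weights, "
              f"{to_gib(training):6.2f} GiB with grad + Adam state")
```

Take a 1,000,000-item vocabulary with 256-dimensional fp32 embeddings. The weights alone need 1,024,000,000 bytes (about 0.95 GiB). That grows to four times as much once a dense gradient and two Adam moment buffers are counted, and activations come on top of that.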

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sliding-window pattern could be tested on sequence tasks outside recommendation, such as session-based prediction in other domains.
If the k-shift technique generalizes, it offers a practical route for very large vocabularies in any embedding model constrained by GPU memory.
Widespread use of the open code would allow direct comparison of long-sequence efficiency across different public datasets and hardware setups.

Load-bearing premise

The reported accuracy gains, and the small accuracy penalty of the k-shift embedding, hold on datasets other than Retailrocket, and the framework runs end-to-end on ordinary hardware without hidden optimizations.

What would settle it

Running the released framework on a second public recommendation dataset such as Amazon reviews or MovieLens and measuring no improvement in MRR or Recall@10 over standard shorter-sequence baselines.
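The two headline metrics can be stated precisely for the common next-item setting. This sketch assumes one held-out target item per user and full ranking over the catalogue, with score ties broken against the target. The paper's exact evaluation protocol (sampled or full ranking, tie handling) is not given here. Treat this as an illustration of the metric definitions, not as the framework's scoring code.

```python
from typing import Sequence, Tuple


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target item; ties count against the target."""
    target_score = scores[target]
    better_or_tied = sum(1 for i, s in enumerate(scores)
                         if i != target and s >= target_score)
    return 1 + better_or_tied


def mrr_and_recall(all_scores: Sequence[Sequence[float]],
                   targets: Sequence[int], k: int = 10) -> Tuple[float, float]:
    """Mean reciprocal rank and Recall@k with one held-out item per user."""
    if not targets or len(all_scores) != len(targets):
        raise ValueError("need one score vector per target")
    ranks = [rank_of_target(s, t) for s, t in zip(all_scores, targets)]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    recall = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, recall
```

Take two users with scores `[0.1, 0.9, 0.3]` and `[0.8, 0.2, 0.5]` and held-out items 1 and 2. Their ranks are 1 and 2, so MRR is 0.75 and Recall@1 is 0.5.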

Figures

Figures reproduced from arXiv: 2604.12372 by Sayak Chakrabarty, Souradip Pal.

**Figure 1.** Sliding window training loop.

Excerpt from §3 (Motivation and Approach): The key motivation behind this work lies in ensuring transparent, replicable, and extensible recommender-system research for long-range behavioral context in academia. Although the study [5] provides a high-level algorithmic description of the sliding window training technique and performance metrics on a large interaction dataset, we encountered sever…

**Figure 2.** Overview of the RecSys Foundation model architecture used for Sliding Window Training, designed similar…

Original abstract

Long interaction histories are central to modern recommender systems, yet training with long sequences is often dismissed as impractical under realistic memory and latency budgets. This work demonstrates that it is not only practical but also effective at academic scale. We release a complete, end-to-end framework that implements industrial-style long-sequence training with sliding windows, including all data processing, training, and evaluation scripts. Beyond reproducing prior gains, we contribute two capabilities missing from earlier reports: (i) a runtime-aware ablation study that quantifies the accuracy-compute frontier across windowing regimes and strides, and (ii) a novel k-shift embedding layer that enables million-scale vocabularies on commodity GPUs with negligible accuracy loss. Our implementation trains reliably on modest university clusters while delivering competitive retrieval quality (e.g., up to +6.04% MRR and +6.34% Recall@10 on Retailrocket) with ~4× training-time overheads. By packaging a robust pipeline, reporting training-time costs, and introducing an embedding mechanism tailored for low-resource settings, we transform long-sequence training from a closed, industrial technique into a practical, open, and extensible methodology for the community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper packages an open sliding-window framework for long-sequence recsys plus a k-shift embedding for large vocabularies, with useful runtime ablations, but the gains are modest and shown on only one dataset.

The main thing here is a complete open framework that brings industrial-style sliding-window training for long user histories down to academic hardware, including data pipelines, training, and eval scripts. They add a k-shift embedding layer that handles million-scale item sets on commodity GPUs with negligible accuracy loss, and they run a runtime-aware ablation across window sizes and strides that reports training-time costs. On Retailrocket they get up to +6.04% MRR and +6.34% Recall@10 with roughly 4× overhead, which is concrete and reproducible by design. That openness and the compute measurements are the parts that actually add value; prior sliding-window work existed but rarely came with end-to-end code and explicit overhead numbers at this scale.

The soft spots are straightforward. Everything rests on a single public dataset, so we do not yet know whether the k-shift trick or the reported gains hold up elsewhere. The improvements are incremental rather than large, and without the full baseline details or statistical checks it is hard to judge how much comes from the windowing versus other implementation choices. The paper does not claim theoretical novelty for sliding windows themselves, which is honest.

This is for recsys people who want to try longer histories without reinventing the data pipeline or guessing at compute costs. A reader who values reproducible engineering work will get something usable from it. It deserves peer review because the framework itself could be a practical reference point for the community, even if later work will need to test it more broadly.

Referee Report

2 major / 3 minor

Summary. The paper claims that long-sequence training for recommender systems is practical at academic scale via sliding windows. It releases a complete open-source end-to-end framework (data processing, training, evaluation) that reproduces prior gains while adding a runtime-aware ablation across window sizes/strides and a novel k-shift embedding layer supporting million-scale vocabularies on commodity GPUs with negligible accuracy loss. On Retailrocket it reports up to +6.04% MRR and +6.34% Recall@10 with ~4× training-time overhead relative to shorter-sequence baselines.

Significance. If the reported gains, overhead measurements, and reproducibility claims hold, the work would be a useful engineering contribution by converting an industrial technique into an accessible, extensible open framework with explicit cost reporting and a memory-efficient embedding tailored for low-resource settings. The combination of public code, ablation on the accuracy-compute frontier, and the k-shift mechanism addresses a practical barrier in the long-sequence recommendation literature.

major comments (2)

[§4] Ablation study: The runtime-aware ablation quantifies the accuracy-compute frontier but does not include a direct comparison against memory-efficient alternatives to sliding windows (e.g., gradient checkpointing on full sequences or sparse attention); without this, the claim that sliding windows are sufficient cannot be fully evaluated against the broader design space.
[§5] Results on Retailrocket: All quantitative gains (+6.04% MRR, +6.34% Recall@10) and the negligible-loss claim for the k-shift embedding are reported on a single public dataset; the central claim that the framework delivers competitive quality with modest overhead would be strengthened by results on at least one additional dataset with different sequence-length statistics.

minor comments (3)

[Abstract / §1] The abstract states '∼4× training-time overheads' but the exact baseline (window size, stride, and hardware) used for this multiplier is not restated in the introduction or experimental setup; a one-sentence clarification would improve readability.
[§3] Notation for the k-shift embedding (the definition of the shift parameter k and how it interacts with the vocabulary embedding matrix) is introduced in §3 but never summarized in a single equation; adding one numbered equation would make the mechanism easier to implement from the text alone.
[Figures / Tables in §4] Table captions and axis labels in the ablation figures do not explicitly state whether reported times include data loading or only forward/backward passes; this detail affects interpretation of the 4× overhead figure.
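One way to remove that ambiguity is to accumulate the time spent waiting on the data pipeline separately from the time spent in the training step. This is a generic sketch, not the paper's instrumentation. `timed_epoch` and its `step` callback are hypothetical names.

```python
import time
from typing import Callable, Iterable, Tuple


def timed_epoch(batches: Iterable, step: Callable[[object], None]) -> Tuple[float, float]:
    """Run one pass and return (seconds fetching batches, seconds inside step)."""
    data_s = compute_s = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent producing the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # forward/backward/optimizer work for this batch
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    return data_s, compute_s
```

When `step` launches GPU work, it should end with a device synchronization (for example `torch.cuda.synchronize()` in PyTorch). Otherwise asynchronous kernel launches can make the step look faster than it is.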

Simulated Author's Rebuttal

2 responses · 1 unresolved

Thank you for the positive assessment and recommendation for minor revision. We appreciate the constructive comments on the ablation study and experimental scope. We address each major comment below and will make targeted revisions to strengthen the manuscript where possible.

Referee: §4 (Ablation study): The runtime-aware ablation quantifies the accuracy-compute frontier but does not include a direct comparison against memory-efficient alternatives to sliding windows (e.g., gradient checkpointing on full sequences or sparse attention); without this, the claim that sliding windows are sufficient cannot be fully evaluated against the broader design space.

Authors: We agree that situating sliding windows against other memory-efficient techniques provides useful context. Our central claim is that sliding-window training is practical and accessible at academic scale via an open framework, not that it is the only or optimal solution in the broader design space. The §4 ablation specifically maps the accuracy-compute frontier for window sizes and strides under realistic runtime constraints. We will add a concise discussion paragraph to §4 that conceptually contrasts sliding windows with gradient checkpointing (noting its training-time overhead) and sparse attention (noting implementation complexity on commodity hardware), while emphasizing that our released code enables direct comparisons by the community. This revision clarifies positioning without requiring new experiments. revision: partial
Referee: §5 (Results on Retailrocket): All quantitative gains (+6.04% MRR, +6.34% Recall@10) and the negligible-loss claim for the k-shift embedding are reported on a single public dataset; the central claim that the framework delivers competitive quality with modest overhead would be strengthened by results on at least one additional dataset with different sequence-length statistics.

Authors: We acknowledge that results across multiple datasets with varying sequence-length distributions would further support generalizability. Retailrocket was selected because it exhibits the long interaction histories central to the paper's motivation and is a standard public benchmark in the literature. The framework (data pipeline, k-shift embedding, and training scripts) is intentionally dataset-agnostic, and the full code release allows straightforward extension to other corpora. We will revise the manuscript to include an expanded discussion in §5 and the conclusion on dataset choice, sequence statistics, and expected applicability to other domains. However, we do not currently have results on a second dataset. revision: partial

Standing simulated objections (not resolved)

We are unable to provide new experimental results on additional datasets beyond Retailrocket at this time.

Circularity Check

0 steps flagged

No significant circularity; empirical framework release with independent validation

The paper's core claims rest on releasing an open implementation of sliding-window long-sequence training plus a k-shift embedding, validated through runtime ablations and metrics (MRR, Recall@10) on the public Retailrocket dataset. No equations, first-principles derivations, or predictions are presented that reduce to fitted inputs by construction. Prior gains are reproduced rather than derived; the k-shift mechanism is introduced as a novel engineering contribution with reported negligible accuracy loss, not as a fitted parameter renamed as a prediction. Self-citations, if present, are not load-bearing for the central empirical results, which remain falsifiable via the released code and data. The work is self-contained as an engineering artifact.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The work is primarily empirical and engineering-focused; no explicit free parameters, axioms, or invented physical entities are described in the abstract.

invented entities (1)

k-shift embedding layer (no independent evidence)
purpose: Enables handling of million-scale vocabularies on commodity GPUs with negligible accuracy loss
Presented as a novel technical contribution in the abstract; no independent evidence or external validation provided.

pith-pipeline@v0.9.0 · 5507 in / 1174 out tokens · 23126 ms · 2026-05-10T15:44:00.083991+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 1 internal anchor

[1] Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). In Proceedings of the 16th ACM Conference on Recommender Systems, RecSys ’22, pages 299–315, New York, NY, USA, 2022. Association for Computing Machinery.

[2] Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, and Sujian Li. PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training, 2024.

[3] Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Jiayuan He, Yinghai Lu, and Yu Shi. Actions speak louder than words: trillion-parameter sequential transducers for generative recommendations. In Proceedings of the 41st International Conference on Machine Learning, ICML ’24. JMLR.org, 2024.

[4] Arghya Datta and Sayak Chakrabarty. On the consistency of maximum likelihood estimation of probabilistic principal component analysis. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS ’23, Red Hook, NY, USA, 2023. Curran Associates Inc.

[5] Swanand Joshi, Yesu Feng, Ko-Jen Hsiao, Zhe Zhang, and Sudarshan Lamkhede. Sliding window training - utilizing historical recommender systems data for foundation models. In Proceedings of the 18th ACM Conference on Recommender Systems, RecSys ’24, pages 835–837, New York, NY, USA, 2024. Association for Computing Machinery.

[6] Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang. LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, 2024.

[7] Konstantin Makarychev and Sayak Chakrabarty. Single-pass pivot algorithm for correlation clustering. Keep it simple! Advances in Neural Information Processing Systems, 36:6412–6421, 2023.

[8] Sayak Chakrabarty and Souradip Pal. ReadmeReady: Free and Customizable Code Documentation with LLMs - A Fine-Tuning Approach. Journal of Open Source Software, 10(108):7489, 2025.

[9] Youzhi Zhang, Sayak Chakrabarty, Rui Liu, Andrea Pugliese, and V. S. Subrahmanian. Sockdef: A dynamically adaptive defense to a novel attack on review fraud detection engines. IEEE Transactions on Computational Social Systems, 11(4):5253–5265, 2024.

[11] Maksim Bolonkin, Sayak Chakrabarty, Cristian Molinaro, and V. S. Subrahmanian. Judicial support tool: Finding the k most likely judicial worlds. In International Conference on Scalable Uncertainty Management, pages 53–69. Springer, 2024.

[12] Sayak Chakrabarty and Souradip Pal. Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce. arXiv preprint arXiv:2512.13726, 2025.

[13] Sayak Chakrabarty and Souradip Pal. MM-PoE: Multiple Choice Reasoning via Process of Elimination using Multi-Modal Models. Journal of Open Source Software, 10(108):7783, 2025.

[14] Jinming Li, Wentao Zhang, Tiantian Wang, Guanglei Xiong, Alan Lu, and Gérard G. Medioni. GPT4Rec: A generative framework for personalized recommendation and user interests interpretation. ArXiv, abs/2304.03879, 2023.

[15] Sayak Chakrabarty and Souradip Pal. PixRec: Leveraging Visual Context for Next-Item Prediction in Sequential Recommendation. arXiv preprint arXiv:2601.06458, 2026.

[16] Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, pages 1441–1450, New York, NY, USA, 2019. Association for Computing Machinery.

[17] Nick Craswell. Mean Reciprocal Rank, page 1703. Springer US, Boston, MA, 2009.

[18] Yang Chengjie and Qi Wei. Taobao User Purchase Behavior Prediction And Feature Analysis Based On Ensemble Learning. In 2023 IEEE International Conference on e-Business Engineering (ICEBE), pages 205–209, 2023.

[19] Aditya Desai and Anshumali Shrivastava. The trade-offs of model size in large recommendation models: A 10000× compressed criteo-tb DLRM model (100 GB parameters to mere 10MB), 2022.

[20] Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, and Jiyan Yang. Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems. CoRR, abs/1909.02107, 2019.
