BAHSD: Bridging the Long-tail Gap via Adaptive Distillation in Black-box Sequential Recommendation

Famin Wu; Hongyue Zhang; Jiao Dai; Jizhong Han; Mingming Li; Tao Guo; Xi Zhou

arxiv: 2606.03091 · v2 · pith:DB57JBBUnew · submitted 2026-06-02 · 💻 cs.IR · cs.AI


Xi Zhou, Famin Wu, Mingming Li, Hongyue Zhang, Jiao Dai, Jizhong Han, Tao Guo

Pith reviewed 2026-06-28 08:39 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords black-box model extraction · sequential recommendation · long-tail distribution · adaptive distillation · knowledge distillation · consistency probing · tail user performance


The pith

BAHSD adapts black-box distillation via multi-scale consistency probing to close the long-tail gap in sequential recommendation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that one-size-fits-all extraction from black-box sequential recommenders is undermined by long-tail signal heterogeneity: head sequences solidify the teacher's preferences, while tail sequences produce flat, noisy outputs. It introduces BAHSD, which uses multi-scale consistency probing to gauge signal reliability implicitly from outputs alone and feeds that estimate into an adaptive hierarchical objective: dynamic-temperature KL divergence for confident signals, and ranking consistency plus InfoNCE contrastive learning for noisy ones. If the approach works, proprietary recommendation APIs could be replicated locally with higher fidelity, especially for the sparse users who dominate real data. Readers should care because most deployed systems are black boxes and long-tail distributions are the norm rather than the exception.
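The claim that tail sequences yield flat teacher outputs can be made concrete with a standard peakedness measure. The sketch below is illustrative only: it scores a teacher's output vector by the normalized entropy of its softmax, and the toy logits for a head user and a tail user are made up rather than taken from the paper (whose own tier-level analysis is Figure 2).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def normalized_entropy(logits):
    """Entropy of softmax(logits) divided by log(num_items): near 0 = peaked, 1 = uniform."""
    p = softmax(logits)
    h = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return h / np.log(p.shape[-1])

head_user = np.array([8.0, 2.0, 1.0, 0.5, 0.0])   # hypothetical: dense history, peaked scores
tail_user = np.array([1.1, 1.0, 0.9, 1.0, 0.95])  # hypothetical: sparse history, near-flat scores
print(normalized_entropy(head_user), normalized_entropy(tail_user))
```

A value close to 1 means the teacher barely distinguishes candidate items, so matching its full distribution mostly transfers noise.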

Core claim

The central claim is that a black-box adaptive hierarchical self-distillation framework (BAHSD) built on multi-scale consistency probing can quantify signal reliability from API outputs without internal states or labels, then apply tailored distillation—dynamic-temperature KL for high-confidence cases and noise-robust ranking plus contrastive terms for low-confidence cases—to outperform uniform baselines, delivering up to 4.98% gain over the teacher model and over 80% improvement on tail users.
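The abstract does not say how the temperature in the dynamic-temperature KL term is chosen. A minimal sketch, assuming a hypothetical schedule in which the temperature grows with the teacher's confidence (its maximum softmax probability), so that over-peaked, solidified preferences are softened more; `tau_base`, `alpha`, and the linear form are illustrative assumptions, not the paper's. The τ² factor is the usual gradient rescaling from Hinton et al. [8].

```python
import numpy as np

def log_softmax(logits, tau):
    """Row-wise log-softmax of logits / tau; tau may be a scalar or a (B, 1) column."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def dynamic_temperature(teacher_logits, tau_base=2.0, alpha=2.0):
    """Hypothetical schedule: higher teacher confidence (max prob) -> higher temperature."""
    confidence = np.exp(log_softmax(teacher_logits, 1.0)).max(axis=-1)
    return tau_base * (1.0 + alpha * confidence)

def dynamic_kd_kl(teacher_logits, student_logits, tau_base=2.0, alpha=2.0):
    """Per-sequence tau^2 * KL(p_teacher || p_student), both softened with that row's tau."""
    tau = dynamic_temperature(teacher_logits, tau_base, alpha)[:, None]  # shape (B, 1)
    log_p_t = log_softmax(teacher_logits, tau)
    log_p_s = log_softmax(student_logits, tau)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return kl * tau[:, 0] ** 2

teacher = np.array([[6.0, 1.0, 0.0, 0.0],   # peaked, "solidified" preference
                    [1.0, 0.9, 0.8, 0.7]])  # flat preference
student = np.array([[2.0, 1.5, 0.5, 0.0],
                    [0.7, 0.8, 0.9, 1.0]])
print(dynamic_temperature(teacher), dynamic_kd_kl(teacher, student))
```

Raising τ on peaked rows pushes the student to match the teacher's secondary preferences rather than only its top item, which is one plausible reading of "mitigating preference solidification".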

What carries the argument

The multi-scale consistency probing mechanism that implicitly quantifies signal reliability from black-box outputs to drive the adaptive hierarchical objective.
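The page gives no equations for the probe, so the following is a sketch under stated assumptions, not the authors' mechanism. It assumes that the "multi-scale" views are suffixes of the interaction sequence kept at several truncation ratios (Figure 5 studies truncation ratios, but this exact construction is an assumption), that `query_topk` stands in for one call to the black-box API returning a top-k item list, and that reliability is the mean pairwise Jaccard overlap of those lists. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

# Hypothetical stand-in for one black-box API call: view -> top-k item ids.
QueryTopK = Callable[[List[int]], List[int]]

def multi_scale_views(seq: Sequence[int], ratios=(0.25, 0.5, 0.75, 1.0)) -> List[List[int]]:
    """Assumed view generator: keep the most recent ceil(r * len(seq)) items for each ratio."""
    return [list(seq[-max(1, math.ceil(r * len(seq))):]) for r in ratios]

def jaccard(a: List[int], b: List[int]) -> float:
    """Set overlap of two top-k lists, in [0, 1]."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def consistency_score(seq: Sequence[int], query_topk: QueryTopK,
                      ratios=(0.25, 0.5, 0.75, 1.0)) -> float:
    """Mean pairwise Jaccard overlap of the teacher's top-k lists across all views."""
    lists = [query_topk(view) for view in multi_scale_views(seq, ratios)]
    pairs = list(combinations(lists, 2))  # needs at least two ratios
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def stable_teacher(view: List[int]) -> List[int]:
    return [100, 101, 102]  # same answer at every scale

def drifting_teacher(view: List[int]) -> List[int]:
    n = len(view)
    return [n, n + 1, n + 2]  # answer shifts with the view length

seq = [3, 7, 7, 12, 5, 9, 14, 2]
print(consistency_score(seq, stable_teacher), consistency_score(seq, drifting_teacher))
```

A teacher whose answer survives truncation scores near 1; one whose recommendations drift with every view scores near 0, which is the kind of sequence the low-confidence branch would handle.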

If this is right

Local replicas of black-box sequential recommenders achieve higher overall fidelity than uniform extraction methods.
Tail users with sparse histories receive recommendations closer in quality to the original teacher model.
The framework functions as a plug-and-play module using only query access to proprietary APIs.
Dynamic adjustment of the distillation objective reduces overfitting to noise in low-confidence sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same consistency-based reliability proxy could be tested in black-box extraction tasks outside recommendation, such as language or vision models with skewed query distributions.
If probing scores correlate with true reliability, they might guide active query selection to reduce the number of API calls needed for effective extraction.
The method implies that output consistency across scales can substitute for explicit uncertainty estimates when labels are unavailable.
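The active-query idea in the second point can be sketched as a budgeted selection rule. This is an illustrative assumption, not anything the paper proposes: sequences whose consistency already clears a hypothetical `stop_threshold` get no extra queries, and the remaining budget goes to the least consistent ones first.

```python
from typing import List, Sequence

def plan_requeries(consistency: Sequence[float], budget: int,
                   stop_threshold: float = 0.8) -> List[int]:
    """Hypothetical budgeted selection: spend extra API queries on the least consistent
    sequences, skipping any whose score already clears stop_threshold."""
    candidates = [i for i, c in enumerate(consistency) if c < stop_threshold]
    candidates.sort(key=lambda i: (consistency[i], i))  # lowest consistency first, ties by index
    return candidates[:budget]

scores = [0.9, 0.1, 0.5, 0.85, 0.3]
print(plan_requeries(scores, budget=2))
```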

Load-bearing premise

Multi-scale consistency probing can accurately measure the reliability of black-box predictions without ground-truth labels or access to internal model states.

What would settle it

A test dataset on which high consistency scores from the probing mechanism turn out to correspond to low actual prediction accuracy for held-out tail users, and on which BAHSD yields no gain over, or performs worse than, standard distillation.
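Running this test amounts to checking whether per-user consistency scores track per-user held-out accuracy. A sketch using Spearman rank correlation with tie-averaged ranks, which matters when the accuracy signal is a binary hit at rank k; the choice of statistic, the hit-at-10 signal, and the toy numbers are assumptions, not the paper's protocol. A correlation near zero or negative on tail users would be the failure case described above.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of tie-averaged ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

# Hypothetical per-user data for held-out tail users.
consistency = [0.92, 0.15, 0.60, 0.33, 0.81, 0.05]
hit_at_10 = [1, 0, 1, 0, 1, 0]
print(spearman(consistency, hit_at_10))
```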

Figures

Figures reproduced from arXiv: 2606.03091 by Famin Wu, Hongyue Zhang, Jiao Dai, Jizhong Han, Mingming Li, Tao Guo, Xi Zhou.

**Figure 2.** Logits distribution analysis on teacher models across user tiers.

**Figure 3.** The framework of BAHSD. A multi-scale view generator probes the …

**Figure 4.** Distribution shift analysis between teacher and student models.

**Figure 5.** Impact of truncation ratios on distillation performance.

**Figure 6.** Impact of distillation temperature τ_kd on distillation performance (RQ4: experiments with τ_kd ranging from 1 to 8).
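Figure 6 sweeps τ_kd from 1 to 8. In temperature-scaled distillation [8], dividing logits by τ before the softmax flattens the target distribution without changing the item ranking; the sketch below shows this on made-up teacher logits and is not tied to the paper's data.

```python
import numpy as np

def softmax_t(logits, tau):
    """Softmax of logits / tau (numerically stable)."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([5.0, 3.0, 1.0, 0.0, -1.0])  # made-up teacher scores
for tau in (1, 2, 4, 8):  # endpoints match the tau_kd range given for Figure 6
    p = softmax_t(teacher_logits, tau)
    print(f"tau={tau}: top-1 prob={p.max():.3f}, ranking={np.argsort(-p).tolist()}")
```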

Original abstract

Sequential recommendation systems are widely adopted but often deployed as black-box APIs, which has driven recent interest in model extraction to replicate their capabilities locally. However, the long-tail distribution induces severe signal heterogeneity: dense head sequences trigger the solidification of teacher preference, biasing extraction toward local patterns, while sparse tail sequences yield flat, noisy predictions. Existing one-size-fits-all extraction overlooks this disparity, resulting in noise overfitting and suboptimal knowledge transfer. We propose BAHSD, a black-box adaptive distillation framework that handles signal heterogeneity via a multi-scale consistency probing mechanism to implicitly quantify signal reliability. Based on this, an adaptive hierarchical objective is designed: dynamic-temperature KL divergence mitigates preference solidification for high-confidence signals, while ranking consistency and InfoNCE contrastive learning provide noise-robust enhancement for low-confidence signals. BAHSD consistently outperforms baselines, achieving up to 4.98% gain over the teacher and 80%+ improvement on tail users, offering a plug-and-play solution for high-fidelity black-box recommendation extraction.
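The abstract names the loss families for low-confidence signals but not their formulas. A minimal sketch, assuming a BPR-style pairwise softplus loss that asks the student to preserve the order of the teacher's top-k list, and a standard InfoNCE loss in which row i of one batch of embeddings (for example, two views of the same sequences) is the positive for row i of the other; both formulations are swapped-in stand-ins, and the function names are hypothetical.

```python
import numpy as np

def pairwise_rank_loss(student_scores, teacher_topk):
    """BPR-style loss: for every pair (hi, lo) where the teacher ranks hi above lo,
    add softplus(s_lo - s_hi). Requires at least two items in teacher_topk."""
    s = np.asarray(student_scores, dtype=float)
    losses = [np.logaddexp(0.0, s[teacher_topk[b]] - s[teacher_topk[a]])
              for a in range(len(teacher_topk)) for b in range(a + 1, len(teacher_topk))]
    return float(np.mean(losses))

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE: row i of z_a should match row i of z_b against all other rows of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

topk = [0, 1, 2]                        # teacher's top-3 item ids, best first
aligned = np.array([3.0, 2.0, 1.0, 0.0])
reversed_scores = np.array([0.0, 1.0, 2.0, 3.0])
views = np.eye(4)                       # toy embeddings of 4 sequences
print(pairwise_rank_loss(aligned, topk), pairwise_rank_loss(reversed_scores, topk))
print(info_nce(views, views), info_nce(views, np.roll(views, 1, axis=0)))
```

Neither loss asks the student to match the teacher's exact, possibly noisy, probability values; only relative order and view-level agreement enter, which is one reason terms of this kind are a plausible fit for flat teacher outputs.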

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

BAHSD adapts distillation to long-tail signals in black-box recsys via consistency probing, but the abstract gives no way to check if the claimed gains hold up.


The paper's core claim is that standard distillation from black-box sequential recommenders fails on long-tail users because head sequences solidify preferences while tail ones produce noisy outputs, and BAHSD fixes this with a multi-scale consistency probe that decides per-sequence how to distill.

What stands out is the adaptive hierarchical objective: dynamic-temperature KL for high-confidence signals to avoid over-solidification, plus ranking consistency and InfoNCE for low-confidence ones to add robustness. This is a reasonable engineering response to signal heterogeneity in API-based extraction, and the plug-and-play framing matches a real deployment constraint.
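The per-sequence decision can be sketched as a gate on the reliability score. The soft sigmoid gate, the `threshold` and `width` values, and the loss weights `lam_rank` and `lam_cl` are hypothetical; the paper may use a hard split or a different weighting. The function takes precomputed per-sequence losses so that it stays independent of how each term is defined.

```python
import numpy as np

def adaptive_objective(consistency, kl_loss, rank_loss, cl_loss,
                       threshold=0.5, width=0.05, lam_rank=1.0, lam_cl=0.1):
    """Blend per-sequence losses with a soft gate on the reliability score.

    w -> 1 for reliable sequences (KL branch), w -> 0 for noisy ones (ranking + contrastive).
    """
    c = np.asarray(consistency, dtype=float)
    w = 1.0 / (1.0 + np.exp(-(c - threshold) / width))
    noisy_branch = lam_rank * np.asarray(rank_loss, float) + lam_cl * np.asarray(cl_loss, float)
    per_seq = w * np.asarray(kl_loss, float) + (1.0 - w) * noisy_branch
    return float(per_seq.mean()), w

total, w = adaptive_objective(consistency=[0.95, 0.05], kl_loss=[1.0, 100.0],
                              rank_loss=[100.0, 2.0], cl_loss=[100.0, 0.0])
print(total, w)
```

A soft gate keeps the objective differentiable in the reliability score and avoids a sharp jump for sequences near the threshold; a hard threshold is the simpler alternative.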

The main limitation is that the description stays at the abstract level. There are no equations for the probing mechanism, no baseline list, no dataset details, and no mention of how the 4.98% overall gain or 80%+ tail improvement were measured or tested for significance. Without those, it is impossible to tell whether the probe actually captures reliability or whether the gains come from stronger baselines or favorable splits. The abstract also skips prior citations, so it is hard to gauge how much the combination differs from existing adaptive distillation work.
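One way to supply the missing significance check is a paired test on per-user metrics. A sketch of a two-sided sign-flip permutation test on per-user differences (for example, NDCG@10 of BAHSD minus a baseline on the same test users); the metric, the test, the function name, and the toy numbers are assumptions rather than anything the paper reports.

```python
import numpy as np

def paired_sign_flip_test(metric_a, metric_b, n_perm=10000, seed=0):
    """Two-sided paired permutation test on per-user differences (a - b).

    Under the null hypothesis each difference is equally likely to have either sign.
    """
    d = np.asarray(metric_a, dtype=float) - np.asarray(metric_b, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    # Small tolerance guards against float rounding; add-one keeps p strictly positive.
    return float((1 + np.sum(null >= observed - 1e-12)) / (n_perm + 1))

# Hypothetical per-user NDCG@10 for the same 6 test users.
bahsd = [0.42, 0.31, 0.55, 0.12, 0.38, 0.47]
baseline = [0.40, 0.25, 0.50, 0.10, 0.30, 0.45]
print(paired_sign_flip_test(bahsd, baseline))
```

With only six users, even a uniform improvement cannot reach p below about 2/64, which illustrates why the number of evaluated tail users matters as much as the size of the reported gain.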

This is for people already working on model extraction or distillation inside recommendation systems. A reader who needs ideas for handling heterogeneous teacher signals might pick up the high-level structure, but anyone wanting to reproduce or extend the method will need the full implementation.

I would send it to peer review. The problem is practical and the adaptive idea is coherent on its face, but the paper will need concrete methods, experiments, and comparisons before the results can be trusted.

Referee Report

0 major / 0 minor

Summary. The paper proposes BAHSD, a black-box adaptive distillation framework for sequential recommendation. It introduces a multi-scale consistency probing mechanism to quantify signal reliability from black-box outputs under long-tail heterogeneity, paired with an adaptive hierarchical objective using dynamic-temperature KL divergence for high-confidence signals and ranking consistency plus InfoNCE for low-confidence signals. Empirical claims include up to 4.98% improvement over the teacher model and over 80% gains on tail users, positioning BAHSD as a plug-and-play extraction method.

Significance. If the results hold under rigorous validation, the work addresses a practically relevant gap in model extraction for deployed black-box sequential recommenders by explicitly handling signal heterogeneity induced by long-tail distributions. The plug-and-play framing and reported tail-user gains could have impact on real-world API-based systems where internal states are inaccessible.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their summary of our work on BAHSD. The assessment accurately reflects the paper's focus on handling signal heterogeneity in black-box sequential recommendation extraction. No specific major comments were provided in the report, so we have no point-by-point responses.

Circularity Check

0 steps flagged

No significant circularity; empirical claims only


The provided abstract and context contain no equations, derivations, or mathematical claims. The work describes an empirical black-box distillation framework with mechanisms like multi-scale consistency probing and adaptive objectives, but presents results as experimental outcomes (e.g., 4.98% gain, 80%+ tail improvement) rather than any derivation chain. No self-definitional loops, fitted inputs renamed as predictions, or self-citation load-bearing steps are present or inspectable. This matches the reader's assessment that claims appear empirical and not reducible by construction. The paper is self-contained against external benchmarks with no load-bearing reductions identified.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described in sufficient detail to populate the ledger.

pith-pipeline@v0.9.1-grok · 5724 in / 952 out tokens · 19860 ms · 2026-06-28T08:39:51.919847+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 2 canonical work pages

[1] Boka, T.F., Niu, Z., Neupane, R.B.: A survey of sequential recommendation systems: Techniques, evaluation, and future directions. Information Systems 125, 102427 (2024)

[2] Chen, G., Chen, J., Feng, F., Zhou, S., He, X.: Unbiased knowledge distillation for recommendation. In: Proceedings of the sixteenth ACM international conference on web search and data mining. pp. 976–984 (2023)

[3] Cui, Y., Liu, F., Wang, P., Wang, B., Tang, H., Wan, Y., Wang, J., Chen, J.: Distillation matters: empowering sequential recommenders to match the performance of large language models. In: Proceedings of the 18th ACM Conference on Recommender Systems. pp. 507–517 (2024)

[4] Du, H., Yuan, H., Zhao, P., Zhuang, F., Liu, G., Zhao, L., Liu, Y., Sheng, V.S.: Ensemble modeling with contrastive knowledge distillation for sequential recommendation. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 58–67 (2023)

[5] Gou, J., Yu, B., Maybank, S.J., Tao, D.: Knowledge distillation: A survey. International journal of computer vision 129(6), 1789–1819 (2021)

[6] Han, Z., Chen, C., Zheng, X., Li, M., Liu, W., Yao, B., Li, Y., Yin, J.: Intra- and inter-group optimal transport for user-oriented fairness in recommender systems. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 38, pp. 8463–8471 (2024)

[7] Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. ACM transactions on interactive intelligent systems (TiiS) 5(4), 1–19 (2015)

[8] Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

[9] Kang, S., Hwang, J., Kweon, W., Yu, H.: De-rrd: A knowledge distillation framework for recommender system. In: Proceedings of the 29th ACM international conference on information & knowledge management. pp. 605–614 (2020)

[10] Kang, W.C., McAuley, J.: Self-attentive sequential recommendation. In: 2018 IEEE international conference on data mining (ICDM). pp. 197–206. IEEE (2018)

[11] Kweon, W., Kang, S., Yu, H.: Bidirectional distillation for top-k recommender system. In: Proceedings of the Web Conference 2021. pp. 3861–3871 (2021)

[12] Li, M., Hu, S., Zhu, F., Zhu, Q.: Few-shot learning for cold-start recommendation. In: Calzolari, N., Kan, M.Y., Hoste, V., Lenci, A., Sakti, S., Xue, N. (eds.) Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). pp. 7185–7195. ELRA and ICCL, Torino, Italia (May 202...

[13] Li, M., Yuan, C., Wang, B., Zhuo, J., Wang, S., Liu, L., Xu, S.: Learning query-aware embedding index for improving e-commerce dense retrieval. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. p. 3265–3269. SIGIR ’23, Association for Computing Machinery, New York, NY, USA (2... doi:10.1145/3539618.3591834

[14] Li, M., Yuan, C., Wang, H., Wang, P., Zhuo, J., Wang, B., Liu, L., Xu, S.: Adaptive hyper-parameter learning for deep semantic retrieval. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track. pp. 775–782 (2023)

[15] Li, M., Zhang, S., Zhu, F., Qian, W., Zang, L., Han, J., Hu, S.: Symmetric metric learning with adaptive margin for recommendation. In: Proceedings of the AAAI conference on artificial intelligence. vol. 34, pp. 4634–4641 (2020)

[16] Li, S., Liu, F., Hao, Z., Wang, X., Li, L., Liu, X., Chen, P., Ma, W.: Logits deconfusion with clip for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 25411–25421 (2025)

[17] Liu, D., Cheng, P., Dong, Z., He, X., Pan, W., Ming, Z.: A general knowledge distillation framework for counterfactual recommendation via uniform data. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval. pp. 831–840 (2020)

[18] Ma, T., Li, M., Lv, S., Zhu, F., Huang, L., Hu, S.: Conte: contextualized knowledge graph embedding for circular relations. Data Min. Knowl. Discov. 37(1), 110–135 (Oct 2022). https://doi.org/10.1007/s10618-022-00851-2

[19] Ni, J., Li, J., McAuley, J.: Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). pp. 188–197 (2019)

[20] Pan, L.W., Pan, W.K., Wei, M.Y., Yin, H.Z., Ming, Z.: A survey on sequential recommendation. Frontiers of Computer Science 20(3), 2003606 (2026)

[21] Song, H., Zhao, Y., Zhang, Y., Chen, H., Cui, L.: Knowledge distillation based recommendation systems: A comprehensive survey. Electronics 14(8), 1538 (2025)

[22] Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., Jiang, P.: Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer. In: Proceedings of the 28th ACM international conference on information and knowledge management. pp. 1441–1450 (2019)

[23] Sun, W., Chen, D., Lyu, S., Chen, G., Chen, C., Wang, C.: Knowledge distillation with refined logits. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1110–1119 (2025)

[24] Wang, G., Yang, Z., Wang, Z., Wang, S., Xu, Q., Huang, Q.: Abkd: Pursuing a proper allocation of the probability mass in knowledge distillation via α-β-divergence. arXiv preprint arXiv:2505.04560 (2025)

[25] Wang, Y., Ma, W., Zhang, M., Xu, X., Liu, Z., Ma, S.: Improving long-tail user ctr prediction via hierarchical distribution alignment. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. pp. 3079–3090 (2025)

[26] Wang, Y., Zhang, D., Wenren, H., Wang, Y., Li, Y.: Ekd4rec: Ensemble knowledge distillation from llm-based models to traditional sequential recommenders. In: Companion Proceedings of the ACM on Web Conference 2025. pp. 1370–1374 (2025)

[27] Wei, T., Feng, F., Chen, J., Wu, Z., Yi, J., He, X.: Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system. In: Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining. pp. 1791–1800 (2021)

[28] Wu, J., Chang, C.C., Yu, T., He, Z., Wang, J., Hou, Y., McAuley, J.: Coral: Collaborative retrieval-augmented large language models improve long-tail recommendation. In: Proceedings of the 30th ACM SIGKDD conference on knowledge discovery and data mining. pp. 3391–3401 (2024)

[29] Xu, X., Zhao, X., Xiang, H., Zhang, X., Shen, W., Hu, H., Qi, L.: Hpserec: A hierarchical partitioning and stepwise enhancement framework for long-tailed sequential recommendation. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

[30] Yang, P., Zong, C.C., Huang, S.J., Feng, L., An, B.: Dual-head knowledge distillation: Enhancing logits utilization with an auxiliary head. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2. pp. 3530–3541 (2025)

[31] Yue, Z., He, Z., Zeng, H., McAuley, J.: Black-box attacks on sequential recommenders via data-free model extraction. In: Proceedings of the 15th ACM conference on recommender systems. pp. 44–54 (2021)

[32] Zhang, H., Li, M., Liu, D., Wang, H., Zhang, Y., Zhou, X., Lv, H., Dai, J., Han, J.: A cognitive distribution and behavior-consistent framework for black-box attacks on recommender systems. arXiv preprint arXiv:2602.10633 (2026)

[33] Zhang, Y., Feng, F., He, X., Wei, T., Song, C., Ling, G., Zhang, Y.: Causal intervention for leveraging popularity bias in recommendation. In: Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval. pp. 11–20 (2021)

[34] Zhang, Y., Cheng, D.Z., Yao, T., Yi, X., Hong, L., Chi, E.H.: A model of two tales: Dual transfer learning framework for improved long-tail item recommendation. In: Proceedings of the web conference 2021. pp. 2220–2231 (2021)

[35] Zhu, Z., Zhang, W.: Exploring feature-based knowledge distillation for recommender system: A frequency perspective. In: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. pp. 2182–2193 (2025)

[36] Zhu, Z., Zhang, W.: Rejuvenating cross-entropy loss in knowledge distillation for recommender systems. arXiv preprint arXiv:2509.20989 (2025)

[37] Zhu, Z., Wu, C., Fan, R., Lian, D., Chen, E.: Membership inference attacks against sequential recommender systems. In: Proceedings of the ACM web conference 2023. pp. 1208–1219 (2023)
