MTServe achieves up to 3.1x speedup for generative recommendation model serving by using hierarchical caches with host RAM and system optimizations while keeping cache hit ratios above 98.5%.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.
citing papers explorer
-
MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches
MTServe achieves up to 3.1x speedup for generative recommendation model serving by using hierarchical caches with host RAM and system optimizations while keeping cache hit ratios above 98.5%.
-
Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer
SA²CRQ uses sequential adaptive residual quantization based on path entropy plus anchored curriculum regularization from head items to improve both efficiency and cold-start performance in generative retrieval.