Vectorizing the trie: Efficient constrained decoding for LLM-based generative retrieval on accelerators

Zhengyang Su, Isay Katsman, Yueqi Wang, Ruining He, Lukasz Heldt, Raghunandan Keshavan, Shao-Chuan Wang, Xinyang Yi, Mingyan Gao, Onkar Dalal, Lichan Hong, Ed Chi, Ningren Han · 2026 · arXiv 2602.22647

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LLMs Need Encoders for Semantic IDs Too

cs.IR · 2026-05-29 · unverdicted · novelty 7.0

PrefixMem encoder for Semantic IDs improves deepest-level accuracy by up to 46% relative and full-SID retrieval recall by up to 22% relative on Pinterest data across LLM families.

UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

cs.IR · 2026-05-29 · unverdicted · novelty 6.0

UniPinRec unifies retrieval and ranking into a single model and pipeline deployed at Pinterest, reporting +1% engagement lift, 11.1% lower latency, and 63.6% higher QPS.

CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation

cs.IR · 2026-05-06 · unverdicted · novelty 6.0

CapsID uses probabilistic capsule routing and confidence-based termination to generate variable-length semantic IDs, improving recall by 9.6% over strong baselines with half the latency of dual-representation systems.

citing papers explorer

LLMs Need Encoders for Semantic IDs Too

cs.IR · 2026-05-29 · unverdicted · none · ref 33

UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

cs.IR · 2026-05-29 · unverdicted · none · ref 22

CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation

cs.IR · 2026-05-06 · unverdicted · none · ref 24
