Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.
citing papers explorer
-
Learning Faster with Better Tokens: Parameter-Efficient Vocabulary Adaptation for Specialized Text Summarization
Vocabulary adaptation via targeted token addition and replacement improves semantic similarity, domain word usage, and training efficiency for LLM summarization in legal and medical domains.
-
The Efficiency Gap in Byte Modeling
Byte modeling incurs greater scaling overhead for masked diffusion than autoregressive models because the diffusion objective destroys local byte contiguity needed to resolve semantics.