DiffPrune reformulates visual token pruning as continuous control of token information using an Information Throttler with importance-conditioned variance-preserving noise, enabling fully differentiable learning of scores that are hard-thresholded at inference.
Dynamic token reduction during generation for vision language models. arXiv preprint arXiv:2501.14204, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.
citing papers explorer
- Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models
  DiffPrune reformulates visual token pruning as continuous control of token information using an Information Throttler with importance-conditioned variance-preserving noise, enabling fully differentiable learning of scores that are hard-thresholded at inference.
- Toward Native Multimodal Modeling: A Roadmap
  A roadmap that defines architectural nativity for multimodal models and categorizes them into Multi-to-Text, Multi-to-Target, and Multi-to-Multi types while outlining an industrial pipeline toward unified transformer-based native multimodal modeling.