Gomez, Lukasz Kaiser, and Illia Polosukhin
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Large Language Model as Token Compressor and Decompressor
  A pretrained LLM is adapted via LoRA fine-tuning into a content-adaptive compressor that maps long texts to compact variable-length Z-token sequences while preserving reconstruction quality and downstream performance.
- Gaze-Regularized Vision-Language-Action Models for Robotic Manipulation
  Gaze regularization aligns VLA attention with human visual patterns via KL divergence on patch distributions, yielding 4-12% gains on manipulation benchmarks.
- Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers
  Co-Me distills a confidence predictor to selectively merge low-confidence tokens in visual geometric transformers, delivering up to 21.5x speedup on VGGT and 20.4x on Pi3 while preserving spatial coverage and performance.