DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs. arXiv preprint arXiv:2504.17040, 2025.
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
citing papers explorer
- RegimeVGGT: Layer-Wise Spatially Preserving Redundancy Removal for Visual Geometry Grounded Transformer
RegimeVGGT applies layer-wise U-shaped compression via saliency-guided banded merging and selectively protected K/V downsampling to deliver 6.7x speedup on VGGT at matched reconstruction quality.
- CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models
CIVIC is a path-consistent compact visual inference framework that reduces KV-cache memory to approximately one-third and lowers end-to-end latency in VLMs, while preserving accuracy via text-aligned KL distillation and adaptive spatial retention.