Evict3r: Training-free token eviction for memory-bounded streaming visual geometry transformers
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (5)
years: 2026 (5)
roles: background (1)
polarities: background (1)
citing papers explorer
- FrameVGGT: Geometry-Aligned Frame-Level Memory for Bounded Streaming VGGT
  FrameVGGT replaces token-level KV retention with frame-level segments and prototypes to bound memory while preserving geometric coherence in streaming VGGT.
- Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
  A two-stage diversity-plus-entropy token selection framework speeds up visual geometry transformers by over 85% on 500-image scenes while preserving baseline accuracy.
- GHOST: Geometry-Hierarchical Online Streaming Token Eviction for Efficient 3D Reconstruction
  GHOST applies geometry-hierarchical online token eviction with hierarchical scoring, privilege protection, and layer-wise budget allocation to halve KV cache size while maintaining reconstruction quality and achieving 1.75x faster inference.
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
- OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer
  OVGGT achieves constant O(1) memory and compute for streaming 3D geometry reconstruction by using FFN-residual-based KV cache compression and dynamic anchor protection, matching state-of-the-art accuracy on long sequences.