A two-stage diversity-plus-entropy token selection framework speeds up visual geometry transformers by over 85% on 500-image scenes while preserving baseline accuracy.
G-cut3r: Guided 3d reconstruction with camera and depth prior integration
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 5years
2026 5verdicts
UNVERDICTED 5roles
background 2polarities
background 2representative citing papers
UniT unifies online and offline 3D geometry perception via a Group Autoregressive Transformer that processes observation groups with anchor-free point map prediction and a scale-adaptive loss.
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
VGGT-SLAM++ improves on prior transformer SLAM by adding dense DEM submap graphs and high-cadence local optimization, achieving SOTA accuracy with reduced drift and bounded memory on benchmarks.
citing papers explorer
-
Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
A two-stage diversity-plus-entropy token selection framework speeds up visual geometry transformers by over 85% on 500-image scenes while preserving baseline accuracy.
-
UniT: Unified Geometry Learning with Group Autoregressive Transformer
UniT unifies online and offline 3D geometry perception via a Group Autoregressive Transformer that processes observation groups with anchor-free point map prediction and a scale-adaptive loss.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
-
Efficient 3D Content Reconstruction and Generation
Presents Instant3D for rapid text/image-to-3D generation via multi-view diffusion plus feed-forward reconstruction, and FastMap for 10x faster structure-from-motion with comparable accuracy.
-
VGGT-SLAM++
VGGT-SLAM++ improves on prior transformer SLAM by adding dense DEM submap graphs and high-cadence local optimization, achieving SOTA accuracy with reduced drift and bounded memory on benchmarks.