FrameVGGT: Geometry-Aligned Frame-Level Memory for Bounded Streaming VGGT

· 2026 · cs.CV · arXiv 2603.07690

3 Pith papers cite this work. Polarity classification is still indexing.


abstract

Streaming Visual Geometry Transformers such as StreamVGGT enable strong online 3D perception, but their KV-cache grows unbounded over long streams, limiting practical deployment. We study bounded-memory streaming geometry from the perspective of memory organization: unlike language modeling, where useful information can often be compressed at token level, geometry-driven inference relies on coherent and mutually compatible observations across views. Under fixed memory budgets, retaining history as isolated entries can progressively fragment the geometric context needed for stable long-horizon matching and fusion. We therefore propose FrameVGGT, a bounded-memory framework that maintains a fixed-capacity set of complementary memory units for streaming geometry. In our implementation, each unit is instantiated as a frame-wise KV segment summarized by a compact key-space prototype, together with a sparse anchor tier for persistent long-range references. Across long-sequence 3D reconstruction, video depth estimation, and camera pose estimation, FrameVGGT achieves favorable accuracy-memory trade-offs under bounded budgets while maintaining more stable geometry over long streams.
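
The memory organization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how prototypes are computed or which unit is dropped when the budget is exceeded. The sketch assumes a prototype is the mean of a frame's key vectors, that the first few frames form the anchor tier, and that the non-anchor frame whose prototype is most similar to another retained prototype is evicted. The names `FrameUnit`, `BoundedFrameMemory`, `capacity`, and `num_anchors` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so list.remove targets the exact unit
class FrameUnit:
    frame_id: int
    keys: list       # per-token key vectors for this frame
    values: list     # per-token value vectors, kept together with the keys
    prototype: list  # assumed summary: mean of the frame's key vectors


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class BoundedFrameMemory:
    """Fixed-capacity store of whole-frame KV segments plus a small anchor tier."""

    def __init__(self, capacity, num_anchors):
        if capacity < 1 or num_anchors >= capacity:
            raise ValueError("need capacity >= 1 and num_anchors < capacity")
        self.capacity = capacity
        self.num_anchors = num_anchors
        self.anchors = []  # never evicted (assumption: the earliest frames)
        self.working = []  # evictable frame units

    def add_frame(self, frame_id, keys, values):
        unit = FrameUnit(frame_id, keys, values, mean_vector(keys))
        if len(self.anchors) < self.num_anchors:
            self.anchors.append(unit)
            return
        self.working.append(unit)
        if len(self.anchors) + len(self.working) > self.capacity:
            self._evict_most_redundant()

    def _evict_most_redundant(self):
        # Assumed eviction rule: drop the evictable frame whose prototype is
        # closest to some other retained prototype, so the survivors stay
        # complementary. A frame is always dropped as a whole, never per token.
        retained = self.anchors + self.working

        def redundancy(unit):
            return max(
                cosine(unit.prototype, other.prototype)
                for other in retained
                if other is not unit
            )

        victim = max(self.working, key=redundancy)
        self.working.remove(victim)

    def frame_ids(self):
        return [u.frame_id for u in self.anchors + self.working]
```

For example, with `capacity=3` and `num_anchors=1`, adding a fourth frame whose keys duplicate those of an earlier non-anchor frame drops one of the two duplicates and leaves the anchor untouched, so the number of stored frames never exceeds the budget.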

representative citing papers

MyGO-Splat: Multi-Objective Closed-Loop Geometric Feedback for RGB-Only Gaussian SLAM

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

MyGO-Splat is a closed-loop RGB-only Gaussian SLAM system that rasterizes depth and normals from the map to supervise pose optimization and align monocular depth priors for scale consistency.

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

A two-stage diversity-plus-entropy token selection framework speeds up visual geometry transformers by over 85% on 500-image scenes while preserving baseline accuracy.

PanoImager: Geometry-Guided Novel View Synthesis and Reconstruction from Sparse Panoramic Views

cs.CV · 2026-06-25 · unverdicted · novelty 4.0

PanoImager is an SfM-free pipeline combining feed-forward priors, geometry-conditioned diffusion view completion, and depth-guided 3DGS optimization to reconstruct from sparse panoramic images.

citing papers explorer


Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers cs.CV · 2026-05-22 · unverdicted · none · ref 99 · internal anchor
PanoImager: Geometry-Guided Novel View Synthesis and Reconstruction from Sparse Panoramic Views cs.CV · 2026-06-25 · unverdicted · none · ref 13 · internal anchor
