FrameVGGT: Geometry-Aligned Frame-Level Memory for Bounded Streaming VGGT

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Streaming Visual Geometry Transformers such as StreamVGGT enable strong online 3D perception, but their KV-cache grows unbounded over long streams, limiting practical deployment. We study bounded-memory streaming geometry from the perspective of memory organization: unlike language modeling, where useful information can often be compressed at token level, geometry-driven inference relies on coherent and mutually compatible observations across views. Under fixed memory budgets, retaining history as isolated entries can progressively fragment the geometric context needed for stable long-horizon matching and fusion. We therefore propose FrameVGGT, a bounded-memory framework that maintains a fixed-capacity set of complementary memory units for streaming geometry. In our implementation, each unit is instantiated as a frame-wise KV segment summarized by a compact key-space prototype, together with a sparse anchor tier for persistent long-range references. Across long-sequence 3D reconstruction, video depth estimation, and camera pose estimation, FrameVGGT achieves favorable accuracy-memory trade-offs under bounded budgets while maintaining more stable geometry over long streams.
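The memory organization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify how prototypes are computed, how anchors are chosen, or which unit is evicted, so everything below is an assumption made for illustration: the class name `FrameMemory`, the mean-of-keys prototype, the `anchor_every` rule, and the "evict the most redundant non-anchor frame" policy are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Average equal-length vectors; used here as a frame's key-space prototype (assumption)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class FrameMemory:
    """Hypothetical fixed-capacity memory of frame-wise KV segments."""

    def __init__(self, capacity, anchor_every):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity          # maximum number of retained frames
        self.anchor_every = anchor_every  # every n-th frame is an anchor (assumption)
        self.units = []                   # retained frame units, oldest first
        self.seen = 0                     # number of frames observed so far

    def add_frame(self, keys, values):
        """Store one frame's keys and values as a single unit, then enforce the budget."""
        unit = {
            "frame_id": self.seen,
            "keys": keys,
            "values": values,
            "prototype": mean_vector(keys),
            "anchor": self.seen % self.anchor_every == 0,
        }
        self.seen += 1
        self.units.append(unit)
        if len(self.units) > self.capacity:
            self._evict()

    def _redundancy(self, unit):
        """Highest prototype similarity between this unit and any other retained unit."""
        return max(
            cosine(unit["prototype"], other["prototype"])
            for other in self.units
            if other is not unit
        )

    def _evict(self):
        """Drop the most redundant non-anchor frame; if only anchors remain, drop the oldest."""
        candidates = [u for u in self.units if not u["anchor"]]
        victim = max(candidates, key=self._redundancy) if candidates else self.units[0]
        self.units = [u for u in self.units if u is not victim]


# Usage: four single-key frames pushed through a budget of three frames.
memory = FrameMemory(capacity=3, anchor_every=4)
for frame_keys in ([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]], [[0.9, 0.1]]):
    memory.add_frame(frame_keys, frame_keys)
retained_ids = [u["frame_id"] for u in memory.units]
```

In this toy run the fourth frame is nearly parallel to the anchor frame in key space, so it is the one dropped and the memory stays at three whole frames. The point of the sketch is only that eviction operates on entire frames rather than on isolated tokens, which is the contrast the abstract draws with token-level compression.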
