TTSA3R: Training-Free Temporal-Spatial Adaptive Persistent State for Streaming 3D Reconstruction

Jiawei Zhang; Xinhao Xiang; Zhijie Zheng

arxiv: 2601.22615 · v3 · pith:YMHQFMQEnew · submitted 2026-01-30 · 💻 cs.CV

Zhijie Zheng, Xinhao Xiang, Jiawei Zhang

classification 💻 cs.CV

keywords state · adaptive · ttsa3r · reconstruction · spatial · temporal · sequences · update

Streaming recurrent models enable efficient 3D reconstruction by maintaining persistent state representations. However, they suffer from catastrophic forgetting over long sequences because of the difficulty of balancing historical information against new observations. Recent methods alleviate this by deriving adaptive signals from the attention perspective, but they operate along a single dimension and do not jointly consider temporal and spatial consistency. To this end, we propose a training-free framework, termed TTSA3R, that leverages both temporal state evolution and spatial observation quality for adaptive state updates in 3D reconstruction. In particular, we devise a Temporal Adaptive Update Module that regulates update magnitude by analyzing temporal state evolution patterns. A Spatial Contextual Update Module is then introduced to localize the spatial regions that require updates through observation-state alignment and scene dynamics. These complementary signals are finally fused to determine the state update strategy. Extensive experiments show that TTSA3R achieves competitive performance on standard short-sequence benchmarks and provides substantially stronger robustness on extended sequences. On NRGBD, as sequences extend from 50 to 250 frames, TTSA3R exhibits only a 1.33x error increase, compared with over 4x degradation for CUT3R. This highlights the practical value of temporal-spatial adaptive updates for long-term reconstruction stability. Our code is available at https://github.com/anonus2357/ttsa3r.
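
The abstract describes fusing a temporal signal and a spatial signal into a per-token gate on the persistent state. The sketch below is only an illustration of that general idea, not the authors' method: the function names, the specific gate formulas (`1 / (1 + change)` for the temporal signal, `0.5 * (1 - cosine)` for the spatial signal), and fusion by multiplication are all assumptions made here for the example. The abstract does not specify any of them.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def temporal_gate(prev_state, state):
    """Hypothetical temporal signal in (0, 1]: a token whose state changed
    a lot in the previous step receives a smaller update now."""
    change = math.sqrt(sum((s - p) ** 2 for s, p in zip(state, prev_state)))
    return 1.0 / (1.0 + change)


def spatial_gate(state, candidate):
    """Hypothetical spatial signal in [0, 1]: a token whose candidate update
    is poorly aligned with its current state is marked for a larger update."""
    return 0.5 * (1.0 - cosine(state, candidate))


def fused_update(prev_states, states, candidates):
    """Blend each state token toward its candidate using the fused gate.

    Fusion by multiplication is an assumption of this sketch.
    """
    new_states = []
    for prev, cur, cand in zip(prev_states, states, candidates):
        g = temporal_gate(prev, cur) * spatial_gate(cur, cand)
        new_states.append([c + g * (n - c) for c, n in zip(cur, cand)])
    return new_states


# One 2-D token: the state moved by 1.0 last step (temporal gate 0.5) and the
# candidate is orthogonal to it (spatial gate 0.5), so the fused gate is 0.25.
print(fused_update([[0.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]]))  # [[0.75, 0.25]]
```

Because both gates lie in [0, 1], each updated token stays on the segment between its current state and its candidate, which is one simple way to keep a training-free update bounded.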

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training
cs.CV 2026-04 unverdicted novelty 7.0

Mem3R achieves better long-sequence 3D reconstruction by decoupling tracking and mapping with a hybrid memory of TTT-updated MLP and explicit tokens, reducing model size and trajectory errors.

Good Token Hunting: A Hitchhiker's Guide to Token Selection for Visual Geometry Transformers
cs.CV 2026-05 unverdicted novelty 6.0

A two-stage diversity-plus-entropy token selection framework speeds up visual geometry transformers by over 85% on 500-image scenes while preserving baseline accuracy.

Rethinking the State Update Gate for Long-Sequence Recurrent 3D Reconstruction
cs.CV 2026-05 unverdicted novelty 6.0

A closed-form scalar frame-level gate α_t derived from internal feature changes extends effective memory in recurrent 3D reconstruction and improves accuracy on long sequences up to 4541 frames.

Attention Itself Could Retrieve. RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
cs.CV 2026-05 unverdicted novelty 6.0

RetrieveVGGT enables constant-memory long-context streaming 3D reconstruction by retrieving relevant frames via query-key similarities in VGGT's first attention layer, outperforming StreamVGGT and others.