pith. sign in

hub Canonical reference

TTT3R: 3D Reconstruction as Test-Time Training

Canonical reference. 100% of citing Pith papers cite this work as background.

25 Pith papers citing it
Background 100% of classified citations
abstract

Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit the 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, to balance between retaining historical information and adapting to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code is available in https://rover-xingyu.github.io/TTT3R

hub tools

citation-role summary

background 6

citation-polarity summary

fields

cs.CV 22 cs.RO 3

years

2026 23 2025 2

roles

background 6

polarities

background 6

representative citing papers

Learning 3D Reconstruction with Priors in Test Time

cs.CV · 2026-04-04 · unverdicted · novelty 7.0

Test-time constrained optimization incorporates priors into pre-trained multiview transformers via self-supervised losses and penalty terms to improve 3D reconstruction accuracy.

Linearizing Vision Transformer with Test-Time Training

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

Using Test-Time Training's structural match to Softmax attention plus key normalization and locality modules allows inheriting pretrained weights and fine-tuning Stable Diffusion 3.5 in one hour to match quality while speeding inference 1.32-1.47x.

Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.

OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer

cs.CV · 2026-03-06 · conditional · novelty 6.0

OVGGT achieves constant O(1) memory and compute for streaming 3D geometry reconstruction by using FFN-residual-based KV cache compression and dynamic anchor protection, matching state-of-the-art accuracy on long sequences.

ViT$^3$: Unlocking Test-Time Training in Vision

cs.CV · 2025-12-01 · unverdicted · novelty 6.0

ViT³ is a Test-Time Training vision model that achieves linear complexity, matches or exceeds other linear models like Mamba on classification, generation, detection and segmentation, and narrows the gap to standard vision Transformers.

HorizonStream: Long-Horizon Attention for Streaming 3D Reconstruction

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

HorizonStream is a long-horizon Transformer that factorizes geometric evidence influence into channel-wise linear attention for long-range temporal propagation and local spatiotemporal attention for short-range matching, claiming stable generalization from 48-frame training to over 10,000-frame test

citing papers explorer

Showing 25 of 25 citing papers.