Fast Spatial Memory with Elastic Test-Time Training
Recognition: 2 theorem links · Lean theorem
Pith reviewed 2026-05-10 18:22 UTC · model grok-4.3
The pith
Elastic Test-Time Training stabilizes LaCT fast-weight updates using a Fisher-weighted prior and EMA anchor to support multi-chunk 3D/4D reconstruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose Elastic Test-Time Training, which stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Based on this architecture, we introduce Fast Spatial Memory (FSM), an efficient model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. Pre-trained on large-scale curated 3D/4D data, FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks while mitigating the camera-interpolation shortcut.
What carries the argument
Elastic Test-Time Training mechanism that applies a Fisher-weighted elastic prior around an exponentially moving average anchor state to regularize LaCT fast-weight updates.
Load-bearing premise
The Fisher-weighted elastic prior combined with the EMA-updated anchor will reliably prevent catastrophic forgetting and overfitting during multi-chunk test-time adaptation without introducing new instabilities or reducing the benefits of fast-weight updates.
What would settle it
Measuring 3D/4D reconstruction quality and forgetting rates when FSM processes a long sequence split into many small chunks versus a single large chunk; if quality drops or forgetting increases with multiple chunks, the stabilization claim fails.
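The mechanism described above can be sketched as a per-chunk update rule. This is a hedged illustration only: the function, hyperparameters (`lr`, `lam`, `beta`), and toy data are hypothetical, not the paper's implementation.

```python
# Illustrative sketch of an Elastic Test-Time Training step: a Fisher-weighted
# elastic prior pulls the fast weights toward an EMA anchor of past fast
# weights. All names and hyperparameters here are assumptions, not the paper's.

def ett_chunk_update(theta, anchor, fisher, grad_task, lr=1e-2, lam=1.0, beta=0.99):
    # Gradient of the prior (lam/2) * sum_i F_i * (theta_i - anchor_i)^2
    grad_prior = [lam * f * (t - a) for f, t, a in zip(fisher, theta, anchor)]
    # Regularized fast-weight step on this chunk's task gradient
    theta = [t - lr * (g + gp) for t, g, gp in zip(theta, grad_task, grad_prior)]
    # Anchor evolves as an EMA of past fast weights (stability vs. plasticity)
    anchor = [beta * a + (1.0 - beta) * t for a, t in zip(anchor, theta)]
    return theta, anchor

# Toy run: a constant task gradient drives theta away from its start; the
# prior limits drift on coordinates the Fisher weights mark as important.
theta, anchor = [0.0] * 4, [0.0] * 4
fisher = [10.0, 1.0, 0.1, 0.0]            # per-parameter importance weights
for _ in range(100):
    theta, anchor = ett_chunk_update(theta, anchor, fisher, grad_task=[-1.0] * 4)
# High-Fisher coordinates stay closer to the anchor than low-Fisher ones.
assert abs(theta[0]) < abs(theta[1]) < abs(theta[3])
```

The toy run makes the stability/plasticity trade-off concrete: the unregularized coordinate (`fisher = 0`) drifts freely with the task gradient, while the high-importance coordinate is held near the slowly moving anchor.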
Figures
Original abstract
Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training inspired by elastic weight consolidation, that stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Based on this updated architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-trained FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks and mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Elastic Test-Time Training (ETT), inspired by elastic weight consolidation, to stabilize Large Chunk Test-Time Training (LaCT) fast-weight updates for long-context 3D/4D reconstruction. It introduces a Fisher-weighted elastic prior around an anchor state that evolves via exponential moving average (EMA) of past fast weights to balance stability and plasticity. This enables the Fast Spatial Memory (FSM) model, pre-trained on large-scale 3D/4D data, to support multi-chunk test-time adaptation over long sequences with smaller chunks, high-quality novel view-time rendering, and mitigation of the camera-interpolation shortcut, while reducing activation-memory bottlenecks.
Significance. If the empirical results hold, the approach could meaningfully advance test-time adaptation methods for spatiotemporal vision models by enabling scalable handling of arbitrarily long sequences without single-chunk memory limits or severe forgetting/overfitting. The explicit use of EWC-style regularization with an evolving anchor is a clear strength, and the pre-training plus multi-chunk experiments provide a concrete path toward practical 4D reconstruction systems.
Major comments (3)
- [§3.2] §3.2 (Elastic Test-Time Training): The central stabilization claim relies on the Fisher-weighted prior accurately ranking parameter importance for the test-time objective, yet the manuscript does not specify whether the Fisher matrix is computed once on pre-training data, recomputed on each chunk, or updated online. This leaves open the distributional mismatch risk highlighted in the stress-test note, which directly affects whether the prior curbs forgetting without damping plasticity.
- [§4.1] §4.1 (FSM architecture and anchor update): The EMA anchor is presented as balancing stability/plasticity, but no ablation isolates its contribution versus the Fisher prior alone, nor quantifies how the anchor update rate interacts with chunk size to prevent the overfitting observed in plain LaCT. This is load-bearing for the multi-chunk claim.
- [Table 2] Table 2 (quantitative comparisons): The reported gains in PSNR/SSIM for smaller chunks are central to the 'high-quality reconstruction with smaller chunks' claim, but the table lacks variance across runs or statistical significance tests, making it difficult to confirm the improvements exceed the camera-interpolation shortcut baseline.
Minor comments (3)
- [Eq. (7)] Notation for the elastic prior loss (Eq. 7) uses inconsistent symbols for the anchor state across the text and algorithm box; standardize to a single symbol.
- [§5] The abstract and §1 claim 'extensive experiments' but the experimental section would benefit from an explicit list of datasets and chunk sizes used in the multi-chunk setting.
- [Figure 3] Figure 3 caption does not state the number of chunks or sequence length for the visualized 4D reconstruction, reducing interpretability of the qualitative results.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and have revised the manuscript to incorporate clarifications, additional analyses, and statistical reporting as appropriate.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Elastic Test-Time Training): The central stabilization claim relies on the Fisher-weighted prior accurately ranking parameter importance for the test-time objective, yet the manuscript does not specify whether the Fisher matrix is computed once on pre-training data, recomputed on each chunk, or updated online. This leaves open the distributional mismatch risk highlighted in the stress-test note, which directly affects whether the prior curbs forgetting without damping plasticity.
Authors: We have revised Section 3.2 to explicitly state that the Fisher matrix is computed once on the pre-training data, consistent with standard EWC practice, to obtain a fixed importance ranking without incurring per-chunk overhead at test time. We acknowledge the potential for distributional mismatch between pre-training and test chunks and have expanded the discussion to explain why the resulting elastic prior still supports effective stabilization in our setting, as demonstrated by the multi-chunk results. A brief reference to the stress-test observations has also been added for context. revision: yes
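A diagonal Fisher estimate of the kind the response describes, computed once over pre-training data, can be sketched as follows. The function and the gradient values are hypothetical illustrations, not the paper's code.

```python
# Illustrative sketch of a once-computed diagonal Fisher estimate: the mean
# squared per-sample gradient over (a subset of) the pre-training data.
# Function name and data below are assumptions for illustration only.

def diagonal_fisher(per_sample_grads):
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    fisher = [0.0] * dim
    for g in per_sample_grads:          # one gradient vector per sample
        for i, gi in enumerate(g):
            fisher[i] += gi * gi / n    # approximates E[(d log p / d theta_i)^2]
    return fisher

# A parameter whose gradient is consistently large gets a high weight, so the
# elastic prior penalizes moving it during test-time adaptation.
grads = [[1.0, 0.0], [-1.0, 0.5], [1.0, -0.5]]
fisher = diagonal_fisher(grads)
assert fisher[0] > fisher[1]
```

Because the estimate is frozen after pre-training, it adds no per-chunk cost at test time, which is exactly the trade-off against distributional mismatch that the referee's comment raises.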
-
Referee: [§4.1] §4.1 (FSM architecture and anchor update): The EMA anchor is presented as balancing stability/plasticity, but no ablation isolates its contribution versus the Fisher prior alone, nor quantifies how the anchor update rate interacts with chunk size to prevent the overfitting observed in plain LaCT. This is load-bearing for the multi-chunk claim.
Authors: We agree that isolating the EMA anchor's role strengthens the multi-chunk claims. The revised manuscript includes a new ablation in Section 4.1 comparing the full ETT model against a Fisher-prior-only variant and the plain LaCT baseline. We have also added quantitative analysis and a supplementary figure examining the interaction between the EMA update rate and chunk size, showing that suitable rates reduce the overfitting seen in LaCT while preserving adaptation performance. revision: yes
-
Referee: [Table 2] Table 2 (quantitative comparisons): The reported gains in PSNR/SSIM for smaller chunks are central to the 'high-quality reconstruction with smaller chunks' claim, but the table lacks variance across runs or statistical significance tests, making it difficult to confirm the improvements exceed the camera-interpolation shortcut baseline.
Authors: We have updated Table 2 to report means accompanied by standard deviations computed over multiple runs with different random seeds. We have also added the results of paired statistical significance tests (t-tests) against the baselines, including the camera-interpolation shortcut, confirming that the reported gains are statistically significant. revision: yes
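The paired test the response describes can be sketched with per-seed metric pairs; the PSNR values below are made up for illustration and are not the paper's results.

```python
import math
from statistics import mean, stdev

# Hedged sketch of a paired t-test over per-seed metric pairs, as described
# in the response. The PSNR numbers are illustrative, not from the paper.

def paired_t(xs, ys):
    """Paired t statistic for metric pairs (e.g. PSNR per random seed);
    positive t favors xs over ys."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

fsm_psnr      = [28.4, 28.1, 28.6, 28.3, 28.5]   # one value per seed (made up)
baseline_psnr = [27.2, 27.0, 27.5, 27.1, 27.4]   # same seeds, same order
t = paired_t(fsm_psnr, baseline_psnr)
# With n - 1 = 4 degrees of freedom, |t| > 2.776 rejects equality at the
# two-sided 5% level.
assert t > 2.776
```

Pairing by seed removes the between-seed variance from the comparison, which is why a paired test is the appropriate choice when both models are run under identical seeds.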
Circularity Check
No significant circularity; proposal extends external EWC without self-referential reduction
Full rationale
The paper's core contribution is the proposal of Elastic Test-Time Training (inspired by external elastic weight consolidation) and Fast Spatial Memory for LaCT stabilization via Fisher-weighted prior and EMA anchor. No derivation chain is presented that reduces a claimed prediction or result to its own inputs by construction. The abstract and description frame the approach as an architectural extension applying known regularization ideas to test-time adaptation, without fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations that collapse the argument. The method's claims rest on empirical validation rather than tautological re-expression of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Elastic weight consolidation using Fisher information provides effective regularization to prevent catastrophic forgetting in neural network updates.
invented entities (1)
- Fast Spatial Memory (FSM): no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
We propose Elastic Test-Time Training ... stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights
-
IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
LaCET ... combining its scalability, efficiency, and elastic stability for robust long sequence modeling
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Memory aware synapses: Learning what (not) to forget
Rahaf Aljundi, Francesca Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, and Tinne Tuytelaars. Memory aware synapses: Learning what (not) to forget. In European Conference on Computer Vision (ECCV), pages 139–154, 2018. 4
2018
-
[2]
Recammaster: Camera-controlled generative rendering from a single video
Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, et al. Recammaster: Camera-controlled generative rendering from a single video. In International Conference on Computer Vision, 2025. 6
2025
-
[3]
Atlas: Learning to optimally memorize the context at test time, 2025
Ali Behrouz, Zeman Li, Praneeth Kacham, Majid Daliri, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, and Vahab Mirrokni. Atlas: Learning to optimally memorize the context at test time. arXiv preprint arXiv:2505.23735, 2025. 10
-
[4]
It's all connected: A journey through test-time memorization, attentional bias, retention, and online optimization
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, and Vahab Mirrokni. It's all connected: A journey through test-time memorization, attentional bias, retention, and online optimization. arXiv preprint arXiv:2504.13173, 2025. 10
-
[5]
Titans: Learning to memorize at test time
Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. In Conference on Neural Information Processing Systems, 2025. 10
2025
-
[6]
Birth of a transformer: A memory viewpoint
Alberto Bietti, Vivien Cabannes, Diane Bouchacourt, Herve Jegou, and Leon Bottou. Birth of a transformer: A memory viewpoint. In Conference on Neural Information Processing Systems, pages 1560–1588, 2023. 10
2023
-
[7]
Hardware-constrained hybrid coding of video imagery
Luen C Chan and Peter Whiteman. Hardware-constrained hybrid coding of video imagery. IEEE Transactions on Aerospace and Electronic Systems, (1):71–84, 1983. 7
1983
-
[8]
Ttt3r: 3d reconstruction as test-time training
Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, and Anpei Chen. Ttt3r: 3d reconstruction as test-time training. In International Conference on Learning Representations, 2026. 2, 10
2026
-
[9]
Wildrayzer: Self-supervised large view synthesis in dynamic environments
Xuweiyi Chen, Wentao Zhou, and Zezhou Cheng. Wildrayzer: Self-supervised large view synthesis in dynamic environments. In Conference on Computer Vision and Pattern Recognition,
-
[10]
One-minute video generation with test-time training
Karan Dalal, Daniel Koceja, Jiarui Xu, Yue Zhao, Shihao Han, Ka Chun Cheung, Jan Kautz, Yejin Choi, Yu Sun, and Xiaolong Wang. One-minute video generation with test-time training. In Conference on Computer Vision and Pattern Recognition, pages 17702–17711, 2025. 10
2025
-
[11]
Learning without training: The implicit dynamics of in-context learning
Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, and Javier Gonzalvo. Learning without training: The implicit dynamics of in-context learning. arXiv preprint arXiv:2507.16003, 2025. 10
-
[12]
St4rtrack: Simultaneous 4d reconstruction and tracking in the world
Haiwen Feng, Junyi Zhang, Qianqian Wang, Yufei Ye, Pengcheng Yu, Michael J Black, Trevor Darrell, and Angjoo Kanazawa. St4rtrack: Simultaneous 4d reconstruction and tracking in the world. In International Conference on Computer Vision, pages 8503–8513, 2025. 10
2025
-
[13]
Query-key normalization for transformers
Alex Henry, Prudhvi Raj Dachapally, Shubham Shantaram Pawar, and Yuxuan Chen. Query-key normalization for transformers. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4246–4253, 2020. 14
2020
-
[14]
Lrm: Large reconstruction model for single image to 3d
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. Lrm: Large reconstruction model for single image to 3d. In International Conference on Learning Representations,
-
[15]
Real3d: Scaling up large reconstruction models with real-world images
Hanwen Jiang, Qixing Huang, and Georgios Pavlakos. Real3d: Scaling up large reconstruction models with real-world images. In International Conference on Computer Vision, pages 5821–5833, 2025. 10
2025
-
[16]
Rayzer: A self-supervised large view synthesis model
Hanwen Jiang, Hao Tan, Peng Wang, Haian Jin, Yue Zhao, Sai Bi, Kai Zhang, Fujun Luan, Kalyan Sunkavalli, Qixing Huang, et al. Rayzer: A self-supervised large view synthesis model. In International Conference on Computer Vision, 2025. 9, 10
2025
-
[17]
LVSM: A large view synthesis model with minimal 3d inductive bias
Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. LVSM: A large view synthesis model with minimal 3d inductive bias. In International Conference on Learning Representations, 2025. 1, 4, 9, 10
2025
-
[18]
Stereo4d: Learning how things move in 3d from internet stereo videos
Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, and Aleksander Holynski. Stereo4d: Learning how things move in 3d from internet stereo videos. In Conference on Computer Vision and Pattern Recognition, pages 10497–10509, 2025. 6, 8, 9, 14, 15
2025
-
[19]
Muon: An optimizer for hidden layers in neural networks
Keller Jordan, Yuchen Jin, Vlado Boza, You Jiacheng, Franz Cecista, Laker Newhouse, and Jeremy Bernstein. Muon: An optimizer for hidden layers in neural networks. https://kellerjordan.github.io/posts/muon, 2024. 3
2024
-
[20]
Dynamicstereo: Consistent dynamic depth from stereo videos
Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Dynamicstereo: Consistent dynamic depth from stereo videos. In Conference on Computer Vision and Pattern Recognition, pages 13229–13239, 2023. 6
2023
-
[21]
Lattice: Learning to efficiently compress the memory
Mahdi Karami and Vahab Mirrokni. Lattice: Learning to efficiently compress the memory. arXiv preprint arXiv:2504.05646, 2025. 10
-
[22]
Robot see robot do: Imitating articulated object manipulation with monocular 4d reconstruction
Justin Kerr, Chung Min Kim, Mingxuan Wu, Brent Yi, Qianqian Wang, Ken Goldberg, and Angjoo Kanazawa. Robot see robot do: Imitating articulated object manipulation with monocular 4d reconstruction. In Conference on Robot Learning, 2024. 1
2024
-
[23]
Scaling view synthesis transformers
Evan Kim, Hyunwoo Ryu, Thomas W Mitchel, and Vincent Sitzmann. Scaling view synthesis transformers. arXiv preprint arXiv:2602.21341, 2026. 1, 4, 10
-
[24]
Overcoming catastrophic forgetting in neural networks
James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13): 3521–3526, 2017. 2, 3, 4
2017
-
[25]
Dynamic evaluation of neural sequence models
Ben Krause, Emmanuel Kahembwe, Iain Murray, and Steve Renals. Dynamic evaluation of neural sequence models. In International Conference on Machine Learning, pages 2766–2775, 2018. 5, 14
2018
-
[26]
Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds
Jiahui Lei, Yijia Weng, Adam W Harley, Leonidas Guibas, and Kostas Daniilidis. Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds. In Conference on Computer Vision and Pattern Recognition, pages 6165–6177,
-
[27]
Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model
Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. In International Conference on Learning Representations, 2024. 10
2024
-
[28]
Feed-forward bullet-time reconstruction of dynamic scenes from monocular videos
Hanxue Liang, Jiawei Ren, Ashkan Mirzaei, Antonio Torralba, Ziwei Liu, Igor Gilitschenski, Sanja Fidler, Cengiz Oztireli, Huan Ling, Zan Gojcic, and Jiahui Huang. Feed-forward bullet-time reconstruction of dynamic scenes from monocular videos. In Conference on Neural Information Processing Systems, 2025. 10
2025
-
[29]
Movies: Motion-aware 4d dynamic view synthesis in one second
Chenguo Lin, Yuchen Lin, Panwang Pan, Yifan Yu, Tao Hu, Honglei Yan, Katerina Fragkiadaki, and Yadong Mu. Movies: Motion-aware 4d dynamic view synthesis in one second. In Conference on Computer Vision and Pattern Recognition,
-
[30]
Dl3dv-10k: A large-scale scene dataset for deep learning- based 3d vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, et al. Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In Conference on Computer Vision and Pattern Recognition, pages 22160–22169, 2024. 6, 9, 15
2024
-
[31]
Longhorn: State space models are amortized online learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, and Qiang Liu. Longhorn: State space models are amortized online learners. In International Conference on Learning Representations, 2025. 10
2025
-
[32]
Muon is Scalable for LLM Training
Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, et al. Muon is scalable for LLM training. arXiv preprint arXiv:2502.16982, 2025. 3
2025
-
[33]
Test-Time Training with KV Binding Is Secretly Linear Attention
Junchen Liu, Sven Elflein, Or Litany, Zan Gojcic, and Ruilong Li. Test-time training with KV binding is secretly linear attention. arXiv preprint arXiv:2602.21204, 2026. 10
2026
-
[34]
4d-lrm: Large space-time reconstruction model from and to any view at any time
Ziqiao Ma, Xuweiyi Chen, Shoubin Yu, Sai Bi, Kai Zhang, Chen Ziwen, Sihan Xu, Jianing Yang, Zexiang Xu, Kalyan Sunkavalli, et al. 4d-lrm: Large space-time reconstruction model from and to any view at any time. In Conference on Neural Information Processing Systems, 2025. 2, 4, 5, 10, 15
2025
-
[35]
Spring: A high-resolution high-detail dataset and benchmark for scene flow, optical flow and stereo
Lukas Mehl, Jenny Schmalfuss, Azin Jahedi, Yaroslava Nalivayko, and Andrés Bruhn. Spring: A high-resolution high-detail dataset and benchmark for scene flow, optical flow and stereo. In Conference on Computer Vision and Pattern Recognition, pages 4981–4991, 2023. 6
2023
-
[36]
True self-supervised novel view synthesis is transferable
Thomas Mitchel, Hyunwoo Ryu, and Vincent Sitzmann. True self-supervised novel view synthesis is transferable. In International Conference on Learning Representations, 2026. 8, 10
-
[37]
XVII. On a new geometry of space
Julius Plücker. XVII. On a new geometry of space. Philosophical Transactions of the Royal Society of London, (155):725–791, 1865. 4
-
[38]
Hopfield networks is all you need
Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, Thomas Adler, David Kreil, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. Hopfield networks is all you need. In International Conference on Learning Representations, 2021. 10
2021
-
[39]
L4gm: Large 4d gaussian reconstruction model
Jiawei Ren, Cheng Xie, Ashkan Mirzaei, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling, et al. L4gm: Large 4d gaussian reconstruction model. In Conference on Neural Information Processing Systems, pages 56828–56858, 2024. 1, 8, 10
2024
-
[40]
Weight normalization: A simple reparameterization to accelerate training of deep neural networks
Tim Salimans and Durk P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Conference on Neural Information Processing Systems, 2016. 3
2016
-
[41]
Linear transformers are secretly fast weight programmers
Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. Linear transformers are secretly fast weight programmers. In International Conference on Machine Learning, pages 9355–9366,
-
[42]
Learning to control fast-weight memories: An alternative to dynamic recurrent networks
Jürgen Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1):131–139, 1992. 10
1992
-
[43]
GLU Variants Improve Transformer
Noam Shazeer. GLU variants improve transformer. arXiv preprint arXiv:2002.05202, 2020. 4
2020
-
[44]
Learning to (learn at test time): Rnns with expressive hidden states
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, et al. Learning to (learn at test time): Rnns with expressive hidden states. In International Conference on Machine Learning, pages 57503–57522, 2025. 2, 10
2025
-
[45]
End-to-end test-time training for long context
Arnuv Tandon, Karan Dalal, Xinhao Li, Daniel Koceja, Marcel Rød, Sam Buchanan, Xiaolong Wang, Jure Leskovec, Sanmi Koyejo, Tatsunori Hashimoto, et al. End-to-end test-time training for long context. arXiv preprint arXiv:2512.23675, 2025.
-
[46]
Mv-dust3r+: Single-stage scene reconstruction from sparse views in 2 seconds
Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, and Zhicheng Yan. Mv-dust3r+: Single-stage scene reconstruction from sparse views in 2 seconds. In Conference on Computer Vision and Pattern Recognition, 2024. 10
2024
-
[47]
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
Antti Tarvainen and Harri Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Conference on Neural Information Processing Systems, 2017. 4
2017
-
[48]
Transformers learn in-context by gradient descent
Johannes Von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, pages 35151–35174, 2023. 5, 10
2023
-
[49]
tttlrm: Test-time training for long context and autoregressive 3d reconstruction
Chen Wang, Hao Tan, Wang Yifan, Zhiqin Chen, Yuheng Liu, Kalyan Sunkavalli, Sai Bi, Lingjie Liu, and Yiwei Hu. tttlrm: Test-time training for long context and autoregressive 3d reconstruction. In Conference on Computer Vision and Pattern Recognition, 2026. 2, 4, 5, 9, 10, 15
2026
-
[50]
Vggt: Visual geometry grounded transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In Conference on Computer Vision and Pattern Recognition, pages 5294–5306,
-
[51]
Test-time regression: a unifying framework for designing sequence models with associative memory
Ke Alexander Wang, Jiaxin Shi, and Emily B Fox. Test-time regression: a unifying framework for designing sequence models with associative memory. arXiv preprint arXiv:2501.12352,
-
[52]
Pf-lrm: Pose-free large reconstruction model for joint pose and shape prediction
Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, and Kai Zhang. Pf-lrm: Pose-free large reconstruction model for joint pose and shape prediction. In International Conference on Learning Representations, 2024. 10
2024
-
[53]
Shape of motion: 4d reconstruction from a single video
Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, and Angjoo Kanazawa. Shape of motion: 4d reconstruction from a single video. In International Conference on Computer Vision, pages 9660–9672, 2025. 8
2025
-
[54]
Continuous 3d perception model with persistent state
Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3d perception model with persistent state. In Conference on Computer Vision and Pattern Recognition, pages 10510–10522, 2025. 10
2025
-
[55]
Dust3r: Geometric 3d vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Conference on Computer Vision and Pattern Recognition, pages 20697–20709, 2024. 10
2024
-
[56]
Image quality assessment: from error visibility to structural similarity
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004. 7
2004
-
[57]
Lrm-zero: Training large reconstruction models with synthesized data
Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Soren Pirk, Arie Kaufman, Xin Sun, and Hao Tan. Lrm-zero: Training large reconstruction models with synthesized data. In Conference on Neural Information Processing Systems,
-
[58]
SV4d: Dynamic 3d content generation with multi-frame and multi-view consistency
Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, and Varun Jampani. SV4d: Dynamic 3d content generation with multi-frame and multi-view consistency. In International Conference on Learning Representations, 2025. 1
2025
-
[59]
Depthsplat: Connecting gaussian splatting and depth
Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth. In Conference on Computer Vision and Pattern Recognition, pages 16453–16463, 2025. 9
2025
-
[60]
4dgt: Learning a 4d gaussian transformer using real-world monocular videos
Zhen Xu, Zhengqin Li, Zhao Dong, Xiaowei Zhou, Richard Newcombe, and Zhaoyang Lv. 4dgt: Learning a 4d gaussian transformer using real-world monocular videos. In Conference on Neural Information Processing Systems, 2025. 8, 10
2025
-
[61]
Storm: Spatio-temporal reconstruction model for large-scale outdoor scenes
Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, et al. Storm: Spatio-temporal reconstruction model for large-scale outdoor scenes. In International Conference on Learning Representations, 2025. 10
2025
-
[62]
Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass
Jianing Yang, Alexander Sax, Kevin J Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, and Matt Feiszli. Fast3r: Towards 3d reconstruction of 1000+ images in one forward pass. In Conference on Computer Vision and Pattern Recognition, 2025. 10
2025
-
[63]
Parallelizing linear transformers with the delta rule over sequence length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim. Parallelizing linear transformers with the delta rule over sequence length. In Conference on Neural Information Processing Systems, pages 115491–115522, 2024. 10
2024
-
[64]
Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting
Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. In International Conference on Learning Representations, 2024. 5, 15
2024
-
[65]
Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera
Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, and Jan Kautz. Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. In Conference on Computer Vision and Pattern Recognition, pages 5336–5345, 2020. 8, 9
2020
-
[66]
Revealing and mitigating the local pattern shortcuts of mamba
Wangjie You, Zecheng Tang, Juntao Li, Lili Yao, and Min Zhang. Revealing and mitigating the local pattern shortcuts of mamba. In Findings of the Association for Computational Linguistics: ACL 2025, pages 12156–12178, 2025. 7
2025
-
[67]
Continual learning through synaptic intelligence
Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In International Conference on Machine Learning, pages 3987–3995, 2017. 4
2017
-
[68]
Monst3r: A simple approach for estimating geometry in the presence of motion
Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, and Ming-Hsuan Yang. Monst3r: A simple approach for estimating geometry in the presence of motion. In International Conference on Learning Representations, 2025. 10
2025
-
[69]
LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory
Junyi Zhang, Charles Herrmann, Junhwa Hur, Chen Sun, Ming-Hsuan Yang, Forrester Cole, Trevor Darrell, and Deqing Sun. LoGeR: Long-context geometric reconstruction with hybrid memory. arXiv preprint arXiv:2603.03269, 2026. 2, 10
2026
-
[70]
Arf: Artistic radiance fields
Kai Zhang, Nick Kolkin, Sai Bi, Fujun Luan, Zexiang Xu, Eli Shechtman, and Noah Snavely. Arf: Artistic radiance fields. In European Conference on Computer Vision, pages 717–733,
-
[71]
Gs-lrm: Large reconstruction model for 3d gaussian splatting
Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. Gs-lrm: Large reconstruction model for 3d gaussian splatting. In European Conference on Computer Vision, pages 1–19, 2024. 1, 4, 9, 10, 15
2024
-
[72]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 586–595, 2018. 6, 7
2018
-
[73]
Test-time training done right
Tianyuan Zhang, Sai Bi, Yicong Hong, Kai Zhang, Fujun Luan, Songlin Yang, Kalyan Sunkavalli, William T Freeman, and Hao Tan. Test-time training done right. In International Conference on Learning Representations, 2026. 2, 9, 10
2026
-
[74]
Learning 4d embodied world models
Haoyu Zhen, Qiao Sun, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, and Chuang Gan. Learning 4d embodied world models. In International Conference on Computer Vision, pages 5337–5347, 2025. 1
2025
-
[75]
Pointodyssey: A large-scale synthetic dataset for long-term point tracking
Yang Zheng, Adam W Harley, Bokui Shen, Gordon Wetzstein, and Leonidas J Guibas. Pointodyssey: A large-scale synthetic dataset for long-term point tracking. In International Conference on Computer Vision, pages 19855–19865, 2023. 6
2023
-
[76]
Page-4d: Disentangled pose and geometry estimation for 4d perception
Kaichen Zhou, Yuhan Wang, Grace Chen, Gaspard Beaudouin, Fangneng Zhan, Paul Pu Liang, and Mengyu Wang. Page-4d: Disentangled pose and geometry estimation for 4d perception. In International Conference on Learning Representations,
-
[77]
Stereo magnification: learning view synthesis using multiplane images
Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: learning view synthesis using multiplane images. ACM Transactions on Graphics, 37(4):1–12, 2018. 6
2018
-
[78]
Streaming 4d visual geometry transformer
Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, and Jiwen Lu. Streaming 4d visual geometry transformer. arXiv preprint arXiv:2507.11539, 2025. 10
-
[79]
Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction
Chen Ziwen, Hao Tan, Peng Wang, Zexiang Xu, and Li Fuxin. Long-lrm++: Preserving fine details in feed-forward wide-coverage reconstruction. arXiv preprint arXiv:2512.10267,
-
[80]
Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats
Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats. In International Conference on Computer Vision, pages 4349–4359, 2025. 2, 9, 10
A. Implementation and Training Details
A.1. Data Pre-processing
For each training sample, we load a video clip to...
2025