Recognition: 2 theorem links
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion
Pith reviewed 2026-05-15 14:36 UTC · model grok-4.3
The pith
A pointmap estimator fine-tuned on limited dynamic video data can estimate geometry in moving scenes without explicit motion modeling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MonST3R directly estimates per-timestep pointmaps from dynamic scenes by fine-tuning an existing pointmap model on several dynamic posed video datasets with depth labels, enabling it to handle motion and deformation without any explicit motion representation or multi-stage decomposition.
What carries the argument
Per-timestep pointmap output, which supplies an independent 3D point cloud for every video frame to serve as the geometry representation.
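The representation can be made concrete with a small sketch (hypothetical shapes and helper names; the paper's actual model is a fine-tuned transformer, not a back-projection). Each frame t yields an H×W×3 pointmap in camera coordinates, and per-frame depth falls out as the z channel:

```python
import numpy as np

def toy_pointmap(depth, fx, fy, cx, cy):
    """Back-project a depth map into an H x W x 3 pointmap in camera
    coordinates. A stand-in for the network's per-frame output."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

# One independent pointmap per timestep -- no explicit motion model.
video_depth = [np.full((4, 4), z) for z in (1.0, 1.5, 2.0)]
pointmaps = [toy_pointmap(d, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
             for d in video_depth]

# Depth is recovered directly as the z channel of each pointmap.
assert all(np.allclose(pm[..., 2], d)
           for pm, d in zip(pointmaps, video_depth))
```

The point of the sketch is the data layout: because every timestep carries its own 3D point cloud, moving and deforming objects need no separate flow or motion stage.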
If this is right
- Video depth estimation becomes more robust because the single-stage pointmap prediction avoids compounding errors from separate depth and flow stages.
- Camera pose estimation in dynamic scenes improves in both accuracy and speed by operating directly on the per-frame geometry output.
- Primarily feed-forward 4D reconstruction from video becomes feasible without requiring global optimization or explicit temporal modeling.
Where Pith is reading between the lines
- The same fine-tuning strategy could support real-time video geometry if the underlying model is distilled or quantized for lower latency.
- Integration with generative video models might allow the pointmaps to guide synthesis of missing or occluded geometry across frames.
- Performance on fluid or highly non-rigid motion could be tested by adding synthetic datasets with controlled deformation parameters.
Load-bearing premise
The fine-tuning data of dynamic posed videos with depth labels is sufficient for the model to generalize to arbitrary motions and deformations outside the training distribution.
What would settle it
Evaluate the model on a held-out set of videos containing motion patterns and object deformations absent from the fine-tuning datasets and measure whether depth accuracy or pose estimation error rises sharply compared with prior multi-stage methods.
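One way to run that check, sketched with illustrative stand-in predictions (not numbers from the paper) and a standard depth metric, scale-aligned absolute relative error:

```python
import numpy as np

def abs_rel(pred, gt, eps=1e-6):
    """Median-scale-aligned absolute relative depth error."""
    scale = np.median(gt) / max(np.median(pred), eps)
    return float(np.mean(np.abs(pred * scale - gt) / np.maximum(gt, eps)))

# Hypothetical stand-ins: small errors on in-distribution motions,
# larger errors on held-out motion patterns.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 5.0, size=64)
pred_in_dist = gt * (1 + 0.05 * rng.standard_normal(64))
pred_held_out = gt * (1 + 0.30 * rng.standard_normal(64))

err_in = abs_rel(pred_in_dist, gt)
err_out = abs_rel(pred_held_out, gt)
# A sharp rise on held-out motions would point to memorization of
# dataset-specific motion patterns rather than a general geometry prior.
```

The decisive quantity is the gap `err_out - err_in` on motions absent from the fine-tuning data, compared against the same gap for prior multi-stage methods.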
read the original abstract
Estimating geometry from dynamic scenes, where objects move and deform over time, remains a core challenge in computer vision. Current approaches often rely on multi-stage pipelines or global optimizations that decompose the problem into subtasks, like depth and flow, leading to complex systems prone to errors. In this paper, we present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes. Our key insight is that by simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes. However, this approach presents a significant challenge: the scarcity of suitable training data, namely dynamic, posed videos with depth labels. Despite this, we show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics, even without an explicit motion representation. Based on this, we introduce new optimizations for several downstream video-specific tasks and demonstrate strong performance on video depth and camera pose estimation, outperforming prior work in terms of robustness and efficiency. Moreover, MonST3R shows promising results for primarily feed-forward 4D reconstruction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MonST3R, a fine-tuned adaptation of the DUSt3R pointmap estimator for dynamic scenes. By predicting independent per-timestep pointmaps from posed video frames with depth supervision, the method implicitly accommodates object motion and deformation without explicit optical flow or 4D motion representations. The central contribution is an empirical demonstration that fine-tuning on a limited set of dynamic posed video+depth datasets suffices to extend static-scene geometry estimation to dynamic cases, yielding improved robustness on video depth estimation, camera pose recovery, and feed-forward 4D reconstruction relative to prior multi-stage pipelines.
Significance. If the reported generalization holds, the work would offer a notably simpler alternative to existing dynamic geometry pipelines that decompose the problem into separate depth, flow, and optimization stages. The approach's strength lies in its minimal architectural change and avoidance of hand-crafted motion models; however, its dependence on empirical fine-tuning rather than a parameter-free derivation limits the strength of the theoretical claim.
major comments (3)
- [§5] §5 (Experiments): Results are reported primarily on in-distribution sequences drawn from the same small set of sources used for fine-tuning. No cross-dataset zero-shot evaluation on novel non-rigid deformations outside the training distribution is presented, leaving open whether performance stems from memorization of dataset-specific motion patterns rather than a general geometry-first dynamic prior.
- [§4] §4 (Training details) and §5: The manuscript acknowledges data scarcity yet provides no ablation that isolates the contribution of data selection strategy versus fine-tuning hyperparameters, nor any quantitative measure (e.g., performance drop under distribution shift) to support the claim that limited data suffices for arbitrary motion handling.
- [§5] Abstract and §5: The claim of outperforming prior work is stated without accompanying tables showing baseline implementations, error bars, or statistical significance; the absence of these details in the reported metrics undermines verification of the robustness and efficiency advantages.
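The third comment's request for error bars can be met with a paired bootstrap over per-sequence errors; a minimal sketch with illustrative numbers (not values from the paper):

```python
import numpy as np

def paired_bootstrap(errs_a, errs_b, n_boot=2000, seed=0):
    """Fraction of paired bootstrap resamples in which method A beats
    method B on mean per-sequence error. Near 1.0 = robust win."""
    rng = np.random.default_rng(seed)
    errs_a, errs_b = np.asarray(errs_a), np.asarray(errs_b)
    idx = rng.integers(0, len(errs_a), size=(n_boot, len(errs_a)))
    wins = errs_a[idx].mean(axis=1) < errs_b[idx].mean(axis=1)
    return float(wins.mean())

# Illustrative per-sequence depth errors for two hypothetical methods.
method_a = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
method_b = [0.14, 0.13, 0.15, 0.12, 0.16, 0.14]
win_rate = paired_bootstrap(method_a, method_b)
# win_rate == 1.0 here, since every paired difference favors A.
```

Resampling sequences jointly (the same indices for both methods) keeps the comparison paired, which is the appropriate design when both methods are evaluated on the same test sequences.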
minor comments (2)
- [§3] Notation for per-timestep pointmaps is introduced without an explicit equation linking the dynamic case to the original DUST3R static formulation; adding a short derivation or reference to the base model would improve clarity.
- [Figure 4] Figure captions for qualitative 4D reconstruction results do not indicate which sequences are held-out versus training-distribution, making it difficult to assess generalization from visuals alone.
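The first minor comment asks for an equation linking the dynamic case to the static formulation. A minimal way to state that link, with notation assumed to follow the DUSt3R convention of expressing both pointmaps in the first camera's coordinate frame:

```latex
% Static DUSt3R: an image pair (I^1, I^2) of one rigid scene maps to two
% pointmaps, both expressed in camera 1's frame:
f\bigl(I^{1}, I^{2}\bigr) = \bigl(X^{1,1},\, X^{2,1}\bigr),
  \qquad X^{\,\cdot,1} \in \mathbb{R}^{H \times W \times 3}

% MonST3R (dynamic): frames at times t and t' are treated as the pair,
% so each timestep receives its own pointmap in frame t's coordinates:
f\bigl(I^{t}, I^{t'}\bigr) = \bigl(X^{t,t},\, X^{t',t}\bigr)
```

Under this reading, motion and deformation are absorbed into the per-timestep pointmaps themselves rather than modeled by any additional term.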
Simulated Author's Rebuttal
We thank the referee for their thoughtful comments and suggestions. We provide point-by-point responses below and indicate the changes we will make in the revised manuscript.
read point-by-point responses
-
Referee: [§5] §5 (Experiments): Results are reported primarily on in-distribution sequences drawn from the same small set of sources used for fine-tuning. No cross-dataset zero-shot evaluation on novel non-rigid deformations outside the training distribution is presented, leaving open whether performance stems from memorization of dataset-specific motion patterns rather than a general geometry-first dynamic prior.
Authors: We acknowledge that our current evaluations are primarily on sequences from the training data distributions. To address this, we will include additional zero-shot evaluations on out-of-distribution dynamic scenes with novel deformations in the revised manuscript. This will help demonstrate that the model learns a general geometry prior rather than memorizing specific patterns. revision: yes
-
Referee: [§4] §4 (Training details) and §5: The manuscript acknowledges data scarcity yet provides no ablation that isolates the contribution of data selection strategy versus fine-tuning hyperparameters, nor any quantitative measure (e.g., performance drop under distribution shift) to support the claim that limited data suffices for arbitrary motion handling.
Authors: We agree that further ablations would strengthen the paper. In the revision, we plan to add ablations isolating the effects of data selection and fine-tuning hyperparameters. Additionally, we will report performance drops under distribution shifts to quantify the generalization from limited data. revision: yes
-
Referee: [§5] Abstract and §5: The claim of outperforming prior work is stated without accompanying tables showing baseline implementations, error bars, or statistical significance; the absence of these details in the reported metrics undermines verification of the robustness and efficiency advantages.
Authors: We will revise the abstract and Section 5 to include more detailed comparison tables with baseline implementations, error bars, and statistical significance tests where applicable. This will provide better verification of our claims regarding robustness and efficiency. revision: yes
Circularity Check
Empirical fine-tuning on external dynamic datasets; no derivation reduces to self-defined inputs
full rationale
The paper presents MonST3R as a fine-tuning of the pre-trained DUST3R pointmap estimator on external posed dynamic video+depth datasets. No mathematical derivation chain exists that reduces predictions to quantities defined in terms of the model's own fitted parameters. The central claim is supported by experimental results on training and test splits from those datasets rather than by self-referential equations or load-bearing self-citations that forbid alternatives. This matches the default expectation of no significant circularity for an empirical adaptation paper.
Axiom & Free-Parameter Ledger
free parameters (1)
- fine-tuning hyperparameters and data selection strategy
axioms (1)
- domain assumption Per-timestep pointmaps are sufficient to capture geometry in the presence of motion without an explicit motion representation
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean — washburn_uniqueness_aczel (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
by simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation... fine-tuning on this limited data
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean — absolute_floor_iff_bare_distinguishability (unclear)
unclear: relation between the paper passage and the cited Recognition theorem.
global optimization... L_align + w_smooth · L_smooth + w_flow · L_flow
What do these tags mean?
- matches — The paper's claim is directly supported by a theorem in the formal canon.
- supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses — The paper appears to rely on the theorem as machinery.
- contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
- unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 23 Pith papers
-
Ray-Aware Pointer Memory with Adaptive Updates for Streaming 3D Reconstruction
Ray-aware pointers that track both location and viewing direction enable adaptive retain-or-replace memory updates for more stable streaming 3D reconstruction.
-
Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes
Ground4D resolves temporal conflicts in feedforward 4D Gaussian reconstruction for off-road scenes via voxel-grounded temporal aggregation with intra-voxel softmax and surface normal regularization, outperforming prio...
-
AirZoo: A Unified Large-Scale Dataset for Grounding Aerial Geometric 3D Vision
AirZoo is a new large-scale synthetic dataset for aerial 3D vision that improves state-of-the-art models on image retrieval, cross-view matching, and 3D reconstruction when used for fine-tuning.
-
Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond
Holo360D is the first large-scale dataset providing continuous panoramic sequences with accurately aligned high-completeness depth maps and meshes for training panoramic 3D reconstruction models.
-
Learning 3D Reconstruction with Priors in Test Time
Test-time constrained optimization incorporates priors into pre-trained multiview transformers via self-supervised losses and penalty terms to improve 3D reconstruction accuracy.
-
STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction
STAC compresses KV caches in streaming 3D reconstruction transformers via temporal token preservation with decayed attention, spatial voxel compression, and chunked multi-frame optimization, delivering 10x memory redu...
-
ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training
ZipMap achieves linear-time bidirectional 3D reconstruction by zipping image collections into a compact stateful representation via test-time training layers.
-
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
π³ is a feed-forward network with full permutation equivariance that outputs affine-invariant poses and scale-invariant local point maps without reference frames, reaching state-of-the-art on camera pose, depth, and d...
-
CoGE: Sim-to-Real Online Geometric Estimation for Monocular Colonoscopy
CoGE achieves state-of-the-art monocular geometric estimation in colonoscopy by training solely on simulated data via an illumination-aware Retinex-based module and a wavelet-based structure-aware module.
-
Attention Itself Could Retrieve: RetrieveVGGT: Training-Free Long Context Streaming 3D Reconstruction via Query-Key Similarity Retrieval
RetrieveVGGT enables constant-memory long-context streaming 3D reconstruction by retrieving relevant frames via query-key similarities in VGGT's first attention layer, outperforming StreamVGGT and others.
-
RigidFormer: Learning Rigid Dynamics using Transformers
RigidFormer learns mesh-free rigid dynamics from point clouds using object-centric anchors, Anchor-Vertex Pooling, Anchor-based RoPE, and differentiable Kabsch alignment to enforce rigidity.
-
Sat3R: Satellite DSM Reconstruction via RPC-Aware Depth Fine-tuning
Sat3R adapts Depth Anything V2 via RPC-aware metric depth fine-tuning to deliver satellite DSM reconstruction with 38% lower MAE than zero-shot baselines and over 300x speedup versus optimization methods.
-
Long-tail Internet photo reconstruction
Finetuning 3D foundation models on simulated sparse subsets from MegaDepth-X produces robust reconstructions from extremely sparse, noisy internet photos while preserving performance on dense benchmarks.
-
Vista4D: Video Reshooting with 4D Point Clouds
Vista4D re-synthesizes dynamic videos from new viewpoints by grounding them in a 4D point cloud built with static segmentation and multiview training.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temp...
-
Self-Improving 4D Perception via Self-Distillation
SelfEvo enables pretrained 4D perception models to self-improve on unlabeled videos via self-distillation, delivering up to 36.5% relative gains in video depth estimation and 20.1% in camera estimation across eight be...
-
SceneScribe-1M: A Large-Scale Video Dataset with Comprehensive Geometric and Semantic Annotations
SceneScribe-1M is a new dataset of 1 million videos with semantic text, camera parameters, dense depth, and consistent 3D point tracks to support monocular depth estimation, scene reconstruction, point tracking, and t...
-
OVGGT: O(1) Constant-Cost Streaming Visual Geometry Transformer
OVGGT achieves constant O(1) memory and compute for streaming 3D geometry reconstruction by using FFN-residual-based KV cache compression and dynamic anchor protection, matching state-of-the-art accuracy on long sequences.
-
Streaming 4D Visual Geometry Transformer
A causal transformer with key-value caching and distillation from a bidirectional VGGT model enables efficient online 4D geometry reconstruction from videos.
-
WildPose: A Unified Framework for Robust Pose Estimation in the Wild
WildPose unifies feedforward 3D features from MASt3R with differentiable bundle adjustment for robust monocular pose estimation across dynamic, static, and low-ego-motion scenes.
-
LychSim: A Controllable and Interactive Simulation Framework for Vision Research
LychSim introduces a controllable simulation platform on Unreal Engine 5 with Python API, procedural generation, and LLM integration for vision research tasks.
-
DINO_4D: Semantic-Aware 4D Reconstruction
DINO_4D uses frozen DINOv3 features to inject semantic awareness into 4D dynamic scene reconstruction, improving tracking accuracy and completeness on benchmarks while preserving O(T) complexity.