pith. sign in

arxiv: 2507.22699 · v1 · submitted 2025-07-30 · 💻 cs.CV

Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints

Pith reviewed 2026-05-19 02:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords shape from template3D reconstructionunsupervised SfTmesh inextensibilitynon-rigid deformationimage-guided optimizationocclusion handling
0
0 comments X

The pith

Unsupervised shape-from-template reconstructs deforming objects from images using only color, gradients, silhouettes and mesh inextensibility.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an unsupervised approach to shape-from-template reconstruction that deforms a 3D template to match input images without point correspondences or any supervised training data. It combines matching terms for color features, image gradients and object silhouettes with a constraint that keeps the mesh edges from stretching or compressing. This produces 3D shapes at roughly 400 times the speed of the fastest prior unsupervised methods while recovering finer surface details and maintaining accuracy under heavy occlusions. A reader would care because the method removes the need for large annotated datasets or manual feature matching, potentially allowing real-time 3D modeling from ordinary video of everyday objects. If the central claim holds, geometric constraints alone can substitute for learned priors in non-rigid reconstruction tasks.

Core claim

The central claim is that an image-guided optimization that matches color, gradient and silhouette observations while enforcing mesh inextensibility can recover accurate 3D deformations from single or few images without correspondences or supervision, and that this formulation runs at 400 times the speed of the best previous unsupervised SfT method while outperforming it on detail and occlusion cases.

What carries the argument

The mesh inextensibility constraint, which penalizes any change in edge lengths of the deforming template mesh during optimization against image observations.

If this is right

  • Reconstruction runs fast enough for real-time video processing on standard hardware without pre-computed matches.
  • Accuracy holds when large regions of the object are invisible, unlike correspondence-dependent methods.
  • No training data or network weights are required, so the same template can be used for new objects immediately.
  • Surface details are recovered more sharply than in prior unsupervised physics-based approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same image-plus-inextensibility formulation might be extended to other physical priors such as bending resistance or volume preservation.
  • Integration with multi-view or temporal consistency terms could further reduce drift in long video sequences.
  • The approach could be tested on objects with known elastic properties to measure how much the strict inextensibility assumption limits applicability.

Load-bearing premise

Image features together with the single geometric constraint of inextensibility supply enough information to determine the correct 3D deformation uniquely even when large parts of the object are hidden.

What would settle it

A benchmark sequence of images with known ground-truth 3D shapes and severe occlusions where the method's output deviates from the true surface by more than the error of a supervised baseline.

Figures

Figures reproduced from arXiv: 2507.22699 by Ruochen Chen, Shaifali Parashar, Thuy Tran.

Figure 1
Figure 1. Figure 1: Image-guided SfT vs SoTA. Our method recovers finer [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of our frame-wise image-guided SfT pipeline. The template is provided with a texture map T , the mesh connectivity M and the initial shape x0. We learn the reconstruction on a video sequence frame-by-frame. At current frame t, the deformation network fθ predicts a displacement from x0 to produce the deformed shape xt. A differentiable renderer then projects this mesh to produce a rendered RGB imag… view at source ↗
Figure 3
Figure 3. Figure 3: Visual comparison of DeepSfT vs Ours on Kinect Paper dataset. Depth map error at each frame is reported in mm. v1. We used SAM2 to generate image masks. We also ex￾perimented with ϕ-SfT dataset [17]. It has 4 synthetic se￾quences of cloth deformations simulated using ArcSim [22] and 9 real sequences cloth deformation with various shapes and textures recorded from Azure Kinect v2 camera. The texture and tem… view at source ↗
Figure 4
Figure 4. Figure 4: shows the 3D reconstruction of our method from frontal and side perspectives, together with the ones [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison on frames with strongest self-occlusions on [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative comparison on modeling formulation. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative comparison on using the adaptive data loss. [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Illustration of depth map RMSE (mm) on the entire [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗
read the original abstract

Shape-from-Template (SfT) refers to the class of methods that reconstruct the 3D shape of a deforming object from images/videos using a 3D template. Traditional SfT methods require point correspondences between images and the texture of the 3D template in order to reconstruct 3D shapes from images/videos in real time. Their performance severely degrades when encountered with severe occlusions in the images because of the unavailability of correspondences. In contrast, modern SfT methods use a correspondence-free approach by incorporating deep neural networks to reconstruct 3D objects, thus requiring huge amounts of data for supervision. Recent advances use a fully unsupervised or self-supervised approach by combining differentiable physics and graphics to deform 3D template to match input images. In this paper, we propose an unsupervised SfT which uses only image observations: color features, gradients and silhouettes along with a mesh inextensibility constraint to reconstruct at a $400\times$ faster pace than (best-performing) unsupervised SfT. Moreover, when it comes to generating finer details and severe occlusions, our method outperforms the existing methodologies by a large margin. Code is available at https://github.com/dvttran/nsft.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an unsupervised Shape-from-Template (SfT) method that reconstructs 3D shapes of deforming objects from single or few images. It combines image observations (color features, gradients, and silhouettes) with a mesh inextensibility constraint to deform a 3D template, claiming a 400× speedup over the best unsupervised SfT baselines and large-margin improvements in fine-detail recovery and severe-occlusion robustness without point correspondences or supervision.

Significance. If the quantitative claims hold, the work would provide a fast, correspondence-free, and data-efficient alternative to supervised deep SfT approaches, particularly valuable for real-time applications involving occlusions. The public code release supports reproducibility and allows direct verification of the reported speed and accuracy gains.

major comments (2)
  1. [§4 and Table 2] §4 (Experiments) and Table 2: the central claim of outperforming unsupervised SfT by a large margin under severe occlusions requires explicit quantitative support. The abstract states large gains and 400× speedup, yet the reported results must include per-scenario error metrics (e.g., mean vertex error, silhouette IoU) and ablation studies isolating the inextensibility term; without these, the occlusion-robustness advantage cannot be verified.
  2. [§3.2 and Eq. (7)] §3.2 (Mesh Inextensibility Constraint) and Eq. (7): inextensibility is an isometric prior that admits multiple consistent deformations when large image regions are occluded. The optimization (soft penalty or hard projection) must be shown to select the ground-truth mapping among these; otherwise the reported robustness under occlusion rests on an unverified disambiguation assumption rather than on the prior itself. A concrete test (e.g., multiple random initializations or synthetic multi-modal cases) is needed.
minor comments (2)
  1. [§3] Notation for the inextensibility energy term should be defined once and used consistently across equations and text.
  2. [Figure 3] Figure 3 caption should explicitly state the occlusion percentage and viewpoint for each row to allow direct comparison with quantitative tables.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of our results.

read point-by-point responses
  1. Referee: [§4 and Table 2] §4 (Experiments) and Table 2: the central claim of outperforming unsupervised SfT by a large margin under severe occlusions requires explicit quantitative support. The abstract states large gains and 400× speedup, yet the reported results must include per-scenario error metrics (e.g., mean vertex error, silhouette IoU) and ablation studies isolating the inextensibility term; without these, the occlusion-robustness advantage cannot be verified.

    Authors: We agree that additional quantitative details are required to fully support the occlusion-robustness claims. In the revised manuscript we will expand Section 4 and Table 2 with per-scenario metrics (mean vertex error and silhouette IoU) across different occlusion levels. We will also add ablation studies that isolate the contribution of the mesh inextensibility term. These changes will make the reported advantages directly verifiable. revision: yes

  2. Referee: [§3.2 and Eq. (7)] §3.2 (Mesh Inextensibility Constraint) and Eq. (7): inextensibility is an isometric prior that admits multiple consistent deformations when large image regions are occluded. The optimization (soft penalty or hard projection) must be shown to select the ground-truth mapping among these; otherwise the reported robustness under occlusion rests on an unverified disambiguation assumption rather than on the prior itself. A concrete test (e.g., multiple random initializations or synthetic multi-modal cases) is needed.

    Authors: We acknowledge that the inextensibility constraint alone permits multiple solutions under heavy occlusion. Our method combines this prior with image observations (color features, gradients, and silhouettes) to disambiguate the solution. To address the concern, we will add experiments with multiple random initializations on synthetic occluded sequences and report the frequency with which the optimizer recovers the ground-truth deformation. We will also clarify in the text how the image-guided terms guide the optimization to the correct mapping. revision: yes

Circularity Check

0 steps flagged

No circularity: method combines independent image cues with physical constraint

full rationale

The paper presents an optimization-based SfT pipeline that minimizes a composite loss over color features, image gradients, silhouettes, and an explicit mesh inextensibility term. None of these terms is defined in terms of the final 3D output or fitted to the target reconstruction; the inextensibility constraint is a standard isometric prior applied directly to the template mesh. Performance numbers (speed, occlusion robustness) are reported as measured outcomes on held-out data rather than as quantities forced by construction from the inputs. No self-citation is invoked to justify uniqueness or to smuggle an ansatz, and the derivation chain remains self-contained against external image observations and the physical model.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is limited to the abstract; the central claim rests on the domain assumption that inextensibility plus image cues suffice for reconstruction. No free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption Mesh inextensibility constraint is a valid and sufficient prior for the deforming objects considered
    Invoked to enable correspondence-free reconstruction from image observations alone

pith-pipeline@v0.9.0 · 5746 in / 1398 out tokens · 55249 ms · 2026-05-19T02:50:49.496387+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 3 internal anchors

  1. [1]

    Simultaneous pose and non-rigid shape with particle dynamics

    Antonio Agudo and Francesc Moreno-Noguer. Simultaneous pose and non-rigid shape with particle dynamics. In CVPR,

  2. [2]

    Large steps in cloth sim- ulation

    David Baraff and Andrew Witkin. Large steps in cloth sim- ulation. In SIGGRAPH, 1998. 2

  3. [3]

    Shape-from-template

    Adrien Bartoli, Yan G ´erard, Francois Chadebecq, Toby Collins, and Daniel Pizarro. Shape-from-template. IEEE TPAMI, 37(10):2099–2118, 2015. 1, 2, 5, 6

  4. [4]

    Brunet, A

    F. Brunet, A. Bartoli, and R.I. Hartley. Monocular template- based 3d surface reconstruction: Convex inextensible and nonconvex isometric methods. Computer Vision and Image Understanding, 125:138–154, 2014. 2

  5. [5]

    Equiareal shape-from-template

    David Casillas-Perez, Daniel Pizarro, David Fuentes- Jimenez, Manuel Mazo, and Adrien Bartoli. Equiareal shape-from-template. Journal of Mathematical Imaging and Vision, 61(5):607–626, 2014. 2

  6. [6]

    Gaps: Geometry-aware, physics-based, self-supervised neural gar- ment draping

    Ruochen Chen, Shaifali Parashar, and Liming Chen. Gaps: Geometry-aware, physics-based, self-supervised neural gar- ment draping. In International Conference on 3D Vision (3DV), 2024. 2, 3, 4

  7. [7]

    Neural ordinary differential equa- tions

    Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equa- tions. Advances in neural information processing systems , 31, 2018. 1

  8. [8]

    A stable analytical framework for isometric shape- from-template by surface integration

    Ajad Chhatkuli, Daniel Pizarro, Adrien Bartoli, and Toby Collins. A stable analytical framework for isometric shape- from-template by surface integration. IEEE TPAMI, 39(5): 833–850, 2016. 1, 2

  9. [9]

    [poster] realtime shape- from-template: System and applications

    Toby Collins and Adrien Bartoli. [poster] realtime shape- from-template: System and applications. In 2015 IEEE In- ternational Symposium on Mixed and Augmented Reality ,

  10. [10]

    Texture-generic deep shape-from-template

    David Fuentes-Jimenez, Daniel Pizarro, David Casillas- Perez, Toby Collins, and Adrien Bartoli. Texture-generic deep shape-from-template. IEEE Access , 9:75211–75230,

  11. [11]

    Deep shape-from- template: Single-image quasi-isometric deformable registra- tion and reconstruction

    David Fuentes-Jimenez, Daniel Pizarro, David Casillas- P´erez, Toby Collins, and Adrien Bartoli. Deep shape-from- template: Single-image quasi-isometric deformable registra- tion and reconstruction. Image and Vision Computing, 127: 104531, 2022. 2, 5

  12. [12]

    Hdm-net: Monocular non-rigid 3d reconstruc- tion with learned deformation model

    Vladislav Golyanik, Soshi Shimada, Kiran Varanasi, and Di- dier Stricker. Hdm-net: Monocular non-rigid 3d reconstruc- tion with learned deformation model. In Virtual Reality and Augmented Reality, 2018. 1, 2

  13. [13]

    Inverse simulation: Reconstructing dynamic geometry of clothed humans via optimal control

    Jingfan Guo, Jie Li, Rahul Narain, and Hyun Soo Park. Inverse simulation: Reconstructing dynamic geometry of clothed humans via optimal control. In CVPR, 2021. 2

  14. [14]

    Template-based monocular 3d recovery of elastic shapes using lagrangian multipliers

    Nazim Haouchine and Stephane Cotin. Template-based monocular 3d recovery of elastic shapes using lagrangian multipliers. In CVPR, 2017. 2

  15. [15]

    libigl: A simple C++ geometry processing library, 2018

    Alec Jacobson, Daniele Panozzo, et al. libigl: A simple C++ geometry processing library, 2018. https://libigl.github.io/. 5

  16. [16]

    Physics-as-inverse-graphics: Unsupervised physical param- eter estimation from video

    Miguel Jaques, Michael Burke, and Timothy Hospedales. Physics-as-inverse-graphics: Unsupervised physical param- eter estimation from video. In ICLR, 2020. 2

  17. [17]

    ϕ-sft: Shape- from-template with a physics-based deformation model

    Navami Kairanda, Edith Tretschk, Mohamed Elgharib, Christian Theobalt, and Vladislav Golyanik. ϕ-sft: Shape- from-template with a physics-based deformation model. In CVPR, 2022. 2, 3, 4, 5, 6, 7, 9, 1

  18. [18]

    Karaev, I

    Nikita Karaev, Iurii Makarov, Jianyuan Wang, Natalia Neverova, Andrea Vedaldi, and Christian Rupprecht. Co- tracker3: Simpler and better point tracking by pseudo- labelling real videos. arXiv preprint arXiv:2410.11831 ,

  19. [19]

    Modular primitives for high-performance differentiable rendering

    Samuli Laine, Janne Hellsten, Tero Karras, Yeongho Seol, Jaakko Lehtinen, and Timo Aila. Modular primitives for high-performance differentiable rendering. ACM TOG, 39 (6):1–14, 2020. 3, 5

  20. [20]

    Deep Physics- aware Inference of Cloth Deformation for Monocular Hu- man Performance Capture

    Yue Li, Marc Habermann, Bernhard Thomaszewski, Stelian Coros, Thabo Beeler, and Christian Theobalt. Deep Physics- aware Inference of Cloth Deformation for Monocular Hu- man Performance Capture . In 3DV, 2021. 2

  21. [21]

    Diffcloth: Differentiable cloth simulation with dry frictional contact

    Yifei Li, Tao Du, Kui Wu, Jie Xu, and Wojciech Matusik. Diffcloth: Differentiable cloth simulation with dry frictional contact. ACM TOG, 42(1), 2022. 2

  22. [22]

    Differen- tiable cloth simulation for inverse problems

    Junbang Liang, Ming Lin, and Vladlen Koltun. Differen- tiable cloth simulation for inverse problems. In NeurIPS,

  23. [23]

    Better together: Joint reasoning for non- rigid 3d reconstruction with specularities and shading

    Qi Liu-Yin, Rui Yu, Lourdes Agapito, Andrew Fitzgibbon, and Chris Russell. Better together: Joint reasoning for non- rigid 3d reconstruction with specularities and shading. In BMVC, 2016. 2

  24. [24]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. 5

  25. [25]

    Weakly-supervised deep shape-from-template.IEEE Access, 13:22868–22892, 2025

    Sara Luengo-Sanchez, David Fuentes-Jimenez, Cristina Losada-Gutierrez, Daniel Pizarro, and Adrien Bartoli. Weakly-supervised deep shape-from-template.IEEE Access, 13:22868–22892, 2025. 1, 2, 6, 8

  26. [26]

    Elastic shape-from-template with spatially sparse deforming forces

    Abed Malti and C ´edric Herzet. Elastic shape-from-template with spatially sparse deforming forces. In CVPR, 2017. 2

  27. [27]

    Monocular template-based 3d reconstruction of exten- sible surfaces with local linear elasticity

    Abed Malti, Richard Hartley, Adrien Bartoli, and Jae-Hak Kim. Monocular template-based 3d reconstruction of exten- sible surfaces with local linear elasticity. In CVPR, 2013

  28. [28]

    A lin- ear least-squares solution to elastic shape-from-template

    Abed Malti, Adrien Bartoli, and Richard Hartley. A lin- ear least-squares solution to elastic shape-from-template. In CVPR, 2015. 2

  29. [29]

    Nerf: Representing scenes as neural radiance fields for view syn- thesis

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. Communications of the ACM , 65(1):99–106, 2021. 1

  30. [30]

    Rahul Narain, Armin Samii, and James F. O’Brien. Adaptive anisotropic remeshing for cloth simulation. ACM TOG, 31 (6), 2012. 2 9

  31. [31]

    Yoo, and Pascal Fua

    Dat Tien Ngo, Sanghyuk Park, Anne Jorstad, Alberto Criv- ellaro, Chang D. Yoo, and Pascal Fua. Dense image regis- tration and deformable surface reconstruction in presence of occlusions and minimal texture. In ICCV, 2015. 1

  32. [32]

    Template- based monocular 3d shape recovery using laplacian meshes

    Dat Tien Ngo, Jonas ¨Ostlund, and Pascal Fua. Template- based monocular 3d shape recovery using laplacian meshes. IEEE TPAMI, 38(1):172–187, 2016. 1, 2

  33. [33]

    As-rigid-as-possible volumetric shape-from- template

    Shaifali Parashar, Daniel Pizarro, Adrien Bartoli, and Toby Collins. As-rigid-as-possible volumetric shape-from- template. In ICCV, 2015. 2

  34. [34]

    Lo- cal deformable 3d reconstruction with cartan’s connections

    Shaifali Parashar, Daniel Pizarro, and Adrien Bartoli. Lo- cal deformable 3d reconstruction with cartan’s connections. IEEE TPAMI, 42(12):3011–3026, 2020. 2

  35. [35]

    Pytorch: An imperative style, high-performance deep learning library

    Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019. 5

  36. [36]

    Monocular template-based reconstruction of inextensible surfaces

    Mathieu Perriollat, Richard Hartley, and Adrien Bartoli. Monocular template-based reconstruction of inextensible surfaces. IJCV, 95(2):124–137, 2011. 2

  37. [37]

    Pumarola, A

    A. Pumarola, A. Agudo, L. Porzi, A. Sanfeliu, V . Lepetit, and F. Moreno-Noguer. Geometry-Aware Network for Non- Rigid Shape Prediction from a Single View. In CVPR, 2018. 1, 2

  38. [38]

    Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, and Ming C. Lin. Scalable differentiable physics for learning and control. In ICML, 2020. 2

  39. [39]

    Accelerating 3D Deep Learning with PyTorch3D

    Nikhila Ravi, Jeremy Reizenstein, David Novotny, Tay- lor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. Accelerating 3d deep learning with pytorch3d. arXiv:2007.08501, 2020. 3, 5

  40. [40]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junt- ing Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao- Yuan Wu, Ross Girshick, Piotr Doll´ar, and Christoph Feicht- enhofer. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:...

  41. [41]

    Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, and Jimei Yang

    Davis Rempe, Leonidas J. Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, and Jimei Yang. Contact and human dynamics from monocular video. In ECCV, 2020. 2

  42. [42]

    Kornia: an open source differentiable computer vision library for pytorch

    Edgar Riba, Dmytro Mishkin, Daniel Ponsa, Ethan Rublee, and Gary Bradski. Kornia: an open source differentiable computer vision library for pytorch. In CVPR, 2020. 5

  43. [43]

    Reconstructing sharply folding surfaces: A convex formulation

    Mathieu Salzmann and Pascal Fua. Reconstructing sharply folding surfaces: A convex formulation. In CVPR, 2009. 2

  44. [44]

    Closed-form solution to non-rigid 3d surface registration

    Mathieu Salzmann, Francesc Moreno-Noguer, Vincent Lep- etit, and Pascal Fua. Closed-form solution to non-rigid 3d surface registration. In ECCV, 2008. 1, 2

  45. [45]

    SNUG: Self-Supervised Neural Dynamic Garments

    Igor Santesteban, Miguel A Otaduy, and Dan Casas. SNUG: Self-Supervised Neural Dynamic Garments. In CVPR, 2022. 2, 3

  46. [46]

    Repulsive shells

    Josua Sassen, Henrik Schumacher, Martin Rumpf, and Keenan Crane. Repulsive shells. TOG, 43(4):1–22, 2024. 7

  47. [47]

    Robusft: Ro- bust real-time shape-from-template, a c++ library.Image and Vision Computing, 141:104867, 2024

    Mohammadreza Shetab-Bushehri, Miguel Aranda, Erol ¨Ozg¨ur, Youcef Mezouar, and Adrien Bartoli. Robusft: Ro- bust real-time shape-from-template, a c++ library.Image and Vision Computing, 141:104867, 2024. 1

  48. [48]

    IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction

    Soshi Shimada, Vladislav Golyanik, Christian Theobalt, and Didier Stricker. IsMo-GAN: Adversarial Learning for Monocular Non-Rigid 3D Reconstruction . In CVPRW,

  49. [49]

    Physics- guided shape-from-template: Monocular video perception through neural surrogate models

    David Stotko, Nils Wandel, and Reinhard Klein. Physics- guided shape-from-template: Monocular video perception through neural surrogate models. In CVPR, 2024. 2, 3, 5, 6, 7, 9, 1

  50. [50]

    Improved shape-from-template method with perspective space constraints for disappearing features

    Dongliang Tan, Huamin Yang, Zhengang Jiang, Weili Shi, Jun Qin, and Feng Qu. Improved shape-from-template method with perspective space constraints for disappearing features. Complex & Intelligent Systems, 10(4):5475–5488,

  51. [51]

    A constrained latent variable model

    Aydin Varol, Mathieu Salzmann, Pascal Fua, and Raquel Ur- tasun. A constrained latent variable model. In CVPR, 2012. 5

  52. [52]

    Correspondence-free ma- terial reconstruction using sparse surface constraints

    Sebastian Weiss, Robert Maier, Daniel Cremers, R ¨udiger Westermann, and Nils Thuerey. Correspondence-free ma- terial reconstruction using sparse surface constraints. In CVPR, 2020. 2

  53. [53]

    Deformable 3d gaussians for high- fidelity monocular dynamic scene reconstruction

    Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high- fidelity monocular dynamic scene reconstruction. In CVPR, pages 20331–20341, 2024. 1

  54. [54]

    Shape-from-shading: a survey

    Ruo Zhang, Ping-Sing Tsai, James Edwin Cryer, and Mubarak Shah. Shape-from-shading: a survey. TPAMI, 21 (8):690–706, 2002. 8

  55. [55]

    Particle-sft: A provably- convergent, fast shape-from-template algorithm

    Erol ¨Ozg¨ur and Adrien Bartoli. Particle-sft: A provably- convergent, fast shape-from-template algorithm. IJCV, 123 (2):184–205, 2017. 2 10 Image-Guided Shape-from-Template Using Mesh Inextensibility Constraints Supplementary Material

  56. [56]

    Deformation Network The object deformation at time step t can be represented by φ(x0, t; zt) = xt − x0, for the initial state x0 and latent variables zt. We can learn such deformation in the follow- ing ways: (1) vertex-offset: consider φ := ∆ xt ∈ R3V as learning variables directly, (2) physics-based: sequentially predict the per-time-step acceleration a...

  57. [57]

    We found it to be as computationally inefficient as ϕ-SfT

    to learn the sequence’s velocity field. We found it to be as computationally inefficient as ϕ-SfT. Our method, on the other hand, allows optimizing in a frame-wise strategy de- scribed in Section 3.6. This brings the novelty in our work, where we can reconstruct high-quality shapes efficiently

  58. [58]

    Adaptive Data Loss Rather than using the standardℓ-p norm for the vision losses as ϕ-SfT[17] or PGSfT[49], we propose to use an adaptive data loss as described in Section 3.3

    Loss Functions 7.1. Adaptive Data Loss Rather than using the standardℓ-p norm for the vision losses as ϕ-SfT[17] or PGSfT[49], we propose to use an adaptive data loss as described in Section 3.3. Its adaptive nature comes from the exponential weighting factor on the pixel difference d, which allows color errors to remain in fo- cus during optimization des...

  59. [59]

    Additionally, we discuss some possible extensions in Sections 8.2 and 8.3

    Optimization Strategies We analyze the complexity of the proposed frame-wise op- timization strategy in Section 8.1 in comparison with the strategies from physics-based approaches. Additionally, we discuss some possible extensions in Sections 8.2 and 8.3. Depending on the practical applications, future works can develope from these fundamentals to balance...