ShowMak3r: Compositional TV Show Reconstruction
Pith reviewed 2026-05-22 17:51 UTC · model grok-4.3
The pith
ShowMak3r reconstructs TV show videos into dynamic 3D radiance fields that support new camera views and actor edits at different times.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ShowMak3r is a comprehensive reconstruction pipeline that allows the editing of scenes like how video clips are made in a production control room. In ShowMak3r, a 3DLocator module locates recovered actors on the stage using depth prior and estimates unseen human poses via interpolation. The proposed ShotMatcher module then tracks the actors under shot changes. Furthermore, ShowMak3r introduces a face-fitting network that dynamically recovers the actors' expressions. Experiments on Sitcoms3D dataset show that our pipeline can reassemble TV show scenes with new cameras at different timestamps.
What carries the argument
The ShowMak3r pipeline, which integrates 3DLocator for depth-based actor placement and pose interpolation, ShotMatcher for cross-shot tracking, and a dynamic face-fitting network to recover expressions, builds an editable dynamic radiance field from multi-shot TV video.
Load-bearing premise
The method assumes depth priors plus pose interpolation and shot tracking can place actors and handle occlusions and cuts without creating large errors in the final 3D field.
What would settle it
Visible artifacts or wrong actor locations when the reconstructed field is rendered from entirely new camera angles and timestamps not present in the input clips would show the claim is incorrect.
Figures
read the original abstract
Reconstructing dynamic radiance fields from video clips is challenging, especially when entertainment videos like TV shows are given. Many challenges make the reconstruction difficult due to (1) actors occluding with each other and having diverse facial expressions, (2) cluttered stages, and (3) small baseline views or sudden shot changes. To address these issues, we present ShowMak3r, a comprehensive reconstruction pipeline that allows the editing of scenes like how video clips are made in a production control room. In ShowMak3r, a 3DLocator module locates recovered actors on the stage using depth prior and estimates unseen human poses via interpolation. The proposed ShotMatcher module then tracks the actors under shot changes. Furthermore, ShowMak3r introduces a face-fitting network that dynamically recovers the actors' expressions. Experiments on Sitcoms3D dataset show that our pipeline can reassemble TV show scenes with new cameras at different timestamps. We also demonstrate that ShowMak3r enables interesting applications such as synthetic shot-making, actor relocation, insertion, deletion, and pose manipulation. Project page : https://nstar1125.github.io/showmak3r
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ShowMak3r, a pipeline for reconstructing dynamic radiance fields from TV show video clips. It addresses challenges of actor occlusions with diverse expressions, cluttered stages, small baselines, and abrupt shot changes via three modules: 3DLocator (depth priors plus pose interpolation for actor localization and unseen poses), ShotMatcher (actor tracking across shot changes), and a face-fitting network (dynamic expression recovery). Experiments on the Sitcoms3D dataset are said to demonstrate reassembly of scenes under novel cameras and timestamps, plus applications including synthetic shot-making, actor relocation/insertion/deletion, and pose manipulation.
Significance. If the pipeline's modules prove robust, the work would provide a practical system for compositional editing of dynamic entertainment video, extending radiance-field methods to production-style control-room operations. Targeted handling of shot changes and expressions could enable new virtual-production workflows, though the lack of supporting numbers leaves the magnitude of the advance unclear.
major comments (1)
- [Experiments] Experiments section: the central claim that the pipeline 'can reassemble TV show scenes with new cameras at different timestamps' and supports reliable editing applications rests on 3DLocator and ShotMatcher recovering accurate 3D positions and radiance fields. No quantitative metrics (novel-view PSNR/SSIM on held-out timestamps, 3D localization error, or ablation on depth-prior vs. interpolation) are reported to bound errors under mutual occlusions, diverse expressions, or abrupt cuts; without these the robustness assumption remains unverified and load-bearing for the editing claims.
minor comments (1)
- [Abstract] Abstract: the project-page URL is given without a period or proper formatting; consider moving it to a footnote or the end of the paper.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address the concern about experimental validation below and will incorporate the suggested improvements in the revision.
read point-by-point responses
-
Referee: [Experiments] Experiments section: the central claim that the pipeline 'can reassemble TV show scenes with new cameras at different timestamps' and supports reliable editing applications rests on 3DLocator and ShotMatcher recovering accurate 3D positions and radiance fields. No quantitative metrics (novel-view PSNR/SSIM on held-out timestamps, 3D localization error, or ablation on depth-prior vs. interpolation) are reported to bound errors under mutual occlusions, diverse expressions, or abrupt cuts; without these the robustness assumption remains unverified and load-bearing for the editing claims.
Authors: We agree that the current experiments rely primarily on qualitative demonstrations and that quantitative metrics would better support the robustness claims. In the revised manuscript we will add novel-view PSNR and SSIM scores on held-out timestamps, report 3D localization error for the 3DLocator module, and include an ablation study comparing depth priors against pose interpolation. These additions will directly address performance under mutual occlusions, diverse expressions, and abrupt shot changes. revision: yes
Circularity Check
No circularity: pipeline modules are independently specified without definitional reduction
full rationale
The paper describes a reconstruction pipeline (3DLocator using depth priors and pose interpolation, ShotMatcher for tracking under shot changes, and a face-fitting network) that maps input video clips to editable dynamic radiance fields. No equations, fitted parameters, or self-citations appear in the provided text that would make any output quantity definitionally equivalent to its inputs. The central claims rest on the empirical behavior of these proposed modules on the Sitcoms3D dataset rather than on any self-referential loop or imported uniqueness result. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Face-fitting network that refines color and opacity residuals via MLP on positional encodings of Gaussian centers and time; combined with SDS loss for unobserved regions.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Nerf: Representing scenes as neural radiance fields for view synthesis,
B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” inEuropean Conference on Computer Vision, 2020
work page 2020
-
[2]
3d gaussian splatting for real-time radiance field rendering
B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering.”ACM Transactions on Graphics, vol. 42, no. 4, pp. 139–1, 2023
work page 2023
-
[3]
Hypernerf: a higher-dimensional representation for topologically varying neural radiance fields,
K. Park, U. Sinha, P. Hedman, J. T. Barron, S. Bouaziz, D. B. Goldman, R. Martin-Brualla, and S. M. Seitz, “Hypernerf: a higher-dimensional representation for topologically varying neural radiance fields,”ACM Transactions on Graphics, vol. 40, no. 6, pp. 1–12, 2021
work page 2021
-
[4]
Deformable 3d gaussian splatting for animatable human avatars,
H. Jung, N. Brasch, J. Song, E. Perez-Pellitero, Y . Zhou, Z. Li, N. Navab, and B. Busam, “Deformable 3d gaussian splatting for animatable human avatars,” inarXiv, 2023
work page 2023
-
[5]
4d gaussian splatting for real-time dynamic scene rendering,
G. Wu, T. Yi, J. Fang, L. Xie, X. Zhang, W. Wei, W. Liu, Q. Tian, and X. Wang, “4d gaussian splatting for real-time dynamic scene rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 310–20 320. Reference Frame (c) Actor Insertion (a) Actor Deletion (d) Pose Manipulation (b) Actor Relocation Fig. 13...
work page 2024
-
[6]
NeuMan: Neural Human Radiance Field from a Single Video,
W. Jiang, K. M. Yi, G. Samei, O. Tuzel, and A. Ranjan, “NeuMan: Neural Human Radiance Field from a Single Video,” inEuropean Conference on Computer Vision, S. Avidan, G. Brostow, M. Ciss´e, G. M. Farinella, and T. Hassner, Eds. Cham: Springer Nature Switzerland, 2022, pp. 402–418
work page 2022
-
[7]
M. Kocabas, J.-H. R. Chang, J. Gabriel, O. Tuzel, and A. Ranjan, “Hugs: Human gaussian splats,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 505–515
work page 2024
-
[8]
Shape of motion: 4d reconstruction from a single video,
Q. Wang, V . Ye, H. Gao, W. Zeng, J. Austin, Z. Li, and A. Kanazawa, “Shape of motion: 4d reconstruction from a single video,” inInterna- tional Conference on Computer Vision (ICCV), 2025
work page 2025
-
[9]
Monst3r: A simple approach for estimating geometry in the presence of motion,
J. Zhang, C. Herrmann, J. Hur, V . Jampani, T. Darrell, F. Cole, D. Sun, and M.-H. Yang, “Monst3r: A simple approach for estimating geometry in the presence of motion,”International Conference on Learning Representations, 2025
work page 2025
-
[10]
Align3r: Aligned monocular depth estimation for dynamic videos,
J. Lu, T. Huang, P. Li, Z. Dou, C. Lin, Z. Cui, Z. Dong, S.-K. Yeung, W. Wang, and Y . Liu, “Align3r: Aligned monocular depth estimation for dynamic videos,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 22 820–22 830
work page 2025
-
[11]
Continuous 3d perception model with persistent state,
Q. Wang*, Y . Zhang*, A. Holynski, A. A. Efros, and A. Kanazawa, “Continuous 3d perception model with persistent state,” inCVPR, 2025
work page 2025
-
[12]
Megasam: Accurate, fast and robust structure and motion from casual dynamic videos,
Z. Li, R. Tucker, F. Cole, Q. Wang, L. Jin, V . Ye, A. Kanazawa, A. Holynski, and N. Snavely, “Megasam: Accurate, fast and robust structure and motion from casual dynamic videos,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 10 486–10 496
work page 2025
-
[13]
Showmak3r: Compositional tv show reconstruction,
S. Kim, S. Do, and J. Park, “Showmak3r: Compositional tv show reconstruction,”CVPR, 2025
work page 2025
-
[14]
The one where they reconstructed 3d humans and environments in tv shows,
G. Pavlakos, E. Weber, M. Tancik, and A. Kanazawa, “The one where they reconstructed 3d humans and environments in tv shows,” in European Conference on Computer Vision. Springer, 2022, pp. 732– 749
work page 2022
-
[15]
Panoptic studio: A massively multiview system for social motion capture,
H. Joo, H. Liu, L. Tan, L. Gui, B. Nabbe, I. Matthews, T. Kanade, S. Nobuhara, and Y . Sheikh, “Panoptic studio: A massively multiview system for social motion capture,” inProceedings of IEEE International Conference on Computer Vision, 2015, pp. 3334–3342
work page 2015
-
[16]
Neural 3d video synthesis from multi-view video,
T. Li, M. Slavcheva, M. Zollhoefer, S. Green, C. Lassner, C. Kim, T. Schmidt, S. Lovegrove, M. Goesele, R. Newcombeet al., “Neural 3d video synthesis from multi-view video,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5521–5531
work page 2022
-
[17]
Dynibar: Neural dynamic image-based rendering,
Z. Li, Q. Wang, F. Cole, R. Tucker, and N. Snavely, “Dynibar: Neural dynamic image-based rendering,” inProceedings of the IEEE/CVF PREPRINT 11 Conference on Computer Vision and Pattern Recognition, 2023, pp. 4273–4284
work page 2023
-
[18]
Nerfies: Deformable neural radiance fields,
K. Park, U. Sinha, J. T. Barron, S. Bouaziz, D. B. Goldman, S. M. Seitz, and R. Martin-Brualla, “Nerfies: Deformable neural radiance fields,” inProceedings of IEEE International Conference on Computer Vision, 2021, pp. 5865–5874
work page 2021
-
[19]
D- nerf: Neural radiance fields for dynamic scenes,
A. Pumarola, E. Corona, G. Pons-Moll, and F. Moreno-Noguer, “D- nerf: Neural radiance fields for dynamic scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10 318–10 327
work page 2021
-
[20]
Hexplane: A fast representation for dynamic scenes,
A. Cao and J. Johnson, “Hexplane: A fast representation for dynamic scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 130–141
work page 2023
-
[21]
Neural scene flow fields for space-time view synthesis of dynamic scenes,
Z. Li, S. Niklaus, N. Snavely, and O. Wang, “Neural scene flow fields for space-time view synthesis of dynamic scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2021, pp. 6498–6508
work page 2021
-
[22]
Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds,
J. Lei, Y . Weng, A. Harley, L. Guibas, and K. Daniilidis, “Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds,” inarXiv, 2024
work page 2024
-
[23]
Dynamic gaussian marbles for novel view synthesis of casual monocular videos,
C. Stearns, A. Harley, M. Uy, F. Dubost, F. Tombari, G. Wetzstein, and L. Guibas, “Dynamic gaussian marbles for novel view synthesis of casual monocular videos,” inSIGGRAPH Asia 2024 Conference Papers, 2024, pp. 1–11
work page 2024
-
[24]
Gflow: Recovering 4d world from monocular video,
S. Wang, X. Yang, Q. Shen, Z. Jiang, and X. Wang, “Gflow: Recovering 4d world from monocular video,” inAssociation for the Advancement of Artificial Intelligence, 2025
work page 2025
-
[25]
K-planes: Explicit radiance fields in space, time, and appearance,
S. Fridovich-Keil, G. Meanti, F. R. Warburg, B. Recht, and A. Kanazawa, “K-planes: Explicit radiance fields in space, time, and appearance,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 2023, pp. 12 479–12 488
work page 2023
-
[26]
Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields,
L. Song, A. Chen, Z. Li, Z. Chen, L. Chen, J. Yuan, Y . Xu, and A. Geiger, “Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields,”IEEE Transactions on Visualization and Computer Graphics, vol. 29, no. 5, pp. 2732–2742, 2023
work page 2023
-
[27]
Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle,
Y . Lin, Z. Dai, S. Zhu, and Y . Yao, “Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21 136–21 145
work page 2024
-
[28]
Bard-gs: Blur-aware re- construction of dynamic scenes via gaussian splatting,
Y . Lu, Y . Zhou, D. Liu, T. Liang, and Y . Yin, “Bard-gs: Blur-aware re- construction of dynamic scenes via gaussian splatting,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
work page 2025
-
[29]
Mem4d: Decoupling static and dynamic memory for dynamic scene reconstruction,
Y . Wang, D. Ceylan, and L. Agapito, “Mem4d: Decoupling static and dynamic memory for dynamic scene reconstruction,”arXiv preprint arXiv:2508.07908, 2025
-
[30]
HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video,
C.-Y . Weng, B. Curless, P. P. Srinivasan, J. T. Barron, and I. Kemelmacher-Shlizerman, “HumanNeRF: Free-Viewpoint Rendering of Moving People From Monocular Video,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp. 16 210–16 220
work page 2022
-
[31]
Humannerf: Efficiently generated human radiance field from sparse inputs,
F. Zhao, W. Yang, J. Zhang, P. Lin, Y . Zhang, J. Yu, and L. Xu, “Humannerf: Efficiently generated human radiance field from sparse inputs,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 7743–7753
work page 2022
-
[32]
C. Guo, T. Jiang, X. Chen, J. Song, and O. Hilliges, “Vid2Avatar: 3D Avatar Reconstruction From Videos in the Wild via Self-Supervised Scene Decomposition,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp. 12 858–12 868
work page 2023
-
[33]
MonoHuman: Animat- able Human Neural Field From Monocular Video,
Z. Yu, W. Cheng, X. Liu, W. Wu, and K.-Y . Lin, “MonoHuman: Animat- able Human Neural Field From Monocular Video,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp. 16 943–16 953
work page 2023
-
[34]
Learning Neural V olumetric Representations of Dynamic Humans in Minutes,
C. Geng, S. Peng, Z. Xu, H. Bao, and X. Zhou, “Learning Neural V olumetric Representations of Dynamic Humans in Minutes,” inPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp. 8759–8770
work page 2023
-
[35]
L. Hu, H. Zhang, Y . Zhang, B. Zhou, B. Liu, S. Zhang, and L. Nie, “Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 634– 644
work page 2024
-
[36]
Gauhuman: Articulated gaussian splatting from monocular human videos,
S. Hu, T. Hu, and Z. Liu, “Gauhuman: Articulated gaussian splatting from monocular human videos,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 418–20 431
work page 2024
-
[37]
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting,
Y . Jiang, Z. Shen, P. Wang, Z. Su, Y . Hong, Y . Zhang, J. Yu, and L. Xu, “HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2024, pp. 19 734–19 745
work page 2024
-
[38]
Z. Li, Z. Zheng, L. Wang, and Y . Liu, “Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Model- ing,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2024, pp. 19 711–19 722
work page 2024
-
[39]
Expressive Whole-Body 3D Gaus- sian Avatar,
G. Moon, T. Shiratori, and S. Saito, “Expressive Whole-Body 3D Gaus- sian Avatar,” inEuropean Conference on Computer Vision, A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and G. Varol, Eds. Cham: Springer Nature Switzerland, 2024, pp. 19–35
work page 2024
-
[40]
Human Gaussian Splatting: Real-time Rendering of Animat- able Avatars,
A. Moreau, J. Song, H. Dhamo, R. Shaw, Y . Zhou, and E. P ´erez- Pellitero, “Human Gaussian Splatting: Real-time Rendering of Animat- able Avatars,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2024, pp. 788–798
work page 2024
-
[41]
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering,
H. Pang, H. Zhu, A. Kortylewski, C. Theobalt, and M. Habermann, “ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2024, pp. 1165–1175
work page 2024
-
[42]
Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians,
S. Qian, T. Kirschstein, L. Schoneveld, D. Davoli, S. Giebenhain, and M. Nießner, “Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 20 299–20 309
work page 2024
-
[43]
3DGS- Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting,
Z. Qian, S. Wang, M. Mihajlovic, A. Geiger, and S. Tang, “3DGS- Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2024, pp. 5020–5030
work page 2024
-
[44]
DEGAS: Detailed Expressions on Full-Body Gaussian Avatars,
Z. Shao, D. Wang, Q.-Y . Tian, Y .-D. Yang, H. Meng, Z. Cai, B. Dong, Y . Zhang, K. Zhang, and Z. Wang, “DEGAS: Detailed Expressions on Full-Body Gaussian Avatars,” inProceedings of the International Conference on 3D Vision (3DV), 2025
work page 2025
-
[45]
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting,
Z. Shao, Z. Wang, Z. Li, D. Wang, X. Lin, Y . Zhang, M. Fan, and Z. Wang, “SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2024, pp. 1606–1616
work page 2024
-
[46]
Humanrf: High-fidelity neural radiance fields for humans in motion,
M. Is ¸ık, M. R¨unz, M. Georgopoulos, T. Khakhulin, J. Starck, L. Agapito, and M. Nießner, “Humanrf: High-fidelity neural radiance fields for humans in motion,”ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–12, 2023. [Online]. Available: https://doi.org/10.1145/3592415
-
[47]
A-nerf: Articulated neural radiance fields for learning human shape, appearance, and pose,
S.-Y . Su, F. Yu, M. Zollh¨ofer, and H. Rhodin, “A-nerf: Articulated neural radiance fields for learning human shape, appearance, and pose,”Ad- vances in Neural Information Processing Systems, vol. 34, pp. 12 278– 12 291, 2021
work page 2021
-
[48]
Animatable neural radiance fields for modeling dynamic human bod- ies,
S. Peng, J. Dong, Q. Wang, S. Zhang, Q. Shuai, X. Zhou, and H. Bao, “Animatable neural radiance fields for modeling dynamic human bod- ies,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 14 314–14 323
work page 2021
-
[49]
Dream, lift, animate: From single images to animatable gaussian avatars,
M. C. Buehler, Y . Yuan, X. Li, Y . Huang, K. Nagano, and U. Iqbal, “Dream, lift, animate: From single images to animatable gaussian avatars,” 2025
work page 2025
-
[50]
MoGA: 3d Gen- erative Avatar Prior for Monocular Gaussian Avatar Reconstruction,
Z. Dong, L. Duan, J. Song, M. J. Black, and A. Geiger, “MoGA: 3d Gen- erative Avatar Prior for Monocular Gaussian Avatar Reconstruction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025
work page 2025
-
[51]
Vid2avatar- pro: Authentic avatar from videos in the wild via universal prior,
C. Guo, J. Li, Y . Kant, Y . Sheikh, S. Saito, and C. Cao, “Vid2avatar- pro: Authentic avatar from videos in the wild via universal prior,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025
work page 2025
-
[52]
M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “Smpl: a skinned multi-person linear model,”ACM Trans. Graph., vol. 34, no. 6, Oct. 2015. [Online]. Available: https: //doi.org/10.1145/2816795.2818013
-
[53]
Expressive body capture: 3d hands, face, and body from a single image,
G. Pavlakos, V . Choutas, N. Ghorbani, T. Bolkart, A. A. Osman, D. Tzionas, and M. J. Black, “Expressive body capture: 3d hands, face, and body from a single image,” inProceedings of IEEE International Conference on Computer Vision, 2019, pp. 10 975–10 985
work page 2019
-
[54]
Sherf: Generalizable human nerf from a single image,
S. Hu, F. Hong, L. Pan, H. Mei, L. Yang, and Z. Liu, “Sherf: Generalizable human nerf from a single image,” inProceedings of IEEE International Conference on Computer Vision, 2023, pp. 9352–9364
work page 2023
-
[55]
Ghunerf: Generalizable human nerf from a monocular video,
C. Li, J. Lin, and G. H. Lee, “Ghunerf: Generalizable human nerf from a monocular video,” in2024 International Conference on 3D Vision (3DV). IEEE, 2024, pp. 923–932
work page 2024
-
[56]
Ghnerf: Learning generalizable human features with PREPRINT 12 efficient neural radiance fields,
A. Dey, D. Yang, R. Agaram, A. Dantcheva, A. I. Comport, S. Sridhar, and J. Martinet, “Ghnerf: Learning generalizable human features with PREPRINT 12 efficient neural radiance fields,” inProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2024, pp. 2812– 2821
work page 2024
-
[57]
Eg-humannerf: Efficient gener- alizable human nerf utilizing human prior for sparse view,
Z. Wang, Y . Kanamori, and Y . Endo, “Eg-humannerf: Efficient gener- alizable human nerf utilizing human prior for sparse view,” inarXiv, 2024
work page 2024
-
[58]
Actorsnerf: Animatable few-shot human rendering with generalizable nerfs,
J. Mu, S. Sang, N. Vasconcelos, and X. Wang, “Actorsnerf: Animatable few-shot human rendering with generalizable nerfs,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18 391–18 401
work page 2023
-
[59]
GAIA: Generative animatable interactive avatars with expression-conditioned gaussians,
Z. Yu, T. Li, J. Sun, O. Shapira, S. Park, M. Stengel, M. Chan, X. Li, W. Wang, K. Nagano, and S. D. Mello, “GAIA: Generative animatable interactive avatars with expression-conditioned gaussians,” inACM SIGGRAPH, 2025
work page 2025
-
[60]
High- resolution image synthesis with latent diffusion models,
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion, 2022, pp. 10 684–10 695
work page 2022
-
[61]
Sdxl: Improving latent diffusion models for high-resolution image synthesis,
D. Podell, Z. English, K. Lacey, A. Blattmann, T. Dockhorn, J. M ¨uller, J. Penna, and R. Rombach, “Sdxl: Improving latent diffusion models for high-resolution image synthesis,” inInternational Conference on Learning Representations, 2024
work page 2024
-
[62]
Z. Wang, C. Lu, Y . Wang, F. Bao, C. Li, H. Su, and J. Zhu, “Prolific- dreamer: High-fidelity and diverse text-to-3d generation with variational score distillation,” inAdvances in Neural Information Processing Sys- tems, vol. 36, 2023, pp. 8406–8441
work page 2023
-
[63]
Zero- shot text-guided object generation with dream fields,
A. Jain, B. Mildenhall, J. T. Barron, P. Abbeel, and B. Poole, “Zero- shot text-guided object generation with dream fields,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 867–876
work page 2022
-
[64]
Dreamfusion: Text-to-3d using 2d diffusion,
B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “Dreamfusion: Text-to-3d using 2d diffusion,” inInternational Conference on Learning Representations, 2023
work page 2023
-
[65]
Diffusion models as plug-and-play priors,
A. Graikos, N. Malkin, N. Jojic, and D. Samaras, “Diffusion models as plug-and-play priors,” inAdvances in Neural Information Processing Systems, vol. 35, 2022, pp. 14 715–14 728
work page 2022
-
[66]
Guess the unseen: Dynamic 3d scene re- construction from partial 2d glimpses,
I. Lee, B. Kim, and H. Joo, “Guess the unseen: Dynamic 3d scene re- construction from partial 2d glimpses,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 1062–1071
work page 2024
-
[67]
A. Dutta, M. Zheng, Z. Gao, B. Planche, A. Choudhuri, T. Chen, A. K. Roy-Chowdhury, and Z. Wu, “Chrome: Clothed human reconstruction with occlusion-resilience and multiview-consistency from a single im- age,” inarXiv, 2025
work page 2025
-
[68]
Scaffoldavatar: High-fidelity gaussian avatars with patch expressions,
S. Aneja, S. Weiss, I. Baeza, P. Chandran, G. Zoss, M. Nießner, and D. Bradley, “Scaffoldavatar: High-fidelity gaussian avatars with patch expressions,”ACM Trans. Graph., vol. 44, no. 4, 2025
work page 2025
-
[69]
Nerf in the wild: Neural radiance fields for unconstrained photo collections,
R. Martin-Brualla, N. Radwan, M. S. Sajjadi, J. T. Barron, A. Doso- vitskiy, and D. Duckworth, “Nerf in the wild: Neural radiance fields for unconstrained photo collections,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7210–7219
work page 2021
-
[70]
Omnire: Omni urban scene reconstruction,
Z. Chen, J. Yang, J. Huang, R. de Lutio, J. M. Esturo, B. Ivanovic, O. Litany, Z. Gojcic, S. Fidler, M. Pavoneet al., “Omnire: Omni urban scene reconstruction,” inInternational Conference on Learning Representations, 2025
work page 2025
-
[71]
Ephraim katz’s the film encyclopedia,
E. Katz, “Ephraim katz’s the film encyclopedia,” 1979
work page 1979
-
[72]
Sklar,Film: An International History of the Medium
R. Sklar,Film: An International History of the Medium. Thames and Hudson, 1990
work page 1990
-
[73]
Global structure-from-motion revisited,
L. Pan, D. Bar ´ath, M. Pollefeys, and J. L. Sch ¨onberger, “Global structure-from-motion revisited,” inEuropean Conference on Computer Vision, 2024
work page 2024
-
[74]
A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inProceedings of IEEE International Conference on Computer Vision, 2023, pp. 4015–4026
work page 2023
-
[75]
Structure-from-motion revisited,
J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113
work page 2016
-
[76]
$\pi^3$: Permutation-Equivariant Visual Geometry Learning
Y . Wang, J. Zhou, H. Zhu, W. Chang, Y . Zhou, Z. Li, J. Chen, J. Pang, C. Shen, and T. He, “π 3: Scalable permutation-equivariant visual geometry learning,” 2025. [Online]. Available: https://arxiv.org/ abs/2507.13347
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[77]
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
R. Wang, S. Xu, Y . Dong, Y . Deng, J. Xiang, Z. Lv, G. Sun, X. Tong, and J. Yang, “Moge-2: Accurate monocular geometry with metric scale and sharp details,”arXiv preprint arXiv:2507.02546, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[78]
Dn-splatter: Depth and normal priors for gaussian splatting and meshing,
M. Turkulainen, X. Ren, I. Melekhov, O. Seiskari, E. Rahtu, and J. Kannala, “Dn-splatter: Depth and normal priors for gaussian splatting and meshing,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
work page 2025
-
[79]
Depth-regularized optimization for 3d gaussian splatting in few-shot images,
J. Chung, J. Oh, and K. M. Lee, “Depth-regularized optimization for 3d gaussian splatting in few-shot images,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 811– 820
work page 2024
-
[80]
ObjectClear: Complete object removal via object-effect attention,
J. Zhao, S. Zhou, Z. Wang, P. Yang, and C. C. Loy, “ObjectClear: Complete object removal via object-effect attention,” inarXiv preprint arXiv:2505.22636, 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.