
arXiv: 2511.05152 · v2 · submitted 2025-11-07 · 💻 cs.CV · cs.GR · cs.MM

Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges

Pith reviewed 2026-05-18 00:19 UTC · model grok-4.3

classification 💻 cs.CV · cs.GR · cs.MM
keywords Gaussian Splatting · Dynamic 3D Reconstruction · Sparse Multi-View Video · Foreground Background Separation · Filmmaking · Deformable Models · Transparent Textures · Scene Segmentation

The pith

Splitting Gaussians into foreground and background with sparse initial masks enables high-quality dynamic 3D reconstructions from sparse camera views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deformable Gaussian Splatting technique for dynamic 3D scenes captured with the sparse camera setups typical of low-budget filmmaking. It divides the canonical representation into separate foreground and background components using only a few masks from the first frame. These components receive independent pre-training with custom loss functions. During dynamic training the foreground deformation field learns changes to color, position and rotation while the background learns only position shifts. Experiments on entertainment datasets show improved visual quality and efficiency plus the added ability to output segmented reconstructions that include transparent and moving textures.

Core claim

Deformable Gaussian Splatting is extended by splitting the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks at t=0. Each component is pre-trained separately on different loss functions. During dynamic training the foreground models changes in color, position and rotation while the background models only position changes, following the observation that backgrounds in film sets are typically dimmer and less dynamic. This yields state-of-the-art results on 3-D and 2.5-D entertainment datasets, up to 3 dB higher PSNR with half the model size on 3-D scenes, while also producing segmented dynamic reconstructions that include transparent and dynamic textures.

What carries the argument

Foreground-background split of the canonical Gaussian representation and deformation fields, driven by sparse t=0 masks and separate deformation parameter sets for each component.
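
As a rough illustration of the split, the sketch below assigns each point of an initial point cloud to the foreground or the background by projecting it into the few masked views available at t=0. It is a minimal sketch and not the authors' implementation: `split_points`, the `Projector` callable, the nested-list mask layout and the `min_votes` rule are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Mask = List[List[bool]]  # mask[row][col], True marks a foreground pixel
# Hypothetical camera model: maps a 3-D point to a pixel (col, row),
# or returns None when the point is not visible in that view.
Projector = Callable[[Point], Optional[Tuple[int, int]]]


def split_points(
    points: Sequence[Point],
    views: Sequence[Tuple[Projector, Mask]],
    min_votes: int = 1,
) -> Tuple[List[Point], List[Point]]:
    """Split points into (foreground, background) using masks at t=0.

    A point counts as foreground when it lands on a foreground pixel in at
    least `min_votes` masked views (an assumed rule, chosen for simplicity).
    """
    foreground: List[Point] = []
    background: List[Point] = []
    for point in points:
        votes = 0
        for project, mask in views:
            pixel = project(point)
            if pixel is None:
                continue  # point not visible in this view
            col, row = pixel
            inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
            if inside and mask[row][col]:
                votes += 1
        (foreground if votes >= min_votes else background).append(point)
    return foreground, background
```

Raising `min_votes` makes the foreground set more conservative, which matters when only a handful of views carry masks and a single noisy mask could otherwise pull background points into the foreground.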

If this is right

  • Achieves up to 3 dB higher PSNR with roughly half the model size on 3-D scenes versus prior deformable Gaussian Splatting methods.
  • Produces segmented dynamic reconstructions that include transparent and moving textures without requiring dense mask supervision.
  • Supports complex dynamic feature capture under the sparse camera counts common in budget filmmaking.
  • Separately optimizes foreground and background with tailored losses during the canonical pre-training stage.
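
To put the PSNR figure in the list above into perspective, the short sketch below uses the standard definition PSNR = 10·log10(MAX² / MSE) and assumes the reported gain is in decibels. Under that definition a 3 dB gain corresponds to roughly halving the mean squared error. The function names are illustrative and do not come from the paper.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE must shrink to gain `gain_db` decibels of PSNR."""
    return 10.0 ** (gain_db / 10.0)
```

For example, `mse_ratio_for_gain(3.0)` is just under 2, and halving the MSE from 0.001 to 0.0005 raises `psnr` by about 3.01 dB.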

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The built-in segmentation could simplify downstream visual-effects compositing and editing of filmed scenes.
  • Lower camera counts enabled by the method may reduce on-set capture costs for high-quality 3D content.
  • The same foreground-background split idea could transfer to other dynamic reconstruction pipelines that currently require dense inputs.
  • Further optimization might allow the approach to run in near real time for virtual-production pipelines.

Load-bearing premise

A sparse set of masks at the initial frame is enough to separate foreground from background, and the background is always dimmer and less dynamic so that only position changes need to be learned.
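
One simple way to probe the "dimmer background" half of this premise on a given capture is to compare mean luminance inside and outside a first-frame mask. The sketch below is only a sanity check under stated assumptions: it expects linear RGB values in [0, 1], uses the Rec. 709 luma weights, and all function names are hypothetical. It says nothing about how dynamic the background is.

```python
# Rec. 709 luma weights for (R, G, B)
REC709 = (0.2126, 0.7152, 0.0722)


def luma(rgb):
    """Weighted sum of the RGB channels."""
    return sum(w * c for w, c in zip(REC709, rgb))


def mean_luma(image, mask, foreground):
    """Mean luma over pixels whose mask value equals `foreground`.

    `image` is a nested list of RGB tuples; `mask` has the same shape and
    holds booleans (True marks a foreground pixel).
    """
    values = [
        luma(pixel)
        for image_row, mask_row in zip(image, mask)
        for pixel, is_fg in zip(image_row, mask_row)
        if is_fg == foreground
    ]
    if not values:
        raise ValueError("mask selects no pixels for the requested region")
    return sum(values) / len(values)


def background_is_dimmer(image, mask, margin=0.0):
    """True when the background mean luma is below the foreground's by `margin`."""
    return mean_luma(image, mask, False) + margin < mean_luma(image, mask, True)
```

A capture that fails this check is not necessarily a failure case for the method, but it is a scene where the position-only background model deserves a closer look.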

What would settle it

A test scene containing strongly dynamic background elements or complex foreground-background lighting interactions where reconstruction quality drops sharply compared with dense-mask baselines.

Figures

Figures reproduced from arXiv: 2511.05152 by Adrian Azzarelli, David R Bull, Nantheera Anantrasirichai.

Figure 1: Sparse view 3-D reconstruction: Our dynamic representation offers foreground-background separability and high quality 3-D reconstruction without the need for dense mask priors. This paper focuses on filmmaking challenges, including but not limited to sparse view and reflective, transparent and dynamic textures.
Figure 2: Novel views (right) reveal over-reconstructed back…
Figure 3: Left: The canonical representation is constructed by masking the initial point cloud and training the foreground and background representations Gf and Gb on specialized loss functions that minimize over-reconstruction. Right: Dynamic features for Gf and Gb are jointly trained using the proposed plane-based design. For Gb, we only learn motion. For Gf, we learn motion, rotation and color change using a nov…
Figure 4: Full and Zoom Temporal Comparison: The zoom results show that our method is the only one capable of capturing the visual dynamics of the semi-transparent key-chain. Using the ViVo-Bassist scene [4].
4.4. Densification. Prior works on adaptive density for dynamic GS [22, 36] track parameters that indicate point importance based on full-reference reconstruction quality. In SV3D, sparsely viewed background domi…
Figure 5: Per-Frame and Per-View PSNR Plot: The surrounding plots show the PSNR result and objectively demonstrate our approach is consistently performant. Full and Zoom Frame Comparison: Our-Foreground (labeled Ours) reconstructs keyboard, arms and feet with more visual appeal. Using the ViVo-Pianist scene [4].
Figure 6: Frame-by-frame test view on the sparse DyNeRF dataset [26]; 4 training cameras at the extremities were used instead of 20. The masked ground truth is on the left.
5. Experiments. This section evaluates performance on two entertainment datasets [4, 26]. All tests used an Nvidia RTX 3090 (24GB of VRAM). In the appendix and online, we provide more frame-by-frame and video comparisons. 5.1. Real 3-D Cinematogra…
Figure 7: Visual and average metric results of Ablation. PSNR and LPIPS-Alex are masked evaluations of the ground truth foreground (left). Opt. flow is a MSE optical flow metric (see appendix).
…the four most distant cameras from the test camera, for 50 frames at 1080p resolution. This experiment serves to assess the quality on sparse views that do share background features. Tab. 1 and…
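
The masked evaluation mentioned in the Figure 7 caption can be pictured as PSNR computed only over ground-truth foreground pixels. The sketch below is an assumption about what such a metric might look like, not the paper's evaluation code: `masked_psnr`, the nested-list image layout and the [0, 1] value range are illustrative choices.

```python
import math


def masked_psnr(pred, target, mask, max_value=1.0):
    """PSNR in dB over the pixels where `mask` is True.

    `pred`, `target` and `mask` are nested lists of the same shape; image
    values are single-channel intensities in [0, max_value].
    """
    total, count = 0.0, 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, selected in zip(pred_row, target_row, mask_row):
            if selected:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    mse = total / count
    if mse == 0:
        return math.inf  # identical inside the mask
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Errors outside the mask do not affect the score, which is why a masked metric can rank methods differently from a full-frame one when the background dominates the image.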
Original abstract

Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is separately trained on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground stage contains diverse dynamic features so changes in color, position and rotation are learned. While, the background containing film-crew and equipment, is typically dimmer and less dynamic so only changes in point position are learned. Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results; up to 3 PSNR higher with half the model size on 3-D scenes. Unlike the SotA and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures. Code and video comparisons are available online: https://azzarelli.github.io/splatographypage/index.html

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes Splatography for sparse multi-view dynamic Gaussian Splatting in filmmaking. It splits the canonical Gaussian representation and deformation field into foreground and background components using only sparse masks at t=0, trains each with separate loss functions during canonical pre-training, and during dynamic training applies full color/position/rotation updates to the foreground while restricting the background to position-only updates under the assumption that it is typically dimmer and less dynamic. Experiments on 3-D and 2.5-D entertainment datasets are claimed to yield SotA quantitative and qualitative results (up to 3 PSNR higher with half the model size on 3-D scenes) plus segmented dynamic reconstructions including transparent textures, all without dense mask supervision.

Significance. If the central claims hold after addressing the noted issues, the work could advance practical dynamic 3D reconstruction under sparse camera budgets common in filmmaking by reducing supervision requirements and enabling segmented outputs with complex textures. The public release of code and video comparisons supports reproducibility.

major comments (2)
  1. [Abstract] Abstract: the quantitative claims of up to 3 PSNR improvement and half the model size on 3-D scenes are presented without reported baselines, error bars, exact data splits, or ablation studies, which are required to substantiate the SotA performance and cross-scene generalization.
  2. [Method] Method (as described in Abstract): the foreground-background split at t=0 via sparse masks, followed by separate training and the restriction of background deformation to position-only updates, rests on the unvalidated assumption that the background is dimmer and less dynamic. This precondition is load-bearing for both the claimed PSNR/size gains and the mask-free segmentation advantage; leakage or unmodeled background dynamics (e.g., crew/equipment motion) would directly produce incorrect canonical representations.
minor comments (1)
  1. [Abstract] Abstract: the distinction between '3-D scenes' and '2.5-D entertainment datasets' is not defined, which may confuse readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive report. We address the two major comments point by point below, indicating the revisions we will make to strengthen the manuscript while preserving its core contributions.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the quantitative claims of up to 3 PSNR improvement and half the model size on 3-D scenes are presented without reported baselines, error bars, exact data splits, or ablation studies, which are required to substantiate the SotA performance and cross-scene generalization.

    Authors: We agree that the abstract would benefit from additional context to make the quantitative claims more immediately verifiable. The experiments section of the manuscript already reports comparisons against relevant SotA baselines (including both dense and sparse multi-view dynamic GS methods), with results broken down by 3-D and 2.5-D entertainment datasets. Data splits are described in Section 4.1 and ablation studies appear in Section 4.3 and the supplement. To directly address the concern, we will revise the abstract to explicitly name the primary baselines and reference the sections containing error bars, splits, and ablations. If error bars are not currently visualized, we will add them during revision. revision: partial

  2. Referee: [Method] Method (as described in Abstract): the foreground-background split at t=0 via sparse masks, followed by separate training and the restriction of background deformation to position-only updates, rests on the unvalidated assumption that the background is dimmer and less dynamic. This precondition is load-bearing for both the claimed PSNR/size gains and the mask-free segmentation advantage; leakage or unmodeled background dynamics (e.g., crew/equipment motion) would directly produce incorrect canonical representations.

    Authors: The foreground-background decomposition is initialized from sparse masks only at t=0 and the position-only restriction on the background follows directly from standard filmmaking practice, where backgrounds are typically static or slowly varying and lower in intensity. This design choice is not presented as universally true but as a practical prior that enables both model compression and the mask-free segmentation output. The reported gains (up to 3 dB PSNR with half the parameters on 3-D scenes) and the ability to recover transparent dynamic textures without dense supervision provide empirical support for the approach on the tested entertainment datasets. We acknowledge that strong unmodeled background motion could cause leakage; therefore we will add an explicit limitations paragraph discussing this scenario and how the canonical pre-training with separate losses mitigates it. We will also include a targeted ablation on the position-only background update if not already present. revision: partial

Circularity Check

0 steps flagged

No circularity: design choices and assumptions are explicitly introduced rather than derived from inputs

Full rationale

The paper's central method is presented as a set of explicit engineering decisions motivated by domain practices rather than any self-referential derivation. It splits the canonical Gaussians and deformation fields at t=0 using a sparse mask set, then applies separate loss functions and restricts background updates to position-only changes because the background is described as 'typically dimmer and less dynamic' following 'common filmmaking practices.' These choices are not obtained by fitting a parameter to data and relabeling the result as a prediction, nor do they reduce to a self-citation chain or an ansatz imported from prior author work. The quantitative results are evaluated on external 3-D and 2.5-D datasets, and the segmentation output is presented as an emergent benefit of the split rather than a tautological consequence of the inputs. No equation or step in the provided text equates a claimed output to its own construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on domain assumptions about background behavior and the sufficiency of initial sparse masks; no explicit free parameters beyond the learned deformation fields are detailed, and no new physical entities are postulated.

free parameters (1)
  • deformation field parameters
    Separate modeling of color, position, and rotation for foreground versus only position for background, selected according to common filmmaking practices.
axioms (1)
  • domain assumption Background containing film-crew and equipment is typically dimmer and less dynamic
    Invoked to justify learning only position changes for the background deformation field during dynamic training.
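
The asymmetry recorded in this ledger can be sketched as a single update function that applies every predicted offset to foreground Gaussians but only the position offset to background ones. This is a minimal sketch under stated assumptions: the `Gaussian` and `Delta` containers, the additive quaternion update with renormalization, and the color clamp are illustrative choices and are not taken from the paper.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    position: tuple  # (x, y, z)
    rotation: tuple  # quaternion (w, x, y, z)
    color: tuple     # RGB in [0, 1]


@dataclass(frozen=True)
class Delta:
    """Offsets a deformation field might predict for one Gaussian at time t."""
    d_position: tuple = (0.0, 0.0, 0.0)
    d_rotation: tuple = (0.0, 0.0, 0.0, 0.0)
    d_color: tuple = (0.0, 0.0, 0.0)


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _normalize(q):
    norm = math.sqrt(sum(x * x for x in q))
    if norm == 0:
        raise ValueError("cannot normalize a zero quaternion")
    return tuple(x / norm for x in q)


def deform(gaussian, delta, is_foreground):
    """Apply offsets: position only for background, all three for foreground."""
    position = _add(gaussian.position, delta.d_position)
    if not is_foreground:
        # Background: rotation and color stay at their canonical values.
        return replace(gaussian, position=position)
    rotation = _normalize(_add(gaussian.rotation, delta.d_rotation))
    color = tuple(min(1.0, max(0.0, c)) for c in _add(gaussian.color, delta.d_color))
    return Gaussian(position, rotation, color)
```

Dropping the rotation and color outputs for the background reduces what the background deformation field has to learn, which is consistent with the reported reduction in model size, although the source does not attribute the saving to this choice alone.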

pith-pipeline@v0.9.0 · 5558 in / 1321 out tokens · 33629 ms · 2026-05-18T00:19:20.153226+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Relightable Gaussian Splatting for Virtual Production Using Image-Based Illumination

    cs.CV 2026-05 unverdicted novelty 7.0

    A relightable Gaussian Splatting method for virtual production decomposes scenes into fixed appearance and variable lighting by parameterizing primitives to directly sample high-resolution background textures, enablin...

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling

    Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollhoefer, Johannes Kopf, Matthew O’Toole, and Changil Kim. Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16610–16620, 2023.

  2. [2]

    Waveplanes: A compact wavelet representation for dynamic neural radiance fields. arXiv preprint arXiv:2312.02218, 2023

    Adrian Azzarelli, Nantheera Anantrasirichai, and David R Bull. Waveplanes: A compact wavelet representation for dynamic neural radiance fields. arXiv preprint arXiv:2312.02218, 2023.

  3. [3]

    Intelligent cinematography: a review of ai research for cinematographic production. Artificial Intelligence Review, 58(4):108, 2025

    Adrian Azzarelli, Nantheera Anantrasirichai, and David R Bull. Intelligent cinematography: a review of ai research for cinematographic production. Artificial Intelligence Review, 58(4):108, 2025.

  4. [4]

    ViVo: A Dataset for Volumetric Video Reconstruction and Compression

    Adrian Azzarelli, Ge Gao, Ho Man Kwan, Fan Zhang, Nantheera Anantrasirichai, Ollie Moolan-Feroze, and David Bull. Vivo: A dataset for volumetric video reconstruction and compression. arXiv preprint arXiv:2506.00558, 2025.

  5. [5]

    Hexplane: A fast representation for dynamic scenes

    Ang Cao and Justin Johnson. Hexplane: A fast representation for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 130–141, 2023.

  6. [6]

    Tensorf: Tensorial radiance fields

    Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. In European conference on computer vision, pages 333–350. Springer, 2022.

  7. [7]

    Gaussianeditor: Swift and controllable 3d editing with gaussian splatting

    Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21476–21485, 2024.

  8. [8]

    Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion

    Ho Kei Cheng, Yu-Wing Tai, and Chi-Keung Tang. Modular interactive video object segmentation: Interaction-to-mask, propagation and difference-aware fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5559–5568, 2021.

  9. [9]

    Dna-rendering: A diverse neural actor repository for high-fidelity human-centric rendering

    Wei Cheng, Ruixiang Chen, Siming Fan, Wanqi Yin, Keyu Chen, Zhongang Cai, Jingbo Wang, Yang Gao, Zhengming Yu, Zhengyu Lin, et al. Dna-rendering: A diverse neural actor repository for high-fidelity human-centric rendering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19982–19993, 2023.

  10. [10]

    4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes

    Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, and Baoquan Chen. 4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11,

  11. [11]

    K-planes: Explicit radiance fields in space, time, and appearance

    Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12479–12488, 2023.

  12. [12]

    Relightable 3d gaussians: Realistic point cloud relighting with brdf decomposition and ray tracing

    Jian Gao, Chun Gu, Youtian Lin, Zhihao Li, Hao Zhu, Xun Cao, Li Zhang, and Yao Yao. Relightable 3d gaussians: Realistic point cloud relighting with brdf decomposition and ray tracing. In European Conference on Computer Vision, pages 73–89. Springer, 2024.

  13. [13]

    Gaussianflow: Splatting gaussian dynamics for 4d content creation. arXiv preprint arXiv:2403.12365, 2024

    Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, and Ulrich Neumann. Gaussianflow: Splatting gaussian dynamics for 4d content creation. arXiv preprint arXiv:2403.12365, 2024.

  14. [14]

    Prtgs: Precomputed radiance transfer of gaussian splats for real-time high-quality relighting

    Yijia Guo, Yuanxi Bai, Liwen Hu, Ziyi Guo, Mianzhi Liu, Yu Cai, Tiejun Huang, and Lei Ma. Prtgs: Precomputed radiance transfer of gaussian splats for real-time high-quality relighting. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 5112–5120, 2024.

  15. [15]

    Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction. IEEE Transactions on Circuits and Systems for Video Technology, 2024

    Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, and Houqiang Li. Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction. IEEE Transactions on Circuits and Systems for Video Technology, 2024.

  16. [16]

    Learnable infinite taylor gaussian for dynamic view rendering

    Bingbing Hu, Yanyan Li, Rui Xie, Bo Xu, Haoye Dong, Junfeng Yao, and Gim Hee Lee. Learnable infinite taylor gaussian for dynamic view rendering. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 26844–26854, 2025.

  17. [17]

    Adc-gs: Anchor-driven deformable and compressed gaussian splatting for dynamic scene reconstruction. arXiv preprint arXiv:2505.08196, 2025

    He Huang, Qi Yang, Mufan Liu, Yiling Xu, and Zhu Li. Adc-gs: Anchor-driven deformable and compressed gaussian splatting for dynamic scene reconstruction. arXiv preprint arXiv:2505.08196, 2025.

  18. [18]

    Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes

    Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, and Xiaojuan Qi. Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4220–4230, 2024.

  19. [19]

    Humanrf: High-fidelity neural radiance fields for humans in motion. ACM Transactions on Graphics (TOG), 42(4):1–12, 2023

    Mustafa Işık, Martin Rünz, Markos Georgopoulos, Taras Khakhulin, Jonathan Starck, Lourdes Agapito, and Matthias Nießner. Humanrf: High-fidelity neural radiance fields for humans in motion. ACM Transactions on Graphics (TOG), 42(4):1–12, 2023.

  20. [20]

    Diffuman4d: 4d consistent human view synthesis from sparse-view videos with spatio-temporal diffusion models. arXiv preprint arXiv:2507.13344, 2025

    Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, and Xiaowei Zhou. Diffuman4d: 4d consistent human view synthesis from sparse-view videos with spatio-temporal diffusion models. arXiv preprint arXiv:2507.13344, 2025.

  21. [21]

    Deformable 3d gaussian splatting for animatable human avatars. arXiv preprint arXiv:2312.15059, 2023

    HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, and Benjamin Busam. Deformable 3d gaussian splatting for animatable human avatars. arXiv preprint arXiv:2312.15059, 2023.

  22. [22]

    3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.

  23. [23]

    Hinerv: Video compression with hierarchical encoding-based neural representation. Advances in Neural Information Processing Systems, 36:72692–72704, 2023

    Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, and David Bull. Hinerv: Video compression with hierarchical encoding-based neural representation. Advances in Neural Information Processing Systems, 36:72692–72704, 2023.

  24. [24]

    Immersive video compression using implicit neural representations. arXiv preprint arXiv:2402.01596, 2024

    Ho Man Kwan, Fan Zhang, Andrew Gower, and David Bull. Immersive video compression using implicit neural representations. arXiv preprint arXiv:2402.01596, 2024.

  25. [25]

    St-4dgs: Spatial-temporally consistent 4d gaussian splatting for efficient dynamic scene rendering

    Deqi Li, Shi-Sheng Huang, Zhiyuan Lu, Xinran Duan, and Hua Huang. St-4dgs: Spatial-temporally consistent 4d gaussian splatting for efficient dynamic scene rendering. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024.

  26. [26]

    Neural 3d video synthesis from multi-view video

    Tianye Li, Mira Slavcheva, Michael Zollhoefer, Simon Green, Christoph Lassner, Changil Kim, Tanner Schmidt, Steven Lovegrove, Michael Goesele, Richard Newcombe, et al. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5521–5531, 2022.

  27. [27]

    Spacetime gaussian feature splatting for real-time dynamic view synthesis

    Zhan Li, Zhang Chen, Zhong Li, and Yi Xu. Spacetime gaussian feature splatting for real-time dynamic view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8508–8520, 2024.

  28. [28]

    Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling

    Zhe Li, Zerong Zheng, Lizhen Wang, and Yebin Liu. Animatable gaussians: Learning pose-dependent gaussian maps for high-fidelity human avatar modeling. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19711–19722, 2024.

  29. [29]

    Ash: Animatable gaussian splats for efficient and photoreal human rendering

    Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, and Marc Habermann. Ash: Animatable gaussian splats for efficient and photoreal human rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1165–1175, 2024.

  30. [30]

    Temporal interpolation is all you need for dynamic neural radiance fields

    Sungheon Park, Minjung Son, Seokhwan Jang, Young Chun Ahn, Ji-Yeon Kim, and Nahyup Kang. Temporal interpolation is all you need for dynamic neural radiance fields. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4212–4221, 2023.

  31. [31]

    D-nerf: Neural radiance fields for dynamic scenes

    Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10318–10327, 2021.

  32. [32]

    Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians

    Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, and Matthias Nießner. Gaussianavatars: Photorealistic head avatars with rigged 3d gaussians. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20299–20309,

  33. [33]

    SAM 2: Segment Anything in Images and Videos

    Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024.

  34. [34]

    Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023

    Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023.

  35. [35]

    Neural residual radiance fields for streamably free-viewpoint videos

    Liao Wang, Qiang Hu, Qihan He, Ziyu Wang, Jingyi Yu, Tinne Tuytelaars, Lan Xu, and Minye Wu. Neural residual radiance fields for streamably free-viewpoint videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 76–87, 2023.

  36. [36]

    4d gaussian splatting for real-time dynamic scene rendering

    Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 20310–20320, 2024.

  37. [37]

    Gaussctrl: Multi-view consistent text-driven 3d gaussian splatting editing

    Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, and Victor Adrian Prisacariu. Gaussctrl: Multi-view consistent text-driven 3d gaussian splatting editing. In European Conference on Computer Vision, pages 55–

  38. [38]

    ! In Proceedings of the 32nd ACM International Conference on Multimedia, pages 7871–7880, 2024

    Jinbo Yan, Rui Peng, Luyang Tang, and Ronggang Wang. ! In Proceedings of the 32nd ACM International Conference on Multimedia, pages 7871–7880, 2024.

  39. [39]

    Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. arXiv preprint arXiv:2310.10642, 2023

    Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. arXiv preprint arXiv:2310.10642, 2023.

  40. [40]

    Mip-splatting: Alias-free 3d gaussian splatting

    Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 19447–19456,

  41. [41]

    Ewa splatting. IEEE Transactions on Visualization and Computer Graphics, 8(3):223–238, 2002

    Matthias Zwicker, Hanspeter Pfister, Jeroen Van Baar, and Markus Gross. Ewa splatting. IEEE Transactions on Visualization and Computer Graphics, 8(3):223–238, 2002.