arxiv: 2605.10307 · v1 · submitted 2026-05-11 · 💻 cs.CV · cs.GR· cs.RO

Recognition: 2 theorem links

· Lean Theorem

PaMoSplat: Part-Aware Motion-Guided Gaussian Splatting for Dynamic Scene Reconstruction

Yinan Deng , Jianyu Dou , Jiahui Wang , Jingyu Zhao , Yi Yang , Yufeng Yue

Authors on Pith no claims yet

Pith reviewed 2026-05-12 05:15 UTC · model grok-4.3

classification 💻 cs.CV cs.GRcs.RO

keywords dynamic scene reconstructionGaussian splattingpart-aware modelingoptical flow guidancerigid motion estimation4D scene editingcomputer vision

0 comments

The pith

PaMoSplat models dynamic scenes as rigid parts initialized from 3D-lifted masks and guided by optical flow to improve Gaussian splatting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Dynamic scene reconstruction with 3D Gaussian splatting often struggles when motions are large or intricate. The paper seeks to show that treating parts as the basic units of deformation, derived by lifting multi-view masks into coherent 3D groups, and then driving their motions with optical flow priors solves much of this problem. It does so by first clustering masks into parts, estimating their rigid transformations via a differential evolutionary algorithm, and then refining the splatting model with adaptive steps, learnable rigidity, and a flow-supervised loss. If correct, the result is higher-quality novel-view rendering, more accurate part tracking, and quicker training, while also opening part-level editing tasks.

Core claim

PaMoSplat initializes Gaussian primitives as coherent 3D parts by lifting multi-view segmentation masks through graph clustering, estimates the rigid motion of each part at later times using a differential evolutionary algorithm driven by multi-view optical flow cues to provide a warm start, and optimizes the entire model with an adaptive iteration schedule, an internal learnable rigidity parameter, and a flow-supervised rendering loss, thereby achieving higher-fidelity rendering and tracking than prior dynamic Gaussian splatting approaches.

What carries the argument

Graph clustering that lifts 2D segmentation masks to coherent 3D Gaussian parts, together with differential evolutionary rigid-motion estimation guided by multi-view optical flow.

If this is right

Higher rendering quality than existing dynamic Gaussian methods across synthetic and real scenes
More precise part-level tracking enabled by the motion-guided initialization
Faster training convergence through the adaptive iteration count and auxiliary losses
Direct support for part-level 4D editing applications

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The rigid-part assumption could be relaxed to allow small non-rigid deformations inside each part without changing the overall pipeline.
Replacing the graph-clustering step with a learned 3D segmentation network might reduce dependence on accurate 2D masks.
The same motion-warm-start strategy could be tested on other deformable representations such as neural radiance fields.

Load-bearing premise

Lifting multi-view segmentation masks into 3D via graph clustering yields coherent Gaussian parts whose motions can be captured by rigid-body estimation informed by optical flow.

What would settle it

A dynamic scene in which cross-view mask consistency is low and part motions deviate strongly from rigid transformations, after which the reported gains in PSNR, tracking accuracy, and convergence speed disappear.

Figures

Figures reproduced from arXiv: 2605.10307 by Jiahui Wang, Jianyu Dou, Jingyu Zhao, Yinan Deng, Yi Yang, Yufeng Yue.

**Figure 2.** Figure 2: Overview of the PaMoSplat pipeline. PaMoSplat introduces Gaussian parts in the initial timestamp, which are shifted in subsequent timestamps based on optical flow. Additionally, PaMoSplat incorporates some rigid constraints and flow-supervised losses to further refine the dynamic scene representation. C. Dynamic 3D Gaussian Splatting Following the breakthrough of 3DGS [16], dynamic extensions broadly fall… view at source ↗

**Figure 3.** Figure 3: Process of Gaussian part generation. The initialized 3D Gaussian field S0 and the multi-view 2D segmentation masks {mi v} serve as inputs, allowing for the assignment of part IDs pi to Gaussians through cross-dimensional correspondence and graph clustering. 6) Part ID: pi ∈ Z Among these parameters, only the center position µt, rotation quaternion qt, and color ct (optional to simulate lighting variations… view at source ↗

**Figure 5.** Figure 5: An example of failed prior motion estimation using optical [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Learnable internal rigidity of Gaussian part. [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: A multi-view video capture platform for self-captured [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗

**Figure 8.** Figure 8: Qualitative comparisons of novel view synthesis. We zoom in on the main comparison baselines to highlight details. With prior motion and optical flow supervision, PaMoSplat demonstrates a significant advantage in modeling highly dynamic elements, even with the sparse view (5 training cameras in the self-captured dataset displayed in the lower right corner). the difficulty of spatiotemporal fitting. Compara… view at source ↗

**Figure 9.** Figure 9: Qualitative comparisons of prior motion. [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗

**Figure 11.** Figure 11: Qualitative comparisons of calculated flow. PaMoSplat produces less noise and reveals sharper boundaries. RGB PaMoSplat GT (RAFT) [PITH_FULL_IMAGE:figures/full_fig_p010_11.png] view at source ↗

**Figure 12.** Figure 12: More optical flow visualization. PamoSplat even exceeds the performance of the front-end optical flow predictor RAFT. More Gaussian Part Visualization [PITH_FULL_IMAGE:figures/full_fig_p010_12.png] view at source ↗

**Figure 15.** Figure 15: Qualitative comparison of ablation studies. [PITH_FULL_IMAGE:figures/full_fig_p011_15.png] view at source ↗

**Figure 16.** Figure 16: Experiments on parameter influence. Reducing the number of training cameras or introducing noise into camera parameters leads to degradation in rendering and tracking quality. While part segmentation granularity shows some robustness, performance deteriorates significantly under severe under-segmentation. SAM SAM2 RAFT SEA-RAFT Noisy RAFT [PITH_FULL_IMAGE:figures/full_fig_p012_16.png] view at source ↗

**Figure 18.** Figure 18: Robustness to front-end inaccurate segmentation. [PITH_FULL_IMAGE:figures/full_fig_p012_18.png] view at source ↗

**Figure 19.** Figure 19: Novel view synthesized by part-level scene editing. [PITH_FULL_IMAGE:figures/full_fig_p013_19.png] view at source ↗

read the original abstract

Dynamic scene reconstruction represents a fundamental yet demanding challenge in computer vision and robotics. While recent progress in 3DGS-based methods has advanced dynamic scene modeling, obtaining high-fidelity rendering and accurate tracking in scenarios with substantial, intricate motions remains significantly challenging. To address these challenges, we propose PaMoSplat, a novel dynamic Gaussian splatting framework incorporating part awareness and motion priors. Our approach is grounded in two key observations: 1) Parts serve as primitives for scene deformation, and 2) Motion cues from optical flow can effectively guide part motion. Specifically, PaMoSplat initializes by lifting multi-view segmentation masks into 3D space via graph clustering, establishing coherent Gaussian parts. For subsequent timestamps, we leverage a differential evolutionary algorithm to estimate the rigid motion of these parts using multi-view optical flow cues, providing a robust warm-start for further optimization. Additionally, PaMoSplat introduces an adaptive iteration count mechanism, internal learnable rigidity, and flow-supervised rendering loss to accelerate and optimize the training process. Comprehensive evaluations across diverse scenes, including real-world environments, demonstrate that PaMoSplat delivers superior rendering quality, improved tracking precision, and faster convergence compared to existing methods. Furthermore, it enables multiple part-level downstream applications, such as 4D scene editing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PaMoSplat adds graph-clustered part initialization and differential-evolution motion warm-start to Gaussian splatting, but the rigid-part assumption needs direct evidence from the results.

read the letter

PaMoSplat tries to handle complex motions in dynamic scenes by treating parts as the deformation primitives. It lifts multi-view segmentation masks to 3D via graph clustering to create initial Gaussian groups, then runs a differential evolutionary algorithm on multi-view optical flow to estimate rigid per-part motion as a warm start. Adaptive iteration counts, a learnable rigidity term, and a flow-supervised loss are added to speed up and stabilize training. The abstract reports better rendering quality, tracking precision, and faster convergence than prior methods on diverse scenes, plus some part-level editing uses.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces PaMoSplat, a part-aware dynamic Gaussian splatting framework for reconstructing scenes with substantial and intricate motions. It initializes by lifting multi-view segmentation masks into 3D Gaussian parts via graph clustering, then applies differential evolution on multi-view optical flow to estimate rigid per-part motions as a warm-start. The method further incorporates an adaptive iteration count, internal learnable rigidity, and a flow-supervised rendering loss to accelerate optimization. Evaluations on diverse scenes, including real-world data, are reported to show gains in rendering quality, tracking precision, and convergence speed over existing methods, while enabling downstream tasks such as 4D scene editing.

Significance. If the graph-clustered parts reliably correspond to approximately rigid entities, the combination of motion-prior initialization and flow supervision could meaningfully advance 3DGS-based dynamic reconstruction in challenging regimes. The use of differential evolution for warm-starting rigid motions and the flow-supervised loss constitute concrete, testable contributions that build on standard optimization techniques. The paper's emphasis on part-level applications also opens clear avenues for downstream use.

major comments (3)

[Abstract and Method (initialization procedure)] The initialization step that lifts multi-view segmentation masks into 3D via graph clustering is presented as producing coherent Gaussian parts suitable for rigid-motion modeling. However, no quantitative validation—such as part-label stability across frames, agreement with synthetic ground-truth decompositions, or failure-case analysis on noisy 2D segmentations—is reported. This assumption is load-bearing for the subsequent differential-evolution motion estimation and the claimed improvements in tracking precision.
[Abstract and Experiments section] The abstract states that comprehensive evaluations demonstrate superior rendering quality, improved tracking precision, and faster convergence. Yet the provided description supplies no specific metrics (e.g., PSNR, SSIM, tracking error), baseline comparisons, or ablation results isolating the contribution of the graph-clustering step versus the flow-supervised loss. Without these, the central performance claims cannot be assessed for robustness.
[Method (optimization components)] The learnable rigidity and adaptive iteration count are introduced to optimize training, but their effect on convergence is not isolated from the warm-start provided by differential evolution. If these mechanisms are central to the faster-convergence claim, an ablation removing them while keeping the motion prior should be shown.

minor comments (3)

The abstract would be strengthened by including one or two key quantitative results (e.g., average PSNR gain or iteration reduction) rather than purely qualitative statements of superiority.
[Related Work] Ensure that all cited prior dynamic Gaussian splatting works are compared in a dedicated related-work section with explicit differences highlighted.
Figure captions should clearly label visualized elements such as part decompositions, estimated motion fields, and rendered outputs versus ground truth.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, outlining the revisions we will incorporate to improve the manuscript.

read point-by-point responses

Referee: [Abstract and Method (initialization procedure)] The initialization step that lifts multi-view segmentation masks into 3D via graph clustering is presented as producing coherent Gaussian parts suitable for rigid-motion modeling. However, no quantitative validation—such as part-label stability across frames, agreement with synthetic ground-truth decompositions, or failure-case analysis on noisy 2D segmentations—is reported. This assumption is load-bearing for the subsequent differential-evolution motion estimation and the claimed improvements in tracking precision.

Authors: We agree that quantitative validation of part coherence would strengthen the load-bearing assumption. In the revised manuscript we will add experiments on synthetic scenes with available ground-truth decompositions, reporting part-label stability (e.g., frame-to-frame IoU) and robustness under controlled noise in the 2D masks. We will also include a qualitative and quantitative discussion of failure cases where graph clustering yields non-rigid parts. revision: yes
Referee: [Abstract and Experiments section] The abstract states that comprehensive evaluations demonstrate superior rendering quality, improved tracking precision, and faster convergence. Yet the provided description supplies no specific metrics (e.g., PSNR, SSIM, tracking error), baseline comparisons, or ablation results isolating the contribution of the graph-clustering step versus the flow-supervised loss. Without these, the central performance claims cannot be assessed for robustness.

Authors: The Experiments section reports quantitative results with PSNR, SSIM, LPIPS, and tracking-error metrics together with baseline comparisons; however, to make the claims more readily assessable we will revise the abstract to cite the key numerical improvements and add an explicit ablation table isolating the graph-clustering initialization from the flow-supervised loss. revision: partial
Referee: [Method (optimization components)] The learnable rigidity and adaptive iteration count are introduced to optimize training, but their effect on convergence is not isolated from the warm-start provided by differential evolution. If these mechanisms are central to the faster-convergence claim, an ablation removing them while keeping the motion prior should be shown.

Authors: We concur that isolating the contribution of learnable rigidity and adaptive iteration count from the differential-evolution warm-start is required. The revised manuscript will include an ablation that disables these two components while retaining the motion prior and reports the resulting convergence curves and final rendering/tracking metrics. revision: yes

Circularity Check

0 steps flagged

No circularity: framework assembles external priors and standard optimization into a pipeline without reducing outputs to inputs by construction.

full rationale

The paper's core steps—lifting 2D segmentation masks to 3D parts via graph clustering, using differential evolution on optical flow for rigid-motion warm-start, and adding learnable rigidity plus flow-supervised loss—are presented as engineering choices that consume independent inputs (masks, flow fields) and produce optimized Gaussians. No equation or claim equates a derived quantity (e.g., part motion or rendering quality) back to a fitted parameter or self-citation by definition. The claimed improvements are asserted via external evaluations on diverse scenes rather than by algebraic identity with the initialization. This is the normal non-circular case for a method paper whose load-bearing assumptions are stated as testable (coherent rigid parts) rather than smuggled in as tautologies.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on domain assumptions about part-based deformation and optical flow reliability; full text unavailable so ledger is derived only from abstract statements.

free parameters (2)

learnable rigidity
Internal learnable rigidity introduced as part of the optimization process.
adaptive iteration count
Adaptive mechanism for determining training iteration counts.

axioms (2)

domain assumption Parts serve as primitives for scene deformation
Stated as one of the two key observations grounding the framework.
domain assumption Motion cues from optical flow can effectively guide part motion
Stated as the second key observation.

pith-pipeline@v0.9.0 · 5552 in / 1481 out tokens · 81485 ms · 2026-05-12T05:15:52.246355+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 1 internal anchor

[1]

://arxiv.org/abs/2401.00025

C. Wen, et al., “Any-point trajectory modeling for policy learning,”arXiv preprint arXiv:2401.00025, 2023

work page arXiv 2023
[2]

Rekep: Spatio- temporal reasoning of relational keypoint constraints for robotic manipulation.arXiv preprint arXiv:2409.01652, 2024

W. Huang, et al., “Rekep: Spatio-temporal reasoning of rela- tional keypoint constraints for robotic manipulation,”arXiv preprint arXiv:2409.01652, 2024

work page arXiv 2024
[3]

Nerf: Representing scenes as neural radiance fields for view synthesis,

B. Mildenhall, et al., “Nerf: Representing scenes as neural radiance fields for view synthesis,”Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021

work page 2021
[4]

K-planes: Explicit radiance fields in space, time, and appearance,

S. Fridovich-Keil, et al., “K-planes: Explicit radiance fields in space, time, and appearance,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12 479–12 488, 2023

work page 2023
[5]

Hexplane: A fast representation for dynamic scenes,

A. Cao and J. Johnson, “Hexplane: A fast representation for dynamic scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 130–141, 2023

work page 2023
[6]

Dynibar: Neural dynamic image-based rendering,

Z. Li, et al., “Dynibar: Neural dynamic image-based rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4273–4284, 2023

work page 2023
[7]

Neural 3d video synthesis from multi-view video,

T. Li, et al., “Neural 3d video synthesis from multi-view video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5521–5531, 2022

work page 2022
[8]

Tensor4d: Efficient neural 4d decomposition for high- fidelity dynamic reconstruction and rendering,

R. Shao, et al., “Tensor4d: Efficient neural 4d decomposition for high- fidelity dynamic reconstruction and rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16 632–16 642, 2023

work page 2023
[9]

D-nerf: Neural radiance fields for dynamic scenes,

A. Pumarola, et al., “D-nerf: Neural radiance fields for dynamic scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10 318–10 327, 2021

work page 2021
[10]

Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields,

K. Park, et al., “Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields,”ACM Transactions on Graphics (ToG), vol. 40, no. 6, Dec. 2021

work page 2021
[11]

Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields,

L. Song, et al., “Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields,”IEEE Transactions on Visual- ization and Computer Graphics, vol. 29, no. 5, pp. 2732–2742, 2023

work page 2023
[12]

Hyperreel: High-fidelity 6-dof video with ray- conditioned sampling,

B. Attal, et al., “Hyperreel: High-fidelity 6-dof video with ray- conditioned sampling,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16 610–16 620, 2023

work page 2023
[13]

Dynamic mesh-aware radiance fields,

Y .-L. Qiao, et al., “Dynamic mesh-aware radiance fields,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 385–396, 2023

work page 2023
[14]

High-fidelity and real-time novel view synthesis for dynamic scenes,

H. Lin, et al., “High-fidelity and real-time novel view synthesis for dynamic scenes,” inSIGGRAPH Asia 2023 Conference Papers, pp. 1– 9, 2023

work page 2023
[15]

Mixed neural voxels for fast multi-view video syn- thesis,

F. Wang, et al., “Mixed neural voxels for fast multi-view video syn- thesis,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 19 706–19 716, 2023

work page 2023
[16]

3d gaussian splatting for real-time radiance field rendering,

B. Kerbl, et al., “3d gaussian splatting for real-time radiance field rendering,”ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–14, 2023

work page 2023
[17]

4k4d: Real-time 4d view synthesis at 4k resolution,

Z. Xu, et al., “4k4d: Real-time 4d view synthesis at 4k resolution,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20 029–20 040, 2024

work page 2024
[18]

Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting.arXiv preprint arXiv:2310.10642, 2023

Z. Yang, et al., “Real-time photorealistic dynamic scene represen- tation and rendering with 4d gaussian splatting,”arXiv preprint arXiv:2310.10642, 2023

work page arXiv 2023
[19]

Spacetime gaussian feature splatting for real-time dy- namic view synthesis,

Z. Li, et al., “Spacetime gaussian feature splatting for real-time dy- namic view synthesis,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8508–8520, 2024

work page 2024
[20]

4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes,

Y . Duan, et al., “4d-rotor gaussian splatting: towards efficient novel view synthesis for dynamic scenes,” inACM SIGGRAPH 2024 Conference Papers, pp. 1–11, 2024

work page 2024
[21]

4d gaussian splatting with scale-aware residual field and adaptive optimization for real-time rendering of temporally complex dynamic scenes,

J. Yan, et al., “4d gaussian splatting with scale-aware residual field and adaptive optimization for real-time rendering of temporally complex dynamic scenes,” inACM Multimedia 2024, 2024

work page 2024
[22]

Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes,

Y .-H. Huang, et al., “Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4220–4230, 2024

work page 2024
[23]

Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle,

Y . Lin, et al., “Gaussian-flow: 4d reconstruction with dynamic 3d gaussian particle,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21 136–21 145, 2024

work page 2024
[24]

4d gaussian splatting for real-time dynamic scene rendering,

G. Wu, et al., “4d gaussian splatting for real-time dynamic scene rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20 310–20 320, 2024

work page 2024
[25]

Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction,

Z. Yang, et al., “Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, pp. 20 331–20 341, 2024

work page 2024
[26]

3d geometry-aware deformable gaussian splatting for dynamic view synthesis,

Z. Lu, et al., “3d geometry-aware deformable gaussian splatting for dynamic view synthesis,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8900–8910, 2024

work page 2024
[27]

Superpoint gaussian splatting for real-time high-fidelity dynamic scene reconstruction,

D. Wan, R. Lu, and G. Zeng, “Superpoint gaussian splatting for real-time high-fidelity dynamic scene reconstruction,”arXiv preprint arXiv:2406.03697, 2024

work page arXiv 2024
[28]

Gaussianprediction: Dynamic 3d gaussian prediction for motion extrapolation and free view synthesis,

B. Zhao, et al., “Gaussianprediction: Dynamic 3d gaussian prediction for motion extrapolation and free view synthesis,” inACM SIGGRAPH 2024 Conference Papers, pp. 1–12, 2024

work page 2024
[29]

Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis,

J. Luiten, et al., “Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis,” in3DV, 2024

work page 2024
[30]

Physically embodied gaussian splatting: A realtime correctable world model for robotics,

J. Abou-Chakra, et al., “Physically embodied gaussian splatting: A realtime correctable world model for robotics,”arXiv preprint arXiv:2406.10788, 2024

work page arXiv 2024
[31]

Panoptic studio: A massively multiview system for social motion capture,

H. Joo, et al., “Panoptic studio: A massively multiview system for social motion capture,” inProceedings of the IEEE international conference on computer vision, pp. 3334–3342, 2015

work page 2015
[32]

Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction,

Z. Guo, et al., “Motion-aware 3d gaussian splatting for efficient dynamic scene reconstruction,”IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2024

work page 2024
[33]

arXiv preprint arXiv:2403.12365 (2024)

Q. Gao, et al., “Gaussianflow: Splatting gaussian dynamics for 4d content creation,”arXiv preprint arXiv:2403.12365, 2024

work page arXiv 2024
[34]

Segment anything,

A. Kirillov, et al., “Segment anything,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026, 2023

work page 2023
[35]

Raft: Recurrent all-pairs field transforms for op- tical flow,

Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for op- tical flow,” inComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pp. 402–

work page 2020
[36]

Aleth-nerf: Illumination adaptive nerf with concealing field assumption,

Z. Cui, et al., “Aleth-nerf: Illumination adaptive nerf with concealing field assumption,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 2, pp. 1435–1444, 2024

work page 2024
[37]

Switch-nerf: Learning scene decomposition with mixture of experts for large-scale neural radiance fields,

M. Zhenxing and D. Xu, “Switch-nerf: Learning scene decomposition with mixture of experts for large-scale neural radiance fields,” inThe Eleventh International Conference on Learning Representations, 2022

work page 2022
[38]

F2-nerf: Fast neural radiance field training with free camera trajectories,

P. Wang, et al., “F2-nerf: Fast neural radiance field training with free camera trajectories,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4150–4159, 2023

work page 2023
[39]

Depth-guided robust point cloud fusion nerf for sparse input views,

S. Guo, et al., “Depth-guided robust point cloud fusion nerf for sparse input views,”IEEE Transactions on Circuits and Systems for Video Technology, 2024

work page 2024
[40]

Semantic is enough: Only semantic information for nerf reconstruction,

R. Wang, et al., “Semantic is enough: Only semantic information for nerf reconstruction,” in2023 IEEE International Conference on Unmanned Systems (ICUS), pp. 906–912. IEEE, 2023

work page 2023
[41]

Openobj: Open-vocabulary object-level neural radiance fields with fine-grained understanding,

Y . Deng, et al., “Openobj: Open-vocabulary object-level neural radiance fields with fine-grained understanding,”arXiv preprint arXiv:2406.08009, 2024

work page arXiv 2024
[42]

Macim: Multi-agent collaborative implicit mapping,

Y . Deng, et al., “Macim: Multi-agent collaborative implicit mapping,” IEEE Robotics and Automation Letters, 2024

work page 2024
[43]

Lgsdf: Continual global learning of signed distance fields aided by local updating,

Y . Yue, et al., “Lgsdf: Continual global learning of signed distance fields aided by local updating,”arXiv preprint arXiv:2404.05187, 2024

work page arXiv 2024
[44]

Lightspeed: light and fast neural light fields on mobile devices,

A. Gupta, et al., “Lightspeed: light and fast neural light fields on mobile devices,”Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024
[45]

Hashpoint: Accelerated point searching and sampling for neural rendering,

J. Ma, et al., “Hashpoint: Accelerated point searching and sampling for neural rendering,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4462–4472, 2024

work page 2024
[46]

Generative neural fields by mixtures of neural implicit functions,

T. You, et al., “Generative neural fields by mixtures of neural implicit functions,”Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024
[47]

Instant neural graphics primitives with a multireso- lution hash encoding,

T. M ¨uller, et al., “Instant neural graphics primitives with a multireso- lution hash encoding,”ACM transactions on graphics (TOG), vol. 41, no. 4, pp. 1–15, 2022

work page 2022
[48]

Dynamic view synthesis from dynamic monocular video,

C. Gao, et al., “Dynamic view synthesis from dynamic monocular video,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5712–5721, 2021

work page 2021
[49]

Space-time neural irradiance fields for free-viewpoint video,

W. Xian, et al., “Space-time neural irradiance fields for free-viewpoint video,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9421–9431, 2021

work page 2021
[50]

Non-local guided neural fields for 4d ct reconstruction,

Q. Zhou, et al., “Non-local guided neural fields for 4d ct reconstruction,” IEEE Transactions on Circuits and Systems for Video Technology, 2025. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 15

work page 2025
[51]

Neural deformable voxel grid for fast optimization of dynamic view synthesis,

X. Guo, et al., “Neural deformable voxel grid for fast optimization of dynamic view synthesis,” inProceedings of the Asian Conference on Computer Vision, pp. 3757–3775, 2022

work page 2022
[52]

Robust dynamic radiance fields,

Y .-L. Liu, et al., “Robust dynamic radiance fields,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13–23, 2023

work page 2023
[53]

Nerfies: Deformable neural radiance fields,

K. Park, et al., “Nerfies: Deformable neural radiance fields,” inProceed- ings of the IEEE/CVF International Conference on Computer Vision, pp. 5865–5874, 2021

work page 2021
[54]

Dynamicsurf: Dynamic neural rgb- d surface reconstruction with an optimizable feature grid,

M. Mohamed and L. Agapito, “Dynamicsurf: Dynamic neural rgb- d surface reconstruction with an optimizable feature grid,” in2024 International Conference on 3D Vision (3DV), pp. 820–830. IEEE, 2024

work page 2024
[55]

Dynamic appearance particle neural radiance field,

A. Lin, et al., “Dynamic appearance particle neural radiance field,”IEEE Transactions on Circuits and Systems for Video Technology, 2025

work page 2025
[56]

Factorized fields for fast sparse input dynamic view synthesis,

N. Somraj, et al., “Factorized fields for fast sparse input dynamic view synthesis,” inACM SIGGRAPH 2024 Conference Papers, pp. 1–12, 2024

work page 2024
[57]

Forward flow for novel view synthesis of dynamic scenes,

X. Guo, et al., “Forward flow for novel view synthesis of dynamic scenes,” inProceedings of the IEEE/CVF International Conference on Computer Vision, pp. 16 022–16 033, 2023

work page 2023
[58]

Evdnerf: Reconstructing event data with dy- namic neural radiance fields,

A. Bhattacharya, et al., “Evdnerf: Reconstructing event data with dy- namic neural radiance fields,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5846–5855, 2024

work page 2024
[59]

Dynpoint: Dynamic neural point for view synthesis,

K. Zhou, et al., “Dynpoint: Dynamic neural point for view synthesis,” Advances in Neural Information Processing Systems, vol. 36, 2024

work page 2024
[60]

Per-gaussian embedding-based deformation for de- formable 3d gaussian splatting,

J. Bae, et al., “Per-gaussian embedding-based deformation for de- formable 3d gaussian splatting,” inEuropean Conference on Computer Vision, pp. 321–335. Springer, 2024

work page 2024
[61]

St-4dgs: Spatial-temporally consistent 4d gaussian splat- ting for efficient dynamic scene rendering,

D. Li, et al., “St-4dgs: Spatial-temporally consistent 4d gaussian splat- ting for efficient dynamic scene rendering,” inACM SIGGRAPH 2024 Conference Papers, pp. 1–11, 2024

work page 2024
[62]

Swift4d: Adaptive divide-and-conquer gaussian splatting for compact and efficient reconstruction of dynamic scene,

J. Wu, et al., “Swift4d: Adaptive divide-and-conquer gaussian splatting for compact and efficient reconstruction of dynamic scene,”arXiv preprint arXiv:2503.12307, 2025

work page arXiv 2025
[63]

3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos,

J. Sun, et al., “3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos,” inPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20 675–20 685, 2024

work page 2024
[64]

Hicom: Hierarchical coherent motion for dynamic streamable scenes with 3d gaussian splatting,

Q. Gao, et al., “Hicom: Hierarchical coherent motion for dynamic streamable scenes with 3d gaussian splatting,”Advances in Neural Information Processing Systems, vol. 37, pp. 80 609–80 633, 2024

work page 2024
[65]

Fast unfolding of communities in large networks,

V . D. Blondel, et al., “Fast unfolding of communities in large networks,” Journal of statistical mechanics: theory and experiment, vol. 2008, no. 10, p. P10008, 2008

work page 2008
[66]

Pytorch: An imperative style, high-performance deep learning library,

A. Paszke, et al., “Pytorch: An imperative style, high-performance deep learning library,”Advances in neural information processing systems, vol. 32, 2019

work page 2019
[67]

Particlenerf: A particle-based encoding for online neural radiance fields,

J. Abou-Chakra, F. Dayoub, and N. S ¨underhauf, “Particlenerf: A particle-based encoding for online neural radiance fields,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 5975–5984, 2024

work page 2024
[68]

Realtime multi-person 2d pose estimation using part affinity fields,

Z. Cao, et al., “Realtime multi-person 2d pose estimation using part affinity fields,” inProceedings of the IEEE conference on computer vision and pattern recognition, pp. 7291–7299, 2017

work page 2017
[69]

SAM 2: Segment Anything in Images and Videos

N. Ravi, et al., “Sam 2: Segment anything in images and videos,”arXiv preprint arXiv:2408.00714, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[70]

Sea-raft: Simple, efficient, accurate raft for optical flow,

Y . Wang, L. Lipson, and J. Deng, “Sea-raft: Simple, efficient, accurate raft for optical flow,” inEuropean Conference on Computer Vision, pp. 36–54. Springer, 2024. Yinan Deng(Student Member, IEEE) received the B.Eng. degree in automation from Beijing Institute of Technology, Beijing, China, in 2021. He is cur- rently pursuing his Ph.D. degree in Control...

work page 2024