PersistGS: Differentiable Physics for Object Permanence in 4D Gaussian Splatting

Adrian Ramlal; John S. Zelek

arxiv: 2606.03479 · v1 · pith:54DABKEZnew · submitted 2026-06-02 · 💻 cs.CV · cs.GR

Pith reviewed 2026-06-28 10:34 UTC · model grok-4.3

keywords: 4D Gaussian Splatting · Object Permanence · Differentiable Rigid Body Simulation · Occlusion Handling · Dynamic Scene Reconstruction · Trajectory Extrapolation · Centroid Silhouette Loss

The pith

Coupling differentiable rigid body simulation with 3D Gaussian Splatting restores object permanence during full occlusions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

When a moving object becomes fully occluded from all training cameras, photometric supervision vanishes and the Gaussians representing it degrade. PersistGS decomposes the scene into per-object Gaussians paired with collision meshes, estimates friction and velocity parameters from the visible pre-occlusion trajectory using differentiable simulation, and applies the resulting physics-governed SE(3) trajectory to keep the Gaussians correctly placed throughout the occlusion. A centroid silhouette loss isolates positional gradients from appearance noise. On synthetic scenes the method outperforms constant-velocity extrapolation and approaches the performance of an oracle that knows the true trajectory.

Core claim

PersistGS restores object permanence during occlusion by coupling differentiable rigid body simulation with 3D Gaussian Splatting. The method decomposes scenes into per-object Gaussians and collision meshes, estimates friction and velocity from pre-occlusion data via differentiable simulation, and uses the resulting SE(3) trajectory to position the Gaussians throughout the occlusion interval. Because the trajectory satisfies rigid-body dynamics it captures contact events such as bounces and friction-based deceleration that kinematic extrapolation cannot model.
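To make the contrast with kinematic extrapolation concrete, here is a minimal Python sketch. It is not the paper's simulator (Figure 2 names Newton for that role). It assumes a hypothetical 1D point mass with illustrative values for gravity, restitution and time step, and compares a physics rollout through a floor contact with constant-velocity extrapolation from the last two visible samples.

```python
# Toy 1D bouncing ball: physics rollout vs. constant-velocity extrapolation.
# All constants (gravity, restitution, time step) are illustrative assumptions.

def simulate_bounce(y0, vy0, steps, dt=0.01, g=9.81, restitution=0.8):
    """Semi-implicit Euler rollout of a point mass above a floor at y = 0."""
    y, vy = y0, vy0
    heights = [y]
    for _ in range(steps):
        vy -= g * dt          # gravity updates the velocity first
        y += vy * dt          # the position then uses the new velocity
        if y < 0.0:           # contact: reflect about the floor, damp the velocity
            y = -y
            vy = -restitution * vy
        heights.append(y)
    return heights


def constant_velocity(y_prev, y_last, steps):
    """Extrapolate linearly from the last two observed heights."""
    step = y_last - y_prev
    return [y_last + step * (k + 1) for k in range(steps)]


# Observe the first 30 steps (pre-occlusion), then predict 100 hidden steps.
truth = simulate_bounce(y0=1.0, vy0=0.0, steps=130)
observed, hidden = truth[:31], truth[31:]
cv_pred = constant_velocity(observed[-2], observed[-1], steps=100)

print(f"min height, physics rollout:   {min(hidden):.3f}")
print(f"min height, constant velocity: {min(cv_pred):.3f}")
```

In this toy the rollout stays on or above the floor, while the linear extrapolation passes straight through it. That is the kind of contact event the core claim says kinematic extrapolation cannot model.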

What carries the argument

Differentiable rigid body simulation that predicts SE(3) trajectories from estimated friction and velocity, paired with a centroid silhouette loss that isolates positional gradients.
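The summary does not give the exact form of the centroid silhouette loss, so the following is only one plausible reading, sketched in Python under stated assumptions: the loss is taken to be the squared distance between the alpha-weighted centroid of the rendered object silhouette and the centroid of a target mask. The function names `alpha_centroid` and `centroid_silhouette_loss` are hypothetical.

```python
# Hypothetical centroid-based silhouette loss on small alpha maps (lists of rows).
# Assumed form only; the paper's exact loss is not stated in this summary.

def alpha_centroid(alpha):
    """Alpha-weighted mean pixel coordinate (x, y) of a 2D alpha map."""
    total = sum(sum(row) for row in alpha)
    if total == 0:
        raise ValueError("alpha map is empty; centroid is undefined")
    cx = sum(a * x for row in alpha for x, a in enumerate(row)) / total
    cy = sum(a * y for y, row in enumerate(alpha) for a in row) / total
    return cx, cy


def centroid_silhouette_loss(rendered_alpha, target_mask):
    """Squared distance between the rendered and target silhouette centroids."""
    rx, ry = alpha_centroid(rendered_alpha)
    tx, ty = alpha_centroid(target_mask)
    return (rx - tx) ** 2 + (ry - ty) ** 2


# A 2x2 blob rendered one pixel to the left of where the target mask sits.
rendered = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
target = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
loss = centroid_silhouette_loss(rendered, target)

# Uniformly scaling the rendered opacity leaves the centroid unchanged.
faded = [[0.5 * a for a in row] for row in rendered]
loss_faded = centroid_silhouette_loss(faded, target)
print(loss, loss_faded)
```

Under this assumed form, a uniform change in opacity does not move the centroid, so the loss responds to where the object is rather than how it looks. That is one way a centroid term could separate positional gradients from appearance noise.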

If this is right

The predicted trajectory captures contact events such as bounces and friction-based deceleration that constant-velocity methods miss.
The centroid silhouette loss produces 40 percent lower trajectory error than photometric supervision alone.
On synthetic scenes the method improves PSNR by 2.46 dB over constant-velocity extrapolation.
Performance reaches within 0.19 dB of an upper bound that uses the ground-truth trajectory.
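For readers who think in error rather than decibels, the short Python sketch below converts the two PSNR figures into mean-squared-error ratios using the standard definition PSNR = 10·log10(MAX² / MSE). It treats each reported gap as if it applied to a single image. Averaging PSNR over frames and scenes does not commute with this conversion, so the ratios are indicative only. The helper name `mse_ratio_from_psnr_gain` is hypothetical.

```python
def mse_ratio_from_psnr_gain(gain_db):
    """MSE of the better method divided by MSE of the worse one, for a PSNR gap in dB."""
    return 10.0 ** (-gain_db / 10.0)


ratio_vs_cv = mse_ratio_from_psnr_gain(2.46)   # PersistGS vs. constant velocity
ratio_oracle = mse_ratio_from_psnr_gain(0.19)  # ground-truth trajectory vs. PersistGS

print(f"PersistGS MSE / constant-velocity MSE: {ratio_vs_cv:.3f}")
print(f"oracle MSE / PersistGS MSE:            {ratio_oracle:.3f}")
```

Read this way, a 2.46 dB gain corresponds to roughly 43 percent lower squared error than constant velocity, and the 0.19 dB gap leaves the oracle with roughly 4 percent lower squared error than PersistGS.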

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on real captured video once reliable per-object decomposition and parameter estimation are available outside synthetic settings.
Extending the simulation to handle multiple simultaneous contacts or partial occlusions would test how far the rigid-body premise can be pushed.
Combining the physics trajectory with a learned generative prior might handle cases where rigid-body assumptions fail, such as deformable objects.

Load-bearing premise

The scene can be decomposed into per-object rigid bodies whose friction and velocity parameters can be reliably estimated from the observed pre-occlusion trajectory alone.

What would settle it

The claim would be falsified if, during the occlusion interval, the simulated trajectory deviated from the object positions recorded by cameras withheld from training.

Figures

Figures reproduced from arXiv: 2606.03479 by Adrian Ramlal, John S. Zelek.

**Figure 1.** Object permanence through physics. A ball falls past an occluder (opaque, but rendered translucent here for visualization).

**Figure 2.** PersistGS pipeline. (a) Scene Decomposition extracts per-object Gaussians and collision meshes via MV-SAM3D, and trains background Gaussians separately. All representations are frozen after this stage. (b) Physics Estimation simulates candidate parameters θ = (µ, v0) through Newton, renders the alpha channel of the positioned object Gaussians, and minimizes a centroid silhouette loss. Only θ is optimized. …

**Figure 3.** Qualitative results on ball bounce. Top three rows: renders from an evaluation camera (which sees past the occluder) at three stages of the occlusion event. Green dotted outlines indicate the ground-truth ball position. Without physics, the ball is absent; constant velocity misses the second bounce; linear interpolation follows a straight path through the nonlinear contact trajectory; PersistGS correctly t…

Abstract

Dynamic 3D Gaussian Splatting (3DGS) methods reconstruct time-varying scenes from synchronized multi-camera video using photometric supervision. When a moving object becomes fully occluded from all training cameras, this supervision vanishes: the Gaussians representing it receive no gradient signal and degrade. Existing approaches to incomplete observations in neural reconstruction rely on learned generative priors that prioritize visual plausibility over physical correctness. We propose PersistGS, a method that restores object permanence during occlusion by coupling differentiable rigid body simulation with 3D Gaussian Splatting. Our approach decomposes the scene into per-object Gaussians and collision meshes, estimates friction and velocity from the observed pre-occlusion trajectory via differentiable simulation, and uses the resulting SE(3) trajectory to position object Gaussians throughout the occlusion period. Because the predicted trajectory satisfies the governing equations of rigid body dynamics, it faithfully captures contact events (bounces, friction-based deceleration, direction changes) that kinematic extrapolation cannot model. We introduce a centroid silhouette loss that isolates positional gradients from appearance noise, yielding 40% lower trajectory error than photometric supervision. We evaluate using cameras withheld from training that observe the object during its occlusion. Experiments on synthetic scenes show that PersistGS outperforms constant velocity extrapolation by +2.46 dB PSNR and comes within 0.19 dB of a ground-truth trajectory upper bound.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PersistGS couples differentiable rigid-body simulation to per-object 4DGS to keep Gaussians positioned under full occlusion, with concrete PSNR gains on synthetic data, but the fitting of friction and velocity from short pre-occlusion segments looks under-determined for contact cases.

The paper's main move is to decompose a dynamic scene into per-object Gaussians paired with collision meshes, estimate friction and velocity parameters from the visible pre-occlusion trajectory using differentiable simulation, and then roll out the resulting SE(3) pose to place the Gaussians during the occluded interval. This replaces both constant-velocity extrapolation and learned generative priors with actual rigid-body dynamics.

What works is the problem framing and the centroid silhouette loss. Photometric supervision really does vanish under full occlusion from all cameras, and pulling positional gradients from a silhouette term rather than raw appearance is a sensible way to reduce noise in the trajectory fit. The reported +2.46 dB PSNR lift over constant velocity and the 40% trajectory error drop on synthetic scenes are specific numbers that show the approach can outperform simple baselines when the physics model matches the data.

The soft spot is the parameter estimation step. Friction and initial velocity are recovered from the observed segment alone. In contact-rich motion, different friction values can produce indistinguishable pre-contact trajectories yet diverge sharply after the first bounce or slide. Nothing in the abstract demonstrates that the observed data uniquely determines the parameters needed for the full occlusion window, so the claim that the simulated trajectory "faithfully captures contact events" rests on an untested uniqueness assumption. The work also appears limited to synthetic scenes with known meshes; how well the pipeline holds when meshes must be estimated from real video is not addressed.
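The identifiability concern can be illustrated with a toy Python sketch. It is not drawn from the paper's scenes or simulator. It assumes a hypothetical 2D point mass thrown sideways over a floor, with a simplified impulse-based Coulomb friction model in which friction acts only at contact. Two different friction coefficients are rolled out from the same initial state.

```python
# Toy 2D point mass: friction only matters once the object touches the floor.
# The contact model and all constants are simplifying assumptions.

def rollout(mu, steps, dt=0.01, g=9.81, restitution=0.6):
    """Return horizontal positions over time and the index of the first contact."""
    x, y, vx, vy = 0.0, 1.0, 2.0, 0.0
    xs = [x]
    first_contact = None
    for k in range(steps):
        vy -= g * dt
        x += vx * dt
        y += vy * dt
        if y < 0.0 and vy < 0.0:
            normal_impulse = (1.0 + restitution) * (-vy)  # per unit mass
            # Coulomb bound: tangential impulse is at most mu times the normal impulse,
            # and it can at most bring the horizontal velocity to zero.
            friction_impulse = min(mu * normal_impulse, abs(vx))
            direction = 1.0 if vx > 0 else -1.0
            vx -= direction * friction_impulse
            vy = -restitution * vy
            y = 0.0
            if first_contact is None:
                first_contact = k + 1
        xs.append(x)
    return xs, first_contact


low, contact_low = rollout(mu=0.1, steps=150)
high, contact_high = rollout(mu=0.8, steps=150)

pre_contact_identical = low[:contact_low + 1] == high[:contact_high + 1]
final_gap = abs(low[-1] - high[-1])
print(pre_contact_identical, round(final_gap, 3))
```

In this toy, every sample up to the first contact is identical for both friction values, and the horizontal positions separate afterwards. If the visible segment ended before the first contact, no loss on that segment could tell the two coefficients apart.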

This is for people already working on dynamic neural rendering who want to add physics constraints rather than rely on priors. A reader who cares about downstream robotics or graphics tasks where object permanence matters will find the method and the synthetic results worth examining.

It deserves peer review. The integration is novel within 4DGS and the problem is real; referees can check the fitting robustness and any real-data experiments.

Referee Report

2 major / 1 minor

Summary. The paper introduces PersistGS, which couples differentiable rigid-body simulation with 4D Gaussian Splatting to enforce object permanence during full occlusions. The scene is decomposed into per-object Gaussians paired with collision meshes; friction and velocity parameters are estimated from the visible pre-occlusion trajectory segment via differentiable simulation; the resulting SE(3) trajectory then positions the object Gaussians throughout the occlusion interval. A centroid-silhouette loss is proposed to provide positional gradients, and synthetic-scene experiments report +2.46 dB PSNR over constant-velocity extrapolation together with a 40 % reduction in trajectory error relative to photometric supervision.

Significance. If the parameter estimation step yields unique, physically correct values that generalize across the occlusion interval, the method supplies a principled alternative to generative priors by enforcing rigid-body dynamics. The reported proximity to a ground-truth trajectory upper bound (0.19 dB) and the explicit modeling of contact events would constitute a concrete advance for dynamic reconstruction under incomplete observations.

major comments (2)

[Abstract] (description of scene decomposition and parameter estimation): the central claim that friction and velocity estimated from the pre-occlusion segment alone suffice to produce a correct SE(3) trajectory throughout occlusion rests on an unverified uniqueness assumption. Different (friction, velocity) pairs can generate indistinguishable pre-contact motion yet diverge after the first contact; the manuscript provides no analysis or experiment demonstrating that the observed segment constrains the parameters sufficiently for contact-rich cases.
[Abstract] (experimental claims): the reported +2.46 dB PSNR gain, 40% trajectory-error reduction, and 0.19 dB gap to the ground-truth upper bound are stated without accompanying details on mesh construction, loss weighting, baseline implementations, number of scenes, or statistical significance testing. These omissions make it impossible to determine whether the gains are attributable to the physics model or to the particular synthetic setup.

minor comments (1)

The precise formulation of the centroid-silhouette loss and its weighting relative to the photometric loss should be stated explicitly (including any hyper-parameters) so that the isolation of positional gradients can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and indicate planned revisions where the manuscript requires strengthening.

Referee: [Abstract] (description of scene decomposition and parameter estimation): the central claim that friction and velocity estimated from the pre-occlusion segment alone suffice to produce a correct SE(3) trajectory throughout occlusion rests on an unverified uniqueness assumption. Different (friction, velocity) pairs can generate indistinguishable pre-contact motion yet diverge after the first contact; the manuscript provides no analysis or experiment demonstrating that the observed segment constrains the parameters sufficiently for contact-rich cases.

Authors: We agree the uniqueness assumption is not explicitly verified. The current manuscript contains no dedicated sensitivity analysis or ablation for contact-rich cases. We will add a new experiment section that perturbs the estimated (friction, velocity) pairs within the range consistent with the pre-occlusion observations and quantifies divergence after first contact, using the same synthetic scenes. This analysis will be included in the revision.

revision: yes

Referee: [Abstract] (experimental claims): the reported +2.46 dB PSNR gain, 40% trajectory-error reduction, and 0.19 dB gap to the ground-truth upper bound are stated without accompanying details on mesh construction, loss weighting, baseline implementations, number of scenes, or statistical significance testing. These omissions make it impossible to determine whether the gains are attributable to the physics model or to the particular synthetic setup.

Authors: Implementation details for mesh construction, loss weighting, baselines, and the exact number of scenes appear in Sections 3 and 4 of the manuscript. The abstract itself is space-constrained and therefore omits them. We will revise the abstract to state the number of scenes evaluated and add a parenthetical note directing readers to the methods for implementation specifics. We will also include statistical significance testing (e.g., paired t-tests or error bars across scenes) in the experiments section of the revision.

revision: partial

Circularity Check

0 steps flagged

No significant circularity; physics equations are external and evaluation uses held-out data

The method decomposes scenes into per-object Gaussians and meshes, fits friction/velocity parameters to pre-occlusion observations using differentiable rigid-body simulation, then simulates the SE(3) trajectory forward. The governing equations (rigid body dynamics, contacts) are imported as standard external models, not derived or fitted within the paper. The centroid silhouette loss and PSNR gains are evaluated against withheld cameras observing the occlusion interval, providing an independent test. No quoted step reduces a claimed prediction to a fitted input by construction, invokes self-citation for uniqueness, or renames an ansatz. This matches the default non-circular case.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

Abstract-only review yields limited visibility; the ledger captures the main fitted quantities and background assumptions visible in the summary.

free parameters (2)

friction coefficient
Estimated from pre-occlusion trajectory via differentiable simulation to drive the rigid-body model.
initial velocity
Estimated from observed pre-occlusion motion to initialize the SE(3) trajectory prediction.
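As a rough picture of how two such parameters could be fitted, the Python sketch below recovers a friction coefficient and an initial speed for a hypothetical block sliding on a flat floor. It is a stand-in and not the paper's pipeline. Derivatives are carried by hand through the time stepping in place of automatic differentiation. The loss is a plain squared position error rather than the centroid silhouette loss. The block is assumed to keep sliding over the whole window, so stiction is not modeled.

```python
G = 9.81   # gravitational acceleration, m/s^2
DT = 0.01  # time step in seconds (illustrative)


def rollout_with_sensitivities(mu, v0, steps):
    """Simulate the sliding block and carry d(x)/d(mu) and d(x)/d(v0) along."""
    x, v = 0.0, v0
    dx_dmu, dv_dmu = 0.0, 0.0
    dx_dv0, dv_dv0 = 0.0, 1.0
    out = []
    for _ in range(steps):
        v -= mu * G * DT          # kinetic friction decelerates the block
        dv_dmu -= G * DT          # derivative of the line above w.r.t. mu
        x += v * DT
        dx_dmu += dv_dmu * DT
        dx_dv0 += dv_dv0 * DT
        out.append((x, dx_dmu, dx_dv0))
    return out


def fit(observed, mu=0.5, v0=1.0, lr=1.0, iters=5000):
    """Gradient descent on the mean squared position error over the visible window."""
    n = len(observed)
    for _ in range(iters):
        g_mu = g_v0 = 0.0
        trajectory = rollout_with_sensitivities(mu, v0, n)
        for (x, dx_dmu, dx_dv0), target in zip(trajectory, observed):
            err = x - target
            g_mu += 2.0 * err * dx_dmu / n
            g_v0 += 2.0 * err * dx_dv0 / n
        mu -= lr * g_mu
        v0 -= lr * g_v0
    return mu, v0


# Synthetic "pre-occlusion" observations generated with known parameters.
true_mu, true_v0 = 0.3, 2.0
observed = [x for x, _, _ in rollout_with_sensitivities(true_mu, true_v0, 50)]
est_mu, est_v0 = fit(observed)
print(f"mu = {est_mu:.4f}, v0 = {est_v0:.4f}")
```

The observations here are noise-free and the block stays in contact throughout, so both parameters are recoverable. Nothing in this toy shows that the same holds for the contact-rich cases raised in the referee report.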

axioms (1)

domain assumption: Scene objects behave as rigid bodies whose motion obeys Newtonian dynamics with constant friction during occlusion intervals.
Invoked to justify using the fitted parameters to simulate the hidden trajectory.

invented entities (1)

per-object collision meshes (no independent evidence)
purpose: Provide geometry for contact and collision handling inside the differentiable rigid-body simulator.
Introduced to couple appearance Gaussians with physics simulation; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5780 in / 1583 out tokens · 44005 ms · 2026-06-28T10:34:52.442380+00:00 · methodology


[1] [1]

Physically embodied gaussian splatting: A re- altime correctable world model for robotics

Jad Abou-Chakra, Krishan Rana, Feras Dayoub, and Niko Suenderhauf. Physically embodied gaussian splatting: A re- altime correctable world model for robotics. In8th Annual Conference on Robot Learning, 2024. 3

2024

[2] [2]

Taylor, and Michael Posa

Bibit Bianchini, Minghan Zhu, Mengti Sun, Bowen Jiang, Camillo J. Taylor, and Michael Posa. Vysics: Object recon- struction under occlusion by fusing vision and contact-rich physics. InRobotics: Science and Systems (RSS), 2025. 3

2025

[3] [3]

Gic: Gaussian-informed continuum for physical property identi- fication and simulation

Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, and Qifeng Chen. Gic: Gaussian-informed continuum for physical property identi- fication and simulation. InAdvances in Neural Information Processing Systems, pages 75035–75063. Curran Associates, Inc., 2024. 3

2024

[4] [4]

NeuMA: Neural material adaptor for visual grounding of intrinsic dynamics

Junyi Cao, Shanyan Guan, Yanhao Ge, Wei Li, Xiaokang Yang, and Chao Ma. NeuMA: Neural material adaptor for visual grounding of intrinsic dynamics. InThe Thirty-eighth Annual Conference on Neural Information Processing Sys- tems, 2024. 3

2024

[5] [5]

Vid2sim: Generalizable, video-based reconstruction of ap- pearance, geometry and physics for mesh-free simulation

Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, and Lingjie Liu. Vid2sim: Generalizable, video-based reconstruction of ap- pearance, geometry and physics for mesh-free simulation. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 26545–26555, 2025. 3

2025

[6]

Wen-Hsuan Chu, Lei Ke, and Katerina Fragkiadaki. Dreamscene4d: Dynamic multi-object scene generation from monocular videos. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[7]

Wen-Hsuan Chu, Lei Ke, Jianmeng Liu, Mingxiao Huo, Pavel Tokmakov, and Katerina Fragkiadaki. Robust multi-object 4d generation for in-the-wild videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22067–22077, 2025.

[8]

Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, and Xiaojuan Qi. Sc-gs: Sparse-controlled gaussian splatting for editable dynamic scenes. arXiv preprint arXiv:2312.14937, 2023.

[9]

Krishna Murthy Jatavallabhula, Miles Macklin, Florian Golemo, Vikram Voleti, Linda Petrini, Martin Weiss, Breandan Considine, Jerome Parent-Levesque, Kevin Xie, Kenny Erleben, Liam Paull, Florian Shkurti, Derek Nowrouzezahrai, and Sanja Fidler. gradsim: Differentiable simulation for system identification and visuomotor control. International Conference on Learning Representations (ICLR), 2021.

[10]

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. Phystwin: Physics-informed reconstruction and simulation of deformable objects from videos. ICCV, 2025.

[11]

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.

[12]

Mijeong Kim, Jongwoo Lim, and Bohyung Han. 4d gaussian splatting in the wild with uncertainty-aware regularization. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[13]

Simon Le Cleac’h, Hong-Xing Yu, Michelle Guo, Taylor Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, and Mac Schwager. Differentiable physics simulation of dynamics-augmented neural objects. IEEE Robotics and Automation Letters, 8(5):2780–2787, 2023.

[14]

Junoh Lee, Changyeon Won, Hyunjun Jung, Inhwan Bae, and Hae-Gon Jeon. Fully explicit dynamic gaussian splatting. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[15]

Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, and Lin Gu. Dngaussian: Optimizing sparse-view 3d gaussian radiance fields with global-local depth normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20775–20785, 2024.

[16]

Xuan Li, Yi-Ling Qiao, Peter Yichen Chen, Krishna Murthy Jatavallabhula, Ming Lin, Chenfanfu Jiang, and Chuang Gan. Pac-nerf: Physics augmented continuum neural radiance fields for geometry-agnostic system identification. In ICLR, 2023.

[17]

Yuchen Lin, Chenguo Lin, Jianjin Xu, and Yadong MU. OmniphysGS: 3d constitutive gaussians for general physics-based dynamics generation. In The Thirteenth International Conference on Learning Representations, 2025.

[18]

Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 3DV, 2024.

[19]

Miles Macklin. Warp: A high-performance python framework for gpu simulation and graphics, 2022. Presented at the NVIDIA GPU Technology Conference (GTC).

[20]

Ben Moran, Mauro Comi, Arunkumar Byravan, Steven Bohez, Tom Erez, Zhibin Li, and Leonard Hasenclever. Splatting physical scenes: End-to-end real-to-sim from imperfect robot data, 2025.

[21]

Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, and Siyuan Huang. Phyrecon: Physically plausible neural scene reconstruction. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.

[22]

Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, and Carl Vondrick. pix2gestalt: Amodal segmentation by synthesizing wholes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3931–3940, 2024.

[23]

Hyunwoo Park, Gun Ryu, and Wonjun Kim. Dropgaussian: Structural regularization for sparse-view gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21600–21609, 2025.

[24]

Ri-Zhao Qiu, Ge Yang, Weijia Zeng, and Xiaolong Wang. Language-driven physics-based scene synthesis and editing via feature splatting. In European Conference on Computer Vision (ECCV), 2024.

[25]

Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R Allen, William F Whitney, Kim Stachenfeld, and Tobias Pfaff. Learning rigid-body simulators over implicit shapes for large-scale scenes and vision. In The Thirty-eighth Annual Conference on Neural Information Processing Systems,

[26]

Sara Sabour, Suhani Vora, Daniel Duckworth, Ivan Krasin, David J. Fleet, and Andrea Tagliasacchi. Robustnerf: Ignoring distractors with robust losses. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20626–20636, 2023.

[27]

Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David Fleet, and Andrea Tagliasacchi. Spotlesssplats: Ignoring distractors in 3d gaussian splatting. ACM Trans. Graph., 44(2),

[28]

The Newton Contributors. Newton: Gpu-accelerated physics simulation for robotics and simulation research. https://github.com/newton-physics/newton, 2025. Released April 22, 2025. Apache-2.0 License.

[29]

Pavel Tokmakov, Jie Li, Wolfram Burgard, and Adrien Gaidon. Learning to track with object permanence. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10860–10869, 2021.

[30]

Diwen Wan, Ruijie Lu, and Gang Zeng. Superpoint Gaussian splatting for real-time high-fidelity dynamic scene reconstruction. In Proceedings of the 41st International Conference on Machine Learning, pages 49957–49972. PMLR,

[31]

Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, and Angjoo Kanazawa. Shape of motion: 4d reconstruction from a single video. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9660–9672, 2025.

[32]

Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320, 2024.

[33]

Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, and Tat-Jen Cham. Amodal3r: Amodal 3d reconstruction from occluded 2d images. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9181–9193, 2025.

[34]

Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. Physgaussian: Physics-integrated 3d gaussians for generative dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4389–4398, 2024.

[35]

Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians: Modeling dynamic urban scenes with gaussian splatting. In ECCV, 2024.

[36]

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. Depth anything v2. In Advances in Neural Information Processing Systems, pages 21875–21911. Curran Associates, Inc., 2024.

[37]

Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20331–20341, 2024.

[38]

Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaussian grouping: Segment and edit anything in 3d scenes. In ECCV, 2024.

[39]

Justin Yu, Kush Hari, Karim El-Refai, Arnav Dalil, Justin Kerr, Chung-Min Kim, Richard Cheng, Muhammad Z. Irshad, and Ken Goldberg. Persistent object gaussian splat (pogs) for tracking human and robot manipulation of irregularly shaped objects. ICRA, 2025.

[40]

Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19447–19456, 2024.

[41]

Licheng Zhong, Hong-Xing Yu, Jiajun Wu, and Yunzhu Li. Reconstruction and simulation of elastic objects with spring-mass 3d gaussians. European Conference on Computer Vision (ECCV), 2024.

[42]

Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. Drivinggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 21634–21643, 2024.