Learning a Particle Dynamics Model with Real-world Videos

Chanho Kim; Li Fuxin; Suhas V. Sumukh

arxiv: 2605.23845 · v1 · pith:5VZNKTZPnew · submitted 2026-05-22 · 💻 cs.CV

Pith reviewed 2026-05-25 04:45 UTC · model grok-4.3

keywords: particle dynamics, Gaussian splatting, real-world videos, rendering supervision, dynamics prediction, unsupervised learning, object interactions, rotation forecasting

The pith

A particle dynamics model can be trained directly on unlabeled real-world videos by supervising predictions through differentiable rendering of dense Gaussians.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to learn how objects move and rotate in physical scenes using only ordinary video recordings. It represents each scene as a dense collection of particles that carry position, scale, and rotation information taken from a Gaussian splatting reconstruction. A neural network then forecasts the future positions and rotations of every particle. The only training signal comes from rendering the updated particles back into images and penalizing the difference from the next video frame. This setup removes the need for simulated environments, point tracks, or any direct labels on particle states.

Core claim

The central claim is that a particle-based dynamics model compatible with Gaussian splatting can be trained on real videos alone. The model receives dense particles that already encode scale and rotation, then predicts their position and rotation increments at each time step. Supervision occurs exclusively by rendering the forecasted particles into images and comparing them to the observed video frames, without any particle-level ground truth, correspondences, or subsampling of the Gaussian set.
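
The supervision loop described above can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the 2D isotropic splat renderer, the helper names (`render`, `l2_loss`, `apply_deltas`), and the particle values are all assumptions made here, and the candidate deltas are hand-picked rather than produced by a network. It only shows that an image-space loss against the next frame ranks candidate particle motions without any particle-level labels.

```python
import math

def render(particles, size=16):
    """Splat isotropic 2D Gaussians onto a size x size grayscale image.

    Each particle is (x, y, sigma, intensity). Toy stand-in for a Gaussian
    splatting rasterizer (hypothetical, not the renderer used in the paper).
    """
    image = [[0.0] * size for _ in range(size)]
    for (px, py, sigma, intensity) in particles:
        for row in range(size):
            for col in range(size):
                d2 = (col - px) ** 2 + (row - py) ** 2
                image[row][col] += intensity * math.exp(-d2 / (2.0 * sigma ** 2))
    return image

def l2_loss(img_a, img_b):
    """Mean squared pixel difference between two images of equal size."""
    n = len(img_a) * len(img_a[0])
    return sum((a - b) ** 2 for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb)) / n

def apply_deltas(particles, deltas):
    """Move each particle by its (dx, dy); sigma and intensity stay fixed."""
    return [(x + dx, y + dy, s, i) for (x, y, s, i), (dx, dy) in zip(particles, deltas)]

# Particles at frame t, and the "observed" frame t+1 produced by a true motion of (+2, +1).
particles_t = [(5.0, 5.0, 1.5, 1.0), (10.0, 8.0, 1.0, 0.8)]
true_deltas = [(2.0, 1.0), (2.0, 1.0)]
frame_next = render(apply_deltas(particles_t, true_deltas))

# Three candidate predictions, scored only through the rendered image.
loss_static = l2_loss(render(apply_deltas(particles_t, [(0.0, 0.0)] * 2)), frame_next)
loss_close = l2_loss(render(apply_deltas(particles_t, [(1.5, 1.0)] * 2)), frame_next)
loss_exact = l2_loss(render(apply_deltas(particles_t, true_deltas)), frame_next)
```

In a real pipeline the renderer would be differentiable, so the gradient of this loss with respect to the predicted deltas could be backpropagated into the dynamics network; the sketch stops at the loss value itself.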

What carries the argument

The argument rests on a particle dynamics predictor that ingests dense Gaussian-derived particles carrying scales and rotations and outputs their position and rotation changes. The predictor is trained end-to-end with rendering supervision.
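
As a rough sketch of what "outputs position and rotation changes" can mean for a single particle, the Python snippet below advances one particle state by a predicted increment. The quaternion parameterization, the `(w, x, y, z)` ordering, the left-multiplication convention, and the names `step_particle` and `quat_multiply` are assumptions for illustration; the summary does not say how the paper parameterizes its rotation increments. Scale is carried along unchanged, consistent with the description that only position and rotation are predicted.

```python
import math

def quat_multiply(q, r):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

def normalize(q):
    """Rescale a quaternion to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def step_particle(particle, delta_position, delta_rotation):
    """Advance one particle by predicted increments; scale is left unchanged."""
    position = tuple(p + d for p, d in zip(particle["position"], delta_position))
    # Left-multiplication applies the increment in the world frame (a convention chosen here).
    rotation = normalize(quat_multiply(normalize(delta_rotation), particle["rotation"]))
    return {"position": position, "rotation": rotation, "scale": particle["scale"]}

# Hypothetical particle: identity rotation, small anisotropic scale.
particle = {"position": (0.0, 0.0, 1.0), "rotation": (1.0, 0.0, 0.0, 0.0), "scale": (0.02, 0.02, 0.01)}
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))  # 90 degrees about z

# Two steps with the same increment: a half turn about z and a small drift.
state = particle
for _ in range(2):
    state = step_particle(state, (0.1, 0.0, -0.05), quarter_turn_z)
```

Re-normalizing after every step keeps the rotation a unit quaternion even when the predicted increment is slightly off unit length, which matters once such steps are chained over a rollout.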

If this is right

Dynamics models become trainable on real footage instead of requiring synthetic data with perfect state information.
The method works with the full dense set of particles without any anchor-point subsampling.
Both translational and rotational motion are predicted within the same learned model.
A dataset of roughly 500 real videos of object interactions is released to support further study.
Learning proceeds without any requirement for labeled particle trajectories or point matches across frames.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same rendering-supervision loop could be tested on longer prediction horizons to measure how quickly errors accumulate.
Combining the learned particle predictor with other differentiable renderers might broaden the range of scenes that can be handled.
Robotic systems that observe only camera streams could use the trained model to anticipate future states of manipulated objects.
The approach invites direct comparison against physics engines on the released video set to quantify any remaining sim-to-real gap.

Load-bearing premise

Rendering supervision from video frames alone supplies enough signal to recover accurate particle dynamics and rotations without direct state labels, point correspondences, or heuristic subsampling.
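
A toy case shows why this premise is load-bearing. In the Python sketch below, two identical particles either stay at rest or exchange places, and both motions render to the same image, so an image-space loss alone cannot tell them apart. The renderer and all values are hypothetical and far simpler than real Gaussians, which carry color, anisotropic scale, and rotation that can reduce such ambiguity; the example makes no claim about how often this arises in the paper's data.

```python
import math

def render(particles, size=12):
    """Splat identical-intensity isotropic 2D Gaussians; each particle is (x, y, sigma)."""
    image = [[0.0] * size for _ in range(size)]
    for (px, py, sigma) in particles:
        for row in range(size):
            for col in range(size):
                d2 = (col - px) ** 2 + (row - py) ** 2
                image[row][col] += math.exp(-d2 / (2.0 * sigma ** 2))
    return image

def max_abs_diff(img_a, img_b):
    """Largest per-pixel absolute difference between two images."""
    return max(abs(a - b) for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb))

# Particle 0 starts on the left, particle 1 on the right.
start = [(3.0, 6.0, 1.0), (8.0, 6.0, 1.0)]

# Hypothesis A: nothing moves. Hypothesis B: the two particles exchange places.
at_rest = list(start)
swapped = [start[1], start[0]]

# Both hypotheses produce the same next-frame image.
diff = max_abs_diff(render(at_rest), render(swapped))
```

Per-particle trajectories differ completely between the two hypotheses, yet `diff` is zero up to floating-point error, which is the kind of rendering-consistent but incorrect solution an identifiability analysis would need to rule out.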

What would settle it

If the dynamics model, when rolled forward and rendered, produces image sequences that diverge substantially from held-out real video frames of new object interactions, the claim that rendering alone suffices for learning would be refuted.
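
A held-out check of this kind could be scored per frame, for example with PSNR, as in the Python sketch below. PSNR is an assumption made here for simplicity; the summary does not state which rendering metrics the paper reports. The function names and the tiny 2x2 frames are likewise hypothetical placeholders for rendered rollouts and real held-out video frames.

```python
import math

def psnr(predicted, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_errors = [(p - t) ** 2 for rp, rt in zip(predicted, target) for p, t in zip(rp, rt)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def rollout_scores(rendered_rollout, heldout_frames):
    """PSNR for each step of a rendered rollout against the matching held-out frame."""
    return [psnr(p, t) for p, t in zip(rendered_rollout, heldout_frames)]

# Placeholder data: two held-out frames and a rollout whose error grows with the horizon.
heldout = [[[0.0, 0.5], [0.5, 0.25]], [[0.0, 0.5], [0.5, 0.25]]]
rollout = [[[0.1, 0.6], [0.4, 0.35]], [[0.2, 0.7], [0.3, 0.45]]]

scores = rollout_scores(rollout, heldout)
```

A score that drops sharply with the prediction horizon on unseen interactions would be evidence against the claim; a score that stays high would support rendering consistency, though not by itself the correctness of the underlying 3D motion.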

Figures

Figures reproduced from arXiv: 2605.23845 by Chanho Kim, Li Fuxin, Suhas V. Sumukh.

**Figure 1.** Example sequences illustrating the physical scenarios of interest. The dataset captures multi-object interactions with complex ...

**Figure 2.** Objects used in our dataset. The falling-cube-stack sce...

**Figure 3.** Data collection setup with four Intel D455 RealSense ...

**Figure 4.** An overview of our data collection pipeline. It enables learning collision dynamics from real-world videos by providing two ...

Abstract

Data-driven learning approaches for physics simulation, sometimes referred to as world models, have emerged as promising alternatives to traditional physics simulators due to their differentiable nature. Prior work has demonstrated impressive results in predicting the motions of rigid and non-rigid objects in complex scenes involving multiple interacting bodies. However, these models are typically trained in simulated environments because obtaining perfect state information such as complete scene point clouds and point correspondences over time is challenging in real-world settings. This reliance on synthetic data can limit their applicability when the sim-to-real gap is large. In this work, we aim to overcome these limitations by introducing a novel framework for training neural object dynamics models directly from unlabeled real-world videos. Specifically, we propose to learn a particle-based dynamics model compatible with a Gaussian splatting framework, which operates on dense particles derived from Gaussians (i.e., particles with scales and rotations) and predicts their position and rotation changes over time. The model is trained via rendering supervision, enabling learning from real-world videos without requiring particle-level labeled states. Our model operates directly on dense Gaussians without relying on heuristic subsampling anchor points. To enable this study, we also present a real-world dataset consisting of about 500 videos capturing diverse object interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper trains particle dynamics on dense Gaussians from real videos using only rendering supervision, but the abstract gives no evidence that this loss recovers accurate 3D positions and rotations.

The paper's main move is to learn a particle-based dynamics model that works with Gaussian splatting, taking dense particles that carry scales and rotations, predicting their position and rotation deltas over time, and training the whole thing from unlabeled real videos through a rendering loss. They also release a dataset of roughly 500 videos showing object interactions. This combination (dense unsampled Gaussians, direct rotation prediction, and no need for state labels or point tracks) is presented as new relative to prior work that relied on synthetic data with perfect correspondences.

Referee Report

1 major / 0 minor

Summary. The paper proposes a particle-based dynamics model compatible with Gaussian splatting that operates on dense particles (with scales and rotations) derived from Gaussians and predicts per-particle position and rotation changes over time. The model is trained end-to-end via rendering supervision on unlabeled real-world videos, without particle-level state labels or point correspondences, and the authors introduce a new dataset of approximately 500 videos of object interactions.

Significance. If the central claim holds, the work would enable training of differentiable world models directly on real video data, reducing dependence on simulated environments with perfect state information and potentially narrowing the sim-to-real gap. The release of a real-world video dataset of object interactions is a concrete positive contribution that could support follow-on research.

major comments (1)

[Abstract] The claim that rendering supervision alone supplies sufficient signal to recover accurate 3D position and rotation deltas for every dense Gaussian-derived particle is not supported by any derivation, loss formulation, or analysis in the provided manuscript. Because the model predicts deltas directly on the full unsampled set and receives no explicit 3D supervision or point tracks, the abstract leaves open the possibility that multiple incorrect dynamics produce visually plausible renderings.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the opportunity to clarify the claims in our work. We address the major comment on the abstract below.

Referee: [Abstract] The claim that rendering supervision alone supplies sufficient signal to recover accurate 3D position and rotation deltas for every dense Gaussian-derived particle is not supported by any derivation, loss formulation, or analysis in the provided manuscript. Because the model predicts deltas directly on the full unsampled set and receives no explicit 3D supervision or point tracks, the abstract leaves open the possibility that multiple incorrect dynamics produce visually plausible renderings.

Authors: We agree the abstract would benefit from greater precision. The manuscript formulates the training objective as an image-space rendering loss between predicted particle states (position/rotation deltas applied to dense Gaussians) and observed video frames, optimized end-to-end without 3D labels. However, we acknowledge the absence of a formal identifiability analysis or derivation showing that the recovered dynamics are unique rather than merely rendering-consistent. In revision we will (1) revise the abstract to state that the model learns dynamics consistent with observed renderings, and (2) add a short discussion section on potential ambiguities and the role of temporal consistency and the dense particle representation in mitigating them. This is a substantive clarification rather than a change to the method or experiments.

Revision: yes

Circularity Check

0 steps flagged

No circularity: dynamics model trained via independent rendering loss

The paper's central derivation trains a particle dynamics model to predict per-particle position and rotation deltas directly from dense Gaussians, with supervision coming solely from a rendering loss on real video frames and no particle-level state labels or correspondences. This setup does not reduce the predicted deltas to the inputs by construction, nor does it rely on self-citations, fitted parameters renamed as predictions, or ansatzes smuggled from prior work. The abstract and description present a standard end-to-end learning pipeline where the loss signal is external to the model's forward predictions, making the chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no concrete information on free parameters, background axioms, or newly postulated entities; the particle representation and Gaussian splatting are presumed to draw from prior literature.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1] Kelsey R Allen, Tatiana Lopez-Guevara, Kimberly Stachenfeld, Alvaro Sanchez-Gonzalez, Peter Battaglia, Jessica Hamrick, and Tobias Pfaff. Physical design using differentiable learned simulators. Neural Information Processing Systems (NeurIPS 2022), 2022.

[2] Kelsey R Allen, Tatiana Lopez Guevara, Yulia Rubanova, Kim Stachenfeld, Alvaro Sanchez-Gonzalez, Peter Battaglia, and Tobias Pfaff. Graph network simulators can learn discontinuous, rigid contact dynamics. In CoRL, pages 1157–1167. PMLR, 2023.

[3] Daniel Bear, Elias Wang, Damian Mrowca, Felix Binder, Hsiao-Yu Tung, Pramod RT, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun Sun, Fei-Fei Li, Nancy Kanwisher, Josh Tenenbaum, Dan Yamins, and Judith Fan. Physion: Evaluating physical prediction from vision in humans and machines. In Proceedings of the Neural Information Processing Systems Track on Dat...

[4] Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, and Marc Toussaint. Reinforcement learning with neural radiance fields. In Advances in Neural Information Processing Systems (NeurIPS), 2022.

[5] Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour... 2022.

[6] Jiaqi Han, Wenbing Huang, Hengbo Ma, Jiachen Li, Josh Tenenbaum, and Chuang Gan. Learning physical dynamics with subequivariant graph neural networks. In NeurIPS, pages 26256–26268, 2022.

[7] Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Pavel Tokmakov, Suya You, Rares Ambrus, Katerina Fragkiadaki, and Leonidas J. Guibas. AllTracker: Efficient dense point tracking at high resolution. In ICCV, 2025.

[8] Yuanming Hu, Jiancheng Liu, Andrew Spielberg, Joshua B Tenenbaum, William T Freeman, Jiajun Wu, Daniela Rus, and Wojciech Matusik. Chainqueen: A real-time differentiable physical simulator for soft robotics. In 2019 International conference on robotics and automation (ICRA), 2019.

[9] Suning Huang, Qianzhong Chen, Xiaohan Zhang, Jiankai Sun, and Mac Schwager. Particleformer: A 3d point cloud world model for multi-object, multi-material robotic manipulation. In 9th Annual Conference on Robot Learning, 2025.

[10] Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. Phystwin: Physics-informed reconstruction and simulation of deformable objects from videos. ICCV, 2025.

[11] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.

[12] Chanho Kim and Li Fuxin. Object dynamics modeling with hierarchical point cloud-based representations. In CVPR,

[13] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[14] Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In ICLR, 2019.

[15] Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, and Siyuan Huang. Gwm: Towards scalable gaussian world models for robotic manipulation. Proceedings of International Conference on Computer Vision (ICCV), 2025.

[16] Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 3DV, 2024.

[17] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.

[18] NVIDIA: Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Ji... Cosmos world foundation model platform for physical ai, 2025.

[19] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. In ICLR, 2021.

[20] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:...

[21] Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter Battaglia. Learning to simulate complex physics with graph networks. In ICML,

[22] Haochen Shi, Huazhe Xu, Samuel Clarke, Yunzhu Li, and Jiajun Wu. Robocook: Long-horizon elasto-plastic object manipulation with diverse tools. arXiv preprint arXiv:2306.14447, 2023.

[23] Hsiao-Yu Tung, Mingyu Ding, Zhenfang Chen, Daniel M. Bear, Chuang Gan, Joshua B. Tenenbaum, Daniel L. K. Yamins, Judith Fan, and Kevin A. Smith. Physion++: evaluating physical scene understanding that requires online inference of different physical properties. In Proceedings of the 37th International Conference on Neural Information Processing System...

[24] Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell., 13(4):376–380, 1991.

[25] Jovana Videnovic, Alan Lukezic, and Matej Kristan. A distractor-aware memory for visual object tracking with SAM2. In CVPR, 2025.

[26] Jiaxu Wang, Jingkai Sun, Junhao He, Ziyi Zhang, Qiang Zhang, Mingyuan Sun, and Renjing Xu. Del: Discrete element learner for learning 3d particle dynamics with neural rendering. In Advances in Neural Information Processing Systems, pages 45703–45736. Curran Associates, Inc., 2024.

[27] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[28] Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. Foundationstereo: Zero-shot stereo matching. CVPR, 2025.

[29] William F Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kim Stachenfeld, and Kelsey R Allen. Learning 3d particle-based simulators from RGB-d videos. In The Twelfth International Conference on Learning Representations, 2024.

[30] William F Whitney, Jake Varley, Deepali Jain, Krzysztof Marcin Choromanski, Sumeet Singh, and Vikas Sindhwani. Modeling the real world with high-density visual particle dynamics. In 8th Annual Conference on Robot Learning, 2024.

[31] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320, 2024.

[32] Wenxuan Wu, Zhongang Qi, and Li Fuxin. Pointconv: Deep convolutional networks on 3d point clouds. In CVPR, pages 9621–9630, 2019.

[33] Wenxuan Wu, Li Fuxin, and Qi Shan. Pointconvformer: Revenge of the point-based convolution. In CVPR, pages 21802–21813, 2023.

[34] Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. Physgaussian: Physics-integrated 3d gaussians for generative dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4389–4398, 2024.

[35] Haotian Xue, Antonio Torralba, Joshua B. Tenenbaum, Daniel LK Yamins, Yunzhu Li, and Hsiao-Yu Tung. 3d-intphys: Towards more generalized 3d-grounded visual intuitive physics under challenging scenes. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[36] Kaifeng Zhang, Baoyu Li, Kris Hauser, and Yunzhu Li. Particle-grid neural dynamics for learning deformable object models from rgb-d videos. In Proceedings of Robotics: Science and Systems (RSS), 2025.

[37] Mingtong Zhang, Kaifeng Zhang, and Yunzhu Li. Dynamic 3d gaussian tracking for graph-based neural dynamics modeling. In 8th Annual Conference on Robot Learning, 2024.

[38] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

[39] Mikel Zhobro, Andreas René Geist, and Georg Martius. Learning 3d-gaussian simulators from rgb videos, 2025.

[40] Licheng Zhong, Hong-Xing Yu, Jiajun Wu, and Yunzhu Li. Reconstruction and simulation of elastic objects with spring-mass 3d gaussians. European Conference on Computer Vision (ECCV), 2024.

[41] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Learning a Particle Dynamics Model with Real-world Videos: Supplementary Material

A. Network Architecture Details. We adopt a U-Net...

[42] ... to each predicted object to recover dense Gaussians. This allows us to evaluate the results using rendering-based metrics, as presented in the main paper.

E. Ablation on Different 4D Gaussian Generation Methods. We compare our approach with dynamic-scene GS methods [16, 31], which can produce 3D Gaussian trajectories for both model input and supervisio...

[1] [1]

Physical design using differ- entiable learned simulators.Neural Information Processing Systems (NeurIPS 2022), 2022

Kelsey R Allen, Tatiana Lopez-Guevara, Kimberly Stachen- feld, Alvaro Sanchez-Gonzalez, Peter Battaglia, Jessica Hamrick, and Tobias Pfaff. Physical design using differ- entiable learned simulators.Neural Information Processing Systems (NeurIPS 2022), 2022. 1

work page 2022

[2] [2]

Graph network simulators can learn discon- tinuous, rigid contact dynamics

Kelsey R Allen, Tatiana Lopez Guevara, Yulia Rubanova, Kim Stachenfeld, Alvaro Sanchez-Gonzalez, Peter Battaglia, and Tobias Pfaff. Graph network simulators can learn discon- tinuous, rigid contact dynamics. InCORL, pages 1157–1167. PMLR, 2023. 3

work page 2023

[3] [3]

Physion: Evaluating physical prediction from vision in humans and machines

Daniel Bear, Elias Wang, Damian Mrowca, Felix Binder, Hsiao-Yu Tung, Pramod RT, Cameron Holdaway, Sirui Tao, Kevin Smith, Fan-Yun Sun, Fei-Fei Li, Nancy Kanwisher, Josh Tenenbaum, Dan Yamins, and Judith Fan. Physion: Evaluating physical prediction from vision in humans and machines. InProceedings of the Neural Information Pro- cessing Systems Track on Dat...

work page

[4] [4]

Reinforcement learning with neural ra- diance fields

Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, and Marc Toussaint. Reinforcement learning with neural ra- diance fields. InAdvances in Neural Information Processing Systems (NeurIPS), 2022. 1, 2

work page 2022

[5] [5]

Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J Fleet, Dan Gnanapra- gasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh- Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Rad- wan, Daniel Rebain, Sara Sabour...

work page 2022

[6] [6]

Learning physical dynam- ics with subequivariant graph neural networks

Jiaqi Han, Wenbing Huang, Hengbo Ma, Jiachen Li, Josh Tenenbaum, and Chuang Gan. Learning physical dynam- ics with subequivariant graph neural networks. InNeuRIPS, pages 26256–26268, 2022. 1

work page 2022

[7] [7]

Adam W. Harley, Yang You, Xinglong Sun, Yang Zheng, Nikhil Raghuraman, Yunqi Gu, Sheldon Liang, Wen-Hsuan Chu, Achal Dave, Pavel Tokmakov, Suya You, Rares Am- brus, Katerina Fragkiadaki, and Leonidas J. Guibas. All- Tracker: Efficient dense point tracking at high resolution. In ICCV, 2025. 3, 6

work page 2025

[8] [8]

Chainqueen: A real-time differen- tiable physical simulator for soft robotics

Yuanming Hu, Jiancheng Liu, Andrew Spielberg, Joshua B Tenenbaum, William T Freeman, Jiajun Wu, Daniela Rus, and Wojciech Matusik. Chainqueen: A real-time differen- tiable physical simulator for soft robotics. In2019 Interna- tional conference on robotics and automation (ICRA), 2019. 1

work page 2019

[9] [9]

Particleformer: A 3d point cloud world model for multi-object, multi-material robotic manip- ulation

Suning Huang, Qianzhong Chen, Xiaohan Zhang, Jiankai Sun, and Mac Schwager. Particleformer: A 3d point cloud world model for multi-object, multi-material robotic manip- ulation. In9th Annual Conference on Robot Learning, 2025. 1

work page 2025

[10] [10]

Phystwin: Physics- informed reconstruction and simulation of deformable ob- jects from videos.ICCV, 2025

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. Phystwin: Physics- informed reconstruction and simulation of deformable ob- jects from videos.ICCV, 2025. 3

work page 2025

[11] [11]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 1, 2, 3, 6

work page 2023

[12] [12]

Object dynamics modeling with hierarchical point cloud-based representations

Chanho Kim and Li Fuxin. Object dynamics modeling with hierarchical point cloud-based representations. InCVPR,

work page

[13] [13]

Adam: A method for stochastic optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InInternational Conference on Learning Representations, 2015. 5

work page 2015

[14] [14]

Learning particle dynamics for ma- nipulating rigid bodies, deformable objects, and fluids

Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B Tenenbaum, and Antonio Torralba. Learning particle dynamics for ma- nipulating rigid bodies, deformable objects, and fluids. In ICLR, 2019. 1, 7

work page 2019

[15] [15]

Gwm: Towards scalable gaussian world models for robotic manipulation

Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, and Siyuan Huang. Gwm: Towards scalable gaussian world models for robotic manipulation. Proceedings of International Conference on Computer Vi- sion (ICCV), 2025. 1, 3, 7

work page 2025

[16] [16]

Dynamic 3d gaussians: Tracking by per- sistent dynamic view synthesis

Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by per- sistent dynamic view synthesis. In3DV, 2024. 3, 6, 1

work page 2024

[17] [17]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis. InECCV, 2020. 1, 2

work page 2020

[18] [18]

Cosmos world foundation model platform for physical ai, 2025

NVIDIA, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Ji...

work page 2025

[19] [19]

Battaglia

Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. InICLR, 2021. 1

work page 2021

[20] [20]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junt- ing Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao- Yuan Wu, Ross Girshick, Piotr Doll´ar, and Christoph Feicht- enhofer. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[21] [21]

Learning to simulate complex physics with graph networks

Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter Battaglia. Learning to simulate complex physics with graph networks. InICML,

work page

[22] Haochen Shi, Huazhe Xu, Samuel Clarke, Yunzhu Li, and Jiajun Wu. RoboCook: Long-horizon elasto-plastic object manipulation with diverse tools. arXiv preprint arXiv:2306.14447, 2023.

[23] Hsiao-Yu Tung, Mingyu Ding, Zhenfang Chen, Daniel M. Bear, Chuang Gan, Joshua B. Tenenbaum, Daniel L. K. Yamins, Judith Fan, and Kevin A. Smith. Physion++: Evaluating physical scene understanding that requires online inference of different physical properties. In Proceedings of the 37th International Conference on Neural Information Processing System...

[24] Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell., 13(4):376–380, 1991.

[25] Jovana Videnovic, Alan Lukezic, and Matej Kristan. A distractor-aware memory for visual object tracking with SAM2. In CVPR, 2025.

[26] Jiaxu Wang, Jingkai Sun, Junhao He, Ziyi Zhang, Qiang Zhang, Mingyuan Sun, and Renjing Xu. DEL: Discrete element learner for learning 3D particle dynamics with neural rendering. In Advances in Neural Information Processing Systems, pages 45703–45736. Curran Associates, Inc., 2024.

[27] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.

[28] Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, and Stan Birchfield. FoundationStereo: Zero-shot stereo matching. In CVPR, 2025.

[29] William F Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kim Stachenfeld, and Kelsey R Allen. Learning 3D particle-based simulators from RGB-D videos. In The Twelfth International Conference on Learning Representations, 2024.

[30] William F Whitney, Jake Varley, Deepali Jain, Krzysztof Marcin Choromanski, Sumeet Singh, and Vikas Sindhwani. Modeling the real world with high-density visual particle dynamics. In 8th Annual Conference on Robot Learning, 2024.

[31] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4D Gaussian splatting for real-time dynamic scene rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320, 2024.

[32] Wenxuan Wu, Zhongang Qi, and Li Fuxin. PointConv: Deep convolutional networks on 3D point clouds. In CVPR, pages 9621–9630, 2019.

[33] Wenxuan Wu, Li Fuxin, and Qi Shan. PointConvFormer: Revenge of the point-based convolution. In CVPR, pages 21802–21813, 2023.

[34] Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. PhysGaussian: Physics-integrated 3D Gaussians for generative dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4389–4398, 2024.

[35] Haotian Xue, Antonio Torralba, Joshua B. Tenenbaum, Daniel LK Yamins, Yunzhu Li, and Hsiao-Yu Tung. 3D-IntPhys: Towards more generalized 3D-grounded visual intuitive physics under challenging scenes. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[36] Kaifeng Zhang, Baoyu Li, Kris Hauser, and Yunzhu Li. Particle-grid neural dynamics for learning deformable object models from RGB-D videos. In Proceedings of Robotics: Science and Systems (RSS), 2025.

[37] Mingtong Zhang, Kaifeng Zhang, and Yunzhu Li. Dynamic 3D Gaussian tracking for graph-based neural dynamics modeling. In 8th Annual Conference on Robot Learning, 2024.

[38] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.

[39] Mikel Zhobro, Andreas René Geist, and Georg Martius. Learning 3D-Gaussian simulators from RGB videos, 2025.

[40] Licheng Zhong, Hong-Xing Yu, Jiajun Wu, and Yunzhu Li. Reconstruction and simulation of elastic objects with spring-mass 3D Gaussians. In European Conference on Computer Vision (ECCV), 2024.

[41] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

Learning a Particle Dynamics Model with Real-world Videos

Supplementary Material

A. Network Architecture Details

We adopt a U-Net...

to each predicted object to recover dense Gaussians. This allows us to evaluate the results using rendering-based metrics, as presented in the main paper.

E. Ablation on Different 4D Gaussian Generation Methods

We compare our approach with dynamic-scene GS methods [16, 31], which can produce 3D Gaussian trajectories for both model input and supervisio...