PhysiFormer: Learning to Simulate Mechanics in World Space

Andrea Vedaldi; Yiming Chen; Yushi Lan

arxiv: 2606.27364 · v1 · pith:UXNED3O4new · submitted 2026-06-25 · 💻 cs.CV

Pith reviewed 2026-06-26 05:08 UTC · model grok-4.3

keywords: diffusion models · 3D mesh · physical simulation · trajectory prediction · world coordinates · transformer · rigid elastic motion · multi-object dynamics

The pith

Casting vertex trajectory prediction as a single denoising diffusion process directly in world coordinates produces physically plausible 3D mesh motion without explicit inductive biases for rigidity or causality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

PhysiFormer represents objects as 3D meshes in world coordinates and predicts future vertex trajectories from initial positions, velocities, and material type using a diffusion transformer. It casts the task as one denoising diffusion process rather than building ad-hoc latent spaces or enforcing physical rules. The probabilistic model captures uncertainty and generates diverse futures while generalizing from over 100k simulated trajectories to mixed materials, unseen real-world geometries, and larger object counts. It outperforms autoregressive baselines on trajectory accuracy, rigidity preservation, and momentum consistency. If the approach holds, coordinate-space diffusion offers a route to view-invariant physical world models for robotics and graphics.

Core claim

The central claim is that excellent results on physically-plausible 3D object motion can be obtained without ad-hoc latent spaces or explicit enforcement of rigidity and causality by representing objects as 3D meshes in world coordinates and casting vertex trajectory prediction as a single denoising diffusion process directly in those coordinates.

What carries the argument

A diffusion transformer whose attention is factorised over time, space, and objects, performing denoising directly on vertex positions and velocities in world coordinates.
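
A rough count of pairwise attention scores shows why factorising attention matters. The sketch below is illustrative only. It assumes a scene is tokenised as T frames × O objects × N vertices per object, and that each factorised pass attends along one axis while the other two are held fixed. The paper's exact factorisation may differ, and the function names and example sizes are hypothetical.

```python
def full_attention_pairs(T: int, O: int, N: int) -> int:
    """Pairwise scores when every token attends to every other token."""
    tokens = T * O * N
    return tokens * tokens


def factorised_attention_pairs(T: int, O: int, N: int) -> int:
    """Pairwise scores for three axis-wise passes (assumed layout).

    time pass:   O*N sequences of length T
    space pass:  T*O sequences of length N
    object pass: T*N sequences of length O
    """
    time_pairs = O * N * T * T
    space_pairs = T * O * N * N
    object_pairs = T * N * O * O
    return time_pairs + space_pairs + object_pairs


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    T, O, N = 48, 5, 100
    full = full_attention_pairs(T, O, N)
    fact = factorised_attention_pairs(T, O, N)
    print(f"full={full} factorised={fact} ratio={full / fact:.1f}")
```

Under these assumptions the factorised count grows with the sum of the squared axis lengths rather than the square of their product, which is where the efficiency gain would come from.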

If this is right

The probabilistic formulation enables sampling of multiple diverse yet plausible futures from identical initial conditions.
Factorised attention supports permutation-invariant reasoning over multiple objects without explicit object encodings.
The model generalises to mixed rigid-elastic interactions and to object counts and real-world geometries not seen during training.
Training on simulated data yields substantially higher trajectory accuracy, rigidity preservation, and physical consistency than autoregressive baselines.
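
The first point above, sampling several futures from the same initial conditions, can be sketched with a generic Euler sampler. This is a minimal illustration, not the paper's sampler: `toy_velocity` is a placeholder standing in for a trained network, the flat list `condition` is a hypothetical stand-in for initial positions and velocities, and the convention of integrating from noise at t = 0 to a sample at t = 1 is an assumption.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity_fn: Callable[[Vector, float, Vector], Vector],
    condition: Vector,
    steps: int,
    seed: int,
) -> Vector:
    """Integrate dx/dt = v(x, t, condition) from t=0 (noise) to t=1 (sample)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in condition]  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, condition: Vector) -> Vector:
    """Placeholder for a trained network; NOT the paper's model."""
    return [c - 0.5 * xi for xi, c in zip(x, condition)]


if __name__ == "__main__":
    cond = [0.0, 1.0, 2.0]  # hypothetical conditioning vector
    # Same conditioning, different seeds: each seed yields a different sample.
    for seed in range(3):
        sample = euler_sample(toy_velocity, cond, steps=50, seed=seed)
        print(seed, [round(v, 3) for v in sample])
```

The only source of variation between runs is the initial noise draw, which is the mechanism a diffusion-style model relies on to produce multiple plausible futures.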

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The world-coordinate formulation could integrate directly with 3D reconstruction systems to enable physical prediction from partial observations.
Fine-tuning the trained model on limited real motion-capture data might close the remaining sim-to-real gap for robotic manipulation tasks.
Extending the vertex representation to include additional surface properties could allow the same diffusion process to handle contact with deformable environments.

Load-bearing premise

Trajectories generated by the underlying simulator constitute a sufficient and unbiased training distribution that supports generalization to real-world geometries and mixed-material interactions.

What would settle it

A controlled test in which the model produces trajectories that violate momentum conservation or lose rigidity when applied to a real scanned object with material properties and geometry outside the training distribution would falsify the central claim.
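
A check of this kind could look like the sketch below. Both functions are simple proxies and not necessarily the metrics used in the paper: rigidity is measured as the largest change in any pairwise vertex distance, and momentum drift assumes equal vertex masses and finite-difference velocities. Total linear momentum is only conserved for an isolated system, so the second check is meaningful only when no gravity or floor contact acts on the scene.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = List[Point]  # one object's vertices at a single time step


def rigidity_error(trajectory: Sequence[Frame]) -> float:
    """Largest change in any pairwise vertex distance relative to frame 0."""
    ref = trajectory[0]
    pairs = list(combinations(range(len(ref)), 2))
    ref_d = [dist(ref[i], ref[j]) for i, j in pairs]
    worst = 0.0
    for frame in trajectory[1:]:
        for (i, j), d0 in zip(pairs, ref_d):
            worst = max(worst, abs(dist(frame[i], frame[j]) - d0))
    return worst


def momentum_drift(trajectory: Sequence[Frame], dt: float) -> float:
    """Largest deviation of total momentum from its first value.

    Assumes unit mass per vertex and at least two frames.
    """
    def momentum(a: Frame, b: Frame) -> Tuple[float, ...]:
        return tuple(
            sum((q[k] - p[k]) / dt for p, q in zip(a, b)) for k in range(3)
        )

    moms = [
        momentum(trajectory[k], trajectory[k + 1])
        for k in range(len(trajectory) - 1)
    ]
    return max(dist(p, moms[0]) for p in moms)


if __name__ == "__main__":
    # A triangle translating at constant velocity: rigid, momentum conserved.
    tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    traj = [[(x + 0.1 * k, y, z) for x, y, z in tri] for k in range(4)]
    print(rigidity_error(traj), momentum_drift(traj, dt=0.1))
```

For the translating triangle both values should be close to zero; a predicted trajectory with large values on a scanned real object would be the kind of evidence described above.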

Figures

Figures reproduced from arXiv: 2606.27364 by Andrea Vedaldi, Yiming Chen, Yushi Lan.

**Figure 1.** PHYSIFORMER overview. Given initial per-vertex positions X_0 ∈ ℝ^{N×3} and velocities V_0 ∈ ℝ^{N×3}, and material conditions of (1) rigid, (2) deformable, or (3) mixed, PHYSIFORMER predicts full-sequence future vertex trajectories in a single forward pass, producing physically plausible multi-object dynamics, with mesh topology imposed at inference time. Output can be rendered as 4D mesh motion under arbitrary …

**Figure 2.** PHYSIFORMER architecture. During training, input mesh vertex coordinates in ℝ^{T×N×3} are projected into hidden dimension D = 1024 via a linear embedder x_embed, and diffused with noise according to the flow-matching schedule. Each noised vertex token is additively conditioned on first-frame position and velocity embeddings (via separate x_embed_cond and v_embed) and a material embedding. We use 16 prepende…

**Figure 3.** Qualitative comparison of PHYSIFORMER against autoregressive baselines on trained 10k rigid object data. At t = 10, rigidity is not preserved in ΦAR_ctx1, but objects remain rigid across all other models. As t increases, all autoregressive baselines diverge due to error accumulation: stationary objects fail to remain at rest, objects escape the implicit bounding box, and object shapes deform severely, eve…

**Figure 4.** PHYSIFORMER generalizes to complex real-world object geometries and object counts not seen during training. Top: Inference on 2 deformable objects (fish and teapot) plus 1 rigid bunny, each with 100 vertices per object. Deformation is most visible for the middle-frame purple teapot. PHYSIFORMER allows mixed-material inference although training only saw uniform material across all objects per scene. Bottom:…

**Figure 5.** PHYSIFORMER-L-10k generalizes to object geometries and counts not seen during training, shown at t = 0, 15, 30, 48. The first row shows the best AR model (T IEr=1.0) on two unseen convex objects. For the following rows, we have top: two unseen convex objects, middle: seven objects from seen convex templates, exceeding the training maximum of five, bottom: three objects with unseen concave geometry. PHYSIFO…

**Figure 6.** Mesh templates and real-world geometries used for dataset generation and out-of …

**Figure 7.** Physics simulator failure cases occur when boundary contacts are imperfectly resolved, …

**Figure 8.** Examples of object overlap during inference
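
The Figure 2 caption mentions that vertex coordinates are noised according to a flow-matching schedule. As a minimal sketch of what one such training pair can look like, the code below uses a linear interpolation between Gaussian noise and data with a velocity regression target. The convention (t = 0 is pure noise, t = 1 is clean data), the flat list layout, and the function name are assumptions for illustration; the paper's actual schedule and parameterisation are not given in this summary.

```python
import random
from typing import List, Tuple


def flow_matching_pair(
    data: List[float], t: float, rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Return (noised input x_t, regression target v) for one sample.

    Assumed convention: x_t = (1 - t) * noise + t * data, v = data - noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in data]
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    target = [d - n for n, d in zip(noise, data)]
    return x_t, target


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [0.2, -1.0, 0.5]  # hypothetical flattened vertex coordinates
    x_t, v = flow_matching_pair(coords, t=0.3, rng=rng)
    # Following the target velocity for the remaining time recovers the data.
    recovered = [x + (1.0 - 0.3) * vi for x, vi in zip(x_t, v)]
    print(recovered)
```

With this convention, stepping from x_t along the target velocity for the remaining time 1 − t lands on the clean coordinates, up to floating-point error.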

Original abstract

We present PhysiFormer, a diffusion transformer for physically-plausible 3D object motion. Unlike video world models that operate in view-dependent pixel space, PhysiFormer represents objects as 3D meshes expressed in world coordinates. Given the initial vertex positions and velocities, as well as object material type, rigid or elastic, the model samples future vertex trajectories. While related neural physics approaches build on ad-hoc latent spaces or explicitly enforce rigidity and causality, PhysiFormer shows that excellent results can be obtained without any such inductive biases, by casting vertex trajectory prediction as a single denoising diffusion process directly in world coordinates. The probabilistic formulation captures uncertainty in the learned dynamics, enabling diverse plausible futures from initial conditions, making this framework potentially useful for applications with unobserved uncertainty. The model features attention factorised over time, space, and objects for efficiency, enabling permutation-invariant multi-object reasoning without needing explicit object encoding. Trained on over 100k simulated trajectories, PhysiFormer generates rigid and elastic mechanics, and generalises to mixed-material settings, unseen real-world geometries, and larger object counts. It substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency. Our results position coordinate-space diffusion as a promising step toward view-invariant, geometry-aware world modelling for robotics, graphics, and physical design. Visualisations, code, and models are available at https://yimingc9.github.io/physiformer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PhysiFormer runs a single diffusion process straight on world-coordinate mesh vertices for rigid and elastic motion, but the generalization claims rest on unverified simulator data.

The core move is treating vertex trajectory prediction as one denoising diffusion process in raw 3D world coordinates, without latent spaces or hand-coded rigidity and causality terms. The model takes initial positions, velocities, and a material flag, then samples future meshes. Factorized attention across time, space, and objects keeps it efficient and permutation-invariant for multiple bodies.
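
The permutation property mentioned here can be tested empirically: reordering the input objects should reorder the outputs in the same way and change nothing else. The sketch below is a hypothetical test harness, not code from the paper. `toy_model` is a placeholder for any scene-level predictor, and the per-object state is a flat list of floats chosen only for illustration.

```python
from typing import Callable, List, Sequence

Obj = List[float]   # hypothetical flattened per-object state
Scene = List[Obj]


def is_permutation_equivariant(
    model: Callable[[Scene], Scene],
    scene: Scene,
    perm: Sequence[int],
    tol: float = 1e-9,
) -> bool:
    """Check that permuting input objects permutes the outputs identically."""
    out = model(scene)
    out_perm = model([scene[i] for i in perm])
    expected = [out[i] for i in perm]
    return all(
        abs(a - b) <= tol
        for obj_a, obj_b in zip(out_perm, expected)
        for a, b in zip(obj_a, obj_b)
    )


def toy_model(scene: Scene) -> Scene:
    """Placeholder: shift every object by the mean state over all objects."""
    n, dim = len(scene), len(scene[0])
    mean = [sum(obj[k] for obj in scene) / n for k in range(dim)]
    return [[v + m for v, m in zip(obj, mean)] for obj in scene]


def index_dependent_model(scene: Scene) -> Scene:
    """Counter-example: the output depends on each object's list position."""
    return [[v + i for v in obj] for i, obj in enumerate(scene)]


if __name__ == "__main__":
    scene = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(is_permutation_equivariant(toy_model, scene, [2, 0, 1]))
    print(is_permutation_equivariant(index_dependent_model, scene, [2, 0, 1]))
```

A model with explicit per-slot object encodings would behave like the counter-example and fail this check.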

That setup is the actual novelty. Most prior neural physics work adds explicit constraints or compresses into latents; this one bets that a plain diffusion transformer on coordinates can capture the dynamics anyway. Training on over 100k simulated trajectories gives it scale, and the probabilistic output is a reasonable way to handle uncertainty.

The weak part is the evidence. The abstract asserts clear wins on trajectory error, rigidity, and momentum consistency, plus generalization to mixed materials, unseen real geometries, and bigger scenes, yet it reports no numbers, baseline definitions, or error bars. The claim that simulator trajectories form a sufficient distribution for real-world transfer is stated but not tested in the provided material. If the training shapes and material ranges are narrow, the "no inductive biases needed" result could be narrower than it looks.

This paper is aimed at people building learned world models for robotics or graphics who want a coordinate-space alternative to latent or constraint-heavy simulators. A reader already working on diffusion for structured 3D data would get the most from it.

It deserves peer review. The idea is clean enough and the scale is reasonable; referees can check whether the quantitative results and generalization tests actually hold.

Referee Report

2 major / 1 minor

Summary. The paper presents PhysiFormer, a diffusion transformer that predicts 3D mesh vertex trajectories directly in world coordinates via a single denoising diffusion process. Given initial positions, velocities, and material type (rigid or elastic), the model generates future trajectories without explicit rigidity, causality, or latent-space inductive biases. Trained on over 100k simulated trajectories, it claims to produce physically consistent rigid and elastic motion, generalize to mixed-material interactions, unseen real-world geometries, and larger object counts, and substantially outperform autoregressive baselines on trajectory accuracy, rigidity preservation, and momentum consistency. The probabilistic formulation allows sampling diverse plausible futures, and the architecture uses factorized attention over time, space, and objects for efficiency and permutation invariance.

Significance. If the central claims hold, the work provides evidence that coordinate-space diffusion can achieve strong physical plausibility and generalization without hand-engineered biases or latent encodings, potentially simplifying geometry-aware world models for robotics, graphics, and design. The public release of code, models, and visualizations is a clear strength for reproducibility.

major comments (2)

[Abstract] The claim of 'substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency' is presented without any quantitative metrics, baseline definitions, error bars, or table references, leaving the magnitude and statistical reliability of the reported gains unassessable from the provided summary.
[Abstract and training description] The generalization claims to 'mixed-material settings, unseen real-world geometries, and larger object counts' rest on training exclusively on over 100k simulated trajectories with a uniform material per scene (all rigid or all elastic); no details are given on how the simulator's geometry and material sampling distribution matches real-world variation, so out-of-distribution performance could reflect memorization rather than the diffusion formulation.

minor comments (1)

[Abstract] The abstract states that 'visualisations, code, and models are available' at a URL; the manuscript should include a brief statement on the exact license and reproducibility package contents.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve the clarity and specificity of the abstract claims.

Referee: [Abstract] The claim of 'substantially outperforms autoregressive baselines in trajectory accuracy, rigidity preservation, and momentum-based physical consistency' is presented without any quantitative metrics, baseline definitions, error bars, or table references, leaving the magnitude and statistical reliability of the reported gains unassessable from the provided summary.

Authors: We agree that the abstract would be strengthened by including quantitative references. In the revised version we will update the abstract to cite specific metrics (e.g., relative error reductions) and point to the tables and sections that define the autoregressive baselines, report error bars, and present statistical comparisons.

Revision: yes

Referee: [Abstract and training description] The generalization claims to 'mixed-material settings, unseen real-world geometries, and larger object counts' rest on training exclusively on over 100k simulated trajectories with a uniform material per scene (all rigid or all elastic); no details are given on how the simulator's geometry and material sampling distribution matches real-world variation, so out-of-distribution performance could reflect memorization rather than the diffusion formulation.

Authors: The experiments in Sections 4.3–4.4 evaluate zero-shot generalization on mixed-material interactions, unseen real-world geometries, and larger object counts. We acknowledge that the current manuscript provides limited explicit discussion of the simulator sampling distribution. We will add a dedicated paragraph in the dataset section detailing the geometry and material parameter ranges and will include a brief analysis relating these ranges to real-world variation to better support the generalization claims.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper trains PhysiFormer as a standard denoising diffusion process on externally generated simulation data (over 100k trajectories from rigid/elastic simulators) and evaluates using independent physical-consistency metrics such as trajectory accuracy, rigidity preservation, and momentum conservation. No load-bearing step reduces by construction to a fitted parameter, self-defined quantity, or self-citation chain; the central modeling choice (world-coordinate diffusion without explicit rigidity/causality biases) is an architectural decision whose performance is measured against external baselines and held-out data rather than being tautological with its inputs. Generalization claims rest on empirical results rather than definitional equivalence.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that the 100k simulated trajectories are representative and that the diffusion objective alone is sufficient to recover dynamics; no new physical entities are postulated.

free parameters (1)

neural network weights
All model parameters are fitted to the 100k simulated trajectories.

axioms (1)

domain assumption: Simulated trajectories accurately capture rigid and elastic mechanics for the objects used in training.
The model is trained exclusively on simulator output and evaluated on generalisation to real geometries.

Reference graph

Works this paper leans on

67 extracted references · 9 canonical work pages · 4 internal anchors

[1]

Building normalizing flows with stochastic interpolants

Michael S Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. InProc. ICLR, 2023

2023
[2]

Learning rigid dynamics with face interaction graph networks.arXiv preprint arXiv:2212.03574, 2022

Kelsey R Allen, Yulia Rubanova, Tatiana Lopez-Guevara, William Whitney, Alvaro Sanchez- Gonzalez, Peter Battaglia, and Tobias Pfaff. Learning rigid dynamics with face interaction graph networks.arXiv preprint arXiv:2212.03574, 2022

work page arXiv 2022
[3]

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba, Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xia...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Genesis: A generative and universal physics engine for robotics and beyond,

Genesis Authors. Genesis: A generative and universal physics engine for robotics and beyond,
[5]

URLhttps://github.com/Genesis-Embodied-AI/Genesis
[6]

Large steps in cloth simulation

David Baraff and Andrew Witkin. Large steps in cloth simulation. InProc. SIGGRAPH, 1998

1998
[7]

Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray Kavukcuoglu

Peter W. Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray Kavukcuoglu. Interaction networks for learning about objects, relations and physics. In Proc. NeurIPS, 2016

2016
[8]

Prediction of aerodynamic flow fields using convolutional neural networks.Computational Mechanics, 64(2), 2019

Saakaar Bhatnagar, Yaser Afshar, Shaowu Pan, Karthik Duraisamy, and Shailendra Kaushik. Prediction of aerodynamic flow fields using convolutional neural networks.Computational Mechanics, 64(2), 2019

2019
[9]

Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Do- minik Lorenz, Yam Levi, Zion English, Vikram V oleti, Adam Letts, Varun Jampani, and Robin Rombach. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv.cs, abs/2311.15127, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[10]

Align your latents: High-resolution video synthesis with latent diffusion models

Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. Align your latents: High-resolution video synthesis with latent diffusion models. InProc. CVPR, 2023

2023
[11]

Projective dynamics: Fusing constraint projections for fast simulation

Sofien Bouaziz, Sebastian Martin, Tiantian Liu, Ladislav Kavan, and Mark Pauly. Projective dynamics: Fusing constraint projections for fast simulation. InProc. SIGGRAPH, 2014

2014
[12]

Video generation models as world simulators

Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, and Aditya Ramesh. Video generation models as world simulators. Technical report, OpenAI, 2024

2024
[13]

A discussion of semi-supervised learning and transduction

Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. A discussion of semi-supervised learning and transduction. InSemi-Supervised Learning. The MIT Press, 2006

2006
[14]

Diffusion forcing: Next-token prediction meets full-sequence diffusion

Boyuan Chen, Yilun Du, Diego Martí, et al. Diffusion forcing: Next-token prediction meets full-sequence diffusion. InNeurIPS, 2024

2024
[15]

Diffusion forcing: Next-token prediction meets full-sequence diffusion.arXiv, 2407.01392, 2024

Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, and Vincent Sitzmann. Diffusion forcing: Next-token prediction meets full-sequence diffusion.arXiv, 2407.01392, 2024

work page arXiv 2024
[16]

PhysGen3D: Crafting a miniature interactive world from a single image.CVPR, 2025

Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, and Shenlong Wang. PhysGen3D: Crafting a miniature interactive world from a single image.CVPR, 2025

2025
[17]

Motion 3-to-4: 3D motion reconstruction for 4D synthesis.arXiv, 2601.14253, 2026

Hongyuan Chen, Xingyu Chen, Youjia Zhang, Zexiang Xu, and Anpei Chen. Motion 3-to-4: 3D motion reconstruction for 4D synthesis.arXiv, 2601.14253, 2026

work page arXiv 2026
[18]

Pybullet, a python module for physics simulation for games, robotics and machine learning, 2016

Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games, robotics and machine learning, 2016. URLhttp://pybullet.org. 11

2016
[19]

Vision transformers need registers.Proc

Timothée Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski. Vision transformers need registers.Proc. ICLR, 2024

2024
[20]

A generalization of transformer networks to graphs

Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs. Proc. AAAI Workshop, 2021

2021
[21]

arXiv preprint arXiv:2505.19386 (2025) 4

Nate Gillman, Charles Herrmann, Michael Freeman, Daksh Aggarwal, Evan Luo, Deqing Sun, and Chen Sun. Force prompting: Video generation models can learn and generalize physics-based control signals. InProc. NeurIPS, volume 2505.19386, 2025

work page arXiv 2025
[22]

Hood: Hierarchical graphs for generalized modelling of clothing dynamics

Artur Grigorev, Michael J Black, and Otmar Hilliges. Hood: Hierarchical graphs for generalized modelling of clothing dynamics. InProc. CVPR, 2023

2023
[23]

Convolutional neural networks for steady flow approximation

Xiaoxiao Guo, Wei Li, and Francesco Iorio. Convolutional neural networks for steady flow approximation. InProc. SIGKDD, 2016

2016
[24]

Query-key normal- ization for transformers.arXiv, 2020

Alex Henry, Prudhvi Raj Dachapally, Shubham Pawar, and Yuxuan Chen. Query-key normal- ization for transformers.arXiv, 2020

2020
[25]

simple diffusion: End-to-end diffusion for high resolution images

Emiel Hoogeboom, Jonathan Heek, and Tim Salimans. simple diffusion: End-to-end diffusion for high resolution images. InProc. ICML, 2023

2023
[26]

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion.arXiv, 2506.08009, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[27]

PhysTwin: Physics-informed reconstruction and simulation of deformable objects from videos

Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, and Yunzhu Li. PhysTwin: Physics-informed reconstruction and simulation of deformable objects from videos. InProc. ICCV, 2025

2025
[28]

A solution for the best rotation to relate two sets of vectors.F oundations of Crystallography, 32(5), 1976

Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors.F oundations of Crystallography, 32(5), 1976

1976
[29]

How far is video generation from world model: A physical law perspective

Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, and Jiashi Feng. How far is video generation from world model: A physical law perspective. InProc. ICML, 2025

2025
[30]

DINO- foresight: Looking into the future with DINO

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, and Nikos Komodakis. DINO- foresight: Looking into the future with DINO. InProc. NeurIPS, 2025

2025
[31]

3D Gaussian Splatting for real-time radiance field rendering.Proc

Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian Splatting for real-time radiance field rendering.Proc. SIGGRAPH, 42(4), 2023

2023
[32]

What about gravity in video generation? post-training Newton’s laws with verifiable rewards.arXiv, 2512.00425, 2025

Minh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, and Dimitris Samaras. What about gravity in video generation? post-training Newton’s laws with verifiable rewards.arXiv, 2512.00425, 2025

work page arXiv 2025
[33]

Codimensional incremental potential contact

Minchen Li, Danny M Kaufman, and Chenfanfu Jiang. Codimensional incremental potential contact. InProc. SIGGRAPH, 2021

2021
[34]

Back to basics: Let denoising generative models denoise

Tianhong Li and Kaiming He. Back to basics: Let denoising generative models denoise. In CVPR, 2025

2025
[35]

Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids

Yunzhu Li, Jiajun Wu, Russ Tedrake, Joshua B Tenenbaum, and Antonio Torralba. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. InProc. ICLR, 2019

2019
[36]

Learning visible connectivity dynamics for cloth smoothing

Xingyu Lin, Yufei Wang, Zixuan Huang, and David Held. Learning visible connectivity dynamics for cloth smoothing. InProc. CoRL, 2021

2021
[37]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling.arXiv.cs, abs/2210.02747, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[38]

PhysGen: Rigid-body physics-grounded image-to-video generation

Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, and Shenlong Wang. PhysGen: Rigid-body physics-grounded image-to-video generation. InProc. ECCV, 2024. 12

2024
[39]

Flow straight and fast: Learning to generate and transfer data with rectified flow

Xingchao Liu, Chengyue Gong, and qiang liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. InProc. ICLR, 2023

2023
[40]

Black, Derek Nowrouzezahrai, Liam Paull, and Weiyang Liu

Zhen Liu, Yao Feng, Michael J. Black, Derek Nowrouzezahrai, Liam Paull, and Weiyang Liu. Meshdiffusion: Score-based generative 3d mesh modeling. InICLR, 2023

2023
[41]

Otaduy, and Steve Marschner

Eder Miguel, Derek Bradley, Bernhard Thomaszewski, Bernd Bickel, Wojciech Matusik, Miguel A. Otaduy, and Steve Marschner. Data-driven estimation of cloth simulation models. In Proc. Eurographics, 2012

2012
[42]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. InProc. ECCV, 2020

2020
[43]

Flexible neural representation for physics prediction

Damian Mrowca, Chengxu Zhuang, Elias Wang, Nick Haber, Fei-Fei Li, Josh Tenenbaum, and Daniel L K Yamins. Flexible neural representation for physics prediction. InProc. NeurIPS, 2018

2018
[44]

Particle-based fluid simulation for interactive applications

Matthias Müller, David Charypar, and Markus Gross. Particle-based fluid simulation for interactive applications. InProc. Eurographics, 2003

2003
[45]

Genie 3: A new frontier for world models, 2025

Jack Parker-Holder and Shlomi Fruchter. Genie 3: A new frontier for world models, 2025. URL https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

2025
[46]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProc. ICCV, 2023

2023
[47]

Battaglia

Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. InProc. ICLR, 2021

2021
[48]

Allen, William F

Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R. Allen, William F. Whitney, Kimberly Stachenfeld, and Tobias Pfaff. Learning rigid-body simulators over implicit shapes for large- scale scenes and vision. InProc. NeurIPS, volume k, 2024

2024
[49]

Battaglia

Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter W. Battaglia. Learning to simulate complex physics with graph networks. InProc. ICML, 2020

2020
[50]

The graph neural network model.IEEE Trans

Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model.IEEE Trans. on Neural Networks, 2009

2009
[51]

Transformer with implicit edges for particle-based physics simulation

Yidi Shao, Chen Change Loy, and Bo Dai. Transformer with implicit edges for particle-based physics simulation. InProc. ECCV, 2022

2022
[52]

Self-attention with relative position repre- sentations

Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. Self-attention with relative position repre- sentations. InProc. NAACL, 2018

2018
[53]

Noam M. Shazeer. GLU variants improve transformer. InarXiv, 2020

2020
[54]

Jianlin Su, Murtadha H. M. Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding.Neurocomputing, 2024

2024
[55]

Mujoco: A physics engine for model-based control

Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. InProc.IROS, 2012

2012
[56]

Lagrangian fluid simulation with continuous convolutions

Benjamin Ummenhofer, Lukas Prantl, Nils Thuerey, and Vladlen Koltun. Lagrangian fluid simulation with continuous convolutions. InProc. ICLR, 2020

2020
[57]

A simple approach to nonlinear tensile stiffness for accurate cloth simulation

Pascal V olino, Nadia Magnenat-Thalmann, and Francois Faure. A simple approach to nonlinear tensile stiffness for accurate cloth simulation. InProc. SIGGRAPH, 2009

2009
[58]

Integrating physics and topology in neural networks for learning rigid body dynamics.Nature Communications, 16(1), 2025

Amaury Wei and Olga Fink. Integrating physics and topology in neural networks for learning rigid body dynamics.Nature Communications, 16(1), 2025. 13

2025
[59]

PhysGaussian: Physics-integrated 3D Gaussians for generative dynamics

Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. PhysGaussian: Physics-integrated 3D Gaussians for generative dynamics. InProc. CVPR, 2024

2024
[60]

Learning flexible body collision dynamics with hierarchical contact mesh transformer

Youn-Yeol Yu, Jeongwhan Choi, Woojin Cho, Kookjin Lee, Nayong Kim, Kiseok Chang, ChangSeung Woo, Ilho Kim, SeokWoo Lee, Joon Young Yang, et al. Learning flexible body collision dynamics with hierarchical contact mesh transformer. InProc. ICLR, 2024

2024
[61]

RenderFormer: transformer- based neural rendering of triangle meshes with global illumination

Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, and Xin Tong. RenderFormer: transformer- based neural rendering of triangle meshes with global illumination. InProc. SIGGRAPH, 2025

2025
[62]

Root mean square layer normalization

Biao Zhang and Rico Sennrich. Root mean square layer normalization. InProc. NeurIPS, 2019

2019
[63]

3DShape2VecSet: A 3d shape representation for neural fields and generative diffusion models.ACM Transactions On Graphics (TOG), 42(4):1–16, 2023

Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3DShape2VecSet: A 3d shape representation for neural fields and generative diffusion models.ACM Transactions On Graphics (TOG), 42(4):1–16, 2023

2023
[64]

3DShape2VecSet: A 3D shape representation for neural fields and generative diffusion models

Biao Zhang, Jiapeng Tang, Matthias Niessner, and Peter Wonka. 3DShape2VecSet: A 3D shape representation for neural fields and generative diffusion models. In ACM Transactions on Graphics, 2023

2023
[65]

Gaussian variation field diffusion for high-fidelity video-to-4D synthesis

Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, and Baining Guo. Gaussian variation field diffusion for high-fidelity video-to-4D synthesis. In Proc. ICCV, 2025

2025
[66]

Dynamic 3D Gaussian tracking for graph-based neural dynamics modeling

Mingtong Zhang, Kaifeng Zhang, and Yunzhu Li. Dynamic 3D Gaussian tracking for graph-based neural dynamics modeling. In Proc. CoRL, 2024

2024
[67]

Reconstruction and simulation of elastic objects with spring-mass 3D Gaussians

Licheng Zhong, Hong-Xing Yu, Jiajun Wu, and Yunzhu Li. Reconstruction and simulation of elastic objects with spring-mass 3D Gaussians. In Proc. ECCV, 2024.

2024

PHYSIFORMER: Learning to Simulate Mechanics in World Space

Supplementary Material

A Method Continued

A.1 PHYSIFORMER

Register Tokens. In input data tokenization, we further prepend Nreg = 16 shared, l...
