RigidFormer: Learning Rigid Dynamics using Transformers
Recognition: 3 theorem links
Pith reviewed 2026-05-12 02:14 UTC · model grok-4.3
The pith
RigidFormer simulates multi-object rigid-body dynamics from point clouds by advancing objects via compact anchors in a Transformer and projecting updates onto the rigid manifold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RigidFormer reasons at the object level by advancing each object through compact anchors; Anchor-Vertex Pooling enriches anchors with local geometry while avoiding dense vertex-level message passing; Anchor-based RoPE injects anchor geometry into attention in a permutation-equivariant way for objects and invariant way for anchors; and differentiable Kabsch projection enforces rigidity on the predicted updates. On standard benchmarks this yields performance that matches or exceeds mesh-based methods from point inputs, with faster runtime, better generalization to unseen point counts and other datasets, and scaling to more than 200 objects, plus a preliminary extension to articulated bodies.
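The rotary-embedding piece of that claim can be made concrete. Below is a minimal sketch of standard rotary position embedding (RoPE): consecutive feature pairs are rotated by position-dependent angles, so attention scores between rotated queries and keys depend only on relative position. The paper's Anchor-based RoPE would derive the position input from anchor geometry rather than a token index; that substitution, and the exact parameterization, are assumptions here, not the paper's implementation.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by angles that scale with pos.

    Standard RoPE recipe; `pos` is a scalar position. Anchor-based RoPE
    (per the paper) would instead feed in anchor-derived geometry -- that
    swap is hypothetical in this sketch.
    """
    out = []
    for i in range(0, len(x), 2):
        theta = pos / (base ** (i / len(x)))  # per-pair frequency
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Scores depend only on relative position: offsets (5, 7) and (0, 2)
# have the same difference, so the dot products agree.
q, k = [0.3, -1.2, 0.7, 0.5], [1.1, 0.4, -0.2, 0.9]
assert abs(dot(rope(q, 5.0), rope(k, 7.0)) - dot(rope(q, 0.0), rope(k, 2.0))) < 1e-9
```

The relative-position property is what makes the mechanism a natural carrier for geometry: only differences between positions, not absolute coordinates, reach the attention scores.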
What carries the argument
The central mechanism is object-centric attention over compact anchors enriched by Anchor-Vertex Pooling, equipped with Anchor-based RoPE for geometry-aware permutation-equivariant processing, followed by differentiable Kabsch alignment that projects updates onto the rigid-body manifold.
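The Kabsch projection step admits a compact illustration. The sketch below is a closed-form 2D analogue (the paper works in 3D, where the optimal rotation comes from an SVD of the cross-covariance; this planar pure-Python version is illustrative, not the paper's implementation): given reference anchors and predicted anchor positions, it recovers the rigid rotation and translation that best align them, which is the sense in which updates are "projected onto the rigid manifold".

```python
import math

def kabsch_2d(P, Q):
    """Best-fit rigid transform (theta, t) mapping points P onto Q.

    2D analogue of the Kabsch algorithm: center both point sets, then
    the optimal rotation angle is atan2 of the cross/dot correlation
    sums. The 3D case in the paper would use an SVD instead.
    """
    n = len(P)
    cp = (sum(x for x, _ in P) / n, sum(y for _, y in P) / n)
    cq = (sum(x for x, _ in Q) / n, sum(y for _, y in Q) / n)
    Pc = [(x - cp[0], y - cp[1]) for x, y in P]
    Qc = [(x - cq[0], y - cq[1]) for x, y in Q]
    s = sum(px * qy - py * qx for (px, py), (qx, qy) in zip(Pc, Qc))
    c = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(Pc, Qc))
    theta = math.atan2(s, c)  # optimal rotation angle
    # Translation carries the rotated centroid of P onto the centroid of Q.
    t = (cq[0] - (cp[0] * math.cos(theta) - cp[1] * math.sin(theta)),
         cq[1] - (cp[0] * math.sin(theta) + cp[1] * math.cos(theta)))
    return theta, t

# Recover a known rotation (0.3 rad) and translation (2, 3): a free-form
# network update would not be rigid, but this projection always is.
P = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
Q = [(math.cos(0.3) * x - math.sin(0.3) * y + 2.0,
      math.sin(0.3) * x + math.cos(0.3) * y + 3.0) for x, y in P]
theta, t = kabsch_2d(P, Q)
assert abs(theta - 0.3) < 1e-9
```

Because every step of the alignment is smooth in the inputs, the projection is differentiable and can sit inside the training loss, which is why the referee's later point matters: it constrains the output state, not the learned force update itself.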
If this is right
- The model matches or beats mesh-based baselines on standard rigid-dynamics benchmarks while using only point inputs.
- Runtime is lower than mesh-based alternatives because computation stays at the anchor and object level rather than vertex level.
- Performance holds when tested on point clouds with resolutions never seen during training and when transferred across different datasets.
- The architecture scales to scenes containing more than 200 interacting objects.
- Treating articulated-body parts as separate objects yields a preliminary command-conditioned extension without changing the core design.
Where Pith is reading between the lines
- The same anchor-plus-projection pattern could let vision pipelines feed raw depth or LiDAR points straight into a physics simulator without an intermediate meshing step.
- Because attention operates at object granularity, the method might extend naturally to hybrid scenes that mix rigid and deformable objects by swapping only the projection step.
- If anchor count is treated as a hyperparameter, one could test whether increasing anchors per object recovers fine contact details that the current compact representation approximates.
Load-bearing premise
Compact anchors with local pooling and Kabsch projection are enough to capture discontinuous contact forces and limit error buildup over long horizons without dense vertex interactions or mesh topology.
What would settle it
A scene of many small objects in repeated stacking or sliding contact where short-term accuracy matches baselines but long-horizon rollouts diverge visibly from ground truth despite the claimed generalization.
Original abstract
Learning-based simulation of multi-object rigid-body dynamics remains difficult because contact is discontinuous and errors compound over long horizons. Most existing methods remain tied to mesh connectivity and vertex-level message passing, which limits their applicability to mesh-free inputs such as point clouds and leads to high computational cost. Efficiently modeling high-fidelity rigid-body dynamics from mesh-free representations, therefore, remains challenging. We introduce RigidFormer, an object-centric Transformer-based model that learns mesh-free rigid-body dynamics with controllable integration step sizes. RigidFormer reasons at the object level and advances each object through compact anchors; Anchor-Vertex Pooling enriches these anchors with local vertex features, retaining contact-relevant geometry without dense vertex-level interaction. We propose Anchor-based RoPE to inject anchor geometry into attention while respecting the unordered nature of objects and anchors: object-token processing is permutation-equivariant, and the mean-pooled anchor descriptor is invariant to anchor reindexing while preserving shape extent. RigidFormer further enforces rigidity by projecting updates onto the rigid-body manifold using differentiable Kabsch alignment. On standard benchmarks, RigidFormer outperforms or matches mesh-based baselines using point inputs, runs faster, generalizes to unseen point resolutions and across datasets, and scales to 200+ objects; we also show a preliminary extension to command-conditioned articulated bodies by treating body parts as interacting object-level components.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces RigidFormer, an object-centric Transformer architecture for learning rigid-body dynamics directly from unordered point-cloud inputs. It advances each object via a small set of compact anchors, employs Anchor-Vertex Pooling to inject local vertex geometry into the anchors, uses Anchor-based RoPE to encode anchor geometry in a permutation-equivariant manner, and projects updates onto the rigid manifold via differentiable Kabsch alignment. The central empirical claims are that the model matches or exceeds mesh-based baselines on standard benchmarks while running faster, generalizes to unseen point resolutions and across datasets, and scales to scenes with 200+ objects; a preliminary extension to command-conditioned articulated bodies is also presented.
Significance. If the reported performance and generalization results hold under rigorous scrutiny, the work would be a meaningful step toward mesh-free, scalable rigid-body simulation. The combination of object-centric attention, local geometric pooling, and explicit rigidity projection offers a practical alternative to dense vertex message passing, with potential impact on robotics, graphics, and physics-based learning where point-cloud or depth-sensor data predominate. The Kabsch projection is a clean, differentiable mechanism that directly addresses manifold constraints.
major comments (3)
- [§3.2] §3.2 (Anchor-Vertex Pooling): the claim that mean-pooled local features around a small set of anchors suffice to capture the high-frequency geometric cues required for discontinuous contact forces is load-bearing for the long-horizon stability and cross-resolution generalization results. Because contacts are triggered by precise local geometry at the exact collision instant, averaging can smooth or omit the necessary discontinuities; the subsequent Kabsch projection corrects only the output state, not the learned force update. The manuscript should supply either (a) an ablation isolating pooling radius and anchor count against contact-rich test cases or (b) a quantitative analysis of force-error distribution at contact events.
- [§4] §4 (Experiments): the abstract states that RigidFormer “outperforms or matches mesh-based baselines,” yet the provided text supplies no numerical tables, baseline implementations, or error bars. Without these data it is impossible to assess whether the reported gains are robust to post-hoc hyper-parameter choices or dataset selection. The full experimental section must include (i) per-benchmark quantitative metrics with standard deviations, (ii) ablation tables isolating each proposed component, and (iii) failure-case analysis for long-horizon rollouts.
- [§3.3] §3.3 (Anchor-based RoPE): the invariance claim for the mean-pooled anchor descriptor under anchor re-indexing is stated but not formally proven. Because object-token processing must remain permutation-equivariant while the pooled descriptor must be invariant, a short derivation or explicit invariance check under anchor permutation would strengthen the architectural justification.
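The pooling concern in the major comments above is easy to probe in miniature. The sketch below is a hypothetical stand-in for Anchor-Vertex Pooling (the real module pools learned vertex features, and its neighborhood rule is not specified in the excerpt): mean-pool vertices within a radius of each anchor, which makes the smoothing effect the referee worries about directly visible as the radius grows.

```python
def anchor_vertex_pool(anchors, vertices, radius):
    """Mean-pool 2D vertex positions within `radius` of each anchor.

    Hypothetical sketch: the paper pools learned features, not raw
    positions, and its neighborhood definition may differ. Larger
    radii average away exactly the fine local geometry the referee
    flags as contact-critical.
    """
    pooled = []
    for ax, ay in anchors:
        near = [(vx, vy) for vx, vy in vertices
                if (vx - ax) ** 2 + (vy - ay) ** 2 <= radius ** 2]
        if not near:
            near = [(ax, ay)]  # fall back to the anchor itself
        pooled.append((sum(v[0] for v in near) / len(near),
                       sum(v[1] for v in near) / len(near)))
    return pooled

anchors = [(0.0, 0.0)]
vertices = [(0.05, 0.02), (-0.03, 0.0), (0.0, 0.9)]
tight = anchor_vertex_pool(anchors, vertices, 0.1)  # pools the two nearby vertices
loose = anchor_vertex_pool(anchors, vertices, 2.0)  # also absorbs the far vertex
assert tight != loose  # the descriptor shifts with pooling radius
```

An ablation of the kind the referee requests would sweep `radius` and anchor count and measure the effect on contact-rich rollouts, since the descriptor demonstrably changes with both.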
minor comments (2)
- [Abstract / §3] The abstract mentions “controllable integration step sizes” but the method section does not specify how step size is encoded or conditioned; a brief clarification would improve reproducibility.
- [Figures] Figure captions and axis labels should explicitly state the number of objects, point resolution, and integration horizon used in each rollout visualization.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Revisions will be made to strengthen the manuscript where the concerns identify gaps in justification or presentation.
Point-by-point responses
-
Referee: [§3.2] §3.2 (Anchor-Vertex Pooling): the claim that mean-pooled local features around a small set of anchors suffice to capture the high-frequency geometric cues required for discontinuous contact forces is load-bearing for the long-horizon stability and cross-resolution generalization results. Because contacts are triggered by precise local geometry at the exact collision instant, averaging can smooth or omit the necessary discontinuities; the subsequent Kabsch projection corrects only the output state, not the learned force update. The manuscript should supply either (a) an ablation isolating pooling radius and anchor count against contact-rich test cases or (b) a quantitative analysis of force-error distribution at contact events.
Authors: We agree that contact discontinuities pose a significant challenge and that mean pooling could in principle attenuate high-frequency cues. Our architecture mitigates this through the combination of local pooling with Anchor-based RoPE (which preserves relative geometry) and the subsequent Kabsch projection. Nevertheless, to provide direct evidence, we will add an ablation varying pooling radius and anchor count on contact-rich subsets, together with a quantitative breakdown of force-prediction error specifically at detected contact instants. revision: yes
-
Referee: [§4] §4 (Experiments): the abstract states that RigidFormer “outperforms or matches mesh-based baselines,” yet the provided text supplies no numerical tables, baseline implementations, or error bars. Without these data it is impossible to assess whether the reported gains are robust to post-hoc hyper-parameter choices or dataset selection. The full experimental section must include (i) per-benchmark quantitative metrics with standard deviations, (ii) ablation tables isolating each proposed component, and (iii) failure-case analysis for long-horizon rollouts.
Authors: We will revise the experimental section to present all quantitative results in clearly formatted tables that include per-benchmark metrics, standard deviations across multiple random seeds, explicit baseline implementation details, component-wise ablation tables, and a dedicated subsection analyzing failure modes observed in long-horizon rollouts. revision: yes
-
Referee: [§3.3] §3.3 (Anchor-based RoPE): the invariance claim for the mean-pooled anchor descriptor under anchor re-indexing is stated but not formally proven. Because object-token processing must remain permutation-equivariant while the pooled descriptor must be invariant, a short derivation or explicit invariance check under anchor permutation would strengthen the architectural justification.
Authors: We will insert a short formal derivation in §3.3 showing that the mean operation over anchors is symmetric and therefore invariant to re-indexing, while the attention mechanism operating on object tokens remains permutation-equivariant with respect to object ordering. revision: yes
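The promised derivation is indeed short; a sketch in generic notation (the symbols $a_k$, $K$, and $\pi$ are assumed here, not the paper's):

```latex
\[
d \;=\; \frac{1}{K}\sum_{k=1}^{K} a_k
\;=\; \frac{1}{K}\sum_{k=1}^{K} a_{\pi(k)}
\qquad \text{for every permutation } \pi \in S_K,
\]
```

since addition is commutative, so the mean-pooled descriptor $d$ is unchanged by anchor re-indexing. Equivariance of the object tokens follows separately: self-attention applies one parameter-shared map to all tokens, so permuting the input tokens permutes the outputs identically.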
Circularity Check
No significant circularity; empirical architecture with independent design choices
full rationale
The paper introduces an empirical Transformer architecture for mesh-free rigid-body simulation. Its load-bearing elements (Anchor-Vertex Pooling, Anchor-based RoPE, differentiable Kabsch projection) are presented as engineering decisions motivated by the need to handle unordered point inputs, preserve local contact geometry, and enforce rigid outputs after learned updates. These choices do not reduce by construction to fitted parameters or prior self-citations; the reported performance gains are measured on external simulation benchmarks via standard supervised training. No self-definitional loops, renamed predictions, or load-bearing uniqueness theorems appear in the derivation. The central claims remain falsifiable against held-out data and baselines.
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights and architecture hyperparameters
axioms (2)
- domain assumption: Objects remain rigid throughout the simulation
- domain assumption: Contact can be adequately captured by local anchor-vertex features without global mesh connectivity
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (J uniquely satisfies the functional equation) · tag: unclear
unclear: the relation between the paper passage and the cited Recognition theorem is ambiguous.
We introduce RigidFormer, an object-centric Transformer-based model that learns mesh-free rigid-body dynamics with controllable integration step sizes. ... Anchor-Vertex Pooling enriches these anchors with local vertex features ... differentiable Kabsch alignment.
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (D=3 forced by linking) · tag: unclear
unclear: the relation between the paper passage and the cited Recognition theorem is ambiguous.
We propose Anchor-based RoPE to inject anchor geometry into attention ... mean-pooled anchor descriptor is invariant to anchor reindexing
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · tag: unclear
unclear: the relation between the paper passage and the cited Recognition theorem is ambiguous.
On standard benchmarks, RigidFormer outperforms or matches mesh-based baselines using point inputs ... scales to 200+ objects
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Kelsey R Allen, Yulia Rubanova, Tatiana Lopez-Guevara, William Whitney, Alvaro Sanchez-Gonzalez, Peter Battaglia, and Tobias Pfaff. Learning rigid dynamics with face interaction graph networks. arXiv preprint arXiv:2212.03574, 2022.
- [2] Genesis Authors. Genesis: A universal and generative physics engine for robotics and beyond. URL https://github.com/Genesis-Embodied-AI/Genesis, 2024.
- [3] Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, et al. Interaction networks for learning about objects, relations and physics. Advances in Neural Information Processing Systems, 29, 2016.
- [4] Romain Brégier. Deep regression on manifolds: a 3d rotation case study. In 2021 International Conference on 3D Vision (3DV), pages 166–174. IEEE, 2021.
- [5] Arunkumar Byravan and Dieter Fox. SE3-Nets: Learning rigid body motion using deep neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 173–180. IEEE, 2017. doi: 10.1109/ICRA.2017.7989023.
- [6] Michael B Chang, Tomer Ullman, Antonio Torralba, and Joshua B Tenenbaum. A compositional object-based approach to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.
- [7] Hsiao-yu Chen, Edith Tretschk, Tuur Stuyck, Petr Kadlecek, Ladislav Kavan, Etienne Vouga, and Christoph Lassner. Virtual elastic objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15827–15837, 2022.
- [8] Ricky TQ Chen, Brandon Amos, and Maximilian Nickel. Learning neural event functions for ordinary differential equations. arXiv preprint arXiv:2011.03902, 2020.
- [9] Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games, robotics and machine learning, 2016.
- [10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
- [11] C Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. Brax: a differentiable physics engine for large scale rigid body simulation. arXiv preprint arXiv:2106.13281, 2021.
- [12] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.
- [13] Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
- [14] Klaus Greff, Francois Belletti, Lucas Beyer, Carl Doersch, Yilun Du, Daniel Duckworth, David J. Fleet, Dan Gnanapragasam, Florian Golemo, Charles Herrmann, Thomas Kipf, Abhijit Kundu, Dmitry Lagun, Issam Laradji, Hsueh-Ti (Derek) Liu, Henning Meyer, Yishu Miao, Derek Nowrouzezahrai, Cengiz Oztireli, Etienne Pot, Noha Radwan, Daniel Rebain, Sara Sabour, Me... 2022.
- [15] Byeongho Heo, Song Park, Dongyoon Han, and Sangdoo Yun. Rotary position embedding for vision transformer. In European Conference on Computer Vision, pages 289–305. Springer, 2024.
- [16] Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand. Difftaichi: Differentiable programming for physical simulation. arXiv preprint arXiv:1910.00935, 2019.
- [17] Wenlong Huang, Yu-Wei Chao, Arsalan Mousavian, Ming-Yu Liu, Dieter Fox, Kaichun Mo, and Li Fei-Fei. Pointworld: Scaling 3d world models for in-the-wild robotic manipulation. arXiv preprint arXiv:2601.03782, 2026.
- [18] Wolfgang Kabsch. A solution for the best rotation to relate two sets of vectors. Foundations of Crystallography, 32(5):922–923, 1976.
- [19] Chanho Kim and Li Fuxin. Object dynamics modeling with hierarchical point cloud-based representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20977–20986, 2024.
- [20] Kimi Team, Guangyu Chen, Yu Zhang, Jianlin Su, Weixin Xu, Siyuan Pan, Yaoyu Wang, Yucheng Wang, Guanduo Chen, Bohong Yin, Yutian Chen, Junjie Yan, Ming Wei, Y. Zhang, Fanqing Meng, Chao Hong, Xiaotong Xie, Shaowei Liu, Enzhe Lu, Yunpeng Tai, Yanru Chen, Xin Men, Haiqing Guo, Y. Charles, Haoyu Lu, Lin Sui, Jinguo Zhu, Zaida Zhou, Weiran He, Weixiao Huang... arXiv preprint arXiv:2603.15031, 2026.
- [21] Jiahui Lei, Yijia Weng, Adam W Harley, Leonidas Guibas, and Kostas Daniilidis. Mosca: Dynamic gaussian fusion from casual videos via 4d motion scaffolds. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 6165–6177, 2025.
- [22] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
- [23] Miles Macklin. Warp: A high-performance python framework for gpu simulation and graphics. In NVIDIA GPU Technology Conference (GTC), volume 3, 2022.
- [24] Viktor Makoviychuk, Lukasz Wawrzyniak, Yunrong Guo, Michelle Lu, Kier Storey, Miles Macklin, David Hoeller, Nikita Rudin, Arthur Allshire, Ankur Handa, et al. Isaac Gym: High performance GPU-based physics simulation for robot learning. arXiv preprint arXiv:2108.10470, 2021.
- [25] Xue Bin Peng. Mimickit: A reinforcement learning framework for motion imitation and control. arXiv preprint arXiv:2510.13794, 2025.
- [26] Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, and Angjoo Kanazawa. Amp: Adversarial motion priors for stylized physics-based character control. ACM Transactions on Graphics (ToG), 40(4):1–20, 2021.
- [27] Xue Bin Peng, Yunrong Guo, Lina Halper, Sergey Levine, and Sanja Fidler. Ase: Large-scale reusable adversarial skill embeddings for physically simulated characters. ACM Transactions on Graphics (TOG), 41(4):1–17, 2022.
- [28] Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- [29] Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter Battaglia. Learning mesh-based simulation with graph networks. In International Conference on Learning Representations, 2020.
- [30] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
- [31] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30, 2017.
- [32] Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, et al. Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free. arXiv preprint arXiv:2505.06708, 2025.
- [33] Yulia Rubanova, Tatiana Lopez-Guevara, Kelsey R Allen, William F Whitney, Kimberly Stachenfeld, and Tobias Pfaff. Learning rigid-body simulators over implicit shapes for large-scale scenes and vision. Advances in Neural Information Processing Systems, 37:125809–125838, 2024.
- [34] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
- [35] Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.
- [36] Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis, et al. Lion: Latent point diffusion models for 3d shape generation. Advances in Neural Information Processing Systems, 35:10021–10039, 2022.
- [37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [38] Chen Wang, Roberto Martín-Martín, Danfei Xu, Jun Lv, Cewu Lu, Li Fei-Fei, Silvio Savarese, and Yuke Zhu. 6-PACK: Category-level 6D pose tracker with anchor-based keypoints. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 10059–10066. IEEE, 2020. doi: 10.1109/ICRA40945.2020.9196679.
- [40] Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, and Noah Snavely. Tracking everything everywhere all at once. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19795–19806, 2023.
- [41] Amaury Wei and Olga Fink. Integrating physics and topology in neural networks for learning rigid body dynamics. Nature Communications, 16(1):6867, 2025.
- [42] William F Whitney, Tatiana Lopez-Guevara, Tobias Pfaff, Yulia Rubanova, Thomas Kipf, Kimberly Stachenfeld, and Kelsey R Allen. Learning 3d particle-based simulators from rgb-d videos. arXiv preprint arXiv:2312.05359, 2023.
- [43] William F Whitney, Jacob Varley, Deepali Jain, Krzysztof Choromanski, Sumeet Singh, and Vikas Sindhwani. Modeling the real world with high-density visual particle dynamics. arXiv preprint arXiv:2406.19800, 2024.
- [44] Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, and Bharath Hariharan. Pointflow: 3d point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4541–4550, 2019.
- [45] Youn-Yeol Yu, Jeongwhan Choi, Woojin Cho, Kookjin Lee, Nayong Kim, Kiseok Chang, Chang-Seung Woo, Ilho Kim, Seok-Woo Lee, Joon-Young Yang, et al. Learning flexible body collision dynamics with hierarchical contact mesh transformer. arXiv preprint arXiv:2312.12467, 2023.
- [46] Jingyang Yuan, Gongbo Sun, Zhiping Xiao, Hang Zhou, Xiao Luo, Junyu Luo, Yusheng Zhao, Wei Ju, and Ming Zhang. Egode: An event-attended graph ode framework for modeling rigid dynamics. Advances in Neural Information Processing Systems, 37:59093–59118, 2024.
- [47] Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, and Xin Tong. Renderformer: Transformer-based neural rendering of triangle meshes with global illumination. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Papers, pages 1–11, 2025.
- [48] Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, and Ming-Hsuan Yang. Monst3r: A simple approach for estimating geometry in the presence of motion. arXiv preprint arXiv:2410.03825, 2024.
- [49] Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, and Vladlen Koltun. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16259–16268, 2021.
- [50] Haoyu Zhen, Qiao Sun, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, and Chuang Gan. Tesseract: Learning 4d embodied world models. arXiv preprint arXiv:2504.20995, 2025.
- [51] Yaofeng Desmond Zhong, Biswadip Dey, and Amit Chakraborty. Extending lagrangian and hamiltonian neural networks with differentiable contact models. Advances in Neural Information Processing Systems, 34:21910–21922, 2021.
- [52] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
- Appendix excerpts (dataset construction)
Physics Export: We run the trained policy in Isaac Gym [24] and record per-frame rigid body transforms (position and quaternion) for each body part.
Mesh Conversion: Body part transforms are applied to reference meshes from MuJoCo XML files. Meshes exceeding vertex limits undergo quadric decimation.
Temporal Subsampling: We train with step size s=10 for both ASE and G1 (3 Hz effective rate) to capture meaningful locomotion dynamics rather than high-frequency contact oscillations. Training Configuration...
discussion (0)