EndoGSim: Physics-Aware 4D Dynamic Endoscopic Scene Simulations via MLLM-Guided Gaussian Splatting

Beilei Cui; Changjing Liu; Hongliang Ren; Long Bai; Yiming Huang

arxiv: 2605.16022 · v1 · pith:3Y3IL25Enew · submitted 2026-05-15 · 💻 cs.CV

Changjing Liu, Yiming Huang, Long Bai, Beilei Cui, Hongliang Ren

Pith reviewed 2026-05-20 19:49 UTC · model grok-4.3

classification 💻 cs.CV

keywords: endoscopic scene simulation · 4D Gaussian splatting · physics-aware reconstruction · material point method · multi-modal large language models · robot-assisted surgery · dynamic scene simulation · differentiable physics

The pith

A framework initializes material properties via MLLM then refines them with differentiable MPM inside 4D Gaussian Splatting to produce physics-aware endoscopic scene simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EndoGSim, a unified method that reconstructs and physically simulates dynamic endoscopic scenes for robot-assisted surgery. It represents deformable tissues and tools with 4D Gaussian Splatting augmented by segmentation and depth estimates. An object-wise material field starts with parameters suggested by a pre-trained multi-modal large language model and then tunes those parameters through a differentiable Material Point Method driven by both rendered images and optical flow. The resulting simulations show higher visual fidelity and better physical accuracy than prior techniques on both public and private datasets. If the method holds, it supplies the missing physics layer needed for realistic surgical planning and training.

Core claim

The integration of 4D Gaussian Splatting with an object-wise material field, whose parameters are initialized by pre-trained MLLMs and refined through a differentiable Material Point Method under joint supervision from rendered images and optical flow, produces physics-aware reconstruction and physical simulation of endoscopic scenes.

What carries the argument

The object-wise material field that initializes material parameters via MLLM and refines them through differentiable Material Point Method under joint supervision from rendered images and optical flow.
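As a rough illustration of what an object-wise material field might look like as a data structure, the sketch below maps each segmented object to one pair of elastic parameters (E, ν) and converts that pair to Lamé parameters using the standard isotropic-elasticity formulas. The class name `Material`, the object labels, and every numeric value are hypothetical placeholders standing in for MLLM output; none of them are taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Material:
    """Per-object elastic parameters theta = {E, nu} (hypothetical container)."""
    youngs_modulus: float  # E, in pascals
    poisson_ratio: float   # nu, dimensionless, must lie in (-1, 0.5)

    def lame(self):
        """Convert (E, nu) to the Lame parameters (mu, lambda) of isotropic elasticity."""
        E, nu = self.youngs_modulus, self.poisson_ratio
        if not -1.0 < nu < 0.5:
            raise ValueError("Poisson ratio must lie in (-1, 0.5)")
        mu = E / (2.0 * (1.0 + nu))
        lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
        return mu, lam


# Hypothetical initial guesses standing in for MLLM output; not values from the paper.
material_field = {
    "tissue": Material(youngs_modulus=1.0e4, poisson_ratio=0.45),
    "tool": Material(youngs_modulus=2.0e11, poisson_ratio=0.30),
}

# Each Gaussian carries a segmentation label and looks up its object's parameters,
# so all Gaussians belonging to one object share a single (mu, lambda) pair.
gaussian_labels = ["tissue", "tissue", "tool"]
per_gaussian = [material_field[label].lame() for label in gaussian_labels]
```

Sharing one parameter pair per object keeps the number of unknowns small, which is one plausible reason to prefer an object-wise field over per-particle parameters when the only supervision is 2D.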

If this is right

Supplies explicit physical descriptions of tissue and tool dynamics missing from purely visual endoscopic reconstructions.
Delivers higher simulation fidelity and physical accuracy than prior methods on both open-source and in-house datasets.
Supports improved planning, training, and control loops in robot-assisted minimally invasive surgery.
Allows automatic inference of material properties without manual tuning for each new scene.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline could generate large amounts of physically consistent synthetic training data for surgical robots.
Extending the material field to include contact forces between tools and tissue would enable predictive simulation of instrument-tissue interactions.
The MLLM-plus-differentiable-physics pattern may transfer to other domains that need both semantic priors and measurable dynamics, such as soft robotics or fluid simulation.

Load-bearing premise

Pre-trained MLLMs can provide reliable initial material parameters for endoscopic tissues and tools which are then successfully refined by the differentiable MPM under joint image and optical flow supervision.

What would settle it

Tissue deformations observed in real endoscopic video under controlled instrument forces that systematically mismatch the deformations predicted by the refined material field would disprove the claim of physical accuracy.

Figures

Figures reproduced from arXiv: 2605.16022 by Beilei Cui, Changjing Liu, Hongliang Ren, Long Bai, Yiming Huang.

**Figure 1.** Overview of our physics-aware framework for surgical scene reconstruction and 4D dynamic simulation with automatic estimation of physical parameters. […] trained depth and segmentation models to construct a Gaussian splat representation of the surgical scene. Then, we propose an object-wise material field to estimate the physical properties of the tissues and tools. Material parameters are automatically initi…

**Figure 2.** Qualitative results of all methods on the EndoNeRF, CholecSeg8K, and PorcineEndo datasets.

**Figure 3.** Qualitative comparison of simulation results from all methods on a sequence of the EndoNeRF dataset, illustrating rendered images and optical flow errors.

**Figure 4.** Ablation on Material Field (MF): quantitative results on EndoNeRF and CholecSeg8K datasets (left), and qualitative comparison with vs. without MF (right). […] estimation, coarsely initialized via MLLM-guided estimation and then jointly refined with render and optical flow loss in a differentiable MLS-MPM. The optimized material properties are then incorporated into the simulation pipeline to enable realistic …

Original abstract

In robot-assisted minimally invasive surgery, high-fidelity dynamic endoscopic scene reconstruction and simulation are crucial to enhancing downstream tasks and advancing surgical outcomes. However, existing methods primarily focus on visual reconstruction, lacking physics-based descriptions of the scene required for realistic simulation. We propose a unified framework that achieves physics-aware reconstruction and physical simulation of endoscopic scenes through Multi-modal Large Language Models (MLLMs)-guided Gaussian Splatting. Our approach utilizes 4D Gaussian Splatting (4DGS) integrated with pre-trained segmentation and depth estimation to represent deformable tissues and tools. To achieve automatic inference of physical properties, we introduce an object-wise material field that initializes material parameters via MLLM and refines them through a differentiable Material Point Method (MPM) under joint supervision from rendered images and optical flow. Validated on both open-source and in-house datasets, our framework achieves superior simulation fidelity and physical accuracy compared to state-of-the-art methods, underscoring its potential to advance robot-assisted surgical applications.
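The joint supervision from rendered images and optical flow can be pictured as a weighted sum of two error terms. The sketch below is a minimal illustration under stated assumptions: the abstract does not specify the loss terms or their weighting, so the L1 image error, the mean endpoint error for flow, and the `flow_weight` parameter are all hypothetical choices made for this example.

```python
def l1_image_loss(rendered, target):
    """Mean absolute error between two equally sized lists of pixel intensities."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(target)


def flow_epe_loss(pred_flow, target_flow):
    """Mean endpoint error between two lists of 2D flow vectors (u, v)."""
    total = 0.0
    for (pu, pv), (tu, tv) in zip(pred_flow, target_flow):
        total += ((pu - tu) ** 2 + (pv - tv) ** 2) ** 0.5
    return total / len(target_flow)


def joint_loss(rendered, target, pred_flow, target_flow, flow_weight=0.5):
    """Weighted sum of the image term and the optical-flow term.

    flow_weight is an assumed hyperparameter, not a value from the paper.
    """
    image_term = l1_image_loss(rendered, target)
    flow_term = flow_epe_loss(pred_flow, target_flow)
    return image_term + flow_weight * flow_term


# Tiny usage example on two pixels and one flow vector.
example = joint_loss(
    rendered=[0.0, 1.0], target=[0.0, 0.5],
    pred_flow=[(3.0, 4.0)], target_flow=[(0.0, 0.0)],
)
```

The image term rewards matching appearance frame by frame, while the flow term rewards matching motion between frames; combining them is meant to make the physical parameters answerable to how the scene moves, not only to how it looks.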

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper's main move is wiring an MLLM to seed material parameters into 4D Gaussian splatting and then refining them with differentiable MPM for endoscopic scenes.

The core contribution is a pipeline that takes 4D Gaussian splatting for dynamic endoscopic reconstruction and adds an object-wise material field. The field starts with parameter guesses from a pre-trained MLLM and gets updated by a differentiable Material Point Method under image and optical-flow losses. This is a fresh combination for surgical simulation work, where most prior efforts stop at visual fidelity and leave physics out. It directly targets the gap between reconstruction and usable simulation for robot-assisted procedures, which is a practical need in training and planning tools. The approach is straightforward in outline and builds on established pieces like 4DGS and MPM without obvious reinvention of those components.

Validation is reported on both public and in-house endoscopic datasets, with claims of better simulation fidelity and physical accuracy than existing methods. That framing makes sense for the target audience in medical robotics.

The soft spot is the initialization step. Pre-trained MLLMs have no built-in knowledge of soft-tissue stiffness or tool mechanics, so the starting guesses could be off. The paper needs to show that the joint supervision actually recovers accurate constitutive parameters rather than just fitting the 2D observations. Without detailed ablations on how much the MPM refinement corrects bad initials, or quantitative checks against ground-truth material values, the physical-accuracy advantage stays hard to assess. The optical-flow signal is also under-constrained for 3D deformation behavior, which could mask issues.

This work is aimed at researchers building physics-aware simulators for minimally invasive surgery. A reader working on 4D reconstruction or surgical robotics would get value from the integration idea and the dataset experiments. It is coherent enough on its own terms to warrant a serious referee, even if the physical claims require tighter evidence in revision.

Referee Report

2 major / 2 minor

Summary. The paper proposes EndoGSim, a unified framework for physics-aware 4D dynamic endoscopic scene reconstruction and simulation. It combines 4D Gaussian Splatting (4DGS) with pre-trained segmentation and depth estimation to represent deformable tissues and tools, introduces an object-wise material field initialized via Multi-modal Large Language Models (MLLMs), and refines material parameters through a differentiable Material Point Method (MPM) under joint supervision from rendered images and optical flow. The method is validated on open-source and in-house datasets and claims superior simulation fidelity and physical accuracy over state-of-the-art approaches for robot-assisted surgical applications.

Significance. If the central claims hold, the work would represent a meaningful step toward bridging visual 4D reconstruction with physics-based simulation in endoscopic scenes. The integration of MLLM-guided initialization with differentiable MPM refinement could enable more realistic deformable tissue modeling, with direct relevance to downstream tasks such as surgical planning and robot control in minimally invasive procedures.

major comments (2)

[§3.3] §3.3 (Object-wise Material Field): The initialization of biomechanical parameters (e.g., Young's modulus, Poisson ratio) for endoscopic tissues and instruments via pre-trained MLLMs is presented as automatic and reliable, yet no experiments quantify the accuracy of these initial values against known tissue properties or demonstrate recovery when initial guesses are deliberately perturbed. This is load-bearing for the physical-accuracy claim because 2-D image and optical-flow losses may under-constrain 3-D constitutive behavior.
[§5.2] §5.2 (Ablation Studies and Quantitative Results): The reported gains in simulation fidelity are attributed to the joint image + optical-flow supervision of the differentiable MPM, but the manuscript lacks an ablation that isolates the MPM refinement step (e.g., comparing MLLM initialization alone versus full refinement, or random versus MLLM initialization). Without this, it is unclear whether the final parameters correspond to real physics or simply overfit the visual losses.
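The under-constraint concern in the first major comment can be made concrete with a deliberately simple toy case that is not taken from the paper: a linear elastic bar in quasi-static equilibrium, with hypothetical dimensions and loads. If the applied force is not observed, displacement alone fixes only the ratio of force to stiffness, so two different materials can explain the same motion.

```python
def bar_tip_displacement(force, youngs_modulus, length=0.1, area=1.0e-4):
    """Quasi-static tip displacement of a linear elastic bar: u = F * L / (E * A)."""
    return force * length / (youngs_modulus * area)


# Two different (force, stiffness) pairs, both scaled by the same factor of 2.
# All numbers are hypothetical and chosen only to make the ambiguity visible.
u_soft = bar_tip_displacement(force=1.0, youngs_modulus=1.0e4)
u_stiff = bar_tip_displacement(force=2.0, youngs_modulus=2.0e4)

# The two displacements coincide, so an observer who sees only motion
# cannot tell the soft bar from the stiff one.
same_motion = abs(u_soft - u_stiff) < 1e-12
```

Dynamic observations add inertial effects that may break this particular ambiguity, but the example shows why a low visual loss does not by itself certify that the recovered constitutive parameters are correct.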

minor comments (2)

[§4.1] Figure 4 caption and §4.1: The description of how MLLM prompts are constructed for material inference is terse; expanding the prompt template and providing example outputs would improve reproducibility.
[§2] Related Work (§2): The discussion of prior physics-informed neural rendering and differentiable simulation methods could cite additional recent works on MPM in medical imaging to better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for highlighting areas where additional evidence would strengthen the physical-accuracy claims. We address each major comment below and have revised the manuscript accordingly to improve clarity and rigor.

Referee: [§3.3] §3.3 (Object-wise Material Field): The initialization of biomechanical parameters (e.g., Young's modulus, Poisson ratio) for endoscopic tissues and instruments via pre-trained MLLMs is presented as automatic and reliable, yet no experiments quantify the accuracy of these initial values against known tissue properties or demonstrate recovery when initial guesses are deliberately perturbed. This is load-bearing for the physical-accuracy claim because 2-D image and optical-flow losses may under-constrain 3-D constitutive behavior.

Authors: We agree that direct quantification of MLLM initialization accuracy and explicit perturbation-recovery experiments would provide stronger support for the physical claims. Obtaining reliable ground-truth biomechanical parameters for in-vivo endoscopic tissues is difficult because such measurements are rarely available in public datasets or the literature. Nevertheless, we have added a new perturbation study in the revised Section 5.2: initial material values are deliberately offset by ±20 % from the MLLM outputs, after which the differentiable MPM is run to convergence. The refined parameters yield measurably lower forward-simulation error (image and optical-flow metrics) than the perturbed initials, indicating that the refinement step corrects for initialization inaccuracies. Regarding potential under-constraint by 2-D losses, the object-wise material field together with joint image-plus-flow supervision and the MPM’s constitutive constraints provide additional regularization; this is evidenced by our method’s superior generalization on held-out sequences compared with purely visual baselines. revision: yes
Referee: [§5.2] §5.2 (Ablation Studies and Quantitative Results): The reported gains in simulation fidelity are attributed to the joint image + optical-flow supervision of the differentiable MPM, but the manuscript lacks an ablation that isolates the MPM refinement step (e.g., comparing MLLM initialization alone versus full refinement, or random versus MLLM initialization). Without this, it is unclear whether the final parameters correspond to real physics or simply overfit the visual losses.

Authors: We acknowledge that an explicit isolation of the MPM refinement contribution is necessary to address concerns about overfitting versus genuine physical improvement. In the revised manuscript we have expanded the ablation table in Section 5.2 with three additional configurations: (i) MLLM initialization without any MPM refinement, (ii) random initialization followed by MPM refinement, and (iii) the full MLLM-plus-MPM pipeline. Quantitative results show that MPM refinement alone improves simulation fidelity over initialization-only baselines, while MLLM initialization yields better starting points and faster convergence than random initialization. Cross-validation on unseen sequences further indicates that the refined parameters do not merely overfit the training losses but generalize, supporting that the final values reflect physically plausible behavior rather than pure visual overfitting. revision: yes
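The perturbation-recovery experiment described in the first response can be sketched on a toy system. The example below is not the authors' setup: it swaps the differentiable MPM for a unit-mass spring integrated with symplectic Euler, swaps the image and flow losses for a mean squared position error, and offsets the initial stiffness by ±20% from a known ground truth (possible only because the data are synthetic, whereas the rebuttal offsets from MLLM outputs). Gradients come from hand-written forward sensitivities carried through the integrator. All names and constants are hypothetical.

```python
def simulate(k, steps=100, dt=0.01):
    """Symplectic-Euler rollout of a unit-mass spring.

    Returns the positions and their derivatives with respect to stiffness k.
    """
    x, v = 1.0, 0.0      # state: position and velocity
    dx, dv = 0.0, 0.0    # sensitivities d(state)/dk
    xs, dxs = [], []
    for _ in range(steps):
        dv = dv - (x + k * dx) * dt   # derivative of the velocity update (old x, old dx)
        v = v - k * x * dt            # velocity update uses the old position
        dx = dx + dv * dt             # derivative of the position update
        x = x + v * dt                # position update uses the new velocity
        xs.append(x)
        dxs.append(dx)
    return xs, dxs


def loss_and_grad(k, observed):
    """Mean squared position error and its exact gradient with respect to k."""
    xs, dxs = simulate(k, steps=len(observed))
    n = len(observed)
    loss = sum((x - o) ** 2 for x, o in zip(xs, observed)) / n
    grad = sum(2.0 * (x - o) * dx for x, o, dx in zip(xs, observed, dxs)) / n
    return loss, grad


def refine(k_init, observed, lr=10.0, iters=300):
    """Plain gradient descent on the stiffness parameter."""
    k = k_init
    for _ in range(iters):
        _, g = loss_and_grad(k, observed)
        k -= lr * g
    return k


K_TRUE = 4.0  # hypothetical ground-truth stiffness of the toy system
observed, _ = simulate(K_TRUE)

# Offset the initial guess by -20% and +20%, then refine each one.
results = {}
for offset in (-0.2, 0.2):
    k0 = K_TRUE * (1.0 + offset)
    results[offset] = (k0, refine(k0, observed))
```

In this toy case the stiffness is identifiable because the initial state and the absence of external forces are known; whether the same holds for tissue parameters under image and flow supervision is exactly what the referee asks the authors to demonstrate.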

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

The paper's central pipeline initializes an object-wise material field from a pre-trained MLLM and refines parameters via differentiable MPM under image and optical-flow supervision. This does not reduce by construction to the inputs: the MLLM supplies an external starting point drawn from general multimodal training rather than a fitted quantity internal to the endoscopic data, and the subsequent optimization is driven by explicit rendering losses. No self-definitional loops, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear in the described derivation. The reported gains in simulation fidelity therefore rest on the empirical success of the joint optimization rather than tautological equivalence to prior quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the effectiveness of MLLM for material initialization and the ability of differentiable MPM to refine parameters under rendering and flow losses; these are treated as domain assumptions rather than derived results.

axioms (1)

domain assumption: Pre-trained segmentation and depth estimation models provide accurate enough representations of deformable tissues and tools to support 4DGS initialization.
Invoked to integrate visual reconstruction with the material field.

invented entities (1)

object-wise material field (no independent evidence)
purpose: To store and optimize per-object physical parameters initialized by MLLM and refined by MPM.
New component introduced to enable automatic inference of material properties for simulation.

pith-pipeline@v0.9.0 · 5720 in / 1356 out tokens · 69270 ms · 2026-05-20T19:49:38.334558+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: object-wise material field that initializes material parameters via MLLM and refines them through a differentiable Material Point Method (MPM) under joint supervision from rendered images and optical flow

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: hyperelastic model parameterized by a vector θ_p = {E, ν}

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 2 internal anchors

[1] Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

[2] Anthropic: Introducing Claude Sonnet 4.5. https://www.anthropic.com/news/claude-sonnet-4-5 (2025)

[3] Cai, J., Yang, Y., Yuan, W., He, Y., Dong, Z., Bo, L., Cheng, H., Chen, Q.: Gic: Gaussian-informed continuum for physical property identification and simulation. Advances in Neural Information Processing Systems 37, 75035–75063 (2024)

[4] Chen, E., Chen, L., Zhang, W.: Robotic-assisted colorectal surgery in colorectal cancer management: A narrative review of clinical efficacy and multidisciplinary integration. Frontiers in Oncology 15, 1502014 (2025)

[5] Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In: European conference on computer vision. pp. 370–386. Springer (2024)

[6] Dagli, R., Xiang, D., Modi, V., Loop, C., Tsang, C.F., Chen, A.H., Hu, A., State, G., Levin, D.I., Shugrina, M.: Vomp: Predicting volumetric mechanical property fields. arXiv preprint arXiv:2510.22975 (2025)

[7] De Vaucorbeil, A., Nguyen, V.P., Sinaie, S., Wu, J.Y.: Material point method after 25 years: Theory, implementation, and applications. Advances in applied mechanics 53, 185–398 (2020)

[8] Ding, Y., Wang, S., Lan, R., Lin, W., Liu, X., He, W.: Telerobotic surgery: a comprehensive two-decade evolution and the integration of emerging technologies. International Journal of Surgery 112(1), 1652–1672 (2026)

[9] Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., Brox, T.: Flownet: Learning optical flow with convolutional networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2758–2766 (2015)

[10] Gao, B., Zhou, J., Zou, J., Qin, J.: Endord-gs: Robust deformable endoscopic scene reconstruction via gaussian splatting. IEEE Transactions on Medical Imaging 45(2), 528–541 (2025)

[11] Google: A new era of intelligence with Gemini 3. https://blog.google/products/gemini/gemini-3 (2025)

[12] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017)

[13] Hong, W.Y., Kao, C.L., Kuo, Y.H., Wang, J.R., Chang, W.L., Shih, C.S.: Cholecseg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on cholec80. arXiv preprint arXiv:2012.12453 (2020)

[14] Hu, Y., Fang, Y., Ge, Z., Qu, Z., Zhu, Y., Pradhana, A., Jiang, C.: A moving least squares material point method with displacement discontinuity and two-way rigid body coupling. ACM Transactions on Graphics (TOG) 37(4), 1–14 (2018)

[15] Huang, Y., Bai, L., Cui, B., Yuan, K., Wang, G., Hoque, M.I., Padoy, N., Navab, N., Ren, H.: Surgtpgs: Semantic 3d surgical scene understanding with text promptable gaussian splatting. In: Medical Image Computing and Computer Assisted Intervention (MICCAI). pp. 584–594. Springer (2026)

[16] Huang, Y., Cui, B., Bai, L., Guo, Z., Xu, M., Islam, M., Ren, H.: Endo-4dgs: Endoscopic monocular scene reconstruction with 4d gaussian splatting. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). pp. 197–207. Springer (2024)

[17] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–1 (2023)

[18] Liu, F., Wang, H., Yao, S., Zhang, S., Zhou, J., Duan, Y.: Physics3d: Learning physical properties of 3d gaussians via video diffusion. arXiv preprint arXiv:2406.04338 (2024)

[19] Liu, H., Zhang, E., Wu, J., Hong, M., Jin, Y.: Surgical sam 2: Real-time segment anything in surgical video by efficient frame pruning. arXiv preprint arXiv:2408.07931 (2024)

[20] Liu, S., Ren, Z., Gupta, S., Wang, S.: Physgen: Rigid-body physics-grounded image-to-video generation. In: European Conference on Computer Vision (ECCV). pp. 360–378. Springer (2024)

[21] Liu, Z., Ye, W., Luximon, Y., Wan, P., Zhang, D.: Unleashing the potential of multi-modal foundation models and video diffusion for 4d dynamic physical scene simulation. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR). pp. 11016–11025 (2025)

[22] Macklin, M.: Warp: A high-performance python framework for gpu simulation and graphics. In: NVIDIA GPU Technology Conference (GTC). vol. 3 (2022)

[23] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)

[24] Raptis, S.P., Theocharopoulos, A., Theocharopoulos, C., Papadakos, S.P., Levantis, G., Kontis, E., Vrahatis, A.G.: Artificial intelligence analysis of minimally invasive surgery data. Journal of Robotic Surgery 20(1), 186 (2026)

[25] Stomakhin, A., Schroeder, C., Chai, L., Teran, J., Selle, A.: A material point method for snow simulation. ACM Transactions on Graphics (TOG) 32(4), 1–10 (2013)

[26] Teed, Z., Deng, J.: Raft: Recurrent all-pairs field transforms for optical flow. In: European conference on computer vision. pp. 402–419. Springer (2020)

[27] Wang, Y., Zhou, J., Zhu, H., Chang, W., Zhou, Y., Li, Z., Chen, J., Pang, J., Shen, C., He, T.: $\pi^3$: Permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347 (2025)

[28] Wang, Y., Long, Y., Fan, S.H., Dou, Q.: Neural rendering for stereo 3d reconstruction of deformable tissues in robotic surgery. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). pp. 431–441. Springer (2022)

[29] Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4d gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20310–20320 (2024)

[30] Xie, T., Zong, Z., Qiu, Y., Li, X., Feng, Y., Yang, Y., Jiang, C.: Physgaussian: Physics-integrated 3d gaussians for generative dynamics. In: Proceedings of the Computer Vision and Pattern Recognition (CVPR). pp. 4389–4398 (2024)

[31] Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20331–20341 (2024)

[32] Zha, R., Cheng, X., Li, H., Harandi, M., Ge, Z.: Endosurf: Neural surface reconstruction of deformable tissues with stereo endoscope videos. In: International conference on medical image computing and computer-assisted intervention. pp. 13–23. Springer (2023)

[1] [1]

GPT-4 Technical Report

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[2] [2]

Anthropic: Introducing claude sonnet 4.5.https://www.anthropic.com/news/ claude-sonnet-4-5(2025)

work page 2025

[3] [3]

Advances in Neural Information Processing Systems37, 75035–75063 (2024)

Cai, J., Yang, Y., Yuan, W., He, Y., Dong, Z., Bo, L., Cheng, H., Chen, Q.: Gic: Gaussian-informed continuum for physical property identification and simulation. Advances in Neural Information Processing Systems37, 75035–75063 (2024)

work page 2024

[4] [4]

Frontiers in Oncology15, 1502014 (2025)

Chen, E., Chen, L., Zhang, W.: Robotic-assisted colorectal surgery in colorectal cancer management: A narrative review of clinical efficacy and multidisciplinary integration. Frontiers in Oncology15, 1502014 (2025)

work page 2025

[5] [5]

In: European conference on computer vision

Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In: European conference on computer vision. pp. 370–386. Springer (2024)

work page 2024

[6] [6]

F., Chen, A

Dagli, R., Xiang, D., Modi, V., Loop, C., Tsang, C.F., Chen, A.H., Hu, A., State, G., Levin, D.I., Shugrina, M.: Vomp: Predicting volumetric mechanical property fields. arXiv preprint arXiv:2510.22975 (2025)

work page arXiv 2025

[7] [7]

Advances in applied mechanics 53, 185–398 (2020)

De Vaucorbeil, A., Nguyen, V.P., Sinaie, S., Wu, J.Y.: Material point method after 25 years: Theory, implementation, and applications. Advances in applied mechanics 53, 185–398 (2020)

work page 2020

[8] [8]

International Journal of Surgery112(1), 1652–1672 (2026)

Ding, Y., Wang, S., Lan, R., Lin, W., Liu, X., He, W.: Telerobotic surgery: a comprehensive two-decade evolution and the integration of emerging technologies. International Journal of Surgery112(1), 1652–1672 (2026)

work page 2026

[9] [9]

In: Proceedings of the IEEE international conference on computer vision

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., Van Der Smagt, P., Cremers, D., Brox, T.: Flownet: Learning optical flow with convolu- tional networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2758–2766 (2015) 10 Anonymized Author et al

work page 2015

[10] [10]

IEEE Transactions on Medical Imaging 45(2), 528–541 (2025)

Gao, B., Zhou, J., Zou, J., Qin, J.: Endord-gs: Robust deformable endoscopic scene reconstruction via gaussian splatting. IEEE Transactions on Medical Imaging 45(2), 528–541 (2025)

work page 2025

[11] [11]

Google: A new era of intelligence with gemini 3.https://blog.google/products/ gemini/gemini-3(2025)

work page 2025

[12] [12]

Advances in Neural Information Processing Systems30(2017)

Heusel,M.,Ramsauer,H.,Unterthiner,T.,Nessler,B.,Hochreiter,S.:Ganstrained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems30(2017)

work page 2017

[13] [13]

Cholecseg8k: a semantic segmen- tation dataset for laparoscopic cholecystectomy based on cholec80

Hong, W.Y., Kao, C.L., Kuo, Y.H., Wang, J.R., Chang, W.L., Shih, C.S.: Cholec- seg8k: a semantic segmentation dataset for laparoscopic cholecystectomy based on cholec80. arXiv preprint arXiv:2012.12453 (2020)

work page arXiv 2012

[14] [14]

ACM Transactions on Graphics (TOC)37(4), 1–14 (2018)

Hu, Y., Fang, Y., Ge, Z., Qu, Z., Zhu, Y., Pradhana, A., Jiang, C.: A moving least squares material point method with displacement discontinuity and two-way rigid body coupling. ACM Transactions on Graphics (TOC)37(4), 1–14 (2018)

work page 2018

[15] Huang, Y., Bai, L., Cui, B., Yuan, K., Wang, G., Hoque, M.I., Padoy, N., Navab, N., Ren, H.: Surgtpgs: Semantic 3d surgical scene understanding with text promptable gaussian splatting. In: Medical Image Computing and Computer Assisted Intervention (MICCAI). pp. 584–594. Springer (2026)

[16] Huang, Y., Cui, B., Bai, L., Guo, Z., Xu, M., Islam, M., Ren, H.: Endo-4dgs: Endoscopic monocular scene reconstruction with 4d gaussian splatting. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). pp. 197–207. Springer (2024)

[17] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42(4), 139–1 (2023)

[18] Liu, F., Wang, H., Yao, S., Zhang, S., Zhou, J., Duan, Y.: Physics3d: Learning physical properties of 3d gaussians via video diffusion. arXiv preprint arXiv:2406.04338 (2024)

[19] Liu, H., Zhang, E., Wu, J., Hong, M., Jin, Y.: Surgical sam 2: Real-time segment anything in surgical video by efficient frame pruning. arXiv preprint arXiv:2408.07931 (2024)

[20] Liu, S., Ren, Z., Gupta, S., Wang, S.: Physgen: Rigid-body physics-grounded image-to-video generation. In: European Conference on Computer Vision (ECCV). pp. 360–378. Springer (2024)

[21] Liu, Z., Ye, W., Luximon, Y., Wan, P., Zhang, D.: Unleashing the potential of multi-modal foundation models and video diffusion for 4d dynamic physical scene simulation. In: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR). pp. 11016–11025 (2025)

[22] Macklin, M.: Warp: A high-performance python framework for gpu simulation and graphics. In: NVIDIA GPU Technology Conference (GTC). vol. 3 (2022)

[23] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)

[24] Raptis, S.P., Theocharopoulos, A., Theocharopoulos, C., Papadakos, S.P., Levantis, G., Kontis, E., Vrahatis, A.G.: Artificial intelligence analysis of minimally invasive surgery data. Journal of Robotic Surgery 20(1), 186 (2026)

[25] Stomakhin, A., Schroeder, C., Chai, L., Teran, J., Selle, A.: A material point method for snow simulation. ACM Transactions on Graphics (TOG) 32(4), 1–10 (2013)

[26] Teed, Z., Deng, J.: Raft: Recurrent all-pairs field transforms for optical flow. In: European conference on computer vision. pp. 402–419. Springer (2020)

[27] Wang, Y., Zhou, J., Zhu, H., Chang, W., Zhou, Y., Li, Z., Chen, J., Pang, J., Shen, C., He, T.: $\pi^3$: Permutation-equivariant visual geometry learning. arXiv preprint arXiv:2507.13347 (2025)

[28] Wang, Y., Long, Y., Fan, S.H., Dou, Q.: Neural rendering for stereo 3d reconstruction of deformable tissues in robotic surgery. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI). pp. 431–441. Springer (2022)

[29] Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4d gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20310–20320 (2024)

[30] Xie, T., Zong, Z., Qiu, Y., Li, X., Feng, Y., Yang, Y., Jiang, C.: Physgaussian: Physics-integrated 3d gaussians for generative dynamics. In: Proceedings of the Computer Vision and Pattern Recognition (CVPR). pp. 4389–4398 (2024)

[31] Yang, Z., Gao, X., Zhou, W., Jiao, S., Zhang, Y., Jin, X.: Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 20331–20341 (2024)

[32] Zha, R., Cheng, X., Li, H., Harandi, M., Ge, Z.: Endosurf: Neural surface reconstruction of deformable tissues with stereo endoscope videos. In: International conference on medical image computing and computer-assisted intervention. pp. 13–23. Springer (2023)