DP-DeGauss: Dynamic Probabilistic Gaussian Decomposition for Egocentric 4D Scene Reconstruction

Houqiang Zhong; Li Song; Rong Xie; Su Wang; Tingxi Chen; Zhengxue Cheng

arxiv: 2604.07986 · v1 · submitted 2026-04-09 · 💻 cs.CV

Tingxi Chen, Zhengxue Cheng, Houqiang Zhong, Su Wang, Rong Xie, Li Song

Pith reviewed 2026-05-10 17:17 UTC · model grok-4.3

keywords egocentric video · 4D scene reconstruction · 3D Gaussian splatting · dynamic decomposition · hand-object interaction · scene disentanglement · first-person vision · probabilistic routing

The pith

DP-DeGauss assigns learnable probabilities to 3D Gaussians to route them into separate background, hand, and object models for egocentric 4D reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that reconstructs moving first-person scenes by starting from a single set of 3D Gaussian points and giving each point a probability value that decides its category. These probabilities feed into dedicated deformation paths that model background, hands, or objects independently, supported by masks that refine the assignments and controls that adjust brightness and motion flow. The result is higher-fidelity rendering plus explicit separation of the three components. A sympathetic reader would value this because egocentric videos contain intertwined motion and interactions that standard reconstruction pipelines collapse together, limiting downstream uses in AR, VR, and robotics.

Core claim

DP-DeGauss initializes a unified 3D Gaussian set from COLMAP priors, augments each Gaussian with a learnable category probability, and dynamically routes the Gaussians into specialized deformation branches for background, hands, or objects; category-specific masks together with brightness and motion-flow controls then produce disentangled 4D reconstructions from egocentric video.

What carries the argument

The learnable category probability attached to each Gaussian, which performs dynamic routing into category-specific deformation branches while category-specific masks and brightness/motion-flow controls refine the separation.
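The routing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax over per-Gaussian logits, the probability-weighted blending of branch offsets, the argmax-based hard assignment, and the names `route_soft`, `route_hard`, and `toy_branches` are all assumptions made for illustration. The toy lambdas stand in for learned deformation networks.

```python
import math

CATEGORIES = ("background", "hand", "object")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_soft(position, logits, branches, t):
    """Blend the per-category offsets by the Gaussian's category probabilities."""
    probs = softmax(logits)
    offsets = [branches[c](position, t) for c in CATEGORIES]
    return tuple(
        x + sum(p * off[k] for p, off in zip(probs, offsets))
        for k, x in enumerate(position)
    )


def route_hard(position, logits, branches, t):
    """Send the Gaussian only through its most probable branch."""
    probs = softmax(logits)
    best = CATEGORIES[probs.index(max(probs))]
    off = branches[best](position, t)
    return best, tuple(x + d for x, d in zip(position, off))


# Hypothetical stand-ins for learned deformation networks (assumption).
toy_branches = {
    "background": lambda pos, t: (0.0, 0.0, 0.0),
    "hand": lambda pos, t: (0.1 * t, 0.0, 0.0),
    "object": lambda pos, t: (0.0, 0.2 * t, 0.0),
}

logits = [0.0, 2.0, 0.0]  # this Gaussian leans towards "hand"
soft_pos = route_soft((1.0, 1.0, 1.0), logits, toy_branches, t=1.0)
label, hard_pos = route_hard((1.0, 1.0, 1.0), logits, toy_branches, t=1.0)
```

In the soft variant every branch contributes in proportion to its probability, so gradients can reach all three logits; the hard variant commits each Gaussian to one branch, which is what makes the per-category outputs separable.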

If this is right

The method records an average PSNR gain of 1.70 dB over baselines together with corresponding SSIM and LPIPS improvements.
It produces the first explicit separation of background, hand, and object components in egocentric scenes.
The separated components support direct scene editing and component-wise understanding.
Brightness and motion-flow controls improve both static image quality and dynamic motion accuracy.
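As a rough guide to what a 1.70 dB PSNR gain means, the sketch below converts a gain on a single image into the implied ratio of mean squared errors, using the standard definition PSNR = 10·log10(MAX²/MSE). The baseline MSE of 1e-3 is an arbitrary illustrative value, not a number from the paper, and because the paper reports an average over scenes, the same ratio need not hold for the average MSE.

```python
import math


def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """MSE_new / MSE_baseline implied by a PSNR gain on a single image."""
    return 10.0 ** (-gain_db / 10.0)


ratio = mse_ratio_for_gain(1.70)       # roughly 0.68
baseline_mse = 1e-3                    # illustrative value only (assumption)
improved_mse = baseline_mse * ratio
gain = psnr(improved_mse) - psnr(baseline_mse)
```

On a single image, then, a 1.70 dB gain corresponds to roughly a one-third reduction in mean squared error.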

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same routing mechanism could be tested on third-person videos with multiple moving people to check whether category probabilities remain stable without retraining.
If the separation holds, downstream tasks such as hand tracking or object manipulation in AR could operate on the isolated hand or object branch alone.
A natural next measurement would be to quantify how much the explicit masks reduce leakage between categories on held-out interaction sequences.
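One way the leakage measurement suggested above could be set up is sketched here. The metric is an assumption, not something defined in the paper: it takes the per-pixel opacity rendered from a single branch and a binary reference mask for that category, and reports the share of opacity that lands outside the mask. The function name `leakage` and the toy arrays are hypothetical.

```python
def leakage(alpha, mask):
    """Fraction of a branch's rendered opacity that falls outside its mask.

    alpha: 2D list of per-pixel opacities rendered from one branch only.
    mask:  2D list of 0/1 values marking where that category should appear.
    """
    total = sum(sum(row) for row in alpha)
    if total == 0.0:
        return 0.0  # nothing rendered, so nothing leaked
    outside = sum(
        a
        for alpha_row, mask_row in zip(alpha, mask)
        for a, m in zip(alpha_row, mask_row)
        if not m
    )
    return outside / total


# Toy hand-branch render and hand mask (illustrative values only).
hand_alpha = [[0.0, 0.9, 0.8],
              [0.1, 0.7, 0.0]]
hand_mask = [[0, 1, 1],
             [0, 1, 0]]
score = leakage(hand_alpha, hand_mask)
```

A score near zero means the branch renders almost nothing outside its category; comparing scores with and without the masks on held-out sequences would quantify their effect.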

Load-bearing premise

Learnable category probabilities together with category-specific masks and brightness/motion-flow controls are enough to separate background, hands, and objects amid ego-motion, occlusions, and interactions without extra supervision or post-processing.

What would settle it

Egocentric video frames containing sustained hand-object overlaps in which the rendered background output still contains hand geometry or the hand output contains background texture.

Original abstract

Egocentric video is crucial for next-generation 4D scene reconstruction, with applications in AR/VR and embodied AI. However, reconstructing dynamic first-person scenes is challenging due to complex ego-motion, occlusions, and hand-object interactions. Existing decomposition methods are ill-suited, assuming fixed viewpoints or merging dynamics into a single foreground. To address these limitations, we introduce DP-DeGauss, a dynamic probabilistic Gaussian decomposition framework for egocentric 4D reconstruction. Our method initializes a unified 3D Gaussian set from COLMAP priors, augments each with a learnable category probability, and dynamically routes them into specialized deformation branches for background, hands, or object modeling. We employ category-specific masks for better disentanglement and introduce brightness and motion-flow control to improve static rendering and dynamic reconstruction. Extensive experiments show that DP-DeGauss outperforms baselines by +1.70dB in PSNR on average with SSIM and LPIPS gains. More importantly, our framework achieves the first and state-of-the-art disentanglement of background, hand, and object components, enabling explicit, fine-grained separation, paving the way for more intuitive ego scene understanding and editing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DP-DeGauss adds learnable category probabilities and dynamic routing to Gaussian splatting for egocentric scenes, delivering a modest PSNR gain, but the headline disentanglement claim rests on thin evidence.

The core contribution here is a probabilistic assignment of Gaussians to background, hand, or object categories, with routing to separate deformation branches plus some extra masks and brightness/motion controls. That setup is new relative to standard 3D Gaussian splatting pipelines and the decomposition methods cited in the abstract. The reported +1.7 dB PSNR improvement on rendering is concrete and worth noting for anyone already working in egocentric reconstruction.

Referee Report

3 major / 2 minor

Summary. The paper introduces DP-DeGauss, a dynamic probabilistic Gaussian decomposition framework for egocentric 4D scene reconstruction. It initializes a unified set of 3D Gaussians from COLMAP priors, augments each with a learnable category probability, and dynamically routes them into specialized deformation branches for background, hands, or objects. Category-specific masks are employed for disentanglement, along with brightness and motion-flow controls. Experiments report average gains of +1.70 dB PSNR (with SSIM/LPIPS improvements) over baselines and claim the first, state-of-the-art disentanglement of the three components.

Significance. If the disentanglement claims are substantiated, the work would be significant for AR/VR and embodied AI applications by enabling fine-grained, editable 4D scene components in dynamic first-person videos. The probabilistic routing of Gaussians to category-specific branches addresses a gap in existing decomposition methods that assume fixed viewpoints or single foregrounds. The reported rendering improvements and novel controls represent a practical advance on 3D Gaussian splatting for ego-motion and interaction challenges.

major comments (3)

Abstract and §5 (Experiments): The claim of achieving 'the first and state-of-the-art disentanglement of background, hand, and object components' is not supported by any category-specific quantitative metrics. Only aggregate rendering metrics (PSNR +1.70 dB, SSIM, LPIPS) are reported; no mask IoU, separation accuracy, per-component error analysis, or ablation on interaction robustness under occlusions/ego-motion is provided.
§4 (Method, Optimization): The framework reduces to a rendering loss on routed Gaussians with learnable category probabilities. No regularization, auxiliary loss, or analysis is described to prevent probability collapse or cross-category leakage, which directly undermines the sufficiency of the 'learnable category probability per Gaussian' plus masks/controls for robust disentanglement.
§5 (Experiments): Details on baseline implementations, dataset splits, number of scenes, and statistical significance of the +1.70 dB gain are missing. Without these, the outperformance claim and the 'SOTA disentanglement' assertion cannot be verified as load-bearing for the central contribution.
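The mask IoU named in the first major comment could be computed per category along the following lines. This is a sketch under assumptions: the per-pixel label maps are taken to come from the dominant rendered branch (prediction) and from a segmentation reference, and the function name and toy label maps are hypothetical.

```python
def per_category_iou(pred, ref, categories=("background", "hand", "object")):
    """Intersection-over-union of two per-pixel label maps, per category."""
    scores = {}
    for c in categories:
        inter = 0
        union = 0
        for pred_row, ref_row in zip(pred, ref):
            for p, r in zip(pred_row, ref_row):
                in_pred, in_ref = (p == c), (r == c)
                inter += in_pred and in_ref
                union += in_pred or in_ref
        # A category absent from both maps has no defined IoU.
        scores[c] = inter / union if union else float("nan")
    return scores


# Toy 2x3 label maps (illustrative values only).
pred = [["background", "hand", "hand"],
        ["background", "object", "object"]]
ref = [["background", "hand", "object"],
       ["background", "object", "object"]]
iou = per_category_iou(pred, ref)
```

Reporting these three numbers per scene, alongside the aggregate PSNR, SSIM, and LPIPS, would make the disentanglement claim directly checkable.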

minor comments (2)

[§3] The definition of the dynamic routing mechanism and brightness/motion-flow controls would benefit from an explicit equation or pseudocode in §3 to clarify how they interact with the category probabilities.
[Figures] Figure captions and legends should explicitly label which components (background/hand/object) are visualized in the disentanglement results to aid reader interpretation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment below and will make revisions to improve clarity and substantiation of our claims.

Referee: Abstract and §5 (Experiments): The claim of achieving 'the first and state-of-the-art disentanglement of background, hand, and object components' is not supported by any category-specific quantitative metrics. Only aggregate rendering metrics (PSNR +1.70 dB, SSIM, LPIPS) are reported; no mask IoU, separation accuracy, per-component error analysis, or ablation on interaction robustness under occlusions/ego-motion is provided.

Authors: We acknowledge that the manuscript relies on aggregate metrics and qualitative results to support the disentanglement. In the revision, we will add category-specific quantitative evaluations including per-component PSNR, mask IoU scores for background/hand/object separation, and ablations testing robustness under occlusions and ego-motion to better substantiate the claims.
revision: yes

Referee: §4 (Method, Optimization): The framework reduces to a rendering loss on routed Gaussians with learnable category probabilities. No regularization, auxiliary loss, or analysis is described to prevent probability collapse or cross-category leakage, which directly undermines the sufficiency of the 'learnable category probability per Gaussian' plus masks/controls for robust disentanglement.

Authors: The dynamic routing to specialized branches combined with category masks and motion/brightness controls provides implicit separation during optimization. However, we agree an explicit mechanism would strengthen robustness. We will add an entropy regularization term on the category probabilities in the revised §4, along with analysis demonstrating reduced collapse and leakage.
revision: yes

Referee: §5 (Experiments): Details on baseline implementations, dataset splits, number of scenes, and statistical significance of the +1.70 dB gain are missing. Without these, the outperformance claim and the 'SOTA disentanglement' assertion cannot be verified as load-bearing for the central contribution.

Authors: We will expand §5 with full details on baseline adaptations, exact dataset splits and scene counts, and statistical significance (e.g., standard deviations across runs). This will enable verification of the reported gains and support for the central claims.
revision: yes
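The entropy regularization term mentioned in the second simulated response is not specified further, so the following is only one plausible form. It assumes the penalty is the mean Shannon entropy of each Gaussian's category distribution, scaled by a hypothetical `weight`; the function names and the default weight are illustrative.

```python
import math


def category_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one Gaussian's category distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def entropy_regularizer(all_probs, weight=0.01):
    """Mean per-Gaussian entropy times a weight, to be added to the loss.

    Minimizing it pushes each Gaussian towards a confident, near one-hot
    assignment. The weight value is an assumption, not a reported setting.
    """
    mean_entropy = sum(category_entropy(p) for p in all_probs) / len(all_probs)
    return weight * mean_entropy


undecided = [1 / 3, 1 / 3, 1 / 3]   # maximally ambiguous Gaussian
confident = [1.0, 0.0, 0.0]         # fully committed to background
penalty = entropy_regularizer([undecided, confident])
```

A term of this form reduces ambiguous assignments, and with them cross-category leakage, but on its own it does not stop every Gaussian from collapsing into the same category; guarding against that would need a separate mechanism, such as the mask supervision the paper already uses.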

Circularity Check

0 steps flagged

No circularity: architectural innovations and experimental claims are independent of inputs

The paper proposes DP-DeGauss by initializing Gaussians from COLMAP, adding learnable category probabilities, dynamic routing to deformation branches, category-specific masks, and brightness/motion-flow controls. These are new parameters and architectural choices, not derived from or equivalent to prior fitted quantities. The +1.70 dB PSNR improvement and disentanglement claim are presented as outcomes of experiments on rendering loss, without any self-definitional reduction, fitted-input-as-prediction, or load-bearing self-citation chain. The derivation chain remains self-contained with respect to external components such as COLMAP initialization and standard Gaussian splatting losses.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim depends on the effectiveness of the new probabilistic routing mechanism and auxiliary controls; these are introduced without external validation or parameter-free derivation in the abstract.

free parameters (1)

learnable category probability per Gaussian
Each Gaussian receives an additional learnable probability value used for routing into background/hand/object branches.

axioms (1)

domain assumption: COLMAP structure-from-motion priors remain usable for initializing a unified Gaussian set even under ego-motion and dynamic interactions
The method begins by initializing from COLMAP despite the abstract noting that ego-motion and interactions make reconstruction challenging.

invented entities (2)

category-specific masks (no independent evidence)
purpose: Improve disentanglement between background, hands, and objects
Introduced as an additional mechanism for separation.

brightness and motion-flow control (no independent evidence)
purpose: Improve static rendering and dynamic reconstruction quality
New control signals added to the pipeline.

pith-pipeline@v0.9.0 · 5513 in / 1439 out tokens · 37046 ms · 2026-05-10T17:17:17.929576+00:00 · methodology

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 1 internal anchor

[1] DP-DeGauss: Dynamic Probabilistic Gaussian Decomposition for Egocentric 4D Scene Reconstruction (internal anchor; arXiv, 2026)
INTRODUCTION. Egocentric video offers a unique window into human activities, capturing continuous interactions between hands, objects, and the surrounding environment. With the rise of large-scale egocentric datasets, researchers have begun exploring 4D reconstruction and interaction modeling from this perspective [1, 2, 3, 4, 5]. However, dynamic scene re...

[2] METHODS. Our method (Fig. 2) is a dynamic probabilistic Gaussian decomposition framework from soft to hard for egocentric 4D scene reconstruction. Starting from the standard 3D Gaussian Splatting, we propose a unified Gaussian representation with learnable category probabilities for background, hand, and object, followed by category-level control stra...

[3] EXPERIMENT 3.1. Experimental Settings. Implementation Details. Our PyTorch-based implementation runs on a single RTX 3090 GPU. Scene boundaries and Gaussians are initialized from COLMAP [19] point clouds, with [21] and [22] used for hand and object segmentation. Training comprises 10k soft iterations, starting with a 1k-iteration warm-up focusing only on pr...

[4] CONCLUSION. We proposed DP-DeGauss, a dynamic probabilistic Gaussian decomposition framework from soft to hard for egocentric 4D reconstruction with explicit background–hand–object separation. By combining unified initialization, learnable category probabilities, and category-level controls, our method produces high-quality, fine-grained reconstructio...

[5] Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, and Li Yi, “Hoi4d: A 4d egocentric dataset for category-level human-object interaction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 21013–21022.

[6] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray, “Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100,” International Journal of Computer Vision (IJCV), vol. 130, pp. 33–55, 2022.

[7] Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, et al., “Hot3d: Hand and object tracking in 3d from egocentric multi-view videos,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 7061–7071.

[8] Zhaoyang Lv, Nicholas Charron, Pierre Moulon, Alexander Gamino, Cheng Peng, Chris Sweeney, Edward Miller, Huixuan Tang, Jeff Meissner, Jing Dong, Kiran Somasundaram, Luis Pesqueira, Mark Schwesinger, Omkar Parkhi, Qiao Gu, Renzo De Nardi, Shangyi Cheng, Steve Saarinen, Vijay Baiyya, Yuyang Zou, Richard Newcombe, Jakob Julian Engel, Xiaqing Pan, and Ca... “Aria everyday activities dataset,” 2024.

[9] Xiaqing Pan, Nicholas Charron, Yongqian Yang, Scott Peters, Thomas Whelan, Chen Kong, Omkar Parkhi, Richard Newcombe, and Yuheng (Carl) Ren, “Aria digital twin: A new benchmark dataset for egocentric 3d machine perception,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2023, pp. 20133–20143.

[10] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.

[11] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023.

[12] Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang, “4d gaussian splatting for real-time dynamic scene rendering,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 20310–20320.

[13] Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin, “Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 20331–20341.

[14] Dai Sun, Huhao Guan, Kun Zhang, Xike Xie, and S Kevin Zhou, “Sdd-4dgs: Static-dynamic aware decoupling in gaussian splatting for 4d scene reconstruction,” arXiv preprint arXiv:2503.09332, 2025.

[15] Jiahao Wu, Rui Peng, Zhiyan Wang, Lu Xiao, Luyang Tang, Jinbo Yan, Kaiqiang Xiong, and Ronggang Wang, “Swift4d: Adaptive divide-and-conquer gaussian splatting for compact and efficient reconstruction of dynamic scene,” arXiv preprint arXiv:2503.12307, 2025.

[16] Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, and Xi Wang, “Egogaussian: Dynamic scene understanding from egocentric video with 3d gaussian splatting,” in 2025 International Conference on 3D Vision (3DV). IEEE, 2025, pp. 1091–1102.

[17] Rui Wang, Quentin Lohmeyer, Mirko Meboldt, and Siyu Tang, “Degauss: Dynamic-static decomposition with gaussian splatting for distractor-free 3d reconstruction,” arXiv preprint arXiv:2503.13176, 2025.

[18] Yufei Ye, Poorvi Hebbar, Abhinav Gupta, and Shubham Tulsiani, “Diffusion-guided reconstruction of everyday hand-object interaction clips,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 19717–19728.

[19] Zhifan Zhu and Dima Damen, “Get a grip: Reconstructing hand-object stable grasps in egocentric videos,” arXiv preprint arXiv:2312.15719, 2023.

[20] Lixin Yang, Xinyu Zhan, Kailin Li, Wenqiang Xu, Jiefeng Li, and Cewu Lu, “Cpf: Learning a contact potential field to model the hand-object interaction,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 11097–11106.

[21] Zicong Fan, Maria Parelli, Maria Eleni Kadoglou, Xu Chen, Muhammed Kocabas, Michael J Black, and Otmar Hilliges, “Hold: Category-agnostic 3d reconstruction of interacting hands and objects from video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 494–504.

[22] Wentian Qu, Zhaopeng Cui, Yinda Zhang, Chenyu Meng, Cuixia Ma, Xiaoming Deng, and Hongan Wang, “Novel-view synthesis and pose estimation for hand-object interaction from sparse views,” in Proceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 15100–15111.

[23] Johannes L Schonberger and Jan-Michael Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4104–4113.

[24] Ruijie Zhu, Yanzhe Liang, Hanzhi Chang, Jiacheng Deng, Jiahao Lu, Wenfei Yang, Tianzhu Zhang, and Yongdong Zhang, “Motiongs: Exploring explicit motion guidance for deformable 3d gaussian splatting,” Advances in Neural Information Processing Systems, vol. 37, pp. 101790–101817, 2024.

[25] Lingzhi Zhang, Shenghao Zhou, Simon Stent, and Jianbo Shi, “Fine-grained egocentric hand-object segmentation: Dataset, model, and applications,” in European Conference on Computer Vision. Springer, 2022, pp. 127–145.

[26] Jinyu Yang, Mingqi Gao, Zhe Li, Shang Gao, Fangjing Wang, and Feng Zheng, “Track anything: Segment anything meets videos,” arXiv preprint arXiv:2304.11968, 2023.

[27] Vadim Tschernezki, Diane Larlus, and Andrea Vedaldi, “Neuraldiff: Segmenting 3d objects that move in egocentric videos,” in 2021 International Conference on 3D Vision (3DV). IEEE, 2021, pp. 910–919.
