Flux4D: Flow-based Unsupervised 4D Reconstruction
Pith reviewed 2026-05-17 01:55 UTC · model grok-4.3
The pith
Flux4D reconstructs large-scale dynamic scenes unsupervised by predicting 3D Gaussians and their motion dynamics from raw data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an 'as static as possible' regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors simply by training across many scenes.
What carries the argument
Direct prediction of 3D Gaussians together with their motion dynamics, regularized to remain as static as possible, which decomposes moving elements across multiple raw scenes.
If this is right
- Dynamic scenes can be reconstructed efficiently within seconds.
- The method scales to large collections of driving data without per-scene tuning.
- It generalizes to unseen environments and to rare or unknown objects.
- It outperforms existing methods on scalability, generalization, and reconstruction quality for outdoor driving datasets.
Where Pith is reading between the lines
- The same regularization principle could be tested on indoor sequences to check whether static assumptions hold when objects interact closely.
- Combining the predicted Gaussians with existing SLAM pipelines might improve real-time dynamic mapping in vehicles.
- If the multi-scene training proves robust, similar direct-prediction approaches could be applied to other 3D representations beyond Gaussians.
Load-bearing premise
An 'as static as possible' regularization term combined with photometric losses is sufficient to correctly decompose dynamic elements from raw multi-scene data without any pre-trained supervised models or foundational priors.
What would settle it
Reconstruction quality on a held-out driving sequence containing previously unseen object motions, such as an unusual pedestrian path, where Gaussians either fail to track the motion or produce visible artifacts in novel views.
Figures
read the original abstract
Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents Flux4D, a scalable unsupervised framework for 4D reconstruction of large-scale dynamic scenes. It directly predicts 3D Gaussians together with per-Gaussian motion dynamics, trained end-to-end across many scenes using only photometric reconstruction losses plus an 'as static as possible' regularization term. The method claims to decompose static and dynamic elements without pre-trained models, annotations, or per-scene optimization, enabling second-scale inference, strong generalization to unseen objects, and superior performance on outdoor driving datasets relative to prior self-supervised approaches.
Significance. If the central unsupervised decomposition claim holds with rigorous verification, the work would be significant for enabling annotation-free 4D reconstruction at scale, with direct relevance to robotics and autonomous driving. The multi-scene training strategy and avoidance of foundational priors are notable strengths that could improve generalization over per-scene methods.
major comments (3)
- [Abstract and §4] Abstract and §4 (Experiments): the claim of 'significantly outperforms existing methods' is unsupported by any reported quantitative metrics, error bars, ablation tables, or details on how the static-regularization weight was selected or validated across datasets. Without these, the central experimental claims cannot be assessed.
- [§3.2] §3.2 (Regularization): the 'as static as possible' term is load-bearing for the unsupervised motion decomposition claim, yet it is unclear whether its weight is a fixed hyperparameter or effectively tuned per dataset. If the latter, the decomposition may be circular rather than emergent from photometric losses alone.
- [§3] §3 (Method): the paper does not report any diagnostic that the learned per-Gaussian trajectories match independent motion cues (e.g., optical flow or LiDAR) rather than absorbing dynamics into static Gaussian attributes (position, opacity, or SH coefficients). This leaves the under-constrained decomposition unverified.
minor comments (2)
- [§3.2] Clarify the exact mathematical form of the regularization term and its weighting schedule in the loss.
- [§5] Add a limitations paragraph discussing failure modes on highly dynamic or occluded scenes.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our manuscript. We address each of the major comments below and outline the corresponding revisions.
read point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Experiments): the claim of 'significantly outperforms existing methods' is unsupported by any reported quantitative metrics, error bars, ablation tables, or details on how the static-regularization weight was selected or validated across datasets. Without these, the central experimental claims cannot be assessed.
Authors: We agree that quantitative support is essential for the performance claims. In the revised version, we will add detailed quantitative comparisons, including tables with PSNR, SSIM, and other metrics, along with error bars from multiple runs. We will also include ablation studies on the regularization weight selection process, validated across datasets using a held-out validation set. revision: yes
-
Referee: [§3.2] §3.2 (Regularization): the 'as static as possible' term is load-bearing for the unsupervised motion decomposition claim, yet it is unclear whether its weight is a fixed hyperparameter or effectively tuned per dataset. If the latter, the decomposition may be circular rather than emergent from photometric losses alone.
Authors: The regularization weight is a fixed hyperparameter used consistently across all experiments and datasets. We will revise §3.2 to explicitly state this and provide the specific value along with justification based on preliminary experiments on a small set of scenes to ensure the decomposition emerges primarily from the photometric losses and multi-scene training. revision: yes
-
Referee: [§3] §3 (Method): the paper does not report any diagnostic that the learned per-Gaussian trajectories match independent motion cues (e.g., optical flow or LiDAR) rather than absorbing dynamics into static Gaussian attributes (position, opacity, or SH coefficients). This leaves the under-constrained decomposition unverified.
Authors: We acknowledge the value of such diagnostics for verifying the motion decomposition. In the revised manuscript, we will add qualitative and quantitative comparisons of the predicted trajectories against optical flow and LiDAR-derived motion where available, to demonstrate that dynamics are captured in the per-Gaussian motion parameters rather than static attributes. revision: yes
Circularity Check
No significant circularity: derivation self-contained via proposed regularization and multi-scene training
full rationale
The abstract and provided text present Flux4D as a new framework that directly predicts 3D Gaussians and motion dynamics using only photometric losses plus an 'as static as possible' regularization term, trained across many scenes to decompose dynamics without pre-trained models. No equations, self-citations, or fitted parameters are shown that reduce any prediction or uniqueness claim to the inputs by construction. The central inductive bias is introduced as an external regularization choice rather than derived from or equivalent to the target outputs, leaving the derivation independent and self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- static regularization weight
axioms (1)
- domain assumption Photometric loss between rendered and observed images is a sufficient signal for 3D structure and motion.
Reference graph
Works this paper leans on
-
[1]
Uno: Unsuper- vised occupancy fields for perception and forecasting
Ben Agro, Quinlan Sykora, Sergio Casas, Thomas Gilles, and Raquel Urtasun. Uno: Unsuper- vised occupancy fields for perception and forecasting. InCVPR, 2024. 3
work page 2024
-
[2]
pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction
David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. InCVPR, 2024. 2, 8
work page 2024
-
[3]
Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo
Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, and Hao Su. Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. InICCV,
-
[4]
Rethinking Atrous Convolution for Semantic Image Segmentation
Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation.arXiv preprint arXiv:1706.05587, 2017. 3, 7
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[5]
Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images
Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. InECCV, 2024. 2, 8
work page 2024
-
[6]
SaLF: Sparse Local Fields for Multi-Sensor Rendering in Real-Time
Yun Chen, Matthew Haines, Jingkang Wang, Krzysztof Baron-Lis, Sivabalan Manivasagam, Ze Yang, and Raquel Urtasun. Salf: Sparse local fields for multi-sensor rendering in real-time. arXiv preprint arXiv:2507.18713, 2025. 1
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[7]
G3R: Gradient guided generalizable reconstruction
Yun Chen, Jingkang Wang, Ze Yang, Sivabalan Manivasagam, and Raquel Urtasun. G3R: Gradient guided generalizable reconstruction. InECCV, 2025. 2, 3, 5, 7, 8
work page 2025
-
[8]
Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering
Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, and Li Zhang. Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering.arXiv preprint arXiv:2311.18561,
-
[9]
Vision transformer adapter for dense predictions
Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter for dense predictions. InICLR, 2023. 3
work page 2023
-
[10]
Omnire: Omni urban scene reconstruction.arXiv preprint arXiv:2408.16760, 2024
Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, et al. Omnire: Omni urban scene reconstruction.arXiv preprint arXiv:2408.16760, 2024. 2
-
[11]
Re-evaluating lidar scene flow for autonomous driving
Nathaniel Chodosh, Deva Ramanan, and Simon Lucey. Re-evaluating lidar scene flow for autonomous driving. InWACV, 2024. 7
work page 2024
-
[12]
Vista: A generalizable driving world model with high fidelity and versatile controllability
Shenyuan Gao, Jiazhi Yang, Li Chen, Kashyap Chitta, Yihang Qiu, Andreas Geiger, Jun Zhang, and Hongyang Li. Vista: A generalizable driving world model with high fidelity and versatile controllability.arXiv preprint arXiv:2405.17398, 2024. 3
-
[13]
Splatad: Real-time li- dar and camera rendering with 3d gaussian splatting for au- tonomous driving
Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, and Lennart Svensson. Splatad: Real-time lidar and camera rendering with 3d gaussian splatting for autonomous driving.arXiv preprint arXiv:2411.16816, 2024. 2
-
[14]
LRM: Large reconstruction model for single image to 3d
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. LRM: Large reconstruction model for single image to 3d. InICLR, 2024. 2
work page 2024
-
[15]
GAIA-1: A Generative World Model for Autonomous Driving
Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. Gaia-1: A generative world model for autonomous driving. arXiv preprint arXiv:2309.17080, 2023. 3 11
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[16]
Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. Metric3d v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. InTPAMI, 2024. 9, 10
work page 2024
-
[17]
Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, and Shanghang Zhang. S3gaussian: Self-supervised street gaussians for autonomous driving.arXiv preprint arXiv:2405.20323, 2024. 2, 5
-
[18]
3D gaussian splatting for real-time radiance field rendering
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D gaussian splatting for real-time radiance field rendering. InTOG, 2023. 1, 2, 4
work page 2023
-
[19]
Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction
Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, and Bingbing Liu. Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction.arXiv preprint arXiv:2407.02598, 2024. 1
-
[20]
I can’t believe it’s not scene flow! InECCV, 2024
Ishan Khatri, Kyle Vedder, Neehar Peri, Deva Ramanan, and James Hays. I can’t believe it’s not scene flow! InECCV, 2024. 7, 8
work page 2024
-
[21]
Point cloud forecasting as a proxy for 4d occupancy forecasting
Tarasha Khurana, Peiyun Hu, David Held, and Deva Ramanan. Point cloud forecasting as a proxy for 4d occupancy forecasting. InCVPR, 2023. 3
work page 2023
-
[22]
Flow4d: Leveraging 4d voxel network for lidar scene flow estimation
Jaeyeul Kim, Jungwan Woo, Ukcheol Shin, Jean Oh, and Sunghoon Im. Flow4d: Leveraging 4d voxel network for lidar scene flow estimation. InRA-L, 2025. 8
work page 2025
-
[23]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In ICCV, 2023. 3, 7
work page 2023
-
[24]
Freegave: 3d physics learning from dynamic videos by gaussian velocity
Jinxi Li, Ziyang Song, Siyuan Zhou, and Bo Yang. Freegave: 3d physics learning from dynamic videos by gaussian velocity. InCVPR, 2025. 4, 5
work page 2025
-
[25]
Uniflow: Towards zero-shot lidar scene flow for autonomous vehicles via cross-domain generalization
Siyi Li, Qingwen Zhang, Ishan Khatri, Kyle Vedder, Deva Ramanan, and Neehar Peri. Uniflow: Towards zero-shot lidar scene flow for autonomous vehicles via cross-domain generalization. arXiv preprint arXiv:2511.18254, 2025. 8
-
[26]
Xiaofan Li, Yifu Zhang, and Xiaoqing Ye. Drivingdiffusion: Layout-guided multi-view driving scenarios video generation with latent diffusion model. InECCV, 2024. 3
work page 2024
-
[27]
Xueqian Li, Jhony Kaesemodel Pontes, and Simon Lucey. Neural scene flow prior. InNeurIPS,
-
[28]
Xueqian Li, Jianqiao Zheng, Francesco Ferroni, Jhony Kaesemodel Pontes, and Simon Lucey. Fast neural scene flow. InCVPR, 2023. 7, 9
work page 2023
-
[29]
Real-time neural rasterization for large scenes
Jeffrey Yunfan Liu, Yun Chen, Ze Yang, Jingkang Wang, Sivabalan Manivasagam, and Raquel Urtasun. Real-time neural rasterization for large scenes. InICCV, 2023. 2
work page 2023
-
[30]
Hao Lu, Tianshuo Xu, Wenzhao Zheng, Yunpeng Zhang, Wei Zhan, Dalong Du, Masayoshi Tomizuka, Kurt Keutzer, and Yingcong Chen. Drivingrecon: Large 4d gaussian reconstruction model for autonomous driving.arXiv preprint arXiv:2412.09043, 2024. 2, 3, 4, 6, 7, 8, 9
-
[31]
Towards zero domain gap: A comprehensive study of realistic LiDAR simulation for autonomy testing
Sivabalan Manivasagam, Ioan Andrei Bârsan, Jingkang Wang, Ze Yang, and Raquel Urtasun. Towards zero domain gap: A comprehensive study of realistic LiDAR simulation for autonomy testing. InICCV, 2023. 1
work page 2023
-
[32]
Nerf: Representing scenes as neural radiance fields for view synthesis
Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. InECCV,
-
[33]
Driveworld: 4d pre-trained scene understanding via world models for autonomous driving
Chen Min, Dawei Zhao, Liang Xiao, Jian Zhao, Xinli Xu, Zheng Zhu, Lei Jin, Jianshu Li, Yulan Guo, Junliang Xing, et al. Driveworld: 4d pre-trained scene understanding via world models for autonomous driving. InCVPR, 2024. 3 12
work page 2024
-
[34]
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision.arXiv preprint arXiv:2304.07193, 2023. 7
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[35]
Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. Neural scene graphs. InCVPR, 2021. 2
work page 2021
-
[36]
Nerfies: Deformable neural radiance fields
Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. InICCV, 2021. 2
work page 2021
-
[37]
Chensheng Peng, Chengwei Zhang, Yixiao Wang, Chenfeng Xu, Yichen Xie, Wenzhao Zheng, Kurt Keutzer, Masayoshi Tomizuka, and Wei Zhan. Desire-gs: 4d street gaussians for static- dynamic decomposition and surface reconstruction for urban driving scenes.arXiv preprint arXiv:2411.11921, 2024. 2, 4, 5, 6, 7, 8
-
[38]
D-nerf: Neural radiance fields for dynamic scenes
Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. InCVPR, 2021. 2
work page 2021
-
[39]
Neural lighting simulation for urban scenes
Ava Pun, Gary Sun, Jingkang Wang, Yun Chen, Ze Yang, Sivabalan Manivasagam, Wei-Chiu Ma, and Raquel Urtasun. Neural lighting simulation for urban scenes. InNeurIPS, 2023. 2
work page 2023
-
[40]
L4gm: Large 4d gaussian reconstruction model
Jiawei Ren, Cheng Xie, Ashkan Mirzaei, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling, et al. L4gm: Large 4d gaussian reconstruction model. In NeurIPS, 2025. 2, 5, 7, 8
work page 2025
-
[41]
Scube: Instant large-scale scene reconstruction using voxsplats
Xuanchi Ren, Yifan Lu, Hanxue Liang, Jay Zhangjie Wu, Huan Ling, Mike Chen, Francis Fidler, Sanja annd Williams, and Jiahui Huang. Scube: Instant large-scale scene reconstruction using voxsplats. InNeurIPS, 2024. 2
work page 2024
-
[42]
Scalability in perception for autonomous driving: Waymo open dataset
Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi, Yu Zhang, Jonathon Shlens, Zhifeng Chen, and Dragomir Anguelov. Scalability in perception...
work page 2020
-
[43]
Torchsparse++: Efficient training and inference framework for sparse convolution on gpus
Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, and Song Han. Torchsparse++: Efficient training and inference framework for sparse convolution on gpus. InMICRO, 2023. 7
work page 2023
-
[44]
Lgm: Large multi-view gaussian model for high-resolution 3d content creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaussian model for high-resolution 3d content creation. InECCV, 2024. 8
work page 2024
-
[45]
NeuRAD: Neural rendering for autonomous driving
Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. NeuRAD: Neural rendering for autonomous driving. InCVPR, 2024. 1, 5, 7, 8
work page 2024
-
[46]
Haithem Turki, Qi Wu, Xin Kang, Janick Martinez Esturo, Shengyu Huang, Ruilong Li, Zan Gojcic, and Riccardo de Lutio. Simuli: Real-time lidar and camera simulation with unscented transforms.arXiv preprint arXiv:2510.12901, 2025. 1
-
[47]
Suds: Scalable urban dynamic scenes
Haithem Turki, Jason Y Zhang, Francesco Ferroni, and Deva Ramanan. Suds: Scalable urban dynamic scenes. InCVPR, 2023. 2
work page 2023
-
[48]
Neural eulerian scene flow fields
Kyle Vedder, Neehar Peri, Ishan Khatri, Siyi Li, Eric Eaton, Mehmet Kemal Kocamaz, Yue Wang, Zhiding Yu, Deva Ramanan, and Joachim Pehserl. Neural eulerian scene flow fields. In ICLR, 2025. 7
work page 2025
-
[49]
CADSim: Robust and scalable in-the-wild 3d reconstruction for controllable sensor simulation
Jingkang Wang, Sivabalan Manivasagam, Yun Chen, Ze Yang, Ioan Andrei Bârsan, Anqi Joyce Yang, Wei-Chiu Ma, and Raquel Urtasun. CADSim: Robust and scalable in-the-wild 3d reconstruction for controllable sensor simulation. InCoRL, 2022. 2
work page 2022
-
[50]
Advsim: Generating safety-critical scenarios for self-driving vehicles
Jingkang Wang, Ava Pun, James Tu, Sivabalan Manivasagam, Abbas Sadat, Sergio Casas, Mengye Ren, and Raquel Urtasun. Advsim: Generating safety-critical scenarios for self-driving vehicles. InCVPR, 2021. 1 13
work page 2021
-
[51]
Ibrnet: Learning multi-view image-based rendering
Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P Srinivasan, Howard Zhou, Jonathan T Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser. Ibrnet: Learning multi-view image-based rendering. InCVPR, 2021. 2
work page 2021
-
[52]
Drive- dreamer: Towards real-world-drive world models for autonomous driving
Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiagang Zhu, and Jiwen Lu. Drive- dreamer: Towards real-world-drive world models for autonomous driving. InECCV, 2024. 3
work page 2024
-
[53]
Meshlrm: Large reconstruction model for high-quality meshes.arXiv preprint arXiv:2404.12385, 2024
Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, and Zexiang Xu. Meshlrm: Large reconstruction model for high-quality meshes.arXiv preprint arXiv:2404.12385, 2024. 2
-
[54]
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
Benjamin Wilson, William Qi, Tanmay Agarwal, John Lambert, Jagjeet Singh, Siddhesh Khandelwal, Bowen Pan, Ratnesh Kumar, Andrew Hartnett, Jhony Kaesemodel Pontes, et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting.arXiv preprint arXiv:2301.00493, 2023. 9
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[55]
4d gaussian splatting for real-time dynamic scene rendering
Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. In CVPR, 2024. 2
work page 2024
-
[56]
Dˆ 2nerf: Self-supervised decoupling of dynamic and static objects from a monocular video
Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, and Cengiz Oztireli. Dˆ 2nerf: Self-supervised decoupling of dynamic and static objects from a monocular video. In NeurIPS, 2022. 2, 4, 5
work page 2022
-
[57]
Pandaset: Advanced sensor suite dataset for autonomous driving
Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. InITSC, 2021. 2, 6
work page 2021
-
[58]
Depthsplat: Connecting gaussian splatting and depth.arXiv preprint arXiv:2410.13862, 2024
Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, and Marc Pollefeys. Depthsplat: Connecting gaussian splatting and depth.arXiv preprint arXiv:2410.13862, 2024. 5, 7
-
[59]
Street gaussians for modeling dynamic urban scenes
Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians for modeling dynamic urban scenes. In ECCV, 2024. 1, 2, 5, 7, 8
work page 2024
-
[60]
Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Maximilian Igl, Apoorva Sharma, Peter Karkus, Danfei Xu, Boris Ivanovic, Yue Wang, and Marco Pavone. Storm: Spatio-temporal reconstruction model for large-scale outdoor scenes.arXiv preprint arXiv:2501.00602, 2025. 2, 3, 4, 5, 6, 7, 8, 9
-
[61]
Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. Emernerf: Emergent spatial-temporal scene decomposition via self-supervision.arXiv preprint arXiv:2311.02077, 2023. 2, 4, 5, 6, 7, 8
-
[62]
Unisim: A neural closed-loop sensor simulator
Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. Unisim: A neural closed-loop sensor simulator. InCVPR, 2023. 1, 2
work page 2023
-
[63]
Reconstructing objects in-the-wild for realistic sensor simulation
Ze Yang, Sivabalan Manivasagam, Yun Chen, Jingkang Wang, Rui Hu, and Raquel Urtasun. Reconstructing objects in-the-wild for realistic sensor simulation. InICRA, 2023. 2
work page 2023
-
[64]
Genassets: Generating in-the-wild 3d assets in latent space
Ze Yang, Jingkang Wang, Haowei Zhang, Sivabalan Manivasagam, Yun Chen, and Raquel Urtasun. Genassets: Generating in-the-wild 3d assets in latent space. InCVPR, 2025. 2
work page 2025
-
[65]
Visual point cloud forecasting enables scalable autonomous driving
Zetong Yang, Li Chen, Yanan Sun, and Hongyang Li. Visual point cloud forecasting enables scalable autonomous driving. InCVPR, 2024. 3
work page 2024
-
[66]
Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction
Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. InCVPR, 2024. 2
work page 2024
-
[67]
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann, Siyu Tang, and Jan Eric Lenssen. Improving 2D Feature Representations by 3D-Aware Fine-Tuning. InECCV, 2024. 6 14
work page 2024
-
[68]
Haiming Zhang, Wending Zhou, Yiyao Zhu, Xu Yan, Jiantao Gao, Dongfeng Bai, Yingjie Cai, Bingbing Liu, Shuguang Cui, and Zhen Li. Visionpad: A vision-centric pre-training paradigm for autonomous driving.arXiv preprint arXiv:2411.14716, 2024. 4
-
[69]
GS-LRM: Large reconstruction model for 3D gaussian splatting
Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. GS-LRM: Large reconstruction model for 3D gaussian splatting. InECCV, 2025. 2
work page 2025
-
[70]
Learning unsupervised world models for autonomous driving via discrete diffusion
Lunjun Zhang, Yuwen Xiong, Ze Yang, Sergio Casas, Rui Hu, and Raquel Urtasun. Learning unsupervised world models for autonomous driving via discrete diffusion. InICLR, 2024. 3
work page 2024
-
[71]
Occworld: Learning a 3d occupancy world model for autonomous driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, and Jiwen Lu. Occworld: Learning a 3d occupancy world model for autonomous driving. InECCV, 2024. 3
work page 2024
-
[72]
DrivingGaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes
Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. DrivingGaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. InCVPR, 2024. 1
work page 2024
-
[73]
Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats
Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, and Zexiang Xu. Long-lrm: Long-sequence large reconstruction model for wide-coverage gaussian splats. arXiv preprint arXiv:2410.12781, 2024. 3 15
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.