pith. sign in

arxiv: 2605.29997 · v1 · pith:AEP723VOnew · submitted 2026-05-28 · 💻 cs.CV

FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views

Pith reviewed 2026-06-29 08:16 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian splattingdynamic scene reconstructioncollaborative drivinguncalibrated multi-viewocclusion fieldresidual injectionfeedforward frameworkmulti-agent fusion
0
0 comments X

The pith

FRUC performs one-shot dynamic scene reconstruction from uncalibrated multi-vehicle views by deriving ego-centric occlusion priors for residual fusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FRUC as a feed-forward 3D Gaussian splatting method for reconstructing dynamic scenes in collaborative driving from uncalibrated views of multiple vehicles. It reframes the problem as enhancing an ego vehicle's view with collaborative data without calibration or per-scene optimization. The approach models the system as an ego-centric multi-camera setup and uses spatio-temporal correlations to build an occlusion field as priors. These priors guide a residual denoising process with zero initialization to complete hidden areas safely. Evaluations on real datasets show it surpasses previous methods in quality and speed.

Core claim

FRUC is a feed-forward framework that builds an ego-centric causal occlusion field from uncalibrated cross-agent spatio-temporal correlations to obtain latent priors for occlusion evolution, then uses these to guide cross-agent integration as a deterministic residual denoising process through zero-initialized injection, enabling robust collaborative blind-spot completion while preserving the ego vehicle's geometry.

What carries the argument

Ego-centric causal occlusion field derived from agent-wise spatio-temporal correlations that provides latent priors for modeling occlusion evolution, which guides the zero-initialized residual injection for cross-agent fusion.

If this is right

  • Supports one-shot, calibration-free inference from a variable number of multi-vehicle views using a visual grounded geometric Transformer backbone.
  • Achieves non-destructive geometric supplementation for occluded regions in dynamic scenes.
  • Converts challenging cross-agent fusion into bounded residual learning for reliable blind-spot completion.
  • Delivers state-of-the-art rendering quality and efficiency on the V2XReal and UrbanIng-V2X datasets for dynamic collaborative driving environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • May allow autonomous vehicle fleets to share views for better perception without requiring synchronized calibration procedures.
  • Could apply the occlusion prior idea to other distributed camera networks facing misalignment issues.
  • Raises the prospect of testing the residual injection approach on synthetic data with controlled misalignment levels to isolate its contribution.

Load-bearing premise

The ego-centric causal occlusion field from uncalibrated cross-agent correlations supplies reliable latent priors that permit non-destructive blind-spot completion without harming the ego vehicle's accurately observed geometry.

What would settle it

Measuring if novel view synthesis quality on a test set of collaborative driving data drops when the occlusion field is removed or when cross-agent views have large uncalibrated errors compared to using only ego views.

read the original abstract

We present FRUC, a feed-forward 3D Gaussian splatting framework for dynamic scene reconstruction from uncalibrated collaborative driving views. Existing multi-agent reconstruction frameworks are often hindered by rigid prerequisites, demanding precise spatial calibration and slow per-scene optimization. In this paper, we rethink this task by conceptualizing a distributed multi-vehicle network as a spatio-temporally unstructured ego-centric multi-camera system, where the core challenge lies in enhancing ego-centric occluded geometry through collaboration without degrading the ego's accurately observed visible geometry, while preserving reconstruction efficiency. For efficient reconstruction, FRUC is built upon a visual grounded geometric Transformer backbone to enable one-shot, calibration-free inference from a flexible number of multi-vehicle views. To achieve non-destructive geometric supplementation under uncalibrated cross-agent misalignment, FRUC first introduces an ego-centric causal occlusion field that explicitly derives occlusion evolution as latent priors by modeling agent-wise spatio-temporal correlations. Guided by these occlusion priors, it further formulates cross-agent integration as a deterministic residual denoising process via zero-initialized injection, turning challenging cross-agent fusion into bounded residual learning for robust collaborative blind-spot completion. Through extensive evaluations on the real-world V2XReal and UrbanIng-V2X datasets, FRUC is shown to be a new state-of-the-art for the scene reconstruction of dynamic collaborative driving environments, significantly outperforming existing methods in both rendering quality and efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper presents FRUC, a feed-forward 3D Gaussian splatting framework for dynamic scene reconstruction from uncalibrated collaborative driving views. It models multi-vehicle setups as an ego-centric multi-camera system and introduces a visual grounded geometric Transformer backbone for one-shot calibration-free inference. The core technical contributions are an ego-centric causal occlusion field that derives latent priors from agent-wise spatio-temporal correlations and a deterministic residual denoising process using zero-initialized injection to complete blind spots without degrading ego-visible geometry. Extensive evaluations on the V2XReal and UrbanIng-V2X datasets are reported to establish state-of-the-art performance in rendering quality and efficiency over existing methods.

Significance. If the results hold, this represents a meaningful step toward practical collaborative 3D reconstruction in autonomous driving by replacing per-scene optimization with feed-forward inference while explicitly addressing cross-agent misalignment. The manuscript strengthens its central claim through ablations that directly test the non-degradation property on ego-visible regions, and the architectural description (Transformer backbone, causal occlusion modeling, zero-init residual path) is internally consistent.

minor comments (3)
  1. Abstract: the claim of 'extensive evaluations' and 'significantly outperforming' would be more informative if a brief summary of key quantitative metrics (e.g., PSNR, SSIM, runtime) and the number of baselines were included, even if full tables appear later.
  2. Method section (around the residual injection formulation): the description of how the zero-initialized injection interacts with the Transformer features could be expanded with a short pseudocode or explicit equation showing the bounded residual update to improve reproducibility.
  3. Experiments: ensure all reported improvements include standard deviations across scenes or multiple runs, and clarify whether the same set of dynamic objects is used for both qualitative and quantitative comparisons.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work, the recognition of its potential significance for practical collaborative reconstruction in autonomous driving, and the recommendation for minor revision. The report correctly identifies the core technical elements (geometric Transformer, ego-centric causal occlusion field, zero-initialized residual denoising) and notes the strength of our ablations on the non-degradation property.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper presents FRUC as a new feed-forward framework built on a visual grounded geometric Transformer backbone, introducing an ego-centric causal occlusion field and zero-initialized residual injection for collaborative reconstruction. No equations, fitted parameters, or self-citations are shown that reduce the claimed outputs (rendering quality, efficiency, non-destructive supplementation) to the inputs by construction. The derivation chain consists of novel architectural choices tested via ablations on external datasets (V2XReal, UrbanIng-V2X), remaining self-contained without self-definitional loops, fitted-input predictions, or load-bearing self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The abstract relies on the unproven effectiveness of the newly introduced occlusion field and residual injection mechanism; no free parameters are explicitly named, but the Transformer backbone and occlusion modeling are treated as working without further justification.

axioms (2)
  • domain assumption A visual grounded geometric Transformer backbone enables one-shot, calibration-free inference from a flexible number of multi-vehicle views.
    Invoked as the core efficiency mechanism without supporting derivation.
  • domain assumption Modeling agent-wise spatio-temporal correlations produces reliable latent priors for occlusion evolution.
    Central to the non-destructive supplementation claim.
invented entities (2)
  • ego-centric causal occlusion field no independent evidence
    purpose: Explicitly derives occlusion evolution as latent priors from uncalibrated views.
    New modeling construct introduced to guide fusion.
  • zero-initialized injection no independent evidence
    purpose: Turns cross-agent fusion into bounded residual learning for blind-spot completion.
    New formulation for integration without harming visible geometry.

pith-pipeline@v0.9.1-grok · 5788 in / 1420 out tokens · 33791 ms · 2026-06-29T08:16:52.000901+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

46 extracted references · 17 canonical work pages · 4 internal anchors

  1. [1]

    pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction

    David Charatan, Sizhe Li, Andrea Tagliasacchi, and Vincent Sitzmann. pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. InCVPR, 2024

  2. [3]

    URLhttps://arxiv.org/abs/2512.03004

  3. [4]

    Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images

    Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, and Jianfei Cai. Mvsplat: Efficient 3d gaussian splatting from sparse multi-view images. In European Conference on Computer Vision, pages 370–386, Cham, 2025

  4. [5]

    Freesim: Towardfree-viewpoint camera simulation in driving scenes

    LueFan, HaoZhang, QitaiWang, HongshengLi, andZhaoxiangZhang. Freesim: Towardfree-viewpoint camera simulation in driving scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12004–12014, June 2025

  5. [6]

    PACP: Priority-Aware Collaborative Perception for Connected and Autonomous Vehicles.IEEE Transactions on Mobile Computing, 23(12):15003–15018, 2024

    Zhengru Fang, Senkang Hu, Haonan An, Yuang Zhang, Jingjing Wang, Hangcheng Cao, Xianhao Chen, and Yuguang Fang. PACP: Priority-Aware Collaborative Perception for Connected and Autonomous Vehicles.IEEE Transactions on Mobile Computing, 23(12):15003–15018, 2024

  6. [7]

    R-acp: Real-time adaptive collaborative perception leveraging robust task-oriented communications

    Zhengru Fang, Jingjing Wang, Yanan Ma, Yihang Tao, Yiqin Deng, Xianhao Chen, and Yuguang Fang. R-acp: Real-time adaptive collaborative perception leveraging robust task-oriented communications. IEEE Journal on Selected Areas in Communications, 43(12):4215–4230, 2025. doi: 10.1109/JSAC.2025. 3623179

  7. [8]

    Agent-Centric Observation Adaptation for Robust Visual Control under Dynamic Perturbations

    Zhengru Fang, Yu Guo, Fei Liu, Yuang Zhang, Yihang Tao, Senkang Hu, Wenbo Ding, and Yuguang Fang. Agent-centric visual reinforcement learning under dynamic perturbations.arXiv preprint arXiv:2604.24661, 2026

  8. [9]

    Onerestore: A universal restoration framework for composite degradation

    Yu Guo, Yuan Gao, Yuxu Lu, Huilin Zhu, Ryan Wen Liu, and Shengfeng He. Onerestore: A universal restoration framework for composite degradation. InEuropean conference on computer vision, pages 255–272. Springer, 2024

  9. [10]

    Neptune-x: Active x-to-maritime generation for universal maritime object detection.Advances in Neural Information Processing Systems, 38:146587–146614, 2026

    Yu Guo, Shengfeng He, Yuxu Lu, Haonan An, Yihang Tao, Huilin Zhu, Jingxian Liu, and Yuguang Fang. Neptune-x: Active x-to-maritime generation for universal maritime object detection.Advances in Neural Information Processing Systems, 38:146587–146614, 2026

  10. [11]

    Denoising Diffusion Probabilistic Models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.arXiv preprint arXiv:2006.11239, 2020. URLhttps://arxiv.org/abs/2006.11239

  11. [12]

    Drivingscene: A multi-task online feed-forward 3d gaussian splatting method for dynamic driving scenes.arXiv preprint arXiv:2510.24734, 2025

    Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, and Jianxun Cui. Drivingscene: A multi-task online feed-forward 3d gaussian splatting method for dynamic driving scenes.arXiv preprint arXiv:2510.24734, 2025. URLhttps://arxiv.org/abs/2510.24734

  12. [13]

    Where2comm: communication- efficient collaborative perception via spatial confidence maps

    Yue Hu, Shaoheng Fang, Zixing Lei, Yiqi Zhong, and Siheng Chen. Where2comm: communication- efficient collaborative perception via spatial confidence maps. InProceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), pages 4874–4886, Red Hook, NY, USA, April 2024. 13/24 FRUC: Feedforward Dynamic Scene Reconstructio...

  13. [14]

    V2x-gaussians: Gaussian splatting for multi-agent cooperative dynamic scene reconstruction

    Abhishek Dinkar Jagtap, Rui Song, Sanath Tiptur Sadashivaiah, and Andreas Festag. V2x-gaussians: Gaussian splatting for multi-agent cooperative dynamic scene reconstruction. In2025 IEEE Intelligent Vehicles Symposium (IV), pages 1033–1039, 2025. doi: 10.1109/IV64158.2025.11097436

  14. [15]

    Anysplat: Feed-forward 3d gaussian splatting from unconstrained views.ACM Transactions on Graphics (TOG), 44(6):1–16, 2025

    Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, et al. Anysplat: Feed-forward 3d gaussian splatting from unconstrained views.ACM Transactions on Graphics (TOG), 44(6):1–16, 2025

  15. [16]

    3d gaussian splatting for real-time radiance field rendering.ACM Trans

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, et al. 3d gaussian splatting for real-time radiance field rendering.ACM Trans. Graph., 42(4):139–1, 2023

  16. [17]

    Learning Distilled Collaboration Graph for Multi-Agent Perception

    Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, and Wenjun Zhang. Learning Distilled Collaboration Graph for Multi-Agent Perception. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems (NeurIPS), volume 34, pages 29541–29552, 2021

  17. [18]

    Drivingrecon: Large 4d gaussian reconstruction model for au- tonomous driving

    Hao LU, Tianshuo Xu, Wenzhao Zheng, Yunpeng Zhang, Wei Zhan, Dalong Du, Masayoshi TOMIZUKA, Kurt Keutzer, and Yingcong Chen. Drivingrecon: Large 4d gaussian reconstruction model for au- tonomous driving. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, editors,Advances in Neural Information Processing Systems, volume 38,...

  18. [19]

    Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis

    Jonathon Luiten, Georgios Kopanas, Bastian Leibe, and Deva Ramanan. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In3DV, 2024

  19. [20]

    Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Syn- naeve, Ishan Misra, Herve Jegou, Julien Mairal, Patrick Laba...

  20. [21]

    Semantic image synthesis with spatially-adaptive normalization

    Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic image synthesis with spatially-adaptive normalization. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

  21. [22]

    Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

    René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer.IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020

  22. [23]

    Urbaning-v2x: A large-scale multi-vehicle, multi-infrastructure dataset across multiple intersections for cooperative perception

    Karthikeyan Chandra Sekaran, Markus Geisler, Dominik Rößle, Adithya Mohan, Daniel Cremers, Wolfgang Utschick, Michael Botsch, Werner Huber, and Torsten Schön. Urbaning-v2x: A large-scale multi-vehicle, multi-infrastructure dataset across multiple intersections for cooperative perception. In The Thirty-ninth Annual Conference on Neural Information Processi...

  23. [24]

    Tensor4d: Efficientneural4ddecompositionforhigh-fidelitydynamicreconstructionandrendering

    Ruizhi Shao, Zerong Zheng, Hanzhang Tu, Boning Liu, Hongwen Zhang, and Yebin Liu. Tensor4d: Efficientneural4ddecompositionforhigh-fidelitydynamicreconstructionandrendering. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2023. 14/24 FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views

  24. [25]

    Splatter image: Ultra-fast single- view 3d reconstruction

    Stanislaw Szymanowicz, Christian Rupprecht, and Andrea Vedaldi. Splatter image: Ultra-fast single- view 3d reconstruction. InThe IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  25. [26]

    Digital twin and drl-driven semantic dissemination for 6g autonomous driving service

    Yihang Tao, Jun Wu, Xi Lin, Shahid Mumtaz, and Soumaya Cherkaoui. Digital twin and drl-driven semantic dissemination for 6g autonomous driving service. InGLOBECOM 2023 - 2023 IEEE Global Com- munications Conference, pages 2075–2080, 2023. doi: 10.1109/GLOBECOM54140.2023.10437455

  26. [27]

    Drl-driven digital twin function virtualization for adaptive service response in 6g networks.IEEE Networking Letters, 5(2):125–129, 2023

    Yihang Tao, Jun Wu, Xi Lin, and Wu Yang. Drl-driven digital twin function virtualization for adaptive service response in 6g networks.IEEE Networking Letters, 5(2):125–129, 2023. doi: 10.1109/LNET. 2023.3269766

  27. [28]

    Yihang Tao, Jun Wu, Qianqian Pan, Ali Kashif Bashir, and Marwan Omar. O-ran-based digital twin function virtualization for sustainable iov service response: An asynchronous hierarchical reinforcement learning approach.IEEE Transactions on Green Communications and Networking, 8(3):1049–1060,

  28. [29]

    doi: 10.1109/TGCN.2024.3435796

  29. [30]

    Directed-cp: Directed collaborative perception for connected and autonomous vehicles via proactive attention

    Yihang Tao, Senkang Hu, Zhengru Fang, and Yuguang Fang. Directed-cp: Directed collaborative perception for connected and autonomous vehicles via proactive attention. In2025 IEEE International Conference on Robotics and Automation (ICRA), pages 7004–7010, 2025. doi: 10.1109/ICRA55743. 2025.11127818

  30. [31]

    Learning mutual view information graph for adaptive adversarial collaborative perception.arXiv preprint arXiv:2602.19596, 2026

    Yihang Tao, Senkang Hu, Haonan An, Zhengru Fang, Hangcheng Cao, and Yuguang Fang. Learning mutual view information graph for adaptive adversarial collaborative perception.arXiv preprint arXiv:2602.19596, 2026. URLhttps://arxiv.org/abs/2602.19596

  31. [32]

    Gcp: Guarded collaborative perception with spatial-temporal aware malicious agent detection.IEEE Transactions on Dependable and Secure Computing, pages 1–14, 2026

    Yihang Tao, Senkang Hu, Yue Hu, Haonan An, Hangcheng Cao, and Yuguang Fang. Gcp: Guarded collaborative perception with spatial-temporal aware malicious agent detection.IEEE Transactions on Dependable and Secure Computing, pages 1–14, 2026. doi: 10.1109/TDSC.2026.3693684

  32. [33]

    Drivingforward: Feed-forward 3d gaussian splatting for driving scene reconstruction from flexible surround-view input

    Qijian Tian, Xin Tan, Yuan Xie, and Lizhuang Ma. Drivingforward: Feed-forward 3d gaussian splatting for driving scene reconstruction from flexible surround-view input. InProceedings of the AAAI Conference on Artificial Intelligence, 2025

  33. [34]

    Vggt: Visual geometry grounded transformer

    JianyuanWang, MinghaoChen, NikitaKaraev, AndreaVedaldi, ChristianRupprecht, andDavidNovotny. Vggt: Visual geometry grounded transformer. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

  34. [35]

    4d gaussian splatting for real-time dynamic scene rendering

    Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, and Xinggang Wang. 4d gaussian splatting for real-time dynamic scene rendering. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 20310–20320, June 2024

  35. [36]

    V2x-real: A largs-scale dataset for vehicle-to-everything cooperative perception

    Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, and Jiaqi Ma. V2x-real: A largs-scale dataset for vehicle-to-everything cooperative perception. InEuropeanConferenceonComputerVision(ECCV)2024,pages455...

  36. [37]

    Segformer: Simple and efficient design for semantic segmentation with transformers

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. Segformer: Simple and efficient design for semantic segmentation with transformers. InNeural Information Processing Systems (NeurIPS), 2021. 15/24 FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views

  37. [38]

    Sparsegs: Sparse view synthesis using 3d gaussian splatting

    Haolin Xiong, Sairisheek Muttukuru, Hanyuan Xiao, Rishi Upadhyay, Pradyumna Chari, Yajie Zhao, and Achuta Kadambi. Sparsegs: Sparse view synthesis using 3d gaussian splatting. In2025 International Conference on 3D Vision (3DV), pages 1032–1041, 2025. doi: 10.1109/3DV66043.2025.00100

  38. [39]

    Cruise: Cooperative reconstruction and editing in v2x scenarios using gaussian splatting

    Haoran Xu, Saining Zhang, Peishuo Li, Baijun Ye, Xiaoxue Chen, Huan-Ang Gao, Jv Zheng, Xiaowei Song, Ziqiao Peng, Run Miao, Jinrang Jia, Yifeng Shi, Guangqi Yi, Hang Zhao, Hao Tang, Hongyang Li, Kaicheng Yu, and Hao Zhao. Cruise: Cooperative reconstruction and editing in v2x scenarios using gaussian splatting. In2025 IEEE/RSJ International Conference on I...

  39. [40]

    Opv2v: Anopenbenchmarkdataset and fusion pipeline for perception with vehicle-to-vehicle communication

    RunshengXu, HaoXiang, XinXia, XuHan, JinlongLi, andJiaqiMa. Opv2v: Anopenbenchmarkdataset and fusion pipeline for perception with vehicle-to-vehicle communication. In2022 IEEE International Conference on Robotics and Automation (ICRA), 2022

  40. [41]

    EmerneRF: Emergent spatial-temporal scene decomposition via self-supervision

    Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. EmerneRF: Emergent spatial-temporal scene decomposition via self-supervision. InThe Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=ycv2z8TYur

  41. [42]

    STORM: Spatio- temporalreconstructionmodelforlarge-scaleoutdoorscenes

    Jiawei Yang, Jiahui Huang, Boris Ivanovic, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Yue Wang, and Marco Pavone. STORM: Spatio- temporalreconstructionmodelforlarge-scaleoutdoorscenes. InTheThirteenthInternationalConference on Learning Representations, 2025. URLhttps://openreview.net/forum?id=M2NFWRPMUd

  42. [44]

    URLhttps://arxiv.org/abs/2603.19552

  43. [45]

    Gs-lrm: Large reconstruction model for 3d gaussian splatting.European Conference on Computer Vision, 2024

    Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. Gs-lrm: Large reconstruction model for 3d gaussian splatting.European Conference on Computer Vision, 2024

  44. [46]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InCVPR, 2018

  45. [47]

    Driv- inggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes

    Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. Driv- inggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21634–21643, 2024

  46. [49]

    DVGT: Driving Visual Geometry Transformer

    URLhttps://arxiv.org/abs/2512.16919. 16/24 FRUC: Feedforward Dynamic Scene Reconstruction from Uncalibrated Collaborative Driving Views A. More Implementation Details A.1. Dataset Preparation and Utilization Data Preparation.We unify V2X-Real [34] and UrbanIng-V2X [22] into a common OPV2V [38] format benchmark. The processed benchmark preserves the full o...