arxiv: 2605.17907 · v1 · pith:GLMPO6AFnew · submitted 2026-05-18 · 💻 cs.CV · cs.AI

One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception

Pith reviewed 2026-05-20 11:40 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: collaborative perception · feature modality translation · heterogeneous sensors · zero-shot translation · universal model · any-to-any mapping · intermediate features

The pith

A single pretrained model can translate intermediate features between any pair of sensor modalities without retraining or fine-tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes UniTrans to overcome modality heterogeneity in collaborative perception. Agents share intermediate features but often use different sensors, so each new modality normally requires a costly dedicated adapter. UniTrans pretrains a bank of translator expert parameters and predicts their combination coefficients from a source-to-target mapping measured in a modality-intrinsic latent space, whose codes are extracted by an intrinsic encoder. A sympathetic reader would care because repeated training is expensive and often blocked by privacy constraints across manufacturers, so a universal zero-shot translator could let multi-agent systems absorb arbitrary new modalities. Experiments on OPV2V-H (simulated) and DAIR-V2X (real-world) show consistent gains over prior methods.

Core claim

UniTrans pretrains a bank of translator expert parameters and learns their combination coefficients as a function of source-to-target modality mapping measured in a modality-intrinsic latent space, where an intrinsic encoder extracts modality-specific yet scene-invariant codes from single-frame intermediate features, enabling the model to instantiate translators for arbitrary modalities in a zero-shot manner.

What carries the argument

A bank of pretrained translator expert parameters, whose combination coefficients are predicted from a source-to-target modality mapping in a latent space produced by an intrinsic encoder applied to single-frame features.
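The mechanism can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the UniTrans implementation: each expert is assumed to be a single 1×1 convolution (a per-pixel linear map over channels) on BEV feature maps, the coefficients are assumed to come from a softmax over a hypothetical linear gate applied to the concatenated source and target intrinsic codes, and all names (`predict_coefficients`, `instantiate_translator`, `gate_w`) are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def predict_coefficients(z_src, z_tgt, gate_w, gate_b):
    """Hypothetical gate: map a (source, target) intrinsic-code pair to K mixing weights."""
    pair = np.concatenate([z_src, z_tgt])   # shape (2D,)
    return softmax(gate_w @ pair + gate_b)  # shape (K,), non-negative, sums to 1


def instantiate_translator(expert_bank, alpha):
    """Weighted sum of K expert matrices, each of shape (C_out, C_in)."""
    return np.tensordot(alpha, expert_bank, axes=1)  # shape (C_out, C_in)


def translate(feature, translator):
    """Apply the translator as a 1x1 convolution to a (C_in, H, W) feature map."""
    return np.einsum("oc,chw->ohw", translator, feature)


rng = np.random.default_rng(0)
K, C_IN, C_OUT, D = 4, 8, 6, 16
expert_bank = rng.normal(size=(K, C_OUT, C_IN))  # pretrained once, then frozen
gate_w, gate_b = rng.normal(size=(K, 2 * D)), np.zeros(K)

# Intrinsic codes for a modality pair never seen during pretraining (random here).
z_src, z_tgt = rng.normal(size=D), rng.normal(size=D)
alpha = predict_coefficients(z_src, z_tgt, gate_w, gate_b)
translator = instantiate_translator(expert_bank, alpha)
out = translate(rng.normal(size=(C_IN, 5, 5)), translator)
print(alpha.round(3), out.shape)
```

Mixing parameters rather than expert outputs (similar in spirit to the parameter-averaging idea of reference [1]) keeps inference cost at that of a single translator: one weighted sum when a new pair appears, then one 1×1 convolution per frame. For linear experts like these the two choices are mathematically equivalent; for nonlinear experts they generally are not.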

If this is right

  • Any-to-any feature translation becomes possible through one universal model instead of training separate adapters for each new modality.
  • Translators for previously unseen modality pairs can be instantiated zero-shot, from their mapping in the latent space alone.
  • The approach respects model and data privacy by eliminating the need for additional retraining across different manufacturers.
  • Performance improves over existing direct-adaptation and protocol-based methods on both simulated and real-world collaborative perception benchmarks.
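To make the scaling argument concrete, here is a back-of-the-envelope count of how many trained modules each paradigm in Figure 1 needs. The counting rules are assumptions for illustration: one-to-one adaptation is assumed to need one adapter per ordered (source, target) pair, two-step adaptation one adapter per modality into the protocol space, and the universal translator a single pretrained model regardless of the number of modalities.

```python
def modules_to_train(n: int) -> dict:
    """Trained translation modules needed to connect n modalities (assumed counting rules)."""
    return {
        "one_to_one": n * (n - 1),  # one adapter per ordered (source, target) pair
        "two_step": n,              # one adapter per modality into the protocol space
        "universal": 1,             # a single pretrained any-to-any model
    }


def new_modules_when_adding(n_existing: int) -> dict:
    """Extra modules to train when one new modality joins n_existing ones."""
    return {
        "one_to_one": 2 * n_existing,  # newcomer -> each old modality, and each old -> newcomer
        "two_step": 1,                 # one adapter for the newcomer
        "universal": 0,                # zero-shot instantiation, no training
    }


for n in (2, 4, 8):
    print(n, modules_to_train(n), new_modules_when_adding(n))
```

Whether a two-step method also needs a reverse adapter per modality depends on the protocol design; the point the bullets make is the gap between quadratic, linear, and constant training effort, and that only the last needs no training when a new agent appears.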

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support dynamic addition of new sensors in deployed fleets without requiring system-wide updates or data sharing.
  • Similar expert-bank techniques might apply to other heterogeneous multi-agent tasks such as joint mapping or coordinated planning.
  • Extending the latent-space mapping to handle temporal sequences rather than single frames could improve robustness in fast-moving scenes.

Load-bearing premise

An intrinsic encoder can reliably extract modality-specific yet scene-invariant codes from single-frame intermediate features that suffice to determine accurate combination coefficients for any unseen modality pair.
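One way to see why a single frame could carry a scene-invariant modality signature: statistics pooled over all spatial positions (here per-channel mean and standard deviation, first- and second-order relatives of the Gram-matrix style statistics in reference [14]) discard where objects are while keeping how the channels are distributed. The sketch below is a hand-crafted stand-in, not the paper's learned intrinsic encoder; `intrinsic_code` and `render` are hypothetical names, and the synthetic "sensor signature" is a per-channel affine map chosen to make the idea visible.

```python
import numpy as np


def intrinsic_code(feature):
    """Hypothetical hand-crafted code for a (C, H, W) feature map:
    per-channel mean and standard deviation pooled over all positions."""
    flat = feature.reshape(feature.shape[0], -1)                   # (C, H*W)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)])  # (2C,)


def render(modality, scene):
    """Synthetic feature: a per-channel affine 'sensor signature' applied to scene content."""
    scale, offset = modality
    return scale[:, None, None] * scene + offset[:, None, None]


rng = np.random.default_rng(1)
C, H, W = 8, 32, 32
mod_a = (rng.uniform(0.5, 1.5, C), rng.uniform(-1.0, 1.0, C))
mod_b = (rng.uniform(0.5, 1.5, C), rng.uniform(-1.0, 1.0, C))
scene_1, scene_2 = rng.normal(size=(2, C, H, W))

z_a1 = intrinsic_code(render(mod_a, scene_1))
z_a2 = intrinsic_code(render(mod_a, scene_2))
z_b1 = intrinsic_code(render(mod_b, scene_1))

# Spatially shuffling one frame leaves its code unchanged (up to float rounding).
feat = render(mod_a, scene_1)
shuffled = feat.reshape(C, -1)[:, rng.permutation(H * W)].reshape(C, H, W)
print(np.allclose(intrinsic_code(shuffled), z_a1))

same_modality = np.linalg.norm(z_a1 - z_a2)   # only the scene changed
cross_modality = np.linalg.norm(z_a1 - z_b1)  # only the sensor signature changed
print(round(float(same_modality), 3), round(float(cross_modality), 3))
```

Because the synthetic signature is affine, pooled statistics recover it almost exactly. Whether a learned encoder stays this separable for real sensor differences, and for modalities outside its training support, is precisely the premise stated above.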

What would settle it

A clear drop in translation quality or end-to-end perception performance when the model is tested on a new sensor modality whose intrinsic code was never seen during coefficient learning.
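A test of this kind can be organized as a leave-one-modality-out protocol: withhold one modality entirely during pretraining (expert bank, intrinsic encoder, and coefficient learning), then evaluate only on pairs that involve it. The sketch below only enumerates the splits; the modality labels are hypothetical placeholders, not the actual OPV2V-H configurations.

```python
from itertools import permutations


def leave_one_modality_out(modalities):
    """Yield (held_out, train_pairs, test_pairs) for each modality.

    Train pairs never involve the held-out modality; test pairs always do."""
    for held_out in modalities:
        seen = [m for m in modalities if m != held_out]
        train_pairs = list(permutations(seen, 2))
        test_pairs = [p for p in permutations(modalities, 2) if held_out in p]
        yield held_out, train_pairs, test_pairs


mods = ["lidar_a", "lidar_b", "camera_a", "camera_b"]  # hypothetical modality labels
for held_out, train, test in leave_one_modality_out(mods):
    print(held_out, len(train), len(test))  # e.g. "lidar_a 6 6"
```

Comparing test-pair performance against the same pairs when the modality was seen isolates the zero-shot gap the claim depends on.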

Figures

Figures reproduced from arXiv: 2605.17907 by Congzhang Shao, Guiyang Luo, Jinglin Li, Quan Yuan, Weize Li, Xiaoyuan Fu, Xinyuan Ding, Xuanhan Zhu, Yang Li, Yunqi Ba.

Figure 1. Comparison of heterogeneous collaboration perception paradigms and our proposed UniTrans. (a) One-to-one adaptation: trains a dedicated adapter for each pairwise modality mapping, incurring substantial training complexity. (b) Two-step adaptation: only needs to learn an adapter to the protocol space, which is not universally suitable for newly emerging agents. (c) Any-to-any translation (ours): instantiate…
Figure 2. The overall architecture of our proposed UniTrans. After a single two-stage pretraining procedure, UniTrans can perform zero-shot feature translation at inference time under arbitrary heterogeneous modality mappings, without any repeated retraining.
… or protocol-space alignment. One-to-one adaptation learns mapping-specific adapters, such as MPDA (Xu et al., 2023), PnPDA (Luo et al., 2025), and PolyInter (X…
Figure 3. Perception performance under different location-error noise levels for four ego modalities.
… reflected by the performance drop for a larger d. Overall, these results suggest that a compact yet expressive intrinsic space is crucial for stable mapping estimation and robust any-to-any translation. C.2. Modality organization of the intrinsic space. To verify whether the learned intrinsic space is organized by m…
Figure 4. Qualitative visualization of any-to-any feature translation on OPV2V-H. Ego Feature denotes the target-domain representation, Neighbor Feature is the source feature to be translated, and the last two columns show the translated features produced by MPDA and UniTrans (camera ego in Scene 1 and LiDAR ego in Scene 2).
Figure 5.
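Whether an intrinsic space is organized by modality (the question section C.2 raises, and the topic of reference [48]) is commonly quantified with the silhouette coefficient over codes labelled by modality. The excerpt does not show which metric the paper reports, so the NumPy function below is only a self-contained sketch of the textbook definition: for each point, a is the mean distance to the rest of its own cluster, b is the smallest mean distance to any other cluster, and s = (b - a) / max(a, b). The clustered codes are synthetic.

```python
import numpy as np


def silhouette(codes, labels):
    """Mean silhouette coefficient of `codes` (N, D) grouped by `labels` (N,).

    Needs at least two distinct labels."""
    codes, labels = np.asarray(codes, dtype=float), np.asarray(labels)
    dist = np.linalg.norm(codes[:, None, :] - codes[None, :, :], axis=-1)  # (N, N)
    scores = np.zeros(len(codes))
    for i in range(len(codes)):
        same = labels == labels[i]
        same[i] = False                  # exclude the point itself
        if not same.any():               # singleton cluster: s = 0 by convention
            continue
        a = dist[i, same].mean()         # cohesion: mean distance within own cluster
        b = min(dist[i, labels == other].mean()  # separation: nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


rng = np.random.default_rng(2)
labels = np.repeat(np.arange(3), 20)               # 3 synthetic "modalities"
centers = rng.normal(scale=5.0, size=(3, 16))
codes = centers[labels] + rng.normal(scale=0.5, size=(60, 16))
print(round(silhouette(codes, labels), 3))                    # well separated: close to 1
print(round(silhouette(codes, rng.permutation(labels)), 3))  # labels shuffled: near 0
```

A value near 1 with true modality labels, and near 0 with shuffled labels, is the pattern that would support the "organized by modality" reading; annotating the clusters in the figures, as the referee's second minor comment suggests, makes the same point visually.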
Original abstract

By sharing intermediate features, collaborative perception extends each agent's sensing beyond standalone limits, but real-world feature modality heterogeneity remains a key barrier to effective fusion. Most existing methods, including direct adaptation and protocol-based transformation, typically rely on training adapters for newly emerging feature modalities and often require additional retraining or fine-tuning. Such repeated training is costly and is often infeasible across manufacturers due to model and data privacy constraints, limiting real-world scalability. To address this issue, we propose UniTrans, a universal any-to-any feature modality translation model that instantiates translators on the fly for arbitrary modalities. UniTrans pretrains a bank of translator expert parameters and learns their combination coefficients as a function of source-to-target modality mapping. The mapping is measured in a modality-intrinsic latent space, where an intrinsic encoder extracts modality-specific yet scene-invariant codes from single-frame intermediate features, enabling UniTrans to instantiate translators in a zero-shot manner. Experiments on OPV2V-H and DAIR-V2X demonstrate that UniTrans consistently outperforms state-of-the-art methods in both simulated and real-world settings, enabling efficient any-to-any translation through a universal model. The code is available at https://github.com/CheeryLeeyy/UniTrans.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes UniTrans, a universal any-to-any feature modality translation model for heterogeneous collaborative perception. It pretrains a bank of translator expert parameters and learns their combination coefficients as a function of source-to-target modality mapping measured in a modality-intrinsic latent space. An intrinsic encoder extracts modality-specific yet scene-invariant codes from single-frame intermediate features, enabling zero-shot instantiation of translators for arbitrary modalities without retraining. Experiments on OPV2V-H and DAIR-V2X demonstrate consistent outperformance over state-of-the-art methods in simulated and real-world settings.

Significance. If the central claims hold, this work would have substantial significance for collaborative perception and multi-modal fusion. It directly tackles the practical barrier of feature modality heterogeneity by enabling a single model to handle any-to-any translation scalably, avoiding costly per-modality retraining or fine-tuning that conflicts with privacy constraints across manufacturers. The dynamic expert combination via intrinsic latent codes represents a novel direction that could extend to other heterogeneous multi-agent systems. The public code release at the provided GitHub link supports reproducibility.

major comments (2)
  1. [Section 4] Section 4 (Experiments): All reported results use modalities from the OPV2V-H and DAIR-V2X training distributions; no evaluation is provided for translation involving a truly novel sensor type or representation outside this support. This is load-bearing for the zero-shot any-to-any claim, as the intrinsic encoder and expert bank are fitted exclusively to seen modalities.
  2. [Section 3.2] Section 3.2 (Intrinsic Encoder): The claim that single-frame intermediate features yield modality-specific yet scene-invariant codes sufficient to compute accurate combination coefficients for unseen pairs lacks supporting analysis or ablation; nothing in the architecture enforces invariance under distribution shift to new resolutions or noise statistics.
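The robustness tests requested in the second major comment (new resolutions, new noise statistics), and the "synthetically altered feature representations" the rebuttal proposes, could take several forms. The sketch below shows three hypothetical perturbations of a (C, H, W) feature map: a random orthogonal channel rotation (a new representation style carrying the same information), average-pool downsampling followed by nearest-neighbour upsampling (a resolution change), and additive Gaussian noise. None of these is confirmed to be what the authors use; the function names are illustrative.

```python
import numpy as np


def rotate_channels(feat, rng):
    """Mix channels with a random orthogonal matrix (Q factor of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.normal(size=(feat.shape[0], feat.shape[0])))
    return np.einsum("oc,chw->ohw", q, feat)


def change_resolution(feat, factor):
    """Average-pool by `factor`, then repeat back to the original size.
    Assumes H and W are divisible by `factor`."""
    c, h, w = feat.shape
    pooled = feat.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))
    return pooled.repeat(factor, axis=1).repeat(factor, axis=2)


def add_noise(feat, sigma, rng):
    """Additive i.i.d. Gaussian noise with standard deviation `sigma`."""
    return feat + rng.normal(scale=sigma, size=feat.shape)


rng = np.random.default_rng(4)
feat = rng.normal(size=(8, 16, 16))
variants = {
    "rotated": rotate_channels(feat, rng),
    "low_res": change_resolution(feat, 4),
    "noisy": add_noise(feat, 0.3, rng),
}
for name, v in variants.items():
    print(name, v.shape, round(float(np.abs(v - feat).mean()), 3))
```

Each variant keeps the tensor shape, so it can be fed to an unchanged pipeline; sweeping `factor` and `sigma` gives the controlled distribution shifts the referee asks for.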
minor comments (2)
  1. [Section 4] Abstract and Section 4: Outperformance statements lack error bars, ablation details on the expert bank size or latent space, and statistical tests, which would strengthen verification of the results.
  2. [Figures] Figure captions: Some visualizations of the latent space or translator instantiation could include more explicit annotations for modality clusters to aid clarity.
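For the first minor comment, one common way to attach uncertainty to a detection score is a mean with a percentile-bootstrap confidence interval over independent training seeds, plus a paired interval for the difference against a baseline evaluated on the same seeds. The sketch assumes per-seed scores are available; the numbers in the usage lines are drawn at random for illustration and are not results from the paper.

```python
import numpy as np


def bootstrap_ci(scores, n_boot=10_000, level=0.95, seed=0):
    """Mean and percentile-bootstrap confidence interval of per-seed scores."""
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(scores), size=(n_boot, len(scores)))  # resample seeds
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [(1 - level) / 2, (1 + level) / 2])
    return scores.mean(), lo, hi


def paired_difference_ci(ours, baseline, **kwargs):
    """CI for the mean per-seed difference; an interval excluding 0 supports a gain."""
    diff = np.asarray(ours, dtype=float) - np.asarray(baseline, dtype=float)
    return bootstrap_ci(diff, **kwargs)


# Synthetic per-seed scores, drawn at random purely for illustration.
gen = np.random.default_rng(3)
baseline = gen.normal(0.60, 0.01, size=5)
ours = baseline + gen.normal(0.02, 0.005, size=5)
print(bootstrap_ci(ours))
print(paired_difference_ci(ours, baseline))
```

With only a handful of seeds the bootstrap interval is itself rough; the paired form is usually more informative than two separate intervals because seed-to-seed variation common to both methods cancels.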

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which help clarify the scope of our zero-shot claims. We address each major comment in detail below, providing clarifications on the experimental support and architectural design while noting where revisions will strengthen the presentation.

Point-by-point responses
  1. Referee: [Section 4] Section 4 (Experiments): All reported results use modalities from the OPV2V-H and DAIR-V2X training distributions; no evaluation is provided for translation involving a truly novel sensor type or representation outside this support. This is load-bearing for the zero-shot any-to-any claim, as the intrinsic encoder and expert bank are fitted exclusively to seen modalities.

    Authors: We appreciate the referee pointing out the distinction between modalities within the training support and truly novel sensors. The reported experiments evaluate any-to-any translation across the heterogeneous modalities present in OPV2V-H (simulated) and DAIR-V2X (real-world), which include distinct sensor representations such as different LiDAR point-cloud features and camera-based features. The core mechanism—pretraining an expert bank and deriving combination coefficients from mappings in the modality-intrinsic latent space extracted by the intrinsic encoder—is explicitly designed to support zero-shot instantiation for arbitrary modalities by relying on single-frame feature characteristics rather than dataset-specific training. We acknowledge that direct empirical results on sensor types completely absent from the training distributions would provide stronger validation of the generalization claim. In the revised manuscript we will add a dedicated limitations and future-work paragraph discussing this point, along with preliminary results using synthetically altered feature representations to simulate novel modalities. revision: partial

  2. Referee: [Section 3.2] Section 3.2 (Intrinsic Encoder): The claim that single-frame intermediate features yield modality-specific yet scene-invariant codes sufficient to compute accurate combination coefficients for unseen pairs lacks supporting analysis or ablation; nothing in the architecture enforces invariance under distribution shift to new resolutions or noise statistics.

    Authors: The intrinsic encoder is trained to produce codes that isolate modality-specific properties (e.g., feature dimensionality, noise characteristics, and representation style) while remaining invariant to scene content by operating exclusively on single-frame intermediate features and employing a combination of reconstruction and contrastive objectives that penalize scene-dependent variations. This design choice is motivated by the observation that modality intrinsics are largely preserved across frames, whereas scene semantics vary. We provide supporting ablations in the supplementary material that isolate the contribution of the single-frame input and the latent-space mapping for coefficient prediction accuracy on held-out modality pairs. Nevertheless, we agree that explicit robustness tests under controlled distribution shifts (resolution changes, added sensor noise) are not currently reported in the main text. We will incorporate these ablations into the revised Section 3.2 and add corresponding quantitative results. revision: partial
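The rebuttal mentions contrastive objectives that penalize scene-dependent variation without giving the exact loss. As a hedged illustration, a one-directional InfoNCE loss (a simplified relative of the NT-Xent loss in reference [6]) can treat two frames of the same modality from different scenes as a positive pair and frames of other modalities as negatives, so minimizing it pulls codes together within a modality regardless of scene. The synthetic "signatures" and the one-modality-per-row batch layout are assumptions of this sketch.

```python
import numpy as np


def info_nce(z_anchor, z_positive, temperature=0.1):
    """InfoNCE loss: row i of z_anchor should match row i of z_positive.

    Each row is an intrinsic code; anchor i and positive i come from the same
    modality but different scenes, and every other row in the batch (a different
    modality) serves as a negative."""
    a = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    p = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # cross-entropy on matching pairs


rng = np.random.default_rng(5)
B, D = 6, 16
signature = rng.normal(size=(B, D))  # one synthetic modality signature per row

# Scene-invariant encoder: two scenes of the same modality give nearly equal codes.
good = info_nce(signature + 0.05 * rng.normal(size=(B, D)),
                signature + 0.05 * rng.normal(size=(B, D)))
# Scene-dominated encoder: codes mostly reflect the (random) scene, not the modality.
bad = info_nce(rng.normal(size=(B, D)), rng.normal(size=(B, D)))
print(round(float(good), 3), round(float(bad), 3))
```

A low loss only shows invariance across the scenes seen in training; it says nothing by itself about the unseen-modality case, which is why the referee's request for held-out evaluation remains separate.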

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained via standard pretraining and held-out evaluation.

full rationale

The paper's core mechanism pretrains a bank of translator expert parameters and learns combination coefficients as a function of source-to-target mappings in a latent space produced by a trained intrinsic encoder. These components are optimized on training data from OPV2V-H and DAIR-V2X and evaluated on held-out test sets for both simulated and real-world scenarios. No equation or claim reduces the claimed any-to-any translation performance to quantities defined by the same fitted parameters or to a self-citation chain. The architecture follows conventional supervised training of neural modules without self-definitional loops or fitted-input predictions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 2 invented entities

The central claim rests on the existence of a modality-intrinsic latent space whose codes are scene-invariant and on the sufficiency of a linear or learned combination of pretrained experts; both are introduced without independent external validation in the abstract.

free parameters (1)
  • combination coefficients
    Learned as a function of the source-to-target modality mapping extracted from the latent space.
axioms (1)
  • domain assumption: A modality-intrinsic latent space exists in which codes are modality-specific yet scene-invariant
    Invoked to measure source-to-target mapping for expert combination.
invented entities (2)
  • intrinsic encoder (no independent evidence)
    purpose: Extracts modality-specific yet scene-invariant codes from single-frame features
    New component required to produce the mapping used for translator instantiation.
  • bank of translator expert parameters (no independent evidence)
    purpose: Pretrained experts whose coefficients are combined for arbitrary modality pairs
    Core architectural element enabling the universal model.

pith-pipeline@v0.9.0 · 5786 in / 1318 out tokens · 41476 ms · 2026-05-20T11:40:53.176558+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 2 internal anchors

  [1] Ablin, Pierre; Katharopoulos, Angelos; Seto, Skyler; Grangier, David (2025). Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging. Forty-Second…

  [2] Bai, Zhengwei; Wu, Guoyuan; Barth, Matthew J.; Liu, Yongkang; Akin Sisbot, Emrah; Oguchi, Kentaro; Huang, Zhitong (2024). A Survey and Framework of Cooperative Perception: From Heterogeneous Singleton to Hierarchical Cooperation. doi:10.1109/TITS.2024.3436012.

  [3] A Survey on Mixture of Experts in Large Language Models. IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2025.3554028.

  [4] Chen, Li; Wu, Penghao; Chitta, Kashyap; Jaeger, Bernhard; Geiger, Andreas; Li, Hongyang (2024). End-to-End Autonomous Driving: Challenges and Frontiers. doi:10.1109/TPAMI.2024.3435937.

  [5] Cheng, Kun; He, Xiao; Yu, Lei; Tu, Zhijun; Zhu, Mingrui; Wang, Nannan; Gao, Xinbo; Hu, Jie (2025). Diff-… Forty-Second…

  [6] Chen, Ting; Kornblith, Simon; Norouzi, Mohammad; Hinton, Geoffrey E. (2020). A Simple Framework for Contrastive Learning of Visual Representations.

  [7] Chen, Weinan; Chi, Wenzheng; Ji, Sehua; Ye, Hanjing; Liu, Jie; Jia, Yunjie; Yu, Jiajie; Cheng, Jiyu (2025). A Survey of Autonomous Robots and Multi-Robot Navigation: Perception, Planning and Collaboration. doi:10.1016/j.birob.2024.100203.

  [8] Shao, Congzhang; Yuan, Quan; Luo, Guiyang; Hu, Yue; Wang, Danni; Yilin, Liu; Pan, Rui; Chen, Bo; Li, Jinglin (2025). The…

  [9] Dosovitskiy, Alexey; Ros, German; Codevilla, Felipe; Lopez, Antonio; Koltun, Vladlen (2017). Proceedings of the 1st…

  [10] Eigen, David; Ranzato, Marc'Aurelio; Sutskever, Ilya (2014). Learning Factored Representations in a Deep Mixture of Experts.

  [11] Fedus, William; Zoph, Barret; Shazeer, Noam (2022). Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.

  [12] Gao, Xiangbo; Xu, Runsheng; Li, Jiachen; Wang, Ziran; Fan, Zhiwen; Tu, Zhengzhong (2025). The…

  [13] A Survey of Collaborative Perception in Intelligent Vehicles at Intersections. IEEE Transactions on Intelligent Vehicles. doi:10.1109/TIV.2024.3395783.

  [14] A Neural Algorithm of Artistic Style. arXiv:1508.06576. doi:10.48550/arXiv.1508.06576.

  [15] Guo, Yongxin; Cheng, Zhenglin; Tang, Xiaoying; Tu, Zhaopeng; Lin, Tao (2024). Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models. The…

  [16] Hu, Senkang; Fang, Zhengru; Deng, Yiqin; Chen, Xianhao; Fang, Yuguang (2025). Collaborative Perception for Connected and Autonomous Driving: Challenges, Possible Solutions and Opportunities. doi:10.1109/MWC.002.2400348.

  [17] Hu, Yue; Fang, Shaoheng; Lei, Zixing; Zhong, Yiqi; Chen, Siheng (2022). Where2comm: …

  [18] Kingma, Diederik P.; Ba, Jimmy (2015). Adam: A Method for Stochastic Optimization. 3rd…

  [19] Lang, Alex H.; Vora, Sourabh; Caesar, Holger; Zhou, Lubing; Yang, Jiong; Beijbom, Oscar (2019). doi:10.1109/CVPR.2019.01298.

  [20] Li, Yiming; Ren, Shunli; Wu, Pengxiang; Chen, Siheng; Feng, Chen; Zhang, Wenjun (2021). Learning… Advances in…

  [21] Lin, Bin; Tang, Zhenyu; Ye, Yang; Huang, Jinfa; Zhang, Junwu; Pang, Yatian; Jin, Peng; Ning, Munan; Luo, Jiebo; Yuan, Li (2026). doi:10.1109/TMM.2026.3654458.

  [22] Liu, Linshen; Su, Boyan; Jiang, Junyue; Wu, Guanlin; Guo, Cong; Xu, Ceyu; Yang, Hao Frank (2025). Towards Accurate and Efficient… Proceedings of the…

  [23] Liu, Zhuang; Mao, Hanzi; Wu, Chao-Yuan; Feichtenhofer, Christoph; Darrell, Trevor; Xie, Saining (2022). A… doi:10.1109/CVPR52688.2022.01167.

  [24] Liu, Xu; Liu, Juncheng; Woo, Gerald; Aksu, Taha; Liang, Yuxuan; Zimmermann, Roger; Liu, Chenghao; Li, Junnan; Savarese, Silvio; Xiong, Caiming; Sahoo, Doyen (2025). Moirai-… Forty-Second…

  [25] Li, Yunxin; Jiang, Shenyuan; Hu, Baotian; Wang, Longyue; Zhong, Wanqi; Luo, Wenhan; Ma, Lin; Zhang, Min (2025). Uni-… doi:10.1109/TPAMI.2025.3532688.

  [26] Lu, Yifan; Li, Quanhao; Liu, Baoan; Dianati, Mehrdad; Feng, Chen; Chen, Siheng; Wang, Yanfeng (2023). doi:10.1109/ICRA48891.2023.10160546.

  [27] Lu, Yifan; Hu, Yue; Zhong, Yiqi; Wang, Dequan; Wang, Yanfeng; Chen, Siheng (2024). An Extensible Framework for Open Heterogeneous Collaborative Perception.

  [28] Luo, Tianyou; Yuan, Quan; Luo, Guiyang; Xia, Yuchen; Yang, Yujia; Li, Jinglin (2025). Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception. Computer… doi:10.1007/978-3-031-73004-7_17.

  [29] McKinzie, Brandon; Gan, Zhe; Fauconnier, Jean-Philippe; Dodge, Sam; Zhang, Bowen; Dufter, Philipp; Shah, Dhruti; Du, Xianzhi; Peng, Futang; Belyi, Anton; Zhang, Haotian; Singh, Karanjeet; Kang, Doug; H… Computer… doi:10.1007/978-3-031-73397-0_18.

  [30] Philion, Jonah; Fidler, Sanja (2020). Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to… Computer… doi:10.1007/978-3-030-58568-6_12.

  [31] Poole, Ben; Ozair, Sherjil; van den Oord, A… On Variational Bounds of Mutual Information.

  [32] Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc V.; Hinton, Geoffrey E.; Dean, Jeff (2017). Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. 5th…

  [33] Wang, Tsun-Hsuan; Manivasagam, Sivabalan; Liang, Ming; Yang, Bin; Zeng, Wenyuan; Urtasun, Raquel (2020). European… doi:10.1007/978-3-030-58536-5_36.

  [34] Xia, Yuchen; Yuan, Quan; Luo, Guiyang; Fu, Xiaoyuan; Li, Yang; Zhu, Xuanhan; Luo, Tianyou; Chen, Siheng; Li, Jinglin (2025). One Is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception. doi:10.1109/CVPR52734.2025.00156.

  [35] Xu, Runsheng; Li, Jinlong; Dong, Xiaoyu; Yu, Hongkai; Ma, Jiaqi (2023). Bridging the… doi:10.1109/ICRA48891.2023.10160871.

  [36] Xu, Runsheng; Tu, Zhengzhong; Xiang, Hao; Shao, Wei; Zhou, Bolei; Ma, Jiaqi (2022). Conference on…

  [37] Xu, Runsheng; Tu, Zhengzhong; Xiang, Hao; Shao, Wei; Zhou, Bolei; Ma, Jiaqi (2022). 6th…

  [38] Xu, Xiang; Kong, Lingdong; Shuai, Hui; Pan, Liang; Liu, Ziwei; Liu, Qingshan (2025). doi:10.1109/CVPR52734.2025.02549.

  [39] Xu, Runsheng; Guo, Yi; Han, Xu; Xia, Xin; Xiang, Hao; Ma, Jiaqi (2021). doi:10.1109/ITSC48978.2021.9564825.

  [40] Xu, Runsheng; Xiang, Hao; Xia, Xin; Han, Xu; Li, Jinlong; Ma, Jiaqi (2022). doi:10.1109/ICRA46639.2022.9812038.

  [41] Xu, Runsheng; Xiang, Hao; Tu, Zhengzhong; Xia, Xin; Yang, Ming-Hsuan; Ma, Jiaqi (2022). European… doi:10.1007/978-3-031-19842-7_7.

  [42] Yang, Zhenjie; Chai, Yilin; Jia, Xiaosong; Li, Qifeng; Shao, Yuqian; Zhu, Xuekai; Su, Haisheng; Yan, Junchi (2025). DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving. arXiv:2505.16278. doi:10.48550/arXiv.2505.16278.

  [43] Yang, Zuhao; Yu, Yingchen; Zhao, Yunqing; Lu, Shijian; Bai, Song (2025). Proceedings of the…

  [44] Yan, Yan; Mao, Yuxing; Li, Bo (2018). doi:10.3390/s18103337.

  [45] Yazgan, Melih; Graf, Thomas; Liu, Min; Fleck, Tobias; Z… A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges. doi:10.1109/IV55156.2024.10588382.

  [46] Yu, Haibao; Luo, Yizhen; Shu, Mao; Huo, Yiyi; Yang, Zebang; Shi, Yifeng; Guo, Zhenglong; Li, Hanyu; Hu, Xing; Yuan, Jirui; Nie, Zaiqing (2022). doi:10.1109/CVPR52688.2022.02067.

  [47] Zhou, Yin; Tuzel, Oncel (2018). doi:10.1109/CVPR.2018.00472.

  [48] Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis (1987). doi:10.1016/0377-0427(87)90125-7.

  [49] Kapse, Saarthak; Pati, Pushpak; Das, Srijan; Zhang, Jingwei; Chen, Chao; Vakalopoulou, Maria; Saltz, Joel; Samaras, Dimitris; Gupta, Rajarsi R.; Prasanna, Prateek. Proceedings of the… doi:10.1109/CVPR52733.2024.01067.