pith. machine review for the scientific record.

arxiv: 2605.13815 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.RO

Recognition: no theorem link

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:16 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords LiDAR generation · diffusion models · multi-domain learning · range image · data augmentation · semantic segmentation · 3D object detection · adverse weather

The pith

A single text-conditioned diffusion model generates realistic LiDAR scans across eight domains spanning weather, sensors, and platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OmniLiDAR to produce LiDAR range images for many different sensing conditions using one model rather than separate models for each dataset or environment. It trains on a combined set of eight domains that include adverse weather, reduced beam counts, and acquisitions from vehicles, drones, and quadrupeds. A cross-domain training approach mixes samples from all domains in every batch, while text prompts steer the output toward the desired condition. Two added modules capture the directional structure of range images and adjust features for domain-specific shifts during the denoising process. If successful, this removes the need to collect or train on isolated data for each new condition and supplies synthetic scans that boost performance on semantic segmentation and 3D detection, especially when labeled examples are scarce.
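To make the training recipe more concrete, here is a minimal sketch, in PyTorch, of what mixing all eight domains inside every mini-batch and attaching a condition-describing text prompt to each scan could look like. The domain labels, prompt wording, tensor sizes, and plain shuffled sampling are illustrative assumptions; the paper's actual CDTS may balance domains per batch differently.

```python
# Minimal sketch (not the authors' code) of domain-mixed batching with text
# prompts. Domain names, prompt strings, and sizes below are assumptions.
import torch
from torch.utils.data import Dataset, DataLoader

DOMAINS = [
    "clear", "fog", "rain", "snow",   # weather shifts (illustrative labels)
    "64-beam", "32-beam",             # sensor-configuration shifts
    "drone", "quadruped",             # platform shifts
]

class MixedDomainRangeImages(Dataset):
    """Pools per-domain range images so a shuffled DataLoader yields
    mini-batches that mix samples from many domains at once."""

    def __init__(self, per_domain_scans):  # dict: domain -> list of (H, W) tensors
        self.samples = [
            (scan, f"a LiDAR scan captured under {dom} conditions")
            for dom, scans in per_domain_scans.items()
            for scan in scans
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        scan, prompt = self.samples[idx]
        return scan.unsqueeze(0), prompt  # (1, H, W) range image + text prompt

# Toy pool: 16 random "range images" per domain, 32 beams x 1024 azimuth bins.
pool = {d: [torch.rand(32, 1024) for _ in range(16)] for d in DOMAINS}
loader = DataLoader(MixedDomainRangeImages(pool), batch_size=8, shuffle=True)

scans, prompts = next(iter(loader))
print(scans.shape, "| distinct conditions in this batch:", len(set(prompts)))
```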

Core claim

OmniLiDAR is a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains. It achieves this by introducing a Cross-Domain Training Strategy that mixes domains within each mini-batch, Cross-Domain Feature Modeling that captures the anisotropic scanning structure along the azimuth and elevation axes, and Domain-Adaptive Feature Scaling that modulates domain-dependent feature shifts during denoising.
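The anisotropy the claim refers to can be read as: range images are dense and continuous along azimuth (the sensor sweep) but coarse along elevation (the beam index). A plausible, assumed form of a direction-aware block is sketched below; it separates the two axes with 1×k and k×1 convolutions before mixing them. This illustrates the idea only and is not the paper's exact CDFM module.

```python
# Illustrative direction-aware block (assumed form, not the paper's CDFM):
# depthwise 1xk and kx1 convolutions process the azimuth (width) and
# elevation (height) axes separately before a 1x1 mixing convolution.
import torch
import torch.nn as nn

class DirectionalBlock(nn.Module):
    def __init__(self, channels: int, k: int = 7):
        super().__init__()
        self.azimuth = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                 padding=(0, k // 2), groups=channels)
        self.elevation = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                                   padding=(k // 2, 0), groups=channels)
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H=beams, W=azimuth bins)
        return x + self.mix(self.azimuth(x) + self.elevation(x))

feats = torch.rand(2, 64, 32, 1024)            # toy U-Net features
print(DirectionalBlock(64)(feats).shape)       # torch.Size([2, 64, 32, 1024])
```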

What carries the argument

Cross-Domain Training Strategy (CDTS) that mixes domains in each mini-batch, combined with CDFM for directional range-image dependencies and DAFS for lightweight domain modulation.
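DAFS is described only as a lightweight modulation of domain-dependent feature shifts. One common way to realize that kind of modulation is a FiLM-style per-channel scale and shift predicted from a conditioning embedding; the sketch below assumes that form and an arbitrary embedding size, neither of which is confirmed by the material here.

```python
# Assumed FiLM-style realization of domain-adaptive feature scaling: a small
# MLP maps a domain / text embedding to per-channel scale and shift applied
# to denoising features. Embedding source and placement are assumptions.
import torch
import torch.nn as nn

class DomainAdaptiveScaling(nn.Module):
    def __init__(self, channels: int, cond_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Sequential(
            nn.Linear(cond_dim, 2 * channels), nn.SiLU(),
            nn.Linear(2 * channels, 2 * channels),
        )

    def forward(self, feats, cond):             # feats: (B, C, H, W), cond: (B, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale[..., None, None]           # broadcast over spatial dims
        shift = shift[..., None, None]
        return feats * (1.0 + scale) + shift     # identity when scale = shift = 0

mod = DomainAdaptiveScaling(channels=64, cond_dim=128)
print(mod(torch.rand(2, 64, 32, 1024), torch.rand(2, 128)).shape)
```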

If this is right

  • Generative data augmentation yields consistent improvements on LiDAR semantic segmentation and 3D object detection.
  • Benefits appear in limited-label regimes and under robustness tests with added corruptions.
  • Controllable synthesis under weather, sensor, and platform shifts no longer requires domain-isolated models.
  • A shared range-image representation supports unified training across real scans and physically simulated weather or beam reductions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same conditioning and modulation approach could extend to additional sensor modalities or time-varying scenes without retraining separate generators.
  • Text prompts might allow finer control over continuous parameters such as rain intensity or platform height beyond the eight discrete domains.
  • Integration with existing physics simulators could produce hybrid real-synthetic datasets for more scalable testing of perception algorithms.

Load-bearing premise

Mixing domains within each mini-batch together with text conditioning and the CDFM and DAFS modules enables effective unified training without negative transfer across heterogeneous shifts.

What would settle it

A head-to-head comparison in which separate per-domain diffusion models produce measurably higher generation fidelity or larger downstream-task gains than the single unified OmniLiDAR model.

Figures

Figures reproduced from arXiv: 2605.13815 by Ao Liang, Ben Fei, Dekai Zhu, Lingdong Kong, Runnan Chen, Tongliang Liu, Wanli Ouyang, Weidong Yang, Xiang Xu, Xin Li, Yang Wu, Youquan Liu.

Figure 1. OmniLiDAR in context: unified multi-domain LiDAR generation with a single text-conditioned diffusion model.
Figure 2. Overview of OmniLiDAR. We adopt a text-conditioned diffusion model for controllable LiDAR generation over multi…
Figure 3. Cross-Domain Feature Modeling with directional…
Figure 4. Qualitative comparisons of LiDAR scene generation…
Figure 5. Qualitative comparison of object-level LiDAR geome…
Figure 7. Qualitative 3D detection results on Pi3DET […]
Original abstract

LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for different datasets or sensing conditions and hindering unified, controllable synthesis under heterogeneous distribution shifts. To this end, we present OmniLiDAR, a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains spanning three shift types: adverse weather, sensor-configuration changes (e.g., reduced beams), and cross-platform acquisition (vehicle, drone, and quadruped). To enable training a single model over heterogeneous domains without isolating optimization by domain, we introduce a Cross-Domain Training Strategy (CDTS) that mixes domains within each mini-batch and leverages conditioning to steer generation. We further propose Cross-Domain Feature Modeling (CDFM), which captures directional dependencies along azimuth and elevation axes to reflect the anisotropic scanning structure of range images, and Domain-Adaptive Feature Scaling (DAFS) as a lightweight modulation to account for structured domain-dependent feature shifts during denoising. In the absence of a public consolidated benchmark, we construct an 8-domain dataset by combining real-world scans with physically based weather simulation and systematic beam reduction while following official splits. Extensive experiments demonstrate strong generation fidelity and consistent gains in downstream use cases, including generative data augmentation for LiDAR semantic segmentation and 3D object detection, as well as robustness evaluation under corruptions, with consistent benefits in limited-label regimes.
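The "systematic beam reduction" mentioned in the abstract is easy to picture in the range-image representation: dropping elevation rows emulates a sensor with fewer laser beams. The exact procedure the authors follow is not specified here; the snippet below is one simple assumed version.

```python
# Assumed, simplified beam-reduction procedure on a range image:
# keep every k-th elevation row to emulate a sensor with fewer beams.
import torch

def reduce_beams(range_image: torch.Tensor, keep_every: int = 2) -> torch.Tensor:
    """range_image: (beams, azimuth_bins); returns a (beams // k, azimuth_bins) scan."""
    return range_image[::keep_every, :]

scan_64 = torch.rand(64, 1024)                 # e.g. a 64-beam scan
scan_32 = reduce_beams(scan_64, keep_every=2)  # emulated 32-beam scan
print(scan_64.shape, "->", scan_32.shape)
```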

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces OmniLiDAR, a unified text-conditioned diffusion framework for generating LiDAR scans in a shared range-image representation across eight domains spanning adverse weather, sensor-configuration changes, and cross-platform shifts (vehicle, drone, quadruped). It proposes a Cross-Domain Training Strategy (CDTS) that mixes domains in mini-batches with text conditioning, Cross-Domain Feature Modeling (CDFM) to capture anisotropic directional dependencies, and Domain-Adaptive Feature Scaling (DAFS) for lightweight domain-specific modulation during denoising. Using a newly constructed 8-domain dataset, the work reports strong generation fidelity and consistent downstream gains in generative data augmentation for semantic segmentation and 3D object detection.

Significance. If the central claim holds, the result would be significant for scalable LiDAR simulation and synthetic data pipelines, as a single controllable model could replace multiple domain-specific generators and support robustness evaluation under diverse sensing conditions.

major comments (2)
  1. [Experiments] Experiments section: No per-domain isolated baselines are reported (i.e., separate diffusion models trained only on each of the eight domains). All fidelity and downstream metrics are shown solely for the unified OmniLiDAR model, so it is impossible to quantify whether CDTS + CDFM + DAFS avoids negative transfer or merely benefits from 8× data volume and the shared range-image representation.
  2. [Ablation studies] Ablation studies: Quantitative ablations isolating CDFM, DAFS, and the domain-mixing component of CDTS are absent or insufficiently detailed, leaving the necessity of each module for unified training unverified.
minor comments (2)
  1. [Dataset] Dataset construction paragraph: Provide explicit parameters for the physically based weather simulation and the systematic beam-reduction procedure used to create the eight domains.
  2. [Method] Notation in §3: Clarify the exact form of the text-conditioning embedding and how it is injected into the diffusion U-Net (see the sketch below for one common scheme).
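To illustrate what that second minor comment asks to be documented, here is a generic sketch of one common way a text embedding is injected into a diffusion U-Net: cross-attention between flattened feature tokens (queries) and text-encoder tokens (keys and values). Whether OmniLiDAR uses this exact mechanism, and with what dimensions, is not stated in the material above.

```python
# Generic cross-attention injection of text tokens into U-Net features.
# This is a common pattern, not confirmed as OmniLiDAR's implementation;
# all dimensions below are placeholder assumptions.
import torch
import torch.nn as nn

class TextCrossAttention(nn.Module):
    def __init__(self, channels: int, text_dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads,
                                          kdim=text_dim, vdim=text_dim,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats, text_tokens):
        # feats: (B, C, H, W) U-Net features; text_tokens: (B, T, text_dim)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attended, _ = self.attn(self.norm(tokens), text_tokens, text_tokens)
        return (tokens + attended).transpose(1, 2).reshape(b, c, h, w)

layer = TextCrossAttention(channels=64, text_dim=512)
print(layer(torch.rand(2, 64, 8, 256), torch.rand(2, 16, 512)).shape)
```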

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional per-domain baselines and detailed ablations would strengthen the claims regarding the unified training strategy, and we will incorporate these experiments in the revised manuscript.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: No per-domain isolated baselines are reported (i.e., separate diffusion models trained only on each of the eight domains). All fidelity and downstream metrics are shown solely for the unified OmniLiDAR model, so it is impossible to quantify whether CDTS + CDFM + DAFS avoids negative transfer or merely benefits from 8× data volume and the shared range-image representation.

    Authors: We agree that per-domain baselines are necessary to isolate the effects of our proposed components. In the revision, we will train eight separate domain-specific diffusion models (using the identical backbone architecture and training protocol but without CDTS, CDFM, or DAFS) and report their generation fidelity metrics (FID, MMD) along with downstream semantic segmentation and 3D detection performance. These results will be directly compared to OmniLiDAR to quantify any negative transfer versus benefits from data volume and the shared representation. revision: yes

  2. Referee: [Ablation studies] Ablation studies: Quantitative ablations isolating CDFM, DAFS, and the domain-mixing component of CDTS are absent or insufficiently detailed, leaving the necessity of each module for unified training unverified.

    Authors: We acknowledge that the current ablations are insufficient to verify the contribution of each module. In the revised version, we will add comprehensive quantitative ablations that isolate CDFM, DAFS, and the domain-mixing strategy within CDTS. This will include variants with each component removed or ablated, reporting impacts on generation quality (e.g., FID) and downstream task performance to demonstrate the necessity of each for effective unified training. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions

Full rationale

The paper introduces a new unified text-conditioned diffusion model with proposed modules (CDTS for mixed-batch training, CDFM for anisotropic feature modeling, and DAFS for domain-adaptive scaling) and evaluates them on a constructed 8-domain dataset plus standard downstream benchmarks for segmentation and detection. No equations, predictions, or central claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the architecture and results are presented as independent empirical contributions without tautological renaming or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard diffusion model assumptions (iterative denoising converges to data distribution) and the effectiveness of the three proposed modules; no explicit free parameters, new axioms, or invented entities are stated in the abstract.
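For reference, the "standard diffusion model assumption" named above is usually operationalized by the generic denoising objective below. This is the textbook DDPM formulation written for a range image x_0 and a text or domain condition c; it is not an equation quoted from the paper.

```latex
% Generic DDPM noise-prediction objective (not paper-specific).
\[
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\]
\[
\mathcal{L}_{\text{simple}}
  = \mathbb{E}_{x_0,\, c,\, t,\, \epsilon}
    \bigl\| \epsilon - \epsilon_\theta(x_t,\, t,\, c) \bigr\|_2^2 .
\]
```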

pith-pipeline@v0.9.0 · 5614 in / 1116 out tokens · 38160 ms · 2026-05-14T19:16:58.362102+00:00 · methodology

