pith. machine review for the scientific record.

arxiv: 2605.13815 · v1 · submitted 2026-05-13 · 💻 cs.CV · cs.RO

Recognition: no theorem link

OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:16 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords LiDAR generation · diffusion models · multi-domain learning · range image · data augmentation · semantic segmentation · 3D object detection · adverse weather

The pith

A single text-conditioned diffusion model generates realistic LiDAR scans across eight domains spanning weather, sensors, and platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OmniLiDAR to produce LiDAR range images for many different sensing conditions using one model rather than separate models for each dataset or environment. It trains on a combined set of eight domains that include adverse weather, reduced beam counts, and acquisitions from vehicles, drones, and quadrupeds. A cross-domain training approach mixes samples from all domains in every batch, while text prompts steer the output toward the desired condition. Two added modules capture the directional structure of range images and adjust features for domain-specific shifts during the denoising process. If successful, this removes the need to collect or train on isolated data for each new condition and supplies synthetic scans that boost performance on semantic segmentation and 3D detection, especially when labeled examples are scarce.
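To make the training recipe more concrete, here is a minimal sketch, in PyTorch, of what mixing all eight domains inside every mini-batch and attaching a condition-describing text prompt to each scan could look like. The domain labels, prompt wording, tensor sizes, and plain shuffled sampling are illustrative assumptions; the paper's actual CDTS may balance domains per batch differently.

```python
# Minimal sketch (not the authors' code) of domain-mixed batching with text
# prompts. Domain names, prompt strings, and sizes below are assumptions.
import torch
from torch.utils.data import Dataset, DataLoader

DOMAINS = [
    "clear", "fog", "rain", "snow",   # weather shifts (illustrative labels)
    "64-beam", "32-beam",             # sensor-configuration shifts
    "drone", "quadruped",             # platform shifts
]

class MixedDomainRangeImages(Dataset):
    """Pools per-domain range images so a shuffled DataLoader yields
    mini-batches that mix samples from many domains at once."""

    def __init__(self, per_domain_scans):  # dict: domain -> list of (H, W) tensors
        self.samples = [
            (scan, f"a LiDAR scan captured under {dom} conditions")
            for dom, scans in per_domain_scans.items()
            for scan in scans
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        scan, prompt = self.samples[idx]
        return scan.unsqueeze(0), prompt  # (1, H, W) range image + text prompt

# Toy pool: 16 random "range images" per domain, 32 beams x 1024 azimuth bins.
pool = {d: [torch.rand(32, 1024) for _ in range(16)] for d in DOMAINS}
loader = DataLoader(MixedDomainRangeImages(pool), batch_size=8, shuffle=True)

scans, prompts = next(iter(loader))
print(scans.shape, "| distinct conditions in this batch:", len(set(prompts)))
```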

Core claim

OmniLiDAR is a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains. It achieves this by introducing a Cross-Domain Training Strategy that mixes domains within each mini-batch, Cross-Domain Feature Modeling that captures the anisotropic scanning structure along the azimuth and elevation axes, and Domain-Adaptive Feature Scaling that modulates domain-dependent feature shifts during denoising.
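The anisotropy the claim refers to can be read as: range images are dense and continuous along azimuth (the sensor sweep) but coarse along elevation (the beam index). A plausible, assumed form of a direction-aware block is sketched below; it separates the two axes with 1×k and k×1 convolutions before mixing them. This illustrates the idea only and is not the paper's exact CDFM module.

```python
# Illustrative direction-aware block (assumed form, not the paper's CDFM):
# depthwise 1xk and kx1 convolutions process the azimuth (width) and
# elevation (height) axes separately before a 1x1 mixing convolution.
import torch
import torch.nn as nn

class DirectionalBlock(nn.Module):
    def __init__(self, channels: int, k: int = 7):
        super().__init__()
        self.azimuth = nn.Conv2d(channels, channels, kernel_size=(1, k),
                                 padding=(0, k // 2), groups=channels)
        self.elevation = nn.Conv2d(channels, channels, kernel_size=(k, 1),
                                   padding=(k // 2, 0), groups=channels)
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H=beams, W=azimuth bins)
        return x + self.mix(self.azimuth(x) + self.elevation(x))

feats = torch.rand(2, 64, 32, 1024)            # toy U-Net features
print(DirectionalBlock(64)(feats).shape)       # torch.Size([2, 64, 32, 1024])
```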

What carries the argument

Cross-Domain Training Strategy (CDTS) that mixes domains in each mini-batch, combined with CDFM for directional range-image dependencies and DAFS for lightweight domain modulation.
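DAFS is described only as a lightweight modulation of domain-dependent feature shifts. One common way to realize that kind of modulation is a FiLM-style per-channel scale and shift predicted from a conditioning embedding; the sketch below assumes that form and an arbitrary embedding size, neither of which is confirmed by the material here.

```python
# Assumed FiLM-style realization of domain-adaptive feature scaling: a small
# MLP maps a domain / text embedding to per-channel scale and shift applied
# to denoising features. Embedding source and placement are assumptions.
import torch
import torch.nn as nn

class DomainAdaptiveScaling(nn.Module):
    def __init__(self, channels: int, cond_dim: int):
        super().__init__()
        self.to_scale_shift = nn.Sequential(
            nn.Linear(cond_dim, 2 * channels), nn.SiLU(),
            nn.Linear(2 * channels, 2 * channels),
        )

    def forward(self, feats, cond):             # feats: (B, C, H, W), cond: (B, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale[..., None, None]           # broadcast over spatial dims
        shift = shift[..., None, None]
        return feats * (1.0 + scale) + shift     # identity when scale = shift = 0

mod = DomainAdaptiveScaling(channels=64, cond_dim=128)
print(mod(torch.rand(2, 64, 32, 1024), torch.rand(2, 128)).shape)
```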

If this is right

  • Generative data augmentation yields consistent improvements on LiDAR semantic segmentation and 3D object detection.
  • Benefits appear in limited-label regimes and under robustness tests with added corruptions.
  • Controllable synthesis under weather, sensor, and platform shifts no longer requires domain-isolated models.
  • A shared range-image representation supports unified training across real scans and physically simulated weather or beam reductions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same conditioning and modulation approach could extend to additional sensor modalities or time-varying scenes without retraining separate generators.
  • Text prompts might allow finer control over continuous parameters such as rain intensity or platform height beyond the eight discrete domains.
  • Integration with existing physics simulators could produce hybrid real-synthetic datasets for more scalable testing of perception algorithms.

Load-bearing premise

Mixing domains within each mini-batch together with text conditioning and the CDFM and DAFS modules enables effective unified training without negative transfer across heterogeneous shifts.

What would settle it

A head-to-head comparison in which separate per-domain diffusion models produce measurably higher generation fidelity or larger downstream-task gains than the single unified OmniLiDAR model.

Figures

Figures reproduced from arXiv: 2605.13815 by Ao Liang, Ben Fei, Dekai Zhu, Lingdong Kong, Runnan Chen, Tongliang Liu, Wanli Ouyang, Weidong Yang, Xiang Xu, Xin Li, Yang Wu, Youquan Liu.

Figure 1. OmniLiDAR in context: unified multi-domain LiDAR generation with a single text-conditioned diffusion model.
Figure 2. Overview of OmniLiDAR. We adopt a text-conditioned diffusion model for controllable LiDAR generation over multi…
Figure 3. Cross-Domain Feature Modeling with directional…
Figure 4. Qualitative comparisons of LiDAR scene generation…
Figure 5. Qualitative comparison of object-level LiDAR geome…
Figure 7. Qualitative 3D detection results on Pi3DET […]
Original abstract

LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for different datasets or sensing conditions and hindering unified, controllable synthesis under heterogeneous distribution shifts. To this end, we present OmniLiDAR, a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains spanning three shift types: adverse weather, sensor-configuration changes (e.g., reduced beams), and cross-platform acquisition (vehicle, drone, and quadruped). To enable training a single model over heterogeneous domains without isolating optimization by domain, we introduce a Cross-Domain Training Strategy (CDTS) that mixes domains within each mini-batch and leverages conditioning to steer generation. We further propose Cross-Domain Feature Modeling (CDFM), which captures directional dependencies along azimuth and elevation axes to reflect the anisotropic scanning structure of range images, and Domain-Adaptive Feature Scaling (DAFS) as a lightweight modulation to account for structured domain-dependent feature shifts during denoising. In the absence of a public consolidated benchmark, we construct an 8-domain dataset by combining real-world scans with physically based weather simulation and systematic beam reduction while following official splits. Extensive experiments demonstrate strong generation fidelity and consistent gains in downstream use cases, including generative data augmentation for LiDAR semantic segmentation and 3D object detection, as well as robustness evaluation under corruptions, with consistent benefits in limited-label regimes.
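The "systematic beam reduction" mentioned in the abstract is easy to picture in the range-image representation: dropping elevation rows emulates a sensor with fewer laser beams. The exact procedure the authors follow is not specified here; the snippet below is one simple assumed version.

```python
# Assumed, simplified beam-reduction procedure on a range image:
# keep every k-th elevation row to emulate a sensor with fewer beams.
import torch

def reduce_beams(range_image: torch.Tensor, keep_every: int = 2) -> torch.Tensor:
    """range_image: (beams, azimuth_bins); returns a (beams // k, azimuth_bins) scan."""
    return range_image[::keep_every, :]

scan_64 = torch.rand(64, 1024)                 # e.g. a 64-beam scan
scan_32 = reduce_beams(scan_64, keep_every=2)  # emulated 32-beam scan
print(scan_64.shape, "->", scan_32.shape)
```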

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces OmniLiDAR, a unified text-conditioned diffusion framework for generating LiDAR scans in a shared range-image representation across eight domains spanning adverse weather, sensor-configuration changes, and cross-platform shifts (vehicle, drone, quadruped). It proposes a Cross-Domain Training Strategy (CDTS) that mixes domains in mini-batches with text conditioning, Cross-Domain Feature Modeling (CDFM) to capture anisotropic directional dependencies, and Domain-Adaptive Feature Scaling (DAFS) for lightweight domain-specific modulation during denoising. Using a newly constructed 8-domain dataset, the work reports strong generation fidelity and consistent downstream gains in generative data augmentation for semantic segmentation and 3D object detection.

Significance. If the central claim holds, the result would be significant for scalable LiDAR simulation and synthetic data pipelines, as a single controllable model could replace multiple domain-specific generators and support robustness evaluation under diverse sensing conditions.

major comments (2)
  1. [Experiments] Experiments section: No per-domain isolated baselines are reported (i.e., separate diffusion models trained only on each of the eight domains). All fidelity and downstream metrics are shown solely for the unified OmniLiDAR model, so it is impossible to quantify whether CDTS + CDFM + DAFS avoids negative transfer or merely benefits from 8× data volume and the shared range-image representation.
  2. [Ablation studies] Ablation studies: Quantitative ablations isolating CDFM, DAFS, and the domain-mixing component of CDTS are absent or insufficiently detailed, leaving the necessity of each module for unified training unverified.
minor comments (2)
  1. [Dataset] Dataset construction paragraph: Provide explicit parameters for the physically based weather simulation and the systematic beam-reduction procedure used to create the eight domains.
  2. [Method] Notation in §3: Clarify the exact form of the text-conditioning embedding and how it is injected into the diffusion U-Net (see the sketch below for one common scheme).
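To illustrate what that second minor comment asks to be documented, here is a generic sketch of one common way a text embedding is injected into a diffusion U-Net: cross-attention between flattened feature tokens (queries) and text-encoder tokens (keys and values). Whether OmniLiDAR uses this exact mechanism, and with what dimensions, is not stated in the material above.

```python
# Generic cross-attention injection of text tokens into U-Net features.
# This is a common pattern, not confirmed as OmniLiDAR's implementation;
# all dimensions below are placeholder assumptions.
import torch
import torch.nn as nn

class TextCrossAttention(nn.Module):
    def __init__(self, channels: int, text_dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=heads,
                                          kdim=text_dim, vdim=text_dim,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, feats, text_tokens):
        # feats: (B, C, H, W) U-Net features; text_tokens: (B, T, text_dim)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attended, _ = self.attn(self.norm(tokens), text_tokens, text_tokens)
        return (tokens + attended).transpose(1, 2).reshape(b, c, h, w)

layer = TextCrossAttention(channels=64, text_dim=512)
print(layer(torch.rand(2, 64, 8, 256), torch.rand(2, 16, 512)).shape)
```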

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional per-domain baselines and detailed ablations would strengthen the claims regarding the unified training strategy, and we will incorporate these experiments in the revised manuscript.

Point-by-point responses
  1. Referee: [Experiments] Experiments section: No per-domain isolated baselines are reported (i.e., separate diffusion models trained only on each of the eight domains). All fidelity and downstream metrics are shown solely for the unified OmniLiDAR model, so it is impossible to quantify whether CDTS + CDFM + DAFS avoids negative transfer or merely benefits from 8× data volume and the shared range-image representation.

    Authors: We agree that per-domain baselines are necessary to isolate the effects of our proposed components. In the revision, we will train eight separate domain-specific diffusion models (using the identical backbone architecture and training protocol but without CDTS, CDFM, or DAFS) and report their generation fidelity metrics (FID, MMD) along with downstream semantic segmentation and 3D detection performance. These results will be directly compared to OmniLiDAR to quantify any negative transfer versus benefits from data volume and the shared representation. revision: yes

  2. Referee: [Ablation studies] Ablation studies: Quantitative ablations isolating CDFM, DAFS, and the domain-mixing component of CDTS are absent or insufficiently detailed, leaving the necessity of each module for unified training unverified.

    Authors: We acknowledge that the current ablations are insufficient to verify the contribution of each module. In the revised version, we will add comprehensive quantitative ablations that isolate CDFM, DAFS, and the domain-mixing strategy within CDTS. This will include variants with each component removed or ablated, reporting impacts on generation quality (e.g., FID) and downstream task performance to demonstrate the necessity of each for effective unified training. revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained with no circular reductions

Full rationale

The paper introduces a new unified text-conditioned diffusion model with proposed modules (CDTS for mixed-batch training, CDFM for anisotropic feature modeling, and DAFS for domain-adaptive scaling) and evaluates them on a constructed 8-domain dataset plus standard downstream benchmarks for segmentation and detection. No equations, predictions, or central claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the architecture and results are presented as independent empirical contributions without tautological renaming or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on standard diffusion model assumptions (iterative denoising converges to data distribution) and the effectiveness of the three proposed modules; no explicit free parameters, new axioms, or invented entities are stated in the abstract.
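For reference, the "standard diffusion model assumption" named above is usually operationalized by the generic denoising objective below. This is the textbook DDPM formulation written for a range image x_0 and a text or domain condition c; it is not an equation quoted from the paper.

```latex
% Generic DDPM noise-prediction objective (not paper-specific).
\[
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\]
\[
\mathcal{L}_{\text{simple}}
  = \mathbb{E}_{x_0,\, c,\, t,\, \epsilon}
    \bigl\| \epsilon - \epsilon_\theta(x_t,\, t,\, c) \bigr\|_2^2 .
\]
```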

pith-pipeline@v0.9.0 · 5614 in / 1116 out tokens · 38160 ms · 2026-05-14T19:16:58.362102+00:00 · methodology

