OmniLiDAR: A Unified Diffusion Framework for Multi-Domain 3D LiDAR Generation
Pith reviewed 2026-05-14 19:16 UTC · model grok-4.3
The pith
A single text-conditioned diffusion model generates realistic LiDAR scans across eight domains spanning weather, sensors, and platforms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OmniLiDAR is a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains. It does so by introducing a Cross-Domain Training Strategy that mixes domains within each mini-batch, Cross-Domain Feature Modeling that captures the anisotropic scanning structure along the azimuth and elevation axes, and Domain-Adaptive Feature Scaling that modulates domain-dependent feature shifts during denoising.
What carries the argument
A Cross-Domain Training Strategy (CDTS) that mixes domains in each mini-batch, combined with Cross-Domain Feature Modeling (CDFM) for directional range-image dependencies and Domain-Adaptive Feature Scaling (DAFS) for lightweight domain modulation.
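The review does not reproduce the paper's architectures, but the two feature-level ideas can be illustrated with a minimal NumPy sketch: axis-wise aggregation along the elevation and azimuth directions of a range image (a toy stand-in for CDFM) followed by per-domain scale-and-shift modulation (a FiLM-style stand-in for DAFS). All function names, the mean-pooling choice, and the modulation form are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def axis_features(x):
    """Toy stand-in for CDFM: aggregate context separately along the
    elevation (rows) and azimuth (columns) axes of a range image and
    combine it with the input, reflecting the anisotropic scan structure."""
    # x: (H, W) range image; H = elevation bins, W = azimuth bins.
    az = x.mean(axis=0, keepdims=True)   # (1, W): azimuth-direction context
    el = x.mean(axis=1, keepdims=True)   # (H, 1): elevation-direction context
    return x + az + el                   # broadcast-combined directional context

def dafs(x, domain_id, scales, shifts):
    """Toy stand-in for DAFS: per-domain scale-and-shift modulation."""
    return scales[domain_id] * x + shifts[domain_id]

H, W, D = 4, 8, 8                        # elevation bins, azimuth bins, domains
x = np.random.rand(H, W)
scales, shifts = np.ones(D), np.zeros(D)  # identity modulation for the demo
y = dafs(axis_features(x), domain_id=3, scales=scales, shifts=shifts)
assert y.shape == (H, W)
```

In the real model the per-domain scales and shifts would be learned (or predicted from the conditioning), and the axis-wise aggregation would be a learned directional operator rather than a mean.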
If this is right
- Generative data augmentation yields consistent improvements on LiDAR semantic segmentation and 3D object detection.
- Benefits appear in limited-label regimes and under robustness tests with added corruptions.
- Controllable synthesis under weather, sensor, and platform shifts no longer requires domain-isolated models.
- A shared range-image representation supports unified training across real scans and physically simulated weather or beam reductions.
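The shared range-image representation referred to above is the standard spherical projection of a LiDAR sweep: rows index elevation, columns index azimuth, and each cell stores range. A minimal NumPy sketch (field-of-view values and resolution are arbitrary placeholders, not the paper's settings):

```python
import numpy as np

def to_range_image(points, H=32, W=1024, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) point cloud into an (H, W) range image:
    rows index elevation angle, columns index azimuth angle."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(y, x)                        # azimuth in [-pi, pi]
    el = np.arcsin(z / np.maximum(r, 1e-8))      # elevation angle
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    # Map angles to pixel indices; clip elevations outside the vertical FOV.
    u = ((1.0 - (az + np.pi) / (2 * np.pi)) * W).astype(int) % W
    v = ((fov_up_r - el) / (fov_up_r - fov_down_r) * H).astype(int)
    v = np.clip(v, 0, H - 1)
    img = np.zeros((H, W))
    img[v, u] = r                                # keep last point per cell
    return img

pts = np.array([[10.0, 0.0, 0.0], [0.0, 5.0, 1.0]])
img = to_range_image(pts)
assert img.shape == (32, 1024) and img.max() > 0
```

Because every domain's scans land in the same (H, W) grid, one diffusion backbone can be trained across sensors and platforms without per-domain input formats.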
Where Pith is reading between the lines
- The same conditioning and modulation approach could extend to additional sensor modalities or time-varying scenes without retraining separate generators.
- Text prompts might allow finer control over continuous parameters such as rain intensity or platform height beyond the eight discrete domains.
- Integration with existing physics simulators could produce hybrid real-synthetic datasets for more scalable testing of perception algorithms.
Load-bearing premise
Mixing domains within each mini-batch together with text conditioning and the CDFM and DAFS modules enables effective unified training without negative transfer across heterogeneous shifts.
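The batch-mixing half of this premise is straightforward to sketch. Below is a minimal, hypothetical illustration of CDTS-style batch construction: each sample's domain is drawn at random so that every mini-batch mixes domains, and each sample is paired with a text condition naming its domain. The function name, prompt template, and uniform sampling are assumptions for illustration, not the paper's protocol.

```python
import random

def mixed_domain_batch(datasets, batch_size, rng):
    """Build one mini-batch that mixes domains, pairing each scan with a
    text condition naming its domain (toy stand-in for CDTS)."""
    domains = list(datasets)
    batch = []
    for _ in range(batch_size):
        d = rng.choice(domains)               # pick a domain per sample
        scan = rng.choice(datasets[d])        # pick a scan from that domain
        batch.append((scan, f"a LiDAR scan in the {d} domain"))
    return batch

rng = random.Random(0)
datasets = {"fog": [0, 1], "snow": [2, 3], "drone": [4, 5]}
batch = mixed_domain_batch(datasets, batch_size=8, rng=rng)
assert len(batch) == 8
```

The claim under test is that this mixing, plus conditioning, lets a single denoiser serve all eight domains without the negative transfer that usually motivates per-domain models.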
What would settle it
A head-to-head comparison in which separate per-domain diffusion models produce measurably higher generation fidelity or larger downstream-task gains than the single unified OmniLiDAR model.
Original abstract
LiDAR scene generation is increasingly important for scalable simulation and synthetic data creation, especially under diverse sensing conditions that are costly to capture at scale. Typically, diffusion-based LiDAR generators are developed under single-domain settings, requiring separate models for different datasets or sensing conditions and hindering unified, controllable synthesis under heterogeneous distribution shifts. To this end, we present OmniLiDAR, a unified text-conditioned diffusion framework that generates LiDAR scans in a shared range-image representation across eight representative domains spanning three shift types: adverse weather, sensor-configuration changes (e.g., reduced beams), and cross-platform acquisition (vehicle, drone, and quadruped). To enable training a single model over heterogeneous domains without isolating optimization by domain, we introduce a Cross-Domain Training Strategy (CDTS) that mixes domains within each mini-batch and leverages conditioning to steer generation. We further propose Cross-Domain Feature Modeling (CDFM), which captures directional dependencies along azimuth and elevation axes to reflect the anisotropic scanning structure of range images, and Domain-Adaptive Feature Scaling (DAFS) as a lightweight modulation to account for structured domain-dependent feature shifts during denoising. In the absence of a public consolidated benchmark, we construct an 8-domain dataset by combining real-world scans with physically based weather simulation and systematic beam reduction while following official splits. Extensive experiments demonstrate strong generation fidelity and consistent gains in downstream use cases, including generative data augmentation for LiDAR semantic segmentation and 3D object detection, as well as robustness evaluation under corruptions, with consistent benefits in limited-label regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces OmniLiDAR, a unified text-conditioned diffusion framework for generating LiDAR scans in a shared range-image representation across eight domains spanning adverse weather, sensor-configuration changes, and cross-platform shifts (vehicle, drone, quadruped). It proposes a Cross-Domain Training Strategy (CDTS) that mixes domains in mini-batches with text conditioning, Cross-Domain Feature Modeling (CDFM) to capture anisotropic directional dependencies, and Domain-Adaptive Feature Scaling (DAFS) for lightweight domain-specific modulation during denoising. Using a newly constructed 8-domain dataset, the work reports strong generation fidelity and consistent downstream gains in generative data augmentation for semantic segmentation and 3D object detection.
Significance. If the central claim holds, the result would be significant for scalable LiDAR simulation and synthetic data pipelines, as a single controllable model could replace multiple domain-specific generators and support robustness evaluation under diverse sensing conditions.
major comments (2)
- [Experiments] Experiments section: No per-domain isolated baselines are reported (i.e., separate diffusion models trained only on each of the eight domains). All fidelity and downstream metrics are shown solely for the unified OmniLiDAR model, so it is impossible to quantify whether CDTS + CDFM + DAFS avoids negative transfer or merely benefits from 8× data volume and the shared range-image representation.
- [Ablation studies] Ablation studies: Quantitative ablations isolating CDFM, DAFS, and the domain-mixing component of CDTS are absent or insufficiently detailed, leaving the necessity of each module for unified training unverified.
minor comments (2)
- [Dataset] Dataset construction paragraph: Provide explicit parameters for the physically based weather simulation and the systematic beam-reduction procedure used to create the eight domains.
- [Method] Notation in §3: Clarify the exact form of the text-conditioning embedding and how it is injected into the diffusion U-Net.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that additional per-domain baselines and detailed ablations would strengthen the claims regarding the unified training strategy, and we will incorporate these experiments in the revised manuscript.
Point-by-point responses
-
Referee: [Experiments] Experiments section: No per-domain isolated baselines are reported (i.e., separate diffusion models trained only on each of the eight domains). All fidelity and downstream metrics are shown solely for the unified OmniLiDAR model, so it is impossible to quantify whether CDTS + CDFM + DAFS avoids negative transfer or merely benefits from 8× data volume and the shared range-image representation.
Authors: We agree that per-domain baselines are necessary to isolate the effects of our proposed components. In the revision, we will train eight separate domain-specific diffusion models (using the identical backbone architecture and training protocol but without CDTS, CDFM, or DAFS) and report their generation fidelity metrics (FID, MMD) along with downstream semantic segmentation and 3D detection performance. These results will be directly compared to OmniLiDAR to quantify any negative transfer versus benefits from data volume and the shared representation. revision: yes
-
Referee: [Ablation studies] Ablation studies: Quantitative ablations isolating CDFM, DAFS, and the domain-mixing component of CDTS are absent or insufficiently detailed, leaving the necessity of each module for unified training unverified.
Authors: We acknowledge that the current ablations are insufficient to verify the contribution of each module. In the revised version, we will add comprehensive quantitative ablations that isolate CDFM, DAFS, and the domain-mixing strategy within CDTS. This will include variants with each component removed or ablated, reporting impacts on generation quality (e.g., FID) and downstream task performance to demonstrate the necessity of each for effective unified training. revision: yes
Circularity Check
Derivation chain is self-contained with no circular reductions
Full rationale
The paper introduces a new unified text-conditioned diffusion model with proposed modules (CDTS for mixed-batch training, CDFM for anisotropic feature modeling, and DAFS for domain-adaptive scaling) and evaluates them on a constructed 8-domain dataset plus standard downstream benchmarks for segmentation and detection. No equations, predictions, or central claims reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; the architecture and results are presented as independent empirical contributions without tautological renaming or ansatz smuggling.