WiSER: A Wireless Scene Encoder for Geometry-Grounded Multi-View Wireless Prediction

Hao Ye; Jing Qiao; Yiyang Guo

arxiv: 2606.04770 · v1 · pith:BOBNWKSEnew · submitted 2026-06-03 · 📡 eess.SP

WiSER: A Wireless Scene Encoder for Geometry-Grounded Multi-View Wireless Prediction

Jing Qiao , Yiyang Guo , Hao Ye This is my paper

Pith reviewed 2026-06-28 05:06 UTC · model grok-4.3

classification 📡 eess.SP

keywords wireless scene encodingradiomap predictionchannel impulse response3D scene representationindoor propagationmulti-view wireless predictiongeometry-grounded encoding

0 comments

The pith

Transmitter-conditioned sparse 3D scene memory supports joint radiomap and multipath channel predictions from one encoder.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a sparse voxel scene plus transmitter location can be encoded into a reusable 3D memory structure. Separate decoders then query this memory to produce dense path-gain maps and variable sets of delay-power taps. The model is trained and tested on aligned indoor geometry and propagation data, where it exceeds task-specific baselines on both coverage and multipath accuracy. A sympathetic reader would care because the shared memory demonstrates that geometry can ground multiple wireless views without retraining separate networks for each task.

Core claim

WiSER maps a sparse voxel representation of an indoor scene and a transmitter location into a transmitter-conditioned sparse 3D scene memory, which is queried by a ray-corridor decoder for dense receiver-plane path-gain prediction and a set decoder for variable cardinality delay and power tap prediction. Experiments on co-registered scene and wireless data show that this outperforms scene-specific radiomap baselines and substantially improves matched delay and power prediction over reference CIR baselines.

What carries the argument

transmitter-conditioned sparse 3D scene memory that encodes scene geometry for reuse across different wireless queries

If this is right

The same encoder can serve heterogeneous propagation queries without separate training per task.
Sharing the scene memory improves both aggregate coverage predictions and path-level multipath structure.
Geometry-grounded representations become reusable building blocks for multiple wireless prediction problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The memory structure could support transfer to new transmitter locations or slight scene changes with limited additional data.
Real-world measurements might be used to adapt the encoded memory beyond simulation-only training.
The approach opens a path to combining the encoder with other sensor inputs for tasks such as localization.

Load-bearing premise

Simulated labels from 3D scenes accurately capture the joint statistics of radiomap and multipath structure that would appear in real indoor environments with actual materials and noise.

What would settle it

Compare the model's predicted path gains and delay-power tap sets against direct measurements collected in a physical indoor space whose geometry and transmitter positions match the input scenes.

Figures

Figures reproduced from arXiv: 2606.04770 by Hao Ye, Jing Qiao, Yiyang Guo.

**Figure 1.** Figure 1: Overall WiSER architecture. (a) The sparse voxel scene tokens are encoded by repeated sparse attention and sparse downsampling, and the coarsest [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

**Figure 2.** Figure 2: Ray-corridor feature gathering. (a) For a transmitter/receiver (TX/RX) [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Co-registered dataset-generation pipeline. A ScanNet++ scene is converted into a sparse voxel scene for learning and a Sionna-compatible radio [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative radiomap comparison on four representative scene–TX cases. Columns show ground truth, WiSER, NeRF2, RF-3DGS, and two radiomap [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative multipath CIR prediction examples with reference baselines and attention ablations. Green circles denote ground-truth path taps, colored [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

read the original abstract

Indoor wireless propagation is governed by the interaction among three-dimensional (3D) scene geometry, radiomaterial properties, and transmitter and receiver configuration, which jointly determine both aggregate coverage behavior and path-level multipath structure. However, most learning-based site-specific prediction methods are designed for a single wireless representation, such as radiomap estimation or channel impulse response (CIR) prediction, and therefore do not explicitly exploit the propagation structure shared across heterogeneous wireless views. This paper introduces WiSER, a Wireless Scene Encoder for joint radiomap and multipath CIR prediction. WiSER maps a sparse voxel representation of an indoor scene and a transmitter location into a transmitter-conditioned sparse 3D scene memory, which is queried by two structure-aware decoders: a ray-corridor decoder for dense receiver-plane path-gain prediction and a Detection Transformer (DETR)-style set decoder for variable cardinality delay and power tap prediction. To train and evaluate this setting, we construct a co-registered indoor scene and wireless dataset pipeline using ScanNet++ indoor scenes and Sionna Ray Tracing, producing aligned sparse voxel inputs, dense radiomap labels, and unordered multipath CIR tap sets under a common coordinate frame and propagation configuration. Experimental results show that WiSER outperforms scene-specific radiomap baselines and substantially improves matched delay and power prediction over reference CIR baselines. These results suggest that transmitter-conditioned sparse 3D scene representations can serve as reusable wireless scene encoders for heterogeneous propagation queries, providing a geometry-grounded step toward representation learning and foundation-model development for AI-native wireless systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

WiSER's shared transmitter-conditioned scene memory for joint radiomap and CIR prediction is a reasonable architectural idea, but the abstract supplies no metrics or ablations so the performance claims cannot be checked.

read the letter

The paper's core move is to encode an indoor scene plus transmitter location into a sparse 3D memory, then query that memory with two different decoders: one that produces dense path-gain maps along ray corridors and one that outputs unordered sets of delay-power taps in DETR style. This joint setup on a single geometry-grounded representation is the concrete novelty relative to the single-task methods cited.

The dataset construction that aligns ScanNet++ geometry with Sionna ray-tracing labels under a common coordinate frame is a practical step that makes the multi-task training feasible.

The obvious gap is the complete lack of numbers. The abstract states that the model outperforms scene-specific baselines and improves matched delay-power prediction, yet it gives no quantitative values, no description of the baselines, no ablation results, and no statistical tests. Without those, there is no way to know whether the shared memory actually drives the gains or whether the architecture simply fits the simulation data better for other reasons.

A secondary concern is that all labels come from Sionna on ScanNet++ scenes with default or assumed materials and no sensor noise. The learned features may therefore capture simulation artifacts rather than the joint statistics that appear in real indoor measurements.

This work is aimed at people already working on learning-based site-specific propagation modeling who are interested in reusable scene representations. A reader looking for concrete multi-task designs could extract useful ideas from the encoder-decoder structure, but only if the full paper supplies the missing experiments.

I would send it to peer review. The problem is relevant and the architecture is well-specified; referees can assess the results once they are presented.

Referee Report

2 major / 0 minor

Summary. The paper introduces WiSER, a neural architecture that maps a sparse voxel representation of an indoor 3D scene plus transmitter location into a transmitter-conditioned sparse 3D scene memory. This memory is queried by a ray-corridor decoder for dense receiver-plane path-gain (radiomap) prediction and a DETR-style set decoder for variable-cardinality delay/power tap (CIR) prediction. The model is trained and evaluated on a co-registered dataset generated from ScanNet++ scenes via Sionna ray-tracing; the central claim is that the learned representations outperform scene-specific radiomap baselines and reference CIR baselines, enabling reusable geometry-grounded encoders for heterogeneous wireless queries.

Significance. If the performance gains are substantial, statistically validated, and the representations generalize beyond the synthetic training distribution, the work would constitute a concrete step toward multi-task, geometry-aware representation learning for wireless systems. The construction of an aligned sparse-voxel / radiomap / CIR dataset pipeline is a clear positive contribution that could support future foundation-model efforts in the field.

major comments (2)

[Abstract] Abstract: the claim that 'WiSER outperforms scene-specific radiomap baselines and substantially improves matched delay and power prediction over reference CIR baselines' is presented without any quantitative metrics, baseline descriptions, ablation tables, or statistical significance tests. Because the central claim of reusable encoders rests on demonstrated superiority, the absence of these results is load-bearing and prevents verification of the asserted gains.
[Dataset construction] Dataset construction (described in the abstract and implied experimental sections): all labels are generated exclusively by Sionna ray-tracing over ScanNet++ geometry with default or assumed material properties and no sensor noise. This choice directly affects whether the learned features capture genuine joint statistics of real indoor radiomaps and multipath structure; without real-world validation or sensitivity analysis to material/noise mismatch, the claim that the encoder reflects propagation physics rather than simulation artifacts cannot be assessed.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that 'WiSER outperforms scene-specific radiomap baselines and substantially improves matched delay and power prediction over reference CIR baselines' is presented without any quantitative metrics, baseline descriptions, ablation tables, or statistical significance tests. Because the central claim of reusable encoders rests on demonstrated superiority, the absence of these results is load-bearing and prevents verification of the asserted gains.

Authors: We agree that the abstract would be strengthened by including key quantitative results. The full manuscript reports these metrics, baseline descriptions, ablation studies, and statistical details in Sections 4 and 5. We will revise the abstract to incorporate specific performance numbers (e.g., radiomap NMSE improvements and CIR matching accuracy gains) with references to the corresponding tables. revision: yes
Referee: [Dataset construction] Dataset construction (described in the abstract and implied experimental sections): all labels are generated exclusively by Sionna ray-tracing over ScanNet++ geometry with default or assumed material properties and no sensor noise. This choice directly affects whether the learned features capture genuine joint statistics of real indoor radiomaps and multipath structure; without real-world validation or sensitivity analysis to material/noise mismatch, the claim that the encoder reflects propagation physics rather than simulation artifacts cannot be assessed.

Authors: The co-registered dataset is generated via Sionna ray-tracing on ScanNet++ to ensure precise alignment between geometry, radiomaps, and CIRs under controlled conditions, which is essential for training the multi-view model. We will add an explicit limitations paragraph discussing the use of default material properties and absence of sensor noise, along with a brief sensitivity discussion where feasible. However, real-world measurements and full mismatch analysis require new data collection that is outside the scope of this work. revision: partial

standing simulated objections not resolved

Comprehensive real-world validation or sensitivity analysis to material/noise mismatch, as this would require new measurement campaigns beyond the current simulation-based study.

Circularity Check

0 steps flagged

No circularity: standard supervised training on external simulation labels

full rationale

The paper presents a neural network architecture (transmitter-conditioned sparse voxel encoder plus ray-corridor and DETR-style decoders) trained end-to-end on labels generated by an independent external pipeline (Sionna ray-tracing over ScanNet++ geometry). No equation defines a quantity in terms of itself, no fitted parameter is relabeled as a prediction, and no load-bearing premise rests on a self-citation chain. The central claim that the learned representations are reusable across heterogeneous queries is an empirical outcome of supervised training and held-out evaluation, not a definitional identity. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim that the learned encoder is reusable across heterogeneous queries rests on the fidelity of the ray-tracing simulator and on the assumption that the chosen voxel resolution and training distribution capture the relevant propagation physics.

free parameters (1)

neural-network weights
All encoder and decoder parameters are fitted to the simulated radiomap and CIR labels.

axioms (1)

domain assumption Sionna ray tracing on ScanNet++ scenes produces labels whose joint radiomap and multipath statistics match real indoor propagation
All training and evaluation labels are generated by this simulator; any mismatch directly affects both prediction tasks.

invented entities (1)

transmitter-conditioned sparse 3D scene memory no independent evidence
purpose: Reusable latent representation queried by both the ray-corridor and DETR-style decoders
New architectural component introduced to enable joint heterogeneous prediction; no independent evidence outside the trained model is provided.

pith-pipeline@v0.9.1-grok · 5812 in / 1492 out tokens · 41901 ms · 2026-06-28T05:06:57.182332+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

36 extracted references · 14 canonical work pages · 4 internal anchors

[1]

The COST 2100 MIMO channel model,

L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin, F. Tufvesson, and P. De Doncker, “The COST 2100 MIMO channel model,”IEEE Wireless Commun., vol. 19, no. 6, pp. 92–99, Dec. 2012

2012
[2]

QuaDRiGa: A 3- D multi-cell channel model with time evolution for enabling virtual field trials,

S. Jaeckel, L. Raschkowski, K. Borner, and L. Thiele, “QuaDRiGa: A 3- D multi-cell channel model with time evolution for enabling virtual field trials,”IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, Jun. 2014

2014
[3]

A novel millimeter- wave channel simulator and applications for 5G wireless communica- tions,

S. Sun, G. R. MacCartney, and T. S. Rappaport, “A novel millimeter- wave channel simulator and applications for 5G wireless communica- tions,” inProc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–7

2017
[4]

Sionna: An open-source library for next-generation physical layer research,

J. Hoydis, S. Cammerer, F. A. Aoudia, A. Vem, N. Binder, G. Marcus, and A. Keller, “Sionna: An open-source library for next-generation physical layer research,”arXiv preprint arXiv:2203.11854, 2022

work page arXiv 2022
[5]

Sionna RT: Differentiable ray tracing for radio propagation modeling,

J. Hoydis, F. A. Aoudia, S. Cammerer, M. Nimier-David, N. Binder, G. Marcus, and A. Keller, “Sionna RT: Differentiable ray tracing for radio propagation modeling,”arXiv preprint arXiv:2303.11103, 2023

work page arXiv 2023
[6]

RadioUNet: Fast radio map estimation with convolutional neural networks,

R. Levie, C ¸ . Yapar, G. Kutyniok, and G. Caire, “RadioUNet: Fast radio map estimation with convolutional neural networks,”IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 4001–4015, Jun. 2021

2021
[7]

Radio map estimation: A data-driven approach to spectrum cartography,

D. Romero and S.-J. Kim, “Radio map estimation: A data-driven approach to spectrum cartography,”IEEE Signal Process. Mag., vol. 39, no. 6, pp. 53–72, Nov. 2022

2022
[8]

An I2I inpainting approach for efficient channel knowledge map construction,

Z. Jin, L. You, J. Wang, X.-G. Xia, and X. Gao, “An I2I inpainting approach for efficient channel knowledge map construction,”IEEE Trans. Wireless Commun., vol. 24, no. 2, pp. 1415–1429, Feb. 2025

2025
[9]

A tutorial on environment-aware communications via channel knowledge map for 6G,

Y . Zeng, J. Chen, J. Xu, D. Wu, X. Xu, S. Jin, X. Gao, D. Gesbert, S. Cui, and R. Zhang, “A tutorial on environment-aware communications via channel knowledge map for 6G,”IEEE Commun. Surveys Tuts., vol. 26, no. 3, pp. 1478–1519, 2024

2024
[10]

Deep learning for massive MIMO CSI feedback,

C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,”IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018

2018
[11]

Convolutional neural network based multiple-rate compressive sensing for massive MIMO CSI feed- back: Design, simulation, and analysis,

J. Guo, C.-K. Wen, S. Jin, and G. Y . Li, “Convolutional neural network based multiple-rate compressive sensing for massive MIMO CSI feed- back: Design, simulation, and analysis,”IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, Apr. 2020

2020
[12]

Distributed deep convo- lutional compression for massive MIMO CSI feedback,

M. B. Mashhadi, Q. Yang, and D. G ¨und¨uz, “Distributed deep convo- lutional compression for massive MIMO CSI feedback,”IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2621–2633, Apr. 2021

2021
[13]

Towards a wireless physical- layer foundation model: Challenges and strategies,

J. Fontaine, A. Shahid, and E. De Poorter, “Towards a wireless physical- layer foundation model: Challenges and strategies,” inProc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Denver, CO, USA, Jun. 2024

2024
[14]

Large wireless model (LWM): A foundation model for wireless channels,

S. Alikhani, G. Charan, and A. Alkhateeb, “Large wireless model (LWM): A foundation model for wireless channels,”arXiv preprint arXiv:2411.08872, 2024

work page arXiv 2024
[15]

WiFo: Wireless foun- dation model for channel prediction,

B. Liu, S. Gao, X. Liu, X. Cheng, and L. Yang, “WiFo: Wireless foun- dation model for channel prediction,”arXiv preprint arXiv:2412.08908, 2024

work page arXiv 2024
[16]

ScanNet++: A high- fidelity dataset of 3D indoor scenes,

C. Yeshwanth, Y .-C. Liu, M. Nießner, and A. Dai, “ScanNet++: A high- fidelity dataset of 3D indoor scenes,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Paris, France, Oct. 2023

2023
[17]

DeepMIMO: A Generic Deep Learning Dataset for Millimeter Wave and Massive MIMO Applications

A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter-wave and massive MIMO applications,”arXiv preprint arXiv:1902.06435, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[18]

DeepSense 6G: A large-scale real- world multi-modal sensing and communication dataset,

A. Alkhateeb, G. Charan, T. Osman, A. Hredzak, J. Morais, U. Demirhan, and N. Srinivas, “DeepSense 6G: A large-scale real- world multi-modal sensing and communication dataset,”arXiv preprint arXiv:2211.09769, 2022

work page arXiv 2022
[19]

M 3SC: A generic dataset for mixed multi-modal sensing and communication integration,

X. Cheng, Z. Huang, L. Bai, H. Zhang, M. Sun, B. Liu, S. Li, J. Zhang, and M. Lee, “M 3SC: A generic dataset for mixed multi-modal sensing and communication integration,”China Commun., vol. 20, no. 11, pp. 13–29, Nov. 2023

2023
[20]

RadioDiff-3D: A 3D×3D radio map dataset and generative diffusion based benchmark for 6G environment-aware communication,

X. Wang, Q. Zhang, N. Cheng, J. Chen, Z. Zhang, Z. Li, S. Cui, and X. Shen, “RadioDiff-3D: A 3D×3D radio map dataset and generative diffusion based benchmark for 6G environment-aware communication,” arXiv preprint arXiv:2507.12166, 2025

work page arXiv 2025
[21]

End-to-end object detection with transformers,

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” inProc. Eur . Conf. Comput. Vis. (ECCV), Virtual, Aug. 2020, pp. 213–229

2020
[22]

Native and Compact Structured Latents for 3D Generation

J. Xiang, X. Chen, S. Xu, R. Wang, Z. Lv, Y . Deng, H. Zhu, Y . Dong, H. Zhao, N. J. Yuan, and J. Yang, “Native and compact structured latents for 3D generation,”arXiv preprint arXiv:2512.14692, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

RadioGen3D: 3D radio map generation via adversarial learning on large-scale synthetic data,

J. Chen, A. Xu, Z. Zhang, S. Zhang, J. Chen, and S. Cui, “RadioGen3D: 3D radio map generation via adversarial learning on large-scale synthetic data,”arXiv preprint arXiv:2602.18744, 2026

work page arXiv 2026
[24]

Deep Learning-Based Site-Specific Channel Modeling and Inference

J. Song, R. He, M. Yang, Z. Zhang, S. Gao, B. Ai, and Z. Zhong, “Deep learning-based site-specific channel modeling and inference,” arXiv preprint arXiv:2603.28083, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[25]

Deformable DETR: Deformable transformers for end-to-end object detection,

X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” inProc. Int. Conf. Learn. Representations (ICLR), Virtual, May 2021

2021
[26]

On the Opportunities and Risks of Foundation Models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskillet al., “On the opportunities and risks of foundation models,”arXiv preprint arXiv:2108.07258, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[27]

Multitask learning,

R. Caruana, “Multitask learning,”Mach. Learn., vol. 28, no. 1, pp. 41– 75, Jul. 1997

1997
[28]

Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,

A. Kendall, Y . Gal, and R. Cipolla, “Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 7482–7491

2018
[29]

NeRF: Representing scenes as neural radiance fields for view synthesis,

B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” inProc. Eur . Conf. Comput. Vis. (ECCV), Virtual, Aug. 2020, pp. 405–421

2020
[30]

3D gaussian splatting for real-time radiance field rendering,

B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3D gaussian splatting for real-time radiance field rendering,”ACM Trans. Graph., vol. 42, no. 4, pp. 139:1–139:14, Jul. 2023

2023
[31]

NeRF2: Neural radio-frequency radiance fields,

X. Zhao, Z. An, Q. Pan, and L. Yang, “NeRF2: Neural radio-frequency radiance fields,”arXiv preprint arXiv:2305.06118, 2023

work page arXiv 2023
[32]

NeWRF: A deep learning framework for wireless radiation field reconstruction and channel prediction,

H. Lu, C. Vattheuer, B. Mirzasoleiman, and O. Abari, “NeWRF: A deep learning framework for wireless radiation field reconstruction and channel prediction,”arXiv preprint arXiv:2403.03241, 2024

work page arXiv 2024
[33]

RF-3DGS: Wireless channel modeling with radio radiance field and 3D gaussian splatting,

L. Zhang, H. Sun, S. Berweger, C. Gentile, and R. Q. Hu, “RF-3DGS: Wireless channel modeling with radio radiance field and 3D gaussian splatting,”arXiv preprint arXiv:2411.19420, 2024

work page arXiv 2024
[34]

WiNeRT: Towards neural ray tracing for wireless channel modelling and differentiable simulations,

T. Orekondy, P. Kumar, S. Kadambi, H. Ye, J. Soriaga, and A. Behboodi, “WiNeRT: Towards neural ray tracing for wireless channel modelling and differentiable simulations,” inProc. Int. Conf. Learn. Representa- tions (ICLR), Kigali, Rwanda, May 2023

2023
[35]

4D spatio-temporal convnets: Minkowski convolutional neural networks,

C. Choy, J. Gwak, and S. Savarese, “4D spatio-temporal convnets: Minkowski convolutional neural networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Long Beach, CA, USA, Jun. 2019

2019
[36]

Point transformer V2: Grouped vector attention and partition-based pooling,

X. Wu, Y . Lao, L. Jiang, X. Liu, and H. Zhao, “Point transformer V2: Grouped vector attention and partition-based pooling,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), New Orleans, LA, USA, Nov. 2022

2022

[1] [1]

The COST 2100 MIMO channel model,

L. Liu, C. Oestges, J. Poutanen, K. Haneda, P. Vainikainen, F. Quitin, F. Tufvesson, and P. De Doncker, “The COST 2100 MIMO channel model,”IEEE Wireless Commun., vol. 19, no. 6, pp. 92–99, Dec. 2012

2012

[2] [2]

QuaDRiGa: A 3- D multi-cell channel model with time evolution for enabling virtual field trials,

S. Jaeckel, L. Raschkowski, K. Borner, and L. Thiele, “QuaDRiGa: A 3- D multi-cell channel model with time evolution for enabling virtual field trials,”IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, Jun. 2014

2014

[3] [3]

A novel millimeter- wave channel simulator and applications for 5G wireless communica- tions,

S. Sun, G. R. MacCartney, and T. S. Rappaport, “A novel millimeter- wave channel simulator and applications for 5G wireless communica- tions,” inProc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 2017, pp. 1–7

2017

[4] [4]

Sionna: An open-source library for next-generation physical layer research,

J. Hoydis, S. Cammerer, F. A. Aoudia, A. Vem, N. Binder, G. Marcus, and A. Keller, “Sionna: An open-source library for next-generation physical layer research,”arXiv preprint arXiv:2203.11854, 2022

work page arXiv 2022

[5] [5]

Sionna RT: Differentiable ray tracing for radio propagation modeling,

J. Hoydis, F. A. Aoudia, S. Cammerer, M. Nimier-David, N. Binder, G. Marcus, and A. Keller, “Sionna RT: Differentiable ray tracing for radio propagation modeling,”arXiv preprint arXiv:2303.11103, 2023

work page arXiv 2023

[6] [6]

RadioUNet: Fast radio map estimation with convolutional neural networks,

R. Levie, C ¸ . Yapar, G. Kutyniok, and G. Caire, “RadioUNet: Fast radio map estimation with convolutional neural networks,”IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 4001–4015, Jun. 2021

2021

[7] [7]

Radio map estimation: A data-driven approach to spectrum cartography,

D. Romero and S.-J. Kim, “Radio map estimation: A data-driven approach to spectrum cartography,”IEEE Signal Process. Mag., vol. 39, no. 6, pp. 53–72, Nov. 2022

2022

[8] [8]

An I2I inpainting approach for efficient channel knowledge map construction,

Z. Jin, L. You, J. Wang, X.-G. Xia, and X. Gao, “An I2I inpainting approach for efficient channel knowledge map construction,”IEEE Trans. Wireless Commun., vol. 24, no. 2, pp. 1415–1429, Feb. 2025

2025

[9] [9]

A tutorial on environment-aware communications via channel knowledge map for 6G,

Y . Zeng, J. Chen, J. Xu, D. Wu, X. Xu, S. Jin, X. Gao, D. Gesbert, S. Cui, and R. Zhang, “A tutorial on environment-aware communications via channel knowledge map for 6G,”IEEE Commun. Surveys Tuts., vol. 26, no. 3, pp. 1478–1519, 2024

2024

[10] [10]

Deep learning for massive MIMO CSI feedback,

C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,”IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018

2018

[11] [11]

Convolutional neural network based multiple-rate compressive sensing for massive MIMO CSI feed- back: Design, simulation, and analysis,

J. Guo, C.-K. Wen, S. Jin, and G. Y . Li, “Convolutional neural network based multiple-rate compressive sensing for massive MIMO CSI feed- back: Design, simulation, and analysis,”IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, Apr. 2020

2020

[12] [12]

Distributed deep convo- lutional compression for massive MIMO CSI feedback,

M. B. Mashhadi, Q. Yang, and D. G ¨und¨uz, “Distributed deep convo- lutional compression for massive MIMO CSI feedback,”IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2621–2633, Apr. 2021

2021

[13] [13]

Towards a wireless physical- layer foundation model: Challenges and strategies,

J. Fontaine, A. Shahid, and E. De Poorter, “Towards a wireless physical- layer foundation model: Challenges and strategies,” inProc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Denver, CO, USA, Jun. 2024

2024

[14] [14]

Large wireless model (LWM): A foundation model for wireless channels,

S. Alikhani, G. Charan, and A. Alkhateeb, “Large wireless model (LWM): A foundation model for wireless channels,”arXiv preprint arXiv:2411.08872, 2024

work page arXiv 2024

[15] [15]

WiFo: Wireless foun- dation model for channel prediction,

B. Liu, S. Gao, X. Liu, X. Cheng, and L. Yang, “WiFo: Wireless foun- dation model for channel prediction,”arXiv preprint arXiv:2412.08908, 2024

work page arXiv 2024

[16] [16]

ScanNet++: A high- fidelity dataset of 3D indoor scenes,

C. Yeshwanth, Y .-C. Liu, M. Nießner, and A. Dai, “ScanNet++: A high- fidelity dataset of 3D indoor scenes,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Paris, France, Oct. 2023

2023

[17] [17]

DeepMIMO: A Generic Deep Learning Dataset for Millimeter Wave and Massive MIMO Applications

A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter-wave and massive MIMO applications,”arXiv preprint arXiv:1902.06435, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902

[18] [18]

DeepSense 6G: A large-scale real- world multi-modal sensing and communication dataset,

A. Alkhateeb, G. Charan, T. Osman, A. Hredzak, J. Morais, U. Demirhan, and N. Srinivas, “DeepSense 6G: A large-scale real- world multi-modal sensing and communication dataset,”arXiv preprint arXiv:2211.09769, 2022

work page arXiv 2022

[19] [19]

M 3SC: A generic dataset for mixed multi-modal sensing and communication integration,

X. Cheng, Z. Huang, L. Bai, H. Zhang, M. Sun, B. Liu, S. Li, J. Zhang, and M. Lee, “M 3SC: A generic dataset for mixed multi-modal sensing and communication integration,”China Commun., vol. 20, no. 11, pp. 13–29, Nov. 2023

2023

[20] [20]

RadioDiff-3D: A 3D×3D radio map dataset and generative diffusion based benchmark for 6G environment-aware communication,

X. Wang, Q. Zhang, N. Cheng, J. Chen, Z. Zhang, Z. Li, S. Cui, and X. Shen, “RadioDiff-3D: A 3D×3D radio map dataset and generative diffusion based benchmark for 6G environment-aware communication,” arXiv preprint arXiv:2507.12166, 2025

work page arXiv 2025

[21] [21]

End-to-end object detection with transformers,

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” inProc. Eur . Conf. Comput. Vis. (ECCV), Virtual, Aug. 2020, pp. 213–229

2020

[22] [22]

Native and Compact Structured Latents for 3D Generation

J. Xiang, X. Chen, S. Xu, R. Wang, Z. Lv, Y . Deng, H. Zhu, Y . Dong, H. Zhao, N. J. Yuan, and J. Yang, “Native and compact structured latents for 3D generation,”arXiv preprint arXiv:2512.14692, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[23] [23]

RadioGen3D: 3D radio map generation via adversarial learning on large-scale synthetic data,

J. Chen, A. Xu, Z. Zhang, S. Zhang, J. Chen, and S. Cui, “RadioGen3D: 3D radio map generation via adversarial learning on large-scale synthetic data,”arXiv preprint arXiv:2602.18744, 2026

work page arXiv 2026

[24] [24]

Deep Learning-Based Site-Specific Channel Modeling and Inference

J. Song, R. He, M. Yang, Z. Zhang, S. Gao, B. Ai, and Z. Zhong, “Deep learning-based site-specific channel modeling and inference,” arXiv preprint arXiv:2603.28083, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[25] [25]

Deformable DETR: Deformable transformers for end-to-end object detection,

X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable DETR: Deformable transformers for end-to-end object detection,” inProc. Int. Conf. Learn. Representations (ICLR), Virtual, May 2021

2021

[26] [26]

On the Opportunities and Risks of Foundation Models

R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskillet al., “On the opportunities and risks of foundation models,”arXiv preprint arXiv:2108.07258, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[27] [27]

Multitask learning,

R. Caruana, “Multitask learning,”Mach. Learn., vol. 28, no. 1, pp. 41– 75, Jul. 1997

1997

[28] [28]

Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,

A. Kendall, Y . Gal, and R. Cipolla, “Multi-task learning using uncer- tainty to weigh losses for scene geometry and semantics,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 7482–7491

2018

[29] [29]

NeRF: Representing scenes as neural radiance fields for view synthesis,

B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “NeRF: Representing scenes as neural radiance fields for view synthesis,” inProc. Eur . Conf. Comput. Vis. (ECCV), Virtual, Aug. 2020, pp. 405–421

2020

[30] [30]

3D gaussian splatting for real-time radiance field rendering,

B. Kerbl, G. Kopanas, T. Leimk ¨uhler, and G. Drettakis, “3D gaussian splatting for real-time radiance field rendering,”ACM Trans. Graph., vol. 42, no. 4, pp. 139:1–139:14, Jul. 2023

2023

[31] [31]

NeRF2: Neural radio-frequency radiance fields,

X. Zhao, Z. An, Q. Pan, and L. Yang, “NeRF2: Neural radio-frequency radiance fields,”arXiv preprint arXiv:2305.06118, 2023

work page arXiv 2023

[32] [32]

NeWRF: A deep learning framework for wireless radiation field reconstruction and channel prediction,

H. Lu, C. Vattheuer, B. Mirzasoleiman, and O. Abari, “NeWRF: A deep learning framework for wireless radiation field reconstruction and channel prediction,”arXiv preprint arXiv:2403.03241, 2024

work page arXiv 2024

[33] [33]

RF-3DGS: Wireless channel modeling with radio radiance field and 3D gaussian splatting,

L. Zhang, H. Sun, S. Berweger, C. Gentile, and R. Q. Hu, “RF-3DGS: Wireless channel modeling with radio radiance field and 3D gaussian splatting,”arXiv preprint arXiv:2411.19420, 2024

work page arXiv 2024

[34] [34]

WiNeRT: Towards neural ray tracing for wireless channel modelling and differentiable simulations,

T. Orekondy, P. Kumar, S. Kadambi, H. Ye, J. Soriaga, and A. Behboodi, “WiNeRT: Towards neural ray tracing for wireless channel modelling and differentiable simulations,” inProc. Int. Conf. Learn. Representa- tions (ICLR), Kigali, Rwanda, May 2023

2023

[35] [35]

4D spatio-temporal convnets: Minkowski convolutional neural networks,

C. Choy, J. Gwak, and S. Savarese, “4D spatio-temporal convnets: Minkowski convolutional neural networks,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Long Beach, CA, USA, Jun. 2019

2019

[36] [36]

Point transformer V2: Grouped vector attention and partition-based pooling,

X. Wu, Y . Lao, L. Jiang, X. Liu, and H. Zhao, “Point transformer V2: Grouped vector attention and partition-based pooling,” inProc. Adv. Neural Inf. Process. Syst. (NeurIPS), New Orleans, LA, USA, Nov. 2022

2022