arxiv: 2604.23533 · v1 · submitted 2026-04-26 · 📡 eess.SP · cs.NI

PILOT: One Physics-Integrated Generation Framework to Unify 2D and 3D Radio Map Construction

Weiming Huang, Hao Sun, Junting Chen

Pith reviewed 2026-05-08 06:11 UTC · model grok-4.3

keywords radio map construction · autoregressive generation · 2D and 3D radio maps · wavefront propagation ordering · physics-integrated model · normalized mean square error · zero-shot cross-domain transfer

The pith

PILOT generates accurate 2D and 3D radio maps by ordering predictions as a wavefront expanding from the transmitter.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PILOT as a single pretrained autoregressive framework for building both two-dimensional and three-dimensional radio maps. It replaces the conventional left-to-right raster scan with a sequence that follows the physical expansion of radio waves outward from the transmitter location. Each prediction step is further conditioned by an environment-aware instruction that aligns scene features with the region being generated. For three-dimensional volumes the model stacks successive height slices and adds a gradient loss to enforce continuity in the vertical direction. The resulting model records the lowest normalized mean square error (NMSE) among the compared baselines on standard 2D benchmarks. For 3D generation it reduces NMSE by 78 percent relative to a diffusion baseline at roughly 2500 times faster inference, outperforms methods that rely on 10 percent sparse measurements, and achieves the best zero-shot results in the cross-domain evaluation.

Core claim

PILOT replaces raster ordering with a wavefront sequence expanding outward from the transmitter, each step guided by an environment-aware instruction that spatially aligns environment features with the queried radio map region. The same framework extends to 3D radio maps through height-slice stacking while a gradient loss enforces vertical continuity. On standard 2D benchmarks PILOT achieves the lowest NMSE among all baselines; for volumetric generation it reduces NMSE by 78 percent relative to the diffusion baseline at roughly 2500 times faster inference, outperforms methods relying on 10 percent sparse measurements, and records the best zero-shot results in cross-domain evaluation.
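The vertical-continuity term can be illustrated with a small sketch. The exact form of PILOT's gradient loss is not given in this summary, so the version below is an assumption: it penalizes the mean absolute mismatch between the finite-difference gradients of the predicted and reference volumes along the height axis. The function name `vertical_gradient_loss` and the `(num_heights, rows, cols)` axis layout are hypothetical.

```python
import numpy as np


def vertical_gradient_loss(pred, target):
    """Mean absolute error between height-axis finite differences.

    pred, target: arrays of shape (num_heights, rows, cols).
    Assumed form for illustration only; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if pred.shape[0] < 2:
        raise ValueError("need at least two height slices")
    # Difference between consecutive height slices (axis 0 is height).
    dz_pred = np.diff(pred, axis=0)
    dz_target = np.diff(target, axis=0)
    return float(np.mean(np.abs(dz_pred - dz_target)))
```

A term of this kind is blind to a constant offset applied to every slice, which is why it would normally sit alongside the main prediction objective rather than replace it.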

What carries the argument

Wavefront sequence expanding outward from the transmitter, conditioned on environment-aware instructions that align physical features with the generation region.
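As a minimal sketch of what a wavefront sequence looks like on a token grid, the snippet below sorts cells by squared Euclidean distance from the transmitter cell. This is a deliberate simplification: the Figure 1 caption indicates that PILOT uses blockage-aware propagation costs, which plain distance ignores. The function name `wavefront_order` and the raster tie-breaking rule are assumptions made for illustration.

```python
def wavefront_order(rows, cols, tx):
    """Return grid cells sorted by distance from the transmitter cell `tx`.

    Ties are broken in raster order so the sequence is deterministic.
    Plain distance is an assumed stand-in for the paper's propagation cost.
    """
    tx_row, tx_col = tx
    cells = [(r, c) for r in range(rows) for c in range(cols)]  # raster order
    return sorted(
        cells,
        key=lambda rc: ((rc[0] - tx_row) ** 2 + (rc[1] - tx_col) ** 2, rc),
    )


# Transmitter in the centre of a 3x3 grid: centre, then edges, then corners.
order = wavefront_order(3, 3, tx=(1, 1))
```

With the transmitter in the centre, the sequence starts at `(1, 1)`, visits the four edge-adjacent cells, and ends with the four corners, whereas a raster scan would start at `(0, 0)` regardless of where the transmitter sits.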

If this is right

A single model handles both 2D and 3D radio map tasks without separate architectures or training regimes.
Volumetric radio map generation becomes practical for real-time applications because inference is orders of magnitude faster than diffusion approaches.
The method outperforms approaches that require 10 percent sparse measurements, reducing the need for dense sensor deployments.
Strong zero-shot cross-domain performance indicates the model can be deployed in new environments without retraining.
Accurate 3D maps support network planning, wireless digital twins, and UAV trajectory optimization in urban settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same propagation-aware ordering principle could be tested on generation of other spatially structured physical fields such as acoustic or thermal maps.
The inference speedup may enable on-device or edge-based radio map updates for mobile network optimization.
Zero-shot success suggests the model learns transferable propagation physics; experiments with measured real-world data rather than simulations would test this claim.
Extending the wavefront idea to time-varying maps could support dynamic digital twins where transmitters or obstacles move.

Load-bearing premise

That ordering generation as a wavefront sequence expanding from the transmitter supplies more causally informative conditioning and lowers conditional uncertainty than standard raster ordering.

What would settle it

Retrain the identical architecture on the same data but replace the wavefront ordering with raster or random ordering; a substantial rise in NMSE on the same test sets would falsify the ordering benefit.
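One way to set up such a control is to express every ordering as a permutation of the same flat token indices, so that the ordering is the only thing that differs between runs. The sketch below is hypothetical scaffolding, not the authors' code: `make_orders`, `reorder`, and `restore` are illustrative names, and the wavefront permutation uses plain distance from the transmitter as a stand-in for the paper's propagation-based ordering.

```python
import numpy as np


def make_orders(rows, cols, tx, seed=0):
    """Build raster, random, and (distance-based) wavefront permutations."""
    n = rows * cols
    r, c = np.divmod(np.arange(n), cols)  # row/col of each flat index
    dist2 = (r - tx[0]) ** 2 + (c - tx[1]) ** 2
    rng = np.random.default_rng(seed)  # fixed seed for a repeatable ablation
    return {
        "raster": np.arange(n),
        "random": rng.permutation(n),
        # Stable sort keeps raster order among equidistant cells.
        "wavefront": np.argsort(dist2, kind="stable"),
    }


def reorder(tokens, order):
    """Flatten a token grid and arrange it in generation order."""
    return np.asarray(tokens).reshape(-1)[order]


def restore(sequence, order, shape):
    """Scatter a generated sequence back to its raster grid layout."""
    sequence = np.asarray(sequence)
    flat = np.empty(order.size, dtype=sequence.dtype)
    flat[order] = sequence
    return flat.reshape(shape)
```

Because `restore` inverts `reorder` for any permutation, the same NMSE evaluation can be applied to the output of each run without touching the metric code.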

Figures

Figures reproduced from arXiv: 2604.23533 by Hao Sun, Junting Chen, Weiming Huang.

**Figure 1.** Overview of the PILOT framework. blockage-aware propagation costs accumulated along paths from the transmitter to each target region, using the building environment map. At each step, the radio token is predicted with guidance from its spatially corresponding environment token, and spatial coordinates are embedded via 3D rotary position embedding (3D-RoPE) to ensure positional registration and semantic al…

**Figure 2.** Blockage ratio computation. $K$ sample points are placed along the ground-plane projection from $u_{\mathrm{tx}}$ to $u_j$; blue segments indicate blocked portions where $z_k < H(x_k, y_k)$. Here $L_{\mathrm{thr}} = 10\log_{10}(W N_0) + \mathrm{NF} - (P_{\mathrm{Tx}})_{\mathrm{dB}}$ is a domain-dependent threshold determined by the link-budget setting of the target domain, where $W$ is the bandwidth, $\mathrm{NF}$ is the noise figure, and $P_{\mathrm{Tx}}$ is the transmit power; and $d_0$ is a near-field ref…

**Figure 3.** Overall framework of the proposed PILOT.

**Figure 4.** (a) Wavefront propagation paths in an urban scene, routing around…

**Figure 5.** Predicted radio maps. (a) 2D construction on RadioMapSeer. (b) Zero-shot transfer to RadioMap3DSeer under frequency and geometry shift.

**Figure 6.** t-SNE of RadioMapSeer and RadioMap3DSeer distributions in pixel…

**Figure 7.** 3D radio maps on UrbanRadio3D at receiver heights 1–4 m.

**Figure 8.** Validation cross-entropy by generation order. (a) Geometric orders. (b) Physics-guided orders. (c) Hybrid strategies.

**Figure 9.** Predictive entropy and vertical gradient error measurements. (a) Sample-averaged predictive entropy…

**Figure 10.** Spatial entropy maps for three propagation regimes.
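The blockage ratio and link-budget threshold described in the Figure 2 caption can be sketched as follows. Several details are assumptions, because the caption is truncated: the ray height at each sample is linearly interpolated between transmitter and target, the height map is indexed as `H[y, x]` with cell `(y, x)` covering a unit square, samples are taken at segment midpoints, and $N_0$ is read as the noise power spectral density. The function names are hypothetical.

```python
import math

import numpy as np


def blockage_ratio(height_map, u_tx, u_j, num_samples=32):
    """Fraction of sample points on the tx-to-target segment that are blocked.

    height_map: 2D array H[y, x] of building heights on a unit grid.
    u_tx, u_j: (x, y, z) coordinates of the transmitter and the target.
    A sample k is blocked when its height z_k is below H(x_k, y_k).
    """
    H = np.asarray(height_map, dtype=float)
    x0, y0, z0 = u_tx
    x1, y1, z1 = u_j
    # Midpoints of num_samples equal sub-segments (assumed sampling rule).
    t = (np.arange(num_samples) + 0.5) / num_samples
    xs = x0 + t * (x1 - x0)
    ys = y0 + t * (y1 - y0)
    zs = z0 + t * (z1 - z0)  # assumed linear interpolation of ray height
    # Look up the building height of the cell containing each sample.
    xi = np.clip(np.floor(xs).astype(int), 0, H.shape[1] - 1)
    yi = np.clip(np.floor(ys).astype(int), 0, H.shape[0] - 1)
    blocked = zs < H[yi, xi]
    return float(np.mean(blocked))


def link_budget_threshold_db(bandwidth_hz, noise_psd, noise_figure_db, tx_power_db):
    """L_thr = 10 log10(W N0) + NF - P_Tx, all in consistent dB units.

    noise_psd and tx_power_db must share a power reference
    (for example W/Hz with dBW, or mW/Hz with dBm).
    """
    return 10.0 * math.log10(bandwidth_hz * noise_psd) + noise_figure_db - tx_power_db
```

For example, a single 10 m building cell on an otherwise empty 5×5 map blocks one of four samples on a 1.5 m-high link that passes through it, giving a ratio of 0.25.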

Original abstract

Unified 2D and 3D radio map construction supports network planning, wireless digital twins, and unmanned aerial vehicle (UAV) applications. In urban environments, blockage, reflection, and diffraction make accurate construction expensive for physics-based solvers. Autoregressive next-token prediction offers a single sequential formulation that can cover both 2D and 3D generation, but standard raster ordering ignores the spatial structure of radio propagation. When generation follows propagation, each token is predicted from propagation-relevant history rather than spatially arbitrary context, which provides more causally informative conditioning and lowers conditional uncertainty. We propose PILOT, a pretrained autoregressive framework that replaces raster scan with a wavefront sequence expanding outward from the transmitter. Each prediction step is guided by an environment-aware instruction that spatially aligns environment features with the queried radio map region. The same framework extends to 3D radio maps through height-slice stacking while a gradient loss enforces vertical continuity. On standard 2D benchmarks, PILOT achieves the lowest NMSE among all baselines. For volumetric generation, it reduces NMSE by 78% relative to the diffusion baseline at roughly $2500\times$ faster inference. It also outperforms methods that rely on 10% sparse measurements and achieves the best zero-shot results in the cross-domain evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PILOT's wavefront ordering for autoregressive radio map generation is a reasonable physics-motivated tweak, but the abstract's performance claims come without supporting experimental details or ablations.

The core move here is swapping standard raster ordering for a wavefront sequence that starts at the transmitter and expands outward. That aligns generation steps with how radio waves actually propagate, which could give the model more relevant history at each token. They add environment-aware instructions to align features with the map region and handle 3D by stacking height slices plus a gradient loss for vertical continuity.

On paper this unifies 2D and 3D under one autoregressive setup and claims big speedups over diffusion while cutting error substantially on volumetric cases. If the numbers hold, the speed advantage alone would matter for digital-twin or UAV planning work where full physics solvers are too slow. The abstract also says it beats sparse-measurement baselines and does well zero-shot across domains. Those are the parts that could be useful if replicated.

The soft spot is exactly what the stress-test note flags: the abstract gives no experimental setup, no baseline descriptions, no dataset sizes, and no error bars, so the 78% NMSE drop and lowest-on-benchmarks claim cannot be checked. More importantly, there is no ablation that keeps the rest of the pipeline fixed and changes only the token ordering. Without that, it is impossible to know whether the wavefront idea is doing the work or whether the gains come from the instructions, the gradient loss, or just better training. The paper appears to treat the ordering benefit as self-evident rather than measured.

This is the kind of work that belongs in a reading group for people doing generative models for wireless or propagation tasks, but only after the full manuscript shows the controls and the actual numbers. It deserves a serious referee because the framing is coherent and the application area is practical, yet the current evidence is too thin to accept the quantitative conclusions at face value. I would send it out for review with a request for ablations on the ordering and full experimental transparency.

Referee Report

2 major / 2 minor

Summary. The paper proposes PILOT, a pretrained autoregressive framework for unified 2D and 3D radio map construction. It replaces standard raster ordering with a wavefront sequence expanding outward from the transmitter, each step guided by an environment-aware instruction for spatial alignment. The framework extends to 3D via height-slice stacking and incorporates a gradient loss to enforce vertical continuity. On standard 2D benchmarks it reports the lowest NMSE among baselines; for volumetric generation it claims a 78% NMSE reduction relative to a diffusion baseline at approximately 2500× faster inference, plus superior performance with 10% sparse measurements and in zero-shot cross-domain evaluation.

Significance. If the results hold after proper controls, the work provides a single sequential formulation that unifies 2D/3D radio map generation while embedding a physics-inspired ordering that may lower conditional uncertainty. The reported inference speedup over diffusion models and the ability to handle sparse and cross-domain settings would be practically relevant for network planning and wireless digital twins.

major comments (2)

[Experimental Results] The central claim that wavefront ordering supplies more causally informative conditioning than raster scan (thereby lowering conditional uncertainty and underpinning the unification and NMSE gains) is load-bearing but untested. The reported experiments compare the full PILOT pipeline against external baselines without an internal ablation that fixes architecture, pretraining, environment-aware spatial alignment, and vertical gradient loss while varying only the token ordering (raster vs. wavefront).
[Abstract and Results] The abstract states quantitative wins (lowest NMSE on 2D benchmarks, 78% NMSE reduction in volumetric generation) but supplies no experimental setup, baseline descriptions, dataset details, or error-bar information. These must be fully documented in the main text and tables to allow verification that the data support the claims.

minor comments (2)

Define NMSE explicitly on first use and clarify whether it is normalized over the entire map or per-region.
The description of 'environment-aware instruction' and 'height-slice stacking' would benefit from a short schematic or pseudocode to make the conditioning mechanism reproducible.
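To make the first minor comment concrete, the sketch below shows one common definition of NMSE, the squared error divided by the squared norm of the reference, with an optional mask that switches between whole-map and per-region normalization. Whether the paper uses this definition is not stated in the material reviewed here, so both the formula and the `nmse` helper are assumptions.

```python
import numpy as np


def nmse(pred, target, mask=None):
    """Normalized mean square error: sum((pred - target)^2) / sum(target^2).

    mask: optional boolean array selecting the region to evaluate;
    when omitted, the whole map is used. Assumed definition for illustration.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if mask is None:
        mask = np.ones(target.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    err = np.sum((pred[mask] - target[mask]) ** 2)
    ref = np.sum(target[mask] ** 2)
    if ref == 0:
        raise ValueError("reference energy is zero in the selected region")
    return float(err / ref)
```

The same prediction can score quite differently under the two normalizations, since an error concentrated in a low-energy region is diluted when divided by the energy of the whole map. That is why the report asks for the choice to be stated.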

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications and indicate planned revisions to improve the manuscript.

Referee: [Experimental Results] The central claim that wavefront ordering supplies more causally informative conditioning than raster scan (thereby lowering conditional uncertainty and underpinning the unification and NMSE gains) is load-bearing but untested. The reported experiments compare the full PILOT pipeline against external baselines without an internal ablation that fixes architecture, pretraining, environment-aware spatial alignment, and vertical gradient loss while varying only the token ordering (raster vs. wavefront).

Authors: We agree that directly isolating the contribution of wavefront ordering via an internal ablation would provide stronger support for the central claim. Although the reported results compare against baselines that rely on standard raster ordering and demonstrate consistent gains, this does not fully control for all other factors. In the revised manuscript we will add a controlled ablation study that keeps the architecture, pretraining procedure, environment-aware spatial alignment, and vertical gradient loss fixed while varying only the token ordering (raster versus wavefront). This will quantify the reduction in conditional uncertainty and directly test the hypothesis.

revision: yes

Referee: [Abstract and Results] The abstract states quantitative wins (lowest NMSE on 2D benchmarks, 78% NMSE reduction in volumetric generation) but supplies no experimental setup, baseline descriptions, dataset details, or error-bar information. These must be fully documented in the main text and tables to allow verification that the data support the claims.

Authors: All requested details are already present in the main text: Section 4 describes the experimental setup, datasets (including the specific 2D benchmarks and 3D volumetric scenarios), baseline implementations, and evaluation metrics; Tables 1–3 report the NMSE values together with standard deviations computed over multiple runs. To improve verifiability we will add explicit cross-references from the abstract and results sections to these tables and expand the caption text where needed. No new experiments are required, but the documentation will be made more prominent.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical ML model with untested but non-reductive ordering assumption

The paper describes a pretrained autoregressive transformer for radio map generation that adopts wavefront token ordering from the transmitter, environment-aware instructions, and a vertical gradient loss. No equations, derivations, or parameter-fitting steps are presented in the provided text that would reduce any claimed prediction or result to its own inputs by construction. Performance metrics (NMSE on 2D/3D benchmarks, speedups, zero-shot results) are reported as outcomes of training and evaluation against external baselines rather than self-referential fits or self-citation chains. The central modeling choice (wavefront ordering lowers conditional uncertainty) is an empirical hypothesis whose benefit is not isolated in the reported experiments, but this is a question of experimental controls, not circular reduction of the derivation itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the framework is described at a high conceptual level without mathematical formulation or fitting details.
