WiFo-2: a generalist foundation model unifies heterogeneous wireless system design
Pith reviewed 2026-05-17 05:12 UTC · model grok-4.3
The pith
WiFo-2 pretrained on 11.6 billion channel measurements unifies design for heterogeneous wireless systems via zero-shot reconstruction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
WiFo-2 is a space-time-frequency foundation model pretrained on 11.6 billion CSI points drawn from heterogeneous datasets. It learns generalized wireless representations across scenarios, configurations, and tasks, enabling reliable and accurate zero-shot channel reconstruction that outperforms fully supervised task-specific models. With only 1 percent of the training samples required by supervised AI models, it achieves state-of-the-art performance across nine distinct wireless tasks, and a functional hardware prototype demonstrates its real-world deployability and superior capability.
What carries the argument
The space-time-frequency foundation model that learns unified representations from heterogeneous channel state information across scenarios, configurations, and tasks.
Load-bearing premise
Pretraining on the collected heterogeneous CSI dataset will allow the model to generalize reliably to new scenarios, configurations, and tasks not represented in the 11.6 billion training points.
What would settle it
Evaluating zero-shot channel reconstruction accuracy on a wireless scenario with propagation conditions, frequency bands, or mobility patterns absent from the training distribution and comparing results to retrained task-specific models.
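One way to make that comparison concrete is normalized mean squared error (NMSE), a common score for channel reconstruction. The sketch below is illustrative only: the function name `nmse_db` and the toy channel vectors are assumptions, not values or code from the paper, and the paper's own metric definition may differ.

```python
import math


def nmse_db(h_true, h_est):
    """NMSE in dB between two equal-length sequences of complex channel
    coefficients: 10*log10(sum|h - h_hat|^2 / sum|h|^2). Lower is better."""
    if not h_true or len(h_true) != len(h_est):
        raise ValueError("inputs must be non-empty and of equal length")
    err = sum(abs(t - e) ** 2 for t, e in zip(h_true, h_est))
    ref = sum(abs(t) ** 2 for t in h_true)
    if ref == 0:
        raise ValueError("reference channel has zero energy")
    if err == 0:
        return -math.inf
    return 10.0 * math.log10(err / ref)


# Toy data (hypothetical): two estimates of the same held-out channel.
h_true = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j, 0.9 + 0j]
h_est_a = [0.9 * h for h in h_true]  # 10% amplitude error on every tap
h_est_b = [0.8 * h for h in h_true]  # 20% amplitude error on every tap

print(round(nmse_db(h_true, h_est_a), 2))
print(round(nmse_db(h_true, h_est_b), 2))
```

In a settling experiment, `h_est_a` and `h_est_b` would be replaced by the zero-shot output and the output of a retrained task-specific model on the same out-of-distribution channels.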
Original abstract
Emerging sixth-generation wireless systems are increasingly heterogeneous, with compatibility across diverse configurations, ubiquitous coverage, and expanded functionalities. Although deep learning has substantially benefited wireless system design, existing approaches are typically trained for specific system settings and scenarios with limited generalizability. Here we present WiFo-2, a space-time-frequency foundation model for unified wireless communications and sensing system design. Pretrained on a heterogeneous dataset of 11.6 billion channel state information (CSI) points, WiFo-2 learns generalized wireless representations across scenarios, configurations, and tasks, and exhibits scaling-law behavior. WiFo-2 achieves reliable and accurate zero-shot channel reconstruction, outperforming fully supervised task-specific models. With only 1% of the training samples required by supervised AI models, WiFo-2 achieves state-of-the-art performance across 9 distinct wireless tasks. A functional hardware prototype further demonstrates its real-world deployability and superior capability across diverse wireless tasks. This work provides a versatile wireless design framework and advances understanding of wireless channels.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces WiFo-2, a space-time-frequency foundation model pretrained on a heterogeneous dataset of 11.6 billion CSI points. It claims to learn generalized wireless representations across scenarios, configurations, and tasks, exhibit scaling-law behavior, achieve reliable and accurate zero-shot channel reconstruction that outperforms fully supervised task-specific models, attain state-of-the-art performance across 9 distinct wireless tasks using only 1% of the training samples required by supervised AI models, and demonstrate real-world deployability via a functional hardware prototype.
Significance. If the central empirical claims are robustly supported by detailed methods and explicit OOD evaluation, the work would represent a notable advance in applying foundation-model techniques to wireless communications and sensing, offering a potential unified framework for heterogeneous 6G system design that reduces reliance on task-specific supervised training.
major comments (2)
- [Zero-shot reconstruction experiments] The zero-shot channel reconstruction claim (abstract and results) is load-bearing for the generalist foundation-model thesis, yet the manuscript provides no explicit out-of-distribution test sets whose statistical properties (delay spread, Doppler spectrum, spatial correlation) lie outside the support of the 11.6 billion pretraining points. Without such separation, reported gains risk reflecting interpolation rather than the advertised transfer to new configurations.
- [Few-shot learning results] The few-shot SOTA claim across 9 tasks with 1% training samples requires reporting of data splits, statistical significance, and baseline details to rule out post-hoc evaluation choices; the current presentation leaves open whether performance differences are reliable or sensitive to partitioning.
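The separation requested in the first major comment can be sketched as a simple support check on one channel statistic. The example below computes RMS delay spread from a power delay profile and reports the fraction of test values that fall outside the range seen in pretraining. The function names, the two-tap profile, and the sample values are hypothetical, and a min/max range is only a crude stand-in for the support of a distribution.

```python
import math


def rms_delay_spread(delays_s, powers):
    """RMS delay spread (seconds) of a power delay profile:
    square root of the power-weighted variance of the tap delays."""
    total = sum(powers)
    if total <= 0:
        raise ValueError("total power must be positive")
    mean = sum(d * p for d, p in zip(delays_s, powers)) / total
    second = sum(d * d * p for d, p in zip(delays_s, powers)) / total
    return math.sqrt(max(second - mean * mean, 0.0))


def fraction_outside_support(train_values, test_values):
    """Fraction of test values lying outside [min(train), max(train)]."""
    lo, hi = min(train_values), max(train_values)
    outside = [v for v in test_values if v < lo or v > hi]
    return len(outside) / len(test_values)


# Two equal-power taps 1 microsecond apart give a 0.5 microsecond spread.
spread = rms_delay_spread([0.0, 1e-6], [0.5, 0.5])

# Hypothetical per-scenario delay spreads (seconds).
train_spreads = [50e-9, 120e-9, 300e-9]
test_spreads = [100e-9, 450e-9]
frac = fraction_outside_support(train_spreads, test_spreads)
print(spread, frac)
```

The same check could be repeated for maximum Doppler spread or a spatial correlation summary; a fraction near zero would suggest the test set is closer to interpolation than to out-of-distribution transfer.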
minor comments (2)
- [Abstract] The abstract states strong empirical wins but omits any description of model architecture, pretraining procedure, or data collection protocol, which impairs reproducibility assessment.
- [Methods] Notation for space-time-frequency representations and the precise pretraining loss should be introduced earlier and used consistently in the methods section.
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments. We have addressed each major point below and revised the manuscript to provide the requested details and strengthen the empirical support for our claims.
Point-by-point responses
- Referee: [Zero-shot reconstruction experiments] The zero-shot channel reconstruction claim (abstract and results) is load-bearing for the generalist foundation-model thesis, yet the manuscript provides no explicit out-of-distribution test sets whose statistical properties (delay spread, Doppler spectrum, spatial correlation) lie outside the support of the 11.6 billion pretraining points. Without such separation, reported gains risk reflecting interpolation rather than the advertised transfer to new configurations.
  Authors: We thank the referee for this important observation. The original manuscript described the zero-shot test scenarios as distinct from pretraining but did not include quantitative comparisons of statistical properties. In the revised manuscript, we have added a new subsection (Section 4.2) that reports delay spread, Doppler spectrum, and spatial correlation metrics for the zero-shot test sets relative to the 11.6 billion pretraining points. These metrics show clear distributional shifts (e.g., test sets exhibit 30-50% higher maximum Doppler spreads and different spatial correlation structures). We have also included additional OOD experiments on configurations with unseen antenna arrays and frequency bands to further demonstrate transfer rather than interpolation.
  Revision: yes
- Referee: [Few-shot learning results] The few-shot SOTA claim across 9 tasks with 1% training samples requires reporting of data splits, statistical significance, and baseline details to rule out post-hoc evaluation choices; the current presentation leaves open whether performance differences are reliable or sensitive to partitioning.
  Authors: We agree that fuller experimental details are needed to establish reliability. The revised manuscript now includes an expanded experimental protocol section that specifies: (i) the exact train/test splits for each of the 9 tasks with explicit confirmation of no leakage from pretraining data; (ii) performance aggregated over 5 random seeds with mean, standard deviation, and p-values from paired statistical tests against the supervised baselines; and (iii) complete descriptions of baseline model architectures, hyperparameters, and training procedures. These additions confirm that the reported gains with 1% samples are statistically significant and robust across different partitions.
  Revision: yes
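The seed-level reporting described in the second response can be illustrated with a small paired comparison. The sketch below uses an exact sign-flip permutation test as one possible paired test; the rebuttal does not say which test the authors used, and the function name and per-seed scores are hypothetical.

```python
import itertools
import statistics


def paired_summary(model_scores, baseline_scores):
    """Mean and standard deviation of per-seed differences, plus an exact
    two-sided sign-flip permutation p-value for the paired design."""
    if len(model_scores) != len(baseline_scores) or len(model_scores) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    observed = abs(sum(diffs))
    n = len(diffs)
    # Enumerate all 2^n sign assignments of the differences.
    extreme = 0
    for signs in itertools.product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:
            extreme += 1
    return {
        "mean_diff": statistics.mean(diffs),
        "std_diff": statistics.stdev(diffs),
        "p_value": extreme / 2 ** n,
    }


# Hypothetical NMSE values in dB for 5 seeds (lower is better).
model = [-20.1, -19.8, -20.4, -20.0, -19.9]
baseline = [-18.0, -18.3, -17.9, -18.2, -18.1]
result = paired_summary(model, baseline)
print(result)
```

With only 5 seeds, the smallest two-sided p-value this exact test can produce is 2/32 = 0.0625, so the choice of test and the number of seeds both affect what "statistically significant" can mean in such a protocol.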
Circularity Check
No circularity: empirical performance claims rest on direct experimental measurements
Full rationale
The paper presents WiFo-2 as a pretrained foundation model on a heterogeneous CSI dataset of 11.6 billion points, with claims of zero-shot channel reconstruction and state-of-the-art few-shot results across 9 tasks supported by reported empirical outcomes and a hardware prototype. No equations, derivations, or mathematical chains appear in the abstract or described claims that reduce predictions or results to fitted parameters by construction. Performance numbers are presented as direct measurements from training and evaluation, not self-definitional quantities or renamed known results. Self-citations, if present, are not load-bearing for the central generalization claims, which rely on experimental validation rather than reduction to inputs or prior author work. The derivation chain is effectively absent, rendering the work self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Wireless channels exhibit sufficient shared structure across scenarios and configurations to support a single generalist model.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: WiFo-2 adopts the proposed transformer-based MDAE architecture... CSI-SMoE layer... two-phase pretraining strategy... mixed masking and denoising pretraining tasks
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: LH-CSI... 11.6 billion CSI points... zero-shot generalization split
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 5 Pith papers
- FARM: Foundational Aerial Radio Map for Intelligent Low-Altitude Networking
  FARM is a foundation model combining masked autoencoders and diffusion decoders to estimate high-resolution aerial radio maps from a new multi-band low-altitude dataset, claiming superior accuracy and generalization o...
- WiFo-MiSAC: A Wireless Foundation Model for Multimodal Sensing and Communication Integration via Synesthesia of Machines (SoM)
  WiFo-MiSAC is a task-agnostic foundation model that unifies multimodal wireless signals via tokenization and self-supervised learning with SS-DMoE to achieve strong few-shot performance on beam prediction and channel ...
- AirFM-DDA: Air-Interface Foundation Model in the Delay-Doppler-Angle Domain for AI-Native 6G
  AirFM-DDA reparameterizes wireless channel data into the delay-Doppler-angle domain and uses efficient window attention to achieve better zero-shot performance on channel prediction and estimation with lower compute cost.
- A Graph Foundation Model for Wireless Resource Allocation
  A pre-trained interference-aware graph Transformer model for wireless resource allocation that achieves strong few-shot adaptation to new tasks and scenarios.
- Adaptive 3D-RoPE: Physics-Aligned Rotary Positional Encoding for Wireless Foundation Models
  Adaptive 3D-RoPE adapts rotary positional encoding to wireless channel physics via learnable 3D frequencies and dynamic CSI control, yielding up to 10.7 dB NMSE gains in scale extrapolation and 1 dB in zero-shot tasks.