pith. machine review for the scientific record.

arxiv: 2604.18255 · v1 · submitted 2026-04-20 · 📡 eess.SP

Recognition: unknown

WiFo-MiSAC: A Wireless Foundation Model for Multimodal Sensing and Communication Integration via Synesthesia of Machines (SoM)

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 04:01 UTC · model grok-4.3

classification 📡 eess.SP
keywords wireless foundation model · multimodal sensing · communication integration · self-supervised learning · mixture of experts · beam prediction · channel estimation

The pith

WiFo-MiSAC unifies wireless sensing and communication by tokenizing signals into a shared space for self-supervised learning with disentangled experts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current wireless approaches process communication and sensing data separately, which limits their ability to generalize. The paper introduces WiFo-MiSAC as a foundation model that first converts these different signals into tokens in one common space. Self-supervised training then uses both masked reconstruction and contrastive alignment. A shared-specific disentangled mixture-of-experts structure keeps common and unique features separate to avoid interference. This leads to better results on tasks such as beam prediction and channel estimation, plus easier adaptation to new situations with limited data.
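The two-part pre-training objective described above (masked reconstruction plus contrastive alignment) can be sketched in a few lines of numpy. The masking ratio, temperature, and batch shapes here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(tokens, recon, mask):
    """MSE computed only over masked token positions (masked-autoencoder style)."""
    diff = (tokens - recon) ** 2
    return diff[mask].mean()

def contrastive_alignment_loss(z_a, z_b, temperature=0.1):
    """InfoNCE between paired modality embeddings; row i of z_a pairs with row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)  # cosine similarities
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()                       # positives on the diagonal

# Toy batch: 8 samples, 16 tokens of width 32, ~75% of tokens masked
tokens = rng.normal(size=(8, 16, 32))
recon = tokens + 0.1 * rng.normal(size=tokens.shape)
mask = rng.random((8, 16)) < 0.75

z_csi, z_radar = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
loss = masked_reconstruction_loss(tokens, recon, mask) + contrastive_alignment_loss(z_csi, z_radar)
print(float(loss))
```

The reconstruction term forces each modality's tokens to be individually predictable; the contrastive term pulls paired cross-modal embeddings together, which is what makes the shared space more than a formal concatenation.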

Core claim

The central claim is that a task-agnostic foundation model can integrate multimodal sensing and communication by tokenizing heterogeneous signals into a unified space, then pre-training with a combination of masked reconstruction and contrastive alignment. The SS-DMoE architecture decouples shared and specific representations so that modalities interact beneficially without cross-modal interference. This yields state-of-the-art performance on downstream tasks including beam prediction and channel estimation, robust few-shot adaptation, and seamless incorporation of new modalities.

What carries the argument

The shared-specific disentangled mixture-of-experts (SS-DMoE) architecture, which decouples modality-shared and modality-specific representations from tokenized heterogeneous signals.
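The shared/specific split can be illustrated with a toy forward pass: every token goes through a shared expert pool, while the specific pool is selected by modality, so one modality's specific experts never process another modality's tokens. This is a hypothetical reading of the architecture (dense softmax routing, random linear experts), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 32  # token dimension

def make_expert():
    """A toy 'expert' is just a fixed random linear map."""
    W = rng.normal(scale=D ** -0.5, size=(D, D))
    return lambda x: x @ W

# One shared expert pool, plus one specific pool per modality
shared_experts = [make_expert() for _ in range(2)]
specific_experts = {m: [make_expert() for _ in range(2)] for m in ("csi", "radar", "map")}
gate_W = rng.normal(size=(D, 2))  # router over the 2 experts in a pool

def route(x, experts):
    """Softmax-gated mixture over a small expert pool (dense routing for clarity)."""
    logits = x @ gate_W
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return sum(w[..., i:i + 1] * e(x) for i, e in enumerate(experts))

def ss_dmoe_layer(x, modality):
    # Shared path captures modality-common structure; the specific path is
    # gated only over this modality's own experts, so (in a trained model)
    # one modality's updates never touch another modality's specific experts.
    return route(x, shared_experts) + route(x, specific_experts[modality])

x = rng.normal(size=(4, 16, D))  # (batch, tokens, dim)
out = ss_dmoe_layer(x, "radar")
print(out.shape)
```

The disentanglement claim, in this reading, amounts to the shared pool absorbing common physical-layer structure while the per-modality pools absorb the rest.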

If this is right

  • State-of-the-art performance is achieved on beam prediction and channel estimation tasks.
  • The model adapts effectively to new tasks using only a small number of examples.
  • New modalities integrate seamlessly into the existing model.
  • The approach provides a scalable backbone for integrated sensing and communication systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Joint processing of sensing and communication may allow wireless systems to operate more efficiently by sharing learned features.
  • The tokenization method could be tested in other multimodal environments, such as combining radar with visual data.
  • Few-shot capabilities suggest potential use in rapidly changing wireless conditions like mobile networks.

Load-bearing premise

That converting different wireless signals into tokens in a single space and applying the SS-DMoE architecture will separate shared and specific features without causing interference between modalities.
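This premise can be made concrete with a ViT-style tokenizer sketch for one modality: a complex CSI matrix is split into patches, real and imaginary parts are stacked, and each flattened patch is linearly projected into the shared token space. The patch size, `d_model`, and random projection here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def csi_to_tokens(H, patch=(4, 8), d_model=64, W_proj=None):
    """Patchify a complex CSI matrix (antennas x subcarriers), stack Re/Im,
    and project each flattened patch into the shared d_model token space."""
    Na, Nsc = H.shape
    pa, ps = patch
    assert Na % pa == 0 and Nsc % ps == 0
    X = np.stack([H.real, H.imag], axis=-1)                  # (Na, Nsc, 2)
    X = X.reshape(Na // pa, pa, Nsc // ps, ps, 2)
    X = X.transpose(0, 2, 1, 3, 4).reshape(-1, pa * ps * 2)  # (n_tokens, patch_dim)
    if W_proj is None:
        W_proj = rng.normal(scale=X.shape[1] ** -0.5, size=(X.shape[1], d_model))
    return X @ W_proj                                        # (n_tokens, d_model)

# 16-antenna, 32-subcarrier complex channel -> 16 tokens of width 64
H = rng.normal(size=(16, 32)) + 1j * rng.normal(size=(16, 32))
tokens = csi_to_tokens(H)
print(tokens.shape)
```

Whether such a linear patch projection loses too much modality-specific structure is exactly what the premise asserts it does not.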

What would settle it

If the model shows higher error rates than specialized models on beam prediction when modalities are combined in ways not seen during pre-training, this would challenge the claim of improved generalization without cross-modal interference.

Figures

Figures reproduced from arXiv: 2604.18255 by Boxun Liu, Liuqing Yang, Shijian Gao, Xiang Cheng, Xuanyu Liu.

Figure 1
Figure 1. Comparison between conventional task-specific SoM pipelines and the proposed WiFo-MiSAC framework. view at source ↗
Figure 2
Figure 2. An illustration of data pre-processing and the modal-specific tokenizer. view at source ↗
Figure 3
Figure 3. An illustration of the unified multimodal encoder architecture in WiFo-MiSAC. [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Performance gain of different methods after modality expansion (higher is better). [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Performance drop of different methods after modality missingness (lower is better). [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. The t-SNE visualization of CSI, Radar, and Map features extracted by three WiFo-MiSAC variants on PF5. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Scaling laws of WiFo-MiSAC under different tasks and input configurations, where low SNR and high SNR denote SNR = 10 dB and SNR = 20 dB, respectively. [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
Figure 8
Figure 8. Heatmaps of routing weights for SoM experts and [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗
read the original abstract

Current learning-based wireless methods struggle with generalization due to the fragmented processing of communication and sensing data. WiFo-MiSAC addresses this as a task-agnostic foundation model that tokenizes heterogeneous signals into a unified space for self-supervised pre-training. A shared-specific disentangled mixture-of-experts (SS-DMoE) architecture is employed to decouple modality-shared and modality-specific representations, facilitating interaction without cross-modal interference. By combining masked reconstruction with contrastive alignment, the model achieves state-of-the-art performance across downstream tasks, including beam prediction and channel estimation. Experimental results demonstrate robust few-shot adaptation and seamless integration of new modalities, positioning WiFo-MiSAC as a scalable backbone for future integrated sensing and communication systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes WiFo-MiSAC, a task-agnostic foundation model for multimodal sensing and communication integration via Synesthesia of Machines. It tokenizes heterogeneous wireless signals (RF and sensing) into a unified space for self-supervised pre-training, employs a shared-specific disentangled mixture-of-experts (SS-DMoE) architecture to separate modality-shared and modality-specific representations, and combines masked reconstruction with contrastive alignment. The model claims state-of-the-art performance on downstream tasks including beam prediction and channel estimation, along with robust few-shot adaptation and seamless integration of new modalities.

Significance. If the experimental claims are substantiated, this work could advance integrated sensing and communication systems by offering a scalable, unified backbone that addresses generalization limitations of fragmented task-specific models. The self-supervised pre-training strategy and emphasis on disentangled multimodal representations represent a potentially valuable direction for handling heterogeneous wireless data with reduced reliance on labeled datasets.

major comments (2)
  1. [Abstract] The abstract asserts state-of-the-art results on beam prediction and channel estimation but supplies no experimental details, baselines, error bars, or data-exclusion rules. Without these elements the numerical support for the central performance claims cannot be evaluated.
  2. [SS-DMoE Architecture] The claim that the SS-DMoE successfully decouples modality-shared and modality-specific representations without cross-modal interference is load-bearing for the generalization and few-shot adaptation results. Given that wireless modalities share physical-layer statistics such as multipath and Doppler, the manuscript requires explicit ablations or an interference metric to demonstrate that the mixture-of-experts routing and contrastive alignment prevent leakage and deliver gains over fragmented baselines.
minor comments (1)
  1. The abstract would benefit from a concise statement of the number of modalities and dataset sizes used in the pre-training and downstream evaluations to contextualize the reported robustness.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below and indicate the revisions made to strengthen the work.

read point-by-point responses
  1. Referee: [Abstract] The abstract asserts state-of-the-art results on beam prediction and channel estimation but supplies no experimental details, baselines, error bars, or data-exclusion rules. Without these elements the numerical support for the central performance claims cannot be evaluated.

    Authors: We agree that the abstract is currently high-level and does not provide the requested experimental specifics, which limits immediate evaluation of the claims. In the revised manuscript we expand the abstract to include the primary baselines (task-specific models and prior multimodal approaches), the reported performance gains with associated error bars from the main experiments, and a brief reference to the data processing protocol detailed in Section 4.1. The full experimental setup, including any exclusion criteria, remains in the body of the paper. revision: yes

  2. Referee: [SS-DMoE Architecture] The claim that the SS-DMoE successfully decouples modality-shared and modality-specific representations without cross-modal interference is load-bearing for the generalization and few-shot adaptation results. Given that wireless modalities share physical-layer statistics such as multipath and Doppler, the manuscript requires explicit ablations or an interference metric to demonstrate that the mixture-of-experts routing and contrastive alignment prevent leakage and deliver gains over fragmented baselines.

    Authors: The referee correctly notes that the decoupling property is central to our generalization and few-shot results and that shared physical-layer effects could induce leakage. While the manuscript already reports ablation studies comparing SS-DMoE against non-disentangled MoE variants and shows corresponding gains in downstream tasks, we acknowledge that a direct interference metric is not yet present. In the revision we add (i) an explicit ablation isolating the effect of the shared-specific routing and (ii) a quantitative interference metric (normalized mutual information between shared and modality-specific expert activations) together with routing visualizations. These additions directly quantify leakage under shared statistics such as multipath and Doppler and confirm the contribution of the contrastive alignment term. revision: yes
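The interference metric proposed in the rebuttal (normalized mutual information between shared and modality-specific expert activations) can be sketched with a plain joint-histogram estimator. The activation traces below are synthetic stand-ins, not model outputs:

```python
import numpy as np

rng = np.random.default_rng(3)

def nmi(x, y, bins=16):
    """Normalized mutual information between two 1-D activation traces,
    estimated from a joint histogram: NMI = I(X;Y) / sqrt(H(X) * H(Y))."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = (pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum()
    hx = -(px[px > 0] * np.log(px[px > 0])).sum()
    hy = -(py[py > 0] * np.log(py[py > 0])).sum()
    return mi / np.sqrt(hx * hy)

shared = rng.normal(size=5000)
leaky = shared + 0.3 * rng.normal(size=5000)  # specific expert that leaks shared info
clean = rng.normal(size=5000)                 # well-disentangled specific expert

print(nmi(shared, leaky), nmi(shared, clean))
```

Under this estimator, a disentangled specific expert should score near zero NMI against the shared activations, while a leaky one scores well above it; the histogram estimator carries a small positive bias from binning, so comparisons against a matched baseline matter more than absolute values.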

Circularity Check

0 steps flagged

No circularity: empirical self-supervised model with held-out evaluation

full rationale

The paper presents a foundation model trained via masked reconstruction and contrastive alignment on tokenized multimodal wireless signals, then evaluated on downstream tasks such as beam prediction and channel estimation. No equations, first-principles derivations, or predictions appear that reduce by construction to fitted inputs or self-citations. Performance claims rest on experimental results with few-shot adaptation and new-modality integration, which are falsifiable on held-out data. The SS-DMoE architecture and SoM framing are design choices justified by training objectives rather than tautological definitions or load-bearing self-citations that collapse the central result.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The central claim rests on standard machine-learning assumptions about tokenization and self-supervised objectives plus domain assumptions about wireless signal compatibility; the model itself introduces new architectural components whose effectiveness is asserted rather than derived from first principles.

free parameters (1)
  • Neural network weights and hyperparameters
    All model parameters are learned from data during the self-supervised pre-training stage.
axioms (1)
  • domain assumption Heterogeneous wireless signals from different modalities can be tokenized into a shared latent space without prohibitive information loss.
    Invoked at the start of the tokenization pipeline to enable unified pre-training.
invented entities (2)
  • SS-DMoE architecture no independent evidence
    purpose: Decouple modality-shared and modality-specific representations to avoid cross-modal interference
    New architectural component introduced by the paper.
  • WiFo-MiSAC model no independent evidence
    purpose: Task-agnostic foundation model for multimodal sensing and communication
    The overall proposed system.

pith-pipeline@v0.9.0 · 5436 in / 1555 out tokens · 54781 ms · 2026-05-10T04:01:38.375266+00:00 · methodology

discussion (0)

