RF-Analyzer: Can Vision-Language Models Learn RF Understanding from Synthetic Data?

Anis Bara; Brahim Mefgouda; Hang Zou; Lina Bariah; Merouane Debbah

arxiv: 2605.04676 · v1 · submitted 2026-05-06 · 📡 eess.SP

Pith reviewed 2026-05-08 16:46 UTC · model grok-4.3

classification 📡 eess.SP

keywords vision-language models · synthetic data · RF spectrograms · signal understanding · generalization · wireless spectrum · physical attribute extraction · SDR platform

The pith

Vision-language models can learn to understand real RF signals from synthetic spectrogram data alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper asks if vision-language models trained only on computer-generated RF spectrograms can still make sense of actual wireless signals captured from the air. The authors find that the models do generalize, successfully pulling out details like which frequencies are in use, how the signal behaves over time, and its strength level. This matters because collecting large amounts of real RF data for training is costly and logistically difficult, whereas synthetic data can be generated in unlimited quantities. They support this by building RF-Analyzer, a system that links software-defined radios directly to the model for real-time testing, and by defining new metrics to measure how well the model describes the physical properties without hallucinating or leaking prompt information. The results hold for typical conditions but break down when signals are very weak or when the synthetic data misses key variations.

Core claim

VLMs trained exclusively on synthetic spectrogram data can generalize to real over-the-air RF environments, particularly for extracting physical signal attributes such as spectral occupancy, temporal behavior, and SNR. This indicates that synthetic data is sufficient for learning transferable representations of RF signal structure, though generalization is limited without contextual priors and fails in low-SNR regimes.

What carries the argument

RF-Analyzer, an SDR-to-AI analysis platform that pairs live spectrum captures with VLM interpretations and uses metrics like Physical Attribute Extraction Score to evaluate generalization from synthetic to real data.
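
How RF-Analyzer renders its spectrograms is not described on this page. As a rough illustration of the capture-to-image step that such a platform needs, the sketch below turns complex IQ samples into a dB-scaled spectrogram with a Hann-windowed short-time DFT. The function name `spectrogram_db`, the `nfft` and `hop` values, and the test tone are all assumptions made for illustration, not details from the paper.

```python
import cmath
import math


def spectrogram_db(iq, nfft=64, hop=32):
    """Short-time DFT power in dB; rows are time frames, columns are frequency bins."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nfft) for n in range(nfft)]
    twiddle = [cmath.exp(-2j * math.pi * k / nfft) for k in range(nfft)]
    frames = []
    for start in range(0, len(iq) - nfft + 1, hop):
        seg = [iq[start + n] * window[n] for n in range(nfft)]
        row = []
        for k in range(nfft):
            # Naive DFT kept for clarity; an FFT library would normally be used.
            acc = sum(seg[n] * twiddle[(k * n) % nfft] for n in range(nfft))
            row.append(10 * math.log10(abs(acc) ** 2 + 1e-12))
        # Put negative frequencies first, as on a typical RF display.
        frames.append(row[nfft // 2:] + row[:nfft // 2])
    return frames


# Hypothetical input: a complex tone at +1/8 of the sample rate.
iq = [cmath.exp(2j * math.pi * 0.125 * n) for n in range(512)]
spec = spectrogram_db(iq)
peak_bin = max(range(64), key=lambda k: spec[0][k])
print(len(spec), "frames; strongest column in first frame:", peak_bin)
```

In a live setting the `iq` list would be replaced by samples streamed from the SDR, and the resulting matrix would be colour-mapped into the image passed to the VLM.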

If this is right

VLMs can extract physical attributes from real RF signals after synthetic training.
Generalization succeeds for signal properties within the synthetic distribution.
Low-SNR regimes and lack of contextual priors limit the transfer.
The introduced platform and metrics enable systematic assessment of VLM performance on live RF data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could make advanced spectrum analysis tools more accessible by reducing reliance on expensive real-world datasets.
The success suggests potential for applying similar synthetic-data strategies to other signal processing tasks involving visual representations like images of waveforms.
Testing with augmented synthetic data that includes more noise variations could improve performance in challenging low-SNR conditions.
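
The noise-augmentation idea in the last point can be sketched as follows, assuming additive white Gaussian noise is an acceptable first model of the low-SNR conditions. The helper names `add_awgn` and `measured_snr_db` and the test tone are hypothetical; the paper's synthetic generator is not shown on this page and may work differently.

```python
import math
import random


def add_awgn(iq, snr_db, rng):
    """Return a noisy copy of complex samples `iq` at the requested SNR in dB."""
    signal_power = sum(abs(x) ** 2 for x in iq) / len(iq)
    noise_power = signal_power / (10 ** (snr_db / 10))
    # Complex Gaussian noise: half of the noise power goes to I and half to Q.
    sigma = math.sqrt(noise_power / 2)
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in iq]


def measured_snr_db(clean, noisy):
    """Empirical SNR of `noisy` relative to the known clean signal."""
    signal_power = sum(abs(x) ** 2 for x in clean) / len(clean)
    noise_power = sum(abs(n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Hypothetical clean signal: a unit-amplitude complex sinusoid.
tone = [complex(math.cos(0.2 * math.pi * n), math.sin(0.2 * math.pi * n))
        for n in range(20000)]
for target_db in (-10, 0, 10):
    noisy = add_awgn(tone, target_db, rng)
    print(target_db, round(measured_snr_db(tone, noisy), 2))
```

Sweeping `target_db` below the range used in the original training set would be one way to test whether the low-SNR failures come from missing coverage rather than from a limit of the model itself.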

Load-bearing premise

The synthetic training data distribution is representative enough of real over-the-air RF variations to support generalization, especially outside the low-SNR regimes explicitly noted as failure cases.

What would settle it

A demonstration that the VLM misidentifies key attributes like spectral occupancy on real signals whose characteristics fall within the range of the synthetic training data.

Figures

Figures reproduced from arXiv: 2605.04676 by Anis Bara, Brahim Mefgouda, Hang Zou, Lina Bariah, Merouane Debbah.

**Figure 1.** System architecture of RF-Analyzer. Colored blocks indicate functional …

**Figure 3.** RF Analyzer running on an Ubuntu 24 workstation with an Ettus …

Original abstract

Understanding the wireless spectrum is a fundamental requirement for intelligent communication systems, however, interpreting spectrograms requires extracting multiple physical attributes and reasoning about signal structure, which is a capability that is not achieved by traditional ML approaches. Recent advances in vision-language models (VLMs) demonstrated the possibility of learning such interpretation capabilities directly from data. This paper investigates whether VLMs can learn this capability from synthetic data alone, and more importantly, whether such learned representations generalize to real over-the-air RF environments. To address this question, we introduce RF-Analyzer, an SDR-to-AI analysis platform that integrates live spectrum captures associated with the corresponding VLM-based interpretation, enabling direct evaluation of VLMs outputs on live over-the-air signals. Using this platform, we assess a model trained exclusively on synthetic spectrogram data with general-purpose baselines. To enable systematic analysis, we establish a benchmark framework comprising three metrics, Physical Attribute Extraction Score (PAES), Prompt Leakage Rate (PLR), and hallucination count, to assess signal understanding and grounding. The obtained results demonstrate that VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes such as spectral occupancy, temporal behavior, and SNR. This indicates that synthetic data is sufficient for learning transferable representations of RF signal structure. However, this generalization is limited due to the fact that synthetic training does not provide reliable semantic grounding without contextual priors. In particular, generalization breaks under conditions that are not covered in the synthetic distribution, particularly low-SNR regimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

VLMs pick up basic physical RF attributes from synthetic spectrograms and apply them to some real signals, but the transfer is narrow and the supporting evidence stays thin.

The main point is that VLMs trained only on synthetic spectrogram data can extract attributes like spectral occupancy, temporal behavior, and SNR from live over-the-air captures, at least under the conditions tested. The paper introduces RF-Analyzer, a platform that links SDR hardware directly to VLM outputs for this kind of evaluation, plus three metrics (PAES, PLR, and hallucination count) to measure how well the model stays grounded in the signal rather than hallucinating or leaking prompts. That setup lets them run a direct synthetic-to-real check and compare against general baselines, which is a practical step for anyone trying to avoid collecting large real RF datasets.

They also state the limits plainly: performance drops in low-SNR regimes and the model lacks reliable semantic grounding without extra priors. Those admissions keep the claims from overreaching on what the data shows. The platform and metrics are the clearest additions; they give a concrete way to benchmark this kind of cross-domain transfer that earlier vision-for-signal work did not supply in the same integrated form. The results line up with the idea that synthetic data can teach transferable low-level structure, at least for the attributes they measured.

The soft spot is the strength of the generalization evidence. The abstract reports positive outcomes but supplies no numbers, no detailed baseline scores, no statistical tests, and no breakdown of how the real test captures were selected or how much they differed from the synthetic distribution. The load-bearing assumption is that the synthetic generator already covers the main real-world mismatches (multipath, hardware effects, dynamic interference). The paper flags low-SNR failures but does not show tests for those other factors, so it is hard to know whether the observed transfer would hold on a broader set of live signals or if the current results reflect a narrower test regime.

This work is aimed at people building AI tools for spectrum sensing or cognitive radio who want to explore VLMs without massive real-data collection. A reader looking for new evaluation tools or synthetic-data strategies would find usable pieces here. It deserves a serious referee because the platform and metrics are new and the question is relevant, even though the current experiments will need more detail and wider testing to support the claims. I would send it for review with a request for full experimental tables and additional real-world variation checks.

Referee Report

2 major / 2 minor

Summary. The paper introduces the RF-Analyzer SDR-to-AI platform to test whether vision-language models trained exclusively on synthetic spectrogram data can extract physical RF signal attributes (spectral occupancy, temporal behavior, SNR) from live over-the-air captures. It defines three evaluation metrics—Physical Attribute Extraction Score (PAES), Prompt Leakage Rate (PLR), and hallucination count—and reports that the trained VLM generalizes to real signals for these attributes while noting failures in semantic grounding and low-SNR regimes outside the synthetic distribution.

Significance. If the generalization result is robust, the work shows that synthetic data alone can produce transferable representations for physical RF attribute extraction, reducing reliance on scarce real-world labeled captures for spectrum analysis tasks. The RF-Analyzer platform and the PAES/PLR/hallucination benchmark constitute concrete, reusable contributions for evaluating VLM grounding on live SDR data.

major comments (2)

[Abstract] Abstract: the central generalization claim ('VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes') rests on the untested assumption that the synthetic generator reproduces the statistics of real over-the-air effects beyond the explicitly noted low-SNR breakdown; no ablation or quantitative comparison is supplied for multipath, hardware non-idealities, dynamic interference, or SDR-specific artifacts that would shift the input distribution.
[Evaluation / Results] The manuscript provides no experimental details, statistical tests, baseline comparisons, or error analysis to support the reported generalization (reader note: soundness rated 3.0). Without these, the PAES scores cannot be assessed for reliability or compared against traditional ML approaches mentioned in the abstract.

minor comments (2)

[Abstract] The abstract states positive results but does not report numerical PAES values, sample sizes, or confidence intervals; adding these would improve clarity.
[Benchmark Framework] Notation for the three metrics (PAES, PLR, hallucination count) is introduced without a dedicated definitions subsection or table summarizing their formulas and ranges.
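
The formulas for PAES and PLR are not reproduced on this page, so the following is only one plausible reading of what such definitions could look like, not the authors' method. It assumes PLR is the fraction of responses that repeat a term supplied only through the prompt, and that an attribute score is the share of reference attributes predicted within a tolerance. The function names, attribute keys, tolerances, and example strings are all hypothetical.

```python
def prompt_leakage_rate(responses, prompt_only_terms):
    """Fraction of responses that repeat any term supplied only via the prompt."""
    if not responses:
        return 0.0
    terms = [t.lower() for t in prompt_only_terms]
    leaked = sum(any(t in r.lower() for t in terms) for r in responses)
    return leaked / len(responses)


def attribute_score(predicted, reference, tolerances):
    """Share of reference attributes whose prediction falls within tolerance."""
    hits = 0
    for name, ref_value in reference.items():
        value = predicted.get(name)
        if value is not None and abs(value - ref_value) <= tolerances[name]:
            hits += 1
    return hits / len(reference)


# Hypothetical model outputs and ground truth, for illustration only.
responses = [
    "Burst near 2.44 GHz, SNR about 12 dB",
    "This looks like the Wi-Fi channel 6 signal you mentioned",
]
plr = prompt_leakage_rate(responses, ["channel 6"])

predicted = {"center_mhz": 2441.0, "snr_db": 9.0}
reference = {"center_mhz": 2440.0, "snr_db": 15.0, "bandwidth_mhz": 20.0}
tolerances = {"center_mhz": 2.0, "snr_db": 3.0, "bandwidth_mhz": 2.0}
score = attribute_score(predicted, reference, tolerances)
print(plr, score)
```

A definitions table of the kind the referee requests would pin down exactly these choices: which terms count as leaked, which attributes are scored, and what tolerance applies to each.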

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript's rigor and clarity.

Referee: [Abstract] Abstract: the central generalization claim ('VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes') rests on the untested assumption that the synthetic generator reproduces the statistics of real over-the-air effects beyond the explicitly noted low-SNR breakdown; no ablation or quantitative comparison is supplied for multipath, hardware non-idealities, dynamic interference, or SDR-specific artifacts that would shift the input distribution.

Authors: We acknowledge that the synthetic generator prioritizes core signal parameters (frequency, bandwidth, modulation, SNR) and does not explicitly simulate all real-world effects such as multipath or hardware non-idealities. The generalization results are based on direct testing via the RF-Analyzer platform on live over-the-air captures, which inherently contain these effects, and the PAES scores reflect performance under those conditions. We agree that explicit analysis of distribution shifts would improve the paper. In revision, we will add a dedicated subsection discussing potential mismatches, including qualitative comparisons of real vs. synthetic spectrograms under multipath and interference, plus limitations in regimes outside the synthetic distribution.

revision: partial

Referee: [Evaluation / Results] The manuscript provides no experimental details, statistical tests, baseline comparisons, or error analysis to support the reported generalization (reader note: soundness rated 3.0). Without these, the PAES scores cannot be assessed for reliability or compared against traditional ML approaches mentioned in the abstract.

Authors: The full manuscript describes the synthetic data generation process, VLM fine-tuning, RF-Analyzer implementation, and the three metrics, with general-purpose VLMs as baselines. We agree that additional rigor is needed for assessing reliability. In the revised manuscript, we will expand the Evaluation section with: full hyperparameter details and training procedure; statistical summaries (means, standard deviations, and confidence intervals) of PAES across multiple real captures; error analysis stratified by SNR and signal type; and explicit numerical comparisons to traditional ML baselines such as CNN classifiers for spectral occupancy. This will include appropriate statistical tests to support the generalization claims.

revision: yes
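
The statistical summaries described in this response could take a form like the sketch below, which groups per-capture scores into SNR bins and reports the mean, standard deviation, and a 95% confidence interval for each bin. The function name `summarize_by_snr`, the bin edges, and the score values are placeholders for illustration and are not results from the paper; the interval uses a normal approximation, which is an assumption that becomes weak for small sample counts.

```python
import math
import statistics
from collections import defaultdict


def summarize_by_snr(records, bin_edges_db):
    """Group (snr_db, score) pairs into SNR bins and summarize each bin."""
    bins = defaultdict(list)
    for snr_db, score in records:
        # Bin index = number of edges at or below the SNR value.
        idx = sum(snr_db >= edge for edge in bin_edges_db)
        bins[idx].append(score)
    summary = {}
    for idx, scores in sorted(bins.items()):
        mean = statistics.fmean(scores)
        std = statistics.stdev(scores) if len(scores) > 1 else 0.0
        # Normal-approximation interval; a t-interval suits small n better.
        half = 1.96 * std / math.sqrt(len(scores))
        summary[idx] = {
            "n": len(scores),
            "mean": mean,
            "std": std,
            "ci95": (mean - half, mean + half),
        }
    return summary


# Placeholder values for illustration only; not results from the paper.
records = [(-5, 0.2), (-3, 0.4), (4, 0.6), (6, 0.8), (15, 0.9), (18, 0.7)]
summary = summarize_by_snr(records, bin_edges_db=[0, 10])
for idx, stats in summary.items():
    print(idx, stats)
```

Reporting the per-bin sample count alongside each interval makes it visible when a low-SNR conclusion rests on only a handful of captures.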

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with no derivations or self-referential reductions

The paper introduces an SDR-based platform and benchmark metrics (PAES, PLR, hallucination count) to compare VLM outputs on synthetic spectrograms versus live over-the-air captures. No equations, derivations, fitted parameters, or first-principles results are claimed. Generalization statements are presented as direct empirical observations with explicit caveats for out-of-distribution cases (low SNR), not as predictions derived from the training distribution by construction. No self-citations, ansatzes, or uniqueness theorems are invoked to support core claims. The work is self-contained as an experimental comparison.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unverified assumption that synthetic spectrograms capture the essential statistical structure of real RF signals for the attributes tested, plus the validity of the new metrics as proxies for 'understanding'.

axioms (1)

domain assumption: Synthetic data distribution sufficiently covers real RF variations for physical attribute extraction.
Invoked when claiming generalization from synthetic training to live over-the-air signals.

invented entities (1)

RF-Analyzer platform (no independent evidence)
purpose: Integrates live SDR captures with VLM-based interpretation for direct evaluation.
New system introduced to enable the reported experiments.

pith-pipeline@v0.9.0 · 5598 in / 1373 out tokens · 57430 ms · 2026-05-08T16:46:56.717793+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] L. Bariah et al., “Large generative AI models for telecom: The next big thing?” IEEE Communications Magazine, vol. 62, no. 11, 2024

[2] H. Zou et al., “TelecomGPT: A framework to build telecom-specific large language models,” IEEE Transactions on Machine Learning in Communications and Networking, vol. 3, pp. 948–975, 2025

[3] H. Zhou et al., “Large language model (LLM) for telecommunications: A comprehensive survey on principles, key techniques, and opportunities,” IEEE Communications Surveys & Tutorials, vol. 27, no. 3, 2025

[4] Rohde & Schwarz, “Spectrum analyzers and signal analyzers,” https://www.rohde-schwarz.com/us/products/test-and-measurement/ benchtop-analyzers/rs-fsc-spectrum-analyzer 63493-10891.html, 2024, accessed: May 2025

[5] Keysight Technologies, “Signal analyzers,” https://www.keysight.com/us/ en/product/N9000B/cxa-signal-analyzer-multi-touch-9-khz-26-5-ghz. html, 2024, accessed: May 2025

[6] T. J. O’Shea, T. Roy, and T. C. Clancy, “Over-the-air deep learning based radio signal classification,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 168–179, 2018

[7] L. Boegner et al., “Large scale radio frequency signal classification,” arXiv preprint arXiv:2207.09918, 2022

[8] A. Swami and B. M. Sadler, “Hierarchical digital modulation classification using cumulants,” IEEE Transactions on Communications, vol. 48, no. 3, pp. 416–429, 2000

[9] N. E. West and T. O’Shea, “Deep neural network architectures for modulation classification,” in 2017 IEEE 18th Wireless and Microwave Technology Conference (WAMICON), 2017, pp. 1–6

[10] H. Zou et al., “Seeing radio: From zero RF priors to explainable modulation recognition with vision language models,” arXiv preprint arXiv:2601.13157, 2026

[11] H. Zou, Y. Tian, B. Wang, L. Bariah, S. Lasaulce, C. Huang, and M. Debbah, “RF-GPT: Teaching AI to See the Wireless World,” arXiv preprint arXiv:2602.14833, 2026

[12] W. Kwon et al., “Efficient memory management for large language model serving with PagedAttention,” in Proceedings of the 29th symposium on operating systems principles, 2023, pp. 611–626
