RF-Analyzer: Can Vision-Language Models Learn RF Understanding from Synthetic Data?
Pith reviewed 2026-05-08 16:46 UTC · model grok-4.3
The pith
Vision-language models can learn to understand real RF signals from synthetic spectrogram data alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VLMs trained exclusively on synthetic spectrogram data can generalize to real over-the-air RF environments, particularly for extracting physical signal attributes such as spectral occupancy, temporal behavior, and SNR. This indicates that synthetic data is sufficient for learning transferable representations of RF signal structure, though generalization is limited without contextual priors and fails in low-SNR regimes.
What carries the argument
RF-Analyzer, an SDR-to-AI analysis platform that pairs live spectrum captures with VLM interpretations and uses metrics like Physical Attribute Extraction Score to evaluate generalization from synthetic to real data.
If this is right
- VLMs can extract physical attributes from real RF signals after synthetic training.
- Generalization succeeds for signal properties within the synthetic distribution.
- Low-SNR regimes and lack of contextual priors limit the transfer.
- The introduced platform and metrics enable systematic assessment of VLM performance on live RF data.
Where Pith is reading between the lines
- This approach could make advanced spectrum analysis tools more accessible by reducing reliance on expensive real-world datasets.
- The success suggests potential for applying similar synthetic-data strategies to other signal processing tasks involving visual representations like images of waveforms.
- Testing with augmented synthetic data that includes more noise variations could improve performance in challenging low-SNR conditions.
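The last point, augmenting synthetic data with more noise variation, can be sketched in a few lines. This is a minimal illustration, not the paper's generator: the test tone, the sample rate, the SNR sweep values, and the helper names `make_tone`, `add_awgn`, and `measured_snr_db` are all assumptions made for the example. It scales white Gaussian noise to hit a requested SNR and then confirms the achieved SNR empirically.

```python
import math
import random


def make_tone(n_samples, freq_hz, sample_rate_hz):
    """Unit-amplitude sinusoid; a stand-in for a synthetic signal (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def power(samples):
    """Mean power of a real-valued sample sequence."""
    return sum(v * v for v in samples) / len(samples)


def add_awgn(signal, snr_db, rng):
    """Return (noisy, noise) with white Gaussian noise scaled to the target SNR."""
    noise_power = power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noise = [rng.gauss(0.0, sigma) for _ in signal]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


def measured_snr_db(signal, noise):
    """Empirical SNR in dB from the clean signal and the noise realization."""
    return 10 * math.log10(power(signal) / power(noise))


rng = random.Random(0)
tone = make_tone(20000, 1000.0, 48000.0)
# Hypothetical sweep that deliberately extends into negative SNR.
for target_db in (-10, 0, 10, 20):
    _, noise = add_awgn(tone, target_db, rng)
    print(target_db, round(measured_snr_db(tone, noise), 2))
```

A real augmentation pipeline would apply the same scaling to complex IQ samples before computing the spectrogram; the sketch only shows how the SNR axis of the training distribution could be widened.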
Load-bearing premise
The synthetic training data distribution is representative enough of real over-the-air RF variations to support generalization, especially outside the low-SNR regimes explicitly noted as failure cases.
What would settle it
A demonstration that the VLM misidentifies key attributes like spectral occupancy on real signals whose characteristics fall within the range of the synthetic training data.
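Running such a test requires first selecting real captures whose attributes lie inside the synthetic training ranges. The sketch below shows one coarse way to do that filtering. The attribute names and every numeric range in `SYNTHETIC_RANGES` are hypothetical placeholders, since the generator's actual parameter ranges are not stated here, and a per-attribute range check is only a rough proxy for being in-distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value):
        return self.low <= value <= self.high


# Hypothetical generator ranges; the paper's real values are not given here.
SYNTHETIC_RANGES = {
    "snr_db": Range(0.0, 30.0),
    "bandwidth_hz": Range(10e3, 20e6),
    "duty_cycle": Range(0.05, 1.0),
}


def out_of_range_attributes(capture, ranges=SYNTHETIC_RANGES):
    """Sorted names of attributes that are missing or outside their range."""
    return sorted(
        name for name, allowed in ranges.items()
        if name not in capture or not allowed.contains(capture[name])
    )


def is_in_distribution(capture, ranges=SYNTHETIC_RANGES):
    """True when every listed attribute lies inside its synthetic range."""
    return not out_of_range_attributes(capture, ranges)


# Made-up example captures, for illustration only.
captures = [
    {"snr_db": 15.0, "bandwidth_hz": 1e6, "duty_cycle": 0.5},
    {"snr_db": -4.0, "bandwidth_hz": 1e6, "duty_cycle": 0.5},
]
for capture in captures:
    print(is_in_distribution(capture), out_of_range_attributes(capture))
```

Attribute-extraction errors on captures that pass this filter would count against the core claim, whereas errors on captures that fail it fall under the low-SNR and out-of-distribution caveats already acknowledged.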
Original abstract
Understanding the wireless spectrum is a fundamental requirement for intelligent communication systems, however, interpreting spectrograms requires extracting multiple physical attributes and reasoning about signal structure, which is a capability that is not achieved by traditional ML approaches. Recent advances in vision-language models (VLMs) demonstrated the possibility of learning such interpretation capabilities directly from data. This paper investigates whether VLMs can learn this capability from synthetic data alone, and more importantly, whether such learned representations generalize to real over-the-air RF environments. To address this question, we introduce RF-Analyzer, an SDR-to-AI analysis platform that integrates live spectrum captures associated with the corresponding VLM-based interpretation, enabling direct evaluation of VLMs outputs on live over-the-air signals. Using this platform, we assess a model trained exclusively on synthetic spectrogram data with general-purpose baselines. To enable systematic analysis, we establish a benchmark framework comprising three metrics, Physical Attribute Extraction Score (PAES), Prompt Leakage Rate (PLR), and hallucination count, to assess signal understanding and grounding. The obtained results demonstrate that VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes such as spectral occupancy, temporal behavior, and SNR. This indicates that synthetic data is sufficient for learning transferable representations of RF signal structure. However, this generalization is limited due to the fact that synthetic training does not provide reliable semantic grounding without contextual priors. In particular, generalization breaks under conditions that are not covered in the synthetic distribution, particularly low-SNR regimes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the RF-Analyzer SDR-to-AI platform to test whether vision-language models trained exclusively on synthetic spectrogram data can extract physical RF signal attributes (spectral occupancy, temporal behavior, SNR) from live over-the-air captures. It defines three evaluation metrics—Physical Attribute Extraction Score (PAES), Prompt Leakage Rate (PLR), and hallucination count—and reports that the trained VLM generalizes to real signals for these attributes while noting failures in semantic grounding and low-SNR regimes outside the synthetic distribution.
Significance. If the generalization result is robust, the work shows that synthetic data alone can produce transferable representations for physical RF attribute extraction, reducing reliance on scarce real-world labeled captures for spectrum analysis tasks. The RF-Analyzer platform and the PAES/PLR/hallucination benchmark constitute concrete, reusable contributions for evaluating VLM grounding on live SDR data.
major comments (2)
- [Abstract] Abstract: the central generalization claim ('VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes') rests on the untested assumption that the synthetic generator reproduces the statistics of real over-the-air effects beyond the explicitly noted low-SNR breakdown; no ablation or quantitative comparison is supplied for multipath, hardware non-idealities, dynamic interference, or SDR-specific artifacts that would shift the input distribution.
- [Evaluation / Results] The manuscript provides no experimental details, statistical tests, baseline comparisons, or error analysis to support the reported generalization (reader note: soundness rated 3.0). Without these, the PAES scores cannot be assessed for reliability or compared against traditional ML approaches mentioned in the abstract.
minor comments (2)
- [Abstract] The abstract states positive results but does not report numerical PAES values, sample sizes, or confidence intervals; adding these would improve clarity.
- [Benchmark Framework] Notation for the three metrics (PAES, PLR, hallucination count) is introduced without a dedicated definitions subsection or table summarizing their formulas and ranges.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will incorporate revisions to strengthen the manuscript's rigor and clarity.
Point-by-point responses
- Referee: [Abstract] Abstract: the central generalization claim ('VLMs trained on synthetic spectrogram data can generalize to real RF environments, particularly for extracting physical signal attributes') rests on the untested assumption that the synthetic generator reproduces the statistics of real over-the-air effects beyond the explicitly noted low-SNR breakdown; no ablation or quantitative comparison is supplied for multipath, hardware non-idealities, dynamic interference, or SDR-specific artifacts that would shift the input distribution.
  Authors: We acknowledge that the synthetic generator prioritizes core signal parameters (frequency, bandwidth, modulation, SNR) and does not explicitly simulate all real-world effects such as multipath or hardware non-idealities. The generalization results are based on direct testing via the RF-Analyzer platform on live over-the-air captures, which inherently contain these effects, and the PAES scores reflect performance under those conditions. We agree that explicit analysis of distribution shifts would improve the paper. In revision, we will add a dedicated subsection discussing potential mismatches, including qualitative comparisons of real vs. synthetic spectrograms under multipath and interference, plus limitations in regimes outside the synthetic distribution.
  Revision: partial
- Referee: [Evaluation / Results] The manuscript provides no experimental details, statistical tests, baseline comparisons, or error analysis to support the reported generalization (reader note: soundness rated 3.0). Without these, the PAES scores cannot be assessed for reliability or compared against traditional ML approaches mentioned in the abstract.
  Authors: The full manuscript describes the synthetic data generation process, VLM fine-tuning, RF-Analyzer implementation, and the three metrics, with general-purpose VLMs as baselines. We agree that additional rigor is needed for assessing reliability. In the revised manuscript, we will expand the Evaluation section with: full hyperparameter details and training procedure; statistical summaries (means, standard deviations, and confidence intervals) of PAES across multiple real captures; error analysis stratified by SNR and signal type; and explicit numerical comparisons to traditional ML baselines such as CNN classifiers for spectral occupancy. This will include appropriate statistical tests to support the generalization claims.
  Revision: yes
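The statistical summaries promised in this response could look roughly like the sketch below. It groups per-capture scores into SNR bins and reports the count, mean, standard deviation, and a bootstrap 95% confidence interval for each bin. The bin edges, the helper names, and the example scores are placeholders invented for illustration; they are not values from the paper, and the percentile bootstrap is one reasonable choice rather than the authors' stated method.

```python
import random
import statistics
from collections import defaultdict


def snr_bin(snr_db, edges=(0.0, 10.0, 20.0)):
    """Label the SNR bin for one capture; the edges are hypothetical."""
    if snr_db < edges[0]:
        return f"<{edges[0]:g} dB"
    for low, high in zip(edges, edges[1:]):
        if snr_db < high:
            return f"{low:g} to {high:g} dB"
    return f">={edges[-1]:g} dB"


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def summarize(records):
    """records: iterable of (snr_db, score) pairs, one per real capture."""
    groups = defaultdict(list)
    for snr_db, score in records:
        groups[snr_bin(snr_db)].append(score)
    summary = {}
    for label, scores in groups.items():
        summary[label] = {
            "n": len(scores),
            "mean": statistics.fmean(scores),
            "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
            "ci95": bootstrap_ci(scores),
        }
    return summary


# Placeholder scores for illustration only; not results from the paper.
records = [(-5, 0.2), (-3, 0.4), (5, 0.6), (7, 0.8), (22, 0.9), (25, 0.9)]
for label, stats in summarize(records).items():
    print(label, stats)
```

Stratifying by SNR in this way would make the acknowledged low-SNR breakdown visible as a separate row instead of being averaged into a single headline score.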
Circularity Check
No circularity: purely empirical evaluation with no derivations or self-referential reductions
full rationale
The paper introduces an SDR-based platform and benchmark metrics (PAES, PLR, hallucination count) to compare VLM outputs on synthetic spectrograms versus live over-the-air captures. No equations, derivations, fitted parameters, or first-principles results are claimed. Generalization statements are presented as direct empirical observations with explicit caveats for out-of-distribution cases (low SNR), not as predictions derived from the training distribution by construction. No self-citations, ansatzes, or uniqueness theorems are invoked to support core claims. The work is self-contained as an experimental comparison.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the synthetic data distribution sufficiently covers real RF variations for physical attribute extraction.
invented entities (1)
- RF-Analyzer platform: no independent evidence