Unsupervised Equivalent Contrastive Learning for Radio Signal Recognition
Pith reviewed 2026-05-10 16:23 UTC · model grok-4.3
The pith
Unsupervised contrastive learning aligns four domain-specific transformations to produce transferable embeddings for radio signal recognition from unlabeled data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Four information-lossless transformations create multi-view, semantically consistent representations of each radio signal; an equivalent contrastive strategy then aligns these complementary views to learn embeddings that transfer directly to downstream recognition tasks on raw samples without further transformation.
What carries the argument
The equivalent contrastive learning strategy that aligns the four domain-specific views to produce discriminative and transferable embeddings.
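As a rough illustration of what aligning several views can mean in practice, the sketch below averages a symmetric InfoNCE-style loss over every pair of views. This is an assumption made for illustration only: the page does not state the paper's actual loss, and the names `info_nce`, `multi_view_loss`, the temperature value, and the synthetic embeddings are all hypothetical.

```python
import numpy as np

def info_nce(za, zb, temperature=0.1):
    """InfoNCE loss between two batches of embeddings (rows are samples).

    Row i of za and row i of zb form a positive pair; every other row of zb
    acts as a negative for row i of za.
    """
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature                      # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def multi_view_loss(views, temperature=0.1):
    """Average symmetric InfoNCE over every unordered pair of views."""
    losses = []
    for i in range(len(views)):
        for j in range(i + 1, len(views)):
            forward = info_nce(views[i], views[j], temperature)
            backward = info_nce(views[j], views[i], temperature)
            losses.append(0.5 * (forward + backward))
    return float(np.mean(losses))

# Synthetic embeddings (hypothetical): four nearly identical views versus
# four unrelated ones, each with 8 samples of dimension 16.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
aligned = [base + 0.01 * rng.normal(size=base.shape) for _ in range(4)]
unrelated = [rng.normal(size=base.shape) for _ in range(4)]
loss_aligned = multi_view_loss(aligned)
loss_unrelated = multi_view_loss(unrelated)
```

Under these assumptions, views whose embeddings agree sample by sample give a much lower loss than unrelated views, which is the behavior an alignment objective is meant to reward.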
If this is right
- The pre-trained model outperforms existing contrastive baselines across linear, few-shot, and transfer evaluations on four public datasets.
- The learned representations yield substantial gains in few-shot regimes and under challenging channel conditions.
- Deployment requires only raw signal inputs after pre-training, lowering computational cost.
- Massive unlabeled radio datasets can be exploited for pre-training without task-specific labels.
Where Pith is reading between the lines
- The same multi-domain view-alignment idea may extend to other time-series signals such as audio or vibration data.
- Adding further equivalent transformations could increase robustness if they remain lossless.
- The separation between pre-training transformations and downstream raw-signal use simplifies integration into existing radio pipelines.
Load-bearing premise
The four transformations preserve all necessary semantic information and generate views that remain consistent enough for their alignment to yield features useful on downstream tasks.
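The information-preserving half of this premise can be probed numerically with round-trip checks. The sketch below is a minimal NumPy example, assuming one simple invertible operation per domain (circular shift, amplitude/phase, FFT, and a non-overlapping rectangular-frame FFT as a stand-in for a time-frequency transform). These choices, the function name `round_trip_errors`, and the synthetic IQ samples are assumptions, not the paper's definitions.

```python
import numpy as np

def round_trip_errors(x, shift=5, frame=16):
    """Max absolute reconstruction error for four invertible views of a
    complex baseband signal x (its length must be a multiple of frame)."""
    errors = {}

    # Time domain: a circular shift is undone by the opposite shift.
    shifted = np.roll(x, shift)
    errors["time"] = np.max(np.abs(np.roll(shifted, -shift) - x))

    # Instantaneous domain: amplitude and phase together rebuild the samples.
    amp, phase = np.abs(x), np.angle(x)
    errors["instantaneous"] = np.max(np.abs(amp * np.exp(1j * phase) - x))

    # Frequency domain: FFT followed by IFFT.
    errors["frequency"] = np.max(np.abs(np.fft.ifft(np.fft.fft(x)) - x))

    # Time-frequency domain: non-overlapping rectangular frames, FFT per frame.
    spec = np.fft.fft(x.reshape(-1, frame), axis=1)
    rebuilt = np.fft.ifft(spec, axis=1).reshape(-1)
    errors["time_frequency"] = np.max(np.abs(rebuilt - x))

    # Deliberately lossy contrast: a magnitude-only spectrum discards phase.
    lossy = np.fft.ifft(np.abs(np.fft.fft(x)))
    errors["magnitude_only"] = np.max(np.abs(lossy - x))
    return errors

rng = np.random.default_rng(0)
x = rng.normal(size=128) + 1j * rng.normal(size=128)  # synthetic IQ samples
errors = round_trip_errors(x)
```

A check like this only shows that the views are numerically invertible. It says nothing about whether aligning them produces features that help downstream, which is the second half of the premise.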
What would settle it
An experiment on one of the public datasets in which the pre-trained model shows no accuracy gain over standard contrastive baselines when evaluated under few-shot or cross-domain conditions.
Original abstract
Robust radio signal recognition is fundamental to spectrum management, electromagnetic space security, and intelligent wireless applications, yet existing deep-learning methods rely heavily on large labeled datasets and struggle to capture the multi-domain characteristics inherent in real-world signals. To address these limitations, we propose an unsupervised equivalent contrastive learning method that leverages four information-lossless equivalent transformations, spanning the time, instantaneous, frequency, and time-frequency domains, to construct multi-view and semantically consistent representations of each signal. An equivalent contrastive learning strategy then aligns these complementary views to learn discriminative and transferable embeddings without requiring labeled data. Once pre-training is completed, the resulting model can be directly fine-tuned on downstream tasks using only raw signal samples, without reapplying any equivalent transformations, which reduces computational overhead and simplifies deployment. Extensive experiments on four public datasets demonstrate that the proposed method consistently outperforms state-of-the-art contrastive baselines under linear evaluation, few-shot semi-supervised learning, and cross-domain transfer settings. Notably, the learned representations yield substantial gains in few-shot regimes and challenging channel conditions, confirming the effectiveness of multi-domain equivalent modeling in enhancing robustness and generalization. This work establishes a principled pathway for exploiting massive unlabeled radio data and provides a foundation for future self-supervised learning frameworks in wireless systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an unsupervised equivalent contrastive learning method for radio signal recognition. It employs four information-lossless equivalent transformations across the time, instantaneous, frequency, and time-frequency domains to construct multi-view and semantically consistent representations of each signal. These views are aligned using an equivalent contrastive learning strategy to learn discriminative and transferable embeddings without labeled data. The pre-trained model can then be fine-tuned on downstream tasks using only raw signals. Experiments on four public datasets show consistent outperformance over state-of-the-art contrastive baselines in linear evaluation, few-shot semi-supervised learning, and cross-domain transfer settings, with notable gains in few-shot regimes and challenging conditions.
Significance. If the core assumptions regarding the transformations hold, this approach could provide a valuable framework for self-supervised learning in wireless communications and spectrum management, enabling effective use of abundant unlabeled radio data to improve robustness and generalization in signal recognition tasks. The multi-domain perspective and the ability to fine-tune without reapplying transformations could simplify deployment in practical systems.
major comments (2)
- [Abstract] Abstract and Methods: The central claim requires that the four domain transformations (time, instantaneous, frequency, time-frequency) are strictly information-lossless and yield semantically consistent views. The manuscript asserts this property but supplies no reconstruction-error bounds, invariance proofs, or ablation showing that alignment fails when losslessness is violated (e.g., due to windowing or phase handling).
- [§4] §4 (Experiments): The reported outperformance on four public datasets lacks ablation studies isolating the contribution of each transformation, error bars across multiple runs, or statistical significance tests. Without these, it is difficult to confirm that gains arise from the multi-domain equivalent modeling rather than implementation choices or baseline variants.
minor comments (2)
- [Abstract] The abstract mentions 'four public datasets' without naming them or providing even summary quantitative gains; adding dataset names and example metrics would improve readability.
- [Methods] Notation for the contrastive loss and the four transformations could be introduced with a single equation or table for clarity, rather than relying solely on textual description.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive and detailed feedback. The comments identify important areas where additional clarification and experimental rigor will strengthen the manuscript. We address each major comment point by point below.
- Referee: [Abstract] Abstract and Methods: The central claim requires that the four domain transformations (time, instantaneous, frequency, time-frequency) are strictly information-lossless and yield semantically consistent views. The manuscript asserts this property but supplies no reconstruction-error bounds, invariance proofs, or ablation showing that alignment fails when losslessness is violated (e.g., due to windowing or phase handling).
  Authors: We appreciate the referee's emphasis on rigorously justifying the lossless property. Each transformation is constructed from standard, invertible signal-processing operations: circular time shifts, direct extraction of instantaneous amplitude/phase from the analytic signal, FFT/IFFT pairs for frequency-domain views, and invertible short-time Fourier transforms for the time-frequency domain. In the revised manuscript we will add a new subsection in §3 that explicitly states these invertibility properties, cites the relevant signal-processing references, and provides a brief argument that no information is discarded. We will also include a targeted ablation that replaces one or more transformations with deliberately lossy approximations (e.g., non-circular shifts or magnitude-only STFT) and shows the resulting degradation in contrastive alignment and downstream accuracy. Full analytic reconstruction-error bounds under arbitrary channel distortions, however, constitute a substantial theoretical extension that lies outside the empirical scope of the present work.
  revision: partial
- Referee: [§4] §4 (Experiments): The reported outperformance on four public datasets lacks ablation studies isolating the contribution of each transformation, error bars across multiple runs, or statistical significance tests. Without these, it is difficult to confirm that gains arise from the multi-domain equivalent modeling rather than implementation choices or baseline variants.
  Authors: We agree that these elements are essential for credible validation. In the revised version we will augment §4 with: (i) systematic ablation tables that successively disable each of the four transformations while keeping all other factors fixed, (ii) mean and standard-deviation results computed over five independent runs with distinct random seeds, and (iii) paired statistical significance tests (t-tests or Wilcoxon signed-rank tests with p-values) comparing our method against each baseline. These additions will appear both in the main text and in an expanded supplementary material.
  revision: yes
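The reporting protocol promised in the second response (mean and standard deviation over five seeds, plus a paired test against each baseline) can be sketched with the Python standard library alone. The accuracy values below are made-up placeholders, not results from the paper, and the helper names `summarize` and `paired_t_statistic` are hypothetical. The sketch stops at the t statistic because the standard library has no Student t distribution; a p-value would need a library such as SciPy (`scipy.stats.ttest_rel` or `scipy.stats.wilcoxon`).

```python
import math
import statistics

def summarize(runs):
    """Mean and sample standard deviation over independent runs."""
    return statistics.mean(runs), statistics.stdev(runs)

def paired_t_statistic(a, b):
    """Paired t statistic for per-seed differences a[i] - b[i].

    Returns (t, degrees_of_freedom).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Placeholder accuracies for five seeds; NOT results from the paper.
proposed = [0.81, 0.83, 0.82, 0.84, 0.80]
baseline = [0.78, 0.79, 0.80, 0.79, 0.77]

mean_p, std_p = summarize(proposed)
mean_b, std_b = summarize(baseline)
t_stat, dof = paired_t_statistic(proposed, baseline)
```

Pairing by seed matters here: the test looks at per-seed differences, so variation shared by both methods within a seed cancels out. With five runs there are four degrees of freedom, for which the two-sided 5% critical value of the t distribution is about 2.776.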
Circularity Check
No significant circularity detected
The paper defines a contrastive learning method that applies four domain-specific transformations (time, instantaneous, frequency, time-frequency) asserted to be information-lossless, then aligns the resulting views to produce embeddings. No equations, derivations, or self-referential definitions appear that reduce the claimed embeddings, transferability, or experimental gains to fitted parameters or inputs by construction. The approach extends standard contrastive frameworks with new transformations whose independence from the target downstream performance is preserved; results are supported by experiments on external public datasets rather than by tautological reduction. The losslessness assertion is an unproven modeling assumption, not a circular step in any derivation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Equivalent transformations across domains preserve the semantic identity of radio signals.
- domain assumption: Aligning embeddings of equivalent views produces transferable, discriminative features.