
arxiv: 2604.10906 · v1 · submitted 2026-04-13 · 📡 eess.SP

Unsupervised Equivalent Contrastive Learning for Radio Signal Recognition

Pith reviewed 2026-05-10 16:23 UTC · model grok-4.3

classification 📡 eess.SP
keywords: radio signal recognition, contrastive learning, unsupervised learning, self-supervised learning, multi-domain transformations, signal embeddings

The pith

Unsupervised contrastive learning aligns four domain-specific transformations to produce transferable embeddings for radio signal recognition from unlabeled data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a pre-training approach for radio signals that avoids labeled data by generating multiple views of each signal through four equivalent transformations spanning the time, instantaneous, frequency, and time-frequency domains. These views are aligned via contrastive learning to extract discriminative features that remain effective when the model is later applied directly to raw signals. The resulting embeddings support downstream tasks such as recognition under linear evaluation, few-shot semi-supervised settings, and cross-domain transfer, all without re-applying the transformations at deployment time. This setup matters because real radio environments produce vast amounts of unlabeled data while labeling remains costly and existing supervised methods often fail to generalize across varying channel conditions.

Core claim

Four information-lossless transformations create multi-view, semantically consistent representations of each radio signal; an equivalent contrastive strategy then aligns these complementary views to learn embeddings that transfer directly to downstream recognition tasks on raw samples without further transformation.
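As a rough illustration of what "four views of one signal" could look like, the sketch below builds one view per domain from a complex baseband (I/Q) signal with NumPy. This page does not give the paper's exact definitions, so everything here is an assumption: the function name `make_views`, the `frame_len` parameter, and the use of non-overlapping rectangular frames as a stand-in for the time-frequency transform are all hypothetical choices.

```python
import numpy as np


def make_views(x, frame_len=32):
    """Build four illustrative views of a complex baseband (I/Q) signal.

    The concrete transforms are assumptions for illustration only;
    they are not taken from the paper.
    """
    x = np.asarray(x, dtype=np.complex128)
    if x.ndim != 1 or x.size % frame_len != 0:
        raise ValueError("x must be 1-D with length divisible by frame_len")

    spectrum = np.fft.fft(x)           # frequency domain
    frames = x.reshape(-1, frame_len)  # non-overlapping frames
    return {
        # Time domain: in-phase and quadrature components.
        "time": np.stack([x.real, x.imag]),
        # Instantaneous domain: amplitude and phase per sample.
        "instantaneous": np.stack([np.abs(x), np.angle(x)]),
        # Frequency domain: real and imaginary parts of the DFT.
        "frequency": np.stack([spectrum.real, spectrum.imag]),
        # Time-frequency domain: one DFT per frame (complex-valued).
        "time_frequency": np.fft.fft(frames, axis=1),
    }


# Synthetic stand-in for a received signal (not real radio data).
rng = np.random.default_rng(0)
x = rng.standard_normal(256) + 1j * rng.standard_normal(256)
views = make_views(x)
for name, view in views.items():
    print(name, view.shape)
```

Each view keeps both components of the complex signal (real and imaginary, or amplitude and phase), which is what makes it plausible for a view to be lossless; dropping either component would discard information.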

What carries the argument

The equivalent contrastive learning strategy that aligns the four domain-specific views to produce discriminative and transferable embeddings.
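The page does not state the paper's loss function. As a hedged sketch of how aligning several views can work in general, the code below applies a standard symmetric InfoNCE loss to every pair of views, treating embeddings of the same signal as positives and all other signals in the batch as negatives. The names `info_nce` and `multi_view_loss`, the temperature of 0.1, and the random embeddings are assumptions for illustration, not the authors' "equivalent contrastive" strategy.

```python
import numpy as np

VIEW_NAMES = ("time", "instantaneous", "frequency", "time_frequency")


def info_nce(za, zb, temperature=0.1):
    """One-directional InfoNCE: row i of za should match row i of zb."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature                      # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def multi_view_loss(embeddings, temperature=0.1):
    """Average the symmetric InfoNCE loss over all pairs of views."""
    names = list(embeddings)
    losses = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            forward = info_nce(embeddings[a], embeddings[b], temperature)
            backward = info_nce(embeddings[b], embeddings[a], temperature)
            losses.append(0.5 * (forward + backward))
    return float(np.mean(losses))


# Synthetic embeddings: 8 signals, 16 dimensions per view.
rng = np.random.default_rng(0)
base = rng.standard_normal((8, 16))
aligned = {n: base + 0.01 * rng.standard_normal((8, 16)) for n in VIEW_NAMES}
unaligned = {n: rng.standard_normal((8, 16)) for n in VIEW_NAMES}

aligned_loss = multi_view_loss(aligned)
unaligned_loss = multi_view_loss(unaligned)
print(f"aligned views:   {aligned_loss:.4f}")
print(f"unaligned views: {unaligned_loss:.4f}")
```

When the views of each signal agree, the loss is small; when they are unrelated, it sits near the chance level for the batch size. In a real pipeline the embeddings would come from trained encoders rather than random arrays.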

If this is right

  • The pre-trained model outperforms existing contrastive baselines across linear, few-shot, and transfer evaluations on four public datasets.
  • Representations improve notably in few-shot regimes and under challenging channel conditions.
  • Deployment requires only raw signal inputs after pre-training, lowering computational cost.
  • Massive unlabeled radio datasets can be exploited for pre-training without task-specific labels.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same multi-domain view-alignment idea may extend to other time-series signals such as audio or vibration data.
  • Adding further equivalent transformations could increase robustness if they remain lossless.
  • The separation between pre-training transformations and downstream raw-signal use simplifies integration into existing radio pipelines.

Load-bearing premise

The four transformations preserve all necessary semantic information and generate views that remain consistent enough for their alignment to yield features useful on downstream tasks.
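One way to probe the information-preservation half of this premise numerically is a round-trip check: apply a transformation, invert it, and measure the reconstruction error. The sketch below does this for four standard invertible operations and contrasts them with a deliberately lossy magnitude-only variant. The specific operations, the frame length, and the synthetic test signal are assumptions; they stand in for the paper's transformations, which this page does not specify in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n, frame_len = 256, 32
# Synthetic complex baseband signal (not real radio data).
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)


def max_err(x_hat):
    """Largest absolute deviation between a reconstruction and x."""
    return float(np.max(np.abs(x_hat - x)))


# Circular time shift, undone by the opposite shift.
err_shift = max_err(np.roll(np.roll(x, 17), -17))

# Instantaneous amplitude and phase, undone by the polar form.
err_polar = max_err(np.abs(x) * np.exp(1j * np.angle(x)))

# FFT, undone by the inverse FFT.
err_fft = max_err(np.fft.ifft(np.fft.fft(x)))

# Framed DFT (non-overlapping rectangular frames), undone frame by frame.
frames = np.fft.fft(x.reshape(-1, frame_len), axis=1)
err_tf = max_err(np.fft.ifft(frames, axis=1).reshape(-1))

# Lossy counterexample: discard the phase of every time-frequency bin.
err_lossy = max_err(np.fft.ifft(np.abs(frames), axis=1).reshape(-1))

for name, err in [("circular shift", err_shift), ("amplitude/phase", err_polar),
                  ("FFT", err_fft), ("framed DFT", err_tf),
                  ("magnitude-only framed DFT", err_lossy)]:
    print(f"{name:28s} max reconstruction error = {err:.3e}")
```

The invertible operations should reconstruct the signal to within floating-point rounding, while the magnitude-only variant cannot. A check like this addresses only information preservation; whether the views stay semantically consistent for a learned encoder is a separate, empirical question.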

What would settle it

An experiment on one of the public datasets in which the pre-trained model shows no accuracy gain over standard contrastive baselines when evaluated under few-shot or cross-domain conditions.

Figures

Figures reproduced from arXiv: 2604.10906 by Jie Chen, Luxin Zhang, Shilian Zheng, Xiaoniu Yang.

Figure 1: The overall architecture of our proposed method.
Figure 2: The performance of linear evaluation at different SNR level for dataset
Figure 3: The performance of semi-supervised learning in few-shot scenarios for dataset HKDD

Original abstract

Robust radio signal recognition is fundamental to spectrum management, electromagnetic space security, and intelligent wireless applications, yet existing deep-learning methods rely heavily on large labeled datasets and struggle to capture the multi-domain characteristics inherent in real-world signals. To address these limitations, we propose an unsupervised equivalent contrastive learning method that leverages four information-lossless equivalent transformations, spanning the time, instantaneous, frequency, and time-frequency domains, to construct multi-view and semantically consistent representations of each signal. An equivalent contrastive learning strategy then aligns these complementary views to learn discriminative and transferable embeddings without requiring labeled data. Once pre-training is completed, the resulting model can be directly fine-tuned on downstream tasks using only raw signal samples, without reapplying any equivalent transformations, which reduces computational overhead and simplifies deployment. Extensive experiments on four public datasets demonstrate that the proposed method consistently outperforms state-of-the-art contrastive baselines under linear evaluation, few-shot semi-supervised learning, and cross-domain transfer settings. Notably, the learned representations yield substantial gains in few-shot regimes and challenging channel conditions, confirming the effectiveness of multi-domain equivalent modeling in enhancing robustness and generalization. This work establishes a principled pathway for exploiting massive unlabeled radio data and provides a foundation for future self-supervised learning frameworks in wireless systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an unsupervised equivalent contrastive learning method for radio signal recognition. It employs four information-lossless equivalent transformations across the time, instantaneous, frequency, and time-frequency domains to construct multi-view and semantically consistent representations of each signal. These views are aligned using an equivalent contrastive learning strategy to learn discriminative and transferable embeddings without labeled data. The pre-trained model can then be fine-tuned on downstream tasks using only raw signals. Experiments on four public datasets show consistent outperformance over state-of-the-art contrastive baselines in linear evaluation, few-shot semi-supervised learning, and cross-domain transfer settings, with notable gains in few-shot regimes and challenging conditions.

Significance. If the core assumptions regarding the transformations hold, this approach could provide a valuable framework for self-supervised learning in wireless communications and spectrum management, enabling effective use of abundant unlabeled radio data to improve robustness and generalization in signal recognition tasks. The multi-domain perspective and the ability to fine-tune without reapplying transformations could simplify deployment in practical systems.

major comments (2)
  1. [Abstract] Abstract and Methods: The central claim requires that the four domain transformations (time, instantaneous, frequency, time-frequency) are strictly information-lossless and yield semantically consistent views. The manuscript asserts this property but supplies no reconstruction-error bounds, invariance proofs, or ablation showing that alignment fails when losslessness is violated (e.g., due to windowing or phase handling).
  2. [§4] §4 (Experiments): The reported outperformance on four public datasets lacks ablation studies isolating the contribution of each transformation, error bars across multiple runs, or statistical significance tests. Without these, it is difficult to confirm that gains arise from the multi-domain equivalent modeling rather than implementation choices or baseline variants.
minor comments (2)
  1. [Abstract] The abstract mentions 'four public datasets' without naming them or providing even summary quantitative gains; adding dataset names and example metrics would improve readability.
  2. [Methods] Notation for the contrastive loss and the four transformations could be introduced with a single equation or table for clarity, rather than relying solely on textual description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback. The comments identify important areas where additional clarification and experimental rigor will strengthen the manuscript. We address each major comment point by point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract and Methods: The central claim requires that the four domain transformations (time, instantaneous, frequency, time-frequency) are strictly information-lossless and yield semantically consistent views. The manuscript asserts this property but supplies no reconstruction-error bounds, invariance proofs, or ablation showing that alignment fails when losslessness is violated (e.g., due to windowing or phase handling).

    Authors: We appreciate the referee's emphasis on rigorously justifying the lossless property. Each transformation is constructed from standard, invertible signal-processing operations: circular time shifts, direct extraction of instantaneous amplitude/phase from the analytic signal, FFT/IFFT pairs for frequency-domain views, and invertible short-time Fourier transforms for the time-frequency domain. In the revised manuscript we will add a new subsection in §3 that explicitly states these invertibility properties, cites the relevant signal-processing references, and provides a brief argument that no information is discarded. We will also include a targeted ablation that replaces one or more transformations with deliberately lossy approximations (e.g., non-circular shifts or magnitude-only STFT) and shows the resulting degradation in contrastive alignment and downstream accuracy. Full analytic reconstruction-error bounds under arbitrary channel distortions, however, constitute a substantial theoretical extension that lies outside the empirical scope of the present work. revision: partial

  2. Referee: [§4] §4 (Experiments): The reported outperformance on four public datasets lacks ablation studies isolating the contribution of each transformation, error bars across multiple runs, or statistical significance tests. Without these, it is difficult to confirm that gains arise from the multi-domain equivalent modeling rather than implementation choices or baseline variants.

    Authors: We agree that these elements are essential for credible validation. In the revised version we will augment §4 with: (i) systematic ablation tables that successively disable each of the four transformations while keeping all other factors fixed, (ii) mean and standard-deviation results computed over five independent runs with distinct random seeds, and (iii) paired statistical significance tests (t-tests or Wilcoxon signed-rank tests with p-values) comparing our method against each baseline. These additions will appear both in the main text and in an expanded supplementary material. revision: yes
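The reporting protocol promised in the second response (mean and standard deviation over five seeds, plus paired significance tests) can be sketched with NumPy and SciPy as follows. The accuracy values are made-up placeholders chosen only to make the example run; they are not results from the paper, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Placeholder accuracies for five seeds (illustrative, NOT from the paper).
# Entry i of each array is assumed to share the same seed and data split.
ours = np.array([0.912, 0.905, 0.918, 0.909, 0.915])
baseline = np.array([0.897, 0.893, 0.904, 0.890, 0.898])

# Mean and sample standard deviation (ddof=1) across runs.
ours_mean, ours_std = ours.mean(), ours.std(ddof=1)
base_mean, base_std = baseline.mean(), baseline.std(ddof=1)
print(f"ours:     {ours_mean:.4f} +/- {ours_std:.4f}")
print(f"baseline: {base_mean:.4f} +/- {base_std:.4f}")

# Paired t-test: assumes the per-seed differences are roughly normal.
t_res = stats.ttest_rel(ours, baseline)
print(f"paired t-test: t = {t_res.statistic:.2f}, p = {t_res.pvalue:.4g}")

# Wilcoxon signed-rank test: non-parametric alternative on the same pairs.
w_res = stats.wilcoxon(ours, baseline)
print(f"Wilcoxon:      W = {w_res.statistic:.1f}, p = {w_res.pvalue:.4g}")
```

One design point worth noting: with only five paired runs, an exact two-sided Wilcoxon signed-rank test cannot produce a p-value below 2/2^5 = 0.0625, so it can never cross a 0.05 threshold regardless of how large the gains are. More seeds, or the paired t-test under its normality assumption, would be needed for that.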

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper defines a contrastive learning method that applies four domain-specific transformations (time, instantaneous, frequency, time-frequency) asserted to be information-lossless, then aligns the resulting views to produce embeddings. No equations, derivations, or self-referential definitions appear that reduce the claimed embeddings, transferability, or experimental gains to fitted parameters or inputs by construction. The approach extends standard contrastive frameworks with new transformations whose independence from the target downstream performance is preserved; results are supported by experiments on external public datasets rather than by tautological reduction. The losslessness assertion is an unproven modeling assumption, not a circular step in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unproven premise that the chosen transformations preserve all information and semantic equivalence for radio signals; standard contrastive loss assumptions are invoked without new axioms.

axioms (2)
  • domain assumption: Equivalent transformations across domains preserve semantic identity of radio signals
    Invoked in the construction of multi-view representations from unlabeled signals.
  • domain assumption: Aligning embeddings of equivalent views produces transferable discriminative features
    Core of the equivalent contrastive learning strategy.

pith-pipeline@v0.9.0 · 5520 in / 1286 out tokens · 25530 ms · 2026-05-10T16:23:06.585620+00:00 · methodology

