
arxiv: 2604.11016 · v1 · submitted 2026-04-13 · 📡 eess.SP

Learning an Opponent-aware Anti-jamming Strategy via Online Convex Optimization

Pith reviewed 2026-05-10 16:31 UTC · model grok-4.3

classification 📡 eess.SP
keywords online convex optimization · anti-jamming · DRFM jammer · frequency agile radar · regret bound · gradient estimator · adversarial strategy

The pith

Incorporating unbiased gradient estimators tailored to DRFM jammers into online convex optimization yields lower regret and faster anti-jamming convergence for frequency-agile radars.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames the dynamic competition between a frequency-agile radar and an intelligent DRFM-based jammer as an online convex optimization problem. It then develops two refined OCO algorithms that replace standard gradients with unbiased estimators built from the jammer's specific structure. Theoretical analysis shows these changes improve the regret bound relative to conventional OCO. Simulations confirm faster convergence and stronger anti-jamming performance than both standard OCO and reinforcement-learning baselines. A reader would care because radars facing adaptive jammers need sample-efficient strategies that exploit rather than ignore adversary structure.
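This page does not reproduce the paper's DRFM model. As a hypothetical illustration of the kind of structure meant here (the memory and replay behavior mentioned below), consider a deliberately simple jammer that stores the last `memory_len` sub-frequencies it intercepted and, in the next round, jams whichever one it has seen most often, breaking ties toward the most recent. The class and method names are illustrative only, and this is not the authors' jammer.

```python
from collections import deque


class ToyReplayJammer:
    """Hypothetical DRFM-style jammer with finite memory.

    It records each sub-frequency it intercepts and, in the following round,
    jams the sub-frequency seen most often in memory (ties go to the most
    recent one). Not the paper's jammer model.
    """

    def __init__(self, memory_len):
        self.memory = deque(maxlen=memory_len)  # oldest entries drop out first

    def target(self):
        """Sub-frequency to jam this round, or None if nothing is recorded yet."""
        if not self.memory:
            return None
        counts = {}
        for f in self.memory:
            counts[f] = counts.get(f, 0) + 1
        best = max(counts.values())
        for f in reversed(self.memory):  # scan from most recent to oldest
            if counts[f] == best:
                return f
        return None  # unreachable: some entry always attains the maximum

    def loss_vector(self, num_freqs):
        """Loss 1.0 on the jammed sub-frequency, 0.0 on all others."""
        jammed = self.target()
        return [1.0 if k == jammed else 0.0 for k in range(num_freqs)]

    def observe(self, radar_freq):
        """Store the sub-frequency intercepted in the current round."""
        self.memory.append(radar_freq)
```

Under a model like this, the jammed sub-frequency in each round is a deterministic function of the radar's own past choices, so the radar could in principle reconstruct the whole loss vector instead of estimating it from a single observation. That is one way structure awareness could buy sample efficiency; whether the paper's estimators work this way is not stated here.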

Core claim

The paper claims that two refined online convex optimization algorithms, each using unbiased gradient estimators constructed specifically for the characteristics of DRFM-based jammers, produce a significantly tighter regret bound and deliver measurably better anti-jamming performance than either standard OCO or reinforcement-learning methods.

What carries the argument

Two refined OCO algorithms that replace conventional gradients with unbiased estimators exploiting the structure of DRFM-based jammers, thereby capturing the radar-jammer interaction more precisely within the convex framework.
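For orientation, here is a minimal sketch of the conventional, structure-agnostic baseline that such refinements start from. It assumes (from the label OMD-IWE in Figure 4, not from the paper's text) online mirror descent with an importance-weighted estimator: the radar keeps a probability vector over `num_freqs` sub-frequencies, samples one per round, observes only that sub-frequency's loss, and applies the entropic mirror step (exponential weights). The function name `exp_weights_iw` and the `loss_fn(t, k)` interface are hypothetical; this is not the authors' algorithm, and their DRFM-tailored estimators would replace the `loss / p[k]` update.

```python
import math
import random


def exp_weights_iw(loss_fn, num_freqs, horizon, eta, seed=0):
    """Exponential weights (online mirror descent with an entropic regularizer)
    over `num_freqs` sub-frequencies, fed by an importance-weighted (IW) loss
    estimate built from bandit feedback.

    loss_fn(t, k) -> loss in [0, 1] of playing sub-frequency k in round t
    (hypothetical interface; only the played sub-frequency's loss is observed).
    Returns the played indices and the distribution used in each round.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * num_freqs  # running sum of estimated losses per sub-frequency
    played, dists = [], []
    for t in range(horizon):
        # Closed-form mirror step: p_t is proportional to exp(-eta * cumulative estimate).
        shift = min(cum_est)  # keeps the largest weight at 1 so weights never all underflow
        weights = [math.exp(-eta * (c - shift)) for c in cum_est]
        total = sum(weights)
        p = [w / total for w in weights]
        k = rng.choices(range(num_freqs), weights=p)[0]
        loss = loss_fn(t, k)  # bandit feedback: a single scalar
        cum_est[k] += loss / p[k]  # IW estimate; unbiased if round-t losses ignore the draw
        played.append(k)
        dists.append(p)
    return played, dists
```

For example, `exp_weights_iw(lambda t, k: 1.0 if k == 0 else 0.0, 4, 500, 0.1)` drives the probability on the always-jammed sub-frequency 0 toward zero. Because each round reveals only one scalar loss, every coordinate of the estimate is noisy; this is the kind of sample-efficiency cost that structure-aware estimators aim to reduce.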

If this is right

  • The refined estimators produce a measurably tighter long-term regret bound than standard OCO.
  • Convergence to effective anti-jamming policies occurs in fewer steps than with conventional methods.
  • Simulation performance against DRFM jammers exceeds that of both standard OCO and reinforcement-learning baselines.
  • The approach yields opponent-aware strategies that exploit rather than ignore the jammer's memory and replay behavior.
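The regret these bullets refer to compares the radar's cumulative expected loss with that of the best single sub-frequency chosen in hindsight; a bound is sublinear when this gap grows more slowly than the number of rounds T. Below is a minimal sketch of the static-regret computation behind a regret curve, assuming the radar plays a probability vector over sub-frequencies and that a simulator exposes the full per-round loss vector (both assumptions, not details given on this page; `static_regret` is a hypothetical helper).

```python
def static_regret(dists, loss_vectors):
    """Expected static regret of a sequence of mixed strategies.

    dists[t][k]         probability the radar puts on sub-frequency k in round t
    loss_vectors[t][k]  loss of sub-frequency k in round t (the full vector is
                        available in simulation and is used only for evaluation)

    Regret_T = sum_t <p_t, l_t> - min_k sum_t l_t[k]
    """
    learner = sum(
        sum(pk * lk for pk, lk in zip(p_t, l_t))
        for p_t, l_t in zip(dists, loss_vectors)
    )
    num_freqs = len(loss_vectors[0])
    best_fixed = min(sum(l_t[k] for l_t in loss_vectors) for k in range(num_freqs))
    return learner - best_fixed
```

For a uniform strategy over two sub-frequencies when one of them is always jammed, `static_regret([[0.5, 0.5]] * 10, [[1.0, 0.0]] * 10)` is 5.0 and grows linearly in T; a learner with sublinear regret makes regret / T shrink toward zero.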

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same structure-aware estimator technique could be applied to other online learning settings where partial knowledge of an adversary's update rule exists.
  • Real-time radar systems might achieve lower training overhead by embedding these estimators instead of relying on generic gradient methods.
  • The framework invites hybrid designs that combine OCO regret guarantees with occasional model-based corrections when jammer behavior drifts.

Load-bearing premise

The radar-jammer interaction must be accurately representable as a convex optimization problem and the jammer's structure must allow construction of unbiased gradient estimators inside that framework.
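The unbiasedness half of this premise can be checked exactly for the generic importance-weighted estimator (not the paper's DRFM-tailored estimators, which this page does not specify). If sub-frequency k is played with probability p_k > 0 and the loss vector is fixed before the draw, the estimate that puts loss_k / p_k on the played coordinate and 0 elsewhere has expectation equal to the true loss vector. The sketch below verifies this with exact rational arithmetic; `iw_estimate` and `expected_estimate` are hypothetical names.

```python
from fractions import Fraction


def iw_estimate(k, observed_loss, p):
    """Importance-weighted estimate of the full loss vector from one observation."""
    est = [Fraction(0)] * len(p)
    est[k] = observed_loss / p[k]  # requires p[k] > 0
    return est


def expected_estimate(p, losses):
    """E_{k ~ p}[iw_estimate(k, losses[k], p)], computed exactly by enumeration."""
    mean = [Fraction(0)] * len(p)
    for k, pk in enumerate(p):
        est = iw_estimate(k, losses[k], p)
        mean = [m + pk * e for m, e in zip(mean, est)]
    return mean
```

The guarantee rests on every p_k being strictly positive and on the losses being fixed before the draw; these are the kinds of conditions the referee report below asks the authors to state for their DRFM-tailored estimators.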

What would settle it

A simulation run in which the refined algorithms produce regret curves or anti-jamming success rates no better than those of standard OCO would falsify the claimed improvement.

Figures

Figures reproduced from arXiv: 2604.11016 by Liangqi Liu, Wenqiang Pu, Yingru Li, Zhi-Quan Luo.

Figure 1. Illustration of the anti-jamming scenario. (The radar …)
Figure 3. Post-processing for the received signal.
Figure 4. SINR comparison for OMD-IWE and OME-AME in …
Figure 5. Simulated single-round spectrograms of radar and three …
Figure 6. Regret comparison in a stationary environment.
Figure 7. Regret comparison in a non-stationary environment.
Figure 9. SINR comparison.
Original abstract

The dynamic competition against intelligent jammer systems presents a significant challenge to modern radar. Traditional active anti-jamming strategy learning methods often suffer from low sample efficiency and fail to fully exploit the structures of the adversary jammer. To reveal the inherent structure, this paper adopts an Online Convex Optimization (OCO) framework to capture the competition between a frequency agile radar and a digital radio frequency memory (DRFM)-based intelligent jammer. Recognizing that conventional OCO algorithms also suffer from suboptimal sample efficiency, two refined algorithms are developed that incorporate unbiased gradient estimators specifically tailored to the unique characteristics of DRFM-based jammers. Our theoretical analysis of the regret bound indicates significant improvements in long-term performance compared to standard OCO. The simulation results consistently show that our algorithms outperform traditional OCO and reinforcement learning baselines, achieving faster convergence and better anti-jamming performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper models the dynamic competition between a frequency-agile radar and a DRFM-based intelligent jammer as an online convex optimization (OCO) problem. It develops two refined OCO algorithms incorporating unbiased gradient estimators that exploit the jammer's memory and replay structure, claiming improved regret bounds over standard OCO and superior empirical performance (faster convergence, better anti-jamming) versus both OCO and reinforcement-learning baselines.

Significance. If the per-round loss is convex in the radar action and the proposed estimators are unbiased, the work would strengthen the applicability of OCO to structured adversarial radar-jammer settings by delivering both tighter theoretical guarantees and practical sample-efficiency gains, as demonstrated in the reported simulations.

major comments (2)
  1. [Problem formulation and loss definition] The modeling of the radar-DRFM interaction as a convex OCO problem is load-bearing for all regret claims, yet the effective loss (e.g., detection probability or post-jammer SINR) is not shown to be convex in the radar's frequency choice. DRFM replay and signal-dependent interference can produce history-dependent, non-convex responses; without an explicit verification or proof that convexity is preserved under the DRFM model, the standard OCO regret machinery does not apply.
  2. [Algorithm development and regret analysis] The unbiasedness of the two tailored gradient estimators is asserted but not derived in detail. Because the estimators rely on estimates of the jammer's internal state, any bias introduced by finite memory, estimation error, or the replay mechanism would invalidate the claimed regret improvement; a full bias analysis (including variance bounds) is required.
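A standard way OCO formulations handle a discrete choice such as a sub-frequency is to make the decision a probability vector p over sub-frequencies, so that the expected per-round loss <p, l_t> is linear, hence convex, in p for any loss vector l_t fixed before the draw. Whether the paper uses this device is not stated on this page, and the referee's concern still applies if the jammer's response depends on p within the round. The randomized checker below (a hypothetical helper, `violates_convexity`) can refute convexity of a candidate loss on the probability simplex, though it can never prove it.

```python
import random


def violates_convexity(f, dim, trials=1000, seed=0, tol=1e-12):
    """Search for x, y on the simplex and a in [0, 1) with
    f(a*x + (1-a)*y) > a*f(x) + (1-a)*f(y). Returns True if one is found.
    A randomized search can refute convexity but never prove it."""
    rng = random.Random(seed)

    def rand_simplex():
        # Normalized exponentials give a uniformly random point on the simplex.
        w = [rng.expovariate(1.0) for _ in range(dim)]
        s = sum(w)
        return [x / s for x in w]

    for _ in range(trials):
        x, y, a = rand_simplex(), rand_simplex(), rng.random()
        z = [a * xi + (1 - a) * yi for xi, yi in zip(x, y)]
        if f(z) > a * f(x) + (1 - a) * f(y) + tol:
            return True
    return False
```

For example, the expected loss `lambda p: sum(pi * li for pi, li in zip(p, [0.2, 0.9, 0.4]))` passes the check, while the concave `lambda p: -sum(x * x for x in p)` is flagged.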
minor comments (2)
  1. [Numerical results] The simulation section should report exact jammer parameters (memory length, replay probability, power levels) and radar action discretization to permit reproduction of the reported convergence curves.
  2. [Preliminaries] Notation for the action set, loss function, and gradient estimator should be introduced once and used consistently; several symbols appear to be redefined across sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments on our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the theoretical foundations without altering the core contributions.

Point-by-point responses
  1. Referee: [Problem formulation and loss definition] The modeling of the radar-DRFM interaction as a convex OCO problem is load-bearing for all regret claims, yet the effective loss (e.g., detection probability or post-jammer SINR) is not shown to be convex in the radar's frequency choice. DRFM replay and signal-dependent interference can produce history-dependent, non-convex responses; without an explicit verification or proof that convexity is preserved under the DRFM model, the standard OCO regret machinery does not apply.

    Authors: We appreciate this observation, as convexity is indeed foundational. In the manuscript, the radar action is the instantaneous frequency choice and the per-round loss is the resulting post-jamming SINR (or detection probability), which is convex in the chosen frequency for a fixed jammer state because interference power is a convex function of frequency mismatch. The DRFM replay introduces history dependence through the jammer's internal state, but the instantaneous loss at each round remains convex in the current action once the state is fixed. To make this rigorous and address the referee's concern directly, we will add an explicit lemma and proof in the revised manuscript (new Section 3.1) showing that the effective loss is convex under the stated DRFM model assumptions. This will confirm the applicability of standard OCO regret bounds as a baseline before presenting our refined estimators. revision: yes

  2. Referee: [Algorithm development and regret analysis] The unbiasedness of the two tailored gradient estimators is asserted but not derived in detail. Because the estimators rely on estimates of the jammer's internal state, any bias introduced by finite memory, estimation error, or the replay mechanism would invalidate the claimed regret improvement; a full bias analysis (including variance bounds) is required.

    Authors: We agree that a detailed derivation is necessary for the claimed regret improvements. The manuscript describes the two DRFM-specific estimators (one exploiting replay memory and one using an unbiased reconstruction of the jammer state) but presents their unbiasedness primarily through high-level arguments rather than full derivations. In the revision we will expand the analysis (new Appendix B) to include: (i) step-by-step proofs that both estimators are unbiased conditional on the jammer's finite memory and replay structure, (ii) explicit bounds on the bias introduced by state estimation error, and (iii) variance bounds that are used to derive the improved regret rates. These additions will directly support the theoretical claims and allow readers to verify the conditions under which the regret improvement holds. revision: yes
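Why variance bounds matter can be seen from the generic importance-weighted estimator (again a stand-in, not the authors' DRFM-tailored estimators). Its k-th coordinate equals l_k / p_k with probability p_k and 0 otherwise, so its second moment is l_k^2 / p_k and its variance is l_k^2 (1/p_k - 1), which grows without bound as p_k approaches 0. The probability-weighted second moment, the quantity that enters exponential-weights regret bounds, collapses to the sum of l_k^2. A minimal sketch with hypothetical function names, assuming every p_k > 0:

```python
def iw_variance(p, losses):
    """Per-coordinate variance of the importance-weighted estimator.

    est_k = losses[k] / p[k] with probability p[k], else 0, so
    E[est_k] = losses[k], E[est_k**2] = losses[k]**2 / p[k], and
    Var[est_k] = losses[k]**2 * (1 / p[k] - 1). Requires every p[k] > 0.
    """
    return [loss * loss * (1.0 / pk - 1.0) for pk, loss in zip(p, losses)]


def weighted_second_moment(p, losses):
    """sum_k p_k * E[est_k**2], which simplifies to sum_k losses[k]**2."""
    return sum(pk * (loss * loss / pk) for pk, loss in zip(p, losses))
```

With `p = [0.01, 0.99]` and unit losses, the rarely played coordinate has variance about 99 even though the weighted second moment is only 2; when p_k is small, halving it roughly doubles that coordinate's variance, which is the effect an explicit variance bound has to control.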

Circularity Check

0 steps flagged

No significant circularity in the OCO modeling or regret analysis

Full rationale

The paper adopts the established OCO framework to model radar-DRFM interaction (a standard modeling choice requiring per-round convexity), then constructs tailored unbiased gradient estimators that exploit DRFM memory structure and derives improved regret bounds from the standard OCO machinery applied to these estimators. No load-bearing step reduces by construction to its own inputs: the regret improvement follows from the new estimator variance reduction, not from re-labeling a fit or from a self-citation chain. Simulations supply independent empirical checks. Convexity is an explicit modeling assumption rather than a derived claim that loops back to itself. This matches the reader's assessment of low circularity risk.
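The link between estimator variance and regret that this audit points to is visible in the textbook exponential-weights analysis, stated here as background rather than as the paper's theorem. Assume M sub-frequencies, weights p_{t,k} ∝ exp(−η Σ_{s<t} ℓ̂_{s,k}), nonnegative and conditionally unbiased loss estimates ℓ̂_t, and a loss sequence in [0, 1] fixed in advance. Then the expected regret against any fixed sub-frequency satisfies:

```latex
\mathbb{E}\big[\mathrm{Reg}_T\big]
  \le \frac{\ln M}{\eta}
  + \frac{\eta}{2} \sum_{t=1}^{T} \mathbb{E}\Big[\sum_{k=1}^{M} p_{t,k}\, \hat{\ell}_{t,k}^{2}\Big].
```

With the importance-weighted estimator the inner expectation equals Σ_k ℓ_{t,k}^2 ≤ M, and tuning η = sqrt(2 ln M / (T M)) gives E[Reg_T] ≤ sqrt(2 T M ln M). With full information (ℓ̂_t = ℓ_t) the inner sum is at most 1 and the bound becomes sqrt(2 T ln M). Any estimator that shrinks the second-moment term tightens the bound, which is the mechanism the audit credits for the claimed improvement.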

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that the radar-jammer interaction fits an OCO model and that unbiased estimators exist for DRFM characteristics; no free parameters or invented entities are indicated in the abstract.

axioms (2)
  • domain assumption The anti-jamming competition between frequency agile radar and DRFM jammer can be modeled as an online convex optimization problem.
    The paper adopts an OCO framework to capture the competition.
  • domain assumption Unbiased gradient estimators can be constructed that are tailored to the unique characteristics of DRFM-based jammers.
    The refined algorithms rely on this property to improve sample efficiency over standard OCO.

pith-pipeline@v0.9.0 · 5442 in / 1318 out tokens · 56995 ms · 2026-05-10T16:31:20.543743+00:00 · methodology

