
arxiv: 2602.03856 · v2 · submitted 2026-01-23 · 📡 eess.SP · cs.LG

The Turing Synthetic Radar Dataset: A dataset for pulse deinterleaving

Pith reviewed 2026-05-16 11:49 UTC · model grok-4.3

classification 📡 eess.SP cs.LG
keywords synthetic dataset · radar pulse deinterleaving · pulse trains · electronic warfare · signal intelligence · emitter clustering · V-measure · benchmark

The pith

A large synthetic radar pulse dataset enables deinterleaving research with realistic multi-emitter overlaps.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Turing Synthetic Radar Dataset as a public resource to separate interleaved pulses from multiple unknown radar emitters in electronic warfare and signal intelligence settings. It contains 6000 pulse trains across two receiver configurations, totaling nearly 3 billion pulses, with scenarios that include up to 110 emitters and extensive overlaps in their parameters. Real labeled data of this kind is scarce, so the synthetic collection is positioned to act as a benchmark while also supporting development of new clustering-based deinterleaving methods. An accompanying challenge requires models to assign pulses to emitters by maximizing the V-measure and similar metrics.

Core claim

The Turing Synthetic Radar Dataset is one of the first publicly available, comprehensively simulated pulse train datasets. It contains 6000 pulse trains totaling almost 3 billion pulses and features realistic scenarios with up to 110 emitters and significant parameter space overlap, so that it can serve both as a benchmark for radar pulse deinterleaving research and as an enabler of new methods in the electronic warfare community.

What carries the argument

The Turing Synthetic Radar Dataset of simulated interleaved pulse trains, which supplies the raw sequences and parameter overlaps needed for clustering pulses back to their originating emitters.
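
To make the idea of an interleaved pulse train concrete, here is a minimal toy sketch in Python. It is not the TSRD generator: the `Pulse` fields, the fixed-PRI emitter model, the jitter, and all numeric values are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Pulse:
    toa: float          # time of arrival in seconds
    frequency: float    # carrier frequency in MHz (hypothetical units)
    pulse_width: float  # pulse width in microseconds (hypothetical units)
    emitter_id: int     # ground-truth label, hidden from a deinterleaver


def emit(emitter_id, pri, frequency, pulse_width, duration, rng):
    """Generate pulses for one emitter with a fixed PRI and frequency jitter."""
    pulses = []
    t = rng.uniform(0.0, pri)  # random start phase within one PRI
    while t < duration:
        pulses.append(Pulse(
            toa=t,
            frequency=frequency + rng.gauss(0.0, 1.0),
            pulse_width=pulse_width,
            emitter_id=emitter_id,
        ))
        t += pri
    return pulses


def interleave(trains):
    """Merge per-emitter pulse lists into one train ordered by ToA."""
    merged = [pulse for train in trains for pulse in train]
    return sorted(merged, key=lambda pulse: pulse.toa)


rng = random.Random(0)
# Two hypothetical emitters with nearly identical frequencies but different PRIs.
train = interleave([
    emit(0, pri=1.0e-3, frequency=9400.0, pulse_width=1.0, duration=0.01, rng=rng),
    emit(1, pri=1.7e-3, frequency=9401.0, pulse_width=1.2, duration=0.01, rng=rng),
])
labels = [pulse.emitter_id for pulse in train]
```

A deinterleaver would see `train` without the `emitter_id` field and try to recover `labels` up to a renaming of clusters. Because the two toy emitters overlap in frequency, the timing pattern carries most of the separating information here.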

If this is right

  • Models can be trained and tested on high-complexity cases with more than 100 simultaneous emitters and heavy parameter overlaps.
  • Standardized evaluation becomes possible through the Turing Deinterleaving Challenge using the V-measure on clustered pulse assignments.
  • The public release removes a major data barrier that has limited progress in electronic warfare signal processing.
  • New clustering algorithms can be developed and compared against a shared, large-scale reference collection.
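
The V-measure referred to in the list above is the harmonic mean of homogeneity and completeness, both defined through conditional entropies of the label assignments. The following standard-library sketch shows the standard definition with equal weighting; the function names are illustrative, and the challenge's own scoring code may differ in details such as weighting or averaging across pulse trains.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy of a label list, in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) computed from two paired label lists."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (_, g), c in joint.items())


def v_measure(true_emitters, predicted_clusters):
    """Return (homogeneity, completeness, V-measure) with equal weighting."""
    h_true = _entropy(true_emitters)
    h_pred = _entropy(predicted_clusters)
    # Homogeneity: each cluster should contain pulses of a single emitter.
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - _conditional_entropy(true_emitters, predicted_clusters) / h_true)
    # Completeness: each emitter's pulses should land in a single cluster.
    completeness = 1.0 if h_pred == 0 else (
        1.0 - _conditional_entropy(predicted_clusters, true_emitters) / h_pred)
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v
```

The score is invariant to how clusters are named, so a model only needs to group pulses consistently. Merging every pulse into one cluster gives perfect completeness but zero homogeneity, which is why the harmonic mean is used.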

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset generation approach could be adapted to create training material for related problems such as communications signal separation.
  • Hybrid training that mixes the synthetic pulses with small amounts of real data may improve robustness when models encounter field recordings.
  • Performance gains on the challenge may correlate with improved emitter identification in operational electronic warfare receivers.
  • Extending the simulation with time-varying emitter behaviors or additional noise types would test model generalization further.

Load-bearing premise

The synthetic pulse trains and parameter overlaps accurately represent the statistical and physical complexities encountered in real-world radar environments and receiver hardware.

What would settle it

A direct comparison in which models trained on the synthetic dataset show substantially lower clustering accuracy on actual recorded radar signals than on the provided data would indicate the simulation fails to capture essential real-world features.

Figures

Figures reproduced from arXiv: 2602.03856 by Adam Hosford, Edward Gunn, Ian Groves, Leo Zeitler, Robert Jones, Victoria Nockles.

Figure 1: The TSRD includes realistic transmitter-receiver behaviours. For each simulated pulse train, a static receiver detects pulses from multiple emitters at varying distances on a two-dimensional plane, simulating realistic signal propagation effects, such as path loss and detected angle of arrival. Pulses sent from too far or at the wrong angle are not detected. Emitters operate on different modes, which inclu…
Figure 2: Emitted pulses substantially overlap in the parameter space, rendering straightforward deinterleaving challenging. (A) and (B) exemplify two received pulse trains over ToA and amplitude in scan and stare mode, respectively, demonstrating that emitter signals are substantially superimposed. Simple deinterleaving is challenging, requiring sophisticated model development that makes use of clean data with gr…
Figure 3: PDWs mimic realistic radar transmitters. We simulated pulse transmission and detection in realistic environments characterised by 5-feature PDWs. Panels (A) and (B) demonstrate stare and scan receiver models over frequency, pulse width, AoA, and amplitude. The substantial overlap of radar pulses suggests that successful deinterleaving can only be achieved by leveraging temporal patterns over all parts of th…
Figure 4: Emitter-level statistics are well balanced over the entire dataset. (Top) The number of emitters is approximately uniformly distributed over all pulse trains, rendering some more complex than others. Emitter numbers over 80 eventually tail off. (Bottom) As expected from count data, the average number of pulses per emitter follows a Poisson-like distribution. Statistics were computed in scan mode…
Figure 5: Distributions for amplitude, frequency, and pulse width. PDWs are differently distributed across pulse trains, as demonstrated for amplitude, frequency, and pulse width (left to right) for scan mode (top) and stare mode (bottom).
… emitter at a proportion of up to 99.7%, which leads to a median per-emitter contribution of 2.4%. Whilst this is intended to mimic realistic scenarios which most current deinterl…
Figure 6: PDW features are largely independent, suggesting that every feature can contribute to better task performance. Although frequency exhibits a weak correlation with pulse width and amplitude, most PDW features are independent of each other, indicating that all data properties can contribute useful statistics for downstream tasks. Correlations were measured for scan mode.
… homogeneity, and completeness, across…
Figure 7: HDBscan on the raw PDWs provides a first baseline for the Turing Deinterleaving Challenge. Whilst HDBscan clustering of pulses collected in stare mode yields higher values for V-measure, AMI, ARI, homogeneity, and completeness, scan mode yields a slightly better worst-case performance as measured by the pairwise-binary metrics MCC and F1.
III. OUTLOOK & CONCLUSION The TSRD provides the radar deinterleaving research com…
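
The Figure 7 caption mentions pairwise-binary MCC and F1 without defining them in the excerpt shown here. One common reading, assumed in the sketch below, treats every pair of pulses as a binary decision: the pair is positive in the ground truth if both pulses come from the same emitter, and predicted positive if both fall in the same cluster. The function names are illustrative, and the paper's exact definition may differ.

```python
from collections import Counter
from math import comb, sqrt


def pair_confusion(true_emitters, predicted_clusters):
    """Count pulse pairs as (tp, fp, fn, tn) without enumerating the pairs."""
    n = len(true_emitters)
    joint = Counter(zip(true_emitters, predicted_clusters))
    same_both = sum(comb(c, 2) for c in joint.values())
    same_true = sum(comb(c, 2) for c in Counter(true_emitters).values())
    same_pred = sum(comb(c, 2) for c in Counter(predicted_clusters).values())
    tp = same_both               # same emitter and same cluster
    fp = same_pred - tp          # same cluster, different emitters
    fn = same_true - tp          # same emitter, different clusters
    tn = comb(n, 2) - tp - fp - fn
    return tp, fp, fn, tn


def pairwise_f1_mcc(true_emitters, predicted_clusters):
    """Pair-counting F1 and Matthews correlation coefficient."""
    tp, fp, fn, tn = pair_confusion(true_emitters, predicted_clusters)
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return f1, mcc
```

Counting pairs through the contingency table keeps the cost linear in the number of pulses, which matters for trains containing hundreds of thousands of pulses. Returning 0.0 when a denominator vanishes is a convention chosen for this sketch.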
Original abstract

We present the Turing Synthetic Radar Dataset, a comprehensive dataset to serve both as a benchmark for radar pulse deinterleaving research and as an enabler of new research methods. The dataset addresses the critical problem of separating interleaved radar pulses from multiple unknown emitters for electronic warfare applications and signal intelligence. Our dataset contains a total of 6000 pulse trains over two receiver configurations, totalling almost 3 billion pulses, featuring realistic scenarios with up to 110 emitters and significant parameter space overlap. To encourage dataset adoption and establish standardised evaluation procedures, we have launched an accompanying Turing Deinterleaving Challenge, for which models need to associate pulses in interleaved pulse trains with the correct emitter by clustering and maximising metrics such as the V-measure. The Turing Synthetic Radar Dataset is one of the first publicly available, comprehensively simulated pulse train datasets aimed at facilitating sophisticated model development in the electronic warfare community.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the Turing Synthetic Radar Dataset, comprising 6000 simulated pulse trains (nearly 3 billion pulses total) across two receiver configurations, with scenarios involving up to 110 emitters and substantial parameter overlap in RF, PRI, PW, and amplitude. The dataset is positioned as a public benchmark for radar pulse deinterleaving research in electronic warfare and signal intelligence, accompanied by the Turing Deinterleaving Challenge that evaluates models via clustering metrics such as V-measure.

Significance. A large-scale, publicly released synthetic dataset with standardized evaluation protocols would fill a notable gap in resources for electronic warfare algorithm development, particularly if the generation process produces statistically representative pulse trains; the scale (emitter count and pulse volume) and challenge framework are concrete strengths that could enable reproducible progress in deinterleaving methods.

major comments (2)
  1. [Dataset generation and validation] The manuscript describes emitter parameter sampling and two receiver models but supplies no quantitative fidelity checks (e.g., Kolmogorov-Smirnov tests, Earth Mover's Distance, or moment matching) of the joint (RF, PRI, PW, amplitude) distributions against any measured real-world radar data or hardware-in-the-loop recordings; this directly weakens the central claim that the scenarios 'accurately represent the statistical and physical complexities' of operational environments.
  2. [Abstract and introduction] The assertion that the dataset features 'realistic scenarios' with 'significant parameter space overlap' is presented without supporting evidence or sensitivity analysis showing that the chosen overlap statistics match observed real-world emitter densities and modulation behaviors, leaving open the possibility that models trained on the data may exploit simulation-specific artifacts.
minor comments (2)
  1. [Abstract] The total pulse count ('almost 3 billion') should be stated exactly in the abstract and methods for reproducibility.
  2. [Methods] Clarify whether the two receiver configurations differ only in sampling rate or also in noise model and dynamic range; a table comparing their parameters would improve clarity.
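
As an illustration of the fidelity checks named in the first major comment, the sketch below computes a two-sample Kolmogorov-Smirnov statistic and a one-dimensional Earth Mover's Distance from empirical CDFs, using only the standard library. It is a hedged example rather than a proposed protocol: the function names are hypothetical, it yields no p-value, and it compares one marginal at a time, whereas the comment asks about the joint distribution.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The supremum of the gap is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def emd_1d(sample_a, sample_b):
    """1-D Earth Mover's Distance: area between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    area = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on the interval [left, right).
        cdf_a = bisect_right(a, left) / len(a)
        cdf_b = bisect_right(b, left) / len(b)
        area += abs(cdf_a - cdf_b) * (right - left)
    return area


# Hypothetical pulse-width samples in microseconds, for illustration only.
simulated_pw = [1.0, 1.1, 1.2, 2.0, 2.1]
reference_pw = [1.0, 1.2, 1.9, 2.0, 2.2]
ks_gap = ks_statistic(simulated_pw, reference_pw)
pw_distance = emd_1d(simulated_pw, reference_pw)
```

The KS statistic is scale-free and lies between 0 and 1, while the EMD is expressed in the units of the feature being compared, so the two answer slightly different questions about how far a simulated marginal sits from a reference one.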

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive feedback and positive assessment of the dataset scale and challenge framework. We address each major comment below, indicating revisions to the manuscript.

Point-by-point responses
  1. Referee: Dataset generation and validation section: the manuscript describes emitter parameter sampling and two receiver models but supplies no quantitative fidelity checks (e.g., Kolmogorov-Smirnov tests, Earth Mover's Distance, or moment matching) of the joint (RF, PRI, PW, amplitude) distributions against any measured real-world radar data or hardware-in-the-loop recordings; this directly weakens the central claim that the scenarios 'accurately represent the statistical and physical complexities' of operational environments.

    Authors: We agree that direct quantitative fidelity checks against real-world data would strengthen the manuscript. However, operational radar recordings are typically classified and unavailable for public release or comparison. The emitter parameters were instead drawn from distributions grounded in open radar engineering literature (e.g., standard RF bands, PRI ranges for search/track radars, and PW/amplitude statistics). We have revised the Dataset generation section to cite these sources explicitly, added marginal and pairwise distribution plots, and replaced the claim that scenarios 'accurately represent' real environments with language stating they are 'synthetic scenarios constructed to emulate the statistical and physical complexities'. These changes provide transparency on modeling choices without overstating fidelity. revision: yes

  2. Referee: Abstract and introduction: the assertion that the dataset features 'realistic scenarios' with 'significant parameter space overlap' is presented without supporting evidence or sensitivity analysis showing that the chosen overlap statistics match observed real-world emitter densities and modulation behaviors, leaving open the possibility that models trained on the data may exploit simulation-specific artifacts.

    Authors: We accept this point and have revised the abstract and introduction to remove the term 'realistic scenarios', replacing it with 'synthetic scenarios with up to 110 emitters and substantial parameter space overlap'. We have added a new subsection describing how overlap levels were selected to reflect dense multi-emitter environments reported in EW literature, together with a figure showing the distribution of overlap statistics across the 6000 pulse trains and a brief sensitivity discussion on clustering difficulty as overlap increases. These additions supply the requested evidence and reduce the chance that simulation artifacts go unexamined. revision: yes

standing simulated objections not resolved
  • Direct quantitative statistical comparisons (e.g., KS tests or EMD) against measured real-world radar data remain absent, because operational recordings are subject to classification restrictions.

Circularity Check

0 steps flagged

Dataset release paper contains no derivation chain

full rationale

The manuscript introduces and describes the Turing Synthetic Radar Dataset along with its generation process and two receiver configurations. No equations, fitted parameters, predictions, or mathematical derivations appear anywhere in the provided text. The central contribution is the public release of simulated pulse trains and an accompanying challenge; there are no load-bearing steps that reduce by construction to prior inputs, self-citations, or ansatzes. This is a standard dataset paper whose claims rest on the fidelity of the simulation description rather than any circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on the domain assumption that synthetic generation can produce sufficiently realistic interleaved pulse trains; no free parameters are fitted to external data and no new physical entities are postulated.

axioms (1)
  • domain assumption: Synthetic simulation of radar emitters and receivers can produce pulse trains whose statistical properties match those needed for algorithm development.
    Invoked when claiming the dataset enables realistic research despite being fully simulated.


