arxiv: 2605.16759 · v1 · pith:GHR5RSTYnew · submitted 2026-05-16 · ⚛️ physics.med-ph

Topological structure of radiation-induced DNA damage encodes coupled LET-oxygen signatures

Pith reviewed 2026-05-19 19:46 UTC · model grok-4.3

classification ⚛️ physics.med-ph
keywords DNA double-strand breaks · persistent homology · particle therapy · oxygen enhancement · LET dependence · radiobiology · TOPAS-nBio · hypoxic tumors

The pith

The topology of DNA double-strand breaks encodes the identity of the radiation particle, its position in the beam, and the local oxygen tension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that analyzing the spatial arrangement of radiation-induced DNA double-strand breaks using persistent homology reveals a structured encoding of key radiotherapy parameters. Particle type and location within the spread-out Bragg peak are perfectly identifiable from the topological features, while oxygen level information is present but becomes harder to extract as the linear energy transfer of the particles increases. This hierarchy arises because different physical mechanisms control each layer of the encoding, with topological summaries like persistent entropy playing a major role in capturing oxygen effects. A sympathetic reader would care because this offers a new computational way to characterize damage in low-oxygen tumor regions, which are often resistant to radiation treatment.

Core claim

DSB topology encodes particle identity, Spread-Out Bragg Peak (SOBP) position, and oxygen tension in a three-tier hierarchy. Particle identity and SOBP position are exactly decodable, with balanced accuracy of 1.000. Oxygen-level classification degrades with LET from 0.517 for electrons to 0.189 for carbon distal SOBP; the trend is monotonic apart from a charge-driven non-monotonicity at the helium-to-carbon transition. The joint 49-class task achieves balanced accuracy 0.346 against a chance level of 1/49 ≈ 0.020. Per-class recall peaks at 0.5% O2, consistent with the OER curve inflection. Topological summaries dominate oxygen encoding.

What carries the argument

Persistent homology features, including persistent entropy and landscape integrals, are extracted from the three-dimensional configuration of DSBs and then used in Random Forest classification to decode the radiation conditions.
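As a minimal illustration of one of the summaries named above, persistent entropy is conventionally defined as the Shannon entropy of the normalized bar lengths in a persistence diagram. The sketch below assumes that standard definition; the function name and the toy diagrams are hypothetical and are not taken from the paper's pipeline.

```python
import math

def persistent_entropy(diagram):
    """Shannon entropy of normalized lifetimes for (birth, death) pairs.

    Assumes finite death values; bars with zero lifetime are ignored.
    """
    lifetimes = [death - birth for birth, death in diagram if death > birth]
    total = sum(lifetimes)
    if total == 0:
        return 0.0
    probs = [length / total for length in lifetimes]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical toy diagrams (not data from the paper).
uniform_bars = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
skewed_bars = [(0.0, 10.0), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

entropy_uniform = persistent_entropy(uniform_bars)  # equals log(4)
entropy_skewed = persistent_entropy(skewed_bars)    # lower: one bar dominates
```

Equal-length bars maximize the entropy, while a diagram dominated by one long bar scores lower, which is why the quantity responds to the shape of the damage pattern and not only to the number of bars.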

Load-bearing premise

The Voxel-Aware Oxygen model and TOPAS-nBio simulations generate DSB topologies whose persistent homology features accurately mirror real biological damage mechanisms for the tested ranges of LET and oxygen.

What would settle it

Laboratory experiments measuring actual DSB spatial distributions in irradiated cells under controlled particle beams and oxygen tensions that fail to reproduce the reported classification accuracies and monotonic degradation with LET.

Figures

Figures reproduced from arXiv: 2605.16759 by Ramon Jose C. Bagunu, Renato III Fernan Bolo.

Figure 1: Classification of oxygen tension by DSB topology features. (a) Task 1 (7-class O2 classification) balanced accuracy (BA) for each of the seven particle configurations, ordered by LET. Error bars: ±1 SD over 5-fold × 10-repeat stratified cross-validation (500 trees, balanced class weights). Dotted line: chance level (1/7 ≈ 0.143). The annotated bracket marks the helium-distal to carbon-proximal non-monotonici…

Figure 2: Single-modality oxygen-level classification accuracy. Each cell gives the 7-class balanced accuracy for a Random Forest trained on one modality alone for one particle configuration (200 trees, 5-fold × 5-repeat cross-validation, balanced class weights). Asterisk (*): highest-accuracy modality per column. Color bands on the left margin identify modalities m1–m7. m7 Topological Summaries achieves the highes…

Figure 3: Dual-axis scatter of one-way ANOVA effect sizes (η²) for all 107 features. x-axis: η² with O2 level as grouping variable. y-axis: η² with particle type as grouping variable. Each point represents one feature; color encodes modality. Three landmark features are labeled. Upper region: m2, m3, m4 features encoding track physics (η²_particle ≈ 1, η²_O2 ≈ 0). Left region: m1, m5, m7 oxygen-sensitive featu…

Figure 4: Within-condition and between-condition Wasserstein-2 distance distributions for (a) H0 and (b) H1 persistence diagrams, computed over all 2,450 nucleus pairs (60,025 within-condition pairs; 2,940,000 between-condition pairs). Violin plots show the full distribution; bold horizontal bars indicate medians (values labeled to the right of each violin). Separation ratio: between-condition median divided by within-c…

Figure 5: Partial-out test characterizing the dual oxygen-encoding mechanism of m7 Topological Summaries. (a) 7-class O2 balanced accuracy under five feature-set conditions: Full (all 107 features), −nDSBs (106), −m1 (74), m7 raw (10), and m7 residualized (10 features after OLS removal of the DSB count feature from each m7 feature). Dotted line: chance level (0.143). (b) m7 feature-level η²_O2 before (raw) and after (…
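The η² effect sizes reported in the Figure 3 and Figure 5 captions are one-way ANOVA effect sizes, which are conventionally computed as the between-group sum of squares divided by the total sum of squares. The sketch below assumes that textbook definition; the function name, group labels, and toy values are hypothetical.

```python
def eta_squared(values, groups):
    """One-way ANOVA effect size: SS_between / SS_total."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    # Collect the values belonging to each group label.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total

# Hypothetical toy feature that tracks particle type and ignores oxygen.
feature = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
by_particle = ["e", "e", "He", "He", "C", "C"]
by_oxygen = ["lo", "hi", "lo", "hi", "lo", "hi"]

eta_particle = eta_squared(feature, by_particle)
eta_oxygen = eta_squared(feature, by_oxygen)
```

A feature of this kind would sit in the upper region described for Figure 3, with η² near 1 for particle type and near 0 for oxygen level.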
Abstract

We present the first nuclear-scale persistent homology and Random Forest classification analysis of radiation-induced DNA double-strand break (DSB) topology across the clinical particle therapy range. Using TOPAS-nBio and the Voxel-Aware Oxygen model, we generated 2,450 simulated nuclei across 49 conditions (seven particle configurations, 0.2–70.7 keV/μm; seven oxygen levels, 0.005–21% O₂) and extracted a 107-feature matrix across seven modalities. DSB topology encodes particle identity, Spread-Out Bragg Peak (SOBP) position, and oxygen tension in a three-tier hierarchy, with fidelity at each tier governed by the physical mechanism controlling it. Particle identity and SOBP position are exactly decodable (balanced accuracy = 1.000). Oxygen-level classification degrades monotonically with LET from 0.517 (electrons) to 0.189 (carbon distal SOBP), with a charge-driven non-monotonicity at the helium-to-carbon transition confirming that atomic number, not LET alone, governs topological discriminability. The joint 49-class task achieves balanced accuracy 0.346, seventeen times above chance. Per-class recall peaks universally at 0.5% O₂ (0.788–0.976 across all configurations), which is consistent with the OER curve inflection. Topological Summaries (persistent entropy, landscape integrals) dominate oxygen encoding at all LET (η²_O₂ = 0.300–0.622). A partial-out test reveals two mechanistically separable channels: a count-mediated scale signal (η²_O₂ survival ratio 0.062) and a count-independent shape signal preserved or enhanced in five of seven configurations (balanced accuracy survival ratio 1.011). Persistent entropy and landscape integrals, as novel radiobiological observables, provide a computational basis for characterizing oxygen-dependent damage topology in hypoxic tumor treatment planning.
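The partial-out test is described in the Figure 5 caption as OLS removal of the DSB count feature from each m7 feature. A minimal sketch of that kind of residualization follows, assuming a single-covariate linear fit with an intercept; the function name and the toy numbers are hypothetical and do not reproduce the paper's data.

```python
def residualize(feature, covariate):
    """Return residuals of the OLS fit feature ~ a + b * covariate."""
    n = len(feature)
    mean_x = sum(covariate) / n
    mean_y = sum(feature) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, feature))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(covariate, feature)]

# Hypothetical toy data: a feature built from a count-driven scale term
# plus a shape term that is uncorrelated with the count by construction.
n_dsbs = [10.0, 20.0, 30.0, 40.0]
shape_signal = [0.5, -0.5, -0.5, 0.5]
feature = [2.0 * c + 1.0 + s for c, s in zip(n_dsbs, shape_signal)]

residuals = residualize(feature, n_dsbs)  # recovers shape_signal
```

Whatever survives in the residuals cannot be explained linearly by the DSB count, which is the sense in which a count-independent shape signal can be separated from a count-mediated scale signal.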

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a computational study applying persistent homology to extract topological features from DNA double-strand break (DSB) configurations generated by TOPAS-nBio Monte Carlo simulations that incorporate a Voxel-Aware Oxygen model. Across 2,450 simulated nuclei spanning seven particle/LET configurations (0.2–70.7 keV/μm) and seven oxygen levels (0.005–21 % O₂), a 107-feature matrix is fed to Random Forest classifiers. The central claims are that DSB topology encodes particle identity, SOBP position, and oxygen tension in a three-tier hierarchy, with balanced accuracy exactly 1.000 for particle identity and SOBP position, oxygen classification accuracy degrading monotonically with LET (0.517 for electrons to 0.189 for carbon distal SOBP), a charge-driven non-monotonicity at the helium–carbon transition, and the existence of separable count-mediated and count-independent shape signals.

Significance. If the simulation faithfully captures biological DSB geometry, the work introduces persistent entropy and landscape integrals as novel radiobiological observables that could help quantify LET–oxygen coupling in hypoxic tumor regions during particle therapy planning. The scale of the simulation campaign (2,450 nuclei, 49 conditions) and the separation of count-dependent versus shape-only contributions are strengths that would, once experimentally anchored, provide a falsifiable computational framework for testing damage-topology hypotheses.

major comments (2)
  1. [Methods (Voxel-Aware Oxygen model)] Methods section describing the Voxel-Aware Oxygen model: the model’s oxygen-diffusion and radical-scavenging rules are presented without any direct comparison to experimental measurements of DSB spatial clustering or track-core versus penumbra geometries across the tested LET range. Because the reported perfect classification accuracies, the monotonic oxygen degradation, and the helium–carbon non-monotonicity are all downstream of these simulated topologies, the absence of such validation leaves open the possibility that the three-tier hierarchy reflects model-specific artifacts rather than physical or biological signatures.
  2. [Results (classification performance)] Results section on classification performance: the balanced accuracy of 1.000 for particle identity and SOBP position is stated without accompanying standard deviations, cross-validation folds, or confusion matrices. Given that the 107-feature matrix is derived from a finite set of 2,450 nuclei and that Random Forest can overfit high-dimensional topological summaries, these details are required to establish that the claimed exact decodability is robust rather than an artifact of the particular train–test split.
minor comments (2)
  1. [Abstract] Abstract: the phrase “seventeen times above chance” for the 49-class task should be accompanied by the explicit chance level (1/49) for immediate clarity.
  2. [Methods (feature extraction)] The 107-feature matrix composition is described only at a high level; an explicit breakdown of how many features come from each of the seven modalities (e.g., persistent entropy versus landscape integrals) would aid reproducibility.
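The chance levels behind the first minor comment can be checked with a few lines of arithmetic. The sketch assumes that chance for balanced accuracy over K classes is 1/K; the variable names are illustrative only.

```python
# Seven particle configurations crossed with seven oxygen levels.
n_classes_joint = 7 * 7
chance_joint = 1 / n_classes_joint          # chance level for the 49-class task

# Reported joint balanced accuracy, expressed as a multiple of chance.
fold_over_chance = 0.346 / chance_joint

# Chance level for the 7-class oxygen task quoted in the figure captions.
chance_oxygen = 1 / 7
```

With these numbers the joint task lands at roughly 17 times chance, and the 7-class chance level rounds to the 0.143 shown as the dotted line in the figures.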

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which highlight important aspects of model validation and statistical reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Methods (Voxel-Aware Oxygen model)] Methods section describing the Voxel-Aware Oxygen model: the model’s oxygen-diffusion and radical-scavenging rules are presented without any direct comparison to experimental measurements of DSB spatial clustering or track-core versus penumbra geometries across the tested LET range. Because the reported perfect classification accuracies, the monotonic oxygen degradation, and the helium–carbon non-monotonicity are all downstream of these simulated topologies, the absence of such validation leaves open the possibility that the three-tier hierarchy reflects model-specific artifacts rather than physical or biological signatures.

    Authors: We agree that direct experimental anchoring of the simulated DSB geometries would strengthen interpretation of the topological signatures. The Voxel-Aware Oxygen model extends established radical-diffusion and scavenging frameworks already implemented in TOPAS-nBio; these components have been benchmarked against measured oxygen enhancement ratios and track-structure data in prior literature. However, the manuscript does not include new side-by-side comparisons of simulated versus measured DSB spatial clustering or core-versus-penumbra distributions across the full LET range. In the revised version we will add a dedicated paragraph in the Methods and a limitations subsection in the Discussion that (i) cites the relevant experimental benchmarks for the underlying oxygen model, (ii) explicitly states that the reported three-tier hierarchy remains a computational prediction pending experimental validation, and (iii) outlines how future measurements of DSB topology could test the predicted LET–oxygen coupling. revision: yes

  2. Referee: [Results (classification performance)] Results section on classification performance: the balanced accuracy of 1.000 for particle identity and SOBP position is stated without accompanying standard deviations, cross-validation folds, or confusion matrices. Given that the 107-feature matrix is derived from a finite set of 2,450 nuclei and that Random Forest can overfit high-dimensional topological summaries, these details are required to establish that the claimed exact decodability is robust rather than an artifact of the particular train–test split.

    Authors: We acknowledge that reporting only the point estimate of balanced accuracy = 1.000 without variability measures leaves the robustness open to question. The classification was performed with Random Forest on the 107-feature matrix derived from the 2,450 nuclei. In the revised manuscript we will expand the Results section to report (i) 5-fold cross-validation results with mean balanced accuracy and standard deviation across folds, (ii) the exact train–test split ratios and random seeds used, and (iii) the full confusion matrices for both the particle-identity and SOBP-position tasks. These additions will demonstrate that the perfect decodability is reproducible across partitions and not an artifact of a single split. revision: yes
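The reporting promised in the second response can be sketched as follows: balanced accuracy is the mean of per-class recalls, and its variability is summarized by the mean and standard deviation across folds. The confusion matrices below are hypothetical placeholders, not results from the paper, and the function name is an assumption.

```python
from statistics import mean, stdev

def balanced_accuracy(confusion):
    """Mean per-class recall; rows are true classes, columns are predictions."""
    recalls = [
        row[i] / sum(row)
        for i, row in enumerate(confusion)
        if sum(row) > 0
    ]
    return mean(recalls)

# Hypothetical per-fold confusion matrices for a two-class task.
fold_confusions = [
    [[10, 0], [0, 10]],
    [[9, 1], [0, 10]],
    [[10, 0], [2, 8]],
]

fold_scores = [balanced_accuracy(c) for c in fold_confusions]
summary = (mean(fold_scores), stdev(fold_scores))  # (mean, SD across folds)
```

A claim of exact decodability would correspond to every fold scoring 1.0 and a standard deviation of zero, which is why the per-fold scores and confusion matrices are more informative than a single point estimate.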

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper's core derivation proceeds from Monte Carlo generation of DSB topologies (TOPAS-nBio + Voxel-Aware Oxygen model) to extraction of a 107-feature matrix across seven modalities (including persistent-homology summaries), followed by standard Random Forest classification on the resulting feature matrix. Reported balanced accuracies (1.000 for particle identity and SOBP position, degradation with LET for oxygen, joint 49-class accuracy 0.346) are direct empirical outputs of this pipeline applied to independently simulated data; no equations, fitted parameters, or self-citations reduce these quantities to tautological restatements of the inputs. The three-tier hierarchy and partial-out tests (count-mediated vs. count-independent signals) are likewise downstream statistical observations rather than definitional identities. The analysis is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the fidelity of the chosen simulation models and the assumption that topological features extracted from them capture mechanistically relevant signals separable from count-based effects.

axioms (1)
  • domain assumption: The Voxel-Aware Oxygen model accurately represents oxygen effects on DSB topology across 0.005–21% O2.
    Invoked to generate the 49-condition dataset whose topological features support all classification results.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking
    Relation between the paper passage and the cited Recognition theorem: unclear

    We present the first nuclear-scale persistent homology and Random Forest classification analysis of radiation-induced DNA double-strand break (DSB) topology... extracted a 107-feature matrix across seven modalities... m7 Topological Summaries (persistent entropy, landscape integrals)

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
