Pith: machine review for the scientific record.

arXiv: 2605.08164 · v1 · submitted 2026-05-04 · 💻 cs.DC · cs.AI · cs.CR

Recognition: 2 theorem links · Lean Theorem

parHSOM: A novel parallel Hierarchical Self-Organizing Map implementation

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:06 UTC · model grok-4.3

classification 💻 cs.DC · cs.AI · cs.CR
keywords: hierarchical self-organizing maps · parallel computing · intrusion detection · cybersecurity · distributed training · machine learning · self-organizing maps

The pith

A parallel implementation of hierarchical self-organizing maps trains faster on intrusion detection data while preserving map quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces parHSOM to address the slow sequential training of hierarchical self-organizing maps used in explainable intrusion detection systems. It distributes the training steps across processors and measures the resulting speed gains. Experiments across multiple output grid sizes and five cybersecurity datasets show consistent reductions in training time. The work demonstrates that these speedups occur without major changes to standard performance metrics. This opens a path for using HSOM-based detectors on larger or more frequently updated datasets.

Core claim

parHSOM splits the training of hierarchical self-organizing maps across processors so that the algorithm completes in less time than the sequential version while producing maps whose performance metrics remain comparable on the tested intrusion detection datasets.

What carries the argument

The parallel HSOM architecture that partitions training steps for concurrent execution on multiple processors or nodes.
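The page does not spell out which partitioning scheme parHSOM uses. For batch-style SOM training, though, the standard data-parallel decomposition shards the dataset, has each worker accumulate partial neighborhood-weighted sums, and reduces them into one global update. The sketch below illustrates that generic pattern only; the function names, shapes, and the simulated shard loop are assumptions, not the paper's code:

```python
import numpy as np

def batch_som_step(weights, data, sigma, grid):
    """One sequential batch-SOM update: each unit moves toward the
    neighborhood-weighted mean of the samples it helped win."""
    # Squared distance from every sample to every map unit -> (n, units)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
    bmu = d2.argmin(axis=1)  # best-matching unit per sample
    # Gaussian neighborhood on the output grid -> (units, units)
    gd2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    h = np.exp(-gd2 / (2 * sigma ** 2))
    num = h[:, bmu] @ data                      # (units, dim)
    den = h[:, bmu].sum(axis=1, keepdims=True)  # (units, 1)
    return num / np.maximum(den, 1e-12)

def parallel_batch_som_step(weights, shards, sigma, grid):
    """Data-parallel version: each shard (one per worker) contributes
    partial sums that are reduced into a single identical update."""
    gd2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    h = np.exp(-gd2 / (2 * sigma ** 2))
    num = np.zeros_like(weights)
    den = np.zeros((weights.shape[0], 1))
    for shard in shards:  # in a real run: one task per processor, then a reduce
        d2 = ((shard[:, None, :] - weights[None, :, :]) ** 2).sum(-1)
        bmu = d2.argmin(axis=1)
        num += h[:, bmu] @ shard
        den += h[:, bmu].sum(axis=1, keepdims=True)
    return num / np.maximum(den, 1e-12)
```

Because the batch update is a plain sum over samples, the sharded reduction reproduces the sequential update up to floating-point ordering, which is one concrete way a "comparable quality" premise can hold; asynchronous or online-update schemes would not share that guarantee.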

If this is right

  • Larger cybersecurity datasets become practical for HSOM-based intrusion detection.
  • Models can be retrained more often without prohibitive compute costs.
  • The explainable properties of HSOM remain available in time-sensitive security applications.
  • The same distribution strategy can be tested on other hierarchical neural models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may lower overall energy use for training by shortening wall-clock time on multi-core hardware.
  • Real-time or streaming intrusion detection could update maps more frequently using the faster training loop.
  • Scalability tests with increasing processor counts would reveal where communication overhead begins to limit further gains.
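On the last point, a first-order way to reason about where gains would taper is Amdahl's law; the 90% parallel fraction below is purely illustrative, not a figure measured from the paper:

```python
def amdahl_speedup(p, parallel_fraction):
    """Ideal speedup on p processors when only `parallel_fraction` of the
    work parallelizes; real communication overhead can only lower this."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)

# If 90% of HSOM training parallelized (an assumed number), speedup
# saturates near 10x no matter how many processors are added.
```

Fitting `parallel_fraction` to the paper's reported speedups and plotting this curve over processor count would show where further scaling should flatten.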

Load-bearing premise

Distributing the original sequential training steps across processors leaves the convergence behavior and final map quality unchanged on the tested datasets.

What would settle it

Train both versions on a new, substantially larger cybersecurity dataset and check whether the parallel maps show clearly worse detection rates or different cluster structures than the sequential maps.
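One hedged sketch of that settling experiment, with `train_seq` and `train_par` standing in for the two implementations (the names, grid size, and metric choice are illustrative assumptions, not the paper's API):

```python
import numpy as np

def quantization_error(weights, data):
    """Mean distance from each sample to its best-matching unit."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def compare_maps(train_seq, train_par, data, seeds):
    """Train both implementations from identical random initializations
    across several seeds; returns paired (sequential, parallel) errors."""
    pairs = []
    for seed in seeds:
        init = np.random.default_rng(seed).normal(size=(16, data.shape[1]))
        pairs.append((quantization_error(train_seq(init.copy(), data), data),
                      quantization_error(train_par(init.copy(), data), data)))
    return np.array(pairs)
```

Feeding the two columns of the returned array into a paired statistical test would turn "clearly worse" into a quantified claim rather than a visual judgment.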

Figures

Figures reproduced from arXiv: 2605.08164 by Andy Perkins, George Trawick, Ioana Banicescu, Logan Cummins, Rebekah Lane, Sudip Mittal.

Figure 1. A visual representation of the output from the HSOM algorithm.
Figure 2. A figure that compares the speed increase of parHSOM across all of …
Figure 3. A figure that compares the speed increase of parHSOM across all of …
Original abstract

The digital age has completely transformed the way that information is processed and stored, which makes cybersecurity a crucial field of research. Cybersecurity contains many different domains, but this work focuses on Intrusion Detection Systems (IDSs). Within the literature, Hierarchical Self-Organizing Maps (HSOMs) have been used to create trustworthy, explainable, and AI-based IDSs. However, HSOMs are trained sequentially, which means that training HSOMs on large datasets is slow. This work presents a novel parallel HSOM architecture, called parHSOM. The purpose of this research is to investigate the effect that parallel computation has on the HSOM training time. parHSOM is tested on two different testbeds, four different output grid sizes, and five different cybersecurity datasets. Performance metrics collected from these experiments show that parHSOM consistently trains faster than the Sequential HSOM algorithm without any significant loss in performance. Additionally, this work provides a platform for further investigation into parallel HSOM implementations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces parHSOM, a parallel implementation of Hierarchical Self-Organizing Maps (HSOM) for accelerating training in Intrusion Detection Systems (IDS). It evaluates the approach across two testbeds, four output grid sizes, and five cybersecurity datasets, claiming consistent speedups over sequential HSOM with no significant loss in performance, while providing a platform for further parallel HSOM research.

Significance. If the parallel version is shown to preserve HSOM convergence and map quality, the work would enable scaling of topology-preserving, explainable IDS models to larger datasets, which is relevant for cybersecurity applications. The multi-testbed, multi-grid, multi-dataset evaluation is a strength that supports generalizability of the speedup results.

major comments (1)
  1. [Experimental results] The central claim that parHSOM trains faster 'without any significant loss in performance' is load-bearing but unsupported by direct evidence. The experimental results section reports timing comparisons but does not include side-by-side quantitative metrics (quantization error, topographic error, or IDS classification F1 scores) between parHSOM and sequential HSOM, nor statistical tests (multiple random seeds, confidence intervals, or significance tests) to demonstrate equivalence rather than visual or single-run similarity.
minor comments (1)
  1. [Abstract] The abstract and introduction could more explicitly define the exact performance metrics used beyond training time to support the 'no significant loss' claim.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for highlighting the potential impact of parHSOM on scalable explainable IDS. We address the major comment on experimental evidence below and agree that strengthening the support for performance preservation is necessary.

read point-by-point responses
  1. Referee: [Experimental results] The central claim that parHSOM trains faster 'without any significant loss in performance' is load-bearing but unsupported by direct evidence. The experimental results section reports timing comparisons but does not include side-by-side quantitative metrics (quantization error, topographic error, or IDS classification F1 scores) between parHSOM and sequential HSOM, nor statistical tests (multiple random seeds, confidence intervals, or significance tests) to demonstrate equivalence rather than visual or single-run similarity.

    Authors: We agree that the current manuscript does not include the requested side-by-side quantitative metrics or statistical validation. While the experiments collected performance metrics and the text states there was no significant loss, these were not presented comparatively with sequential HSOM, nor were multiple seeds or statistical tests reported. In the revised version we will add direct comparisons of quantization error, topographic error, and IDS F1 scores between parHSOM and sequential HSOM for all datasets and grid sizes. We will also rerun the experiments with multiple random seeds, report means with confidence intervals, and include appropriate significance tests to demonstrate that performance differences are not statistically significant. revision: yes
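The statistical reporting the rebuttal promises could take the shape of the paired-bootstrap sketch below; the metric values, run counts, and function names are placeholders, since the paper specifies none of this:

```python
import numpy as np

def paired_bootstrap_ci(seq_scores, par_scores, n_boot=10000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for the mean paired difference
    (sequential minus parallel) of a quality metric across runs."""
    diffs = np.asarray(seq_scores, dtype=float) - np.asarray(par_scores, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the paired differences with replacement
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diffs.mean(), (lo, hi)
```

A confidence interval on the sequential-minus-parallel difference that straddles zero would support the "no significant loss" claim; a classical paired t-test or Wilcoxon signed-rank test over the same per-seed pairs would serve the same role.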

Circularity Check

0 steps flagged

No circularity: empirical timing and quality comparisons on fixed datasets

full rationale

The paper's central claim rests on direct experimental measurements of training time and performance metrics (accuracy, etc.) for a parallel HSOM implementation versus its sequential counterpart across multiple datasets, grid sizes, and testbeds. No derivation chain, fitted parameters renamed as predictions, self-referential definitions, or load-bearing self-citations appear; the results are obtained by running the code on external data and reporting observed differences. This is a standard engineering evaluation paper whose claims are falsifiable by re-running the experiments and do not reduce to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that HSOM training can be parallelized without altering its core clustering properties, plus standard assumptions about parallel computing correctness.

axioms (1)
  • domain assumption Parallel distribution of HSOM layer training preserves the original sequential learning dynamics and final map quality
    Invoked implicitly when claiming no significant loss in performance after parallelization.

pith-pipeline@v0.9.0 · 5488 in / 1277 out tokens · 54651 ms · 2026-05-12T02:06:07.882569+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1]

    Creating an explainable intrusion detection system using self organizing maps,

J. Ables, T. Kirby, W. Anderson, S. Mittal, S. Rahimi, I. Banicescu, and M. Seale, “Creating an explainable intrusion detection system using self organizing maps,” in 2022 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2022, pp. 404–412

  2. [2]

    Self-organizing maps,

F. Cuello, “Self-organizing maps,” 2024, website title: TrendSpider Learning Center. [Online]. Available: https://trendspider.com/learning-center/self-organizing-maps/

  3. [3]

    Data analysis in wireless sensor networks with distributed self organizing map,

A. Panwar and S. J. Nanda, “Data analysis in wireless sensor networks with distributed self organizing map,” in 2024 IEEE 1st International Conference on Advances in Signal Processing, Power, Communication, and Computing (ASPCC), 2024, pp. 61–66

  4. [4]

    Edgesom: Distributed hierarchical edge-driven iot data analytics framework,

K. Bagher, I. Khalil, A. Alabdulatif, and M. Atiquzzaman, “Edgesom: Distributed hierarchical edge-driven iot data analytics framework,” Computer Communications, vol. 172, pp. 64–74, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366421000906

  5. [5]

    A parallel adaptive segmentation method based on som and gpu with application to mri image processing,

A. De, Y. Zhang, and C. Guo, “A parallel adaptive segmentation method based on som and gpu with application to mri image processing,” Neurocomputing, vol. 198, pp. 180–189, 2016, Advances in Neural Networks, Intelligent Control and Information Processing. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231216003283

  6. [6]

    The self-organizing map,

T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990

  7. [7]

Xpysom: High-performance self-organizing maps,

R. Mancini, A. Ritacco, G. Lanciano, and T. Cucinotta, “Xpysom: High-performance self-organizing maps,” in 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2020, pp. 209–216

  8. [8]

    Self-organizing maps and full gpu parallel approach to graph matching,

B. Cui, J.-C. Créput, and L. Zhang, “Self-organizing maps and full gpu parallel approach to graph matching,” Computer Communications, vol. 198, pp. 217–227, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366422004558

  9. [9]

    A tutorial on self-organizing maps,

    S. Ghorpade and G. Bruns, “A tutorial on self-organizing maps,” California State University, 2023

  10. [10]

Distributed-som: A novel performance bottleneck handler for large-sized software-defined networks under flooding attacks,

T. V. Phan, N. K. Bao, and M. Park, “Distributed-som: A novel performance bottleneck handler for large-sized software-defined networks under flooding attacks,” Journal of Network and Computer Applications, vol. 91, pp. 14–25, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1084804517301649

  11. [11]

    A distributed self-organizing map for dos attack detection,

M. Kim, S. Jung, and M. Park, “A distributed self-organizing map for dos attack detection,” in 2015 Seventh International Conference on Ubiquitous and Future Networks, 2015, pp. 19–22

  12. [12]

    Distributing som ensemble training using grid middleware,

B. L. Vrusias, L. Vomvoridis, and L. Gillam, “Distributing som ensemble training using grid middleware,” in 2007 International Joint Conference on Neural Networks, 2007, pp. 2712–2717

  13. [13]

    Parallel self-organizing maps with application in clustering distributed data,

F. L. Gorgonio and J. A. F. Costa, “Parallel self-organizing maps with application in clustering distributed data,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, pp. 3276–3283

  14. [14]

    Combining parallel self-organizing maps and k-means to cluster distributed data,

F. Gorgonio and J. Costa, “Combining parallel self-organizing maps and k-means to cluster distributed data,” in 2008 11th IEEE International Conference on Computational Science and Engineering - Workshops, 2008, pp. 53–58

  15. [15]

Apache spark based distributed self-organizing map algorithm for sensor data analysis,

    M. Jayaratne, D. Alahakoon, D. De Silva, and X. Yu, “Apache spark based distributed self-organizing map algorithm for sensor data analysis,” in IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, 2017, pp. 8343–8349

  16. [16]

    Unsupervised skill transfer learning for autonomous robots using distributed growing self organizing maps,

M. Jayaratne, D. Alahakoon, and D. de Silva, “Unsupervised skill transfer learning for autonomous robots using distributed growing self organizing maps,” Robotics and Autonomous Systems, vol. 144, p. 103835, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0921889021001202

  17. [17]

    A parallel implementation of a growing som promoting independent neural networks over distributed input space,

J. Hammond, D. MacClean, and I. Valova, “A parallel implementation of a growing som promoting independent neural networks over distributed input space,” in The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006, pp. 958–965

  18. [18]

    A parallel general implementation of kohonen’s self-organizing map algorithm: performance and scalability,

P. Ozdzynski, A. Lin, M. Liljeholm, and J. Beatty, “A parallel general implementation of kohonen’s self-organizing map algorithm: performance and scalability,” Neurocomputing, vol. 44-46, pp. 567–571, 2002, Computational Neuroscience Trends in Research 2002. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231202004277

  19. [19]

parsom: a parallel implementation of the self-organizing map exploiting cache effects: making the som fit for interactive high-performance data analysis,

    A. Rauber, P. Tomsich, and D. Merkl, “parsom: a parallel implementation of the self-organizing map exploiting cache effects: making the som fit for interactive high-performance data analysis,” in Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New M...

  20. [20]

    Training a self-organizing map distributed on a pvm network,

N. Bandeira, V. Lobo, and F. Moura-Pires, “Training a self-organizing map distributed on a pvm network,” in 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227), vol. 1, 1998, pp. 457–461

  21. [21]

    Parallel design and implementation of som neural computing model in pvm environment of a distributed system,

H. Guan, C. Kwong Li, T. Yat Cheung, and S. Yu, “Parallel design and implementation of som neural computing model in pvm environment of a distributed system,” in Proceedings. Advances in Parallel and Distributed Computing, 1997, pp. 26–31

  22. [22]

    Parallel self-organizing maps for actual applications,

G. Myklebust and J. Solheim, “Parallel self-organizing maps for actual applications,” in Proceedings of ICNN’95 - International Conference on Neural Networks, vol. 2, 1995, pp. 1054–1059

  23. [23]

    Mapping of som and lvq algorithms on a tree shape parallel computer system,

T. Hämäläinen, H. Klapuri, J. Saarinen, and K. Kaski, “Mapping of som and lvq algorithms on a tree shape parallel computer system,” Parallel Computing, vol. 23, no. 3, pp. 271–289, 1997. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167819197000203

  24. [24]

    Scalable parallel som learning for web user profiles,

L. Vojáček, J. Dvorský, K. Slaninová, and J. Martinovič, “Scalable parallel som learning for web user profiles,” in 2013 13th International Conference on Intelligent Systems Design and Applications, 2013, pp. 283–288

  25. [25]

    An efficient parallel algorithm for lissom neural network,

L.-C. Chang and F.-J. Chang, “An efficient parallel algorithm for lissom neural network,” Parallel Computing, vol. 28, no. 11, pp. 1611–1633, 2002. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167819102001667

  26. [26]

    Self-organizing maps neural networks on parallel cluster,

L. Zhu, W. Guo, and Y. Bai, “Self-organizing maps neural networks on parallel cluster,” in 2009 First International Conference on Information Science and Engineering, 2009, pp. 384–388

  27. [27]

    Design and implementation of parallel som model on gpgpu,

S. Q. Khan and M. A. Ismail, “Design and implementation of parallel som model on gpgpu,” in 2013 5th International Conference on Computer Science and Information Technology, 2013, pp. 233–237

  28. [28]

    Parallel high dimensional self organizing maps using cuda,

F. C. Moraes, S. C. Botelho, N. D. Filho, and J. F. O. Gaya, “Parallel high dimensional self organizing maps using cuda,” in 2012 Brazilian Robotics Symposium and Latin American Robotics Symposium, 2012, pp. 302–306

  29. [29]

    A motion trajectory based video retrieval system using parallel adaptive self organizing maps,

W. Qu, F. Bashir, D. Graupe, A. Khokhar, and D. Schonfeld, “A motion trajectory based video retrieval system using parallel adaptive self organizing maps,” in Proceedings. 2005 IEEE International Joint Conference on Neural Networks, vol. 3, 2005, pp. 1800–1805

  30. [30]

    Massively parallel cellular matrix model for self-organizing map applications,

H. Wang, A. Mansouri, and J.-C. Créput, “Massively parallel cellular matrix model for self-organizing map applications,” in 2015 IEEE International Conference on Electronics, Circuits, and Systems (ICECS), 2015, pp. 584–587

  31. [31]

    Extending parallelization of the self-organizing map by combining data and network partitioned methods,

T. Richardson and E. Winer, “Extending parallelization of the self-organizing map by combining data and network partitioned methods,” Advances in Engineering Software, vol. 88, pp. 1–7, 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0965997815000769

  32. [32]

A data partition method for parallel self-organizing map,

    M.-H. Yang and N. Ahuja, “A data partition method for parallel self-organizing map,” in IJCNN’99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339), vol. 3, 1999, pp. 1929–1933

  33. [33]

    Explainable intrusion detection systems using competitive learning techniques,

J. Ables, T. Kirby, S. Mittal, I. Banicescu, S. Rahimi, W. Anderson, and M. Seale, “Explainable intrusion detection systems using competitive learning techniques,” arXiv preprint arXiv:2303.17387, 2023

  34. [34]

    A detailed analysis of the kdd cup 99 data set,

M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the kdd cup 99 data set,” in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE, 2009, pp. 1–6

  35. [35]

    Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set),

N. Moustafa and J. Slay, “Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set),” in 2015 Military Communications and Information Systems Conference (MilCIS). IEEE, 2015, pp. 1–6

  36. [36]

    Toward generating a new intrusion detection dataset and intrusion traffic characterization

I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani et al., “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” ICISSP, vol. 1, pp. 108–116, 2018

  37. [37]

A new distributed architecture for evaluating ai-based security systems at the edge: network ton iot datasets,

    N. Moustafa, “A new distributed architecture for evaluating ai-based security systems at the edge: Network TON_IoT datasets,” Sustainable Cities and Society, vol. 72, p. 102994, 2021