FloatSOM: GPU-Accelerated, Distributed, Topology-Flexible Self-Organizing Maps

Anne Brustle; Felix Marsh-Wakefield; Givanna Putri; Katherine Turner; Sarah Klamt; Tony Xu

arXiv: 2604.26555 · v1 · submitted 2026-04-29 · cs.DC · cs.LG

Pith reviewed 2026-05-07 12:52 UTC · model grok-4.3

keywords: self-organizing maps · GPU acceleration · distributed computing · topology optimization · quantization error · large-scale unsupervised learning · out-of-memory streaming

The pith

FloatSOM trains flexible-topology self-organizing maps on billion-sample datasets with lower quantization error than standard baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FloatSOM as a framework that adds multi-GPU support, disk-backed streaming for datasets too large for device memory, and topologies that go beyond fixed lattices to self-organizing map training. It pairs these capabilities with hyperparameter tuning that takes the chosen topology into account. On fourteen synthetic and real datasets the combination produces lower quantization error than existing SOM implementations while sustaining high throughput when the workload is spread across multiple GPUs and nodes. If the results hold, large-scale unsupervised mapping tasks become feasible without forcing users to simplify the map structure or downsample the data.
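
Quantization error is the yardstick for every comparison in this summary, so a concrete definition helps. The following is a minimal sketch, assuming the common definition (mean Euclidean distance from each sample to its best-matching unit); the paper's own variants, such as the "Balanced QE" named in the figure captions, are not specified here. The function name and the toy data are illustrative only.

```python
import math

def quantization_error(samples, weights):
    """Mean Euclidean distance from each sample to its nearest prototype (BMU)."""
    total = 0.0
    for x in samples:
        # Distance to the best-matching unit: the closest prototype vector.
        total += min(math.dist(x, w) for w in weights)
    return total / len(samples)

# Toy data (hypothetical): two prototypes and three 2-D samples.
weights = [(0.0, 0.0), (1.0, 1.0)]
samples = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
qe = quantization_error(samples, weights)
```

Two of the three toy samples sit exactly on a prototype and the third is at distance 1 from both, so the mean is one third. Lower values mean the prototypes sit closer to the data, which is the sense in which the paper reports "lower quantization error".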

Core claim

FloatSOM supports multi-GPU execution, out-of-memory disk-backed streaming, and novel topologies beyond regular lattices for self-organizing maps. When these features are combined with topology-aware hyperparameter fine-tuning, the resulting maps achieve lower quantization error than current state-of-the-art SOM baselines while scaling to very large problems, including a 1024-node network trained on one billion samples with fifty features in 6.16 minutes on eight GPUs across two HPC nodes.
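
The headline timing can be turned into throughput figures with simple arithmetic. The sketch below uses only the numbers quoted above, plus one labeled assumption: 4-byte float32 storage, which this summary does not state.

```python
samples = 1_000_000_000
features = 50
minutes = 6.16
gpus = 8

seconds = minutes * 60                    # about 369.6 s of wall-clock time
samples_per_second = samples / seconds    # aggregate throughput across all GPUs
per_gpu = samples_per_second / gpus       # average share per GPU

# Assumption: float32 (4 bytes per value); the dtype is not given in the summary.
dataset_gb = samples * features * 4 / 1e9
```

This works out to roughly 2.7 million samples per second in aggregate, or about 338 thousand per GPU. Under the float32 assumption the raw data would occupy 200 GB, which suggests why disk-backed streaming matters at this scale.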

What carries the argument

The FloatSOM framework enabling topology-flexible SOM training together with distributed GPU execution and out-of-memory streaming.

If this is right

SOM-based clustering and visualization can be applied to datasets that previously exceeded single-device memory.
Map quality improves without forcing users to restrict themselves to rectangular or hexagonal grids.
Training times for high-volume, high-dimensional data drop enough to allow routine use on modest HPC allocations.
The same distributed infrastructure can be reused for repeated runs with different topologies or parameter settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar memory-streaming and topology-flexibility ideas could be ported to other prototype-based or graph-based unsupervised learners.
The reported scaling numbers suggest that interactive exploration of billion-point maps may become practical once the framework is wrapped in a higher-level interface.
If the topology-tuning step proves cheap, it could be embedded inside automated model-selection loops for larger pipelines.

Load-bearing premise

That the fourteen chosen benchmark datasets and the single quantization-error metric are representative enough to establish general superiority over prior SOM methods.

What would settle it

A new dataset or topology where FloatSOM's quantization error is not lower than a standard lattice baseline, or a scaling test on more than eight GPUs that fails to maintain the reported throughput.

Figures

Figures reproduced from arXiv: 2604.26555 by Anne Brustle, Felix Marsh-Wakefield, Givanna Putri, Katherine Turner, Sarah Klamt, Tony Xu.

**Figure 1.** Schematic overview of the FloatSOM methods framing used in this manuscript. Sample selection and topology definition have configurable components that feed into the standard SOM training step, while the compute architecture determines how that same training procedure is executed in practice. The options shown here summarize the FloatSOM configurations discussed in the following subsections.

**Figure 2.** Multi-GPU data loading and NCCL synchronization schematic. In disk-backed mode, data are sharded to worker-local storage, loaded chunk-wise into pinned host memory, transferred to GPU with transfer overlapped with compute, processed locally, and synchronized by NCCL all-reduce before one weight update per iteration. In RAM mode, data are instead pre-sharded into worker-local CPU RAM and follow the same pin…

**Figure 3.** Full versus HDSSSOM on Pilot Dataset. Panels: A Balanced QE, B Holdout QE, C Train QE. Axis: QE change (%). Datasets: blobs, circles, moons, s_curve, swiss_roll, breast_cancer, wine, iris, digits, olivetti_faces, plus GLOBAL_OVERALL.

**Figure 4.** Full vs Random Outcomes and Dataset-Size Regression (Hexagonal).

**Figure 5.** Representative Topology on sklearn Circles.

**Figure 6.** Hexagonal vs MST Across QE Metrics. Panels: A Balanced QE, B Holdout QE, C Train QE. Axis: QE change (%). Datasets: blobs, circles, moons, s_curve, swiss_roll, breast_cancer, wine, iris, digits, olivetti_faces, diabetes, california_housing, covertype, kddcup99, plus GLOBAL_OVERALL.

**Figure 7.** Hexagonal vs RNG Across QE Metrics. Panels: A Balanced QE, B Holdout QE, C Train QE. Axis: QE change (%). Datasets: blobs, circles, moons, s_curve, swiss_roll, breast_cancer, wine, iris, digits, olivetti_faces, diabetes, california_housing, covertype, kddcup99, plus GLOBAL_OVERALL.

**Figure 8.** Tuned vs Default Across QE Metrics. Panels: A Balanced QE, B Holdout QE, C Train QE. Axis: QE change (%). Datasets: blobs, circles, moons, s_curve, swiss_roll, breast_cancer, wine, iris, digits, olivetti_faces, diabetes, california_housing, covertype, kddcup99, plus GLOBAL_OVERALL.

**Figure 9.** Hyperparameter Stability. Panels: A Full sampling, B Random sampling, C Random Sampling QE by Sample Size. Label: Overall Stability Score.

**Figure 10.** Sampling Runtime. Panels: A 1 GPU, B 2 GPUs, C 4 GPUs.

**Figure 11.** Multi-GPU Full-Batch Scaling (RNG). Panels: A Dimension Scaling, B Sample Scaling, C Grid-Size Scaling, D Dimension Scaling (Efficiency), E Sample Scaling (Efficiency), F Grid-Size Scaling (Efficiency).

**Figure 12.** Topology Runtime Comparison (8 GPUs, Full Batch).

**Figure 13.** XPySOM (Default) v FloatSOM RNG. Panels: A Balanced QE, B Holdout QE, C Train QE, D Scaling Runtime (4 GPUs). Datasets: blobs, circles, moons, s_curve, swiss_roll, breast_cancer, wine, iris, digits, olivetti_faces, diabetes, california_housing, covertype, kddcup99, plus GLOBAL_OVERALL. Legend: FloatSOM - Full, FloatSOM - Random, XPySOM - Default.
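
Figures 5 to 7 compare a hexagonal lattice against graph-based topologies labeled MST and RNG. To make "topology beyond a regular lattice" concrete, here is a minimal sketch that assumes MST means a minimum spanning tree over prototype positions and that neighborhood strength decays with hop distance along the tree. Neither detail is confirmed by this summary, and the function names are hypothetical rather than FloatSOM's API.

```python
import math
from collections import deque

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency list."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    in_tree = {0}
    while len(in_tree) < n:
        # Pick the cheapest edge leaving the current tree.
        u, v = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: math.dist(points[e[0]], points[e[1]]),
        )
        adjacency[u].append(v)
        adjacency[v].append(u)
        in_tree.add(v)
    return adjacency

def hop_distances(adjacency, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def neighborhood(adjacency, bmu, sigma):
    """Gaussian neighborhood weights over graph hop distance instead of grid distance."""
    hops = hop_distances(adjacency, bmu)
    return {j: math.exp(-(d * d) / (2.0 * sigma * sigma)) for j, d in hops.items()}

# Hypothetical prototype positions: four points on a line.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
tree = minimum_spanning_tree(points)
h = neighborhood(tree, bmu=0, sigma=1.0)
```

The only change relative to a lattice SOM in this sketch is where the distance between units comes from: a graph built from the prototypes rather than fixed grid coordinates. The quadratic edge search is chosen for brevity, not speed.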

Original abstract

GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly challenge their practical use because workloads no longer fit cleanly within device-memory limits. We introduce FloatSOM, a SOM framework for scalable training and deployment that supports multi-GPU execution, out-of-memory disk-backed streaming, and novel topologies beyond regular lattices. We evaluate FloatSOM on 14 synthetic and real benchmark datasets together with controlled speed scaling benchmarks, and show that these improved topologies, combined with topology-aware hyperparameter fine-tuning, yield lower quantization error than current state-of-the-art SOM baselines. FloatSOM also sustains this performance at large scale with high-throughput distributed execution; in the largest benchmark, it trains a 1024-node SOM network on 1,000,000,000 samples with 50 features in 6.16 minutes on 8 GPUs across two separate high-performance-computing nodes.
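
The abstract pairs disk-backed streaming with multi-GPU execution, and the Figure 2 caption describes the flow as chunk-wise loading, local processing, an NCCL all-reduce, and one weight update per iteration. The sketch below imitates that flow in plain Python under several assumptions: a batch-SOM update (weighted mean of samples), list slicing in place of disk reads, and an elementwise sum in place of NCCL. All names are hypothetical and none of this is FloatSOM's actual code.

```python
import math

def chunks(shard, size):
    """Yield a worker-local shard in fixed-size chunks, as a stand-in for disk reads."""
    for start in range(0, len(shard), size):
        yield shard[start:start + size]

def local_sums(shard, weights, h, chunk_size):
    """One worker: accumulate batch-SOM numerator and denominator over its shard."""
    k, d = len(weights), len(weights[0])
    num = [[0.0] * d for _ in range(k)]
    den = [0.0] * k
    for chunk in chunks(shard, chunk_size):
        for x in chunk:
            bmu = min(range(k), key=lambda j: math.dist(x, weights[j]))
            for j in range(k):
                den[j] += h[bmu][j]
                for f in range(d):
                    num[j][f] += h[bmu][j] * x[f]
    return num, den

def all_reduce_sum(partials):
    """Stand-in for an NCCL all-reduce: elementwise sum of every worker's partials."""
    nums, dens = zip(*partials)
    k, d = len(dens[0]), len(nums[0][0])
    num = [[sum(n[j][f] for n in nums) for f in range(d)] for j in range(k)]
    den = [sum(dn[j] for dn in dens) for j in range(k)]
    return num, den

def batch_step(shards, weights, h, chunk_size=2):
    """One iteration: local accumulation, global reduction, single weight update."""
    num, den = all_reduce_sum([local_sums(s, weights, h, chunk_size) for s in shards])
    return [
        [num[j][f] / den[j] for f in range(len(weights[0]))] if den[j] > 0 else list(weights[j])
        for j in range(len(weights))
    ]

# Toy run (hypothetical data): two workers, two prototypes, identity neighborhood.
weights = [[0.0], [10.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
shards = [[[1.0], [9.0]], [[3.0], [11.0]]]
new_weights = batch_step(shards, weights, h)
```

Because each worker contributes only sums, splitting the data across workers or chunks gives the same update as processing it in one piece. That property is what allows a single synchronized weight update per iteration.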

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

FloatSOM is a practical engineering package that combines multi-GPU distribution, out-of-core streaming, and non-lattice topologies for SOMs, with a concrete large-scale timing result, but the lower quantization error claims rest on unverified parity in hyperparameter effort for the baselines.

The main thing here is that FloatSOM packages multi-GPU training, disk streaming for data larger than memory, and support for topologies beyond regular grids into one open framework, then shows it can train a 1024-node map on a billion 50-feature samples in roughly six minutes on eight GPUs spread across two nodes. That scaling number is specific and directly useful for anyone already running SOMs on HPC hardware who has hit device-memory limits before.

Referee Report

1 major / 2 minor

Summary. The paper introduces FloatSOM, a GPU-accelerated SOM framework supporting multi-GPU distributed execution, out-of-memory disk-backed streaming, and novel topologies beyond regular lattices. It evaluates the system on 14 synthetic and real benchmarks, claiming that improved topologies combined with topology-aware hyperparameter fine-tuning produce lower quantization error than current state-of-the-art SOM baselines, while also demonstrating scalability via a 1024-node SOM trained on 1 billion samples (50 features) in 6.16 minutes using 8 GPUs across two HPC nodes.

Significance. If the performance gains hold under matched experimental conditions, the engineering contributions in distributed execution and topology flexibility would offer practical value for large-scale SOM applications in clustering and visualization. The concrete large-scale timing result and multi-GPU support address real deployment constraints, though the work remains an implementation and benchmark study without new theoretical derivations.

major comments (1)

Abstract and evaluation sections: The claim that 'these improved topologies, combined with topology-aware hyperparameter fine-tuning, yield lower quantization error than current state-of-the-art SOM baselines' is load-bearing for the central contribution, yet the manuscript provides no explicit confirmation that baseline implementations received a matched hyperparameter search budget or identical training protocol. Without this, the reported QE reductions cannot be confidently attributed to topology rather than unequal optimization effort, as noted in the stress-test concern.

minor comments (2)

Abstract: The reported lower quantization error lacks accompanying statistical significance tests, error bars, or details on exact baseline implementations and data-exclusion rules, limiting assessment of robustness.
Results: Reproducibility would benefit from explicit description of the topology-aware tuning procedure and the precise configurations used for all compared methods.
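
One simple way to address the request for significance testing is a paired test across datasets. The following is a minimal sketch of an exact two-sided sign test, chosen here for illustration; the report does not prescribe a specific test, and the scores below are made-up placeholders, not values from the paper.

```python
from math import comb

def sign_test(baseline_qe, candidate_qe):
    """Exact two-sided sign test on paired scores; ties are dropped."""
    diffs = [b - c for b, c in zip(baseline_qe, candidate_qe) if b != c]
    n = len(diffs)
    wins = sum(1 for d in diffs if d > 0)   # candidate has the lower QE
    k = max(wins, n - wins)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins, n, min(1.0, 2 * tail)

# Made-up placeholder scores, NOT values from the paper.
baseline = [0.50, 0.40, 0.30, 0.20, 0.10]
candidate = [0.45, 0.35, 0.25, 0.15, 0.05]
wins, n, p_value = sign_test(baseline, candidate)
```

The sign test uses only the direction of each paired difference, so it is robust to datasets with very different QE scales, at the cost of ignoring effect size.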

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful review and constructive feedback on ensuring fair comparisons. We address the major comment below and will revise the manuscript to provide the requested clarifications.

Referee: Abstract and evaluation sections: The claim that 'these improved topologies, combined with topology-aware hyperparameter fine-tuning, yield lower quantization error than current state-of-the-art SOM baselines' is load-bearing for the central contribution, yet the manuscript provides no explicit confirmation that baseline implementations received a matched hyperparameter search budget or identical training protocol. Without this, the reported QE reductions cannot be confidently attributed to topology rather than unequal optimization effort, as noted in the stress-test concern.

Authors: We agree that explicit confirmation of matched hyperparameter budgets and training protocols is essential to attribute performance differences to the topologies. In the original evaluation, we applied a uniform grid-search procedure over the same hyperparameter ranges (learning rate, neighborhood radius, epochs) to all methods including the baselines, with topology-aware adjustments applied only to FloatSOM as described in Section 4. To eliminate any ambiguity, we will add a dedicated paragraph in the evaluation section (and update the abstract if space allows) that explicitly states the shared search budget, identical training protocol, and baseline-specific settings used. This revision will include a summary table of the search spaces for transparency. revision: yes
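
The matched-budget protocol described in this response can be sketched as a single grid that every method is run against. The rebuttal names the three hyperparameters but not their ranges, so the values below are placeholders, and `toy_evaluate` is a stand-in objective rather than a real SOM training run.

```python
from itertools import product

# Hypothetical search space: placeholder values, not taken from the paper.
SEARCH_SPACE = {
    "learning_rate": [0.1, 0.5, 1.0],
    "neighborhood_radius": [1.0, 2.0, 4.0],
    "epochs": [10, 50],
}

def grid(space):
    """Enumerate every configuration in the search space as a dict."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]

def matched_search(methods, space, evaluate):
    """Run the identical grid for every method and keep each method's best score."""
    configs = grid(space)
    best = {}
    for method in methods:
        scored = [(evaluate(method, cfg), i) for i, cfg in enumerate(configs)]
        score, index = min(scored)
        best[method] = {"qe": score, "config": configs[index], "budget": len(configs)}
    return best

def toy_evaluate(method, cfg):
    """Placeholder objective, NOT a real SOM run: stands in for train-then-measure-QE."""
    return (
        abs(cfg["learning_rate"] - 0.5)
        + abs(cfg["neighborhood_radius"] - 2.0) / 10
        + 1 / cfg["epochs"]
    )

result = matched_search(["FloatSOM", "baseline"], SEARCH_SPACE, toy_evaluate)
```

Recording the budget next to each best score makes it easy to check that no method was evaluated on more configurations than another, which is the parity the referee asks the authors to document.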

Circularity Check

0 steps flagged

No circularity: empirical implementation and benchmark study with no load-bearing derivations

The paper is an engineering contribution describing FloatSOM's implementation for GPU/distributed SOM training with novel topologies. Its central claims rest on empirical benchmarks across 14 datasets showing lower quantization error and scaling performance, not on any mathematical derivation, prediction, or uniqueness theorem. No equations or results are shown to reduce by construction to fitted inputs, self-citations, or ansatzes imported from prior author work. The evaluation protocol and topology-aware tuning are presented as design choices whose validity is tested externally via direct comparison to baselines; any concerns about unequal hyperparameter effort fall under experimental fairness rather than circularity. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a systems and benchmarking paper; the central claims rest on engineering choices and empirical comparisons rather than mathematical axioms or new postulated entities.

pith-pipeline@v0.9.0 · 5486 in / 1216 out tokens · 37860 ms · 2026-05-07T12:52:28.466740+00:00 · methodology
