arxiv: 2305.01507 · v3 · submitted 2023-05-01 · 💻 cs.NE · cs.LG

A Parameter-free Adaptive Resonance Theory-based Topological Clustering Algorithm Capable of Continual Learning

Pith reviewed 2026-05-24 08:33 UTC · model grok-4.3

classification 💻 cs.NE cs.LG
keywords adaptive resonance theory · topological clustering · parameter-free algorithm · continual learning · determinantal point process · edge deletion · self-organizing clustering

The pith

An ART-based topological clustering algorithm estimates its own similarity and edge deletion thresholds to achieve strong performance without dataset-specific tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a clustering method based on Adaptive Resonance Theory that removes the need for users to manually set a vigilance parameter controlling node learning and an edge deletion threshold that shapes cluster separation. The similarity threshold is set by a determinantal point process criterion while the edge deletion threshold uses the age of connections. This design lets the algorithm form well-separated clusters during self-organization and adapt to new data over time. Tests on synthetic and real-world datasets indicate it outperforms other clustering algorithms that still depend on per-dataset parameter choices.
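
To make the role of the similarity threshold concrete, the following is a minimal sketch of an ART-style vigilance test. It is not the paper's algorithm: the Euclidean distance, the hand-set `threshold` argument, and the running-mean update are illustrative assumptions, and the names `learn_sample`, `nodes`, and `counts` are hypothetical.

```python
import math

def learn_sample(nodes, counts, x, threshold):
    """One ART-style learning step (illustrative, not the paper's rule).

    nodes: list of prototype vectors; counts: samples absorbed per node.
    threshold: hypothetical hand-set similarity threshold (vigilance).
    Returns the index of the node that ends up representing x.
    """
    if nodes:
        dists = [math.dist(w, x) for w in nodes]
        winner = min(range(len(nodes)), key=dists.__getitem__)
        if dists[winner] <= threshold:
            # Resonance: move the winner toward x with a running mean.
            counts[winner] += 1
            rate = 1.0 / counts[winner]
            nodes[winner] = [wi + rate * (xi - wi)
                             for wi, xi in zip(nodes[winner], x)]
            return winner
    # Mismatch (or empty network): x becomes a new node.
    nodes.append(list(x))
    counts.append(1)
    return len(nodes) - 1

# Toy stream: two tight pairs of points far apart from each other.
nodes, counts = [], []
stream = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
labels = [learn_sample(nodes, counts, x, threshold=1.0) for x in stream]
```

With `threshold=1.0` the four samples collapse into two nodes, while a much smaller value such as `0.05` would turn every sample into its own node. That sensitivity to a hand-set value is what the paper's estimation step is meant to remove.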

Core claim

The proposed algorithm integrates a determinantal point process-based criterion to estimate the similarity threshold and an age-based rule to set the edge deletion threshold within an ART topological clustering framework, yielding superior clustering performance on synthetic and real-world datasets without requiring parameter specifications specific to the datasets.

What carries the argument

The integration of a determinantal point process criterion for estimating the vigilance parameter together with an age-based edge deletion rule inside the ART-based topological clustering process.
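
As background for the DPP criterion: in a determinantal point process defined by a similarity matrix L, the probability of selecting a subset S is proportional to det(L_S), so the determinant acts as a diversity score. The sketch below only illustrates that property. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions, and the code does not reproduce how the paper turns such a score into a vigilance threshold.

```python
import math

def gaussian_kernel_matrix(points, sigma=1.0):
    """Similarity matrix L[i][j] = exp(-||p_i - p_j||^2 / (2 sigma^2))."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2))
             for q in points] for p in points]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def diversity(points, sigma=1.0):
    """Unnormalized DPP score det(L_S) of the point set S."""
    return determinant(gaussian_kernel_matrix(points, sigma))

# Hypothetical node sets: one well spread, one nearly redundant.
spread = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
crowded = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
score_spread = diversity(spread)
score_crowded = diversity(crowded)
```

Widely separated points give a determinant close to 1, while near-duplicates drive it toward 0, which is why a determinant-based score can signal whether a set of nodes covers the data without redundancy.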

If this is right

  • Clustering can proceed on fresh data streams without repeated parameter searches.
  • The topological map adapts cluster boundaries through ongoing edge management.
  • Performance remains competitive with tuned algorithms while removing expert intervention for threshold selection.
  • The method supports continual learning by incorporating new samples without full retraining.
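
The edge management mentioned in the list above can be sketched as follows. The aging convention (edges at the winner grow older, the winner/runner-up edge is refreshed) is the one used by growing-neural-gas style networks. The `max_age` argument is a hand-set placeholder: the paper estimates this threshold from edge ages by a rule that is not reproduced here, and all function names are hypothetical.

```python
def update_edges(edge_age, winner, runner_up, max_age):
    """Age-based edge bookkeeping for one input sample.

    edge_age: dict mapping frozenset({i, j}) -> age of the edge i-j.
    max_age: hypothetical deletion threshold, passed in by hand here.
    """
    # Every edge touching the winner gets one step older.
    for edge in edge_age:
        if winner in edge:
            edge_age[edge] += 1
    # The winner/runner-up edge is created or refreshed.
    edge_age[frozenset((winner, runner_up))] = 0
    # Edges that were not refreshed for too long are removed.
    for edge in [e for e, age in edge_age.items() if age > max_age]:
        del edge_age[edge]
    return edge_age

def clusters(num_nodes, edge_age):
    """Connected components of the remaining graph, as sorted lists."""
    neighbours = {i: set() for i in range(num_nodes)}
    for edge in edge_age:
        i, j = tuple(edge)
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, result = set(), []
    for start in range(num_nodes):
        if start in seen:
            continue
        stack, component = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(neighbours[node] - seen)
        result.append(sorted(component))
    return result

# Four nodes in a chain; the middle edge 1-2 is already old.
edges = {frozenset((0, 1)): 0, frozenset((1, 2)): 2, frozenset((2, 3)): 0}
update_edges(edges, winner=1, runner_up=0, max_age=2)
groups = clusters(4, edges)
```

In this toy run the stale edge between nodes 1 and 2 is removed, so the chain splits into two clusters. The example shows why the choice of deletion threshold directly shapes cluster separation.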

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Non-experts could apply clustering more readily in domains where parameter tuning has been a barrier.
  • The internal estimation approach might transfer to other ART variants or self-organizing map algorithms.
  • Fully autonomous clustering pipelines become feasible if the estimation rules prove stable across broader domains.

Load-bearing premise

The determinantal point process criterion and age-based edge deletion rule produce thresholds that remain appropriate across unseen datasets without any hidden dataset-specific adjustments.

What would settle it

The claim would be refuted if, on a new dataset, the algorithm could match or exceed competing methods only after the user manually adjusted the similarity or edge deletion threshold.

Figures

Figures reproduced from arXiv: 2305.01507 by Chu Kiong Loo, Hisao Ishibuchi, Naoki Masuyama, Stefan Wermter, Takanori Takebayashi, Yusuke Nojima.

Figure 1: Two-dimensional synthetic dataset. (a) Stream #1 (b) Stream #2 (c) Stream #3 (d) Stream #4 (e) Stream #5 (f) Stream #6
Figure 2: Visualization of the synthetic dataset in sequential …
Figure 3: Visualization of self-organizing results in the stationary environment.
Figure 4: Visualization of self-organizing results of AutoCloud in the non-stationary environment.
Figure 5: Visualization of self-organizing results of ASOINN in the non-stationary environment.
Figure 6: Visualization of self-organizing results of SOINN+ in the non-stationary environment.
Figure 7: Visualization of self-organizing results of TCA in the non-stationary environment.
Figure 8: Visualization of self-organizing results of CAEA in the non-stationary environment.
Figure 9: Visualization of self-organizing results of CAE in the non-stationary environment.
Figure 10: Critical difference diagram based on the overall …
Figure 11: Critical difference diagram based on results of NMI …
Figure 12: Critical difference diagram based on results of NMI …
Figure 13: Relationships among the number of active nodes, the number of clusters in CAE, and NMI in the stationary …
Figure 14: Relationships among the number of active nodes, the number of clusters in CAE, and NMI in the non-stationary …

Original abstract

In general, a similarity threshold (i.e., a vigilance parameter) for a node learning process in Adaptive Resonance Theory (ART)-based algorithms has a significant impact on clustering performance. In addition, an edge deletion threshold in a topological clustering algorithm plays an important role in adaptively generating well-separated clusters during a self-organizing process. In this paper, we propose an ART-based topological clustering algorithm that integrates parameter estimation methods for both the similarity threshold and the edge deletion threshold. The similarity threshold is estimated using a determinantal point process-based criterion, while the edge deletion threshold is defined based on the age of edges. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art clustering algorithms without requiring parameter specifications specific to the datasets. Source code is available at https://github.com/Masuyama-lab/CAE

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CAE, an ART-based topological clustering algorithm that estimates the similarity (vigilance) threshold via a determinantal point process (DPP) criterion and the edge deletion threshold via an age-based rule. It claims to be fully parameter-free, capable of continual learning, and to achieve superior clustering performance on synthetic and real-world datasets relative to state-of-the-art methods without any dataset-specific parameter specifications. Source code is provided.

Significance. If the parameter-free property and performance gains prove robust, the work would meaningfully advance practical use of ART and topological clustering methods by removing the common requirement for manual vigilance and edge-age tuning. The open-source release supports reproducibility, which is a clear strength.

major comments (2)
  1. §4 (Experimental Results): The superiority claims rest on comparisons whose evaluation protocol is not fully specified (number of independent runs, error bars or statistical tests, and whether baseline parameters were set by the same DPP/age rules or by oracle tuning). This detail is load-bearing for the central claim that the method outperforms SOTA without dataset-specific adjustments.
  2. §3.2 (DPP criterion): The vigilance threshold is computed directly from the input data via the DPP kernel and sampling procedure. The manuscript must demonstrate that this estimation rule produces appropriate values on held-out distributions without hidden adjustments; otherwise the 'parameter-free' and generalization claims reduce to data-dependent fitting on the reported sets.
minor comments (2)
  1. §3.2: Notation for the DPP kernel matrix and the exact sampling procedure should be expanded with a short pseudocode block for clarity.
  2. Figure captions for the continual-learning experiments should explicitly state the sequence of data distributions presented to the algorithm.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: §4 (Experimental Results): The superiority claims rest on comparisons whose evaluation protocol is not fully specified (number of independent runs, error bars or statistical tests, and whether baseline parameters were set by the same DPP/age rules or by oracle tuning). This detail is load-bearing for the central claim that the method outperforms SOTA without dataset-specific adjustments.

    Authors: We agree that the evaluation protocol was insufficiently detailed. In the revised manuscript we will specify that all results are averaged over 10 independent runs with different random seeds, include standard-deviation error bars in the tables, and report statistical significance via the Wilcoxon signed-rank test. Baseline parameters were taken from the values recommended in each method’s original publication or from standard defaults; they were not tuned with our DPP or age-based rules. These clarifications will be added to §4.
    Revision: yes

  2. Referee: §3.2 (DPP criterion): The vigilance threshold is computed directly from the input data via the DPP kernel and sampling procedure. The manuscript must demonstrate that this estimation rule produces appropriate values on held-out distributions without hidden adjustments; otherwise the 'parameter-free' and generalization claims reduce to data-dependent fitting on the reported sets.

    Authors: The DPP criterion is deliberately data-driven: it constructs the kernel matrix and performs sampling solely from the observed input points, introducing no additional tunable parameters or hidden adjustments. This design directly supports the parameter-free claim for continual-learning scenarios. While we did not conduct separate held-out experiments that isolate only the threshold estimator, the consistent performance across synthetic and real-world datasets already provides empirical support for its robustness. We will add a clarifying paragraph in §3.2 that reiterates the absence of hidden adjustments and notes the data-driven nature of the rule.
    Revision: partial
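
The protocol described in the first response (per-method mean and standard deviation over 10 seeds, plus a Wilcoxon signed-rank test on paired results) can be sketched with the standard library alone. The NMI values below are made-up placeholders, not results from the paper, and the function is a hypothetical exact-enumeration version; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
import itertools
import statistics

def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped; tied |differences| share average ranks.
    Returns (W, p) with W = min(sum of positive ranks, sum of negative ranks).
    Enumerates all 2^n sign patterns, so keep n small (about 20 or fewer).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0  # average rank of the tie group
        start = end + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis each rank is positive with probability 1/2.
    hits = 0
    for signs in itertools.product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            hits += 1
    return w, hits / 2 ** len(ranks)

# Made-up per-seed NMI scores for two hypothetical methods (10 runs each).
nmi_a = [0.82, 0.85, 0.80, 0.88, 0.84, 0.86, 0.83, 0.87, 0.81, 0.85]
nmi_b = [0.78, 0.80, 0.79, 0.81, 0.77, 0.83, 0.80, 0.79, 0.80, 0.76]

summary = {name: (statistics.mean(v), statistics.stdev(v))
           for name, v in (("A", nmi_a), ("B", nmi_b))}
w, p = wilcoxon_signed_rank(nmi_a, nmi_b)
```

Because the test is paired, both methods must be run on the same seeds or data splits. With only 10 runs, the smallest two-sided p-value the exact test can return is 2/1024, which bounds how strong a significance claim the protocol can support.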

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external data-driven estimation and empirical validation

full rationale

The provided sections describe estimation of the vigilance threshold via a determinantal point process criterion and edge deletion via age rules, followed by experimental comparisons on synthetic and real-world datasets against SOTA methods. No equation or passage in the abstract or description reduces the performance claims to the inputs by construction (e.g., no fitted parameter renamed as a prediction, no self-citation carrying the weight of the uniqueness claim for the DPP or age rules, no ansatz smuggled in via the authors' prior work). The method is explicitly data-dependent for threshold setting, but this is presented as the mechanism for being parameter-free rather than as a circular derivation. The claims rest on reported experimental outcomes, which are falsifiable against external benchmarks and do not exhibit the enumerated circular patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that the chosen estimation procedures yield generally valid thresholds; no explicit free parameters are introduced because the method is presented as parameter-free, but the DPP criterion itself encodes modeling choices about point diversity.

axioms (2)
  • [domain assumption] A determinantal point process criterion supplies an appropriate similarity threshold for ART node learning across datasets.
    This is the load-bearing step that replaces manual vigilance parameter selection.
  • [domain assumption] Edge age provides a sufficient signal for deciding when to delete connections in the topological graph.
    This replaces a manual edge deletion threshold.


