
arxiv: 2606.23639 · v1 · pith:E7KYQ5YOnew · submitted 2026-06-22 · ❄️ cond-mat.soft

Application of Machine Learning for the Identification of 2D Colloidal Assemblies: A Case Study on Particles of Distinct Shapes

Pith reviewed 2026-06-26 06:17 UTC · model grok-4.3

classification ❄️ cond-mat.soft
keywords: colloidal assemblies · machine learning · object detection · synthetic datasets · experimental images · particle shapes · YOLO model

The pith

Machine learning models trained only on synthetic images produce 43 percent average error when identifying real colloidal particle assemblies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether an object detection model can classify colloidal monolayer assemblies from images of particles with different shapes. Separate models are trained on synthetic images for spheres, ellipsoids, cuboids, and rods to label configurations as isolated particles, dimers, chains, clusters, or loops. Recognition succeeds almost perfectly on the artificial test images yet shows a large drop on laboratory images, averaging 43.1 percent error with clear dependence on particle geometry. The outcome demonstrates that synthetic data alone cannot support reliable recognition in experimental settings.

Core claim

Models trained exclusively on synthetic datasets achieve near-perfect recognition on artificial images but exhibit an average error of 43.1 percent when applied to experimental images of colloidal assemblies, with errors ranging from 20 percent for spheres to 58.5 percent for cuboids.
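
As a quick consistency check on these figures, the sketch below works out what the two unquoted shapes (ellipsoids and rods) would have to average. It assumes the 43.1 percent figure is an unweighted mean over the four shapes, which the summary does not state; under that assumption the remaining two shapes come out at roughly 47 percent on average, inside the quoted 20 to 58.5 percent range.

```python
# Assumption: the reported average is an unweighted mean over the four shapes.
N_SHAPES = 4
reported_mean = 43.1                          # percent, from the abstract
known = {"spheres": 20.0, "cuboids": 58.5}    # percent, from the abstract

# Total implied by the mean, minus the two values that are quoted explicitly.
remaining_sum = N_SHAPES * reported_mean - sum(known.values())
remaining_mean = remaining_sum / (N_SHAPES - len(known))

print(f"ellipsoids + rods sum to {remaining_sum:.1f} percent")
print(f"implied mean for ellipsoids and rods: {remaining_mean:.2f} percent")
```

If the average were instead weighted by the number of evaluated images per shape, this back-calculation would not hold.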

What carries the argument

The YOLO object detection model trained on shape-specific synthetic datasets to classify colloidal configurations into isolated particles, dimers, chains, clusters, and loops.
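
For readers unfamiliar with YOLO-style annotation, the sketch below writes one labeled assembly as a line in the common YOLO text label format: a class index followed by the box center and size, all normalized by the image dimensions. The class order, the helper name `to_yolo_label`, and the example box are hypothetical; the paper's own label files may be organized differently.

```python
# Hypothetical class order; the paper's actual index mapping is not given here.
CLASSES = ["isolated", "dimer", "chain", "cluster", "loop"]


def to_yolo_label(cls_name, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w   # normalized box center, x
    y_c = (y_min + y_max) / 2 / img_h   # normalized box center, y
    w = (x_max - x_min) / img_w         # normalized box width
    h = (y_max - y_min) / img_h         # normalized box height
    return f"{CLASSES.index(cls_name)} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Illustrative box, not taken from the paper's datasets.
line = to_yolo_label("dimer", (100, 200, 300, 400), img_w=640, img_h=640)
print(line)
```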

If this is right

  • Recognition performance degrades significantly when moving from synthetic to experimental images.
  • Error rates depend on particle geometry, lowest for spheres and highest for cuboids.
  • Preparing datasets based on experimental images is required to improve prediction accuracy.
  • The trained models and synthetic datasets are released for public use in an information system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Domain adaptation methods that align synthetic and real image distributions could reduce the observed transfer gap.
  • Particle shape may serve as a predictor for how much real data is needed to reach usable accuracy.
  • The released models provide a baseline for testing whether mixed synthetic-real training sets close the performance difference.

Load-bearing premise

Models trained exclusively on synthetic datasets can feasibly recognize configurations in real experimental images.

What would settle it

Retraining the models on datasets that include experimental images and measuring the resulting error rate on a held-out set of experimental images; an average error remaining near 43 percent would confirm the reported limitation of synthetic-only training.
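
A minimal sketch of the bookkeeping behind such a test is shown below: set aside a held-out portion of the experimental images, then compare predicted and annotated classes on it. The paper does not define its error metric, so `error_rate` here (fraction of assemblies whose predicted class differs from the annotation) is an assumption, and the file names and labels are made up for illustration.

```python
import random


def split_held_out(items, test_fraction=0.2, seed=0):
    """Shuffle items with a fixed seed and return (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def error_rate(true_labels, predicted_labels):
    """Fraction of assemblies whose predicted class differs from the annotation."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


# Hypothetical experimental image names.
images = [f"exp_{i:03d}.png" for i in range(10)]
train, test = split_held_out(images)

# Hypothetical annotations and predictions for four held-out assemblies.
truth = ["chain", "loop", "dimer", "cluster"]
preds = ["chain", "cluster", "dimer", "cluster"]
print(f"held-out images: {len(test)}, error rate: {error_rate(truth, preds):.2f}")
```

Fixing the seed keeps the held-out set stable across retraining runs, so error rates before and after adding experimental training data remain comparable.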

Figures

Figures reproduced from arXiv: 2606.23639 by K. S. Kolegov, L. T. Khusainova, S. A. Kolegova.

Figure 1: Possible colloidal assemblies: a) isolated particles, b) dimers, c) chains, d) …
Figure 2: Confusion matrices for (a) cuboids, (b) rods, (c) ellipsoids, and (d) spheres.
Figure 3: Loss curve for spheres.
Figure 4: Loss curve for ellipsoids.
Surrounding text: Attention should be paid to the loss curve for cuboids. When analyzing the loss metrics on the validation dataset, signs of overfitting are noticeable: after reaching a minimum at early stages, the value begins to fluctuate and even increases slightly as the number of epochs grows. This indicates that the model begins to overfit to the training data at the expense of generalizatio…
Figure 5: Loss curve for rods.
Figure 6: Loss curve for cuboids.
Figure 7: Loss curve for cuboids at 35 epochs.
Surrounding text: All four models (for different particle shapes) demonstrated approximately the same level of recognition accuracy, 97–99% (on synthetic data). A preliminary analysis of the models' performance conducted on test sets shows a high percentage of assembly identification, which confirms the stated accuracy ( …
Figure 8: Identification results on the test set: a) spherical particles; b) ellipsoidal particles; …
Figure 9: Over-identification: (a) spheres (© Busch et al. [16], ACS AuthorChoice) and (b) rods (© 2023 Wiley-VCH GmbH [17]).
Surrounding text: The omission of target assemblies ( …
Figure 10: Omission of target assemblies: (a) spheres ( …
Figure 11: Misclassification of an assembly (© 2006 American Chemical Society) [19].
Figure 12: Probability overlap (© 2010 The Royal Society of Chemistry) [20].
Surrounding text: The problem of truncation of particle fragments ( …
Figure 13: Particle truncation (© 2006 American Chemical Society) [19].
Surrounding text:
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{2}$$
where FN is the number of false negative objects (it is incorrectly asserted that the object does not belong to the class). However, for the case of probability overlap, these formulas are not applicable, since they are not intended for soft classification tasks. Given the detected identification problems, these metrics should be calc…
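
The recall in Eq. (2) can be computed directly from detection counts. The sketch below pairs it with the standard precision, TP / (TP + FP), which is assumed here to be the Eq. (1) that the excerpt does not show. The counts are illustrative and not taken from the paper, and these hard-count metrics do not cover the probability-overlap case noted above.

```python
def precision(tp, fp):
    """Share of reported detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of true objects that were found: TP / (TP + FN), as in Eq. (2)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative counts for one assembly class (not from the paper).
tp, fp, fn = 8, 2, 8
print(f"precision = {precision(tp, fp):.2f}, recall = {recall(tp, fn):.2f}")
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; some toolkits report such cases as undefined instead.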
Original abstract

This work addresses the problem of identifying colloidal monolayer assemblies using particles of various shapes (two-dimensional coatings): spheres, ellipsoids, cuboids, and rods. The following classification of assemblies is considered: isolated particles, dimers, chains, clusters, and loops. The YOLO model was chosen as the identification method. Synthetic datasets were prepared for each of the four particle shapes to train the models. The paper discusses the application of models trained on synthetic data to experimental images. An analysis was carried out on the feasibility of using such models for recognizing configurations in real images. While recognition on artificial images is nearly perfect, tests on experimental images showed a significant deviation. The average error across all particle types was 43.1%, but a considerable spread in values is observed: from 20% for spheres to 58.5% for cuboids, indicating the algorithm's selective sensitivity to object geometry. The created datasets and trained models are freely available for use. The corresponding modules have been integrated into the previously developed information system (https://isanm.space/). To further improve prediction results, it is necessary to prepare datasets based on experimental images.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript applies YOLO models to identify 2D colloidal monolayer assemblies (isolated particles, dimers, chains, clusters, loops) for four particle shapes (spheres, ellipsoids, cuboids, rods). Models are trained solely on synthetic image datasets generated for each shape. The central result is that recognition is nearly perfect on held-out synthetic test images, but application to experimental images produces an average error of 43.1% (ranging from 20% for spheres to 58.5% for cuboids). The authors interpret this as evidence of selective sensitivity to object geometry, conclude that experimental training data are required, and release the synthetic datasets, trained models, and integrated modules for an existing information system (https://isanm.space/).

Significance. If the reported performance gap is substantiated, the work supplies a concrete, quantitative demonstration of the synthetic-to-real domain shift in colloidal image analysis. This is useful for the soft-matter community because it directly motivates the collection of labeled experimental datasets rather than relying on simulation alone. The open release of datasets and models, together with integration into an existing platform, is a clear strength that supports reproducibility and extension by others.

major comments (1)
  1. [Abstract] Abstract and results sections: The error rates on experimental images (average 43.1%, 20% spheres to 58.5% cuboids) are the load-bearing evidence for the central claim of significant deviation and geometry-dependent sensitivity. The manuscript does not define the error metric (e.g., per-particle classification error, assembly-level mismatch, or detection IoU threshold), the number of experimental images or particles evaluated, or any baseline comparison (e.g., to human annotation accuracy or conventional image-processing methods). These omissions leave the quantitative claim defensible but not fully substantiated.
minor comments (1)
  1. [Methods] The description of the five assembly classes (isolated particles, dimers, chains, clusters, loops) would benefit from a supplementary figure showing representative labeled examples for each class and shape to improve clarity and reproducibility.
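
To make the referee's point about a detection IoU threshold concrete, the sketch below computes intersection over union for two axis-aligned boxes. In object detection a prediction is commonly counted as matching an annotation only when this value exceeds a chosen threshold; 0.5 is a frequent choice, but it is an assumption here, since the manuscript reportedly does not state one. The function name and example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


IOU_THRESHOLD = 0.5  # assumed value, not taken from the paper

predicted = (0, 0, 10, 10)
annotated = (5, 5, 15, 15)
score = iou(predicted, annotated)
print(f"IoU = {score:.3f}, match = {score >= IOU_THRESHOLD}")
```

Because the reported error depends on how matches are counted, the same predictions can yield different error rates under different thresholds, which is why the referee asks for the definition.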

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for recognizing the value of our quantitative demonstration of the synthetic-to-real domain shift. We address the single major comment below and will revise the manuscript to address the noted omissions.

Point-by-point responses
  1. Referee: [Abstract] Abstract and results sections: The error rates on experimental images (average 43.1%, 20% spheres to 58.5% cuboids) are the load-bearing evidence for the central claim of significant deviation and geometry-dependent sensitivity. The manuscript does not define the error metric (e.g., per-particle classification error, assembly-level mismatch, or detection IoU threshold), the number of experimental images or particles evaluated, or any baseline comparison (e.g., to human annotation accuracy or conventional image-processing methods). These omissions leave the quantitative claim defensible but not fully substantiated.

    Authors: We agree that the manuscript does not explicitly define the error metric, report the scale of the experimental evaluation, or provide baseline comparisons, and that these details are required to fully substantiate the central quantitative claims. In the revised manuscript we will add, in both the abstract and results sections: a precise definition of the error metric, the number of experimental images and particles evaluated, and comparisons against human annotation accuracy as well as conventional image-processing methods. These additions will be made without altering the reported error values.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical case study reporting measured recognition performance of YOLO models. Synthetic datasets are generated to train the models; held-out synthetic test images yield near-perfect accuracy while experimental images yield 20–58.5 % error (average 43.1 %). These numbers are direct empirical measurements on separate image sets; no equations, fitted parameters, derivations, or self-citation chains are invoked to obtain the central results. The recommendation to collect experimental training data follows directly from the observed gap rather than from any internal reduction to the training inputs. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or physical postulates are present. The work is an empirical machine-learning application study.

pith-pipeline@v0.9.1-grok · 5752 in / 1076 out tokens · 29327 ms · 2026-06-26T06:17:55.982526+00:00 · methodology

Reference graph

Works this paper leans on

22 extracted references · 17 canonical work pages

  1. [1]

    V. Lotito, T. Zambelli, Approaches to self-assembly of colloidal monolayers: A guide for nanotechnologists, Advances in Colloid and Interface Science 246 (2017) 217–274. doi:10.1016/j.cis.2017.04.003

  2. [2]

    M. A. Klatt, J. Lovrić, D. Chen, S. C. Kapfer, F. M. Schaller, P. W. A. Schönhöfer, B. S. Gardiner, A.-S. Smith, G. E. Schröder-Turk, S. Torquato, Universal hidden order in amorphous cellular geometries, Nature Communications 10 (1) (Feb. 2019). doi:10.1038/s41467-019-08360-5

  3. [3]

    A. Pal, A. Gope, Texture identification in liquid crystal-protein droplets using evaporative drying, generalized additive modeling, and K-means clustering, The European Physical Journal E 47 (5) (May 2024). doi:10.1140/epje/s10189-024-00429-4

  4. [4]

    A. Pal, A. Gope, M. Yanagisawa, From droplet to diagnosis: spatio-temporal pattern recognition in drying biofluids, Advanced Intelligent Systems 8 (2) (Nov. 2025). doi:10.1002/aisy.202500550

  5. [5]

    W. Mickel, S. C. Kapfer, G. E. Schröder-Turk, K. Mecke, Shortcomings of the bond orientational order parameters for the analysis of disordered particulate matter, The Journal of Chemical Physics 138 (4) (Jan. 2013). doi:10.1063/1.4774084

  6. [6]

    D. Sukhoverkhova, V. Mozolenko, L. Shchur, Phase probabilities in first-order transitions using machine learning, Physical Review E 112 (4) (Oct. 2025). doi:10.1103/h9cg-cc4r

  7. [7]

    A. W. Long, A. L. Ferguson, Nonlinear machine learning of patchy colloid self-assembly pathways and mechanisms, The Journal of Physical Chemistry B 118 (15) (2014) 4228–4244. doi:10.1021/jp500350b

  8. [8]

    H. Carstensen, V. Kapaklis, M. Wolff, Statistical analysis of phase formation in 2D colloidal systems, The European Physical Journal E 41 (1) (Jan. 2018). doi:10.1140/epje/i2018-11615-x

  9. [9]

    J. M. Newby, A. M. Schaefer, P. T. Lee, M. G. Forest, S. K. Lai, Convolutional neural networks automate detection for tracking of submicron-scale particles in 2D and 3D, Proceedings of the National Academy of Sciences 115 (36) (2018) 9026–9031. doi:10.1073/pnas.1804420115

  10. [10]

    E. Boattini, M. Dijkstra, L. Filion, Unsupervised learning for local structure detection in colloidal systems, The Journal of Chemical Physics 151 (15) (Oct. 2019). doi:10.1063/1.5118867

  11. [11]

    V. Lotito, T. Zambelli, A journey through the landscapes of small particles in binary colloidal assemblies: Unveiling structural transitions from isolated particles to clusters upon variation in composition, Nanomaterials 9 (7) (2019) 921. doi:10.3390/nano9070921

  12. [12]

    Y. Shi, L. Liu, J. Huang, J. Xiong, S. Zhong, G. Zhu, X. Li, Z. He, T. Pan, H. Xin, B. Li, Adaptive opto-thermal-hydrodynamic manipulation and polymerization (AOTHMAP) for 4D colloidal patterning, Advanced Materials 36 (52) (Nov. 2024). doi:10.1002/adma.202412895

  13. [13]

    L. T. Khusainova, K. S. Kolegov, Identification of 2D colloidal assemblies in images: a threshold processing method versus machine learning, The European Physical Journal E 49 (3) (Feb. 2026). doi:10.1140/epje/s10189-026-00560-4

  14. [14]

    I. Roboflow, Roboflow: Computer vision tools for developers and enterprises, https://roboflow.com, accessed on 15 January 2024 (2025)

  15. [15]

    L. T. Khusainova, S. A. Kolegova, K. S. Kolegov, Colloidal cluster analysis, GPL-3.0 licence (2026). URL https://github.com/prelydia/colloidal-cluster-analysis-of-different-shapes.git

  16. [16]

    R. T. Busch, F. Karim, J. Weis, Y. Sun, C. Zhao, E. S. Vasquez, Optimization and structural stability of gold nanoparticle–antibody bioconjugates, ACS Omega 4 (12) (2019) 15269–15279. doi:10.1021/acsomega.9b02276

  17. [17]

    M. Li, J. Guo, C. Zhang, Y. Che, Y. Yi, B. Liu, Uniform colloidal polymer rods by stabilizer-assisted liquid-crystallization-driven self-assembly, Angewandte Chemie International Edition 62 (49) (Oct. 2023). doi:10.1002/anie.202309914

  18. [18]

    M. Rosenberg, F. Dekker, J. G. Donaldson, A. P. Philipse, S. S. Kantorovich, Self-assembly of charged colloidal cubes, Soft Matter 16 (18) (2020) 4451–4461. doi:10.1039/c9sm02189b

  19. [19]

    S. Sacanna, L. Rossi, B. W. M. Kuipers, A. P. Philipse, Fluorescent monodisperse silica ellipsoids for optical rotational diffusion studies, Langmuir 22 (4) (2006) 1822–1827. doi:10.1021/la052484o

  20. [20]

    L. Li, D. Qin, X. Yang, G. Liu, Synthesis of ellipsoidal hematite/polymer/titania hybrid materials and the corresponding hollow ellipsoidal particles, Polym. Chem. 1 (3) (2010) 289–295. doi:10.1039/b9py00230h

  21. [21]

    P. Fränti, R. Mariescu-Istodor, Soft precision and recall, Pattern Recognition Letters 167 (2023) 115–121. doi:10.1016/j.patrec.2023.02.005

  22. [22]

    L. T. Khusainova, K. S. Kolegov, Information system for analysis of nanostructure morphology: education and research, International Journal of Information Technology (Jul. 2025). doi:10.1007/s41870-025-02658-y