Measuring Prediction Uncertainty in Neural Cellular Automata
Pith reviewed 2026-07-01 16:21 UTC · model grok-4.3
The pith
Resilience measures uncertainty in neural cellular automata by testing prediction stability under small state perturbations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Resilience works by probing the iterative NCA dynamics: after convergence to a candidate segmentation, the automaton state receives small random perturbations and is allowed to iterate again; the degree of return to the original output quantifies how strongly the prediction sits in a stable attractor, thereby serving as an uncertainty score that requires no model changes.
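The probe described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `resilience`, the Gaussian noise model, the defaults `n_steps`, `n_trials`, and `sigma`, the 0.5 decoding threshold, and the use of per-cell agreement as the score are all hypothetical choices. `toy_step` is a hand-written bistable rule standing in for a learned NCA update.

```python
import random


def toy_step(state):
    """Hypothetical stand-in for an NCA update: push each cell away from 0.5, clip to [0, 1]."""
    return [min(1.0, max(0.0, v + 0.5 * (v - 0.5))) for v in state]


def resilience(step, state, n_steps=20, n_trials=8, sigma=0.05, seed=0):
    """Mean per-cell agreement between the converged output and perturb-and-rerun outputs."""
    rng = random.Random(seed)

    def run(s):
        # Iterate the update rule a fixed number of times.
        for _ in range(n_steps):
            s = step(s)
        return s

    def decode(s):
        # Threshold the continuous state into a binary mask (assumed decoding).
        return [1 if v > 0.5 else 0 for v in s]

    final = run(state)
    reference = decode(final)
    scores = []
    for _ in range(n_trials):
        # Perturb the converged state with small Gaussian noise, then iterate again.
        noisy = [v + rng.gauss(0.0, sigma) for v in final]
        rerun = decode(run(noisy))
        agree = sum(a == b for a, b in zip(reference, rerun)) / len(reference)
        scores.append(agree)
    return sum(scores) / len(scores)


# A state that settles into the attractors at 0 and 1 returns after perturbation.
print(resilience(toy_step, [0.9, 0.1, 0.8, 0.2]))  # 1.0
# A state resting on the unstable point 0.5 is pushed either way by the noise.
print(resilience(toy_step, [0.5, 0.5, 0.5, 0.5]))
```

A Dice overlap between the reference and rerun masks could replace per-cell agreement; the choice here is only for brevity.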
What carries the argument
Resilience, the consistency of the converged NCA state when the automaton is restarted from slightly perturbed versions of its final configuration.
If this is right
- Resilience can be computed at inference time to withhold or flag unreliable segmentations in clinical workflows.
- The method applies to any pretrained NCA without requiring ensemble training or architectural modifications.
- Used as an uncertainty score, resilience improves selective-prediction performance on metrics such as ΔDice at 90 percent coverage and AURC.
- The same stability signal improves ranking of uncertain cases on AUROC and AUPRC relative to baseline uncertainty estimators.
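The selective-prediction metrics named in the list above can be computed from per-case confidence scores and Dice values. The following is a minimal sketch, assuming that risk is defined as 1 − Dice and that ΔDice@90 is the mean Dice of the 90 percent most-confident cases minus the mean Dice of all cases. The paper's exact definitions are not given here, and the function names are illustrative.

```python
def _by_confidence(confidence):
    """Indices sorted from most to least confident."""
    return sorted(range(len(confidence)), key=lambda i: -confidence[i])


def risk_coverage(confidence, dice):
    """risks[k-1] is the mean risk (1 - Dice) when only the k most confident cases are kept."""
    risks, total = [], 0.0
    for k, i in enumerate(_by_confidence(confidence), start=1):
        total += 1.0 - dice[i]
        risks.append(total / k)
    return risks


def aurc(confidence, dice):
    """Area under the risk-coverage curve, as the mean risk over all coverage levels."""
    risks = risk_coverage(confidence, dice)
    return sum(risks) / len(risks)


def delta_dice_at(confidence, dice, coverage=0.9):
    """Gain in mean Dice from keeping only the most confident fraction of cases."""
    keep = max(1, int(round(coverage * len(dice))))
    kept = [dice[i] for i in _by_confidence(confidence)[:keep]]
    return sum(kept) / len(kept) - sum(dice) / len(dice)


# Ten hypothetical cases: nine good segmentations and one failure.
dice = [1.0] * 9 + [0.0]
good_conf = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]  # failure ranked last
bad_conf = list(reversed(good_conf))                             # failure ranked first
print(aurc(good_conf, dice), aurc(bad_conf, dice))
print(delta_dice_at(good_conf, dice))
```

A lower AURC and a higher ΔDice@90 both indicate that the confidence score places failures at the low-confidence end.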
Where Pith is reading between the lines
- The same perturbation-stability idea could be tested on other iterative refinement architectures that converge to fixed points.
- If resilience correlates with attractor strength, it may offer a route to uncertainty quantification in any dynamical system whose iterations are cheap to rerun.
- Combining resilience with existing uncertainty techniques might produce hybrid detectors that catch both model-intrinsic instability and data-shift failures.
Load-bearing premise
Stability of the NCA output under small perturbations of its final state corresponds to confident, trustworthy predictions.
What would settle it
The claim would be falsified by a dataset of NCA segmentations with known ground-truth quality on which resilience scores correlate with actual Dice scores no better than random scores would.
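One way to run such a check is to compare the rank correlation between resilience and Dice against a shuffled-label null. The sketch below is an assumption about how the test could be set up, not a procedure taken from the paper: Spearman correlation, the permutation count `n_perm`, and all function names are illustrative choices.

```python
import random


def ranks(values):
    """Average ranks (1-based), with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """One-sided p-value: how often shuffled Dice values correlate at least as well."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(x, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-case values; real inputs would come from a labeled test set.
resilience_scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.50, 0.40, 0.30]
dice_scores = [0.92, 0.88, 0.86, 0.75, 0.60, 0.55, 0.35, 0.20]
print(permutation_p_value(resilience_scores, dice_scores))
```

A p-value that stays large across datasets would be evidence against the load-bearing premise.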
Original abstract
Neural cellular automata (NCA) provide a lightweight alternative to encoder-decoder segmentation networks. However, it can be difficult to decide when a prediction should be trusted. Here, we study uncertainty estimation for NCA-based medical image segmentation without modifying the underlying architecture or retraining the model. Our approach is motivated by viewing the NCA as a dynamical system where convergent attractors correspond to confident predictions. Concretely, we propose resilience, a simple measure that leverages the intrinsic iterative structure of NCAs by probing the stability of the final prediction under small perturbations of the automaton state. Predictions that return to the same solution are deemed confident, while those that change substantially are flagged as uncertain. We evaluate uncertainty by its ability to predict segmentation quality using selective prediction metrics ($\Delta$Dice@90 and AURC) and ranking metrics (AUROC and AUPRC). Across multiple medical segmentation benchmarks, resilience identifies failure cases more reliably than baselines, improving trust and safety in NCA-based models.
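The ranking metrics mentioned in the abstract treat failure detection as a binary classification problem. The following is a minimal sketch of AUROC under assumptions that are not from the paper: a case counts as a failure when its Dice falls below a hypothetical threshold of 0.5, and uncertainty is taken to be 1 − resilience.

```python
def auroc(scores, labels):
    """Probability that a positive case scores higher than a negative one (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def failure_auroc(resilience_scores, dice_scores, threshold=0.5):
    """AUROC of uncertainty (1 - resilience) for detecting cases with Dice below threshold."""
    uncertainty = [1.0 - r for r in resilience_scores]
    failures = [d < threshold for d in dice_scores]
    return auroc(uncertainty, failures)


# Hypothetical values: the two low-Dice cases also have the lowest resilience.
print(failure_auroc([0.9, 0.8, 0.3, 0.2], [0.85, 0.90, 0.40, 0.10]))  # 1.0
```

The function assumes at least one failure and one success are present; otherwise the AUROC is undefined.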
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes 'resilience' as an uncertainty measure for Neural Cellular Automata (NCA) applied to medical image segmentation. Motivated by a dynamical-systems view in which convergent attractors indicate confident predictions, resilience probes stability of the final state under small perturbations without altering the NCA architecture or retraining. Performance is assessed via selective-prediction metrics (ΔDice@90, AURC) and ranking metrics (AUROC, AUPRC) on multiple benchmarks, with the claim that resilience identifies failure cases more reliably than baselines.
Significance. If the core assumption holds, the method offers a lightweight, training-free route to uncertainty quantification that exploits the iterative structure already present in NCA models. Evaluation across several medical segmentation benchmarks using standard selective-prediction and ranking metrics is a positive feature; the absence of fitted parameters or invented auxiliary networks is also a strength.
major comments (2)
- [Abstract / motivation section] The claim that 'convergent attractors correspond to confident predictions' is asserted as motivation but is neither derived from the learned NCA update rule nor supported by an ablation demonstrating invariance to perturbation magnitude, or demonstrating that stability coincides with correct outputs rather than training artifacts.
- [Evaluation section] The manuscript reports ΔDice@90 and AURC gains but gives too little detail on the precise baselines, the implementation of the perturbation test, and statistical significance testing to assess whether the observed improvements can be attributed to resilience.
minor comments (2)
- [Method] Clarify the exact operational definition of resilience (perturbation distribution, number of steps, convergence criterion) so that the measure can be reproduced from the text alone.
- [Discussion] Add a short discussion of how resilience behaves under changes in NCA iteration count or grid resolution, as these are intrinsic to the model class.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and indicate the planned revisions.
- Referee: [Abstract / motivation section] The claim that 'convergent attractors correspond to confident predictions' is asserted as motivation but is neither derived from the learned NCA update rule nor supported by an ablation demonstrating invariance to perturbation magnitude, or demonstrating that stability coincides with correct outputs rather than training artifacts.
  Authors: The motivation draws on a high-level dynamical-systems interpretation of NCA convergence rather than a direct derivation from the specific learned update rule. We agree that the manuscript would benefit from additional support. In the revised version we will add a concise theoretical paragraph linking fixed-point stability of the NCA iteration to prediction confidence. We will also include an ablation study examining invariance to perturbation magnitude, together with an analysis of whether stability correlates with segmentation correctness or with training artifacts.
  Revision: yes
- Referee: [Evaluation section] The manuscript reports ΔDice@90 and AURC gains but gives too little detail on the precise baselines, the implementation of the perturbation test, and statistical significance testing to assess whether the observed improvements can be attributed to resilience.
  Authors: We acknowledge that the current manuscript lacks sufficient implementation detail. The revised manuscript will expand the evaluation section to specify the exact baseline implementations and the perturbation procedure (including magnitude, number of trials, and state-update steps), and will report statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) on the reported metric improvements.
  Revision: yes
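A paired comparison of per-case metric values can be sketched without external libraries. Note that the code below uses a sign-flip permutation test, which is a different nonparametric paired test from the t-test and Wilcoxon signed-rank test named in the response; it is shown only because it fits in a few lines. The function name, the permutation count, and the example values are hypothetical.

```python
import random


def paired_sign_flip_test(a, b, n_perm=5000, seed=0):
    """Two-sided p-value for the mean paired difference under random sign flips."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-case scores for two uncertainty methods on the same twelve cases.
method_a = [0.9] * 12
method_b = [0.8] * 12
print(paired_sign_flip_test(method_a, method_b))
```

With real data, `a` and `b` would hold one value per test case (or per dataset split) for the two methods being compared.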
Circularity Check
No circularity; resilience defined directly from NCA iterative rule
The paper defines resilience explicitly from the NCA's built-in iterative update rule and a perturbation test on the automaton state. This construction uses only the model's existing dynamics and does not reduce to any fitted parameter, self-citation chain, or ansatz smuggled from prior work. The dynamical-system motivation is presented as interpretive framing rather than a derived theorem, and selective-prediction metrics are evaluated empirically on external benchmarks. No load-bearing step collapses to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: NCA can be viewed as a dynamical system where convergent attractors correspond to confident predictions
invented entities (1)
- resilience: no independent evidence