Measuring Prediction Uncertainty in Neural Cellular Automata

Ario Sadafi; Carsten Marr; Michael Deutges; Nassir Navab

arxiv: 2605.26726 · v1 · pith:BTFE4FPFnew · submitted 2026-05-26 · eess.IV · cs.AI · cs.CV

Ario Sadafi, Michael Deutges, Nassir Navab, Carsten Marr

Pith reviewed 2026-07-01 16:21 UTC · model grok-4.3

classification eess.IV · cs.AI · cs.CV

keywords neural cellular automata · uncertainty estimation · medical image segmentation · prediction stability · selective prediction · dynamical systems · resilience measure

The pith

Resilience measures uncertainty in neural cellular automata by testing prediction stability under small state perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to estimate uncertainty for NCA segmentation models without altering their architecture or retraining. It views the NCA as a dynamical system whose fixed-point attractors represent confident outputs. Resilience checks whether the final state returns to the same segmentation after minor perturbations; stable results count as confident while drifting ones signal uncertainty. This approach is evaluated on selective prediction and ranking metrics across medical image benchmarks, where it identifies poor segmentations more effectively than prior methods.

Core claim

Resilience works by probing the iterative NCA dynamics: after convergence to a candidate segmentation, the automaton state receives small random perturbations and is allowed to iterate again; the degree of return to the original output quantifies how strongly the prediction sits in a stable attractor, thereby serving as an uncertainty score that requires no model changes.
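The perturb-and-recover loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_step` is a hypothetical hand-written update rule standing in for a trained NCA, the state is a 1D list rather than an image grid, and Dice is assumed as the comparison between the original and recovered masks (the source does not spell out the comparison here). The names `sigma`, `relax_steps`, and `trials` are illustrative, not the paper's parameters.

```python
import math
import random


def dice(a, b):
    """Dice overlap between two binary masks given as equal-length 0/1 lists."""
    inter = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * inter / total


def toy_step(state):
    """Hypothetical stand-in for one NCA update (NOT the paper's learned rule).

    Each cell moves halfway toward tanh(3 * mean of itself and its two
    neighbours), with periodic boundaries, so values near +1 or -1 are
    stable and values near 0 are not.
    """
    n = len(state)
    out = []
    for i in range(n):
        mean = (state[i - 1] + state[i] + state[(i + 1) % n]) / 3.0
        out.append(state[i] + 0.5 * (math.tanh(3.0 * mean) - state[i]))
    return out


def run(state, steps, step_fn=toy_step):
    """Iterate the update rule for a fixed number of steps."""
    for _ in range(steps):
        state = step_fn(state)
    return state


def to_mask(state):
    """Threshold the state into a binary segmentation mask."""
    return [1 if v > 0.0 else 0 for v in state]


def resilience_uncertainty(final_state, sigma=0.1, relax_steps=20, trials=8,
                           step_fn=toy_step, seed=0):
    """Return 1 - mean Dice between the original mask and the masks recovered
    after Gaussian perturbation of the final state (0.0 means fully stable)."""
    rng = random.Random(seed)
    base = to_mask(final_state)
    scores = []
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, sigma) for v in final_state]
        recovered = to_mask(run(noisy, relax_steps, step_fn))
        scores.append(dice(base, recovered))
    return 1.0 - sum(scores) / len(scores)
```

A state that has settled far from the decision threshold should score close to 0, while a state that sits near the threshold tends to be pushed into a different mask by the noise and scores higher. With a real model, `step_fn` would be the trained NCA's update and the mask would come from its output channel.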

What carries the argument

Resilience, the consistency of the converged NCA state when the automaton is restarted from slightly perturbed versions of its final configuration.

If this is right

Resilience can be computed at inference time to withhold or flag unreliable segmentations in clinical workflows.
The method applies to any pretrained NCA without requiring ensemble training or architectural modifications.
Ranking predictions by resilience should improve selective-prediction performance on metrics such as ΔDice at 90 percent coverage and AURC.
The same stability signal improves ranking of uncertain cases on AUROC and AUPRC relative to baseline uncertainty estimators.
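The selective-prediction metrics named in this list can be made concrete with a small sketch. The definitions below are assumptions rather than the paper's exact formulas: risk is taken as 1 - Dice, AURC is approximated as the mean risk over all coverage levels, and ΔDice at a given coverage is taken as the mean Dice of the retained cases minus the mean Dice of all cases.

```python
def confidence_order(uncertainty):
    """Indices sorted from most confident (lowest uncertainty) to least."""
    return sorted(range(len(uncertainty)), key=lambda i: uncertainty[i])


def risk_coverage_curve(uncertainty, dice_scores):
    """For each coverage level k/n, the mean risk (1 - Dice) of the k most
    confident cases."""
    order = confidence_order(uncertainty)
    coverages, risks, running = [], [], 0.0
    for k, i in enumerate(order, start=1):
        running += 1.0 - dice_scores[i]
        coverages.append(k / len(order))
        risks.append(running / k)
    return coverages, risks


def aurc(uncertainty, dice_scores):
    """Area under the risk-coverage curve, approximated as the mean risk
    over all coverage levels (lower is better)."""
    _, risks = risk_coverage_curve(uncertainty, dice_scores)
    return sum(risks) / len(risks)


def delta_dice_at(uncertainty, dice_scores, coverage=0.9):
    """ASSUMED definition: mean Dice of the retained fraction minus mean Dice
    of all cases (higher is better)."""
    order = confidence_order(uncertainty)
    keep = max(1, round(coverage * len(order)))
    retained = [dice_scores[i] for i in order[:keep]]
    return sum(retained) / len(retained) - sum(dice_scores) / len(dice_scores)
```

For example, with nine cases at Dice 0.9 and one failure at Dice 0.0, an uncertainty score that ranks the failure last gives a ΔDice at 90 percent coverage of about 0.09, because the single rejected case is exactly the failure.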

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same perturbation-stability idea could be tested on other iterative refinement architectures that converge to fixed points.
If resilience correlates with attractor strength, it may offer a route to uncertainty quantification in any dynamical system whose iterations are cheap to rerun.
Combining resilience with existing uncertainty techniques might produce hybrid detectors that catch both model-intrinsic instability and data-shift failures.

Load-bearing premise

Stability of the NCA output under small perturbations of its final state corresponds to confident, trustworthy predictions.

What would settle it

A dataset of NCA segmentations with known ground-truth quality where resilience scores show no better correlation with actual Dice scores than random guessing would falsify the claim.
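One way to run the check described above is a rank correlation with a permutation baseline. The sketch below is an assumption about how such a test could be set up, not a procedure from the paper: it computes the Spearman correlation between per-case uncertainty and Dice, then estimates how often a random shuffling of the Dice scores produces a correlation at least as negative.

```python
import math
import random


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(uncertainty, dice_scores, n_perm=2000, seed=0):
    """One-sided test: how often does shuffling Dice give a correlation at
    least as negative as the observed one? Returns (rho, p_value)."""
    rng = random.Random(seed)
    observed = spearman(uncertainty, dice_scores)
    shuffled = list(dice_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(uncertainty, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A correlation near zero with a large p-value on a dataset of known segmentation quality would be the outcome that counts against the claim; a clearly negative correlation with a small p-value would be consistent with it.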

Figures

Figures reproduced from arXiv: 2605.26726 by Ario Sadafi, Carsten Marr, Michael Deutges, Nassir Navab.

**Figure 1.** Uncertainty quantification via perturb-and-recover in Neural Cellular Automata (NCA). After iterating the NCA for T steps on an input image, a final state S_T and segmentation mask m are obtained. To probe the stability of the learned dynamics, Gaussian noise ϵ is injected and the system is allowed to relax for an additional T′ steps. The resulting prediction m′ is compared to the original mask using the …

**Figure 2.** Our resilience method for uncertainty estimation highlights confident and failed segmentations. For each dataset, one low- and one high-uncertainty example are shown; low uncertainty corresponds to more accurate masks, while high uncertainty captures failures.

Original abstract

Neural cellular automata (NCA) provide a lightweight alternative to encoder-decoder segmentation networks. However, it can be difficult to decide when a prediction should be trusted. Here, we study uncertainty estimation for NCA-based medical image segmentation without modifying the underlying architecture or retraining the model. Our approach is motivated by viewing the NCA as a dynamical system where convergent attractors correspond to confident predictions. Concretely, we propose resilience, a simple measure that leverages the intrinsic iterative structure of NCAs by probing the stability of the final prediction under small perturbations of the automaton state. Predictions that return to the same solution are deemed confident, while those that change substantially are flagged as uncertain. We evaluate uncertainty by its ability to predict segmentation quality using selective prediction metrics ($\Delta$Dice@90 and AURC) and ranking metrics (AUROC and AUPRC). Across multiple medical segmentation benchmarks, resilience identifies failure cases more reliably than baselines, improving trust and safety in NCA-based models.
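The ranking metrics named in the abstract treat failure detection as binary classification: each case is labelled a failure or a success, and the uncertainty score is judged by how well it separates the two. A minimal sketch follows, assuming a failure is any case with Dice below a threshold. The threshold of 0.5 and the function names are hypothetical and not taken from the paper.

```python
def failure_labels(dice_scores, threshold=0.5):
    """Label a case 1 (failure) when its Dice falls below the threshold.
    The threshold is an illustrative assumption."""
    return [1 if d < threshold else 0 for d in dice_scores]


def auroc(scores, labels):
    """Probability that a random failure gets a higher uncertainty score than
    a random success; ties count one half. Needs both classes present."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve. Needs at least one
    failure; tied scores are ranked in input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

An AUROC of 0.5 corresponds to an uncertainty score that ranks failures no better than chance, which is the natural reference point when comparing resilience against baseline estimators.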

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a practical perturbation-based resilience score for uncertainty in NCA medical segmentation without retraining, but the dynamical-systems attractor story is asserted more than shown.

The main takeaway is a simple resilience score: perturb the final NCA state slightly, rerun the iterations, and see whether the output returns to the same segmentation. Stable cases count as confident. This uses the model's built-in iterative structure and needs no architecture changes or extra training.

What the work does is apply this idea to several medical segmentation benchmarks and measure it with the right tools—selective prediction via delta Dice at 90 percent and AURC, plus ranking via AUROC and AUPRC. The claim is that it flags failures better than the baselines they tried. That is a concrete, usable contribution for anyone already running NCAs on imaging data.

The soft spot is the motivation. The abstract presents the method as following from viewing NCAs as dynamical systems whose confident predictions sit at stable attractors. Nothing in the provided text derives this property from the learned update rule, tests invariance to perturbation size, or shows that the stability signal is not just an artifact of how the model was trained. If that link does not hold, the reported metric gains cannot be credited to the proposed framing.

The paper is for the narrow set of groups already working with neural cellular automata in medical segmentation. A reader in that area gets a ready-to-try method and standard evaluation. It is worth sending to peer review because the core idea is straightforward to implement and the evaluation choices are appropriate, even if the theoretical grounding needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper proposes 'resilience' as an uncertainty measure for Neural Cellular Automata (NCA) applied to medical image segmentation. Motivated by a dynamical-systems view in which convergent attractors indicate confident predictions, resilience probes stability of the final state under small perturbations without altering the NCA architecture or retraining. Performance is assessed via selective-prediction metrics (ΔDice@90, AURC) and ranking metrics (AUROC, AUPRC) on multiple benchmarks, with the claim that resilience identifies failure cases more reliably than baselines.

Significance. If the core assumption holds, the method offers a lightweight, training-free route to uncertainty quantification that exploits the iterative structure already present in NCA models. Evaluation across several medical segmentation benchmarks using standard selective-prediction and ranking metrics is a positive feature; the absence of fitted parameters or invented auxiliary networks is also a strength.

major comments (2)

[Abstract / motivation section] The claim that 'convergent attractors correspond to confident predictions' is asserted as motivation but is neither derived from the learned NCA update rule nor supported by an ablation showing invariance to perturbation magnitude or showing that stability coincides with correct outputs rather than training artifacts.
[Evaluation section] The manuscript reports ΔDice@90 and AURC gains but gives too little detail on the exact baselines, the implementation of the perturbation test, and statistical significance testing to assess whether the observed improvements can be attributed to resilience.

minor comments (2)

[Method] Clarify the exact operational definition of resilience (perturbation distribution, number of steps, convergence criterion) so that the measure can be reproduced from the text alone.
[Discussion] Add a short discussion of how resilience behaves under changes in NCA iteration count or grid resolution, as these are intrinsic to the model class.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment point by point below and indicate the planned revisions.

Referee: [Abstract / motivation section] The claim that 'convergent attractors correspond to confident predictions' is asserted as motivation but is neither derived from the learned NCA update rule nor supported by an ablation showing invariance to perturbation magnitude or showing that stability coincides with correct outputs rather than training artifacts.

Authors: The motivation draws from a high-level dynamical-systems interpretation of NCA convergence rather than a direct derivation from the specific learned update rule. We agree that the manuscript would benefit from additional support. In the revised version we will add a concise theoretical paragraph linking fixed-point stability of the NCA iteration to prediction confidence, and we will include an ablation study on invariance to perturbation magnitude together with an analysis of whether stability tracks segmentation correctness rather than training artifacts.

revision: yes

Referee: [Evaluation section] The manuscript reports ΔDice@90 and AURC gains but gives too little detail on the exact baselines, the implementation of the perturbation test, and statistical significance testing to assess whether the observed improvements can be attributed to resilience.

Authors: We acknowledge that the current manuscript lacks sufficient implementation detail. The revised manuscript will expand the evaluation section to specify the exact baseline implementations and the perturbation procedure (including magnitude, number of trials, and state-update steps), and will report statistical significance tests (paired t-tests or Wilcoxon signed-rank tests with p-values) on the reported metric improvements.

revision: yes

Circularity Check

0 steps flagged

No circularity; resilience defined directly from NCA iterative rule

The paper defines resilience explicitly from the NCA's built-in iterative update rule and a perturbation test on the automaton state. This construction uses only the model's existing dynamics and does not reduce to any fitted parameter, self-citation chain, or ansatz smuggled from prior work. The dynamical-system motivation is presented as interpretive framing rather than a derived theorem, and selective-prediction metrics are evaluated empirically on external benchmarks. No load-bearing step collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the dynamical systems interpretation of NCA behavior and the assumption that perturbation stability correlates with prediction quality, with no free parameters or additional invented entities beyond the proposed resilience metric.

axioms (1)

Domain assumption: an NCA can be viewed as a dynamical system where convergent attractors correspond to confident predictions.
Explicitly stated as the motivation for the resilience approach in the abstract.

invented entities (1)

resilience (no independent evidence)
Purpose: uncertainty measure based on the stability of NCA predictions under state perturbations.
Newly defined in this work as a simple probe of the iterative structure.
