arxiv: 2408.12974 · v4 · submitted 2024-08-23 · 💻 cs.CV

Accuracy Improvement of Cell Image Segmentation Using Feedback Former

Pith reviewed 2026-05-23 21:48 UTC · model grok-4.3

classification 💻 cs.CV
keywords cell image segmentation · semantic segmentation · Transformer encoder · feedback processing · microscopy images · deep learning · accuracy improvement

The pith

Feedback Former adds a loop that sends detailed features from near the output back to lower Transformer layers, raising segmentation accuracy on cell images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that standard Transformers, while strong on context, fall short on the fine local details needed for accurate semantic segmentation of microscopy cell images. The proposed fix is a feedback path that routes feature maps carrying those details from layers close to the final output back down to earlier layers inside the Transformer encoder. Experiments on three cell-image datasets indicate this loop produces higher accuracy than plain Transformer baselines, at lower computational cost than earlier feedback designs, and without any need to enlarge the encoder itself.

Core claim

Feedback Former is a Transformer-encoder segmentation model that inserts a feedback connection carrying detailed feature maps from near the output back to lower layers; the connection is intended to offset the Transformer's relative weakness on local detail and thereby improve boundary precision in cell-image segmentation.

What carries the argument

Feedback Former architecture: a Transformer encoder augmented with a feedback route that returns high-resolution feature maps from near the decoder input to earlier encoder stages.
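
The feedback route described above can be illustrated with a toy two-pass forward. This is a minimal sketch, not the authors' implementation: the `smooth` stage, the `gate` scaling, the additive injection, and the definition of the late detail map as "input minus deep features" are all assumptions made for illustration. The page describes the real design only at the level of Figure 1 and the Lite Feedback Module of Figure 2.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def smooth(x: Vector) -> Vector:
    """Hypothetical stand-in for a context-oriented encoder stage (3-tap average)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def forward(x: Vector, feedback: Optional[Vector] = None,
            gate: float = 0.5) -> Tuple[Vector, Vector]:
    """One pass through a toy two-stage encoder.

    If `feedback` is given, it is added (scaled by `gate`) to the input of the
    first stage. Returns (deep_features, late_detail_map).
    """
    h = x if feedback is None else [a + gate * f for a, f in zip(x, feedback)]
    deep = smooth(smooth(h))
    # Late map "near the output": what the smoothing stages removed from the input.
    detail = [a - d for a, d in zip(x, deep)]
    return deep, detail


def feedback_forward(x: Vector, gate: float = 0.5) -> Vector:
    """Two-pass inference: pass 1 produces the late map, pass 2 consumes it."""
    _, detail = forward(x)                              # pass 1: no feedback yet
    deep, _ = forward(x, feedback=detail, gate=gate)    # pass 2: early stage sees late detail
    return deep


signal = [0.0, 0.0, 1.0, 0.0, 0.0]   # a sharp one-pixel "boundary"
plain, _ = forward(signal)
with_fb = feedback_forward(signal)
print(plain[2], with_fb[2])
```

In this toy, the one-pixel peak is attenuated less on the second pass because the fed-back residual re-emphasises what the smoothing stages removed. It shows only the data flow of a feedback loop and says nothing about accuracy on real cell images.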

If this is right

  • Segmentation accuracy rises on the tested cell datasets while total compute stays below that of prior feedback methods.
  • The same accuracy level is reached without enlarging the Transformer encoder size.
  • The feedback route supplies the detail that standard self-attention layers tend to under-emphasize.
  • The architecture remains compatible with existing Transformer backbones used as encoders.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feedback pattern could be tested on non-cell medical images where boundary precision is also critical.
  • If the feedback cost stays low, it might allow smaller encoders to match the performance of larger ones on detail-sensitive tasks.
  • The approach separates the benefit of feedback from the cost of simply scaling model width or depth.

Load-bearing premise

That the missing local detail in Transformer features can be reliably restored simply by routing maps from near the output back to lower layers.

What would settle it

A controlled experiment on the same three cell datasets in which Feedback Former shows no accuracy gain over the no-feedback Transformer baseline or requires more FLOPs than the compared feedback methods.

Figures

Figures reproduced from arXiv: 2408.12974 by Hinako Mitsuoka, Kazuhiro Hotta.

Figure 1: The overview of the architecture of the Feedback Former.
Figure 2: Lite Feedback Module.
Figure 3: Qualitative results. The first and second rows are the results on Drosophila, the third row is the results on ISBI2012, and the bottom row is the result on iRPE dataset. (a) Input image, (b) Ground truth, (c) AttentionFormer, (d) Feedback Attention(ST) [25], (e) Feedback Attention(Self) [25], (f) Feedback Former
Original abstract

Semantic segmentation of microscopy cell images by deep learning is a significant technique. We considered that the Transformers, which have recently outperformed CNNs in image recognition, could also be improved and developed for cell image segmentation. Transformers tend to focus more on contextual information than on detailed information. This tendency leads to a lack of detailed information for segmentation. Therefore, to supplement or reinforce the missing detailed information, we hypothesized that feedback processing in the human visual cortex should be effective. Our proposed Feedback Former is a novel architecture for semantic segmentation, in which Transformers is used as an encoder and has a feedback processing mechanism. Feature maps with detailed information are fed back to the lower layers from near the output of the model to compensate for the lack of detailed information which is the weakness of Transformers and improve the segmentation accuracy. By experiments on three cell image datasets, we confirmed that our method surpasses methods without feedback, demonstrating its superior accuracy in cell image segmentation. Our method achieved higher segmentation accuracy while consuming less computational cost than conventional feedback approaches. Moreover, our method offered superior precision without simply increasing the model size of Transformer encoder, demonstrating higher accuracy with lower computational cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Feedback Former, a Transformer-encoder architecture augmented with a feedback mechanism that routes detailed feature maps from near the output back to lower layers to address Transformers' bias toward contextual over local information in cell-image semantic segmentation. Experiments on three cell-image datasets report higher accuracy than non-feedback baselines and prior feedback methods, together with lower computational cost and without simply enlarging the encoder.

Significance. If the controlled comparisons hold, the work supplies empirical evidence that targeted feedback connections can improve Transformer segmentation accuracy for microscopy images while remaining more efficient than earlier feedback designs. The reported ablations and cost measurements constitute a concrete strength that supports the efficiency claim.

major comments (2)
  1. [§4.2, Table 2] §4.2 and Table 2: the headline claim of higher accuracy at lower computational cost rests on the reported Dice/IoU gains and FLOPs/parameter counts; without error bars, standard deviations across runs, or statistical tests, the practical significance of the 1–3% improvements cannot be evaluated.
  2. [§3.2, Eq. (3)–(5)] §3.2, Eq. (3)–(5): the feedback connection is defined by learned weights; the manuscript must therefore tabulate the exact added parameter count and show that the total remains smaller than the conventional feedback baselines it claims to outperform.
minor comments (2)
  1. [Abstract] The three datasets are never named in the abstract or introduction; explicit dataset identifiers (e.g., DSB2018) would aid reproducibility.
  2. [Figure 2] Figure 2: the diagram of the feedback paths would be clearer if the exact layer indices receiving the routed features were labeled.
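
The first major comment asks for run-to-run variation on the Dice/IoU scores. A minimal sketch of how such numbers could be produced follows. It assumes binary masks flattened to lists of 0/1 (the paper's datasets may be multi-class), and the per-seed scores are placeholders invented for illustration, not results from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice and IoU for two flattened binary masks (1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    p_sum, t_sum = sum(pred), sum(target)
    union = p_sum + t_sum - inter
    # Convention (an assumption): two empty masks count as a perfect match.
    dice = 2.0 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def summarise(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated training runs."""
    return mean(scores), stdev(scores)


# Tiny worked example on a 4-pixel mask.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])

# Placeholder per-seed scores, invented for illustration only.
placeholder_runs = [0.80, 0.82, 0.81]
run_mean, run_std = summarise(placeholder_runs)
print(f"Dice={dice:.3f} IoU={iou:.3f} mean={run_mean:.3f} std={run_std:.3f}")
```

Reporting the mean and standard deviation over several seeds for each model would let a reader judge whether a 1–3% gap exceeds ordinary training noise.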

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

Point-by-point responses
  1. Referee: [§4.2, Table 2] §4.2 and Table 2: the headline claim of higher accuracy at lower computational cost rests on the reported Dice/IoU gains and FLOPs/parameter counts; without error bars, standard deviations across runs, or statistical tests, the practical significance of the 1–3% improvements cannot be evaluated.

    Authors: We acknowledge the absence of error bars or statistical tests in the reported results. The 1–3% gains are observed consistently across three independent cell-image datasets with different characteristics, which lends support to the robustness of the improvements. The current experiments used single runs per model configuration. In the revision we will add an explicit discussion of this point and, where additional runs can be completed within the revision period, report standard deviations to allow better assessment of practical significance. revision: partial

  2. Referee: [§3.2, Eq. (3)–(5)] §3.2, Eq. (3)–(5): the feedback connection is defined by learned weights; the manuscript must therefore tabulate the exact added parameter count and show that the total remains smaller than the conventional feedback baselines it claims to outperform.

    Authors: We agree that an explicit accounting of the parameters introduced by the learned feedback weights in Equations (3)–(5) is necessary. In the revised manuscript we will insert a table (or expanded section) that isolates the parameter overhead of the feedback connections and directly compares the overall parameter count and FLOPs of Feedback Former against the conventional feedback baselines, thereby confirming the efficiency advantage. revision: yes
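
The parameter accounting requested in the second major comment is simple arithmetic once the layers in the feedback path are known. The sketch below assumes, purely for illustration, that the feedback path consists of standard convolutional projections. The page does not specify the layers behind Eq. (3)–(5), so the channel sizes and the base model size used here are hypothetical.

```python
from typing import Iterable, Tuple


def conv2d_params(c_in: int, c_out: int, kernel: int = 1, bias: bool = True) -> int:
    """Learnable parameters in a standard 2-D convolution layer."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def feedback_overhead(base_params: int,
                      layers: Iterable[Tuple[int, int, int]]) -> Tuple[int, float]:
    """Added parameters and their share relative to the no-feedback model.

    `layers` lists hypothetical feedback projections as (c_in, c_out, kernel).
    """
    added = sum(conv2d_params(c_in, c_out, k) for c_in, c_out, k in layers)
    return added, added / base_params


# Hypothetical sizes chosen only to exercise the arithmetic.
added, share = feedback_overhead(base_params=1_000_000, layers=[(64, 32, 1)])
print(added, f"{share:.4%}")
```

A table built this way, with one row per feedback projection and a total against each baseline, would isolate the overhead of the feedback connections from the encoder itself.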

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper advances an empirical architecture (Feedback Former) whose central claims rest on controlled experiments across three cell-image datasets, direct accuracy and compute comparisons to no-feedback and prior-feedback baselines, and ablations that do not reduce to fitted parameters or self-referential definitions. No equations, uniqueness theorems, or predictions are presented; the motivating premise about Transformer detail loss is treated as background and externally tested rather than assumed. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on standard deep-learning training assumptions plus one domain assumption about transformer weaknesses; no new physical entities or ad-hoc constants are introduced beyond ordinary network weights.

free parameters (1)
  • feedback connection weights
    Learnable parameters in the feedback paths are fitted during training on the cell datasets.
axioms (1)
  • domain assumption Transformers focus more on contextual than detailed information
    Invoked in the abstract to motivate the feedback mechanism.


Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 6 internal anchors

  1. [1]

    Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation

    Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K.: Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation. arXiv preprint arXiv:1802.06955 (2018)

  2. [2]

    Microscopy Cell Segmentation via Convolutional LSTM Networks

    Arbelle, A., Raviv, T.R.: Microscopy cell segmentation via convolutional LSTM networks. CoRR abs/1805.11247 (2018), http://arxiv.org/abs/1805.11247

  3. [3]

    Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence 39(12), 2481–2495 (2017)

  4. [4]

    Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40(4), 834–848 (2017)

  5. [5]

    Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1290–1299 (2022)

  6. [6]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. CoRR abs/2010.11929 (2020), https://arxiv.org/abs/2010.11929

  7. [7]

    Edlund, C., Jackson, T.R., Khalid, N., Bevan, N., Dale, T., Dengel, A., Ahmed, S., Trygg, J., Sjögren, R.: Livecell—a large-scale dataset for label-free live cell segmentation. Nature methods18(9), 1038–1045 (2021)

  8. [8]

    Felleman, D.J., Van Essen, D.C.: Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex (New York, NY: 1991) 1(1), 1–47 (1991)

  9. [9]

    Feng, M., Lu, H., Ding, E.: Attentive feedback network for boundary-aware salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1623–1632 (2019)

  10. [10]

    Fujii, H., Tanaka, H., Ikeuchi, M., Hotta, K.: X-net with different loss functions for cell image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 3793–3800 (June 2021)

  11. [11]

    Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. CoRR abs/1811.12231 (2018), http://arxiv.org/abs/1811.12231

  12. [12]

    Gerhard, S., Funke, J., Martel, J., Cardona, A., Fetter, R.: Segmented anisotropic ssTEM dataset of neural tissue. figshare (11 2013). https://doi.org/10.6084/m9.figshare.856713.v1, https://figshare.com/articles/dataset/Segmented_anisotropic_ssTEM_dataset_of_neural_tissue/856713

  13. [13]

    Girum, K.B., Créhange, G., Lalande, A.: Learning with context feedback loop for robust medical image segmentation. IEEE Transactions on Medical Imaging 40(6), 1542–1554 (2021)

  14. [14]

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

  15. [15]

    Segmentation of neuronal structures in em stacks challenge. https://imagej.net/events/isbi-2012-segmentation-challenge (2012)

  16. [16]

    Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6399–6408 (2019)

  17. [17]

    Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 10012–10022 (2021)

  18. [18]

    SGDR: Stochastic Gradient Descent with Warm Restarts

    Loshchilov, I., Hutter, F.: Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

  19. [19]

    Majurski, M., Manescu, P., Padi, S., Schaub, N., Hotaling, N., Simon Jr, C., Bajcsy, P.: Cell image segmentation using generative adversarial networks, transfer learning, and augmentations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 0–0 (2019)

  20. [20]

    Markram, H.: A network of tufted layer 5 pyramidal neurons. Cereb. Cortex 7(6), 523–533 (Sep 1997)

  21. [21]

    V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

    Milletari, F., Navab, N., Ahmadi, S.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. CoRR abs/1606.04797 (2016), http://arxiv.org/abs/1606.04797

  22. [22]

    Rahman, M.A., Wang, Y.: Optimizing intersection-over-union in deep neural networks for image segmentation. In: International symposium on visual computing. pp. 234–244. Springer (2016)

  23. [23]

    Ro, T., Breitmeyer, B., Burton, P., Singhal, N.S., Lane, D.: Feedback contributions to visual awareness in human occipital cortex. Curr. Biol. 13(12), 1038–1041 (Jun 2003)

  24. [24]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. CoRR abs/1505.04597 (2015), http://arxiv.org/abs/1505.04597

  25. [25]

    Tsuda, H., Shibuya, E., Hotta, K.: Feedback attention for cell image segmentation. In: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. pp. 365–379. Springer (2020)

  26. [26]

    Tuli, S., Dasgupta, I., Grant, E., Griffiths, T.L.: Are convolutional neural networks or transformers more like human vision? CoRR abs/2105.07197 (2021), https://arxiv.org/abs/2105.07197

  27. [27]

    Wang, A., Chen, H., Lin, Z., Pu, H., Ding, G.: Repvit: Revisiting mobile cnn from vit perspective. arXiv preprint arXiv:2307.09283 (2023)

  28. [28]

    Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al.: Deep high-resolution representation learning for visual recognition. IEEE transactions on pattern analysis and machine intelligence 43(10), 3349–3364 (2020)

  29. [29]

    Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. CoRR abs/2102.12122 (2021), https://arxiv.org/abs/2102.12122

  30. [30]

    Widrow, B., Lehr, M.A.: Perceptrons, adalines, and backpropagation, p. 719–724. MIT Press, Cambridge, MA, USA (1998)

  31. [31]

    Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems 34, 12077–12090 (2021)

  32. [32]

    Xu, J., Xiong, Z., Bhattacharyya, S.P.: Pidnet: A real-time semantic segmentation network inspired by pid controllers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19529–19539 (2023)

  33. [33]

    Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., Yan, S.: Metaformer is actually what you need for vision. CoRR abs/2111.11418 (2021), https://arxiv.org/abs/2111.11418

  34. [34]

    Yu, W., Si, C., Zhou, P., Luo, M., Zhou, Y., Feng, J., Yan, S., Wang, X.: Metaformer baselines for vision. IEEE Transactions on Pattern Analysis and Machine Intelligence p. 1–17 (2023). https://doi.org/10.1109/tpami.2023.3329173, http://dx.doi.org/10.1109/TPAMI.2023.3329173

  35. [35]

    Yuan, L., Song, J., Fan, Y.: Fm-unet: Biomedical image segmentation based on feedback mechanism unet. Mathematical Biosciences and Engineering 20(7), 12039–12055 (2023). https://doi.org/10.3934/mbe.2023535, https://www.aimspress.com/article/doi/10.3934/mbe.2023535

  36. [36]

    Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2881–2890 (2017)

  37. [37]

    Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6881–6890 (2021)

  38. [38]

    Zong, Z., Song, G., Liu, Y.: Detrs with collaborative hybrid assignments training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 6748–6758 (2023)