Accuracy Improvement of Cell Image Segmentation Using Feedback Former
Pith reviewed 2026-05-23 21:48 UTC · model grok-4.3
The pith
Feedback Former adds a loop that sends detailed features from near the output back to lower Transformer layers, raising segmentation accuracy on cell images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Feedback Former is a Transformer-encoder segmentation model that inserts a feedback connection carrying detailed feature maps from near the output back to lower layers; the connection is intended to offset the Transformer's relative weakness on local detail and thereby improve boundary precision in cell-image segmentation.
What carries the argument
Feedback Former architecture: a Transformer encoder augmented with a feedback route that returns high-resolution feature maps from near the decoder input to earlier encoder stages.
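The feedback route described above can be pictured as a two-pass forward computation. The following is a toy sketch in plain Python, not the paper's implementation: the stage functions, the additive fusion rule, the scalar `feedback_weight`, and the choice to fuse at the first stage are all assumptions made for illustration, and features are flat lists rather than 2-D feature maps.

```python
from typing import Callable, List, Optional, Tuple

Feature = List[float]
Stage = Callable[[Feature], Feature]


def fuse(x: Feature, feedback: Feature, weight: float) -> Feature:
    """Additive fusion (an assumption): x + weight * feedback, elementwise."""
    if len(x) != len(feedback):
        raise ValueError("feedback must match the resolution of the target stage")
    return [a + weight * b for a, b in zip(x, feedback)]


def forward(x: Feature, stages: List[Stage],
            feedback: Optional[Feature] = None,
            feedback_weight: float = 0.5) -> List[Feature]:
    """Run the stages in order; if feedback is given, fuse it into the first stage input."""
    if feedback is not None:
        x = fuse(x, feedback, feedback_weight)
    outputs = []
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs


def feedback_forward(x: Feature, stages: List[Stage],
                     feedback_weight: float = 0.5) -> Tuple[Feature, Feature]:
    """Pass 1 collects a near-output feature; pass 2 reuses it at the lowest stage."""
    first = forward(x, stages)
    near_output = first[-1]
    second = forward(x, stages, feedback=near_output,
                     feedback_weight=feedback_weight)
    return first[-1], second[-1]


# Hypothetical stand-ins for encoder stages.
stages = [lambda f: [2.0 * v for v in f], lambda f: [v + 1.0 for v in f]]
plain, refined = feedback_forward([1.0, 2.0], stages)
```

The point of the sketch is only the data flow: the second pass sees the same input plus information that was available only near the output of the first pass. How the real model fuses the fed-back maps is defined in the paper, not here.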
If this is right
- Segmentation accuracy rises on the tested cell datasets while total compute stays below that of prior feedback methods.
- The accuracy gain is obtained without enlarging the Transformer encoder.
- The feedback route supplies the detail that standard self-attention layers tend to under-emphasize.
- The architecture remains compatible with existing Transformer backbones used as encoders.
Where Pith is reading between the lines
- The same feedback pattern could be tested on non-cell medical images where boundary precision is also critical.
- If the feedback cost stays low, it might allow smaller encoders to match the performance of larger ones on detail-sensitive tasks.
- The approach separates the benefit of feedback from the cost of simply scaling model width or depth.
Load-bearing premise
The missing local detail in Transformer features can be reliably restored simply by routing feature maps from near the output back to lower layers.
What would settle it
A controlled experiment on the same three cell datasets in which Feedback Former shows no accuracy gain over the no-feedback Transformer baseline or requires more FLOPs than the compared feedback methods.
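The two settling conditions can be written as a small per-dataset check. This is a hedged sketch: the `Result` record, its field names, and all numbers are invented for the example and are not results from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Result:
    """One model's score on one dataset (field names are illustrative)."""
    name: str
    dice: float    # higher is better
    gflops: float  # lower is better


def claim_falsified(proposed: Result, no_feedback: Result,
                    feedback_baselines: Sequence[Result]) -> bool:
    """True if either settling condition from the text is met on this dataset."""
    no_gain = proposed.dice <= no_feedback.dice
    costlier = any(proposed.gflops > b.gflops for b in feedback_baselines)
    return no_gain or costlier


# Placeholder numbers, invented for the example only.
proposed = Result("feedback-former", dice=0.90, gflops=10.0)
baseline = Result("no-feedback", dice=0.88, gflops=9.0)
prior = [Result("prior-feedback", dice=0.89, gflops=14.0)]

falsified = claim_falsified(proposed, baseline, prior)
```

In practice the check would be repeated on each of the three datasets, and a single-number comparison like this one says nothing about run-to-run variance.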
Abstract
Semantic segmentation of microscopy cell images by deep learning is an important technique. We considered that Transformers, which have recently outperformed CNNs in image recognition, could also be improved and developed for cell image segmentation. Transformers tend to focus more on contextual information than on detailed information, which leaves segmentation short of detail. To supplement the missing detailed information, we hypothesized that feedback processing, as found in the human visual cortex, would be effective. Our proposed Feedback Former is a novel architecture for semantic segmentation that uses a Transformer as the encoder and adds a feedback processing mechanism. Feature maps with detailed information are fed back from near the output of the model to the lower layers, compensating for the lack of detailed information that is the weakness of Transformers and improving segmentation accuracy. In experiments on three cell image datasets, we confirmed that our method surpasses methods without feedback, demonstrating its superior accuracy in cell image segmentation. Our method achieved higher segmentation accuracy at lower computational cost than conventional feedback approaches. Moreover, it offered superior precision without simply increasing the model size of the Transformer encoder, demonstrating higher accuracy with lower computational cost.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Feedback Former, a Transformer-encoder architecture augmented with a feedback mechanism that routes detailed feature maps from near the output back to lower layers to address Transformers' bias toward contextual over local information in cell-image semantic segmentation. Experiments on three cell-image datasets report higher accuracy than non-feedback baselines and prior feedback methods, together with lower computational cost and without simply enlarging the encoder.
Significance. If the controlled comparisons hold, the work supplies empirical evidence that targeted feedback connections can improve Transformer segmentation accuracy for microscopy images while remaining more efficient than earlier feedback designs. The reported ablations and cost measurements constitute a concrete strength that supports the efficiency claim.
major comments (2)
- [§4.2, Table 2] §4.2 and Table 2: the headline claim of higher accuracy at lower computational cost rests on the reported Dice/IoU gains and FLOPs/parameter counts; without error bars, standard deviations across runs, or statistical tests, the practical significance of the 1–3% improvements cannot be evaluated.
- [§3.2, Eq. (3)–(5)] §3.2, Eq. (3)–(5): the feedback connection is defined by learned weights; the manuscript must therefore tabulate the exact added parameter count and show that the total remains smaller than the conventional feedback baselines it claims to outperform.
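The first major comment turns on how Dice and IoU are computed and on how much they vary between runs. The sketch below uses the standard definitions (Dice = 2|A∩B| / (|A| + |B|), IoU = |A∩B| / |A∪B|) on flattened binary masks and summarizes repeated runs with a mean and sample standard deviation. The function names, the convention for two empty masks, and the example scores are assumptions for illustration, not values from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice and IoU for two flattened binary masks containing only 0 and 1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    pred_sum, target_sum = sum(pred), sum(target)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treated as a perfect match (a convention, assumed here).
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs (needs >= 2 runs)."""
    return mean(scores), stdev(scores)


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])

# Hypothetical Dice scores from three seeds of one configuration.
run_mean, run_std = summarize([0.80, 0.82, 0.84])
```

With per-seed scores in hand, a reported gain can be compared against the spread: a 1–3% improvement is easier to interpret when the standard deviation across seeds is known.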
minor comments (2)
- [Abstract] The three datasets are never named in the abstract or introduction; explicit dataset identifiers (e.g., DSB2018) would aid reproducibility.
- [Figure 2] Figure 2: the diagram of the feedback paths would be clearer if the exact layer indices receiving the routed features were labeled.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.
- Referee: [§4.2, Table 2] §4.2 and Table 2: the headline claim of higher accuracy at lower computational cost rests on the reported Dice/IoU gains and FLOPs/parameter counts; without error bars, standard deviations across runs, or statistical tests, the practical significance of the 1–3% improvements cannot be evaluated.
  Authors: We acknowledge the absence of error bars or statistical tests in the reported results. The 1–3% gains are observed consistently across three independent cell-image datasets with different characteristics, which lends support to the robustness of the improvements. The current experiments used single runs per model configuration. In the revision we will add an explicit discussion of this point and, where additional runs can be completed within the revision period, report standard deviations to allow better assessment of practical significance.
  revision: partial
- Referee: [§3.2, Eq. (3)–(5)] §3.2, Eq. (3)–(5): the feedback connection is defined by learned weights; the manuscript must therefore tabulate the exact added parameter count and show that the total remains smaller than the conventional feedback baselines it claims to outperform.
  Authors: We agree that an explicit accounting of the parameters introduced by the learned feedback weights in Equations (3)–(5) is necessary. In the revised manuscript we will insert a table (or expanded section) that isolates the parameter overhead of the feedback connections and directly compares the overall parameter count and FLOPs of Feedback Former against the conventional feedback baselines, thereby confirming the efficiency advantage.
  revision: yes
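The parameter accounting requested here is simple arithmetic once the layer shapes are known. The sketch below assumes, purely for illustration, that each feedback connection is a standard (ungrouped) convolution, whose parameter count is k·k·C_in·C_out plus C_out for the bias. Equations (3)–(5) are not reproduced on this page, so the layer shapes and the backbone size are hypothetical placeholders.

```python
from typing import Dict, Sequence, Tuple


def conv2d_params(c_in: int, c_out: int, kernel: int = 1, bias: bool = True) -> int:
    """Parameter count of a standard (groups=1) 2-D convolution."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def overhead_report(backbone_params: int,
                    feedback_layers: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Isolate parameters added by feedback layers given as (c_in, c_out) 1x1 convs."""
    added = sum(conv2d_params(c_in, c_out) for c_in, c_out in feedback_layers)
    total = backbone_params + added
    return {"added": added, "total": total, "overhead_pct": 100.0 * added / total}


# Hypothetical backbone size and feedback layer shapes.
report = overhead_report(1_000_000, [(64, 32), (128, 64)])
```

A table built this way, with one row per model and the `added` column broken out, would let a reader check directly whether the total stays below the conventional feedback baselines.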
Circularity Check
No significant circularity
The paper advances an empirical architecture (Feedback Former) whose central claims rest on controlled experiments across three cell-image datasets, direct accuracy and compute comparisons to no-feedback and prior-feedback baselines, and ablations that do not reduce to fitted parameters or self-referential definitions. No equations, uniqueness theorems, or predictions are presented; the motivating premise about Transformer detail loss is treated as background and externally tested rather than assumed. The result is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- feedback connection weights
axioms (1)
- domain assumption: Transformers focus more on contextual than on detailed information