Accuracy Improvement of Cell Image Segmentation Using Feedback Former

Hinako Mitsuoka; Kazuhiro Hotta

arxiv: 2408.12974 · v4 · submitted 2024-08-23 · 💻 cs.CV

Accuracy Improvement of Cell Image Segmentation Using Feedback Former

Hinako Mitsuoka , Kazuhiro Hotta This is my paper

Pith reviewed 2026-05-23 21:48 UTC · model grok-4.3

classification 💻 cs.CV

keywords cell image segmentationsemantic segmentationTransformer encoderfeedback processingmicroscopy imagesdeep learningaccuracy improvement

0 comments

The pith

Feedback Former adds a loop that sends detailed features from near the output back to lower Transformer layers, raising segmentation accuracy on cell images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that standard Transformers, while strong on context, fall short on the fine local details needed for accurate semantic segmentation of microscopy cell images. The proposed fix is a feedback path that routes feature maps carrying those details from layers close to the final output back down to earlier layers inside the Transformer encoder. Experiments on three cell-image datasets indicate this loop produces higher accuracy than plain Transformer baselines, at lower computational cost than earlier feedback designs, and without any need to enlarge the encoder itself.

Core claim

Feedback Former is a Transformer-encoder segmentation model that inserts a feedback connection carrying detailed feature maps from near the output back to lower layers; the connection is intended to offset the Transformer's relative weakness on local detail and thereby improve boundary precision in cell-image segmentation.

What carries the argument

Feedback Former architecture: a Transformer encoder augmented with a feedback route that returns high-resolution feature maps from near the decoder input to earlier encoder stages.

If this is right

Segmentation accuracy rises on the tested cell datasets while total compute stays below that of prior feedback methods.
The same accuracy level is reached without enlarging the Transformer encoder size.
The feedback route supplies the detail that standard self-attention layers tend to under-emphasize.
The architecture remains compatible with existing Transformer backbones used as encoders.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same feedback pattern could be tested on non-cell medical images where boundary precision is also critical.
If the feedback cost stays low, it might allow smaller encoders to match the performance of larger ones on detail-sensitive tasks.
The approach separates the benefit of feedback from the cost of simply scaling model width or depth.

Load-bearing premise

That the missing local detail in Transformer features can be reliably restored simply by routing maps from near the output back to lower layers.

What would settle it

A controlled experiment on the same three cell datasets in which Feedback Former shows no accuracy gain over the no-feedback Transformer baseline or requires more FLOPs than the compared feedback methods.

Figures

Figures reproduced from arXiv: 2408.12974 by Hinako Mitsuoka, Kazuhiro Hotta.

**Figure 1.** Figure 1: The overview of the architecture of the Feedback Former. for loss computation and training. These feedback approaches can improve accuracy by allowing the model to focus on specific areas based on previous results. Methods that combine Transformers and feedback processing exist for other tasks, but not yet in segmentation task. Thus, we introduce feedback processing influenced by the human brain and the … view at source ↗

**Figure 2.** Figure 2: Lite Feedback Module 3 Proposed Method 3.1 Feedback Former [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Qualitative results. The first and second rows are the results on Drosophila, the third row is the results on ISBI2012, and the bottom row is the result on iRPE dataset. (a) Input image, (b) Ground truth, (c) AttentionFormer, (d) Feedback Attention(ST) [25], (e) Feedback Attention(Self) [25], (f) Feedback Former [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

read the original abstract

Semantic segmentation of microscopy cell images by deep learning is a significant technique. We considered that the Transformers, which have recently outperformed CNNs in image recognition, could also be improved and developed for cell image segmentation. Transformers tend to focus more on contextual information than on detailed information. This tendency leads to a lack of detailed information for segmentation. Therefore, to supplement or reinforce the missing detailed information, we hypothesized that feedback processing in the human visual cortex should be effective. Our proposed Feedback Former is a novel architecture for semantic segmentation, in which Transformers is used as an encoder and has a feedback processing mechanism. Feature maps with detailed information are fed back to the lower layers from near the output of the model to compensate for the lack of detailed information which is the weakness of Transformers and improve the segmentation accuracy. By experiments on three cell image datasets, we confirmed that our method surpasses methods without feedback, demonstrating its superior accuracy in cell image segmentation. Our method achieved higher segmentation accuracy while consuming less computational cost than conventional feedback approaches. Moreover, our method offered superior precision without simply increasing the model size of Transformer encoder, demonstrating higher accuracy with lower computational cost.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

read the letter

Feedback Former adds a feedback loop from near-output features back into lower transformer layers for cell segmentation and reports accuracy gains with lower compute than prior feedback methods on three datasets. The design is a direct response to the known transformer bias toward context over local detail, and the authors test it specifically on cell microscopy images rather than general scenes. What stands out is the concrete routing of detailed feature maps inside the encoder itself, plus the claim that this beats both no-feedback baselines and earlier feedback approaches without simply enlarging the model. That combination of accuracy and efficiency is the practical takeaway for anyone running segmentation on limited hardware. The comparisons to non-feedback and conventional feedback versions are the main evidence offered, and the abstract frames the results as consistent across the three datasets. The work is honest about its scope: it stays inside the medical-image niche and does not claim broader architectural breakthroughs. The soft spots are the missing numbers. No dice, error bars, ablation tables, or cost breakdowns appear in the abstract, so the size of the gains and the exact source of the compute savings remain unclear. The premise that transformers inherently under-emphasize detail is presented as background rather than measured in this setting. If the full paper supplies controlled ablations and reproducible metrics, the central empirical claim becomes easier to evaluate; without them the support stays moderate. This paper is for people already using transformers in cell-analysis pipelines who want a lightweight way to recover detail. A reader outside biomedical segmentation will find little to take away. It deserves a serious referee because the architecture is simple to implement and the application claim is testable, even if the gains turn out to be small once the numbers are examined.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Feedback Former, a Transformer-encoder architecture augmented with a feedback mechanism that routes detailed feature maps from near the output back to lower layers to address Transformers' bias toward contextual over local information in cell-image semantic segmentation. Experiments on three cell-image datasets report higher accuracy than non-feedback baselines and prior feedback methods, together with lower computational cost and without simply enlarging the encoder.

Significance. If the controlled comparisons hold, the work supplies empirical evidence that targeted feedback connections can improve Transformer segmentation accuracy for microscopy images while remaining more efficient than earlier feedback designs. The reported ablations and cost measurements constitute a concrete strength that supports the efficiency claim.

major comments (2)

[§4.2, Table 2] §4.2 and Table 2: the headline claim of higher accuracy at lower computational cost rests on the reported Dice/IoU gains and FLOPs/parameter counts; without error bars, standard deviations across runs, or statistical tests, the practical significance of the 1–3 % improvements cannot be evaluated.
[§3.2, Eq. (3)–(5)] §3.2, Eq. (3)–(5): the feedback connection is defined by learned weights; the manuscript must therefore tabulate the exact added parameter count and show that the total remains smaller than the conventional feedback baselines it claims to outperform.

minor comments (2)

[Abstract] The three datasets are never named in the abstract or introduction; explicit dataset identifiers (e.g., DSB2018, etc.) would aid reproducibility.
[Figure 2] Figure 2: the diagram of the feedback paths would be clearer if the exact layer indices receiving the routed features were labeled.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive feedback. We address each major comment below and will revise the manuscript to incorporate the suggested improvements.

read point-by-point responses

Referee: [§4.2, Table 2] §4.2 and Table 2: the headline claim of higher accuracy at lower computational cost rests on the reported Dice/IoU gains and FLOPs/parameter counts; without error bars, standard deviations across runs, or statistical tests, the practical significance of the 1–3 % improvements cannot be evaluated.

Authors: We acknowledge the absence of error bars or statistical tests in the reported results. The 1–3% gains are observed consistently across three independent cell-image datasets with different characteristics, which lends support to the robustness of the improvements. The current experiments used single runs per model configuration. In the revision we will add an explicit discussion of this point and, where additional runs can be completed within the revision period, report standard deviations to allow better assessment of practical significance. revision: partial
Referee: [§3.2, Eq. (3)–(5)] §3.2, Eq. (3)–(5): the feedback connection is defined by learned weights; the manuscript must therefore tabulate the exact added parameter count and show that the total remains smaller than the conventional feedback baselines it claims to outperform.

Authors: We agree that an explicit accounting of the parameters introduced by the learned feedback weights in Equations (3)–(5) is necessary. In the revised manuscript we will insert a table (or expanded section) that isolates the parameter overhead of the feedback connections and directly compares the overall parameter count and FLOPs of Feedback Former against the conventional feedback baselines, thereby confirming the efficiency advantage. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper advances an empirical architecture (Feedback Former) whose central claims rest on controlled experiments across three cell-image datasets, direct accuracy and compute comparisons to no-feedback and prior-feedback baselines, and ablations that do not reduce to fitted parameters or self-referential definitions. No equations, uniqueness theorems, or predictions are presented; the motivating premise about Transformer detail loss is treated as background and externally tested rather than assumed. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The work rests on standard deep-learning training assumptions plus one domain assumption about transformer weaknesses; no new physical entities or ad-hoc constants are introduced beyond ordinary network weights.

free parameters (1)

feedback connection weights
Learnable parameters in the feedback paths are fitted during training on the cell datasets.

axioms (1)

domain assumption Transformers focus more on contextual than detailed information
Invoked in the abstract to motivate the feedback mechanism.

pith-pipeline@v0.9.0 · 5725 in / 1216 out tokens · 43727 ms · 2026-05-23T21:48:18.173366+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 6 internal anchors

[1]

Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation

Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K.: Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmen- tation. arXiv preprint arXiv:1802.06955 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[2]

Microscopy Cell Segmentation via Convolutional LSTM Networks

Arbelle, A., Raviv, T.R.: Microscopy cell segmentation via convolutional LSTM networks. CoRR abs/1805.11247 (2018), http://arxiv.org/abs/1805.11247

work page internal anchor Pith review Pith/arXiv arXiv 2018
[3]

IEEE transactions on pat- tern analysis and machine intelligence39(12), 2481–2495 (2017)

Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pat- tern analysis and machine intelligence39(12), 2481–2495 (2017)

work page 2017
[4]

IEEE transactions on pattern analysis and machine intelli- gence 40(4), 834–848 (2017)

Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Se- mantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelli- gence 40(4), 834–848 (2017)

work page 2017
[5]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1290–1299 (2022)

work page 2022
[6]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. CoRR abs/2010.11929 (2020), https://arxiv.org/abs/2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2010
[7]

Nature methods18(9), 1038–1045 (2021)

Edlund, C., Jackson, T.R., Khalid, N., Bevan, N., Dale, T., Dengel, A., Ahmed, S., Trygg, J., Sjögren, R.: Livecell—a large-scale dataset for label-free live cell segmentation. Nature methods18(9), 1038–1045 (2021)

work page 2021
[8]

Cerebral cortex (New York, NY: 1991)1(1), 1–47 (1991)

Felleman, D.J., Van Essen, D.C.: Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex (New York, NY: 1991)1(1), 1–47 (1991)

work page 1991
[9]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Feng, M., Lu, H., Ding, E.: Attentive feedback network for boundary-aware salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1623–1632 (2019)

work page 2019
[10]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

Fujii, H., Tanaka, H., Ikeuchi, M., Hotta, K.: X-net with different loss functions for cell image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 3793–3800 (June 2021)

work page 2021
[11]

CoRR abs/1811.12231 (2018), http://arxiv.org/ abs/1811.12231

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. CoRR abs/1811.12231 (2018), http://arxiv.org/ abs/1811.12231

work page arXiv 2018
[12]

figshare (11 2013).https://doi.org/10.6084/m9

Gerhard, S., Funke, J., Martel, J., Cardona, A., Fetter, R.: Segmented anisotropic ssTEM dataset of neural tissue. figshare (11 2013).https://doi.org/10.6084/m9. figshare.856713.v1 , https://figshare.com/articles/dataset/Segmented_ anisotropic_ssTEM_dataset_of_neural_tissue/856713

work page doi:10.6084/m9 2013
[13]

IEEE Transactions on Medical Imaging40(6), 1542–1554 (2021) 10 H

Girum, K.B., Créhange, G., Lalande, A.: Learning with context feedback loop for robust medical image segmentation. IEEE Transactions on Medical Imaging40(6), 1542–1554 (2021) 10 H. Mitsuoka et al

work page 2021
[14]

He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

work page 2016
[15]

Segmentation of neuronal structures in em stacks challenge.https://imagej.net/ events/isbi-2012-segmentation-challenge (2012)

work page 2012
[16]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition

Kirillov,A., Girshick,R., He,K.,Dollár,P.:Panopticfeaturepyramid networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 6399–6408 (2019)

work page 2019
[17]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer:Hierarchicalvisiontransformerusingshiftedwindows.In:Proceedings of the IEEE/CVF international conference on computer vision. pp. 10012–10022 (2021)

work page 2021
[18]

SGDR: Stochastic Gradient Descent with Warm Restarts

Loshchilov, I., Hutter, F.: Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[19]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops

Majurski,M.,Manescu,P.,Padi,S.,Schaub,N.,Hotaling,N.,SimonJr,C.,Bajcsy, P.: Cell image segmentation using generative adversarial networks, transfer learn- ing, and augmentations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 0–0 (2019)

work page 2019
[20]

Markram, H.: A network of tufted layer 5 pyramidal neurons. Cereb. Cortex7(6), 523–533 (Sep 1997)

work page 1997
[21]

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

Milletari, F., Navab, N., Ahmadi, S.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. CoRRabs/1606.04797(2016), http: //arxiv.org/abs/1606.04797

work page internal anchor Pith review Pith/arXiv arXiv 2016
[22]

In: International symposium on visual computing

Rahman, M.A., Wang, Y.: Optimizing intersection-over-union in deep neural net- works for image segmentation. In: International symposium on visual computing. pp. 234–244. Springer (2016)

work page 2016
[23]

Ro, T., Breitmeyer, B., Burton, P., Singhal, N.S., Lane, D.: Feedback contributions to visual awareness in human occipital cortex. Curr. Biol.13(12), 1038–1041 (Jun 2003)

work page 2003
[24]

U-Net: Convolutional Networks for Biomedical Image Segmentation

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedi- cal image segmentation. CoRRabs/1505.04597 (2015), http://arxiv.org/abs/ 1505.04597

work page internal anchor Pith review Pith/arXiv arXiv 2015
[25]

In: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16

Tsuda, H., Shibuya, E., Hotta, K.: Feedback attention for cell image segmentation. In: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. pp. 365–379. Springer (2020)

work page 2020
[26]

Tuli, S., Dasgupta, I., Grant, E., Griffiths, T.L.: Are convolutional neural networks or transformers more like human vision? CoRRabs/2105.07197 (2021), https: //arxiv.org/abs/2105.07197

work page arXiv 2021
[27]

arXiv preprint arXiv:2307.09283 (2023)

Wang, A., Chen, H., Lin, Z., Pu, H., Ding, G.: Repvit: Revisiting mobile cnn from vit perspective. arXiv preprint arXiv:2307.09283 (2023)

work page arXiv 2023
[28]

IEEE transactions on pattern analysis and machine intelligence 43(10), 3349–3364 (2020)

Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al.: Deep high-resolution representation learning for visual recog- nition. IEEE transactions on pattern analysis and machine intelligence 43(10), 3349–3364 (2020)

work page 2020
[29]

CoRR abs/2102.12122 (2021), https://arxiv.org/abs/2102.12122

Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyramid vision transformer: A versatile backbone for dense prediction without con- volutions. CoRR abs/2102.12122 (2021), https://arxiv.org/abs/2102.12122

work page arXiv 2021
[30]

Widrow, B., Lehr, M.A.: Perceptrons, adalines, and backpropagation, p. 719–724. MIT Press, Cambridge, MA, USA (1998) Feedback Former 11

work page 1998
[31]

Advances in Neural Information Processing Systems34, 12077–12090 (2021)

Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems34, 12077–12090 (2021)

work page 2021
[32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Xu, J., Xiong, Z., Bhattacharyya, S.P.: Pidnet: A real-time semantic segmentation network inspired by pid controllers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19529–19539 (2023)

work page 2023
[33]

CoRRabs/2111.11418 (2021), https:// arxiv.org/abs/2111.11418

Yu,W.,Luo,M.,Zhou,P.,Si,C.,Zhou,Y.,Wang,X.,Feng,J.,Yan,S.:Metaformer is actually what you need for vision. CoRRabs/2111.11418 (2021), https:// arxiv.org/abs/2111.11418

work page arXiv 2021
[34]

IEEE Transactions on Pattern Analysis and Machine Intelli- gence p

Yu,W.,Si,C.,Zhou,P.,Luo,M.,Zhou,Y.,Feng,J.,Yan,S.,Wang,X.:Metaformer baselines for vision. IEEE Transactions on Pattern Analysis and Machine Intelli- gence p. 1–17 (2023).https://doi.org/10.1109/tpami.2023.3329173 , http: //dx.doi.org/10.1109/TPAMI.2023.3329173

work page doi:10.1109/tpami.2023.3329173 2023
[35]

Mathematical Biosciences and Engineering20(7), 12039–12055 (2023).https://doi.org/10.3934/mbe.2023535 , https://www

Yuan, L., Song, J., Fan, Y.: Fm-unet: Biomedical image segmentation based on feedback mechanism unet. Mathematical Biosciences and Engineering20(7), 12039–12055 (2023).https://doi.org/10.3934/mbe.2023535 , https://www. aimspress.com/article/doi/10.3934/mbe.2023535

work page doi:10.3934/mbe.2023535 2023
[36]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2881–2890 (2017)

work page 2017
[37]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6881–6890 (2021)

work page 2021
[38]

In: Proceedings of the IEEE/CVF international conference on computer vision

Zong, Z., Song, G., Liu, Y.: Detrs with collaborative hybrid assignments training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 6748–6758 (2023)

work page 2023

[1] [1]

Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation

Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K.: Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmen- tation. arXiv preprint arXiv:1802.06955 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[2] [2]

Microscopy Cell Segmentation via Convolutional LSTM Networks

Arbelle, A., Raviv, T.R.: Microscopy cell segmentation via convolutional LSTM networks. CoRR abs/1805.11247 (2018), http://arxiv.org/abs/1805.11247

work page internal anchor Pith review Pith/arXiv arXiv 2018

[3] [3]

IEEE transactions on pat- tern analysis and machine intelligence39(12), 2481–2495 (2017)

Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pat- tern analysis and machine intelligence39(12), 2481–2495 (2017)

work page 2017

[4] [4]

IEEE transactions on pattern analysis and machine intelli- gence 40(4), 834–848 (2017)

Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Se- mantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelli- gence 40(4), 834–848 (2017)

work page 2017

[5] [5]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Cheng, B., Misra, I., Schwing, A.G., Kirillov, A., Girdhar, R.: Masked-attention mask transformer for universal image segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1290–1299 (2022)

work page 2022

[6] [6]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. CoRR abs/2010.11929 (2020), https://arxiv.org/abs/2010.11929

work page internal anchor Pith review Pith/arXiv arXiv 2010

[7] [7]

Nature methods18(9), 1038–1045 (2021)

Edlund, C., Jackson, T.R., Khalid, N., Bevan, N., Dale, T., Dengel, A., Ahmed, S., Trygg, J., Sjögren, R.: Livecell—a large-scale dataset for label-free live cell segmentation. Nature methods18(9), 1038–1045 (2021)

work page 2021

[8] [8]

Cerebral cortex (New York, NY: 1991)1(1), 1–47 (1991)

Felleman, D.J., Van Essen, D.C.: Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex (New York, NY: 1991)1(1), 1–47 (1991)

work page 1991

[9] [9]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Feng, M., Lu, H., Ding, E.: Attentive feedback network for boundary-aware salient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 1623–1632 (2019)

work page 2019

[10] [10]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops

Fujii, H., Tanaka, H., Ikeuchi, M., Hotta, K.: X-net with different loss functions for cell image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 3793–3800 (June 2021)

work page 2021

[11] [11]

CoRR abs/1811.12231 (2018), http://arxiv.org/ abs/1811.12231

Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., Brendel, W.: Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. CoRR abs/1811.12231 (2018), http://arxiv.org/ abs/1811.12231

work page arXiv 2018

[12] [12]

figshare (11 2013).https://doi.org/10.6084/m9

Gerhard, S., Funke, J., Martel, J., Cardona, A., Fetter, R.: Segmented anisotropic ssTEM dataset of neural tissue. figshare (11 2013).https://doi.org/10.6084/m9. figshare.856713.v1 , https://figshare.com/articles/dataset/Segmented_ anisotropic_ssTEM_dataset_of_neural_tissue/856713

work page doi:10.6084/m9 2013

[13] [13]

IEEE Transactions on Medical Imaging40(6), 1542–1554 (2021) 10 H

Girum, K.B., Créhange, G., Lalande, A.: Learning with context feedback loop for robust medical image segmentation. IEEE Transactions on Medical Imaging40(6), 1542–1554 (2021) 10 H. Mitsuoka et al

work page 2021

[14] [14]

He,K.,Zhang,X.,Ren,S.,Sun,J.:Deepresiduallearningforimagerecognition.In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)

work page 2016

[15] [15]

Segmentation of neuronal structures in em stacks challenge.https://imagej.net/ events/isbi-2012-segmentation-challenge (2012)

work page 2012

[16] [16]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition

Kirillov,A., Girshick,R., He,K.,Dollár,P.:Panopticfeaturepyramid networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recog- nition. pp. 6399–6408 (2019)

work page 2019

[17] [17]

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer:Hierarchicalvisiontransformerusingshiftedwindows.In:Proceedings of the IEEE/CVF international conference on computer vision. pp. 10012–10022 (2021)

work page 2021

[18] [18]

SGDR: Stochastic Gradient Descent with Warm Restarts

Loshchilov, I., Hutter, F.: Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[19] [19]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops

Majurski,M.,Manescu,P.,Padi,S.,Schaub,N.,Hotaling,N.,SimonJr,C.,Bajcsy, P.: Cell image segmentation using generative adversarial networks, transfer learn- ing, and augmentations. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 0–0 (2019)

work page 2019

[20] [20]

Markram, H.: A network of tufted layer 5 pyramidal neurons. Cereb. Cortex7(6), 523–533 (Sep 1997)

work page 1997

[21] [21]

V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

Milletari, F., Navab, N., Ahmadi, S.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. CoRRabs/1606.04797(2016), http: //arxiv.org/abs/1606.04797

work page internal anchor Pith review Pith/arXiv arXiv 2016

[22] [22]

In: International symposium on visual computing

Rahman, M.A., Wang, Y.: Optimizing intersection-over-union in deep neural net- works for image segmentation. In: International symposium on visual computing. pp. 234–244. Springer (2016)

work page 2016

[23] [23]

Ro, T., Breitmeyer, B., Burton, P., Singhal, N.S., Lane, D.: Feedback contributions to visual awareness in human occipital cortex. Curr. Biol.13(12), 1038–1041 (Jun 2003)

work page 2003

[24] [24]

U-Net: Convolutional Networks for Biomedical Image Segmentation

Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedi- cal image segmentation. CoRRabs/1505.04597 (2015), http://arxiv.org/abs/ 1505.04597

work page internal anchor Pith review Pith/arXiv arXiv 2015

[25] [25]

In: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16

Tsuda, H., Shibuya, E., Hotta, K.: Feedback attention for cell image segmentation. In: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16. pp. 365–379. Springer (2020)

work page 2020

[26] [26]

Tuli, S., Dasgupta, I., Grant, E., Griffiths, T.L.: Are convolutional neural networks or transformers more like human vision? CoRRabs/2105.07197 (2021), https: //arxiv.org/abs/2105.07197

work page arXiv 2021

[27] [27]

arXiv preprint arXiv:2307.09283 (2023)

Wang, A., Chen, H., Lin, Z., Pu, H., Ding, G.: Repvit: Revisiting mobile cnn from vit perspective. arXiv preprint arXiv:2307.09283 (2023)

work page arXiv 2023

[28] [28]

IEEE transactions on pattern analysis and machine intelligence 43(10), 3349–3364 (2020)

Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al.: Deep high-resolution representation learning for visual recog- nition. IEEE transactions on pattern analysis and machine intelligence 43(10), 3349–3364 (2020)

work page 2020

[29] [29]

CoRR abs/2102.12122 (2021), https://arxiv.org/abs/2102.12122

Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyramid vision transformer: A versatile backbone for dense prediction without con- volutions. CoRR abs/2102.12122 (2021), https://arxiv.org/abs/2102.12122

work page arXiv 2021

[30] [30]

Widrow, B., Lehr, M.A.: Perceptrons, adalines, and backpropagation, p. 719–724. MIT Press, Cambridge, MA, USA (1998) Feedback Former 11

work page 1998

[31] [31]

Advances in Neural Information Processing Systems34, 12077–12090 (2021)

Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., Luo, P.: Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems34, 12077–12090 (2021)

work page 2021

[32] [32]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Xu, J., Xiong, Z., Bhattacharyya, S.P.: Pidnet: A real-time semantic segmentation network inspired by pid controllers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19529–19539 (2023)

work page 2023

[33] [33]

CoRRabs/2111.11418 (2021), https:// arxiv.org/abs/2111.11418

Yu,W.,Luo,M.,Zhou,P.,Si,C.,Zhou,Y.,Wang,X.,Feng,J.,Yan,S.:Metaformer is actually what you need for vision. CoRRabs/2111.11418 (2021), https:// arxiv.org/abs/2111.11418

work page arXiv 2021

[34] [34]

IEEE Transactions on Pattern Analysis and Machine Intelli- gence p

Yu,W.,Si,C.,Zhou,P.,Luo,M.,Zhou,Y.,Feng,J.,Yan,S.,Wang,X.:Metaformer baselines for vision. IEEE Transactions on Pattern Analysis and Machine Intelli- gence p. 1–17 (2023).https://doi.org/10.1109/tpami.2023.3329173 , http: //dx.doi.org/10.1109/TPAMI.2023.3329173

work page doi:10.1109/tpami.2023.3329173 2023

[35] [35]

Mathematical Biosciences and Engineering20(7), 12039–12055 (2023).https://doi.org/10.3934/mbe.2023535 , https://www

Yuan, L., Song, J., Fan, Y.: Fm-unet: Biomedical image segmentation based on feedback mechanism unet. Mathematical Biosciences and Engineering20(7), 12039–12055 (2023).https://doi.org/10.3934/mbe.2023535 , https://www. aimspress.com/article/doi/10.3934/mbe.2023535

work page doi:10.3934/mbe.2023535 2023

[36] [36]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2881–2890 (2017)

work page 2017

[37] [37]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P.H., et al.: Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 6881–6890 (2021)

work page 2021

[38] [38]

In: Proceedings of the IEEE/CVF international conference on computer vision

Zong, Z., Song, G., Liu, Y.: Detrs with collaborative hybrid assignments training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 6748–6758 (2023)

work page 2023