arxiv: 2605.22290 · v1 · pith:3CI33JWWnew · submitted 2026-05-21 · 💻 cs.CV

Detection of Virus and Small Cell Patches in Foci Images Using Switchable Convolution and Feature Pyramid Networks

Pith reviewed 2026-05-22 07:01 UTC · model grok-4.3

classification 💻 cs.CV
keywords: virus patch detection, foci images, FFU images, YOLOv2, Feature Pyramid Network, switchable atrous convolution, biomedical object detection, small cell patches

The pith

YOLOv2 enhanced with Feature Pyramid Network and switchable atrous convolution reaches 68% mAP on virus patch detection in foci images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an enhanced YOLOv2 detector that adds a Feature Pyramid Network for better multi-scale feature handling and a switchable atrous convolution mechanism to adjust receptive fields for targets of varying size. This addresses the challenge of detecting virus and small cell patches in biomedical foci images, where objects differ in size, density, contrast, and shape. The model reports 40.5% mAP at a 25% IoU threshold for small cell patches and 68% mAP for FFU virus patches. These results suggest the additions make the base detector more suitable for specialized microscopy tasks.

Core claim

The central claim is that integrating FPN-based feature fusion with switchable convolution improves the suitability of YOLOv2 for biomedical object detection in foci images, demonstrated by the reported mAP scores of 40.5% on small cell patch detection and 68% on FFU virus patch detection.

What carries the argument

Feature Pyramid Network for multi-scale feature representation combined with switchable atrous convolution to adapt receptive field size for fine-grained targets.
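
As a rough illustration of the switchable mechanism, the sketch below follows the general pattern introduced with DetectoRS, where a per-pixel switch blends a standard convolution with a dilated one. It is not taken from the paper: the function names, the sigmoid applied to the switch, the dilation rate of 3, and the scalar switch parameters `a` and `b` are all assumptions chosen to keep a single-channel NumPy example small.

```python
import numpy as np


def dilated_conv2d(x, w, rate):
    """Single-channel 'same' cross-correlation with zero padding and dilation `rate`."""
    k = w.shape[0]                     # square, odd kernel size assumed
    pad = rate * (k // 2)
    xp = np.pad(x, pad)                # zero padding on every side
    h, wd = x.shape
    out = np.zeros((h, wd), dtype=float)
    for i in range(k):
        for j in range(k):
            # each kernel tap reads the input shifted by a multiple of `rate`
            out += w[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + wd]
    return out


def switchable_atrous_conv(x, w, delta_w, rate=3, a=1.0, b=0.0):
    """Blend a rate-1 and a rate-`rate` convolution with a per-pixel switch.

    Hypothetical sketch: the switch is sigmoid(a * avgpool5x5(x) + b).
    """
    avg = dilated_conv2d(x, np.full((5, 5), 1.0 / 25.0), 1)   # 5x5 average pooling
    s = 1.0 / (1.0 + np.exp(-(a * avg + b)))                  # per-pixel switch
    small = dilated_conv2d(x, w, 1)                           # small receptive field
    large = dilated_conv2d(x, w + delta_w, rate)              # large receptive field
    return s * small + (1.0 - s) * large, s


# Toy input: a single bright pixel in a 9x9 image.
image = np.zeros((9, 9))
image[4, 4] = 1.0
kernel = np.ones((3, 3))
output, switch = switchable_atrous_conv(image, kernel, np.zeros((3, 3)))
```

Pushing `b` strongly positive makes the output match the ordinary convolution, and pushing it strongly negative makes it match the dilated one, which is the sense in which the receptive field is "switchable" per location.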

If this is right

  • The detector can more accurately count virus patches to quantify infection levels in FFU images.
  • Multi-scale feature fusion helps maintain performance when patches appear at different resolutions in dense microscopy scenes.
  • Switchable convolution allows the network to better capture both small isolated targets and clustered ones without manual tuning of dilation rates.
  • The approach shows that standard single-stage detectors can be adapted for biomedical tasks with modest architectural additions.
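
The multi-scale fusion mentioned in the list above can be sketched as the top-down pathway of a Feature Pyramid Network: each backbone map passes through a 1x1 lateral convolution and is summed with the upsampled, coarser merged map. This is a minimal NumPy sketch under assumed shapes and random weights, not the paper's implementation; the helper names are hypothetical, and the 3x3 smoothing convolution that the original FPN applies after each merge is omitted.

```python
import numpy as np


def lateral_1x1(feat, weight):
    """1x1 convolution: mixes channels independently at every pixel.

    feat has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, feat)


def upsample2x_nearest(feat):
    """Nearest-neighbour upsampling by 2 along both spatial axes."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)


def fpn_top_down(backbone_feats, lateral_weights):
    """Merge backbone maps (ordered fine to coarse) into a feature pyramid."""
    laterals = [lateral_1x1(f, w) for f, w in zip(backbone_feats, lateral_weights)]
    merged = [laterals[-1]]                       # start from the coarsest level
    for lateral in reversed(laterals[:-1]):
        # add upsampled coarse semantics to the finer lateral map
        merged.append(lateral + upsample2x_nearest(merged[-1]))
    return merged[::-1]                           # back to fine-to-coarse order


# Hypothetical three-level backbone output; shapes are illustrative only.
rng = np.random.default_rng(0)
channels = (8, 16, 32)
sizes = (16, 8, 4)
feats = [rng.normal(size=(c, s, s)) for c, s in zip(channels, sizes)]
weights = [rng.normal(size=(4, c)) for c in channels]
pyramid = fpn_top_down(feats, weights)
```

Every level of `pyramid` ends up with the same channel count, so a shared detection head could in principle be applied at each resolution.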

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same FPN-plus-switchable-convolution pattern could transfer to other variable-scale object detection problems in medical imaging such as cell nuclei or lesion detection.
  • Testing the model on time-lapse foci sequences might reveal whether the receptive-field adaptation also improves tracking across frames.
  • Replacing the YOLOv2 backbone with a more recent single-stage detector while keeping the FPN and switchable modules could isolate how much of the gain comes from each component.

Load-bearing premise

The variations in size, density, contrast, and shape of targets in the biomedical foci image datasets are representative enough for the FPN and switchable convolution enhancements to provide consistent improvements without overfitting to the specific test sets used.

What would settle it

Running the model on a new collection of foci images that contain target size and contrast distributions outside those in the original training and test sets and measuring whether mAP falls below 40.5% for cells or 68% for viruses.
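
The check described above depends on how average precision is computed at a chosen IoU threshold. The sketch below shows one common convention (greedy matching by descending score, all-point interpolated AP) for a single class and a single image; mAP would then be the mean of such values over classes. The paper's exact evaluation protocol is not given in the text here, so the matching rule, the function names, and the toy boxes are assumptions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.25):
    """All-point interpolated AP for one class.

    detections: list of (score, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    hits = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # find the ground-truth box that overlaps this detection the most
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[best_j]:
            matched[best_j] = True
            hits.append(1)        # true positive
        else:
            hits.append(0)        # false positive (poor overlap or duplicate)
    precisions, recalls, cum_tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / rank)
        recalls.append(cum_tp / len(ground_truth))
    # make precision non-increasing in recall before integrating
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical boxes for one image and one class; not data from the paper.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (3, 3, 13, 13)),     # shifted by 3 px from the first target
    (0.8, (50, 50, 60, 60)),   # false alarm
    (0.7, (20, 20, 30, 30)),   # exact hit on the second target
]
ap_at_25 = average_precision(detections, ground_truth, iou_thr=0.25)
ap_at_50 = average_precision(detections, ground_truth, iou_thr=0.50)
```

With these toy boxes the shifted detection counts as a hit at the 0.25 threshold but as a miss at 0.50, so AP drops from about 0.83 to about 0.17. Any comparison against the reported 40.5% and 68% would need to use the same threshold and matching rule as the paper.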

Figures

Figures reproduced from arXiv: 2605.22290 by Amrita Singh, Snehasis Mukherjee.

Figure 1: Modified YOLOv2 architecture. One max-pooling layer is removed from Module
Figure 2: The Switchable Atrous Convolution (SAC) block used in the proposed model.
Figure 3: Overall architecture of the proposed model for small patch detection.
Figure 4: Precision–recall behavior of YOLOv2_FPN_SAC for virus patch detection.
Figure 5: Qualitative virus patch detection results obtained using the proposed
Original abstract

Accurate detection and counting of virus patches in focus-forming unit (FFU) images, also known as foci images, are important for quantifying viral infection and analyzing cellular structures. This task is challenging because biomedical targets often vary substantially in size, density, contrast, and shape. In this paper, we propose an enhanced YOLOv2-based detector that integrates a Feature Pyramid Network (FPN) to improve multi-scale feature representation. We also incorporate a switchable atrous convolution mechanism to adapt the receptive field for fine-grained targets in dense microscopy images. The proposed method is evaluated on biomedical foci image datasets for virus patch and small cell patch detection. For small cell patch detection, the model achieves a mean average precision (mAP) of 40.5% at a 25% Intersection over Union (IoU) threshold. For FFU virus patch detection, the model achieves an mAP of 68%. These results indicate that combining FPN-based feature fusion with switchable convolution improves the suitability of YOLOv2 for specialized biomedical object detection tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes an enhanced YOLOv2 detector that integrates a Feature Pyramid Network (FPN) for multi-scale feature representation and a switchable atrous convolution mechanism to adapt receptive fields, targeting the challenges of varying size, density, contrast, and shape in biomedical foci images. It reports mAP of 40.5% at 25% IoU for small cell patch detection and 68% for FFU virus patch detection, concluding that these additions improve YOLOv2's suitability for such specialized detection tasks.

Significance. If validated with proper controls, the approach could offer a practical adaptation of established detectors for biomedical microscopy, where multi-scale and fine-grained targets are common; the explicit focus on switchable convolutions and FPN fusion provides a concrete architectural recipe that might generalize to other dense imaging domains.

major comments (2)
  1. [Abstract and evaluation] Abstract and evaluation: absolute mAP figures (40.5% at 25% IoU for small cells; 68% for virus patches) are supplied without any baseline runs of unmodified YOLOv2, ablation tables isolating FPN versus switchable convolution, or statistical significance tests on the same datasets, so the central claim that the enhancements 'improve the suitability of YOLOv2' cannot be isolated from dataset-specific effects or implementation choices.
  2. [Evaluation] Evaluation section: no details are given on dataset size, train/test splits, annotation protocol, or class imbalance handling, which are load-bearing for interpreting whether the reported mAPs reflect genuine generalization rather than overfitting to the particular foci-image collections.
minor comments (2)
  1. [Abstract] The IoU threshold of 25% for small-cell mAP is unusually low compared with standard COCO or PASCAL VOC protocols; a brief justification or additional results at 50% IoU would improve clarity.
  2. [Methods] Notation for the switchable atrous convolution (e.g., how the dilation rates are selected or switched) is introduced without an accompanying equation or diagram in the methods description.
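
Both minor comments can be made concrete with short formulas. The first equation is the switchable atrous convolution as formulated in DetectoRS, offered only as a plausible reading because the paper's own notation is not reproduced here; the symbols S, Δw, and r come from that formulation, not from this paper. The remaining equations work through a hypothetical case of two equal square boxes offset by 3 pixels along both axes, to show why small targets may motivate a lenient IoU threshold.

```latex
% SAC as formulated in DetectoRS (assumed, not confirmed for this paper):
% S(x) is a per-pixel switch, w the kernel, \Delta w a learned offset, r the dilation rate.
y = S(x)\cdot \mathrm{Conv}(x, w, 1) + \bigl(1 - S(x)\bigr)\cdot \mathrm{Conv}(x, w + \Delta w, r)

% IoU of two 10x10 boxes offset by 3 px in x and y:
\mathrm{IoU}_{10} = \frac{7 \cdot 7}{100 + 100 - 49} = \frac{49}{151} \approx 0.32

% Same offset for two 100x100 boxes:
\mathrm{IoU}_{100} = \frac{97 \cdot 97}{10000 + 10000 - 9409} = \frac{9409}{10591} \approx 0.89
```

Under these assumptions, the same 3-pixel localization error clears a 0.5 threshold for the large box but only a 0.25 threshold for the small one.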

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and constructive suggestions. We address each major comment below and will revise the manuscript to incorporate the requested information and experiments.

Point-by-point responses
  1. Referee: [Abstract and evaluation] Abstract and evaluation: absolute mAP figures (40.5% at 25% IoU for small cells; 68% for virus patches) are supplied without any baseline runs of unmodified YOLOv2, ablation tables isolating FPN versus switchable convolution, or statistical significance tests on the same datasets, so the central claim that the enhancements 'improve the suitability of YOLOv2' cannot be isolated from dataset-specific effects or implementation choices.

    Authors: We agree that baseline comparisons, ablation studies, and statistical tests are necessary to isolate the contributions of our modifications. In the revised manuscript we will report results from the unmodified YOLOv2 on the identical datasets, provide ablation tables that separately evaluate the addition of FPN and the switchable atrous convolution, and include appropriate statistical significance tests. revision: yes

  2. Referee: [Evaluation] Evaluation section: no details are given on dataset size, train/test splits, annotation protocol, or class imbalance handling, which are load-bearing for interpreting whether the reported mAPs reflect genuine generalization rather than overfitting to the particular foci-image collections.

    Authors: We acknowledge that these details are essential for reproducibility and for assessing generalization. The revised manuscript will add a dedicated subsection describing the dataset size, the train/test split procedure, the annotation protocol, and the methods used to address class imbalance. revision: yes
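
One way the promised significance testing might be approached is a paired bootstrap over test images, comparing the unmodified and the enhanced detector on the same images. The sketch below is an assumption about what such a test could look like, not a description of the authors' plan: it presumes a per-image score (for example per-image AP or counting error) is available for both models, and the function name and placeholder numbers are hypothetical.

```python
import random


def paired_bootstrap_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile interval for mean(scores_b - scores_a), resampling images with replacement."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equally long score lists")
    rng = random.Random(seed)                      # fixed seed for repeatability
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        # draw n images with replacement and average their paired differences
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    low = means[int((alpha / 2) * n_boot)]
    high = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Placeholder per-image scores for illustration only; not results from the paper.
baseline = [0.30, 0.42, 0.25, 0.51, 0.38, 0.47]
enhanced = [0.36, 0.45, 0.31, 0.50, 0.44, 0.52]
low, high = paired_bootstrap_ci(baseline, enhanced)
```

An interval that excludes zero would suggest the difference is unlikely to be a resampling artifact, although it says nothing about generalization to foci images from other sources.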

Circularity Check

0 steps flagged

No circularity: empirical performance metrics on external datasets

The paper describes an application of existing components (YOLOv2 base, FPN for multi-scale features, switchable atrous convolution for receptive-field adaptation) to foci-image detection. Reported mAP values (40.5% at 25% IoU for small-cell patches, 68% for FFU virus patches) are direct empirical measurements on held-out test data. No derivation chain, fitted-parameter prediction, self-definitional equation, or load-bearing self-citation is present in the provided text. The suitability claim rests on observed numbers rather than any quantity defined in terms of itself.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim depends on the standard assumptions of deep learning optimization and the specific architectural choices being beneficial for the described image characteristics.

free parameters (1)
  • Training hyperparameters for the enhanced YOLOv2 model
    Deep learning models typically involve many tuned hyperparameters, such as learning rate and batch size; none are specified in the abstract.
axioms (1)
  • Domain assumption: switchable atrous convolution effectively adapts receptive fields for fine-grained targets in dense microscopy images.
    This is assumed in the proposal to justify the mechanism for handling variations in biomedical targets.
