Detection of Virus and Small Cell Patches in Foci Images Using Switchable Convolution and Feature Pyramid Networks
Pith reviewed 2026-05-22 07:01 UTC · model grok-4.3
The pith
YOLOv2 enhanced with Feature Pyramid Network and switchable atrous convolution reaches 68% mAP on virus patch detection in foci images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that integrating FPN-based feature fusion with switchable convolution improves the suitability of YOLOv2 for biomedical object detection in foci images, demonstrated by the reported mAP scores of 40.5% on small cell patch detection and 68% on FFU virus patch detection.
What carries the argument
Feature Pyramid Network for multi-scale feature representation combined with switchable atrous convolution to adapt receptive field size for fine-grained targets.
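To make the receptive-field idea concrete, here is a minimal 1D sketch of a soft switch between two dilation rates. The abstract does not give the paper's formulation, so everything here is an assumption for illustration: the `switch` function (a sigmoid of a local mean), its `window`, `gain`, and `bias` parameters, and the sharing of one weight vector across both branches are all hypothetical. Only the effective kernel size formula, k + (k - 1)(r - 1), is standard for atrous convolution.

```python
import math


def atrous_conv1d(x, w, rate):
    """Zero-padded 'same' 1D atrous (dilated) convolution; assumes odd len(w)."""
    k = len(w)
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + (j - k // 2) * rate  # taps are spaced `rate` samples apart
            if 0 <= idx < len(x):
                acc += w[j] * x[idx]
        out.append(acc)
    return out


def switch(x, window=3, gain=1.0, bias=0.0):
    """Hypothetical per-position switch in (0, 1): sigmoid of a local mean."""
    n = len(x)
    half = window // 2
    s = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mean = sum(x[lo:hi]) / (hi - lo)
        s.append(1.0 / (1.0 + math.exp(-(gain * mean + bias))))
    return s


def switchable_atrous_conv1d(x, w, rate=3):
    """Blend a rate-1 branch and a rate-`rate` branch, position by position."""
    s = switch(x)
    small = atrous_conv1d(x, w, 1)
    large = atrous_conv1d(x, w, rate)
    return [si * a + (1.0 - si) * b for si, a, b in zip(s, small, large)]


def effective_kernel_size(k, rate):
    """Span covered by a k-tap kernel at the given dilation rate."""
    return k + (k - 1) * (rate - 1)
```

With a 3-tap kernel, moving from rate 1 to rate 3 widens the span from 3 to 7 samples without adding weights, which is the property the switch trades on.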
If this is right
- The detector can more accurately count virus patches to quantify infection levels in FFU images.
- Multi-scale feature fusion helps maintain performance when patches appear at different resolutions in dense microscopy scenes.
- Switchable convolution allows the network to better capture both small isolated targets and clustered ones without manual tuning of dilation rates.
- The approach shows that standard single-stage detectors can be adapted for biomedical tasks with modest architectural additions.
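The counting implication in the first bullet can be sketched as a post-processing step on detector output. This is an assumed pipeline, not one described in the abstract: the function name `count_patches`, the `(score, box)` input format, and both thresholds are hypothetical, and greedy non-maximum suppression is used here only because it is the common default for single-stage detectors.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_patches(detections, score_thresh=0.5, nms_iou=0.5):
    """Count detections left after score filtering and greedy NMS.

    detections: list of (score, (x1, y1, x2, y2)) tuples (assumed format).
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if score < score_thresh:
            break  # remaining detections all score lower
        # keep a box only if it does not overlap a higher-scoring kept box
        if all(iou(box, k) <= nms_iou for k in kept):
            kept.append(box)
    return len(kept)
```

In dense scenes the `nms_iou` value matters: set too low, it merges genuinely adjacent patches and undercounts.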
Where Pith is reading between the lines
- The same FPN-plus-switchable-convolution pattern could transfer to other variable-scale object detection problems in medical imaging such as cell nuclei or lesion detection.
- Testing the model on time-lapse foci sequences might reveal whether the receptive-field adaptation also improves tracking across frames.
- Replacing the YOLOv2 backbone with a more recent single-stage detector while keeping the FPN and switchable modules could isolate how much of the gain comes from each component.
Load-bearing premise
The variations in size, density, contrast, and shape of targets in the biomedical foci image datasets are representative enough for the FPN and switchable convolution enhancements to provide consistent improvements without overfitting to the specific test sets used.
What would settle it
Running the model on a new collection of foci images that contain target size and contrast distributions outside those in the original training and test sets and measuring whether mAP falls below 40.5% for cells or 68% for viruses.
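Such a check only means something if the new images are scored with the same metric. The sketch below computes single-class average precision at a chosen IoU threshold using PASCAL VOC-style matching and all-point interpolation. It assumes detections and ground truths are pooled into flat lists for one image; the paper's exact evaluation protocol is not stated in the abstract, so treat the matching rule and the input format as assumptions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_thresh=0.25):
    """Single-class AP. detections: list of (score, box); ground_truths: boxes."""
    if not ground_truths:
        return 0.0
    matched = [False] * len(ground_truths)
    tp_flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # find the ground-truth box with the highest overlap
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        hit = best_j >= 0 and best_iou >= iou_thresh and not matched[best_j]
        if hit:
            matched[best_j] = True
        tp_flags.append(hit)

    tp = fp = 0
    recalls, precisions = [], []
    for flag in tp_flags:
        tp, fp = (tp + 1, fp) if flag else (tp, fp + 1)
        recalls.append(tp / len(ground_truths))
        precisions.append(tp / (tp + fp))

    # precision envelope: make precision non-increasing as recall grows
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

mAP is then the mean of this value over classes, and the comparison against 40.5% or 68% should use the same `iou_thresh` as the original evaluation.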
Original abstract
Accurate detection and counting of virus patches in focus-forming unit (FFU) images, also known as foci images, are important for quantifying viral infection and analyzing cellular structures. This task is challenging because biomedical targets often vary substantially in size, density, contrast, and shape. In this paper, we propose an enhanced YOLOv2-based detector that integrates a Feature Pyramid Network (FPN) to improve multi-scale feature representation. We also incorporate a switchable atrous convolution mechanism to adapt the receptive field for fine-grained targets in dense microscopy images. The proposed method is evaluated on biomedical foci image datasets for virus patch and small cell patch detection. For small cell patch detection, the model achieves a mean average precision (mAP) of 40.5% at a 25% Intersection over Union (IoU) threshold. For FFU virus patch detection, the model achieves an mAP of 68%. These results indicate that combining FPN-based feature fusion with switchable convolution improves the suitability of YOLOv2 for specialized biomedical object detection tasks.
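The FPN-based feature fusion named in the abstract can be illustrated with the top-down merge step from the original FPN design: a coarser map is upsampled by 2 with nearest-neighbour interpolation and added element-wise to the finer lateral map. The sketch below works on single-channel maps stored as nested lists and leaves out the 1x1 lateral and 3x3 smoothing convolutions. How the paper attaches this to YOLOv2 is not described in the abstract, so the function names and the fine-to-coarse `levels` layout are assumptions.

```python
def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down_merge(coarse, lateral):
    """Upsample the coarse map and add it to the finer lateral map."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly 2x the coarse map")
    return [[u + l for u, l in zip(u_row, l_row)]
            for u_row, l_row in zip(up, lateral)]


def build_pyramid(levels):
    """Fuse maps ordered fine to coarse; returns fused maps in the same order."""
    fused = [levels[-1]]  # the coarsest level passes through unchanged
    for lateral in reversed(levels[:-1]):
        fused.append(top_down_merge(fused[-1], lateral))
    return fused[::-1]
```

The effect is that fine-resolution levels, where small patches are localised, also carry context from the coarse levels.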
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes an enhanced YOLOv2 detector that integrates a Feature Pyramid Network (FPN) for multi-scale feature representation and a switchable atrous convolution mechanism to adapt receptive fields, targeting the challenges of varying size, density, contrast, and shape in biomedical foci images. It reports mAP of 40.5% at 25% IoU for small cell patch detection and 68% for FFU virus patch detection, concluding that these additions improve YOLOv2's suitability for such specialized detection tasks.
Significance. If validated with proper controls, the approach could offer a practical adaptation of established detectors for biomedical microscopy, where multi-scale and fine-grained targets are common; the explicit focus on switchable convolutions and FPN fusion provides a concrete architectural recipe that might generalize to other dense imaging domains.
major comments (2)
- [Abstract and evaluation] Absolute mAP figures (40.5% at 25% IoU for small cells; 68% for virus patches) are supplied without any baseline runs of unmodified YOLOv2, ablation tables isolating FPN versus switchable convolution, or statistical significance tests on the same datasets, so the central claim that the enhancements 'improve the suitability of YOLOv2' cannot be isolated from dataset-specific effects or implementation choices.
- [Evaluation] No details are given on dataset size, train/test splits, annotation protocol, or class imbalance handling, which are load-bearing for interpreting whether the reported mAPs reflect genuine generalization rather than overfitting to the particular foci-image collections.
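The controls requested in the first major comment could take roughly the following shape: a 2x2 ablation grid over the two additions, and a paired bootstrap interval on the difference between two variants. This is a sketch of one reasonable protocol, not the authors' plan. The per-image scores are a hypothetical input (for example per-image AP or F1), since mAP itself is a dataset-level number and a full analysis would resample images and recompute it.

```python
import itertools
import random


def ablation_grid():
    """All four on/off combinations; the first entry is plain YOLOv2."""
    return [{"fpn": f, "switchable_conv": s}
            for f, s in itertools.product([False, True], repeat=2)]


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for the mean per-image difference (b minus a).

    scores_a, scores_b: per-image scores of two variants on the SAME images.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

An interval that excludes zero would support attributing the gain to the added component rather than to run-to-run or dataset noise.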
minor comments (2)
- [Abstract] The IoU threshold of 25% for small-cell mAP is unusually low compared with standard COCO or PASCAL VOC protocols; a brief justification or additional results at 50% IoU would improve clarity.
- [Methods] Notation for the switchable atrous convolution (e.g., how the dilation rates are selected or switched) is introduced without an accompanying equation or diagram in the methods description.
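On the IoU threshold raised in the first minor comment, a short calculation shows why small targets are sensitive to the choice. The paper does not state its reason for using 25%, so this is only an illustration of a general geometric property; the helper `shifted_iou` and the box sizes are made up for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(side, shift):
    """IoU between a square box and a copy moved diagonally by `shift` pixels."""
    box = (0, 0, side, side)
    moved = (shift, shift, side + shift, side + shift)
    return iou(box, moved)


small = shifted_iou(10, 3)    # 49 / 151, roughly 0.32
large = shifted_iou(100, 3)   # 9409 / 10591, roughly 0.89
```

The same 3-pixel localisation error passes a 25% threshold but fails a 50% one for the 10-pixel box, while the 100-pixel box passes both comfortably. Reporting both thresholds, as the comment suggests, would separate detection failures from small localisation errors.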
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive suggestions. We address each major comment below and will revise the manuscript to incorporate the requested information and experiments.
Point-by-point responses
- Referee: [Abstract and evaluation] Absolute mAP figures (40.5% at 25% IoU for small cells; 68% for virus patches) are supplied without any baseline runs of unmodified YOLOv2, ablation tables isolating FPN versus switchable convolution, or statistical significance tests on the same datasets, so the central claim that the enhancements 'improve the suitability of YOLOv2' cannot be isolated from dataset-specific effects or implementation choices.
  Authors: We agree that baseline comparisons, ablation studies, and statistical tests are necessary to isolate the contributions of our modifications. In the revised manuscript we will report results from the unmodified YOLOv2 on the identical datasets, provide ablation tables that separately evaluate the addition of FPN and the switchable atrous convolution, and include appropriate statistical significance tests.
  Revision: yes
- Referee: [Evaluation] No details are given on dataset size, train/test splits, annotation protocol, or class imbalance handling, which are load-bearing for interpreting whether the reported mAPs reflect genuine generalization rather than overfitting to the particular foci-image collections.
  Authors: We acknowledge that these details are essential for reproducibility and for assessing generalization. The revised manuscript will add a dedicated subsection describing the dataset size, the train/test split procedure, the annotation protocol, and the methods used to address class imbalance.
  Revision: yes
Circularity Check
No circularity: empirical performance metrics on external datasets
The paper describes an application of existing components (YOLOv2 base, FPN for multi-scale features, switchable atrous convolution for receptive-field adaptation) to foci-image detection. Reported mAP values (40.5% at 25% IoU for small-cell patches, 68% for FFU virus patches) are direct empirical measurements on held-out test data. No derivation chain, fitted-parameter prediction, self-definitional equation, or load-bearing self-citation is present in the provided text. The suitability claim rests on observed numbers rather than any quantity defined in terms of itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- Training hyperparameters for the enhanced YOLOv2 model
axioms (1)
- Domain assumption: switchable atrous convolution effectively adapts receptive fields for fine-grained targets in dense microscopy images.