Weakly Supervised Attention-based Models Using Activation Maps for Citrus Mite and Insect Pest Classification
Pith reviewed 2026-05-24 13:00 UTC · model grok-4.3
The pith
A two-weighted activation mapping method in an attention-based two-stage network classifies tiny citrus mites and pests from class labels alone, beating prior weakly supervised approaches by at least 16 percentage points while also inferring bounding-box locations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Two-Weighted Activation Mapping produces class-specific feature-map scores that, when used to guide an attention-based multiple instance learning network, deliver both higher classification accuracy on tiny pest regions and usable location estimates, all trained solely from image-level class labels.
What carries the argument
Two-Weighted Activation Mapping (TWAM), which computes saliency from class-label-driven feature maps and supplies those maps to steer attention weights inside the multiple instance learning stage.
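To make the "steer attention weights" idea concrete, here is a minimal sketch of attention-based MIL pooling in the style of Attention-based Deep MIL, where each instance embedding gets a softmax attention weight and the bag embedding is their weighted sum. The function name `attention_mil_pool`, the parameters `V` and `w`, and in particular the way `saliency` enters (as a log-prior added to the attention logits) are assumptions made for illustration; this page does not specify how TWAM scores are combined with the attention, so the combination rule below should not be read as the paper's method.

```python
import math

def attention_mil_pool(instances, V, w, saliency=None):
    """Attention-based MIL pooling over a bag of instance embeddings.

    instances: list of K feature vectors (each a list of D floats)
    V: L x D projection matrix (list of rows); w: length-L weight vector
    saliency: optional list of K positive scores.
        ASSUMPTION: scores are added as log-priors to the attention logits.
        This is a placeholder rule, not taken from the paper.
    Returns (bag_embedding, attention_weights).
    """
    logits = []
    for k, h in enumerate(instances):
        # Hidden representation tanh(V h_k), one value per row of V.
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        logit = sum(w_l * t for w_l, t in zip(w, hidden))
        if saliency is not None:
            logit += math.log(saliency[k] + 1e-8)
        logits.append(logit)

    # Numerically stable softmax over the K instances.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of the instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn
```

With a zero projection matrix every instance receives the same logit, so the attention is uniform; passing saliency scores such as `[3.0, 1.0]` then shifts the weights toward the first instance in proportion to those scores.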
If this is right
- The model surpasses Attention-based Deep MIL and WILDCAT by at least 16 percentage points on both the Citrus Pest Benchmark and Insect Pest datasets.
- Bounding-box locations for salient insects are produced at test time without any location labels seen during training.
- The two-stage pipeline (TWAM followed by attention MIL) works on images containing multiple tiny objects against complex backgrounds.
- Only image-level class labels are required, removing the expense of generating bounding-box annotations.
Where Pith is reading between the lines
- The same label-only localization trick could be tested on other small-object domains such as weed seedlings or cell nuclei.
- If the saliency maps prove spatially accurate when checked against held-out bounding boxes, the method supplies cheap pseudo-labels for fully supervised detectors.
- The reported gains might shrink if the baselines are retrained under identical training protocols instead of being compared through their published numbers.
Load-bearing premise
The class-label activation maps reliably mark the tiny mite locations rather than latching onto background texture or noise.
What would settle it
Retraining the compared Attention-based Deep MIL and WILDCAT baselines on the exact same data splits, augmentations, and optimization schedule yields accuracy within a few points of the proposed model.
Original abstract
Citrus juices and fruits are commodities with great economic potential in the international market, but productivity losses caused by mites and other pests are still far from being a good mark. Despite the integrated pest mechanical aspect, only a few works on automatic classification have handled images with orange mite characteristics, which means tiny and noisy regions of interest. On the computational side, attention-based models have gained prominence in deep learning research, and, along with weakly supervised learning algorithms, they have improved tasks performed with some label restrictions. In agronomic research of pests and diseases, these techniques can improve classification performance while pointing out the location of mites and insects without specific labels, reducing deep learning development costs related to generating bounding boxes. In this context, this work proposes an attention-based activation map approach developed to improve the classification of tiny regions called Two-Weighted Activation Mapping, which also produces locations using feature map scores learned from class labels. We apply our method in a two-stage network process called Attention-based Multiple Instance Learning Guided by Saliency Maps. We analyze the proposed approach in two challenging datasets, the Citrus Pest Benchmark, which was captured directly in the field using magnifying glasses, and the Insect Pest, a large pest image benchmark. In addition, we evaluate and compare our models with weakly supervised methods, such as Attention-based Deep MIL and WILDCAT. The results show that our classifier is superior to literature methods that use tiny regions in their classification tasks, surpassing them in all scenarios by at least 16 percentage points. Moreover, our approach infers bounding box locations for salient insects, even training without any location labels.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Two-Weighted Activation Mapping (TWAM) within an Attention-based Multiple Instance Learning pipeline guided by saliency maps for weakly supervised classification of tiny mite and insect regions. It evaluates the approach on the Citrus Pest Benchmark (field-captured images) and Insect Pest dataset, claiming at least 16 percentage point gains over Attention-based Deep MIL and WILDCAT while also producing bounding-box localizations from class labels alone.
Significance. If the performance margins are shown to arise from the proposed components rather than training-protocol differences, the work would demonstrate a practical route to localization without bounding-box supervision for small-object agronomic tasks. The emphasis on field-captured noisy data and label-efficient training aligns with real deployment constraints.
major comments (3)
- [Abstract / Experimental results] Abstract and experimental results: the headline claim of 'surpassing them in all scenarios by at least 16 percentage points' is load-bearing, yet the manuscript supplies no statement that the baselines were re-trained under identical data splits, augmentation, optimizer schedules, or hyper-parameters on the Citrus Pest Benchmark; any deviation can produce large deltas on small noisy datasets.
- [Results] Results section: no ablation is reported that isolates the contribution of the two weights in TWAM, the saliency-map guidance, or the two-stage MIL pipeline from other implementation choices; without such controls the attribution of the reported gains remains unverified.
- [Abstract / Localization discussion] Abstract and localization discussion: the claim that the model 'infers bounding box locations for salient insects' is presented without any quantitative localization metric (IoU, precision-recall on inferred boxes, or comparison to ground-truth boxes) or protocol for converting activation maps to boxes.
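For readers unfamiliar with the IoU metric named in the last comment, the following is a minimal sketch of intersection-over-union for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box format and the function name `box_iou` are assumptions chosen for illustration; the manuscript, as summarized here, defines no box format.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) in continuous coordinates
    (an assumed convention, not one taken from the paper).
    """
    # Corners of the overlap rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

An inferred box would typically count as a correct localization when its IoU with a ground-truth box exceeds a chosen threshold, which is why the metric requires at least a small set of annotated boxes.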
minor comments (1)
- [Abstract] Abstract: 'integrated pest mechanical aspect' appears to be a phrasing error; consider 'integrated pest management aspect'.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
- Referee: [Abstract / Experimental results] Abstract and experimental results: the headline claim of 'surpassing them in all scenarios by at least 16 percentage points' is load-bearing, yet the manuscript supplies no statement that the baselines were re-trained under identical data splits, augmentation, optimizer schedules, or hyper-parameters on the Citrus Pest Benchmark; any deviation can produce large deltas on small noisy datasets.
  Authors: We acknowledge that the manuscript does not explicitly confirm identical re-training of the baselines. To ensure a fair comparison, we will re-train Attention-based Deep MIL and WILDCAT using the exact same data splits, augmentations, optimizer, and hyper-parameter schedules as our method on the Citrus Pest Benchmark and report the updated results in the revised experimental section.
  Revision: yes
- Referee: [Results] Results section: no ablation is reported that isolates the contribution of the two weights in TWAM, the saliency-map guidance, or the two-stage MIL pipeline from other implementation choices; without such controls the attribution of the reported gains remains unverified.
  Authors: We agree that the absence of targeted ablations leaves the source of the gains unclear. In the revised manuscript we will add ablation experiments that successively remove the two weights in TWAM, the saliency-map guidance term, and the two-stage training procedure while keeping all other implementation details fixed, thereby isolating their individual contributions.
  Revision: yes
- Referee: [Abstract / Localization discussion] Abstract and localization discussion: the claim that the model 'infers bounding box locations for salient insects' is presented without any quantitative localization metric (IoU, precision-recall on inferred boxes, or comparison to ground-truth boxes) or protocol for converting activation maps to boxes.
  Authors: The datasets used are weakly supervised and contain no bounding-box annotations, so direct IoU or precision-recall against ground truth is not possible. We will nevertheless add an explicit description of the activation-to-box conversion protocol (thresholding and connected-component extraction) together with qualitative localization examples and any feasible proxy metrics. The abstract and discussion will be revised to accurately reflect these limitations and the added protocol.
  Revision: partial
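The thresholding and connected-component protocol mentioned in the last response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `boxes_from_activation`, the relative threshold of 0.5 times the map's peak, and the use of 4-connectivity are all hypothetical choices, since the page gives no details beyond naming the two steps.

```python
from collections import deque

def boxes_from_activation(act_map, rel_threshold=0.5):
    """Convert a 2D activation map into bounding boxes.

    act_map: list of rows, each a list of floats.
    rel_threshold: fraction of the peak activation used as the cutoff
        (ASSUMED value; the paper's threshold is not given on this page).
    Returns boxes as (x_min, y_min, x_max, y_max) in inclusive pixel indices,
    one per 4-connected region at or above the cutoff.
    """
    rows, cols = len(act_map), len(act_map[0])
    peak = max(max(row) for row in act_map)
    if peak <= 0:
        return []  # Nothing activated, so no boxes.
    cutoff = rel_threshold * peak

    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or act_map[r][c] < cutoff:
                continue
            # Breadth-first flood fill over one connected region,
            # tracking the extreme row and column indices it reaches.
            queue = deque([(r, c)])
            seen[r][c] = True
            r_min = r_max = r
            c_min = c_max = c
            while queue:
                y, x = queue.popleft()
                r_min, r_max = min(r_min, y), max(r_max, y)
                c_min, c_max = min(c_min, x), max(c_max, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and act_map[ny][nx] >= cutoff):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((c_min, r_min, c_max, r_max))
    return boxes
```

A relative cutoff is used here so the same setting applies to maps with different activation scales; for tiny objects such as mites, the choice of cutoff and connectivity would likely affect how many boxes are produced, which is one reason the referee asks for the protocol to be stated explicitly.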
Circularity Check
No significant circularity; empirical benchmark comparisons are self-contained
full rationale
The paper proposes the TWAM activation mapping method and a two-stage MIL-guided architecture, then evaluates them via standard training and accuracy reporting on the Citrus Pest Benchmark and Insect Pest datasets. Superiority is asserted through direct numerical comparison to Attention-based Deep MIL and WILDCAT on the same benchmarks. No equations, parameters, or predictions are defined in terms of the target quantities themselves, and no load-bearing step reduces by construction to a fit or self-citation. The central claims rest on external experimental outcomes rather than tautological re-labeling of inputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- two weights in TWAM
- saliency and MIL stage thresholds
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We proposed ... Two-Weighted Activation Mapping (Two-WAM) ... Attention-based Multiple Instance Learning Guided by Saliency Maps
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: surpassing them in all scenarios by at least 16 percentage points
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.