Weakly Supervised Attention-based Models Using Activation Maps for Citrus Mite and Insect Pest Classification

Edson Bollis; Helena Maia; Helio Pedrini; Sandra Avila

arXiv: 2110.00881 · v1 · submitted 2021-10-02 · cs.CV · cs.LG

Pith reviewed 2026-05-24 13:00 UTC · model grok-4.3

keywords: weakly supervised learning, attention mechanisms, activation maps, citrus pest classification, multiple instance learning, pest detection, image classification

The pith

A two-weighted activation mapping method in an attention-based two-stage network classifies tiny citrus mites and pests from class labels alone, beating prior weakly supervised approaches by at least 16 percentage points while also inferring bounding-box locations without location labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops Two-Weighted Activation Mapping to generate saliency scores from class labels only and feeds those scores into an attention-based multiple instance learning stage. The resulting classifier handles the very small, noisy regions that characterize mites and insects on citrus images captured in the field. It reports higher accuracy than earlier weakly supervised baselines on both the Citrus Pest Benchmark and the larger Insect Pest dataset. The same maps also produce bounding-box locations without any location supervision during training. A reader would care because the method lowers the cost of building pest detectors by removing the need for manual bounding-box labels.

Core claim

The central claim is that the Two-Weighted Activation Mapping produces class-specific feature-map scores that, when used to guide an attention-based multiple instance learning network, deliver both higher classification accuracy on tiny pest regions and usable location estimates, all trained solely from image-level class labels.
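
The attention-based multiple instance learning stage named in this claim builds on the pooling operator of Attention-based Deep MIL [12], in which each instance embedding h_k receives a weight a_k = softmax_k(w^T tanh(V h_k)) and the bag embedding is the weighted sum of instances. The following is a minimal pure-Python sketch of that pooling, not the paper's implementation. The optional `saliency` argument, which biases the attention logits with a log-prior, is a hypothetical illustration of how saliency scores could steer attention; the paper's actual guidance mechanism is not specified in this summary.

```python
import math


def attention_mil_pool(instances, V, w, saliency=None):
    """Attention pooling over a bag of instance embeddings.

    instances: list of K embeddings, each a list of D floats.
    V: L x D projection matrix (list of rows); w: list of L floats.
    saliency: optional list of K non-negative scores, added as a
        log-prior to the attention logits (illustrative assumption).
    Returns (bag_embedding, attention_weights).
    """
    logits = []
    for k, h in enumerate(instances):
        # Hidden projection tanh(V h_k), then scalar score w^T tanh(V h_k).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        logit = sum(wi * ti for wi, ti in zip(w, hidden))
        if saliency is not None:
            logit += math.log(saliency[k] + 1e-8)
        logits.append(logit)

    # Numerically stable softmax over the K instances.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights
```

Because the weights sum to one, they can be read as a distribution over instances, which is what makes attention pooling usable for pointing at the regions that drove the bag-level decision.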

What carries the argument

Two-Weighted Activation Mapping (TWAM), which computes saliency from class-label-driven feature maps and supplies those maps to steer attention weights inside the multiple instance learning stage.
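
For orientation, the baseline idea that activation-mapping methods extend is the class activation map of Zhou et al. [9]: a per-class weighted sum of the final convolutional feature maps. The sketch below implements that plain formulation in pure Python, with a ReLU and peak normalization added as common post-processing. It is not Two-Weighted Activation Mapping itself, whose two weighting terms are not spelled out in this summary; the function name and the normalization choice are assumptions made for illustration.

```python
def class_activation_map(feature_maps, class_weights):
    """Plain class activation map (CAM), not the paper's Two-WAM.

    feature_maps: list of K maps, each an H x W list of lists.
    class_weights: K classifier weights for the target class.
    Returns an H x W map, clipped at zero and scaled so the peak is 1.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]

    # Weighted sum over channels: M_c(y, x) = sum_k w_k^c * f_k(y, x).
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]

    # Keep only positive evidence, then normalize by the peak value.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

A map like this is the kind of saliency signal that a second stage can crop instances from or use to weight attention.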

If this is right

The model surpasses Attention-based Deep MIL and WILDCAT by at least 16 percentage points on both the Citrus Pest Benchmark and Insect Pest datasets.
Bounding-box locations for salient insects are produced at test time without any location labels seen during training.
The two-stage pipeline (TWAM followed by attention MIL) works on images containing multiple tiny objects against complex backgrounds.
Only image-level class labels are required, removing the expense of generating bounding-box annotations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same label-only localization trick could be tested on other small-object domains such as weed seedlings or cell nuclei.
If the saliency maps prove spatially accurate when checked against held-out bounding boxes, the method supplies cheap pseudo-labels for fully supervised detectors.
The reported gains might shrink if future baselines are retrained under identical training protocols rather than taken from published numbers.

Load-bearing premise

The class-label activation maps reliably mark the tiny mite locations rather than latching onto background texture or noise.

What would settle it

Retraining the compared Attention-based Deep MIL and WILDCAT baselines on the exact same data splits, augmentations, and optimization schedule yields accuracy within a few points of the proposed model.
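
One ingredient of such a controlled comparison is a single, seeded data split that every model is trained and validated on. The sketch below shows one way to produce a reproducible stratified split in pure Python; the function name, the 20% validation fraction, and the class names in the usage note are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices) with per-class proportions kept.

    The same labels and seed always give the same split, so every
    compared model can be trained on identical data.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # local generator; global state untouched
    train, val = [], []
    for label in sorted(by_class):  # fixed class order for determinism
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)
```

For example, `stratified_split(["mite"] * 5 + ["negative"] * 5, seed=42)` returns the same index lists on every call, which can then be written to disk and shared across the proposed model and both baselines.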

Figures

Figures reproduced from arXiv: 2110.00881 by Edson Bollis, Helena Maia, Helio Pedrini, Sandra Avila.

**Figure 3.** We take advantage of the attention-based …

**Figure 3.** (a) Attention-based Multiple Instance Learning Guided by Saliency Maps (Attention-based MIL-Guided) consists of four steps. In Step 1, …

**Figure 4.** RGB image transformation. (a) RGB encoded image, …

**Figure 5.** The effects of using: (a) removal of noisy images and (b) dropout in Bag Model training. Acronyms: ‘Atten.’, models trained using the proposed attention-based activation map approach (Two-WAM); ‘Drop.’, models trained with dropout; ‘NCPB’, experiments trained with only images or instances from NCPB; and ‘CPB’, experiments trained with original images. …

**Figure 6.** Figure 6b presents the results concerning the Bag Model fine-tuning in the Instance Model training. The fine-tuning does not improve the classification performance using Two-WAM instances, but it does improve it using Grad-CAM instances. The fine-tuning strategy’s best result is 92.8% accuracy and 92.2% F1-score on the CPB validation set, and 92.9% accuracy and 91.8% F1-score on the NCPB validation set. We achieved the be…

**Figure 7.** (a) CPB sample images. (b) Attention-based MIL …

**Figure 8.** (a) IP102 sample images. (b) MIL-Guided saliency maps produced by Grad-CAM. (c) Attention-based MIL-Guided saliency maps based …

**Figure 9.** (a) IP102 sample images [5]. (b) Attention-based MIL …

Original abstract

Citrus juices and fruits are commodities with great economic potential in the international market, but productivity losses caused by mites and other pests are still far from being a good mark. Despite the integrated pest mechanical aspect, only a few works on automatic classification have handled images with orange mite characteristics, which means tiny and noisy regions of interest. On the computational side, attention-based models have gained prominence in deep learning research, and, along with weakly supervised learning algorithms, they have improved tasks performed with some label restrictions. In agronomic research of pests and diseases, these techniques can improve classification performance while pointing out the location of mites and insects without specific labels, reducing deep learning development costs related to generating bounding boxes. In this context, this work proposes an attention-based activation map approach developed to improve the classification of tiny regions called Two-Weighted Activation Mapping, which also produces locations using feature map scores learned from class labels. We apply our method in a two-stage network process called Attention-based Multiple Instance Learning Guided by Saliency Maps. We analyze the proposed approach in two challenging datasets, the Citrus Pest Benchmark, which was captured directly in the field using magnifying glasses, and the Insect Pest, a large pest image benchmark. In addition, we evaluate and compare our models with weakly supervised methods, such as Attention-based Deep MIL and WILDCAT. The results show that our classifier is superior to literature methods that use tiny regions in their classification tasks, surpassing them in all scenarios by at least 16 percentage points. Moreover, our approach infers bounding box locations for salient insects, even training without any location labels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TWAM is a modest tweak on activation maps for tiny noisy pests, but the 16-point gains look vulnerable to training-protocol differences rather than the new components.

The main things to know are that the authors define a Two-Weighted Activation Mapping (TWAM) that reweights feature maps with two learned scalars before feeding them into a saliency-guided attention MIL pipeline, and they apply the whole thing to field images of citrus mites and other insects. They report at least 16-point accuracy lifts over Attention-based Deep MIL and WILDCAT on the Citrus Pest Benchmark and the Insect Pest dataset, plus the ability to output rough bounding boxes from class labels alone. That combination of the weighting scheme and the two-stage setup is the concrete novelty relative to the cited baselines.

The practical setting is also a plus: real magnifying-glass photos with tiny, cluttered regions where full bounding-box labels are expensive. The method stays within standard weakly supervised machinery and shows it can be adapted to this domain without extra annotation cost.

The soft spot is exactly the one the stress-test flagged. The abstract gives no indication that the baselines were re-run under identical splits, augmentations, optimizers, or hyper-parameters on these small noisy sets. On datasets like this, even modest protocol shifts can produce double-digit swings, so the margin cannot yet be credited to TWAM or the saliency stage. No ablations, no statistical tests, and no quantitative localization scores (IoU or similar) appear in the provided summary either. The localization claim therefore stays qualitative.

This is a narrow but useful empirical paper for people working on agricultural CV or weakly supervised small-object tasks. A reader who needs a concrete example of activation-map weighting in MIL would get something out of the datasets and the specific formulation. It is coherent on its own terms and engages the relevant literature, so it clears the bar for serious refereeing even if the experiments will need tightening. I would send it to review with a request for full protocol details and ablations.

Referee Report

3 major / 1 minor

Summary. The paper proposes Two-Weighted Activation Mapping (TWAM) within an Attention-based Multiple Instance Learning pipeline guided by saliency maps for weakly supervised classification of tiny mite and insect regions. It evaluates the approach on the Citrus Pest Benchmark (field-captured images) and Insect Pest dataset, claiming at least 16 percentage point gains over Attention-based Deep MIL and WILDCAT while also producing bounding-box localizations from class labels alone.

Significance. If the performance margins are shown to arise from the proposed components rather than training-protocol differences, the work would demonstrate a practical route to localization without bounding-box supervision for small-object agronomic tasks. The emphasis on field-captured noisy data and label-efficient training aligns with real deployment constraints.

major comments (3)

[Abstract / Experimental results] Abstract and experimental results: the headline claim of 'surpassing them in all scenarios by at least 16 percentage points' is load-bearing, yet the manuscript supplies no statement that the baselines were re-trained under identical data splits, augmentation, optimizer schedules, or hyper-parameters on the Citrus Pest Benchmark; any deviation can produce large deltas on small noisy datasets.
[Results] Results section: no ablation is reported that isolates the contribution of the two weights in TWAM, the saliency-map guidance, or the two-stage MIL pipeline from other implementation choices; without such controls the attribution of the reported gains remains unverified.
[Abstract / Localization discussion] Abstract and localization discussion: the claim that the model 'infers bounding box locations for salient insects' is presented without any quantitative localization metric (IoU, precision-recall on inferred boxes, or comparison to ground-truth boxes) or protocol for converting activation maps to boxes.
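
The localization metric the third comment asks for is straightforward to compute once predicted and reference boxes are available. The sketch below gives intersection-over-union (IoU) for axis-aligned boxes in pure Python. The half-open `(x_min, y_min, x_max, y_max)` convention and the 0.5 hit threshold are assumptions chosen for illustration, not values from the paper.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) with exclusive max
    coordinates, so width is x_max - x_min.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Overlap extent along each axis; zero when the boxes are disjoint.
    inter_w = max(0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_localization_hit(predicted, reference, threshold=0.5):
    """True when the predicted box overlaps the reference box enough."""
    return box_iou(predicted, reference) >= threshold
```

Averaging `is_localization_hit` over a held-out set with annotated boxes would turn the qualitative localization claim into a number that can be compared across methods.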

minor comments (1)

[Abstract] Abstract: 'integrated pest mechanical aspect' appears to be a phrasing error; consider 'integrated pest management aspect'.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Referee: [Abstract / Experimental results] Abstract and experimental results: the headline claim of 'surpassing them in all scenarios by at least 16 percentage points' is load-bearing, yet the manuscript supplies no statement that the baselines were re-trained under identical data splits, augmentation, optimizer schedules, or hyper-parameters on the Citrus Pest Benchmark; any deviation can produce large deltas on small noisy datasets.

Authors: We acknowledge that the manuscript does not explicitly confirm identical re-training of the baselines. To ensure a fair comparison, we will re-train Attention-based Deep MIL and WILDCAT using the exact same data splits, augmentations, optimizer, and hyper-parameter schedules as our method on the Citrus Pest Benchmark and report the updated results in the revised experimental section.

revision: yes

Referee: [Results] Results section: no ablation is reported that isolates the contribution of the two weights in TWAM, the saliency-map guidance, or the two-stage MIL pipeline from other implementation choices; without such controls the attribution of the reported gains remains unverified.

Authors: We agree that the absence of targeted ablations leaves the source of the gains unclear. In the revised manuscript we will add ablation experiments that successively remove the two weights in TWAM, the saliency-map guidance term, and the two-stage training procedure while keeping all other implementation details fixed, thereby isolating their individual contributions.

revision: yes

Referee: [Abstract / Localization discussion] Abstract and localization discussion: the claim that the model 'infers bounding box locations for salient insects' is presented without any quantitative localization metric (IoU, precision-recall on inferred boxes, or comparison to ground-truth boxes) or protocol for converting activation maps to boxes.

Authors: The datasets used are weakly supervised and contain no bounding-box annotations, so direct IoU or precision-recall against ground truth is not possible. We will nevertheless add an explicit description of the activation-to-box conversion protocol (thresholding and connected-component extraction) together with qualitative localization examples and any feasible proxy metrics. The abstract and discussion will be revised to accurately reflect these limitations and the added protocol.

revision: partial
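
The conversion protocol mentioned in this response (thresholding followed by connected-component extraction) can be sketched in a few lines. The version below is a generic pure-Python illustration under stated assumptions: a saliency map already scaled to [0, 1], a hypothetical threshold of 0.5, 4-connectivity, and half-open output boxes. None of these settings are confirmed by the paper.

```python
from collections import deque


def saliency_to_boxes(saliency, threshold=0.5):
    """Turn a saliency map into one bounding box per connected region.

    saliency: H x W list of lists with values in [0, 1].
    Returns boxes as (x_min, y_min, x_max, y_max) with exclusive max
    coordinates, ordered by the raster position of each region's
    first pixel. Regions use 4-connectivity.
    """
    height, width = len(saliency), len(saliency[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y in range(height):
        for x in range(width):
            if seen[y][x] or saliency[y][x] < threshold:
                continue
            # Breadth-first flood fill over above-threshold neighbours.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and saliency[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1 + 1, y1 + 1))
    return boxes
```

The threshold is exactly the kind of free parameter the ledger below flags: raising it shrinks or removes boxes, so any reported localization quality should state the value used.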

Circularity Check

0 steps flagged

No significant circularity; empirical benchmark comparisons are self-contained

The paper proposes the TWAM activation mapping method and a two-stage MIL-guided architecture, then evaluates them via standard training and accuracy reporting on the Citrus Pest Benchmark and Insect Pest datasets. Superiority is asserted through direct numerical comparison to Attention-based Deep MIL and WILDCAT on the same benchmarks. No equations, parameters, or predictions are defined in terms of the target quantities themselves, and no load-bearing step reduces by construction to a fit or self-citation. The central claims rest on external experimental outcomes rather than tautological re-labeling of inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields limited visibility into exact hyperparameters; the method necessarily introduces learned weights for the activation maps and multiple training-stage thresholds that function as free parameters.

free parameters (2)

two weights in TWAM
The two weighting coefficients that combine feature maps are learned or chosen to produce the final activation map.
saliency and MIL stage thresholds
Decision thresholds that separate candidate regions in the two-stage pipeline are not specified and must be set.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We proposed ... Two-Weighted Activation Mapping (Two-WAM) ... Attention-based Multiple Instance Learning Guided by Saliency Maps"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "surpassing them in all scenarios by at least 16 percentage points"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages

[1] M. F. Neves, V. G. Trombin, V. N. Marques, L. F. Martinez, Global orange juice market: a 16-year summary and opportunities for creating value, Tropical Plant Pathology 45 (2020) 166–174

[2] T. H. Spreen, Z. Gao, W. Fernandes, M. L. Zansler, Global economics and marketing of citrus products, Elsevier Inc., (2020)

[3] R. Bassanezi, A. Czermainski, F. Laranjeira, A. Moreira, P. Ribeiro, E. Krainski, L. Amorim, Spatial patterns of the Citrus leprosis virus and its associated mite vector in systems without intervention, Plant Pathology 68 (2019) 85–93

[4] S. de Carvalho, E. Girardi, F. Mourão F., R. Ferrarezi, H. Coletta F., Advances in citrus propagation in Brazil, in: Revista Brasileira de Fruticultura, 6, (2019), pp. 1–36

[5] X. Wu, C. Zhan, Y.-K. Lai, M.-M. Cheng, J. Yang, IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition, Computer Vision and Pattern Recognition (CVPR) (2019) 8787–8796

[6] H. Pei, K. Liu, X. Zhao, A. A. Yahya, Enhancing aphid detection framework based on ORB and convolutional neural networks, Scientific Reports 10 (2020) 1–15

[7] R. Wang, L. Liu, C. Xie, P. Yang, R. Li, M. Zhou, Agripest: A large-scale domain-specific benchmark dataset for practical agricultural pest detection in the wild, Sensors 21 (2021) 1–15

[8] E. Bollis, H. Pedrini, S. Avila, Weakly supervised learning guided by activation mapping applied to a novel citrus pest benchmark, Computer Vision and Pattern Recognition Workshops (CVPRW) (2020) 310–319

[9] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning Deep Features for Discriminative Localization, Computer Vision and Pattern Recognition (CVPR) (2016) 2921–2929

[10] T. G. Dietterich, R. H. Lathrop, T. Lozano-Pérez, Solving the multiple instance problem with axis-parallel rectangles, Artificial Intelligence 89 (1997) 31–71

[11] T. Durand, T. Mordan, N. Thome, M. Cord, WILDCAT: Weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation, Computer Vision and Pattern Recognition (CVPR) (2017) 5957–5966

[12] M. Ilse, J. M. Tomczak, M. Welling, Attention-based deep multiple instance learning, International Conference on Machine Learning (ICML) 5 (2018) 3376–3391

[13] H. Chen, Q. Hu, B. Zhai, H. Chen, K. Liu, A robust weakly supervised learning of deep conv-nets for surface defect inspection, Neural Computing and Applications (2020) 1–16

[14] C.-H. Yeh, M.-H. Lin, P.-C. Chang, L.-W. Kang, Enhanced visual attention-guided deep neural networks for image classification, IEEE Access 8 (2020) 163447–163457

[15] S. Woo, J. Park, J. Lee, I. S. Kweon, CBAM: Convolutional block attention module, in: European Conference on Computer Vision, 2018, pp. 3–19

[16] Y. Shen, N. Wu, J. Phang, J. Park, G. Kim, L. Moy, K. Cho, K. J. Geras, Globally-Aware Multiple Instance Classifier for Breast Cancer Screening, Lecture Notes in Computer Science (LNCS) 11861 (2019) 18–26

[17] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Advances in Neural Information Processing Systems (NIPS) (2012) 1–9

[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Computer Vision and Pattern Recognition (CVPR), (2015), pp. 1–9

[19] N. Elliott, J. Farrell, A. Gutierrez, C. van Lenteren, M. Walton, S. Wratten, Integrated Pest Management, Springer Science & Business Media, (1995)

[20] R. Smith, H. Reynolds, Principles, definitions and scope of integrated pest control, in: FAO Symposium on Integrated Pest Control, (1966), pp. 11–17

[21] K. Morgan, U. Albrecht, F. Alferez, O. Batuman, et al., Florida Citrus Production Guide, Technical Report, Institute of Food and Agricultural Sciences, University of Florida, 2020

[22] Z. Wang, W. Gong, W. Li, A dynamic feature weighting method for mangrove pests image classification with heavy-tailed distributions, International Conference Proceeding Series (2020)

[23] J. Lu, J. Hu, G. Zhao, F. Mei, C. Zhang, An in-field automatic wheat disease diagnosis system, Computers and Electronics in Agriculture 142 (2017) 369–379

[24] Q. H. Cap, H. Uga, S. Kagiwada, H. Iyatomi, LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis, Transactions on Automation Science and Engineering (2020) 1–10

[25] M. Bastianel, J. Freitas-Astúa, E. W. Kitajima, M. A. Machado, The citrus leprosis pathosystem, Summa Phytopathologica 32 (2006) 211–220

[26] Z. H. Zhou, A brief introduction to weakly supervised learning, National Science Review 5 (2018) 44–53

[27] M. A. Carbonneau, V. Cheplygina, E. Granger, G. Gagnon, Multiple instance learning: A survey of problem characteristics and applications, Pattern Recognition 77 (2018) 329–353

[28] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: Y. Bengio, Y. LeCun (Eds.), International Conference on Learning Representations (ICLR), (2015), pp. 1–11

[29] C. Raffel, D. Ellis, Feed-forward networks with attention can solve some long-term memory problems, in: International Conference on Learning Representations (ICLR), 2016, pp. 1–6

[30] S. Chaudhari, G. Polatkan, R. Ramanath, V. Mithal, An attentive survey of attention models, arXiv 37 (2019)

[31] A. Borji, M.-M. Cheng, Q. Hou, H. Jiang, J. Li, Salient object detection: A survey, Computational Visual Media (2019) 117–150

[32] J. Choe, S. J. Oh, S. Lee, S. Chun, Z. Akata, H. Shim, Evaluating weakly supervised object localization methods right, in: Computer Vision and Pattern Recognition (CVPR), 2020, pp. 3130–3139

[33] S. Adiga V., J. Dolz, H. Lombaert, Manifold-driven attention maps for weakly supervised segmentation, arXiv:2004.03046 (2020)

[34] J. Rony, S. Belharbi, J. Dolz, I. B. Ayed, L. McCaffrey, E. Granger, Deep weakly-supervised learning methods for classification and localization in histology images: a survey, arXiv:1909.03354v2 (2019)

[35] W. Wang, Q. Lai, H. Fu, J. Shen, H. Ling, R. Yang, Salient Object Detection in the Deep Learning Era: An In-Depth Survey, Pattern Analysis and Machine Intelligence (2019) 1–20

[36] A. d. S. Correia, E. L. Colombini, Attention, please! A survey of Neural Attention Models in Deep Learning, arXiv:2103.16775v1 (2021)

[37] R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, International Conference on Computer Vision (ICCV) (2017) 618–626

[38] W. S. Kim, D. H. Lee, Y. J. Kim, Machine vision-based automatic disease symptom detection of onion downy mildew, Computers and Electronics in Agriculture 168 (2020) 105099

[39] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations (ICLR), (2015), pp. 7–9

[40] Y. Chen, X. Zhang, Z. Chen, M. Song, J. Wang, Fine-grained classification of fly species in the natural environment based on deep convolutional neural network, Computers in Biology and Medicine 135 (2021) 104655

[41] E. A. Lins, J. P. M. Rodriguez, S. I. Scoloski, J. Pivato, M. B. Lima, J. M. C. Fernandes, P. R. V. da Silva Pereira, D. Lau, R. Rieder, A method for counting and classifying aphids using computer vision, Computers and Electronics in Agriculture 169 (2020) 105200

[42] Y. Wu, L. Xu, Crop organ segmentation and disease identification based on weakly supervised deep neural network, Agronomy 9 (2019)

[43] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning (ICML), (2019), pp. 6105–6114

[44] A. Howard, M. Sandler, B. Chen, W. Wang, L. Chen, M. Tan, G. Chu, V. Vasudevan, Y. Zhu, R. Pang, H. Adam, Q. Le, Searching for MobileNetV3, in: International Conference on Computer Vision (ICCV), (2019), pp. 1314–1324

[45] J. Hu, L. Shen, S. Albanie, G. Sun, E. Wu, Squeeze-and-Excitation Networks, Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 2011–2023

[46] L. Liu, R. Wang, C. Xie, P. Yang, F. Wang, S. Sudirman, W. Liu, PestNet: An End-to-End Deep Learning Approach for Large-Scale Multi-Class Pest Detection and Classification, IEEE Access 7 (2019) 45301–45312

[47] F. Wang, R. Wang, C. Xie, P. Yang, L. Liu, Fusing multi-scale context-aware information representation for automatic in-field pest detection and recognition, Computers and Electronics in Agriculture 169 (2020) 105222

[48] W. Zeng, M. Li, Crop leaf disease recognition based on Self-Attention convolutional neural network, Computers and Electronics in Agriculture 172 (2020) 105341

[49] L. Deng, Y. Wang, Z. Han, R. Yu, Research on insect pest image detection and recognition based on bio-inspired methods, Biosystems Engineering 169 (2018) 139–148

[50] L. Nanni, G. Maguolo, F. Pancino, Insect pest image detection and recognition based on bio-inspired methods, Ecological Informatics 57 (2020)

[51] M. Costello, Mites, in: P. Christensen (Ed.), Raisin production manual, UCANR Publications, (2000), pp. 187–190

[52] H. Robbins, S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics (1951) 400–407

[53] J. Kiefer, J. Wolfowitz, Stochastic estimation of the maximum of a regression function, The Annals of Mathematical Statistics (1952) 462–466

[54] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2323

[55] S. Wu, S. Zhong, Y. Liu, Deep residual learning for image recognition, Multimedia Tools and Applications (2017) 1–17

[1] [1]

M. F. Neves, V . G. Trombin, V . N. Marques, L. F. Martinez, Global orange juice market: a 16-year summary and opportu- nities for creating value, Tropical Plant Pathology 45 (2020) 166–174

work page 2020

[2] [2]

T. H. Spreen, Z. Gao, W. Fernandes, M. L. Zansler, Global eco- nomics and marketing of citrus products, Elsevier Inc., (2020)

work page 2020

[3] [3]

Bassanezi, A

R. Bassanezi, A. Czermainski, F. Laranjeira, A. Moreira, P. Ribeiro, E. Krainski, L. Amorim, Spatial patterns of the Citrus leprosis virus and its associated mite vector in systems without intervention, Plant Pathology 68 (2019) 85–93

work page 2019

[4] [4]

de Carvalho, E

S. de Carvalho, E. Girardi, F. Mour ˜ao F., R. Ferrarezi, H. Co- letta F., Advances in citrus propagation in Brazil, in: Revista Brasileira de Fruticultura, 6, (2019), pp. 1–36

work page 2019

[5] [5]

X. Wu, C. Zhan, Y .-K. Lai, M.-M. Cheng, J. Yang, IP102: A Large-Scale Benchmark Dataset for Insect Pest Recogni- tion, Computer Vision and Pattern Recognition (CVPR) (2019) 8787–8796

work page 2019

[6] [6]

H. Pei, K. Liu, X. Zhao, A. A. Yahya, Enhancing aphid detection framework based on ORB and convolutional neural networks, Scientiﬁc Reports 10 (2020) 1–15

work page 2020

[7] [7]

R. Wang, L. Liu, C. Xie, P. Yang, R. Li, M. Zhou, Agripest: A large-scale domain-speciﬁc benchmark dataset for practical agricultural pest detection in the wild, Sensors 21 (2021) 1–15

work page 2021

[8] [8]

Bollis, H

E. Bollis, H. Pedrini, S. Avila, Weakly supervised learning guided by activation mapping applied to a novel citrus pest benchmark, Computer Vision and Pattern Recognition Work- shops (CVPRW) (2020) 310–319

work page 2020

[9] [9]

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learn- ing Deep Features for Discriminative Localization, Computer Vision and Pattern Recognition (CVPR) (2016) 2921–2929

work page 2016

[10] [10]

T. G. Dietterich, R. H. Lathrop, T. Lozano-P ´erez, Solving the multiple instance problem with axis-parallel rectangles, Artiﬁ- cial Intelligence 89 (1997) 31–71

work page 1997

[11] [11]

Durand, T

T. Durand, T. Mordan, N. Thome, M. Cord, WILDCAT: Weakly supervised learning of deep convnets for image classiﬁcation, pointwise localization and segmentation, Computer Vision and Pattern Recognition (CVPR) (2017) 5957–5966

work page 2017

[12] [12]

M. Ilse, J. M. Tomczak, M. Welling, Attention-based deep mul- tiple instance learning, International Conference on Machine Learning (ICML) 5 (2018) 3376–3391

work page 2018

[13] [13]

H. Chen, Q. Hu, B. Zhai, H. Chen, K. Liu, A robust weakly supervised learning of deep conv-nets for surface defect inspec- tion, Neural Computing and Applications (2020) 1–16

work page 2020

[14] [14]

Yeh, M.-H

C.-H. Yeh, M.-H. Lin, P.-C. Chang, L.-W. Kang, Enhanced vi- sual attention-guided deep neural networks for image classiﬁca- tion, IEEE Access 8 (2020) 163447–163457

work page 2020

[15] [15]

S. Woo, J. Park, J. Lee, I. S. Kweon, CBAM: Convolutional block attention module, in: European Conference on Computer Vision, 2018, pp. 3–19

work page 2018

[16] Y. Shen, N. Wu, J. Phang, J. Park, G. Kim, L. Moy, K. Cho, K. J. Geras, Globally-Aware Multiple Instance Classifier for Breast Cancer Screening, Lecture Notes in Computer Science (LNCS) 11861 (2019) 18–26

[17] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Advances in Neural Information Processing Systems (NIPS) (2012) 1–9

[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9

[19] N. Elliott, J. Farrell, A. Gutierrez, C. van Lenteren, M. Walton, S. Wratten, Integrated Pest Management, Springer Science & Business Media, 1995

[20] R. Smith, H. Reynolds, Principles, definitions and scope of integrated pest control, in: FAO Symposium on Integrated Pest Control, 1966, pp. 11–17

[21] K. Morgan, U. Albrecht, F. Alferez, O. Batuman, et al., Florida Citrus Production Guide, Technical Report, Institute of Food and Agricultural Sciences, University of Florida, 2020

[22] Z. Wang, W. Gong, W. Li, A dynamic feature weighting method for mangrove pests image classification with heavy-tailed distributions, International Conference Proceeding Series (2020)

[23] J. Lu, J. Hu, G. Zhao, F. Mei, C. Zhang, An in-field automatic wheat disease diagnosis system, Computers and Electronics in Agriculture 142 (2017) 369–379

[24] Q. H. Cap, H. Uga, S. Kagiwada, H. Iyatomi, LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis, Transactions on Automation Science and Engineering (2020) 1–10

[25] M. Bastianel, J. Freitas-Astúa, E. W. Kitajima, M. A. Machado, The citrus leprosis pathosystem, Summa Phytopathologica 32 (2006) 211–220

[26] Z. H. Zhou, A brief introduction to weakly supervised learning, National Science Review 5 (2018) 44–53

[27] M. A. Carbonneau, V. Cheplygina, E. Granger, G. Gagnon, Multiple instance learning: A survey of problem characteristics and applications, Pattern Recognition 77 (2018) 329–353

[28] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in: Y. Bengio, Y. LeCun (Eds.), International Conference on Learning Representations (ICLR), 2015, pp. 1–11

[29] C. Raffel, D. Ellis, Feed-forward networks with attention can solve some long-term memory problems, in: International Conference on Learning Representations (ICLR), 2016, pp. 1–6

[30] S. Chaudhari, G. Polatkan, R. Ramanath, V. Mithal, An attentive survey of attention models, arXiv 37 (2019)

[31] A. Borji, M.-M. Cheng, Q. Hou, H. Jiang, J. Li, Salient object detection: A survey, Computational Visual Media (2019) 117–150

[32] J. Choe, S. J. Oh, S. Lee, S. Chun, Z. Akata, H. Shim, Evaluating weakly supervised object localization methods right, in: Computer Vision and Pattern Recognition (CVPR), 2020, pp. 3130–3139

[33] S. Adiga V., J. Dolz, H. Lombaert, Manifold-driven attention maps for weakly supervised segmentation, arXiv:2004.03046 (2020)

[34] J. Rony, S. Belharbi, J. Dolz, I. B. Ayed, L. McCaffrey, E. Granger, Deep weakly-supervised learning methods for classification and localization in histology images: a survey, arXiv:1909.03354v2 (2019)

[35] W. Wang, Q. Lai, H. Fu, J. Shen, H. Ling, R. Yang, Salient Object Detection in the Deep Learning Era: An In-Depth Survey, Pattern Analysis and Machine Intelligence (2019) 1–20

[36] A. d. S. Correia, E. L. Colombini, Attention, please! A survey of Neural Attention Models in Deep Learning, arXiv:2103.16775v1 (2021)

[37] R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization, International Conference on Computer Vision (ICCV) (2017) 618–626

[38] W. S. Kim, D. H. Lee, Y. J. Kim, Machine vision-based automatic disease symptom detection of onion downy mildew, Computers and Electronics in Agriculture 168 (2020) 105099

[39] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations (ICLR), 2015, pp. 7–9

[40] Y. Chen, X. Zhang, Z. Chen, M. Song, J. Wang, Fine-grained classification of fly species in the natural environment based on deep convolutional neural network, Computers in Biology and Medicine 135 (2021) 104655

[41] E. A. Lins, J. P. M. Rodriguez, S. I. Scoloski, J. Pivato, M. B. Lima, J. M. C. Fernandes, P. R. V. da Silva Pereira, D. Lau, R. Rieder, A method for counting and classifying aphids using computer vision, Computers and Electronics in Agriculture 169 (2020) 105200

[42] Y. Wu, L. Xu, Crop organ segmentation and disease identification based on weakly supervised deep neural network, Agronomy 9 (2019)

[43] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning (ICML), 2019, pp. 6105–6114

[44] A. Howard, M. Sandler, B. Chen, W. Wang, L. Chen, M. Tan, G. Chu, V. Vasudevan, Y. Zhu, R. Pang, H. Adam, Q. Le, Searching for MobileNetV3, in: International Conference on Computer Vision (ICCV), 2019, pp. 1314–1324

[45] J. Hu, L. Shen, S. Albanie, G. Sun, E. Wu, Squeeze-and-Excitation Networks, Transactions on Pattern Analysis and Machine Intelligence 42 (2020) 2011–2023

[46] L. Liu, R. Wang, C. Xie, P. Yang, F. Wang, S. Sudirman, W. Liu, PestNet: An End-to-End Deep Learning Approach for Large-Scale Multi-Class Pest Detection and Classification, IEEE Access 7 (2019) 45301–45312

[47] F. Wang, R. Wang, C. Xie, P. Yang, L. Liu, Fusing multi-scale context-aware information representation for automatic in-field pest detection and recognition, Computers and Electronics in Agriculture 169 (2020) 105222

[48] W. Zeng, M. Li, Crop leaf disease recognition based on Self-Attention convolutional neural network, Computers and Electronics in Agriculture 172 (2020) 105341

[49] L. Deng, Y. Wang, Z. Han, R. Yu, Research on insect pest image detection and recognition based on bio-inspired methods, Biosystems Engineering 169 (2018) 139–148

[50] L. Nanni, G. Maguolo, F. Pancino, Insect pest image detection and recognition based on bio-inspired methods, Ecological Informatics 57 (2020)

[51] M. Costello, Mites, in: P. Christensen (Ed.), Raisin production manual, UCANR Publications, 2000, pp. 187–190

[52] H. Robbins, S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics (1951) 400–407

[53] J. Kiefer, J. Wolfowitz, Stochastic estimation of the maximum of a regression function, The Annals of Mathematical Statistics (1952) 462–466

[54] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324

[55] S. Wu, S. Zhong, Y. Liu, Deep residual learning for image recognition, Multimedia Tools and Applications (2017) 1–17