pith. machine review for the scientific record.

arxiv: 2604.07182 · v1 · submitted 2026-04-08 · 💻 cs.CV · cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

TeaLeafVision: An Explainable and Robust Deep Learning Framework for Tea Leaf Disease Classification

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 19:10 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords tea leaf disease · convolutional neural networks · DenseNet201 · Grad-CAM · explainable AI · adversarial training · agricultural imaging · crop disease detection

The pith

DenseNet201 reaches 99 percent accuracy classifying seven tea leaf conditions on field-collected images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates multiple convolutional neural network architectures on the teaLeafBD dataset, which holds images of healthy tea leaves and six disease types gathered under real field conditions. DenseNet201 records the highest test accuracy at 99 percent. The authors add Gradient-weighted Class Activation Mapping to show which image regions drive each prediction, occlusion sensitivity checks to verify those regions actually matter, and adversarial training to make outputs more stable under noise or lighting shifts. They also release a working prototype that runs the model on new photos. If the results hold, the work indicates that deep learning can deliver reliable, inspectable disease detection directly usable by tea growers.

Core claim

DenseNet201 trained on the seven-class teaLeafBD dataset achieves 99 percent test accuracy; when paired with Grad-CAM visualizations and adversarial training, the same model supplies both high classification performance and interpretable decision maps while resisting common image perturbations.

What carries the argument

DenseNet201 convolutional network augmented by Gradient-weighted Class Activation Mapping to highlight disease-relevant image patches and by adversarial training to increase tolerance to field noise.
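
The Grad-CAM maps described here follow the standard recipe: pool the gradients of the predicted class over a late convolutional block, weight the activations, and upsample. A minimal sketch in PyTorch is below; the target layer, input size, and seven-class head are assumptions, not the authors' released code.

    # Hedged sketch: Grad-CAM over torchvision's DenseNet201 (not the authors' implementation).
    import torch
    import torch.nn.functional as F
    from torchvision.models import densenet201

    model = densenet201(num_classes=7)       # six diseases + healthy (assumed head)
    model.eval()

    acts, grads = {}, {}
    layer = model.features.denseblock4       # assumed target layer; any late conv block works
    layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

    def grad_cam(image):                     # image: (1, 3, 224, 224), normalized
        score = model(image)[0].max()        # logit of the predicted class
        model.zero_grad()
        score.backward()
        w = grads["v"].mean(dim=(2, 3), keepdim=True)        # pooled gradients per channel
        cam = F.relu((w * acts["v"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        return (cam / cam.max().clamp(min=1e-8)).squeeze()   # heatmap in [0, 1]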

If this is right

  • Growers receive rapid, image-based diagnoses that can trigger targeted treatment before disease spreads.
  • Grad-CAM outputs let users see whether the model is attending to the actual leaf spots or to irrelevant background elements.
  • Adversarial training reduces drops in performance when photos contain dust, shadows, or slight blur typical of handheld field cameras (a minimal training-loop sketch follows this list).
  • The prototype demonstrates that the pipeline can move from research server to a lightweight on-site tool without requiring cloud connectivity.
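
The paper's exact perturbation method is not named in the excerpts here; the ε sweep reported with Figure 8 is consistent with FGSM-style noise, so one plausible adversarial-training step is sketched below. The attack choice, the clean/adversarial mixing, and ε = 0.1 are assumptions drawn from the reported peak, not a confirmed description of the authors' procedure.

    # Hedged sketch: FGSM adversarial training (one plausible reading of the paper's setup).
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps):
        # Single-step sign attack; assumes inputs scaled to [0, 1].
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()

    def train_step(model, optimizer, x, y, eps=0.1):   # eps = 0.1 matches the reported peak
        x_adv = fgsm(model, x, y, eps)
        optimizer.zero_grad()                          # clear gradients left by the attack
        # Mix clean and adversarial batches so clean-image accuracy is retained.
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()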

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same DenseNet-plus-Grad-CAM pattern could be retrained on other crop-disease image sets with modest additional labeling effort.
  • Accuracy figures measured on one regional dataset may overestimate performance when tea varieties, soil types, or camera hardware differ.
  • Pairing the visual model with simple metadata such as leaf age or recent rainfall records might raise real-world reliability beyond what images alone provide.

Load-bearing premise

The teaLeafBD images, even though taken under varied field conditions, capture enough of the appearance and lighting differences that occur across all tea-growing regions and seasons for the trained model to keep its accuracy on fresh data.

What would settle it

Retraining or testing the published DenseNet201 weights on a fresh set of tea leaf photographs gathered from an unrelated farm or later season and obtaining accuracy well below 99 percent would falsify the generalization claim.
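
Mechanically, that test is a short script. A sketch of the evaluation half is below; the weights file, photo directory, and preprocessing are hypothetical placeholders, since the paper's artifacts are not linked here, and normalization would have to match the original training setup.

    # Hedged sketch: score published weights on photos from an unrelated farm or season.
    import torch
    from torchvision import datasets, transforms
    from torchvision.models import densenet201

    tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    data = datasets.ImageFolder("new_farm_photos/", transform=tf)    # hypothetical path
    loader = torch.utils.data.DataLoader(data, batch_size=32)

    model = densenet201(num_classes=7)
    model.load_state_dict(torch.load("tealeaf_densenet201.pt", map_location="cpu"))  # hypothetical file
    model.eval()

    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    print(f"external accuracy: {correct / total:.3f}")   # far below 0.99 would undercut the claim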

Figures

Figures reproduced from arXiv: 2604.07182 by Abu Raihan, Md Abir Rahman, Munaia Jannat Easha, Rafi Ahamed, Sidratul Moon Nafsin, Tasnia Tarannum Roza.

Figure 1
Figure 1. The complete workflow of the proposed classification model. This study presents a deep learning-based approach for detecting tea leaf diseases using a publicly available dataset. Images were preprocessed through resizing, normalization, and data augmentation techniques, including flipping, rotation, and zooming, to enhance model robustness [8]. We split the dataset into training, test… view at source ↗
Figure 3
Figure 3. Loss for training and validation of the MobileNetV2 architecture. The training and validation curves show steady, consistent learning behaviour: training accuracy gradually increases from around 58% to 90%, while validation accuracy rises smoothly and even surpasses the training curve, reaching about 93%, which suggests strong generalization due to effective regularization and data augmentation. The confu… view at source ↗
Figure 4
Figure 4. Confusion matrix of the MobileNetV2 architecture. view at source ↗
Figure 5
Figure 5. Training and validation accuracy and loss curves of the InceptionV3 architecture. view at source ↗
Figure 8
Figure 8. Confusion matrix of the DenseNet201 architecture. To further assess reliability, adversarial perturbations were applied to DenseNet201, and the results are summarized in Table II. The model consistently demonstrated high validation accuracy across all ε values, achieving a peak accuracy of 98.15% at ε = 0.1. Even at elevated perturbation levels, the performance remained above 97.8%, with only minor va… view at source ↗
Figure 6
Figure 6. Confusion matrix of the InceptionV3 architecture. view at source ↗
Figure 7
Figure 7. Training and validation accuracy and loss curves of the DenseNet201 architecture. In the loss graph, both losses drop quickly at the start and continue decreasing smoothly, with training loss reaching very low values and validation loss remaining close behind. This indicates that the model generalizes effectively to unseen data without showing significant signs of overfitting, exhibiting the best overall performance a… view at source ↗
Figure 9
Figure 9. Tea leaf disease identification using Grad-CAM. To demonstrate that our model is easy to understand, we used occlusion sensitivity analysis, shown in… view at source ↗
Figure 10
Figure 10. Tea leaf disease identification using occlusion sensitivity. Grad-CAM and occlusion sensitivity demonstrated that DenseNet201’s predictions were both accurate and interpretable for agricultural experts and farmers. These explainability methods highlighted the exact diseased regions on tea leaves, enabling farmers and field technicians to easily understand why the model made a particular prediction [17]. Th… view at source ↗
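
Figures 9 and 10 use the standard occlusion sensitivity procedure: slide a neutral patch across the image and record how the predicted-class confidence drops. A minimal sketch follows, with patch size, stride, and fill value as assumptions rather than the paper's settings.

    # Hedged sketch: occlusion sensitivity map for one image and one target class.
    import torch

    def occlusion_map(model, image, target, patch=32, stride=16, fill=0.5):
        model.eval()
        _, _, H, W = image.shape                     # image: (1, 3, H, W), normalized
        rows = (H - patch) // stride + 1
        cols = (W - patch) // stride + 1
        heat = torch.zeros(rows, cols)
        with torch.no_grad():
            base = torch.softmax(model(image), dim=1)[0, target]
            for i in range(rows):
                for j in range(cols):
                    x = image.clone()
                    x[:, :, i*stride:i*stride+patch, j*stride:j*stride+patch] = fill
                    p = torch.softmax(model(x), dim=1)[0, target]
                    heat[i, j] = base - p            # large drop means the occluded region mattered
        return heat
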
read the original abstract

As the worlds second most consumed beverage after water, tea is not just a cultural staple but a global economic force of profound scale and influence. More than a mere drink, it represents a quiet negotiation between nature, culture, and the human desire for a moment of reflection. So, the precise identification and detection of tea leaf disease is crucial. With this goal, we have evaluated several Convolutional Neural Networks (CNN) models, among them three shows noticeable performance including DenseNet201, MobileNetV2, InceptionV3 on the teaLeafBD dataset. teaLeafBD dataset contains seven classes, six disease classes and one healthy class, collected under various field conditions reflecting real world challenges. Among the CNN models, DenseNet201 has achieved the highest test accuracy of 99%. In order to enhance the model reliability and interpretability, we have implemented Gradient weighted Class Activation Mapping (Grad CAM), occlusion sensitivity analysis and adversarial training techniques to increase the noise resistance of the model. Finally, we have developed a prototype in order to leverage the models capabilities on real life agriculture. This paper illustrates the deep learning models capabilities to classify the disease in real life tea leaf disease detection and management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper presents TeaLeafVision, a framework for tea leaf disease classification using CNN models evaluated on the teaLeafBD dataset (seven classes: six diseases plus healthy, collected under field conditions). It reports that DenseNet201 achieves the highest test accuracy of 99%, with additional use of Grad-CAM for explainability, occlusion sensitivity analysis, adversarial training for robustness, and a deployed prototype for real-world agricultural use.

Significance. If the performance claims can be verified with proper experimental details, the work offers a practical contribution to agricultural computer vision by combining competitive CNN accuracy with interpretability tools and robustness enhancements, plus an end-to-end prototype. The emphasis on field-collected data and explainability aligns with needs in precision agriculture.

major comments (3)
  1. [Abstract] Abstract: The central claim that DenseNet201 achieves 99% test accuracy is presented without any reported dataset statistics (total images, images per class), train/validation/test split ratios, splitting protocol, cross-validation procedure, or statistical significance tests. In a field-collected image dataset, the absence of these details leaves open the possibility of data leakage from correlated samples (same plant, leaf, or farm), rendering the headline performance figure impossible to assess for generalization.
  2. [Results / Experimental Setup] Experimental evaluation: No baseline comparisons (e.g., against simpler CNNs, traditional ML methods, or prior tea-disease papers), error bars, or ablation studies on the contribution of adversarial training are supplied. This makes it difficult to determine whether the reported accuracy represents a genuine advance or is driven by dataset-specific factors.
  3. [Discussion / Conclusion] Generalization discussion: The paper asserts that the teaLeafBD dataset reflects real-world challenges and that the models will generalize, yet provides no external validation set, cross-farm testing, or seasonal hold-out results to support this. The added Grad-CAM and adversarial components cannot compensate for an unverified accuracy number.
minor comments (3)
  1. [Abstract] Abstract contains minor grammatical issues: 'worlds second' should be 'world's second'; 'among them three shows noticeable performance including' is awkward and should be rephrased for clarity.
  2. [Methods] The paper would benefit from a dedicated 'Dataset' section with explicit statistics and a figure showing example images per class to allow readers to judge visual variability.
  3. [Figures / Tables] Figure captions and axis labels in any performance tables or Grad-CAM visualizations should be expanded to include exact split ratios and model hyperparameters for reproducibility.

Simulated Authors' Rebuttal

3 responses · 1 unresolved

We thank the referee for their thorough review and constructive comments on our manuscript. We address each of the major comments point by point below, indicating the changes we plan to make in the revised version.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that DenseNet201 achieves 99% test accuracy is presented without any reported dataset statistics (total images, images per class), train/validation/test split ratios, splitting protocol, cross-validation procedure, or statistical significance tests. In a field-collected image dataset, the absence of these details leaves open the possibility of data leakage from correlated samples (same plant, leaf, or farm), rendering the headline performance figure impossible to assess for generalization.

    Authors: We agree that these details are crucial for evaluating the results and should be highlighted. In the revised manuscript, we will update the abstract to include the total number of images in teaLeafBD, the distribution per class, the 70/15/15 train/validation/test split, the protocol used to ensure images from the same plant or farm are kept within the same split to prevent leakage (a split of this kind is sketched after these responses), and the use of 5-fold cross-validation with p-values for significance. This will allow readers to better assess generalization. revision: yes

  2. Referee: [Results / Experimental Setup] Experimental evaluation: No baseline comparisons (e.g., against simpler CNNs, traditional ML methods, or prior tea-disease papers), error bars, or ablation studies on the contribution of adversarial training are supplied. This makes it difficult to determine whether the reported accuracy represents a genuine advance or is driven by dataset-specific factors.

    Authors: We acknowledge this limitation in the current presentation. We will add baseline comparisons including ResNet50, VGG16, and a traditional SVM classifier using color and texture features. Error bars will be included based on 5 independent runs with different random seeds. Additionally, we will provide an ablation study isolating the effect of adversarial training on model robustness against noise and attacks. These will be added to the Results section to better contextualize our contributions. revision: yes

  3. Referee: [Discussion / Conclusion] Generalization discussion: The paper asserts that the teaLeafBD dataset reflects real-world challenges and that the models will generalize, yet provides no external validation set, cross-farm testing, or seasonal hold-out results to support this. The added Grad-CAM and adversarial components cannot compensate for an unverified accuracy number.

    Authors: We recognize the importance of external validation for strong generalization claims. While the teaLeafBD dataset was collected under diverse field conditions to simulate real-world variability, we do not currently have additional external datasets for cross-farm or seasonal testing. In the revision, we will revise the Discussion and Conclusion to more cautiously state the potential for generalization, explicitly discuss the limitations of the current evaluation, and outline plans for future multi-location validation studies. The explainability and robustness features are presented as enhancements rather than substitutes for validation. revision: partial
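
The leakage-safe protocol promised in response 1 is usually implemented with group-aware splitting. A minimal sketch follows; the existence of per-image plant or farm identifiers in the dataset metadata is an assumption, and the function is illustrative, not the authors' code.

    # Hedged sketch: group-aware ~70/15/15 split so photos of one plant/farm never straddle splits.
    from sklearn.model_selection import GroupShuffleSplit

    def leakage_safe_split(paths, labels, groups, seed=0):
        # paths/labels: one entry per image; groups: plant or farm ID per image (assumed metadata)
        outer = GroupShuffleSplit(n_splits=1, test_size=0.15, random_state=seed)
        trainval, test = next(outer.split(paths, labels, groups))
        inner = GroupShuffleSplit(n_splits=1, test_size=0.15 / 0.85, random_state=seed)
        sub_groups = [groups[i] for i in trainval]
        tr, va = next(inner.split(trainval, None, sub_groups))
        train = [trainval[i] for i in tr]
        val = [trainval[i] for i in va]
        return train, val, list(test)   # index lists; ratios hold by group, not by image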

standing simulated objections not resolved
  • We cannot provide results from an external validation set or cross-farm testing without collecting new data, which is beyond the scope of the current revision.

Circularity Check

0 steps flagged

No circularity: empirical accuracy reporting on held-out test set

full rationale

The paper reports standard CNN classification accuracies (DenseNet201 at 99% test accuracy) obtained by training and evaluating models on the teaLeafBD dataset. No equations, derivations, or first-principles claims exist that reduce the reported performance metric to a fitted parameter or input by construction. The result is an external empirical benchmark against a dataset split, not a self-referential definition or renamed known result. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing steps in the provided text.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work rests on the standard assumption that convolutional networks trained on labeled images can generalize to similar images, plus the empirical choice of which models and augmentation strategies to test.

free parameters (1)
  • CNN training hyperparameters
    Learning rate, batch size, number of epochs, and data augmentation choices are tuned to reach the stated 99% accuracy.
axioms (1)
  • domain assumption CNN architectures pre-trained on ImageNet transfer effectively to tea leaf images when fine-tuned
    Implicit in the choice and reported success of DenseNet201, MobileNetV2, and InceptionV3.
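
That transfer axiom is operationalized by standard fine-tuning. As a rough illustration only (the layer-freezing policy, optimizer, and learning rate below are assumptions, not the paper's reported configuration):

    # Hedged sketch: fine-tune an ImageNet-pretrained DenseNet201 for the seven teaLeafBD classes.
    import torch
    import torch.nn as nn
    from torchvision.models import DenseNet201_Weights, densenet201

    model = densenet201(weights=DenseNet201_Weights.IMAGENET1K_V1)   # ImageNet initialization
    for p in model.features.parameters():
        p.requires_grad = False                    # freeze convolutional backbone (assumed policy)
    model.classifier = nn.Linear(model.classifier.in_features, 7)    # new 7-way head

    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)  # assumed hyperparameters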

pith-pipeline@v0.9.0 · 5540 in / 1231 out tokens · 54900 ms · 2026-05-10T19:10:05.878059+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 12 canonical work pages

  1. [1] Soeb, M. J. A. et al. (2023). Tea leaf disease detection and identification based on YOLOv7 (YOLO-T). Scientific Reports, 13.
  2. [2] https://doi.org/10.1038/s41598-023-33270-4
  3. [3] Chen, J. et al. (2019). Visual Tea Leaf Disease Recognition Using a Convolutional Neural Network Model. Symmetry, 11(3), 343. https://doi.org/10.3390/sym11030343
  4. [4] Yücel, N., & Yıldırım, M. (2023). Classification of tea leaves diseases by developed CNN, feature fusion, and classifier based model. International Journal of Applied Methods in Electronics and Computers, 11(1), 30–36. https://doi.org/10.18100/ijamec.1235611
  5. [5] Ahammed, F. et al. (2025). Classifying nutritional deficiencies in coffee leaf using transfer learning and gradient-weighted class activation mapping (Grad-CAM) visualization. IEEE. https://doi.org/10.1109/iccct63501.2025.11019090
  6. [6] Kamrul, M. H., Rahman, M., Robin, M., Hossain, M. S., & Paul, P. (2020, January 10–12). A deep learning based approach on categorization of tea leaves.
  7. [7] Chakraborty, S., Murugan, R., & Goel, T. (2022). Classification of tea leaf diseases using convolutional neural networks. Springer. https://doi.org/10.1007/978-981-19-0019-8_22
  8. [8] Lin, H. et al. (2019). Robust classification of tea based on multi-channel LED-induced fluorescence and a convolutional neural network. Sensors, 19(20), 4687. https://doi.org/10.3390/sensors19204687
  9. [9] Alam, B. M. S. et al. (2025). Fabric finesse: Harnessing YOLOv8 for enhanced textile detection. IEEE. https://doi.org/10.1109/ECCE64574.2025.11013032
  10. [10] Karmokar, B. C. et al. (2015). Tea leaf diseases recognition using neural network ensemble. International Journal of Computer Applications, 114(17), 27–30.
  11. [11] Shikdar, O. F. et al. (2024). A proficient convolutional neural network for detecting watermelon disease with occlusion sensitivity. https://doi.org/10.1109/RAAICON64172.2024.10928511
  12. [12] Alam, B. M. S. (2025). teaLeafBD: A comprehensive image dataset to classify the diseased tea leaf to automate the leaf selection process in Bangladesh. Data in Brief, 61, Article 111769. https://doi.org/10.1016/j.dib.2025.111769
  13. [13] Sandler, M., Howard, A. G., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks (arXiv:1801.04381). arXiv. https://arxiv.org/abs/1801.04381
  14. [14] Ghosh, S. (2025). Healthy harvests: A comparative look at guava disease classification using InceptionV3. 16th International IEEE Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE.
  15. [15] Kibria, G. (2025). MediVision: An Explainable and Robust Deep Learning Framework for Knee Osteoarthritis Grading. 2025 IEEE International Conference on Data and Software Engineering (ICoDSE). IEEE.
  16. [16] Alam, B. M. S. et al. (2025). Explainable deep learning for hog plum leaf disease diagnosis: Leveraging the elegance of the depth of Xception. https://doi.org/10.1109/GECOST66002.2025.11324532
  17. [17] Shikdar, O. F. (2025). Enhancing tea leaf disease recognition with attention mechanisms and Grad-CAM visualization. International Conference on Computing and Communication Networks (ICCCNet-2025). Springer.
  18. [18] Valois, P. et al. (2023). Occlusion sensitivity analysis with augmentation subspace perturbation in deep feature space (arXiv:2311.15022).