Mango Tree Net -- A fully convolutional network for semantic segmentation and individual crown detection of mango trees
Pith reviewed 2026-05-24 21:11 UTC · model grok-4.3
The pith
Mango Tree Net, a fully convolutional network, segments mango trees in UAV imagery and detects individual crowns by separating overlapping trees with retraining and contour detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mango Tree Net is trained with supervised learning on 8,824 image patches to segment mango trees, then retrained to separate touching crowns. Contour-based connected object detection on the retrained network's output then yields bounding boxes for individual crowns. The paper claims this pipeline is robust to variations in scale, occlusion, lighting conditions and surrounding vegetation.
What carries the argument
Mango Tree Net, a fully convolutional neural network that is retrained to separate overlapping crowns, combined with contour-based connected object detection on the segmentation output.
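The post-processing step can be illustrated with a minimal sketch. It assumes the retrained network's output has already been thresholded into a binary mask, and it swaps in plain 4-connected component labeling for the paper's contour-based detection, whose exact implementation is not described. The function name `crown_bounding_boxes`, the `min_pixels` filter and the toy mask are hypothetical.

```python
from collections import deque


def crown_bounding_boxes(mask, min_pixels=1):
    """Return one (row_min, col_min, row_max, col_max) box per 4-connected blob.

    `mask` is a list of rows of 0/1 values (assumed binarized network output).
    Blobs smaller than `min_pixels` are dropped (a hypothetical noise filter).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected blob.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1, size = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                size += 1
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                boxes.append((r0, c0, r1, c1))
    return boxes


# Toy mask with two crowns that do not touch.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
boxes = crown_bounding_boxes(toy_mask)
print(boxes)  # [(0, 0, 1, 1), (1, 3, 2, 4)]
```

The sketch also shows why the retraining step matters: if two crowns touch in the mask, any connected-object method returns a single box for both, so separation has to happen in the segmentation output itself.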
If this is right
- The method achieves reliable semantic segmentation on a test set of 36 images.
- Individual crown detection works on 4 test images using the retrained network and contour detection.
- Performance is measured with precision, recall, F1-score and accuracy, which the paper presents as evidence of robustness to image variations.
- The approach handles variations in scale, occlusion, lighting, and vegetation through the retraining step.
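For reference, the four metrics named in the abstract can be computed from a predicted mask and a ground-truth mask as in the sketch below. It assumes pixel-wise counting on binary masks; the source does not say whether the paper counts pixels or objects, so this is an illustration of the standard definitions and not the authors' evaluation code. The function name `pixel_metrics` and the toy masks are hypothetical.

```python
def pixel_metrics(pred, truth):
    """Precision, recall, F1 and accuracy from two same-sized binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # tree pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as tree
            elif not p and t:
                fn += 1      # tree pixel missed
            else:
                tn += 1      # background correctly predicted
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy masks: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = pixel_metrics(pred, truth)
print(metrics)
```

On the toy masks all four values come out to 2/3, which is a coincidence of the example and not a property of the metrics.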
Where Pith is reading between the lines
- Similar retraining strategies could apply to detecting individual instances of other objects in segmentation outputs.
- The bounding box outputs could feed into further analysis like tree health monitoring from additional image features.
- Testing on larger or more diverse UAV datasets would clarify the limits of the contour detection step.
- Replacing contour detection with learned instance segmentation might improve performance on highly irregular overlaps.
Load-bearing premise
The retrained segmentation output plus contour-based connected object detection will reliably separate and localize individual crowns even under the variations in scale, occlusion, lighting conditions and surrounding vegetation present in the test images.
What would settle it
Manual verification on a new UAV image set with many overlapping mango crowns would show if the method merges or misses a substantial fraction of trees compared to ground truth counts.
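One way such a check could be scored is sketched below: predicted boxes are matched one-to-one to hand-labeled boxes by intersection over union (IoU), and the unmatched boxes are counted. The greedy matching rule, the 0.5 threshold and all names here are assumptions made for illustration; the source does not describe the paper's matching criterion.

```python
def iou(a, b):
    """IoU of two boxes given as (row_min, col_min, row_max, col_max), inclusive."""
    inter_h = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_w = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if inter_h <= 0 or inter_w <= 0:
        return 0.0
    inter = inter_h * inter_w
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def match_crowns(predicted, ground_truth, threshold=0.5):
    """Greedy one-to-one matching by descending IoU.

    Returns (true_positives, false_positives, false_negatives).
    The threshold of 0.5 is an assumed convention, not taken from the paper.
    """
    pairs = sorted(
        ((iou(p, g), i, j)
         for i, p in enumerate(predicted)
         for j, g in enumerate(ground_truth)),
        reverse=True,
    )
    used_pred, used_truth = set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break
        if i in used_pred or j in used_truth:
            continue
        used_pred.add(i)
        used_truth.add(j)
    tp = len(used_pred)
    return tp, len(predicted) - tp, len(ground_truth) - tp


# Three labeled crowns; the detector merged the first two into one box.
ground_truth = [(0, 0, 9, 9), (0, 10, 9, 19), (20, 0, 29, 9)]
predicted = [(0, 0, 9, 19), (20, 1, 29, 9)]
result = match_crowns(predicted, ground_truth)
print(result)  # (2, 0, 1)
```

In the toy case the merged box can claim only one of the two crowns it covers, so the merge shows up as a false negative, which is the failure mode the check above is meant to expose.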
Original abstract
This work presents a method for semantic segmentation of mango trees in high resolution aerial imagery, and, a novel method for individual crown detection of mango trees using segmentation output. Mango Tree Net, a fully convolutional neural network (FCN), is trained using supervised learning to perform semantic segmentation of mango trees in imagery acquired using an unmanned aerial vehicle (UAV). The proposed network is retrained to separate touching/overlapping tree crowns in segmentation output. Contour based connected object detection is performed on the segmentation output from retrained network. Bounding boxes are drawn on the original images using coordinates of connected objects to achieve individual crown detection. The training dataset consists of 8,824 image patches of size 240 x 240. The approach is tested for performance on segmentation and individual crown detection tasks using test datasets containing 36 and 4 images respectively. The performance is analyzed using standard metrics precision, recall, f1-score and accuracy. Results obtained demonstrate the robustness of the proposed methods despite variations in factors such as scale, occlusion, lighting conditions and surrounding vegetation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Mango Tree Net, a fully convolutional network trained via supervised learning on 8,824 patches for semantic segmentation of mango trees in UAV imagery. It describes retraining the network to separate overlapping crowns and applying contour-based connected-component detection plus bounding-box extraction for individual crown localization. The approach is evaluated on held-out sets of 36 images (segmentation) and 4 images (crown detection) using precision, recall, F1-score and accuracy, with claims of robustness to scale, occlusion, lighting and vegetation variations.
Significance. If the numerical results and evaluation protocol were shown to support the claims, the work would provide a practical demonstration of FCN-based segmentation and post-processing for orchard monitoring, extending remote-sensing techniques to a specific agricultural application with potential utility in precision agriculture.
major comments (3)
- [Abstract / Results] Abstract and results section: the manuscript states that performance is analyzed using precision, recall, F1-score and accuracy on the 36- and 4-image test sets and that results demonstrate robustness, yet reports none of the numerical metric values. Without these scores the central empirical claims cannot be assessed.
- [Methods / Experiments] Experiments / training description: no information is supplied on the train/validation split of the 8,824 patches, the precise network architecture, or any hyper-parameter choices. This absence prevents evaluation of the supervised-learning pipeline and reproducibility.
- [Results / Crown detection] Crown-detection evaluation: the individual-crown task is tested on only four images. Given the explicit claim of robustness across variations in scale, occlusion, lighting and surrounding vegetation, this test cardinality supplies insufficient coverage of the variation space and does not support the generalization statement.
minor comments (1)
- [Methods] The manuscript would benefit from an architecture diagram of Mango Tree Net and a clear statement of the loss function and optimizer used during training.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity, reproducibility, and support for the claims.
point-by-point responses
- Referee: [Abstract / Results] Abstract and results section: the manuscript states that performance is analyzed using precision, recall, F1-score and accuracy on the 36- and 4-image test sets and that results demonstrate robustness, yet reports none of the numerical metric values. Without these scores the central empirical claims cannot be assessed.
  Authors: We agree that the numerical metric values must be reported to substantiate the claims. The revised manuscript will include the specific precision, recall, F1-score, and accuracy figures obtained on the 36-image segmentation test set and the 4-image crown-detection test set.
  Revision: yes
- Referee: [Methods / Experiments] Experiments / training description: no information is supplied on the train/validation split of the 8,824 patches, the precise network architecture, or any hyper-parameter choices. This absence prevents evaluation of the supervised-learning pipeline and reproducibility.
  Authors: We acknowledge the omission of these details. The revised methods section will specify the train/validation split of the 8,824 patches, the exact FCN architecture (including layer configuration), and all hyper-parameter choices such as learning rate, optimizer, batch size, and number of epochs.
  Revision: yes
- Referee: [Results / Crown detection] Crown-detection evaluation: the individual-crown task is tested on only four images. Given the explicit claim of robustness across variations in scale, occlusion, lighting and surrounding vegetation, this test cardinality supplies insufficient coverage of the variation space and does not support the generalization statement.
  Authors: The four images were selected to exhibit the cited variations, but we recognize that the sample size is small for broad generalization claims. We will revise the results and discussion to qualify the robustness statements accordingly and, if feasible, report results on any additional held-out images available.
  Revision: partial
Circularity Check
No circularity in derivation chain
full rationale
The paper presents an empirical computer-vision pipeline: supervised training of an FCN on 8,824 patches for semantic segmentation, followed by retraining and standard contour-based connected-component post-processing for crown detection. No equations, fitted parameters, or mathematical derivations are described that reduce to their own inputs by construction. Performance is reported via standard metrics on separate test sets (36 images for segmentation, 4 for detection). No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear. The central claims rest on external empirical evaluation rather than self-referential definitions, satisfying the criteria for a self-contained, non-circular result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Supervised learning on the provided 8,824 labeled patches produces a model that generalizes to unseen images containing scale, occlusion, lighting, and vegetation variation.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Mango Tree Net, a fully convolutional neural network (FCN), is trained using supervised learning to perform semantic segmentation...
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Contour based connected object detection is performed on the segmentation output...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.