Mango Tree Net -- A fully convolutional network for semantic segmentation and individual crown detection of mango trees

Omkar Narasipura; Ramesh Kestur; Vikas Agaradahalli Gurumurthy

arxiv: 1907.06915 · v1 · pith:SEOKU6EDnew · submitted 2019-07-16 · 💻 cs.CV

Pith reviewed 2026-05-24 21:11 UTC · model grok-4.3

classification 💻 cs.CV

keywords mango tree segmentation · fully convolutional network · UAV imagery · crown detection · semantic segmentation · individual tree detection · aerial image analysis · contour based detection

The pith

Mango Tree Net, a fully convolutional network, segments mango trees in UAV imagery and detects individual crowns by separating overlapping trees with retraining and contour detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Mango Tree Net to perform semantic segmentation on high-resolution aerial images captured by drones. After initial training, the network is retrained specifically to distinguish between touching or overlapping tree crowns in the segmentation map. Contour-based connected object detection then identifies individual crowns and draws bounding boxes on the original images. This pipeline is evaluated on separate test sets for segmentation and crown detection tasks using precision, recall, f1-score, and accuracy. A reader would care because accurate individual tree detection from aerial views supports applications in orchard management and agricultural surveying.

Core claim

Mango Tree Net is trained using supervised learning on 8,824 image patches to segment mango trees, then retrained to separate touching crowns, after which contour based connected object detection on the output produces bounding boxes for individual crown detection, demonstrating robustness despite variations in scale, occlusion, lighting conditions and surrounding vegetation.

What carries the argument

Mango Tree Net, a fully convolutional neural network that is retrained to separate overlapping crowns, combined with contour-based connected object detection on the segmentation output.
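
The post-processing half of this mechanism can be sketched in a few lines. The paper's own contour implementation is not described here, so the sketch below swaps in a plain 4-connected flood fill as a stand-in for contour-based connected object detection; the function name `crown_boxes`, the `min_area` filter, and the toy mask are all assumptions made for illustration.

```python
from collections import deque


def crown_boxes(mask, min_area=1):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected blob of 1s.

    `mask` is a binary segmentation map as a list of rows (assumed format).
    `min_area` is a hypothetical noise filter, not a parameter from the paper.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one connected object and track its pixel extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy mask with two crowns that do not touch.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
boxes = crown_boxes(toy_mask)
print(boxes)  # → [(0, 0, 1, 1), (3, 1, 4, 2)]
```

Note that two crowns joined by even one pixel come back as a single box, which is why the retraining step that separates touching crowns matters before any connected-object pass of this kind.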

If this is right

The method achieves reliable semantic segmentation on a test set of 36 images.
Individual crown detection works on 4 test images using the retrained network and contour detection.
Performance is measured with standard metrics showing robustness to image variations.
The approach handles variations in scale, occlusion, lighting, and vegetation through the retraining step.
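
For reference, the four standard metrics named above can be computed from a predicted mask and a ground-truth mask as in the sketch below. This is a generic pixel-wise formulation, assuming binary masks stored as lists of rows; the paper may aggregate differently (for example per image or per crown), and the function name and toy masks are hypothetical.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise precision, recall, F1-score and accuracy for binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # tree pixel correctly predicted
            elif p:
                fp += 1  # background predicted as tree
            elif t:
                fn += 1  # tree pixel missed
            else:
                tn += 1  # background correctly predicted
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy masks (illustrative values only, not data from the paper).
pred = [[1, 1, 0, 0],
        [1, 0, 0, 0]]
truth = [[1, 0, 0, 0],
         [1, 1, 0, 0]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

On imagery where background dominates, accuracy can stay high even when many tree pixels are missed, so precision and recall are usually the more informative pair.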

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar retraining strategies could apply to detecting individual instances of other objects in segmentation outputs.
The bounding box outputs could feed into further analysis like tree health monitoring from additional image features.
Testing on larger or more diverse UAV datasets would clarify the limits of the contour detection step.
Replacing contour detection with learned instance segmentation might improve performance on highly irregular overlaps.

Load-bearing premise

The retrained segmentation output plus contour-based connected object detection will reliably separate and localize individual crowns even under the variations in scale, occlusion, lighting conditions and surrounding vegetation present in the test images.

What would settle it

Manual verification on a new UAV image set with many overlapping mango crowns would show if the method merges or misses a substantial fraction of trees compared to ground truth counts.
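
One way such a check could be scored is sketched below: predicted boxes are greedily matched to hand-labelled ground-truth boxes by intersection over union (IoU), and unmatched ground-truth crowns are counted as missed. The 0.5 threshold, the greedy matching, the function names, and the toy boxes are assumptions for illustration, not a protocol taken from the paper.

```python
def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) box with inclusive pixel bounds."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min + 1) * max(0, y_max - y_min + 1)


def iou(a, b):
    """Intersection over union of two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    inter = box_area(overlap)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0


def count_detections(predicted, ground_truth, threshold=0.5):
    """Greedily match each predicted box to its best unmatched ground-truth box."""
    unmatched = list(ground_truth)
    matched = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)
            matched += 1
    return {
        "matched": matched,
        "missed": len(unmatched),            # ground-truth crowns with no detection
        "spurious": len(predicted) - matched,  # detections with no crown
    }


# Toy case: the first predicted box merges two neighbouring crowns.
ground_truth = [(0, 0, 9, 9), (10, 0, 17, 9), (30, 30, 39, 39)]
predicted = [(0, 0, 17, 9), (30, 31, 39, 39)]
result = count_detections(predicted, ground_truth)
print(result)  # → {'matched': 2, 'missed': 1, 'spurious': 0}
```

In this toy case the merged box still matches the larger crown, so the merge shows up as one missed tree rather than as a spurious detection, which is exactly the failure a ground-truth count comparison would expose.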

Original abstract

This work presents a method for semantic segmentation of mango trees in high resolution aerial imagery, and, a novel method for individual crown detection of mango trees using segmentation output. Mango Tree Net, a fully convolutional neural network (FCN), is trained using supervised learning to perform semantic segmentation of mango trees in imagery acquired using an unmanned aerial vehicle (UAV). The proposed network is retrained to separate touching/overlapping tree crowns in segmentation output. Contour based connected object detection is performed on the segmentation output from retrained network. Bounding boxes are drawn on the original images using coordinates of connected objects to achieve individual crown detection. The training dataset consists of 8,824 image patches of size 240 x 240. The approach is tested for performance on segmentation and individual crown detection tasks using test datasets containing 36 and 4 images respectively. The performance is analyzed using standard metrics precision, recall, f1-score and accuracy. Results obtained demonstrate the robustness of the proposed methods despite variations in factors such as scale, occlusion, lighting conditions and surrounding vegetation.
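
The abstract gives the patch size (240 x 240) but not how patches are cut from the source imagery. As a minimal sketch, assuming patches are taken on a regular grid and only where they fit fully inside the image, tiling could look like the following; the stride values and the 1000 x 720 image size are hypothetical.

```python
def patch_origins(width, height, patch=240, stride=240):
    """Top-left (x, y) corners of all patches that fit fully inside the image."""
    if width < patch or height < patch:
        return []
    return [
        (x, y)
        for y in range(0, height - patch + 1, stride)
        for x in range(0, width - patch + 1, stride)
    ]


def extract_patch(image, x, y, patch=240):
    """Crop a patch from an image stored as a list of rows (assumed format)."""
    return [row[x:x + patch] for row in image[y:y + patch]]


# Hypothetical 1000 x 720 image, non-overlapping 240 x 240 patches.
origins = patch_origins(1000, 720)
print(len(origins))  # → 12

# An overlapping grid (stride smaller than the patch) yields more patches.
dense_origins = patch_origins(1000, 720, stride=120)
print(len(dense_origins))  # → 35
```

Whether the 8,824 training patches came from a non-overlapping grid, an overlapping one, or augmentation is not stated in the abstract, so the stride here should be read as a free choice.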

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Mango Tree Net applies a standard FCN plus retraining and contours to mango crown detection from UAV images, but the four-image test set leaves the robustness claim unsupported.

The paper applies a fully convolutional network to segment mango trees from UAV imagery and adds a retraining step plus contour-based detection to handle individual crowns. This specific combination for mango trees is new, though it builds directly on known FCN methods. The paper does a good job setting up the supervised learning on 8,824 patches of 240 x 240 and describing the two tasks: segmentation on 36 test images and detection on 4. The metrics named are the usual ones: precision, recall, F1-score and accuracy.

Where it falls short is the support for the robustness claim. Four images provide very little coverage of the variation space the authors themselves list. Without reported scores or a larger, more diverse test set, it is difficult to accept that the method handles real-world conditions reliably. The absence of any architecture details or training split information adds to the problem.

Readers working on similar precision agriculture projects with UAV data for tree crops would find this useful as an example. It is not for those interested in new network designs or theoretical contributions. I think it deserves a serious referee to check whether the full manuscript has the quantitative results that the abstract omits. The thinking is straightforward and honest about the goal, so peer review makes sense even with the evaluation gaps. If the full paper includes the actual metric values and perhaps cross-validation, it could be a solid practical contribution. Otherwise the small test set is the load-bearing issue.

Referee Report

3 major / 1 minor

Summary. The paper introduces Mango Tree Net, a fully convolutional network trained via supervised learning on 8,824 patches for semantic segmentation of mango trees in UAV imagery. It describes retraining the network to separate overlapping crowns and applying contour-based connected-component detection plus bounding-box extraction for individual crown localization. The approach is evaluated on held-out sets of 36 images (segmentation) and 4 images (crown detection) using precision, recall, F1-score and accuracy, with claims of robustness to scale, occlusion, lighting and vegetation variations.

Significance. If the numerical results and evaluation protocol were shown to support the claims, the work would provide a practical demonstration of FCN-based segmentation and post-processing for orchard monitoring, extending remote-sensing techniques to a specific agricultural application with potential utility in precision agriculture.

major comments (3)

[Abstract / Results] Abstract and results section: the manuscript states that performance is analyzed using precision, recall, F1-score and accuracy on the 36- and 4-image test sets and that results demonstrate robustness, yet reports none of the numerical metric values. Without these scores the central empirical claims cannot be assessed.
[Methods / Experiments] Experiments / training description: no information is supplied on the train/validation split of the 8,824 patches, the precise network architecture, or any hyper-parameter choices. This absence prevents evaluation of the supervised-learning pipeline and reproducibility.
[Results / Crown detection] Crown-detection evaluation: the individual-crown task is tested on only four images. Given the explicit claim of robustness across variations in scale, occlusion, lighting and surrounding vegetation, this test cardinality supplies insufficient coverage of the variation space and does not support the generalization statement.

minor comments (1)

[Methods] The manuscript would benefit from an architecture diagram of Mango Tree Net and a clear statement of the loss function and optimizer used during training.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity, reproducibility, and support for the claims.

Referee: [Abstract / Results] Abstract and results section: the manuscript states that performance is analyzed using precision, recall, F1-score and accuracy on the 36- and 4-image test sets and that results demonstrate robustness, yet reports none of the numerical metric values. Without these scores the central empirical claims cannot be assessed.

Authors: We agree that the numerical metric values must be reported to substantiate the claims. The revised manuscript will include the specific precision, recall, F1-score, and accuracy figures obtained on the 36-image segmentation test set and the 4-image crown-detection test set. revision: yes

Referee: [Methods / Experiments] Experiments / training description: no information is supplied on the train/validation split of the 8,824 patches, the precise network architecture, or any hyper-parameter choices. This absence prevents evaluation of the supervised-learning pipeline and reproducibility.

Authors: We acknowledge the omission of these details. The revised methods section will specify the train/validation split of the 8,824 patches, the exact FCN architecture (including layer configuration), and all hyper-parameter choices such as learning rate, optimizer, batch size, and number of epochs. revision: yes

Referee: [Results / Crown detection] Crown-detection evaluation: the individual-crown task is tested on only four images. Given the explicit claim of robustness across variations in scale, occlusion, lighting and surrounding vegetation, this test cardinality supplies insufficient coverage of the variation space and does not support the generalization statement.

Authors: The four images were selected to exhibit the cited variations, but we recognize that the sample size is small for broad generalization claims. We will revise the results and discussion to qualify the robustness statements accordingly and, if feasible, report results on any additional held-out images available. revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper presents an empirical computer-vision pipeline: supervised training of an FCN on 8,824 patches for semantic segmentation, followed by retraining and standard contour-based connected-component post-processing for crown detection. No equations, fitted parameters, or mathematical derivations are described that reduce to their own inputs by construction. Performance is reported via standard metrics on separate test sets (36 images for segmentation, 4 for detection). No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear. The central claims rest on external empirical evaluation rather than self-referential definitions, satisfying the criteria for a self-contained, non-circular result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that labeled UAV patches are sufficient to train a generalizable segmenter and that contour detection on the output will resolve overlaps. Beyond that single domain assumption and the network itself, no free parameters or invented entities are explicitly introduced.

axioms (1)

domain assumption: Supervised learning on the provided 8,824 labeled patches produces a model that generalizes to unseen images containing scale, occlusion, lighting, and vegetation variation.
Invoked by the statement that the approach is tested on separate test datasets and demonstrates robustness despite those factors.

pith-pipeline@v0.9.0 · 5728 in / 1282 out tokens · 22921 ms · 2026-05-24T21:11:29.738209+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Mango Tree Net, a fully convolutional neural network (FCN), is trained using supervised learning to perform semantic segmentation..."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Contour based connected object detection is performed on the segmentation output..."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.