Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Colored Point Clouds
Pith reviewed 2026-05-23 17:14 UTC · model grok-4.3
The pith
Colored 3D point clouds with instance segmentation and attention matching enable temporal fruit re-identification across orchard scans.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The method segments fruits via learning-based instance segmentation on colored point clouds, extracts discriminative descriptors with 3D sparse convolutional neural networks, and associates fruits across sessions through an attention-based matching network with probabilistic assignment, producing higher accuracy than existing techniques on strawberry and apple orchard datasets and thereby supporting reliable temporal monitoring despite variations in size, orientation, occlusion, and fruit presence.
What carries the argument
Attention-based matching network that performs probabilistic assignment on descriptors produced by a 3D sparse convolutional neural network from instance-segmented colored point clouds.
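A minimal sketch of what a probabilistic assignment step can look like, assuming pairwise similarity scores between fruit descriptors from two sessions are already available. This summary does not specify the paper's actual scheme, so the dual-softmax with an explicit no-match option and the names `match_probabilities`, `select_matches`, `no_match_score`, and `threshold` are illustrative assumptions, not the authors' method.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def match_probabilities(scores, no_match_score=0.0):
    """scores[i][j]: similarity of fruit i (earlier session) and fruit j (later session).

    Each fruit may also pick a hypothetical "no match" option, which lets
    fruits appear or disappear between sessions.
    """
    n, m = len(scores), len(scores[0])
    # Row direction: where does earlier fruit i go (any later fruit, or nowhere)?
    rows = [softmax(list(row) + [no_match_score]) for row in scores]
    # Column direction: where does later fruit j come from (any earlier fruit, or nowhere)?
    cols = [softmax([scores[i][j] for i in range(n)] + [no_match_score]) for j in range(m)]
    # Combine both directions so a pair scores high only if both sides agree.
    return [[rows[i][j] * cols[j][i] for j in range(m)] for i in range(n)]


def select_matches(probs, threshold=0.25):
    """Keep pairs that are mutually best and above the (assumed) threshold."""
    matches = {}
    for i, row in enumerate(probs):
        j = max(range(len(row)), key=row.__getitem__)
        best_in_column = all(probs[k][j] <= row[j] for k in range(len(probs)))
        if row[j] >= threshold and best_in_column:
            matches[i] = j
    return matches


# Two fruits in the earlier scan, three in the later one (the third is new).
scores = [[5.0, 0.0, 0.0],
          [0.0, 4.0, 0.0]]
probs = match_probabilities(scores)
print(select_matches(probs))  # {0: 0, 1: 1}
```

The third later-session fruit stays unmatched because no earlier fruit prefers it over the no-match option, which is the behavior needed when fruits appear between scans.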
If this is right
- The system produces more accurate fruit counts and locations over time than prior point-cloud methods.
- It handles the dynamic appearance and disappearance of fruits between scans.
- It works directly on dense colored terrestrial point clouds without intermediate 2D processing.
- It supports automated agricultural production by delivering consistent individual-fruit tracking.
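The bookkeeping implied by the points above can be sketched as follows, assuming associations between two sessions are already given as a mapping. The function name `session_changes` and the fruit identifiers are hypothetical; the paper's own data structures are not described in this summary.

```python
def session_changes(prev_ids, curr_ids, matches):
    """Split fruits into persisting, disappeared, and newly appeared.

    matches maps a fruit id from the previous session to its
    counterpart in the current session.
    """
    matched_prev = set(matches)
    matched_curr = set(matches.values())
    return {
        "persisting": sorted(matches.items()),
        "disappeared": sorted(set(prev_ids) - matched_prev),
        "appeared": sorted(set(curr_ids) - matched_curr),
    }


changes = session_changes(["a", "b", "c"], ["x", "y"], {"a": "x"})
print(changes)
```

Counts per session then follow directly from the lengths of the three lists.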
Where Pith is reading between the lines
- The same descriptor-plus-attention pipeline could be tested on additional crop types or combined with robotic platforms for active data collection.
- If re-identification remains stable, the method could support per-fruit growth modeling by linking measurements across more than two sessions.
- The approach suggests a route for extending 3D temporal tracking to other dynamic natural scenes where objects vary in appearance.
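Linking measurements across more than two sessions amounts to chaining pairwise associations into tracks. The sketch below shows one way to do that under the assumption that each consecutive session pair yields a mapping of matched fruit ids; `chain_tracks` and the ids are hypothetical and not taken from the paper.

```python
def chain_tracks(pairwise_matches):
    """Chain per-session-pair matches into multi-session tracks.

    pairwise_matches[t] maps fruit ids in session t to ids in session t + 1.
    Each track is a list of (session_index, fruit_id) tuples.
    """
    tracks = []
    open_tracks = {}  # fruit id in the latest session -> the track it extends
    for t, matches in enumerate(pairwise_matches):
        next_open = {}
        for src, dst in matches.items():
            track = open_tracks.get(src)
            if track is None:
                # First time this fruit is part of a match: start a new track.
                track = [(t, src)]
                tracks.append(track)
            track.append((t + 1, dst))
            next_open[dst] = track
        open_tracks = next_open
    return tracks


tracks = chain_tracks([
    {"a0": "a1", "b0": "b1"},  # session 0 -> 1
    {"a1": "a2", "c1": "c2"},  # session 1 -> 2 (b1 lost, c1 first matched)
])
for track in tracks:
    print(track)
```

A per-fruit growth curve would then be the sequence of size measurements looked up along one track.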
Load-bearing premise
Individual fruits stay distinguishable by 3D shape, color, and local context between observation sessions even when size, orientation, occlusion, and visibility change.
What would settle it
A new set of repeated orchard scans in which many fruits change substantially in shape or color between sessions, and on which the matching network fails to produce correct associations at rates usable for monitoring.
Original abstract
Accurate and consistent fruit monitoring over time is a key step toward automated agricultural production systems. However, this task is inherently difficult due to variations in fruit size, shape, occlusion, orientation, and the dynamic nature of orchards where fruits may appear or disappear between observations. In this article, we propose a novel method for fruit instance segmentation and re-identification on 3D terrestrial point clouds collected over time. Our approach directly operates on dense colored point clouds, capturing fine-grained 3D spatial detail. We segment individual fruits using a learning-based instance segmentation method applied directly to the point cloud. For each segmented fruit, we extract a compact and discriminative descriptor using a 3D sparse convolutional neural network. To track fruits across different times, we introduce an attention-based matching network that associates fruits with their counterparts from previous sessions. Matching is performed using a probabilistic assignment scheme, selecting the most likely associations across time. We evaluate our approach on real-world datasets of strawberries and apples, demonstrating that it outperforms existing methods in both instance segmentation and temporal re-identification, enabling robust and precise fruit monitoring across complex and dynamic orchard environments.
Keywords: Agricultural Robotics, 3D Fruit Tracking, Instance Segmentation, Deep Learning , Point Clouds, Sparse Convolutional Networks, Temporal Monitoring
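The abstract does not say how points are grouped into individual fruits. One common way to turn per-point predictions into instances is to cluster them, for example with mean shift. The sketch below is a plain flat-kernel mean shift on raw 3D coordinates, offered only as an assumed illustration of that grouping step; `mean_shift`, `bandwidth`, and the merge radius are hypothetical choices, and a learned pipeline would typically cluster predicted embeddings or shifted points instead.

```python
import math


def mean_shift(points, bandwidth, iters=30, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        moved = 0.0
        new_modes = []
        for mode in modes:
            # Neighbors of the current mode within the bandwidth radius.
            nbrs = [p for p in points if math.dist(mode, p) <= bandwidth]
            center = tuple(sum(axis) / len(nbrs) for axis in zip(*nbrs))
            moved = max(moved, math.dist(mode, center))
            new_modes.append(center)
        modes = new_modes
        if moved < tol:
            break
    # Modes that ended up close together belong to the same instance.
    centers, labels = [], []
    for mode in modes:
        for k, center in enumerate(centers):
            if math.dist(mode, center) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels


# Two well-separated toy "fruits".
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
         (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
print(mean_shift(cloud, bandwidth=1.0))  # [0, 0, 0, 1, 1]
```

The bandwidth plays the role of an expected fruit radius, so touching fruits in a cluster would need a smaller value or learned features to separate.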
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a pipeline for horticultural temporal fruit monitoring that performs instance segmentation directly on dense colored 3D point clouds, extracts compact descriptors via a 3D sparse convolutional network, and tracks instances across observation sessions with an attention-based matching network that uses probabilistic assignment. The central empirical claim is that the method outperforms prior approaches on real-world strawberry and apple datasets in both segmentation and re-identification accuracy, enabling robust monitoring despite changes in size, orientation, occlusion, and fruit appearance/disappearance.
Significance. If the reported gains in re-identification are shown to be statistically reliable and generalizable beyond the tested orchards, the work would provide a practical advance for automated agricultural systems by demonstrating that learned 3D descriptors plus attention matching can handle the temporal dynamics of real orchards. The choice of sparse CNNs and probabilistic matching is a natural fit for colored point clouds; however, the absence of quantitative metrics, baselines, error bars, or failure-mode analysis in the abstract leaves the strength of this contribution difficult to assess from the provided material.
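For concreteness, the kind of re-identification metric the report asks for could be computed as below. This is a sketch under the assumption that predicted and ground-truth associations are both given as mappings from earlier-session ids to later-session ids; `reid_scores` is a hypothetical name, and the paper's exact metric definitions are not available in this summary.

```python
def reid_scores(predicted, ground_truth):
    """Precision, recall, and F1 over fruit associations between two sessions."""
    pred_pairs = set(predicted.items())
    true_pairs = set(ground_truth.items())
    true_positives = len(pred_pairs & true_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(true_pairs) if true_pairs else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


predicted = {"a": "x", "b": "y"}
ground_truth = {"a": "x", "b": "z", "c": "w", "d": "v"}
precision, recall, f1 = reid_scores(predicted, ground_truth)
print(precision, recall, f1)
```

Repeating this over several training runs or orchard rows would give the spread needed for error bars.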
major comments (2)
- [Evaluation] Evaluation section: The abstract asserts outperformance on real-world strawberry and apple datasets without supplying any numerical results (e.g., segmentation mAP, re-identification accuracy, or F1 scores), baseline comparisons, dataset sizes, or statistical significance tests. This omission prevents verification of the central claim and must be addressed with full tables and error analysis before the empirical contribution can be evaluated.
- [§3.3] §3.3 (Descriptor extraction and matching): The claim that the 3D sparse CNN produces 'compact and discriminative' descriptors that remain reliable under size/orientation changes and lighting variation is load-bearing for the re-identification results, yet no ablation on descriptor collision rates, sensitivity to color shifts, or comparison against simple shape+color baselines is referenced. Without such evidence the probabilistic assignment step cannot be shown to deliver the claimed robustness.
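The ablation requested above can be made concrete with a small sketch. It assumes a deliberately simple hand-crafted descriptor (mean distance to centroid plus mean color) and defines a collision as two different fruits whose descriptors have cosine similarity above a threshold. The names `baseline_descriptor` and `collision_rate`, the descriptor layout, and the threshold are all assumptions for illustration, not quantities from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def baseline_descriptor(points):
    """points: list of (x, y, z, r, g, b) tuples with colors in [0, 1]."""
    n = len(points)
    centroid = tuple(sum(p[k] for p in points) / n for k in range(3))
    # Shape cue: mean distance of the points to the centroid.
    radius = sum(math.dist(p[:3], centroid) for p in points) / n
    # Color cue: mean RGB over the fruit.
    mean_rgb = [sum(p[k] for p in points) / n for k in range(3, 6)]
    return [radius] + mean_rgb


def collision_rate(descriptors, fruit_ids, threshold=0.99):
    """Fraction of different-fruit pairs whose descriptors are near-identical."""
    collisions = pairs = 0
    for i in range(len(descriptors)):
        for j in range(i + 1, len(descriptors)):
            if fruit_ids[i] != fruit_ids[j]:
                pairs += 1
                collisions += cosine(descriptors[i], descriptors[j]) >= threshold
    return collisions / pairs if pairs else 0.0


toy = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
print(collision_rate(toy, ["a", "b", "c"]))
```

Running the same `collision_rate` on learned descriptors and on `baseline_descriptor` outputs would give the side-by-side comparison the referee asks for.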
minor comments (1)
- [Abstract] The keywords list contains an extraneous space before a comma ('Deep Learning , Point Clouds').
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below with direct responses and commit to revisions that strengthen the presentation of our empirical results without altering the core claims or methodology.
- Referee: [Evaluation] Evaluation section: The abstract asserts outperformance on real-world strawberry and apple datasets without supplying any numerical results (e.g., segmentation mAP, re-identification accuracy, or F1 scores), baseline comparisons, dataset sizes, or statistical significance tests. This omission prevents verification of the central claim and must be addressed with full tables and error analysis before the empirical contribution can be evaluated.
  Authors: We agree that the abstract, due to length constraints, does not include numerical results. The Evaluation section (Section 4) of the full manuscript already contains the requested elements: comprehensive tables reporting segmentation mAP and re-identification accuracy/F1 scores with comparisons to prior methods, explicit dataset sizes and session counts for both strawberry and apple orchards, and error bars derived from repeated trials with statistical significance testing. To make the central claims immediately verifiable, we will revise the abstract to incorporate a concise summary of the key quantitative improvements and ensure the evaluation tables are cross-referenced prominently.
  Revision: yes
- Referee: [§3.3] §3.3 (Descriptor extraction and matching): The claim that the 3D sparse CNN produces 'compact and discriminative' descriptors that remain reliable under size/orientation changes and lighting variation is load-bearing for the re-identification results, yet no ablation on descriptor collision rates, sensitivity to color shifts, or comparison against simple shape+color baselines is referenced. Without such evidence the probabilistic assignment step cannot be shown to deliver the claimed robustness.
  Authors: The manuscript demonstrates the effectiveness of the descriptors through end-to-end re-identification performance on real orchard data exhibiting the mentioned variations. However, we acknowledge that explicit ablations on collision rates, color-shift sensitivity, and direct comparisons to non-learned shape+color baselines are not currently included. We will add a dedicated ablation subsection in the revised manuscript to quantify these aspects, including collision analysis and baseline descriptor comparisons, to provide stronger support for the descriptor quality claim.
  Revision: yes
Circularity Check
No significant circularity; method relies on external evaluation
full rationale
The paper describes a pipeline of instance segmentation via learning-based methods on point clouds, descriptor extraction with 3D sparse CNNs, and attention-based probabilistic matching for re-identification. No equations, fitted parameters, or self-citations are presented that reduce any claimed prediction or result to the inputs by construction. Performance claims are tied to evaluation on external real-world strawberry and apple datasets, making the derivation self-contained against benchmarks rather than internally forced.
Axiom & Free-Parameter Ledger
free parameters (1)
- Weights of segmentation and descriptor networks
axioms (1)
- domain assumption: Input point clouds are dense and colored
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: extract a compact and discriminative descriptor using a 3D sparse convolutional neural network... attention-based matching network that associates fruits with their counterparts
- IndisputableMonolith/Foundation/DimensionForcing.lean: alexander_duality_circle_linking (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: MinkPanoptic... mean shift clustering... transformer encoder layer
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.