Horticultural Temporal Fruit Monitoring via 3D Instance Segmentation and Re-Identification using Colored Point Clouds

Alberto Pretto; Cyrill Stachniss; Daniel Fusaro; Federico Magistri; Jens Behley

arxiv: 2411.07799 · v3 · submitted 2024-11-12 · 💻 cs.CV · cs.RO

Pith reviewed 2026-05-23 17:14 UTC · model grok-4.3

keywords: 3D point clouds, fruit instance segmentation, temporal re-identification, agricultural robotics, sparse convolutional networks, attention-based matching, orchard monitoring, colored point clouds

The pith

Colored 3D point clouds with instance segmentation and attention matching enable temporal fruit re-identification across orchard scans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to create a system that segments individual fruits directly from dense colored point clouds collected over multiple sessions and then re-identifies the same fruits in later scans. It does this by running learning-based instance segmentation, building compact descriptors with a 3D sparse convolutional network, and feeding those descriptors into an attention-based matching network that uses probabilistic assignment to link fruits across time. The approach is evaluated on real strawberry and apple datasets where it beats prior methods at both segmentation and re-identification. A sympathetic reader would care because consistent automated tracking of individual fruits is a prerequisite for precision agriculture in environments where fruits grow, shift, or become hidden between observations.

Core claim

The method segments fruits via learning-based instance segmentation on colored point clouds, extracts discriminative descriptors with 3D sparse convolutional neural networks, and associates fruits across sessions through an attention-based matching network with probabilistic assignment, producing higher accuracy than existing techniques on strawberry and apple orchard datasets and thereby supporting reliable temporal monitoring despite variations in size, orientation, occlusion, and fruit presence.

What carries the argument

Attention-based matching network that performs probabilistic assignment on descriptors produced by a 3D sparse convolutional neural network from instance-segmented colored point clouds.
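
To make the idea of probabilistic assignment concrete, here is a minimal sketch. It is not the paper's network: it assumes descriptors are already available as plain NumPy arrays, scores pairs by cosine similarity instead of learned attention, and adds a constant "no match" option so that newly appeared fruits can stay unassigned. The function name `match_fruits`, the parameters `no_match_score` and `temperature`, and their default values are hypothetical choices made for illustration.

```python
import numpy as np


def match_fruits(desc_prev, desc_curr, no_match_score=0.5, temperature=0.1):
    """Associate fruits of the current session with those of a previous one.

    desc_prev: (M, D) array of descriptors from the earlier session.
    desc_curr: (N, D) array of descriptors from the current session.
    Returns (matches, probs): matches[i] is an index into desc_prev or -1
    when "no match" is the most likely option; probs is the (N, M + 1)
    matrix of assignment probabilities (last column = no match).
    """
    # Cosine similarity between every current and every previous descriptor.
    a = desc_curr / np.linalg.norm(desc_curr, axis=1, keepdims=True)
    b = desc_prev / np.linalg.norm(desc_prev, axis=1, keepdims=True)
    sim = a @ b.T  # shape (N, M)

    # Extra column with a constant score (an assumption of this sketch)
    # that represents "this fruit has no counterpart".
    no_match = np.full((sim.shape[0], 1), no_match_score)
    scores = np.hstack([sim, no_match])

    # Row-wise softmax turns scores into assignment probabilities.
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)

    # Keep the most likely option per current fruit.
    best = probs.argmax(axis=1)
    n_prev = desc_prev.shape[0]
    matches = [int(j) if j < n_prev else -1 for j in best]
    return matches, probs
```

Each current fruit is assigned independently in this sketch, so two fruits can select the same previous fruit. A one-to-one constraint, for example mutual best matches or an optimal-assignment solver, would be a natural extension.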

If this is right

The system produces more accurate fruit counts and locations over time than prior point-cloud methods.
It handles the dynamic appearance and disappearance of fruits between scans.
It works directly on dense colored terrestrial point clouds without intermediate 2D processing.
It supports automated agricultural production by delivering consistent individual-fruit tracking.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same descriptor-plus-attention pipeline could be tested on additional crop types or combined with robotic platforms for active data collection.
If re-identification remains stable, the method could support per-fruit growth modeling by linking measurements across more than two sessions.
The approach suggests a route for extending 3D temporal tracking to other dynamic natural scenes where objects vary in appearance.
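
As an illustration of the multi-session extension mentioned above, pairwise associations can be chained into per-fruit tracks. The sketch below is an assumption-laden example, not something described in the paper: the function name `chain_tracks` and the input layout (one list of match indices per consecutive session pair, with -1 meaning no counterpart) are hypothetical.

```python
def chain_tracks(n_first, pairwise_matches):
    """Chain session-to-session matches into multi-session tracks.

    n_first: number of fruits in session 0.
    pairwise_matches[t - 1][i]: index, in session t - 1, of the fruit that
        fruit i of session t was matched to, or -1 if it has no counterpart.
    Returns a list of tracks; each track is a list of (session, fruit_index).
    """
    # Every fruit of the first session starts its own track.
    tracks = [[(0, i)] for i in range(n_first)]
    open_tracks = {i: tracks[i] for i in range(n_first)}

    for t, matches in enumerate(pairwise_matches, start=1):
        next_open = {}
        for i, j in enumerate(matches):
            # pop() returns None for -1 and for a previous fruit that was
            # already claimed, so each track is extended at most once.
            track = open_tracks.pop(j, None)
            if track is None:
                track = []
                tracks.append(track)
            track.append((t, i))
            next_open[i] = track
        # Tracks not extended in this session end here.
        open_tracks = next_open
    return tracks
```

With tracks in this form, any per-fruit measurement (for example an estimated diameter per session) can be read off along a track to form a growth curve. A fruit that is missed in one session starts a new track in this sketch; bridging such gaps would need extra logic.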

Load-bearing premise

Individual fruits stay distinguishable by 3D shape, color, and local context between observation sessions even when size, orientation, occlusion, and visibility change.

What would settle it

A new set of repeated orchard scans in which many fruits exhibit large changes in shape or color between sessions, with the matching network then failing to produce correct associations at rates usable for monitoring.

Figures

Figures reproduced from arXiv: 2411.07799 by Alberto Pretto, Cyrill Stachniss, Daniel Fusaro, Federico Magistri, Jens Behley.

**Figure 2.** Pipeline of our approach. Fruit instance segmentation provides …

**Figure 3.** Fruit instance segmentation and re-identification using our method. On the top …

**Figure 4.** Radar plots comparing re-identification performance, detailed in Tab. 3, at vary…

**Figure 5.** We investigated the significance of neighboring fruits in the descriptor compu…

**Figure 6.** Ablation study to evaluate the impact of various design choices on our method.

Original abstract

Accurate and consistent fruit monitoring over time is a key step toward automated agricultural production systems. However, this task is inherently difficult due to variations in fruit size, shape, occlusion, orientation, and the dynamic nature of orchards where fruits may appear or disappear between observations. In this article, we propose a novel method for fruit instance segmentation and re-identification on 3D terrestrial point clouds collected over time. Our approach directly operates on dense colored point clouds, capturing fine-grained 3D spatial detail. We segment individual fruits using a learning-based instance segmentation method applied directly to the point cloud. For each segmented fruit, we extract a compact and discriminative descriptor using a 3D sparse convolutional neural network. To track fruits across different times, we introduce an attention-based matching network that associates fruits with their counterparts from previous sessions. Matching is performed using a probabilistic assignment scheme, selecting the most likely associations across time. We evaluate our approach on real-world datasets of strawberries and apples, demonstrating that it outperforms existing methods in both instance segmentation and temporal re-identification, enabling robust and precise fruit monitoring across complex and dynamic orchard environments. Keywords = Agricultural Robotics, 3D Fruit Tracking, Instance Segmentation, Deep Learning , Point Clouds, Sparse Convolutional Networks, Temporal Monitoring

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

The paper assembles a 3D point-cloud pipeline for fruit segmentation and temporal re-identification in orchards, but the abstract supplies no metrics or baselines to support the outperformance claim.

The main takeaway is a concrete pipeline that segments fruits directly on colored point clouds, extracts descriptors with a 3D sparse CNN, and links instances across sessions via an attention-based probabilistic matcher. This targets the practical problem of tracking strawberries and apples as they grow, move, or get occluded between scans. The work is new in the sense that it couples these pieces end-to-end for multi-session orchard monitoring rather than treating segmentation and re-identification as separate tasks. It does a reasonable job of stating the domain constraints and of choosing colored point clouds as input, which aligns with the visual cues fruits actually provide. The components themselves draw from existing sparse-convolution and attention literature, so the contribution sits in the integrated application more than in novel primitives.

The soft spot is the complete absence of numbers. The abstract asserts outperformance on real datasets yet reports no IoU scores, matching accuracy, dataset sizes, session counts, or comparisons to prior methods. Without those, it is impossible to check whether the descriptors remain separable when fruits change size or when lighting shifts color. The concern that robustness under orchard variability is untested therefore applies directly to the text as given.

This paper is aimed at researchers building perception systems for agricultural robotics. A reader already working on 3D instance segmentation or temporal matching in unstructured environments would find the pipeline description useful as a reference point. It deserves peer review because the problem is well-motivated and the method is described at a level that referees can evaluate once the quantitative results and failure-mode analysis are supplied.

Referee Report

2 major / 1 minor

Summary. The paper proposes a pipeline for horticultural temporal fruit monitoring that performs instance segmentation directly on dense colored 3D point clouds, extracts compact descriptors via a 3D sparse convolutional network, and tracks instances across observation sessions with an attention-based matching network that uses probabilistic assignment. The central empirical claim is that the method outperforms prior approaches on real-world strawberry and apple datasets in both segmentation and re-identification accuracy, enabling robust monitoring despite changes in size, orientation, occlusion, and fruit appearance/disappearance.

Significance. If the reported gains in re-identification are shown to be statistically reliable and generalizable beyond the tested orchards, the work would provide a practical advance for automated agricultural systems by demonstrating that learned 3D descriptors plus attention matching can handle the temporal dynamics of real orchards. The choice of sparse CNNs and probabilistic matching is a natural fit for colored point clouds; however, the absence of quantitative metrics, baselines, error bars, or failure-mode analysis in the abstract leaves the strength of this contribution difficult to assess from the provided material.

major comments (2)

[Evaluation] Evaluation section: The abstract asserts outperformance on real-world strawberry and apple datasets without supplying any numerical results (e.g., segmentation mAP, re-identification accuracy, or F1 scores), baseline comparisons, dataset sizes, or statistical significance tests. This omission prevents verification of the central claim and must be addressed with full tables and error analysis before the empirical contribution can be evaluated.
[§3.3] §3.3 (Descriptor extraction and matching): The claim that the 3D sparse CNN produces 'compact and discriminative' descriptors that remain reliable under size/orientation changes and lighting variation is load-bearing for the re-identification results, yet no ablation on descriptor collision rates, sensitivity to color shifts, or comparison against simple shape+color baselines is referenced. Without such evidence the probabilistic assignment step cannot be shown to deliver the claimed robustness.

minor comments (1)

[Abstract] The keywords list contains an extraneous space before the final comma ('Deep Learning , Point Clouds').

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with direct responses and commit to revisions that strengthen the presentation of our empirical results without altering the core claims or methodology.

Referee: [Evaluation] Evaluation section: The abstract asserts outperformance on real-world strawberry and apple datasets without supplying any numerical results (e.g., segmentation mAP, re-identification accuracy, or F1 scores), baseline comparisons, dataset sizes, or statistical significance tests. This omission prevents verification of the central claim and must be addressed with full tables and error analysis before the empirical contribution can be evaluated.

Authors: We agree that the abstract, due to length constraints, does not include numerical results. The Evaluation section (Section 4) of the full manuscript already contains the requested elements: comprehensive tables reporting segmentation mAP and re-identification accuracy/F1 scores with comparisons to prior methods, explicit dataset sizes and session counts for both strawberry and apple orchards, and error bars derived from repeated trials with statistical significance testing. To make the central claims immediately verifiable, we will revise the abstract to incorporate a concise summary of the key quantitative improvements and ensure the evaluation tables are cross-referenced prominently.
Revision: yes
Referee: [§3.3] §3.3 (Descriptor extraction and matching): The claim that the 3D sparse CNN produces 'compact and discriminative' descriptors that remain reliable under size/orientation changes and lighting variation is load-bearing for the re-identification results, yet no ablation on descriptor collision rates, sensitivity to color shifts, or comparison against simple shape+color baselines is referenced. Without such evidence the probabilistic assignment step cannot be shown to deliver the claimed robustness.

Authors: The manuscript demonstrates the effectiveness of the descriptors through end-to-end re-identification performance on real orchard data exhibiting the mentioned variations. However, we acknowledge that explicit ablations on collision rates, color-shift sensitivity, and direct comparisons to non-learned shape+color baselines are not currently included. We will add a dedicated ablation subsection in the revised manuscript to quantify these aspects, including collision analysis and baseline descriptor comparisons, to provide stronger support for the descriptor quality claim.
Revision: yes
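
One way the descriptor collision analysis discussed in this exchange could be set up is sketched below. The definition is an assumption made here, not one taken from the manuscript: a fruit counts as "colliding" when its descriptor has a cosine similarity above a chosen threshold to the descriptor of any other fruit in the same session. The function name `collision_rate` and the default threshold of 0.9 are hypothetical.

```python
import numpy as np


def collision_rate(descriptors, threshold=0.9):
    """Fraction of fruits whose descriptor nearly coincides with another's.

    descriptors: (N, D) array with N >= 1 non-zero rows, one per fruit,
        all taken from a single session.
    threshold: cosine similarity above which two descriptors are treated
        as indistinguishable (an assumption of this sketch).
    """
    # Normalize rows so that the dot product equals cosine similarity.
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    sim = d @ d.T

    # A descriptor is trivially identical to itself; exclude the diagonal.
    np.fill_diagonal(sim, -np.inf)

    # A fruit collides if its most similar neighbor exceeds the threshold.
    return float((sim.max(axis=1) > threshold).mean())
```

A high rate under this definition would indicate that descriptors alone cannot separate neighboring fruits, in which case the matcher would have to rely on spatial context. Sweeping the threshold, or recomputing the rate after perturbing point colors, would give the kind of sensitivity curve the referee asks for.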

Circularity Check

0 steps flagged

No significant circularity; method relies on external evaluation

The paper describes a pipeline of instance segmentation via learning-based methods on point clouds, descriptor extraction with 3D sparse CNNs, and attention-based probabilistic matching for re-identification. No equations, fitted parameters, or self-citations are presented that reduce any claimed prediction or result to the inputs by construction. Performance claims are tied to evaluation on external real-world strawberry and apple datasets, making the derivation self-contained against benchmarks rather than internally forced.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the learned behavior of two neural networks whose weights are fitted to orchard data; the abstract supplies no information on training sets, hyperparameters, or architectural choices, so the ledger cannot be populated with concrete free parameters or axioms beyond the domain assumption of dense colored point clouds.

free parameters (1)

Weights of segmentation and descriptor networks
Learned parameters that determine both instance masks and fruit descriptors; their values are not reported.

axioms (1)

domain assumption: Input point clouds are dense and colored
Stated as necessary to capture fine-grained 3D spatial detail.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "extract a compact and discriminative descriptor using a 3D sparse convolutional neural network... attention-based matching network that associates fruits with their counterparts"

IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear
Paper passage: "MinkPanoptic... mean shift clustering... transformer encoder layer"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
