Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning
Pith reviewed 2026-05-16 10:59 UTC · model grok-4.3
The pith
A lightweight network recognizes places in vineyards using only low-cost sparse LiDAR data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MinkUNeXt-VINE surpasses existing place recognition methods in vineyard environments through its pre-processing pipeline and Matryoshka Representation Learning multi-loss, which together enable strong results from low-cost sparse LiDAR inputs and reduced-dimensionality outputs suitable for real-time use.
What carries the argument
MinkUNeXt-VINE architecture paired with Matryoshka multi-loss training that supports variable embedding sizes while preserving accuracy on sparse point clouds.
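Matryoshka Representation Learning trains a single embedding so that its leading prefixes are also usable descriptors. It does this by applying the loss at several nested sizes and summing the results. The sketch below is minimal and rests on two assumptions: a triplet margin loss serves as the base term, and the nested sizes (32 to 256) are hypothetical. This page does not give the actual loss terms, sizes, or weights used in MinkUNeXt-VINE.

```python
import math
from typing import Sequence

# Hypothetical nested descriptor sizes; not taken from the paper.
NESTED_DIMS = (32, 64, 128, 256)


def l2_normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed base loss: pull the positive closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def matryoshka_loss(anchor, positive, negative, dims=NESTED_DIMS, weights=None):
    """Sum the base loss over nested prefixes of the same full-size embedding."""
    weights = weights or [1.0] * len(dims)
    total = 0.0
    for d, w in zip(dims, weights):
        # Each prefix is re-normalized so it can be used on its own at inference time.
        a = l2_normalize(anchor[:d])
        p = l2_normalize(positive[:d])
        n = l2_normalize(negative[:d])
        total += w * triplet_loss(a, p, n)
    return total
```

Because every prefix is supervised, the descriptor can later be cut to any of the trained sizes without retraining. This is the property the claim about variable embedding sizes relies on.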
If this is right
- Agricultural robots can perform reliable place recognition with sparse low-cost LiDAR scans instead of dense expensive ones.
- Lower output embedding dimensions reduce memory and computation needs without major accuracy loss.
- The same pipeline maintains performance across different LiDAR sensors in extended vineyard recordings.
- Real-time operation on embedded hardware becomes practical due to the lightweight design.
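The memory and compute argument in the second bullet scales linearly with descriptor size. The following back-of-the-envelope sketch assumes float32 storage, a flat (brute-force) database, and hypothetical map sizes. It does not reproduce any measurement from the paper.

```python
import math


def truncate_descriptor(desc, k):
    """Keep the first k dimensions and re-normalize (Matryoshka-style truncation)."""
    prefix = desc[:k]
    norm = math.sqrt(sum(x * x for x in prefix)) or 1.0
    return [x / norm for x in prefix]


def database_bytes(num_scans, dim, bytes_per_value=4):
    """Memory for a flat database of descriptors (float32 by default)."""
    return num_scans * dim * bytes_per_value


def distance_ops(num_scans, dim):
    """Per-dimension operations for one exhaustive nearest-neighbour query."""
    return num_scans * dim
```

Take a hypothetical 10,000-scan map. Descriptors of 256 dimensions occupy 10.24 MB, while 32-dimensional prefixes occupy 1.28 MB. Exhaustive search cost falls by the same factor of 8.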
Where Pith is reading between the lines
- The Matryoshka approach could be adapted for place recognition with other sparse sensors in feature-poor environments such as orchards or greenhouses.
- Combining the method with visual or inertial data might further lower hardware costs in precision farming.
- Seasonal growth changes in vines would provide a direct test of whether the learned representations remain stable over time.
Load-bearing premise
The pre-processing steps and Matryoshka multi-loss combination will continue to deliver performance gains on new vineyard data or different sensor configurations beyond the two specific long-term datasets used.
What would settle it
Evaluating the trained model on a third independent vineyard dataset recorded with a different LiDAR model and finding that accuracy falls below current state-of-the-art baselines.
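Place-recognition accuracy is usually reported as Recall@N. A query counts as correctly localized if one of its N nearest database descriptors was recorded within a distance threshold of the query's true position. Below is a minimal Recall@1 sketch. It assumes ground-truth positions are available and that the user chooses the threshold. This page does not state the threshold or the exact protocol used in the paper.

```python
import math


def recall_at_1(query_desc, query_pos, db_desc, db_pos, threshold_m):
    """Fraction of queries whose nearest database descriptor lies within threshold_m metres."""
    hits, evaluated = 0, 0
    for qd, qp in zip(query_desc, query_pos):
        # Skip queries that have no true match in the database at all.
        if not any(math.dist(qp, p) <= threshold_m for p in db_pos):
            continue
        evaluated += 1
        # Nearest neighbour in descriptor space (brute force).
        best = min(range(len(db_desc)), key=lambda i: math.dist(qd, db_desc[i]))
        if math.dist(qp, db_pos[best]) <= threshold_m:
            hits += 1
    return hits / evaluated if evaluated else 0.0
```

The falsification test above reduces to running this metric on the new dataset for MinkUNeXt-VINE and for each baseline, using the same threshold.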
Original abstract
Localization in agricultural environments is challenging due to their unstructured nature and lack of distinctive landmarks. Although agricultural settings have been studied in the context of object classification and segmentation, the place recognition task for mobile robots is not trivial in the current state of the art. In this study, we propose MinkUNeXt-VINE, a lightweight, deep-learning-based method that surpasses state-of-the-art methods in vineyard environments thanks to its pre-processing and Matryoshka Representation Learning multi-loss approach. Our method prioritizes enhanced performance with low-cost, sparse LiDAR inputs and lower-dimensionality outputs to ensure high efficiency in real-time scenarios. Additionally, we present a comprehensive ablation study of the results on various evaluation cases and two extensive long-term vineyard datasets employing different LiDAR sensors. The results demonstrate the efficiency of the trade-off output produced by this approach, as well as its robust performance on low-cost and low-resolution input data. The code is publicly available for reproduction.
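The abstract credits part of the gain to pre-processing of sparse LiDAR scans but does not spell out the steps. MinkUNeXt builds on 3D sparse convolutions, which consume quantized voxel coordinates. A plausible pipeline therefore crops each scan and quantizes it. The sketch below is minimal and illustrates only that assumption. The range limit and voxel size are hypothetical, not the paper's values.

```python
import math


def preprocess_scan(points, max_range=30.0, voxel_size=0.1):
    """Crop a scan by horizontal range and quantize it to unique voxel coordinates.

    Assumed steps for illustration only; the paper's actual pipeline may differ.
    """
    voxels = set()
    for x, y, z in points:
        if math.hypot(x, y) > max_range:
            continue  # drop distant returns, which are the sparsest and least reliable
        # Integer voxel indices, as expected by sparse-convolution backbones.
        voxels.add((math.floor(x / voxel_size),
                    math.floor(y / voxel_size),
                    math.floor(z / voxel_size)))
    return sorted(voxels)
```

Points that fall into the same voxel collapse into one coordinate. This bounds the input size regardless of how many returns the sensor produces.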
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MinkUNeXt-VINE, a lightweight deep-learning architecture for LiDAR place recognition in vineyards. It claims to surpass SOTA by combining specific pre-processing steps with a Matryoshka Representation Learning multi-loss formulation, achieving strong accuracy-efficiency trade-offs on low-cost, sparse, and low-resolution LiDAR inputs while supporting real-time operation. Evaluation includes ablation studies and results on two long-term vineyard datasets collected with different sensors; code is released for reproduction.
Significance. If the reported gains prove robust, the work would be significant for agricultural robotics, where low-cost LiDAR place recognition remains challenging due to unstructured environments. The emphasis on lower-dimensional outputs and efficiency, together with public code release, supports reproducibility and potential deployment on resource-constrained platforms.
major comments (1)
- [Abstract and §4 (Evaluation)] The claim that the method delivers 'robust performance on low-cost and low-resolution input data' and surpasses SOTA rests entirely on results from two specific long-term vineyard datasets. No experiments on additional vineyard sites, different row geometries, seasonal extremes, or unseen sensor configurations are reported, leaving open whether the pre-processing and Matryoshka multi-loss gains generalize or are partly dataset-specific.
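One way to quantify the referee's concern is a train-on-one, test-on-another matrix over datasets or sensors. Each cross-domain score is compared with the in-domain score on the same test set. The sketch below is minimal: dataset keys are hypothetical placeholders, and it does not report any result from the paper.

```python
def generalization_gaps(recall):
    """Map recall[(train, test)] to the cross-domain drop on each test set.

    Gap = in-domain score on `test` minus the score of a model trained elsewhere.
    Positive values mean the model generalizes worse than an in-domain model.
    """
    gaps = {}
    for (train, test), score in recall.items():
        if train != test and (test, test) in recall:
            gaps[(train, test)] = recall[(test, test)] - score
    return gaps
```

Small gaps across sites and sensors would support the generalization claim. Large gaps would point to dataset-specific gains.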
minor comments (1)
- [Abstract] The abstract refers to a 'comprehensive ablation study' without enumerating the ablated components (e.g., individual loss terms, pre-processing variants) or reporting quantitative deltas; adding a dedicated ablation table would clarify the contribution of each element.
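The requested ablation table amounts to reporting each variant's metric alongside its change from the full model. The sketch below is minimal. Variant names and the choice of metric are hypothetical placeholders, and no numbers from the paper are implied.

```python
def ablation_deltas(results, full="full model"):
    """Change in the metric for each variant relative to the full model (negative = worse)."""
    base = results[full]
    return {name: score - base for name, score in results.items() if name != full}


def ablation_rows(results, full="full model"):
    """Rows of (variant, score, delta), sorted from the largest drop to the smallest."""
    deltas = ablation_deltas(results, full)
    return sorted(((name, results[name], d) for name, d in deltas.items()),
                  key=lambda row: row[2])
```

Sorting by delta puts the most load-bearing component first. That ordering is what the referee's comment asks the abstract to convey.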
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the scope of our evaluation below and will make a partial revision to better contextualize the generalizability of the results.
Point-by-point responses
- Referee: [Abstract and §4 (Evaluation)] The claim that the method delivers 'robust performance on low-cost and low-resolution input data' and surpasses SOTA rests entirely on results from two specific long-term vineyard datasets. No experiments on additional vineyard sites, different row geometries, seasonal extremes, or unseen sensor configurations are reported, leaving open whether the pre-processing and Matryoshka multi-loss gains generalize or are partly dataset-specific.
Authors: We thank the referee for highlighting this point. Our evaluation is conducted on two long-term vineyard datasets collected with different LiDAR sensors (detailed in Section 4), which capture temporal variations, unstructured environments, and sparse low-resolution inputs representative of low-cost hardware. Comprehensive ablations isolate the contributions of the pre-processing steps and Matryoshka multi-loss formulation. We agree that experiments on additional sites, row geometries, and seasonal conditions would provide stronger evidence of generalization beyond these datasets. In the revised version, we will add a limitations paragraph in the conclusion discussing the current evaluation scope and outlining future work on broader vineyard configurations.
Revision: partial.
Circularity Check
No circularity: empirical architecture evaluated on held-out data
full rationale
The paper proposes MinkUNeXt-VINE as a lightweight network combining standard pre-processing with a Matryoshka multi-loss training recipe. All performance claims rest on ablation studies and comparisons against baselines using held-out sequences from two long-term vineyard datasets. No equations, uniqueness theorems, or first-principles derivations are presented that reduce to fitted parameters or self-citations by construction. The method is an empirical architecture plus training procedure whose validity is tested externally on the reported splits rather than being definitionally forced.
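Held-out evaluation on long-term recordings only avoids leakage if whole recording sessions are kept out of training. The sketch below splits sessions chronologically at a cutoff date. It is a minimal illustration assuming the split is made by recording date. Session names and dates are hypothetical, and this page does not describe how the paper actually splits its sequences.

```python
from datetime import date


def chronological_split(sessions, cutoff):
    """Split {session_name: recording_date} into train (before cutoff) and test (on or after).

    Whole sessions go to one side only, so no test scan shares a session with training data.
    """
    train = [name for name, day in sessions.items() if day < cutoff]
    test = [name for name, day in sessions.items() if day >= cutoff]
    return sorted(train), sorted(test)
```

A chronological cutoff also makes the test set later in the season than the training set. That partially probes the temporal robustness discussed above.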