arXiv: 2601.18714 · v2 · submitted 2026-01-26 · cs.CV · cs.AI · cs.LG · cs.RO

Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning

Judith Vilella-Cantos, Mauro Martini, Marcello Chiaberge, Mónica Ballesta, David Valiente

Pith reviewed 2026-05-16 10:59 UTC · model grok-4.3

keywords: place recognition, LiDAR, vineyards, Matryoshka Representation Learning, deep learning, agricultural robotics, sparse point clouds, robot localization

The pith

A lightweight network recognizes places in vineyards using only low-cost sparse LiDAR data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MinkUNeXt-VINE, a deep-learning method designed for robot localization in unstructured vineyard settings that lack clear landmarks. It combines targeted LiDAR pre-processing with a Matryoshka Representation Learning multi-loss to produce effective embeddings even when input scans are sparse and output dimensions are kept low. This design targets real-time performance on affordable hardware. Evaluation on two long-term vineyard datasets collected with different sensors shows the method outperforming prior approaches while remaining efficient. If the results hold, agricultural robots could achieve reliable navigation without needing dense or expensive sensors.

Core claim

MinkUNeXt-VINE surpasses existing place recognition methods in vineyard environments through its pre-processing pipeline and Matryoshka Representation Learning multi-loss, which together enable strong results from low-cost sparse LiDAR inputs and reduced-dimensionality outputs suitable for real-time use.

What carries the argument

MinkUNeXt-VINE architecture paired with Matryoshka multi-loss training that supports variable embedding sizes while preserving accuracy on sparse point clouds.
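To make the nested-embedding idea concrete, here is a minimal sketch of a Matryoshka-style multi-loss, assuming a plain triplet margin loss summed over several prefix lengths of the same descriptor. The function names, the prefix sizes `(2, 4, 8)`, the margin, and the equal weights are illustrative assumptions; this summary does not state the paper's actual loss terms or embedding dimensions.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that wants the positive closer to the anchor than the negative."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

def matryoshka_triplet_loss(anchor, positive, negative,
                            dims=(2, 4, 8), weights=None, margin=0.2):
    """Sum the triplet loss over nested prefixes of the same embeddings.

    dims and weights are hypothetical; each prefix is re-normalized so that
    every truncated descriptor is trained as a usable embedding on its own.
    """
    weights = weights or [1.0] * len(dims)
    total = 0.0
    for d, w in zip(dims, weights):
        a, p, n = (l2_normalize(v[:d]) for v in (anchor, positive, negative))
        total += w * triplet_loss(a, p, n, margin)
    return total
```

Because every prefix is pushed to separate positives from negatives by itself, a deployed system could in principle keep only the first few components of each descriptor and still retrieve places.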

If this is right

Agricultural robots can perform reliable place recognition with sparse low-cost LiDAR scans instead of dense expensive ones.
Lower output embedding dimensions reduce memory and computation needs without major accuracy loss.
The same pipeline maintains performance across different LiDAR sensors in extended vineyard recordings.
Real-time operation on embedded hardware becomes practical due to the lightweight design.
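The memory argument can be illustrated with a short sketch, assuming descriptors are stored as 32-bit floats and queried by nearest-neighbour search over a truncated prefix. The helper names and the sizes used in the usage note (10,000 places, 256 versus 64 dimensions) are hypothetical and are not taken from the paper.

```python
def truncate(embedding, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    prefix = list(embedding[:dim])
    norm = sum(x * x for x in prefix) ** 0.5
    return [x / norm for x in prefix] if norm > 0 else prefix

def nearest(query, database, dim):
    """Index of the database descriptor closest to the query at width `dim`."""
    q = truncate(query, dim)
    best_idx, best_dist = -1, float("inf")
    for i, emb in enumerate(database):
        t = truncate(emb, dim)
        dist = sum((a - b) ** 2 for a, b in zip(q, t))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx

def database_bytes(num_places, dim, bytes_per_value=4):
    """Storage for a map of descriptors, assuming float32 values."""
    return num_places * dim * bytes_per_value
```

Under these assumptions, `database_bytes(10000, 256)` is 10,240,000 bytes while `database_bytes(10000, 64)` is 2,560,000 bytes, a fourfold saving; the per-query distance computation shrinks by the same factor.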

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The Matryoshka approach could be adapted for place recognition with other sparse sensors in feature-poor environments such as orchards or greenhouses.
Combining the method with visual or inertial data might further lower hardware costs in precision farming.
Seasonal growth changes in vines would provide a direct test of whether the learned representations remain stable over time.

Load-bearing premise

The pre-processing steps and Matryoshka multi-loss combination will continue to deliver performance gains on new vineyard data or different sensor configurations beyond the two specific long-term datasets used.

What would settle it

Evaluating the trained model on a third independent vineyard dataset recorded with a different LiDAR model and finding that accuracy falls below current state-of-the-art baselines.
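Such a test would most likely be scored with Recall@N, the metric shown in the paper's figures. The sketch below assumes a query counts as correctly localized when at least one of its top-N retrieved map scans lies within a fixed ground-truth distance; the function name and the 5 m radius are assumptions, since this summary does not state the paper's positive-match threshold.

```python
import math

def recall_at_n(query_embs, query_xy, db_embs, db_xy, n=1, radius_m=5.0):
    """Fraction of queries with a true match among their top-n retrievals.

    Retrieval ranks database descriptors by Euclidean distance to the query;
    a retrieval is a true match if its ground-truth position is within
    radius_m (hypothetical threshold) of the query position.
    """
    if not query_embs:
        return 0.0
    hits = 0
    for q_emb, q_pos in zip(query_embs, query_xy):
        ranked = sorted(range(len(db_embs)),
                        key=lambda i: math.dist(q_emb, db_embs[i]))
        if any(math.dist(q_pos, db_xy[i]) <= radius_m for i in ranked[:n]):
            hits += 1
    return hits / len(query_embs)
```

Running the same function over a baseline's descriptors on the third dataset would give the comparison the falsification test calls for.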

Figures

Figures reproduced from arXiv: 2601.18714 by David Valiente, Judith Vilella-Cantos, Marcello Chiaberge, Mauro Martini, Mónica Ballesta.

**Figure 2.** Extension of each of the vineyards considered within the BLT datasets [37]. a)

**Figure 3.** Comparison between the two LiDAR sensors used in the TEMPO-VINE dataset

**Figure 4.** Extension of the field where the TEMPO-VINE dataset [38] was recorded.

**Figure 5.** Recall@N graph for the different design decisions tested.

**Figure 6.** Example of seasonal changes at the same location in a vineyard environment

**Figure 7.** Cross-season generalization performance matrix showing Recall@1 values.

**Figure 8.** Inference time for a single point cloud with each of the pipelines analyzed in

Original abstract

Localization in agricultural environments is challenging due to their unstructured nature and lack of distinctive landmarks. Although agricultural settings have been studied in the context of object classification and segmentation, the place recognition task for mobile robots is not trivial in the current state of the art. In this study, we propose MinkUNeXt-VINE, a lightweight, deep-learning-based method that surpasses state-of-the-art methods in vineyard environments thanks to its pre-processing and Matryoshka Representation Learning multi-loss approach. Our method prioritizes enhanced performance with low-cost, sparse LiDAR inputs and lower-dimensionality outputs to ensure high efficiency in real-time scenarios. Additionally, we present a comprehensive ablation study of the results on various evaluation cases and two extensive long-term vineyard datasets employing different LiDAR sensors. The results demonstrate the efficiency of the trade-off output produced by this approach, as well as its robust performance on low-cost and low-resolution input data. The code is publicly available for reproduction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes MinkUNeXt-VINE, a lightweight deep-learning architecture for LiDAR place recognition in vineyards. It claims to surpass SOTA by combining specific pre-processing steps with a Matryoshka Representation Learning multi-loss formulation, achieving strong accuracy-efficiency trade-offs on low-cost, sparse, and low-resolution LiDAR inputs while supporting real-time operation. Evaluation includes ablation studies and results on two long-term vineyard datasets collected with different sensors; code is released for reproduction.

Significance. If the reported gains prove robust, the work would be significant for agricultural robotics, where low-cost LiDAR place recognition remains challenging due to unstructured environments. The emphasis on lower-dimensional outputs and efficiency, together with public code release, supports reproducibility and potential deployment on resource-constrained platforms.

major comments (1)

[Abstract and §4 (Evaluation)] The claim that the method delivers 'robust performance on low-cost and low-resolution input data' and surpasses SOTA rests entirely on results from two specific long-term vineyard datasets. No experiments on additional vineyard sites, different row geometries, seasonal extremes, or unseen sensor configurations are reported, leaving open whether the pre-processing and Matryoshka multi-loss gains generalize or are partly dataset-specific.

minor comments (1)

[Abstract] The abstract refers to a 'comprehensive ablation study' without enumerating the ablated components (e.g., individual loss terms, pre-processing variants) or reporting quantitative deltas; adding a dedicated ablation table would clarify the contribution of each element.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the scope of our evaluation below and will make a partial revision to better contextualize the generalizability of the results.

Point-by-point responses

Referee: [Abstract and §4 (Evaluation)] The claim that the method delivers 'robust performance on low-cost and low-resolution input data' and surpasses SOTA rests entirely on results from two specific long-term vineyard datasets. No experiments on additional vineyard sites, different row geometries, seasonal extremes, or unseen sensor configurations are reported, leaving open whether the pre-processing and Matryoshka multi-loss gains generalize or are partly dataset-specific.

Authors: We thank the referee for highlighting this point. Our evaluation is conducted on two long-term vineyard datasets collected with different LiDAR sensors (detailed in Section 4), which capture temporal variations, unstructured environments, and sparse low-resolution inputs representative of low-cost hardware. Comprehensive ablations isolate the contributions of the pre-processing steps and Matryoshka multi-loss formulation. We agree that experiments on additional sites, row geometries, and seasonal conditions would provide stronger evidence of generalization beyond these datasets. In the revised version, we will add a limitations paragraph in the conclusion discussing the current evaluation scope and outlining future work on broader vineyard configurations.

Revision: partial

Circularity Check

0 steps flagged

No circularity: empirical architecture evaluated on held-out data

Full rationale

The paper proposes MinkUNeXt-VINE as a lightweight network combining standard pre-processing with a Matryoshka multi-loss training recipe. All performance claims rest on ablation studies and comparisons against baselines using held-out sequences from two long-term vineyard datasets. No equations, uniqueness theorems, or first-principles derivations are presented that reduce to fitted parameters or self-citations by construction. The method is an empirical architecture plus training procedure whose validity is tested externally on the reported splits rather than being definitionally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the approach rests on standard deep-learning assumptions and the empirical performance on the provided datasets.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

[1] I. A. Kazerouni, L. Fitzgerald, G. Dooly, D. Toal, A survey of state-of-the-art on visual SLAM, Expert Systems with Applications 205 (2022) 117734

[2] A. Macario Barros, M. Michel, Y. Moline, G. Corre, F. Carrel, A comprehensive survey of visual SLAM algorithms, Robotics 11 (1) (2022) 24

[3] Y. Zhang, P. Shi, J. Li, 3D LiDAR SLAM: A survey, The Photogrammetric Record 39 (186) (2024) 457–517

[4] M. A. Uy, G. H. Lee, PointNetVLAD: Deep point cloud based retrieval for large-scale place recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 4470–4479

[5] J. Komorowski, MinkLoc3D: Point cloud based large-scale place recognition, in: Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 1790–1799

[6] X. Chen, T. Läbe, A. Milioto, T. Röhling, J. Behley, C. Stachniss, OverlapNet: A siamese network for computing LiDAR scan similarity with applications to loop closing and localization, Autonomous Robots 46 (1) (2022) 61–81

[7] C. Lehnert, D. Tsai, A. Eriksson, C. McCool, 3D move to see: Multi-perspective visual servoing towards the next best view within unstructured and occluded environments, in: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), IEEE, 2019, pp. 3890–3897

[8] H. Ding, B. Zhang, J. Zhou, Y. Yan, G. Tian, B. Gu, Recent developments and applications of simultaneous localization and mapping in agriculture, Journal of field robotics 39 (6) (2022) 956–983

[9] S. Coulibaly, B. Kamsu-Foguem, D. Kamissoko, D. Traore, Deep learning for precision agriculture: A bibliometric analysis, Intelligent Systems with Applications 16 (2022) 200102

[10] H. Teng, Y. Wang, X. Song, K. Karydis, Multimodal dataset for localization, mapping and crop monitoring in citrus tree farms, in: International Symposium on Visual Computing, Springer, 2023, pp. 571–582

[11] N. Chebrolu, F. Magistri, T. Läbe, C. Stachniss, Registration of spatio-temporal point clouds of plants for phenotyping, PloS one 16 (2) (2021) e0247243

[12] M. Padhiary, L. N. Sethi, A. Kumar, Enhancing hill farming efficiency using unmanned agricultural vehicles: a comprehensive review, Transactions of the Indian National Academy of Engineering 9 (2) (2024) 253–268

[13] J. J. Cabrera, A. Santo, A. Gil, C. Viegas, L. Payá, MinkUNeXt: Point cloud-based large-scale place recognition using 3D sparse convolutions, Array 28 (2025) 100569. doi:https://doi.org/10.1016/j.array.2025.100569

[14] A. Kusupati, G. Bhatt, A. Rege, M. Wallingford, A. Sinha, V. Ramanujan, W. Howard-Snyder, K. Chen, S. Kakade, P. Jain, et al., Matryoshka representation learning, Advances in Neural Information Processing Systems 35 (2022) 30233–30249

[15] E. Liu, J. Monica, K. Gold, L. Cadle-Davidson, D. Combs, Y. Jiang, Vision-based vineyard navigation solution with automatic annotation, in: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2023, pp. 4234–4241

[16] D. Aghi, V. Mazzia, M. Chiaberge, Local motion planner for autonomous navigation in vineyards with a RGB-D camera-based algorithm and deep learning synergy, Machines 8 (2) (2020) 27

[17] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861 (2017)

[18] N. Sun, Z. Fan, Q. Qiu, T. Li, Q. Feng, C. Ji, C. Zhao, TriLoc-NetVLAD: Enhancing Long-term Place Recognition in Orchards with a Novel LiDAR-Based Approach, in: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2024, pp. 16–22

[19] R. Arandjelovic, P. Gronat, A. Torii, T. Pajdla, J. Sivic, NetVLAD: CNN architecture for weakly supervised place recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 5297–5307

[20] T. Barros, C. Premebida, S. Aravecchia, C. Pradalier, U. Nunes, SPVSoAP3D: A Second-order Average Pooling Approach to enhance 3D Place Recognition in Horticultural Environments, in: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2024, pp. 9–15

[21] T. Barros, L. Garrote, P. Conde, M. Coombes, C. Liu, C. Premebida, U. Nunes, PointNetPGAP-SLC: A 3D LiDAR-Based Place Recognition Approach With Segment-Level Consistency Training for Mobile Robots in Horticulture, IEEE Robotics and Automation Letters (2024)

[22] C. R. Qi, H. Su, K. Mo, L. J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660

[23] Z. Liu, S. Zhou, C. Suo, P. Yin, W. Chen, H. Wang, H. Li, Y.-H. Liu, LPD-Net: 3D point cloud learning for large-scale place recognition and environment analysis, in: Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 2831–2840

[24] C. Choy, J. Gwak, S. Savarese, 4D spatio-temporal convnets: Minkowski convolutional neural networks, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 3075–3084

[25] G. Kim, A. Kim, Scan context: Egocentric spatial descriptor for place recognition within 3D point cloud map, in: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2018, pp. 4802–4809

[26] H. Deng, Z. Pei, Z. Tang, J. Zhang, J. Yang, Fusion scan context: a global descriptor fusing altitude, intensity and density for place recognition, in: 2023 IEEE International Conference on Mechatronics and Automation (ICMA), IEEE, 2023, pp. 1604–1610

[27] L. Li, X. Kong, X. Zhao, T. Huang, W. Li, F. Wen, H. Zhang, Y. Liu, SSC: Semantic scan context for large-scale place recognition, in: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2021, pp. 2092–2099

[28] C. Yuan, J. Lin, Z. Zou, X. Hong, F. Zhang, STD: Stable Triangle Descriptor for 3D place recognition, in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023, pp. 1897–1903

[29] J.-U. Im, S.-W. Ki, J.-H. Won, Omni Point: 3D LiDAR-based feature extraction method for place recognition and point registration, IEEE Transactions on Intelligent Vehicles 9 (8) (2024) 5255–5271

[30] M. Máximo, A. Santo, A. Gil, M. Ballesta, D. Valiente, A Coarse to Fine 3D LiDAR Localization with Deep Local Features for Long Term Robot Navigation in Large Environments, arXiv preprint arXiv:2505.18340 (2025). doi:https://doi.org/10.48550/arXiv.2505.18340

[31] T. Liu, W. Jiao, J. Bao, A novel multi-loss dynamic fusion-enhanced image segmentation model for welding spatter measurement, Journal of Manufacturing Processes 128 (2024) 125–132

[32] S. Wazir, M. M. Fraz, HistoSeg: Quick attention with multi-loss function for multi-structure segmentation in digital histology images, in: 2022 12th International Conference on Pattern Recognition Systems (ICPRS), IEEE, 2022, pp. 1–7

[33] H. N. Thi, Q. N. Huu, Q. D. T. Thuy, N. T. T. Le, T. N. Van, H. N. Huu, A Multi-Loss Hybrid CNN-ViT Model for Efficient Image Retrieval, in: 2025 International Conference on Multimedia Analysis and Pattern Recognition (MAPR), IEEE, 2025, pp. 1–6

[34] T. Barros, P. Conde, G. Gonçalves, C. Premebida, M. Monteiro, C. S. S. Ferreira, U. J. Nunes, Multispectral vineyard segmentation: A deep learning comparison study, Computers and electronics in agriculture 195 (2022) 106782

[35] D. K. Barbole, P. M. Jadhav, GrapesNet: Indian RGB & RGB-D vineyard image datasets for deep learning applications, Data in Brief 48 (2023) 109100

[36] F. Abdelghafour, B. Keresztes, A. Deshayes, C. Germain, J.-P. Da Costa, An annotated image dataset of downy mildew symptoms on Merlot grape variety, Data in Brief 37 (2021) 107250

[37] R. Polvara, S. Molina, I. Hroob, A. Papadimitriou, K. Tsiolis, D. Giakoumis, S. Likothanassis, D. Tzovaras, G. Cielniak, M. Hanheide, Bacchus Long-Term (BLT) data set: Acquisition of the agricultural multimodal BLT data set with automated robot deployment, Journal of Field Robotics 41 (7) (2024) 2280–2298

[38] M. Martini, M. Ambrosio, J. Vilella-Cantos, A. Navone, M. Chiaberge, TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards, arXiv preprint arXiv:2512.04772 (2025). doi:https://doi.org/10.48550/arXiv.2512.04772

[39] J. Komorowski, Improving point cloud based place recognition with ranking-based loss and large batch training, in: 2022 26th international conference on pattern recognition (ICPR), IEEE, 2022, pp. 3699–3705

[40] R. B. Rusu, N. Blodow, M. Beetz, Fast point feature histograms (FPFH) for 3D registration, in: 2009 IEEE international conference on robotics and automation, IEEE, 2009, pp. 3212–3217
