
arxiv: 2501.07360 · v1 · submitted 2025-01-13 · 💻 cs.CV · cs.LG

TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations

Pith reviewed 2026-05-23 05:46 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords: TimberVision, log segmentation, trunk tracking, forestry automation, RGB detection, instance segmentation, oriented object detection, multi-object tracking

The pith

The TimberVision dataset and fusion framework enable accurate trunk representations from RGB images alone, even in challenging forest conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the TimberVision dataset of more than 2,000 annotated RGB images containing 51,000 trunk components to fill the gap in specialized data for automating forestry tasks like harvesting and measuring logs. It runs ablation studies on oriented object detection and instance segmentation, then fuses the component outputs into unified trunk representations. The work adds automatic derivation of geometric properties and multi-object tracking to boost robustness. A sympathetic reader would care because accurate vision-based trunk detection could reduce human exposure to safety risks in remote operations by supporting automation from ordinary cameras.

Core claim

Our detection and tracking approach provides highly descriptive and accurate trunk representations solely from RGB image data, even under challenging environmental conditions. Our solution is suitable for a wide range of application scenarios and can be readily combined with other sensor modalities.

What carries the argument

The generic framework that fuses component detections from oriented object detection and instance segmentation models into unified trunk representations, then applies multi-object tracking.
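
As a rough illustration of the fusion idea, the sketch below attaches each detected cut surface to the nearest end of a lateral (side) surface. It is not the paper's algorithm: the `Component` and `Trunk` types, the `fuse_components` function, the nearest-endpoint rule, and the `max_dist` pixel threshold are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    label: str      # "side" or "cut" (hypothetical label names)
    corners: list   # four (x, y) corners of an oriented box, in order


@dataclass
class Trunk:
    side: Component
    cuts: list = field(default_factory=list)


def center(points):
    """Mean of a sequence of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def end_midpoints(corners):
    """Midpoints of the two shortest edges of an oriented box."""
    edges = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    edges.sort(key=lambda e: math.dist(e[0], e[1]))
    return [center(e) for e in edges[:2]]


def fuse_components(components, max_dist=50.0):
    """Group components into trunks (assumed rule: nearest side end wins)."""
    trunks = [Trunk(c) for c in components if c.label == "side"]
    for cut in (c for c in components if c.label == "cut"):
        cut_center = center(cut.corners)
        best, best_d = None, max_dist
        for trunk in trunks:
            ends = end_midpoints(trunk.side.corners)
            d = min(math.dist(cut_center, m) for m in ends)
            if d <= best_d:
                best, best_d = trunk, d
        if best is not None:   # cuts farther than max_dist stay unassigned
            best.cuts.append(cut)
    return trunks
```

A cut surface with no side surface within `max_dist` is simply left out here; a real system would likely keep it as a trunk of its own.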

If this is right

  • Oriented object detection and instance segmentation models trained on the dataset locate cut and lateral trunk surfaces from RGB input.
  • Fusing the component outputs produces unified trunk representations that support derivation of geometric properties.
  • Multi-object tracking applied to the representations increases robustness across image sequences.
  • The resulting system operates across diverse forestry scenarios without requiring non-RGB sensors.
  • The representations remain descriptive and accurate under challenging environmental conditions.
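
To make "derivation of geometric properties" concrete, here is a minimal sketch that reads length, width and orientation off an oriented bounding box. Which properties the paper actually derives is not stated on this page; the function name `box_geometry`, the corner ordering, and the pixel units are assumptions for illustration only.

```python
import math


def box_geometry(corners):
    """Length, width and angle of an oriented box from four ordered (x, y) corners."""
    (x0, y0), (x1, y1), (x2, y2), _ = corners
    edge_a = math.dist((x0, y0), (x1, y1))
    edge_b = math.dist((x1, y1), (x2, y2))
    if edge_a >= edge_b:
        length, width = edge_a, edge_b
        dx, dy = x1 - x0, y1 - y0
    else:
        length, width = edge_b, edge_a
        dx, dy = x2 - x1, y2 - y1
    # Fold the long-axis direction into [0, 180): a trunk has no preferred end.
    angle = math.degrees(math.atan2(dy, dx)) % 180.0
    return {"length_px": length, "width_px": width, "angle_deg": angle}
```

Converting these pixel values to metric units would need a camera scale or depth estimate, which this sketch does not attempt.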

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such RGB-derived trunk representations could feed directly into robotic control loops for automated log handling without extra hardware.
  • The dataset could act as a starting benchmark for testing vision methods in other unstructured outdoor settings.
  • Extending the geometric derivations to estimate volume or weight in real time might support on-site log sorting decisions.

Load-bearing premise

The annotated dataset accurately represents the variety of real-world forestry scenes and conditions encountered in autonomous operations.

What would settle it

A controlled test measuring detection and tracking accuracy on new RGB images captured in forest conditions with tree species, weather, or occlusion levels absent from the dataset would show whether performance remains high.
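
One way such a test could be scored is sketched below: per-image scores are grouped by a condition tag, and the mean on conditions seen in training is compared with the mean on unseen ones. The function names, the `(condition, score)` pair format, and the idea of a single per-image score are assumptions for this example, not part of the paper.

```python
from collections import defaultdict
from statistics import mean


def score_by_condition(results):
    """Average a per-image score for each condition tag.

    `results` is a list of (condition, score) pairs.
    """
    grouped = defaultdict(list)
    for condition, score in results:
        grouped[condition].append(score)
    return {condition: mean(scores) for condition, scores in grouped.items()}


def generalization_gap(results, seen_conditions):
    """Mean score on seen conditions minus mean score on unseen ones."""
    seen = [s for c, s in results if c in seen_conditions]
    unseen = [s for c, s in results if c not in seen_conditions]
    if not seen or not unseen:
        raise ValueError("need at least one seen and one unseen sample")
    return mean(seen) - mean(unseen)
```

A gap close to zero would support the robustness claim; a large positive gap would suggest the dataset does not cover the new conditions.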

Figures

Figures reproduced from arXiv: 2501.07360 by Andreas Trondl, Daniel Steininger, Julia Simon, Markus Murschitz.

Figure 1: Representative examples of semi-automatically gener…
Figure 2: Representative image samples for Low, Mid and High intensities of annotated scene parameters. The color bars show their distributions across the dataset. … from the each other or the background, visible Section Areas as well as Section Lines denoting the visible borders of cross-sections facing the other way. Additional points unambiguously denote areas covered by each trunk between its constituent lines. …
Figure 3: Overview of our annotation pipeline automatically de…
Figure 4: Overview of our task-fusion algorithm deriving unified trunk representations from OOD and ISEG outputs for individual com…
Figure 5: Accuracy as mean class mAP50-95 and average inference time on test set for multiple model capacities and image sizes
Figure 6: Ablation results as mAP50-95 on test set for three class combinations and two input sizes. For ISEG, Bound is excluded due to its inherent overlap with Side. The dark-to-light bar colors denote model capacities Nano, Medium and X-Large, respectively. …sistent trends for both tasks. While mAP scores for the Side class are slightly lower than those of opaque Trunk instances, the values for Cut are significan…
Figure 7: Model performance for different scene-parameter intensities as…
Figure 8: Representative fusion results on test images. The left…
Figure 9: Hourly and monthly recording-time distributions of an…
Figure 10: Comparison of annotation schemes in TimberSeg […
Figure 12: Distribution of scene parameters for annotated…
Figure 13: Distribution of instance sizes for each class based on oriented-bounding-box dimensions. Instance width refers to the box side…
Figure 15: Heat maps illustrating the distribution of instance…
Figure 16: Validation accuracy after each training epoch when…
Figure 19: Additional qualitative results on the TimberSeg dataset…
Figure 18: Additional qualitative results on the test splits of the…
Figure 21: Additional qualitative results showing limitations on…
Abstract

Timber represents an increasingly valuable and versatile resource. However, forestry operations such as harvesting, handling and measuring logs still require substantial human labor in remote environments posing significant safety risks. Progressively automating these tasks has the potential of increasing their efficiency as well as safety, but requires an accurate detection of individual logs as well as live trees and their context. Although initial approaches have been proposed for this challenging application domain, specialized data and algorithms are still too scarce to develop robust solutions. To mitigate this gap, we introduce the TimberVision dataset, consisting of more than 2k annotated RGB images containing a total of 51k trunk components including cut and lateral surfaces, thereby surpassing any existing dataset in this domain in terms of both quantity and detail by a large margin. Based on this data, we conduct a series of ablation experiments for oriented object detection and instance segmentation and evaluate the influence of multiple scene parameters on model performance. We introduce a generic framework to fuse the components detected by our models for both tasks into unified trunk representations. Furthermore, we automatically derive geometric properties and apply multi-object tracking to further enhance robustness. Our detection and tracking approach provides highly descriptive and accurate trunk representations solely from RGB image data, even under challenging environmental conditions. Our solution is suitable for a wide range of application scenarios and can be readily combined with other sensor modalities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the TimberVision dataset (>2k RGB images, 51k annotated trunk components including cut and lateral surfaces), conducts ablation experiments on oriented object detection and instance segmentation to assess scene-parameter influence, presents a framework fusing component detections into unified trunk representations, derives geometric properties, and applies multi-object tracking, claiming highly descriptive and accurate trunk representations from RGB data alone even under challenging environmental conditions.

Significance. If the central claims hold, the work would be significant for supplying the largest and most detailed public dataset in autonomous forestry vision, along with a generic multi-task fusion and tracking pipeline that integrates detection outputs into usable trunk models; the scale (surpassing prior datasets by a large margin) and the explicit scene-parameter ablations constitute concrete strengths that could accelerate reproducible progress in the domain.

major comments (2)
  1. [Abstract] Abstract: the claim that results hold 'even under challenging environmental conditions' is load-bearing for the central contribution, yet the manuscript provides no quantitative evidence that the 2k-image collection spans the target distribution (e.g., no histograms or statistical comparison of lighting, occlusion, weather, or terrain parameters against operational forestry footage, and no held-out extreme-condition subset).
  2. [Dataset / Experiments] Dataset and experimental sections: without an explicit representativeness analysis or cross-validation against real autonomous-operation footage, the reported accuracy and tracking robustness cannot be taken to support generalization to the safety-critical cases the introduction identifies as motivation.
minor comments (1)
  1. [Abstract] Abstract: the description of ablation experiments does not name the concrete metrics (mAP, mask IoU, tracking MOTA, etc.) or the magnitude of the observed scene-parameter effects.
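
For reference, two of the metrics this comment names have compact definitions, sketched below. Mask IoU is intersection over union of two pixel sets, and MOTA follows the CLEAR MOT definition, 1 - (FN + FP + IDSW) / GT. Whether the paper reports these exact quantities is not established here; the function names and the set-of-pixels mask format are assumptions for illustration.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as sets of (row, col) pixel coordinates."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

Note that MOTA can be negative when the total error count exceeds the number of ground-truth objects.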

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the need to substantiate claims about dataset representativeness and generalization. We address the two major comments point by point below.

  1. Referee: [Abstract] Abstract: the claim that results hold 'even under challenging environmental conditions' is load-bearing for the central contribution, yet the manuscript provides no quantitative evidence that the 2k-image collection spans the target distribution (e.g., no histograms or statistical comparison of lighting, occlusion, weather, or terrain parameters against operational forestry footage, and no held-out extreme-condition subset).

    Authors: We agree that the manuscript lacks explicit quantitative evidence, such as histograms or statistical comparisons of scene parameters against a reference distribution of operational forestry footage. The dataset was collected across multiple sites and sessions to capture variability in lighting, occlusion, weather, and terrain, and the ablation studies already quantify performance sensitivity to several of these parameters. However, we did not include formal distributional analysis or a held-out extreme-condition subset. We will add this analysis to the dataset section of the revised manuscript (including parameter histograms and summary statistics) and will adjust the abstract wording to align with the provided evidence. revision: yes

  2. Referee: [Dataset / Experiments] Dataset and experimental sections: without an explicit representativeness analysis or cross-validation against real autonomous-operation footage, the reported accuracy and tracking robustness cannot be taken to support generalization to the safety-critical cases the introduction identifies as motivation.

    Authors: We acknowledge the validity of this observation. The reported results are obtained on the TimberVision dataset collected in real forestry settings, and the scene-parameter ablations provide indirect evidence of robustness across conditions. Nevertheless, the manuscript does not contain an explicit representativeness analysis or cross-validation against independent operational footage. We will incorporate a representativeness analysis (parameter distributions and comparison to typical forestry operation characteristics) in the revised dataset section. A full external cross-validation is not feasible with currently available public data and will be noted as a limitation for future work. revision: partial
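
The representativeness analysis promised in both responses could start from something as simple as the sketch below, which turns Low/Mid/High scene-parameter labels into relative frequencies and compares two such distributions by total variation distance. The level names follow the Figure 2 caption; the function names, the choice of total variation distance, and the existence of a reference sample of operational footage are assumptions for this example.

```python
from collections import Counter

LEVELS = ("Low", "Mid", "High")


def level_distribution(labels, levels=LEVELS):
    """Relative frequency of each intensity level in a list of labels."""
    counts = Counter(labels)
    total = sum(counts[level] for level in levels)
    if total == 0:
        raise ValueError("no labels at the given levels")
    return {level: counts[level] / total for level in levels}


def total_variation(p, q):
    """Total variation distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)
```

A distance of 0 means the two label distributions match exactly, and 1 means they share no mass.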

Circularity Check

0 steps flagged

No circularity: empirical results from new dataset and standard methods


The paper introduces a new annotated dataset (TimberVision) and applies standard oriented object detection, instance segmentation, fusion into trunk representations, geometric property derivation, and multi-object tracking. All performance claims are derived from training and evaluation on held-out or ablated splits of this data rather than from any self-referential fitting, self-citation chain, or ansatz that reduces the output to the input by construction. No equations or uniqueness theorems are invoked that collapse the claimed trunk representations back to the annotations themselves. The central results remain independent empirical measurements on the contributed data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the new dataset being representative of real forestry conditions and standard CV methods (detection, segmentation, tracking) performing well on it without additional sensors.

axioms (1)
  • domain assumption: RGB images contain sufficient information for accurate trunk component detection and tracking in forestry environments
    The paper relies on this to claim suitability from RGB data alone under challenging conditions.




Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 2 internal anchors

  1. [1]

    Bot-sort: Robust associations multi-pedestrian tracking

    Nir Aharon, Roy Orfaig, and Ben-Zion Bobrovsky. Bot-sort: Robust associations multi-pedestrian tracking. arXiv preprint arXiv:2206.14651, 2022. 5

  2. [2]

    Evaluating multiple object tracking performance: the clear mot metrics

    Keni Bernardin and Rainer Stiefelhagen. Evaluating multiple object tracking performance: the clear mot metrics. EURASIP Journal on Image and Video Processing, 2008:1–10, 2008. 5

  3. [3]

    MMDetection: Open MMLab Detection Toolbox and Benchmark

    Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. Mmdetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155, 2019. 2

  4. [4]

    Sloam: Semantic lidar odometry and mapping for forest inventory

    Steven W Chen, Guilherme V Nardari, Elijah S Lee, Chao Qu, Xu Liu, Roseli Ap Francelin Romero, and Vijay Kumar. Sloam: Semantic lidar odometry and mapping for forest inventory. IEEE Robotics and Automation Letters, 5(2):612–619, 2020. 2

  5. [5]

    An automatic approach for tree species detection and profile estimation of urban street trees using deep learning and google street view images

    Kwanghun Choi, Wontaek Lim, Byungwoo Chang, Jinah Jeong, Inyoo Kim, Chan-Ryul Park, and Dongwook W Ko. An automatic approach for tree species detection and profile estimation of urban street trees using deep learning and google street view images. ISPRS Journal of Photogrammetry and Remote Sensing, 190:165–180, 2022. 2

  6. [6]

    Visible and thermal image-based trunk detection with deep learning for forestry mobile robotics

    Daniel Queirós da Silva, Filipe Neves Dos Santos, Armando Jorge Sousa, and Vítor Filipe. Visible and thermal image-based trunk detection with deep learning for forestry mobile robotics. Journal of Imaging, 7(9):176, 2021. 2

  7. [7]

    The pascal visual object classes (voc) challenge

    Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision (IJCV), 88(2):303–338, 2010. 5

  8. [8]

    Urban tree generator: spatio-temporal and generative deep learning for urban tree localization and modeling

    Adnan Firoze, Bedrich Benes, and Daniel Aliaga. Urban tree generator: spatio-temporal and generative deep learning for urban tree localization and modeling. The Visual Computer, 38(9):3327–3339, 2022. 2

  9. [9]

    Tree instance segmentation with temporal contour graph

    Adnan Firoze, Cameron Wingren, Raymond A Yeh, Bedrich Benes, and Daniel Aliaga. Tree instance segmentation with temporal contour graph. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2193–2202, 2023. 2

  10. [10]

    Instance segmentation for autonomous log grasping in forestry operations

    Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, François Pomerleau, and Philippe Giguère. Instance segmentation for autonomous log grasping in forestry operations. In International Conference on Intelligent Robots and Systems (IROS), pages 6064–6071. IEEE, 2022. 2, 4, 7, 5, 8

  11. [11]

    Tree detection and diameter estimation based on deep learning

    Vincent Grondin, Jean-Michel Fortin, François Pomerleau, and Philippe Giguère. Tree detection and diameter estimation based on deep learning. Forestry, 96(2):264–276, 2023. 2, 4, 7

  12. [12]

    Training deep learning algorithms on synthetic forest images for tree detection

    Vincent Grondin, François Pomerleau, and Philippe Giguère. Training deep learning algorithms on synthetic forest images for tree detection. In Workshop in Innovation in Forestry Robotics: Research and Industry Adoption (ICRA), 2022. 2, 4

  13. [13]

    Mask r-cnn

    Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017. 2

  14. [14]

    py-motmetrics

    Christoph Heindl and Jack Valmadre. py-motmetrics. Code repository https://github.com/cheind/py-motmetrics, 2024. 5

  15. [15]

    Towards autonomous robotic precision harvesting: Mapping, localization, planning and control for a legged tree harvester

    Edo Jelavic, Dominic Jud, Pascal Egli, and Marco Hutter. Towards autonomous robotic precision harvesting: Mapping, localization, planning and control for a legged tree harvester. arXiv preprint arXiv:2104.10110, 2021. 1

  16. [16]

    Rellis-3d dataset: Data, benchmarks and analysis

    Peng Jiang, Philip Osteen, Maggie Wigness, and Srikanth Saripalli. Rellis-3d dataset: Data, benchmarks and analysis. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 1110–1116. IEEE, 2021. 2

  17. [17]

    Ultralytics YOLO

    Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLO. https://github.com/ultralytics/ultralytics, 2024. Accessed: 2024-07-15. 2, 4

  18. [18]

    A deep learning-based approach for tree trunk segmentation

    Danilo Samuel Jodas, Sergio Brazolin, Takashi Yojo, Reinaldo Araujo De Lima, Giuliana Del Nero Velasco, Aline Ribeiro Machado, and Joao Paulo Papa. A deep learning-based approach for tree trunk segmentation. In 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pages 370–377. IEEE, 2021. 2

  19. [19]

    Panoptic segmentation

    Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9404–9413, 2019. 2

  20. [20]

    Segment anything

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023. 3

  21. [21]

    Sensor agnostic semantic segmentation of structurally diverse and complex forest point clouds using deep learning

    Sean Krisanski, Mohammad Sadegh Taskhiri, Susana Gonzalez Aracil, David Herries, and Paul Turner. Sensor agnostic semantic segmentation of structurally diverse and complex forest point clouds using deep learning. Remote Sensing, 13(8):1413, 2021. 2

  22. [22]

    Exploring the feasibility of autonomous forestry operations: Results from the first experimental unmanned machine

    Pedro La Hera, Omar Mendoza-Trejo, Ola Lindroos, Håkan Lideskog, Torbjörn Lindbäck, Saira Latif, Songyu Li, and Magnus Karlberg. Exploring the feasibility of autonomous forestry operations: Results from the first experimental unmanned machine. Journal of Field Robotics, 41(4):942–965,

  23. [23]

    Finnwoodlands dataset

    Juan Lagos, Urho Lempiö, and Esa Rahtu. Finnwoodlands dataset. In Scandinavian Conference on Image Analysis, pages 95–110. Springer, 2023. 2

  24. [24]

    Street tree segmentation from mobile laser scanning data using deep learning-based image instance segmentation

    Qiujie Li and Yu Yan. Street tree segmentation from mobile laser scanning data using deep learning-based image instance segmentation. Urban Forestry & Urban Greening, 92:128200, 2024. 2

  25. [25]

    Fully convolutional networks for panoptic segmentation

    Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Liwei Wang, Zeming Li, Jian Sun, and Jiaya Jia. Fully convolutional networks for panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 214–223, 2021. 2

  26. [26]

    Focal loss for dense object detection

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017. 2

  27. [27]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), pages 740–755. Springer, 2014. 2, 5, 7

  28. [28]

    Fully convolutional networks for semantic segmentation

    Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015. 2

  29. [29]

    Image segmentation using text and image prompts

    Timo Lüddecke and Alexander Ecker. Image segmentation using text and image prompts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7086–7096, 2022. 3

  30. [30]

    Mapping trees along urban street networks with deep learning and street-level imagery

    Stefanie Lumnitz, Tahia Devisscher, Jerome R Mayaud, Valentina Radic, Nicholas C Coops, and Verena C Griess. Mapping trees along urban street networks with deep learning and street-level imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 175:144–157, 2021. 2

  31. [31]

    Meher V. R. Malladi, Tiziano Guadagnino, Luca Lobefaro, Matias Mattamala, Holger Griess, Janine Schweier, Nived Chebrolu, Maurice Fallon, Jens Behley, and Cyrill Stachniss. Tree instance segmentation and traits estimation for forestry environments exploiting lidar data collected by mobile robots. In 2024 IEEE International Conference on Robotics and ...

  32. [32]

    Benchmarking robustness in object detection: Autonomous driving when winter is coming

    Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484, 2019. 2

  33. [33]

    Bureau of Labor Statistics

    U.S. Bureau of Labor Statistics. National census of fatal occupational injuries in 2022. https://www.bls.gov/news.release/pdf/cfoi.pdf, 2023. Accessed: 2024-07-15. 1

  34. [34]

    Towards real-time forest inventory using handheld lidar

    Alexander Proudman, Milad Ramezani, Sundara Tejaswi Digumarti, Nived Chebrolu, and Maurice Fallon. Towards real-time forest inventory using handheld lidar. Robotics and Autonomous Systems, 157:104240, 2022. 2

  35. [35]

    You only look once: Unified, real-time object detection

    Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016. 2

  36. [36]

    Performance measures and a data set for multi-target, multi-camera tracking

    Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision (ECCV), pages 17–35. Springer, 2016. 5

  37. [37]

    Realization of a highly accurate mobile robot system for multi purpose precision forestry applications

    Juergen Rossmann, Michael Schluse, Christian Schlette, Arno Buecken, Petra Krahwinkler, and Markus Emde. Realization of a highly accurate mobile robot system for multi purpose precision forestry applications. In 2009 International Conference on Advanced Robotics, pages 1–6. IEEE,

  38. [38]

    Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation

    Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Guided curriculum model adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7374–7383, 2019. 2

  39. [39]

    https://github.com/scalabel/scalabel

    Scalabel open-source web annotation tool. https://github.com/scalabel/scalabel. Accessed: 2024-07-15. 3

  40. [40]

    Towards scene understanding for autonomous operations on airport aprons

    Daniel Steininger, Andreas Kriegler, Wolfgang Pointner, Verena Widhalm, Julia Simon, and Oliver Zendel. Towards scene understanding for autonomous operations on airport aprons. In Proceedings of the Asian Conference on Computer Vision, pages 147–163, 2022. 2

  41. [41]

    Segmenter: Transformer for semantic segmentation

    Robin Strudel, Ricardo Garcia, Ivan Laptev, and Cordelia Schmid. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7262–7272, 2021. 2

  42. [42]

    Individual rubber tree segmentation based on ground-based lidar data and faster r-cnn of deep learning

    Jiamin Wang, Xinxin Chen, Lin Cao, Feng An, Bangqian Chen, Lianfeng Xue, and Ting Yun. Individual rubber tree segmentation based on ground-based lidar data and faster r-cnn of deep learning. Forests, 10(9):793, 2019. 2

  43. [43]

    A rugd dataset for autonomous navigation and visual perception in unstructured outdoor environments

    Maggie Wigness, Sungmin Eum, John G Rogers, David Han, and Heesung Kwon. A rugd dataset for autonomous navigation and visual perception in unstructured outdoor environments. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5000–5007. IEEE,

  44. [44]

    Detectron2

    Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019. Accessed: 2024-07-15. 2

  45. [45]

    Dota: A large-scale dataset for object detection in aerial images

    Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, and Liangpei Zhang. Dota: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3974–3983, 2018. 2

  46. [46]

    Detecting trees in street images via deep learning with attention module

    Qian Xie, Dawei Li, Zhenghao Yu, Jun Zhou, and Jun Wang. Detecting trees in street images via deep learning with attention module. IEEE Transactions on Instrumentation and Measurement, 69(8):5395–5406, 2019. 2

  47. [47]

    Oriented r-cnn for object detection

    Xingxing Xie, Gong Cheng, Jiabao Wang, Xiwen Yao, and Junwei Han. Oriented r-cnn for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3520–3529, 2021. 2

  48. [48]

    Dilated residual networks

    Fisher Yu, Vladlen Koltun, and Thomas Funkhouser. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 472–480, 2017. 2

  49. [49]

    Wilddash - creating hazard-aware benchmarks

    Oliver Zendel, Katrin Honauer, Markus Murschitz, Daniel Steininger, and Gustavo Fernandez Dominguez. Wilddash - creating hazard-aware benchmarks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 402–416, 2018. 2

  50. [50]

    Bytetrack: Multi-object tracking by associating every detection box

    Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, and Xinggang Wang. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision (ECCV), pages 1–21. Springer, 2022. 5

  51. [51]

    Deformable DETR: Deformable Transformers for End-to-End Object Detection

    Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159, 2020. 2

  52. [52]

    (Fig. 19) and OpenSource images (Fig. 20). This is

    Oriented Object Detection / Instance Segmentation
       Size   C     S     B     C     S     T     CBox  SBox  CMask SMask TBox  TMask
    n  768    76.5  22.6  49.7  76.7  49.1  55.9  75.5  55.5  69.8  49.0  64.6  59.1
    n  1024   77.8  22.3  49.3  77.7  50.1  56.8  77.5  56.7  72.8  50.3  64.7  59.6
    m  768    79.8  25.7  54.7  79.8  54.7  61.2  78.7  62.0  72.9  56.5  69.2  65.4
    m  1024   80.8...

  53. [53]

    Figure 20

    demonstrating the generalization capability of our approach.
    Figure 20: Additional qualitative results on the test split of the OpenSource subset with complementary scenarios to the main data from public sources.
    Figure 21: Additional qualitative results showing limitations on our test set and the TimberSeg dataset [10].