TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations

Andreas Trondl; Daniel Steininger; Julia Simon; Markus Murschitz

arxiv: 2501.07360 · v1 · submitted 2025-01-13 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-23 05:46 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords: TimberVision, log segmentation, trunk tracking, forestry automation, RGB detection, instance segmentation, oriented object detection, multi-object tracking

The pith

TimberVision dataset and fusion framework enable accurate trunk representations from RGB images alone even in challenging forest conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces the TimberVision dataset of more than 2,000 annotated RGB images containing 51,000 trunk components to fill the gap in specialized data for automating forestry tasks like harvesting and measuring logs. It runs ablation studies on oriented object detection and instance segmentation, then fuses the component outputs into unified trunk representations. The work adds automatic derivation of geometric properties and multi-object tracking to boost robustness. A sympathetic reader would care because accurate vision-based trunk detection could reduce human exposure to safety risks in remote operations by supporting automation from ordinary cameras.

Core claim

Our detection and tracking approach provides highly descriptive and accurate trunk representations solely from RGB image data, even under challenging environmental conditions. Our solution is suitable for a wide range of application scenarios and can be readily combined with other sensor modalities.

What carries the argument

The generic framework that fuses component detections from oriented object detection and instance segmentation models into unified trunk representations, then applies multi-object tracking.
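
As a rough illustration of what a component-fusion step of this kind might look like, the sketch below attaches each detected component to the trunk whose box contains most of it. This is not the paper's algorithm. The axis-aligned boxes, the `containment` score, the `min_containment` threshold and all class and function names are assumptions made for this example, whereas the paper works with oriented boxes and instance masks.

```python
from dataclasses import dataclass, field

# Hypothetical simplification: axis-aligned (x_min, y_min, x_max, y_max) boxes.
Box = tuple[float, float, float, float]


@dataclass
class Component:
    label: str  # e.g. "cut" or "side" (labels chosen for illustration)
    box: Box


@dataclass
class Trunk:
    box: Box
    components: list[Component] = field(default_factory=list)


def containment(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    overlap_w = max(0.0, min(inner[2], outer[2]) - max(inner[0], outer[0]))
    overlap_h = max(0.0, min(inner[3], outer[3]) - max(inner[1], outer[1]))
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (overlap_w * overlap_h) / area if area > 0 else 0.0


def fuse(trunk_boxes: list[Box], components: list[Component],
         min_containment: float = 0.5) -> list[Trunk]:
    """Attach each component to the best-containing trunk, or drop it."""
    trunks = [Trunk(box=b) for b in trunk_boxes]
    for comp in components:
        scores = [containment(comp.box, t.box) for t in trunks]
        # Components that no trunk contains well enough stay unassigned.
        if scores and max(scores) >= min_containment:
            trunks[scores.index(max(scores))].components.append(comp)
    return trunks
```

A greedy best-match rule keeps the example short. A real system would also need to resolve conflicts between overlapping trunks, which this sketch ignores.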

If this is right

Oriented object detection and instance segmentation models trained on the dataset locate cut and lateral trunk surfaces from RGB input.
Fusing the component outputs produces unified trunk representations that support derivation of geometric properties.
Multi-object tracking applied to the representations increases robustness across image sequences.
The resulting system operates across diverse forestry scenarios without requiring non-RGB sensors.
The representations remain descriptive and accurate under challenging environmental conditions.
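
To make the idea of deriving geometric properties more concrete, here is a minimal sketch that reads length, width and orientation off the four corners of an oriented bounding box. The function name, the corner ordering and the choice of properties are assumptions for illustration. The review text does not say which properties the paper derives or how.

```python
import math


def obb_geometry(corners):
    """Length, width and orientation of an oriented box in pixel units.

    `corners` is assumed to hold four (x, y) points listed consecutively
    around the box. The angle is measured against the image x-axis and
    folded into [0, 180), since a trunk has no preferred direction.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    (x0, y0), (x1, y1), (x2, y2), _ = corners
    side_a = math.hypot(x1 - x0, y1 - y0)
    side_b = math.hypot(x2 - x1, y2 - y1)
    # Treat the longer side as the trunk axis.
    if side_a >= side_b:
        length, width, dx, dy = side_a, side_b, x1 - x0, y1 - y0
    else:
        length, width, dx, dy = side_b, side_a, x2 - x1, y2 - y1
    angle_deg = math.degrees(math.atan2(dy, dx)) % 180.0
    return {"length": length, "width": width, "angle_deg": angle_deg}
```

These values are in image pixels. Converting them to metric quantities would need camera calibration or depth information, which is outside the scope of this sketch.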

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Such RGB-derived trunk representations could feed directly into robotic control loops for automated log handling without extra hardware.
The dataset could act as a starting benchmark for testing vision methods in other unstructured outdoor settings.
Extending the geometric derivations to estimate volume or weight in real time might support on-site log sorting decisions.

Load-bearing premise

The annotated dataset accurately represents the variety of real-world forestry scenes and conditions encountered in autonomous operations.

What would settle it

A controlled test measuring detection and tracking accuracy on new RGB images captured in forest conditions with tree species, weather, or occlusion levels absent from the dataset would show whether performance remains high.
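
One way such a test could be scored is to compare per-image accuracy on the existing test set with accuracy on the newly captured images, and to attach a bootstrap interval to the gap. The sketch below assumes that a per-image score (for example a per-image AP) is already available for both sets. The function name, the percentile bootstrap and the default parameters are illustrative choices, not something the paper prescribes.

```python
import random
from statistics import mean


def bootstrap_gap(in_dist, new_cond, n_resamples=2000, seed=0):
    """Mean score gap (in_dist - new_cond) with a 95% percentile interval.

    Both arguments are non-empty lists of per-image scores. Returns a
    tuple (gap, lower, upper).
    """
    if not in_dist or not new_cond:
        raise ValueError("both score lists must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    gaps = []
    for _ in range(n_resamples):
        # Resample each set with replacement at its original size.
        a = [rng.choice(in_dist) for _ in in_dist]
        b = [rng.choice(new_cond) for _ in new_cond]
        gaps.append(mean(a) - mean(b))
    gaps.sort()
    lower = gaps[int(0.025 * n_resamples)]
    upper = gaps[int(0.975 * n_resamples) - 1]
    return mean(in_dist) - mean(new_cond), lower, upper
```

If the interval lies clearly above zero, performance on the new conditions is measurably lower. An interval that straddles zero would be consistent with performance holding up.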

Figures

Figures reproduced from arXiv: 2501.07360 by Andreas Trondl, Daniel Steininger, Julia Simon, Markus Murschitz.

**Figure 2.** Representative image samples for Low, Mid and High intensities of annotated scene parameters. The color bars show their distributions across the dataset.

**Figure 3.** Overview of our annotation pipeline automatically de…

**Figure 4.** Overview of our task-fusion algorithm deriving unified trunk representations from OOD and ISEG outputs for individual com…

**Figure 5.** Accuracy as mean class mAP50-95 and average inference time on test set for multiple model capacities and image sizes …

**Figure 6.** Ablation results as mAP50-95 on test set for three class combinations and two input sizes. For ISEG, Bound is excluded due to its inherent overlap with Side. The dark-to-light bar colors denote model capacities Nano, Medium and X-Large, respectively.

**Figure 7.** Model performance for different scene-parameter intensities as …

**Figure 8.** Representative fusion results on test images. The left …

**Figure 9.** Hourly and monthly recording-time distributions of an …

**Figure 10.** Comparison of annotation schemes in TimberSeg [ …

**Figure 12.** Distribution of scene parameters for annotated …

**Figure 13.** Distribution of instance sizes for each class based on oriented-bounding-box dimensions. Instance width refers to the box side …

**Figure 15.** Heat maps illustrating the distribution of instance …

**Figure 16.** Validation accuracy after each training epoch when …

**Figure 18.** Additional qualitative results on the test splits of the …

**Figure 19.** Additional qualitative results on the TimberSeg dataset …

**Figure 21.** Additional qualitative results showing limitations on …

Original abstract

Timber represents an increasingly valuable and versatile resource. However, forestry operations such as harvesting, handling and measuring logs still require substantial human labor in remote environments posing significant safety risks. Progressively automating these tasks has the potential of increasing their efficiency as well as safety, but requires an accurate detection of individual logs as well as live trees and their context. Although initial approaches have been proposed for this challenging application domain, specialized data and algorithms are still too scarce to develop robust solutions. To mitigate this gap, we introduce the TimberVision dataset, consisting of more than 2k annotated RGB images containing a total of 51k trunk components including cut and lateral surfaces, thereby surpassing any existing dataset in this domain in terms of both quantity and detail by a large margin. Based on this data, we conduct a series of ablation experiments for oriented object detection and instance segmentation and evaluate the influence of multiple scene parameters on model performance. We introduce a generic framework to fuse the components detected by our models for both tasks into unified trunk representations. Furthermore, we automatically derive geometric properties and apply multi-object tracking to further enhance robustness. Our detection and tracking approach provides highly descriptive and accurate trunk representations solely from RGB image data, even under challenging environmental conditions. Our solution is suitable for a wide range of application scenarios and can be readily combined with other sensor modalities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TimberVision adds a larger specialized dataset for forestry CV but the robustness claims rest on an unverified assumption about scene coverage.

This paper's main contribution is the TimberVision dataset: over 2k RGB images with 51k annotations for trunk components, specifically cut and lateral surfaces. That scale and level of detail exceed prior collections in this narrow domain. The authors also outline a fusion step that merges outputs from oriented object detection and instance segmentation into unified trunk representations, then layer on multi-object tracking and automatic geometry extraction. The experiments include ablations that test how scene parameters affect the models.

For anyone working on outdoor robotics or forestry automation, the data itself is the clearest addition here, since labeled examples in this setting have been scarce. The pipeline is built from standard CV components rather than novel architectures, which keeps the focus on the application and the annotations.

The soft spot is the central claim that the approach delivers accurate trunk representations even under challenging conditions. The abstract states that ablations were run and that performance holds up, but it gives no evidence that the collected scenes actually span the target distribution of real operations, such as varied lighting, weather, heavy occlusion, or terrain. There is no mention of parameter histograms compared against operational footage or of held-out extreme subsets. If the 2k images under-represent safety-critical cases, the reported numbers do not yet support the generalization.

This paper is for researchers who need annotated data in applied forestry vision or who are extending tracking methods to similar outdoor settings. A reader who can use the released dataset directly will get the most value; others may find the fusion and ablation details a useful reference but not a breakthrough. It deserves peer review because the dataset is new, the task has clear practical stakes, and the authors appear to engage the literature without circularity or invented entities. The review process would likely push for stronger evidence on representativeness, which is a fixable gap rather than a load-bearing flaw.

Referee Report

2 major / 1 minor

Summary. The paper introduces the TimberVision dataset (>2k RGB images, 51k annotated trunk components including cut and lateral surfaces), conducts ablation experiments on oriented object detection and instance segmentation to assess scene-parameter influence, presents a framework fusing component detections into unified trunk representations, derives geometric properties, and applies multi-object tracking, claiming highly descriptive and accurate trunk representations from RGB data alone even under challenging environmental conditions.

Significance. If the central claims hold, the work would be significant for supplying the largest and most detailed public dataset in autonomous forestry vision, along with a generic multi-task fusion and tracking pipeline that integrates detection outputs into usable trunk models; the scale (surpassing prior datasets by a large margin) and the explicit scene-parameter ablations constitute concrete strengths that could accelerate reproducible progress in the domain.

major comments (2)

[Abstract] Abstract: the claim that results hold 'even under challenging environmental conditions' is load-bearing for the central contribution, yet the manuscript provides no quantitative evidence that the 2k-image collection spans the target distribution (e.g., no histograms or statistical comparison of lighting, occlusion, weather, or terrain parameters against operational forestry footage, and no held-out extreme-condition subset).
[Dataset / Experiments] Dataset and experimental sections: without an explicit representativeness analysis or cross-validation against real autonomous-operation footage, the reported accuracy and tracking robustness cannot be taken to support generalization to the safety-critical cases the introduction identifies as motivation.
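
The representativeness analysis requested here could start from something as simple as comparing the share of Low, Mid and High intensity labels for a scene parameter in the dataset against the same shares in a sample of operational footage. The sketch below does this with total variation distance. The choice of distance, the function names and the existence of a labeled operational sample are all assumptions for illustration. Only the three intensity levels come from the paper's Figure 2 caption.

```python
from collections import Counter

LEVELS = ("Low", "Mid", "High")  # intensity levels named in Figure 2


def level_distribution(labels):
    """Relative frequency of each intensity level in a list of labels."""
    counts = Counter(labels)
    total = sum(counts[level] for level in LEVELS)
    if total == 0:
        raise ValueError("no Low/Mid/High labels found")
    return [counts[level] / total for level in LEVELS]


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0 means identical shares, 1 means completely disjoint support.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Reporting one such distance per scene parameter would show at a glance which conditions the dataset covers in similar proportion to field operations and which it under-samples.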

minor comments (1)

[Abstract] Abstract: the description of ablation experiments does not name the concrete metrics (mAP, mask IoU, tracking MOTA, etc.) or the magnitude of the observed scene-parameter effects.
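
For readers unfamiliar with the metrics named in this comment, the block below gives the conventional definitions of mAP50-95 (COCO-style averaging over IoU thresholds) and MOTA (the CLEAR MOT definition of Bernardin and Stiefelhagen, 2008). That the paper uses exactly these forms is an assumption based on the figure captions, which report mAP50-95. Here $C$ is the set of classes, $\mathrm{AP}_c(t)$ is the average precision of class $c$ at IoU threshold $t$, and $k$ indexes video frames.

```latex
\mathrm{mAP}_{50\text{-}95}
  = \frac{1}{|C|} \sum_{c \in C} \frac{1}{10}
    \sum_{t \in \{0.50,\, 0.55,\, \dots,\, 0.95\}} \mathrm{AP}_c(t),
\qquad
\mathrm{MOTA}
  = 1 - \frac{\sum_{k} \left( \mathrm{FN}_k + \mathrm{FP}_k + \mathrm{IDSW}_k \right)}
             {\sum_{k} \mathrm{GT}_k}
```

$\mathrm{FN}_k$, $\mathrm{FP}_k$ and $\mathrm{IDSW}_k$ count missed objects, false detections and identity switches in frame $k$, and $\mathrm{GT}_k$ is the number of ground-truth objects in that frame. MOTA can become negative when the errors outnumber the ground-truth objects.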

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the need to substantiate claims about dataset representativeness and generalization. We address the two major comments point by point below.

Referee: [Abstract] Abstract: the claim that results hold 'even under challenging environmental conditions' is load-bearing for the central contribution, yet the manuscript provides no quantitative evidence that the 2k-image collection spans the target distribution (e.g., no histograms or statistical comparison of lighting, occlusion, weather, or terrain parameters against operational forestry footage, and no held-out extreme-condition subset).

Authors: We agree that the manuscript lacks explicit quantitative evidence, such as histograms or statistical comparisons of scene parameters against a reference distribution of operational forestry footage. The dataset was collected across multiple sites and sessions to capture variability in lighting, occlusion, weather, and terrain, and the ablation studies already quantify performance sensitivity to several of these parameters. However, we did not include formal distributional analysis or a held-out extreme-condition subset. We will add this analysis to the dataset section of the revised manuscript (including parameter histograms and summary statistics) and will adjust the abstract wording to align with the provided evidence. revision: yes
Referee: [Dataset / Experiments] Dataset and experimental sections: without an explicit representativeness analysis or cross-validation against real autonomous-operation footage, the reported accuracy and tracking robustness cannot be taken to support generalization to the safety-critical cases the introduction identifies as motivation.

Authors: We acknowledge the validity of this observation. The reported results are obtained on the TimberVision dataset collected in real forestry settings, and the scene-parameter ablations provide indirect evidence of robustness across conditions. Nevertheless, the manuscript does not contain an explicit representativeness analysis or cross-validation against independent operational footage. We will incorporate a representativeness analysis (parameter distributions and comparison to typical forestry operation characteristics) in the revised dataset section. A full external cross-validation is not feasible with currently available public data and will be noted as a limitation for future work. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical results from new dataset and standard methods

The paper introduces a new annotated dataset (TimberVision) and applies standard oriented object detection, instance segmentation, fusion into trunk representations, geometric property derivation, and multi-object tracking. All performance claims are derived from training and evaluation on held-out or ablated splits of this data rather than from any self-referential fitting, self-citation chain, or ansatz that reduces the output to the input by construction. No equations or uniqueness theorems are invoked that collapse the claimed trunk representations back to the annotations themselves. The central results remain independent empirical measurements on the contributed data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the new dataset being representative of real forestry conditions and standard CV methods (detection, segmentation, tracking) performing well on it without additional sensors.

axioms (1)

Domain assumption: RGB images contain sufficient information for accurate trunk component detection and tracking in forestry environments.
The paper relies on this to claim suitability from RGB data alone under challenging conditions.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We introduce the TimberVision dataset... ablation experiments for oriented object detection and instance segmentation... fuse the components... multi-object tracking

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: YOLOv8.2 framework... ByteTrack and Bot-SORT... mAP50-95 on test set

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 2 internal anchors

[1]

arXiv preprint arXiv:2206.14651

Nir Aharon, Roy Orfaig, and Ben-Zion Bobrovsky. Bot- sort: Robust associations multi-pedestrian tracking. arXiv preprint arXiv:2206.14651, 2022. 5

work page arXiv 2022
[2]

Evaluating mul- tiple object tracking performance: the clear mot metrics

Keni Bernardin and Rainer Stiefelhagen. Evaluating mul- tiple object tracking performance: the clear mot metrics. EURASIP Journal on Image and Video Processing, 2008:1– 10, 2008. 5

work page 2008
[3]

MMDetection: Open MMLab Detection Toolbox and Benchmark

Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, et al. Mmdetection: Open mmlab detection tool- box and benchmark. arXiv preprint arXiv:1906.07155, 2019. 2

work page internal anchor Pith review Pith/arXiv arXiv 1906
[4]

Sloam: Semantic lidar odometry and mapping for forest in- ventory

Steven W Chen, Guilherme V Nardari, Elijah S Lee, Chao Qu, Xu Liu, Roseli Ap Francelin Romero, and Vijay Kumar. Sloam: Semantic lidar odometry and mapping for forest in- ventory. IEEE Robotics and Automation Letters , 5(2):612– 619, 2020. 2

work page 2020
[5]

An automatic approach for tree species detection and pro- file estimation of urban street trees using deep learning and google street view images

Kwanghun Choi, Wontaek Lim, Byungwoo Chang, Jinah Jeong, Inyoo Kim, Chan-Ryul Park, and Dongwook W Ko. An automatic approach for tree species detection and pro- file estimation of urban street trees using deep learning and google street view images. ISPRS Journal of Photogramme- try and Remote Sensing, 190:165–180, 2022. 2

work page 2022
[6]

Visible and thermal image-based trunk detection with deep learning for forestry mobile robotics

Daniel Queir ´os da Silva, Filipe Neves Dos Santos, Ar- mando Jorge Sousa, and V ´ıtor Filipe. Visible and thermal image-based trunk detection with deep learning for forestry mobile robotics. Journal of Imaging, 7(9):176, 2021. 2

work page 2021
[7]

The pascal visual object classes (voc) challenge

Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision (IJCV), 88(2):303–338, 2010. 5

work page 2010
[8]

Urban tree generator: spatio-temporal and generative deep learning for urban tree localization and modeling

Adnan Firoze, Bedrich Benes, and Daniel Aliaga. Urban tree generator: spatio-temporal and generative deep learning for urban tree localization and modeling. The Visual Computer, 38(9):3327–3339, 2022. 2

work page 2022
[9]

Tree instance segmentation with temporal contour graph

Adnan Firoze, Cameron Wingren, Raymond A Yeh, Bedrich Benes, and Daniel Aliaga. Tree instance segmentation with temporal contour graph. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 2193–2202, 2023. 2

work page 2023
[10]

Instance segmen- tation for autonomous log grasping in forestry operations

Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, Franc ¸ois Pomerleau, and Philippe Gigu`ere. Instance segmen- tation for autonomous log grasping in forestry operations. In International Conference on Intelligent Robots and Systems (IROS), pages 6064–6071. IEEE, 2022. 2, 4, 7, 5, 8

work page 2022
[11]

Tree detection and diameter estima- tion based on deep learning

Vincent Grondin, Jean-Michel Fortin, Franc ¸ois Pomerleau, and Philippe Gigu `ere. Tree detection and diameter estima- tion based on deep learning. Forestry, 96(2):264–276, 2023. 2, 4, 7

work page 2023
[12]

Training deep learning algorithms on synthetic forest images for tree detection

Vincent Grondin, Franc ¸ois Pomerleau, and Philippe Gigu`ere. Training deep learning algorithms on synthetic forest images for tree detection. In Workshop in Innovation in Forestry Robotics: Research and Industry Adoption (ICRA), 2022. 2, 4

work page 2022
[13]

Mask r-cnn

Kaiming He, Georgia Gkioxari, Piotr Doll ´ar, and Ross Gir- shick. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017. 2

work page 2017
[14]

py-motmetrics

Christoph Heindl and Jack Valmadre. py-motmetrics. Code repository https://github.com/cheind/py-motmetrics, 2024. 5

work page 2024
[15]

Towards autonomous robotic precision harvesting: Mapping, localization, planning and control for a legged tree harvester

Edo Jelavic, Dominic Jud, Pascal Egli, and Marco Hutter. Towards autonomous robotic precision harvesting: Mapping, localization, planning and control for a legged tree harvester. arXiv preprint arXiv:2104.10110, 2021. 1

work page arXiv 2021
[16]

Rellis-3d dataset: Data, benchmarks and analy- sis

Peng Jiang, Philip Osteen, Maggie Wigness, and Srikanth Saripalli. Rellis-3d dataset: Data, benchmarks and analy- sis. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 1110–1116. IEEE, 2021. 2

work page 2021
[17]

Ultralyt- ics YOLO

Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralyt- ics YOLO. https://github.com/ultralytics/ ultralytics, 2024. Accessed: 2024-07-15. 2, 4

work page 2024
[18]

A deep learning-based approach for tree trunk segmentation

Danilo Samuel Jodas, Sergio Brazolin, Takashi Yojo, Reinaldo Araujo De Lima, Giuliana Del Nero Velasco, Aline Ribeiro Machado, and Joao Paulo Papa. A deep learning-based approach for tree trunk segmentation. In2021 34th SIBGRAPI Conference on Graphics, Patterns and Im- ages (SIBGRAPI), pages 370–377. IEEE, 2021. 2

work page 2021
[19]

Panoptic segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Doll ´ar. Panoptic segmentation. In Pro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9404–9413, 2019. 2

work page 2019
[20]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. In Proceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3

work page 2023
[21]

Sensor agnos- tic semantic segmentation of structurally diverse and com- plex forest point clouds using deep learning

Sean Krisanski, Mohammad Sadegh Taskhiri, Susana Gon- zalez Aracil, David Herries, and Paul Turner. Sensor agnos- tic semantic segmentation of structurally diverse and com- plex forest point clouds using deep learning. Remote Sens- ing, 13(8):1413, 2021. 2

work page 2021
[22]

Exploring the feasibility of autonomous forestry operations: Results from the first experimental un- manned machine

Pedro La Hera, Omar Mendoza-Trejo, Ola Lindroos, H ˚akan Lideskog, Torbj ¨orn Lindb ¨ack, Saira Latif, Songyu Li, and Magnus Karlberg. Exploring the feasibility of autonomous forestry operations: Results from the first experimental un- manned machine. Journal of Field Robotics, 41(4):942–965,

work page
[23]

Finnwoodlands dataset

Juan Lagos, Urho Lempi ¨o, and Esa Rahtu. Finnwoodlands dataset. In Scandinavian Conference on Image Analysis , pages 95–110. Springer, 2023. 2

work page 2023
[24]

Street tree segmentation from mo- bile laser scanning data using deep learning-based image in- stance segmentation

Qiujie Li and Yu Yan. Street tree segmentation from mo- bile laser scanning data using deep learning-based image in- stance segmentation. Urban Forestry & Urban Greening , 92:128200, 2024. 2

work page 2024
[25]

Fully convolutional networks for panoptic segmentation

Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Liwei Wang, Zeming Li, Jian Sun, and Jiaya Jia. Fully convolutional networks for panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 214–223, 2021. 2

work page 2021
[26]

Focal loss for dense object detection

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Doll´ar. Focal loss for dense object detection. In Pro- ceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017. 2 9

work page 2017
[27]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vi- sion (ECCV), pages 740–755. Springer, 2014. 2, 5, 7

work page 2014
[28]

Fully convolutional networks for semantic segmentation

Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Pro- ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015. 2

work page 2015
[29]

Image segmenta- tion using text and image prompts

Timo L ¨uddecke and Alexander Ecker. Image segmenta- tion using text and image prompts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7086–7096, 2022. 3

work page 2022
[30]

Mapping trees along urban street networks with deep learn- ing and street-level imagery

Stefanie Lumnitz, Tahia Devisscher, Jerome R Mayaud, Valentina Radic, Nicholas C Coops, and Verena C Griess. Mapping trees along urban street networks with deep learn- ing and street-level imagery. ISPRS Journal of Photogram- metry and Remote Sensing, 175:144–157, 2021. 2

work page 2021
[31]

Meher V . R. Malladi, Tiziano Guadagnino, Luca Lobefaro, Matias Mattamala, Holger Griess, Janine Schweier, Nived Chebrolu, Maurice Fallon, Jens Behley, and Cyrill Stach- niss. Tree instance segmentation and traits estimation for forestry environments exploiting lidar data collected by mo- bile robots. In 2024 IEEE International Conference on Robotics and ...

work page 2024
[32]

Benchmarking ro- bustness in object detection: Autonomous driving when win- ter is coming

Claudio Michaelis, Benjamin Mitzkus, Robert Geirhos, Evgenia Rusak, Oliver Bringmann, Alexander S Ecker, Matthias Bethge, and Wieland Brendel. Benchmarking ro- bustness in object detection: Autonomous driving when win- ter is coming. arXiv preprint arXiv:1907.07484, 2019. 2

work page arXiv 1907
[33]

Bureau of Labor Statistics

U.S. Bureau of Labor Statistics. National census of fatal oc- cupational injuries in 2022. https://www.bls.gov/ news . release / pdf / cfoi . pdf, 2023. Accessed: 2024-07-15. 1

work page 2022
[34]

Towards real-time forest inventory using handheld lidar.Robotics and Autonomous Systems, 157:104240, 2022

Alexander Proudman, Milad Ramezani, Sundara Tejaswi Digumarti, Nived Chebrolu, and Maurice Fallon. Towards real-time forest inventory using handheld lidar.Robotics and Autonomous Systems, 157:104240, 2022. 2

work page 2022
[35]

You only look once: Unified, real-time object de- tection

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object de- tection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016. 2

work page 2016
[36]

Performance measures and a data set for multi-target, multi-camera tracking

Ergys Ristani, Francesco Solera, Roger Zou, Rita Cucchiara, and Carlo Tomasi. Performance measures and a data set for multi-target, multi-camera tracking. In Proceedings of the European Conference on Computer Vision (ECCV) , pages 17–35. Springer, 2016. 5

work page 2016
[37]

Real- ization of a highly accurate mobile robot system for multi purpose precision forestry applications

Juergen Rossmann, Michael Schluse, Christian Schlette, Arno Buecken, Petra Krahwinkler, and Markus Emde. Real- ization of a highly accurate mobile robot system for multi purpose precision forestry applications. In 2009 Interna- tional Conference on Advanced Robotics , pages 1–6. IEEE,

work page 2009
[38]

Guided curriculum model adaptation and uncertainty-aware evalua- tion for semantic nighttime image segmentation

Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Guided curriculum model adaptation and uncertainty-aware evalua- tion for semantic nighttime image segmentation. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision, pages 7374–7383, 2019. 2

work page 2019
[39]

https:// github.com/scalabel/scalabel

Scalabel open-source web annotation tool. https:// github.com/scalabel/scalabel. Accessed: 2024- 07-15. 3

work page 2024
[40]

Towards scene understanding for autonomous operations on airport aprons

Daniel Steininger, Andreas Kriegler, Wolfgang Pointner, Verena Widhalm, Julia Simon, and Oliver Zendel. Towards scene understanding for autonomous operations on airport aprons. In Proceedings of the Asian Conference on Com- puter Vision, pages 147–163, 2022. 2

work page 2022
[41]

Segmenter: Transformer for semantic segmenta- tion

Robin Strudel, Ricardo Garcia, Ivan Laptev, and Cordelia Schmid. Segmenter: Transformer for semantic segmenta- tion. In Proceedings of the IEEE/CVF International Confer- ence on Computer Vision, pages 7262–7272, 2021. 2

work page 2021
[42]

Individual rubber tree segmentation based on ground-based lidar data and faster r- cnn of deep learning

Jiamin Wang, Xinxin Chen, Lin Cao, Feng An, Bangqian Chen, Lianfeng Xue, and Ting Yun. Individual rubber tree segmentation based on ground-based lidar data and faster r- cnn of deep learning. Forests, 10(9):793, 2019. 2

[43]

Maggie Wigness, Sungmin Eum, John G Rogers, David Han, and Heesung Kwon. A rugd dataset for autonomous navigation and visual perception in unstructured outdoor environments. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5000–5007. IEEE, 2019.

[44]

Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019. Accessed: 2024-07-15.

[45]

Gui-Song Xia, Xiang Bai, Jian Ding, Zhen Zhu, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, and Liangpei Zhang. Dota: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3974–3983, 2018.

[46]

Qian Xie, Dawei Li, Zhenghao Yu, Jun Zhou, and Jun Wang. Detecting trees in street images via deep learning with attention module. IEEE Transactions on Instrumentation and Measurement, 69(8):5395–5406, 2019.

[47]

Xingxing Xie, Gong Cheng, Jiabao Wang, Xiwen Yao, and Junwei Han. Oriented r-cnn for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3520–3529, 2021.

[48]

Fisher Yu, Vladlen Koltun, and Thomas Funkhouser. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 472–480, 2017.

[49]

Oliver Zendel, Katrin Honauer, Markus Murschitz, Daniel Steininger, and Gustavo Fernandez Dominguez. Wilddash - creating hazard-aware benchmarks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 402–416, 2018.

[50]

Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, and Xinggang Wang. Bytetrack: Multi-object tracking by associating every detection box. In Proceedings of the European Conference on Computer Vision (ECCV), pages 1–21. Springer, 2022.

[51]

Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. Deformable detr: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159, 2020.

TimberVision: A Multi-Task Dataset and Framework for Log-Component Segmentation and Tracking in Autonomous Forestry Operations (Supplementary Material)

This suppleme...

(Fig. 19) and OpenSource images (Fig. 20). This is

Oriented Object Detection    Instance Segmentation

Size       C     S     B     C     S     T     CBox  SBox  CMask SMask TBox  TMask
n  768     76.5  22.6  49.7  76.7  49.1  55.9  75.5  55.5  69.8  49.0  64.6  59.1
   1024    77.8  22.3  49.3  77.7  50.1  56.8  77.5  56.7  72.8  50.3  64.7  59.6
m  768     79.8  25.7  54.7  79.8  54.7  61.2  78.7  62.0  72.9  56.5  69.2  65.4
   1024    80.8...

demonstrating the generalization capability of our approach.

Figure 20. Additional qualitative results on the test split of the OpenSource subset with complementary scenarios to the main data from public sources.

Figure 21. Additional qualitative results showing limitations on our test set and the TimberSeg dataset [10].

