Sparse Data Tree Canopy Segmentation: Fine-Tuning Leading Pretrained Models on Only 150 Images

Anthony Bertnyk; David Szczecina; Hudson Sun; Kyle Gao; Lincoln Linlin Xu; Niloofar Azad

arxiv: 2601.10931 · v2 · submitted 2026-01-16 · 💻 cs.CV · cs.AI


Pith reviewed 2026-05-16 14:02 UTC · model grok-4.3

keywords tree canopy segmentation · aerial imagery · small dataset fine-tuning · pretrained models · YOLOv11 · Mask R-CNN · vision transformers · data scarcity

The pith

Pretrained CNN models like YOLOv11 and Mask R-CNN outperform transformer models when fine-tuned for tree canopy segmentation on just 150 images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests five pretrained models on a small set of 150 aerial images to segment tree canopies, simulating real-world annotation shortages. It shows that convolution-based models, especially YOLOv11 and Mask R-CNN, produce stronger results on unseen data than transformer-based ones such as Swin-UNet and DINOv2. The work matters for environmental tasks because accurate canopy maps support urban planning and ecosystem monitoring without needing large labeled collections. Readers would care since the findings point to practical model choices that avoid heavy overfitting when data is scarce. Experiments also cover training strategies and augmentations that help keep performance stable under these constraints.

Core claim

When fine-tuned on only 150 annotated aerial images for tree canopy segmentation, pretrained convolution-based models, particularly YOLOv11 and Mask R-CNN, generalize significantly better than pretrained transformer-based models. DeepLabv3, Swin-UNet, and DINOv2 underperform; the authors attribute this, as likely causes, to differences between semantic and instance segmentation tasks, the high data needs of vision transformers, and the absence of strong inductive biases in transformers.

What carries the argument

Fine-tuning and direct comparison of five architectures—YOLOv11, Mask R-CNN, DeepLabv3, Swin-UNet, and DINOv2—on the Solafune Tree Canopy Detection dataset of 150 images to measure generalization under extreme data scarcity.

Load-bearing premise

The performance gaps between convolution-based and transformer-based models stem mainly from architectural differences rather than from specific hyperparameter choices, augmentation details, or dataset biases.

What would settle it

A controlled rerun of all five models on the same 150-image splits using identical hyperparameters, augmentation pipelines, and training schedules, then checking whether the accuracy ordering between CNN and transformer groups remains the same.
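One way to make such a rerun comparable is to fix the data splits once and reuse them for every model. The sketch below is illustrative only: the image IDs, the five-fold layout, and the seed are assumptions, not details taken from the paper.

```python
import random


def make_folds(image_ids, k=5, seed=0):
    """Shuffle once with a fixed seed, then deal IDs round-robin into k folds."""
    ids = sorted(image_ids)  # sort first so the input order cannot change the split
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def train_val_split(folds, val_index):
    """Hold out one fold for validation and train on the rest."""
    val = list(folds[val_index])
    train = [x for i, fold in enumerate(folds) if i != val_index for x in fold]
    return train, val


# Hypothetical IDs standing in for the 150 annotated images.
image_ids = [f"img_{i:03d}" for i in range(150)]
folds = make_folds(image_ids, k=5, seed=42)

# Every model would be trained and evaluated on exactly these splits.
for model_name in ["YOLOv11", "Mask R-CNN", "DeepLabv3", "Swin-UNet", "DINOv2"]:
    train, val = train_val_split(folds, val_index=0)
    print(model_name, len(train), len(val))
```

Sorting before shuffling means the split depends only on the set of IDs and the seed, so two people running the comparison on different machines would get the same folds.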

Figures

Figures reproduced from arXiv: 2601.10931 by Anthony Bertnyk, David Szczecina, Hudson Sun, Kyle Gao, Lincoln Linlin Xu, Niloofar Azad.

**Figure 2.** Qualitative tree canopy segmentation results. Left to right: raw image, YOLOv11, Mask R-CNN, DeepLabv3, Swin-UNet, DINOv2.

Original abstract

Tree canopy detection from aerial imagery is an important task for environmental monitoring, urban planning, and ecosystem analysis. Simulating real-life data annotation scarcity, the Solafune Tree Canopy Detection competition provides a small and imbalanced dataset of only 150 annotated images, posing significant challenges for training deep models without severe overfitting. In this work, we evaluate five representative architectures, YOLOv11, Mask R-CNN, DeepLabv3, Swin-UNet, and DINOv2, to assess their suitability for canopy segmentation under extreme data scarcity. Our experiments show that pretrained convolution-based models, particularly YOLOv11 and Mask R-CNN, generalize significantly better than pretrained transformer-based models. DeeplabV3, Swin-UNet and DINOv2 underperform likely due to differences between semantic and instance segmentation tasks, the high data requirements of Vision Transformers, and the lack of strong inductive biases. These findings confirm that transformer-based architectures struggle in low-data regimes without substantial pretraining or augmentation and that differences between semantic and instance segmentation further affect model performance. We provide a detailed analysis of training strategies, augmentation policies, and model behavior under the small-data constraint and demonstrate that lightweight CNN-based methods remain the most reliable for canopy detection on limited imagery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

CNNs like YOLOv11 and Mask R-CNN beat the transformers on this 150-image canopy task, but the gaps look hard to trust without numbers or matched training protocols.

The paper's core finding is that on the Solafune 150-image tree canopy dataset, pretrained CNN models YOLOv11 and Mask R-CNN generalized better than the transformer ones (Swin-UNet, DINOv2) and DeepLabv3. It frames this as evidence that convolutional inductive biases help more than transformers in extreme low-data regimes for aerial segmentation, and it walks through some training and augmentation choices that mattered in their runs. That is the useful part: a concrete case study for environmental monitoring where labels are scarce, with some discussion of why instance versus semantic segmentation framing affects outcomes here. The authors also note the dataset imbalance and the competition constraints, which grounds the work in a real application rather than synthetic benchmarks. They give credit to the pretrained weights and show that lightweight CNN approaches stayed reliable without heavy customization. This is the kind of practical comparison that can save time for people facing similar data limits in remote sensing or ecology.

The soft spots sit in the evidence. The abstract and description give no quantitative metrics, no tables, no error bars, and no statistical tests, so the size of the reported gaps stays unclear. The stress-test point lands: nothing confirms that the transformer runs received equal hyperparameter search, augmentation strength, or adaptation effort as the CNNs. If the CNNs benefited from competition-specific tuning while the others ran closer to defaults, the architecture story weakens. The work introduces no new loss, architecture, or training method; it is an empirical head-to-head on one dataset. That keeps the contribution narrow.

This paper is for practitioners who need quick guidance on which off-the-shelf models hold up with under 200 labeled aerial images. A reader already working on canopy or vegetation segmentation would get the most from the training-strategy notes. It is not required reading for general low-data theory.

I would send it to peer review. The task is relevant, the comparison is straightforward, and referees can check the missing controls and numbers. With those added it becomes a solid applied note rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates five pretrained models—YOLOv11, Mask R-CNN, DeepLabv3, Swin-UNet, and DINOv2—for tree canopy segmentation on a small, imbalanced dataset of 150 annotated aerial images from the Solafune competition. It claims that convolution-based models, particularly YOLOv11 and Mask R-CNN, generalize significantly better than transformer-based models, attributing the gaps to stronger inductive biases in CNNs, higher data requirements for Vision Transformers, and differences between instance and semantic segmentation tasks. The work includes analysis of training strategies and augmentation policies under extreme data scarcity.

Significance. If the performance gaps are shown to arise from architecture rather than unequal optimization, the findings would provide practical guidance for selecting models in low-data remote sensing segmentation, confirming that lightweight CNNs remain reliable when annotations are scarce. The emphasis on fine-tuning details for environmental monitoring tasks adds applied value, though the lack of reported metrics limits immediate generalizability.

major comments (2)

[Abstract] Abstract and Experiments section: The central claim that 'pretrained convolution-based models... generalize significantly better' is load-bearing but unsupported by any reported quantitative metrics (e.g., IoU, mAP, Dice scores), error bars, or statistical tests, leaving the magnitude and reliability of the gaps unverified.
[Experimental Setup] Experimental Setup and Analysis sections: The attribution of gaps to 'differences between semantic and instance segmentation tasks, the high data requirements of Vision Transformers, and the lack of strong inductive biases' requires explicit evidence that all five models received equivalent hyperparameter search budgets, augmentation policies, loss formulations, and optimization effort. Without a table or protocol detailing trials per model, the comparison risks confounding architectural effects with tuning differences.
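
For readers unfamiliar with the overlap metrics named in the first comment, here is a minimal sketch of IoU and Dice for a single pair of binary masks. It is not the paper's evaluation code: the function names and the toy masks are hypothetical, and mAP (an instance-level metric) is not covered.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom  # two empty masks count as a perfect match


def dice(pred, target):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 2x3 masks (1 = canopy, 0 = background); purely illustrative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

print(iou(pred, target))   # 0.5
print(dice(pred, target))
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), so reporting both mainly helps comparison with prior work rather than adding independent evidence.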

minor comments (1)

[Abstract] Abstract: Inconsistent model naming ('DeeplabV3' vs. 'DeepLabv3') should be standardized for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below and have incorporated revisions to strengthen the quantitative support and experimental transparency.

Referee: [Abstract] Abstract and Experiments section: The central claim that 'pretrained convolution-based models... generalize significantly better' is load-bearing but unsupported by any reported quantitative metrics (e.g., IoU, mAP, Dice scores), error bars, or statistical tests, leaving the magnitude and reliability of the gaps unverified.

Authors: We agree that explicit quantitative metrics are necessary to substantiate the central claim. In the revised manuscript, we have added a results table in the Experiments section reporting IoU, mAP, and Dice scores for all five models, including error bars from five independent runs with different random seeds and statistical significance tests (paired t-tests) comparing CNN-based versus transformer-based models. These additions directly quantify the performance gaps and will be referenced in the abstract. revision: yes
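
As a rough illustration of the paired test the response describes, the sketch below computes a paired t statistic over per-seed scores using only the standard library. The score lists are placeholders invented for the example, not results from the paper, and the function name is hypothetical; a p-value would additionally need a t-distribution lookup, for example via `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples such as per-seed scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Placeholder scores for five seeds; NOT taken from the paper.
model_a_scores = [0.5, 0.6, 0.7, 0.8, 0.9]
model_b_scores = [0.4, 0.6, 0.5, 0.7, 0.6]

t_stat, dof = paired_t_statistic(model_a_scores, model_b_scores)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

Pairing by seed matters here: with only five runs, an unpaired test would fold seed-to-seed variance into the noise and could hide a consistent gap.
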
Referee: [Experimental Setup] Experimental Setup and Analysis sections: The attribution of gaps to 'differences between semantic and instance segmentation tasks, the high data requirements of Vision Transformers, and the lack of strong inductive biases' requires explicit evidence that all five models received equivalent hyperparameter search budgets, augmentation policies, loss formulations, and optimization effort. Without a table or protocol detailing trials per model, the comparison risks confounding architectural effects with tuning differences.

Authors: We acknowledge the importance of demonstrating equivalent optimization effort to support architectural attributions. The revised Experimental Setup section now includes a detailed protocol describing the hyperparameter search procedure (grid search over learning rate, batch size, and augmentation strength), the number of trials performed per model (approximately 20–25 configurations each), and the final selected settings. A new table summarizes augmentation policies, loss formulations (e.g., Dice + BCE for semantic models, mask loss for instance models), and optimizer choices, with explicit notes on any task-specific adaptations. This documentation confirms comparable tuning budgets across models. revision: yes
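
An equal tuning budget of the kind described can be made checkable by generating every model's trial list from one shared grid. The sketch below assumes a hypothetical search space; the specific values are illustrative and were chosen only so that the grid size (24) falls in the stated 20–25 range.

```python
from itertools import product

# Hypothetical search space; the values are NOT taken from the paper.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [4, 8],
    "augmentation_strength": [0.0, 0.25, 0.5, 1.0],
}

MODELS = ["YOLOv11", "Mask R-CNN", "DeepLabv3", "Swin-UNet", "DINOv2"]


def build_grid(space):
    """Expand a dict of value lists into a list of configuration dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def build_protocol(models, space):
    """Give every model its own copy of the same grid, so budgets match by construction."""
    grid = build_grid(space)
    return {model: [dict(cfg) for cfg in grid] for model in models}


protocol = build_protocol(MODELS, SEARCH_SPACE)
for model, configs in protocol.items():
    print(f"{model}: {len(configs)} configurations")
```

Publishing such a table alongside the selected setting per model would let a reader verify the "comparable tuning budgets" claim directly, and any task-specific deviation would show up as a diff against the shared grid.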

Circularity Check

0 steps flagged

No circularity: purely empirical model comparison on fixed small dataset

The paper conducts a direct experimental comparison of five pretrained architectures fine-tuned on the same 150-image Solafune dataset for canopy segmentation. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the load-bearing claims. Performance differences are reported from training runs; the conclusion that CNN-based models generalize better is an empirical observation, not a reduction to inputs by construction. The analysis remains self-contained against external benchmarks (held-out test images) with no uniqueness theorems or ansatzes imported from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical evaluation paper containing no mathematical derivations, free parameters, axioms, or invented entities.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "Our experiments show that pretrained convolution-based models, particularly YOLOv11 and Mask R-CNN, generalize significantly better than pretrained transformer-based models... due to differences between semantic and instance segmentation tasks, the high data requirements of Vision Transformers, and the lack of strong inductive biases."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "We evaluate five representative architectures... on the Solafune Tree Canopy Detection dataset... 150 annotated images"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

