pith. machine review for the scientific record.

arxiv: 2604.13722 · v1 · submitted 2026-04-15 · 💻 cs.CV

Recognition: unknown

Granularity-Aware Transfer for Tree Instance Segmentation in Synthetic and Real Forests


Pith reviewed 2026-05-10 13:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords instance segmentation · domain transfer · synthetic data · forestry · granularity · distillation · tree detection

The pith

Granularity-aware distillation transfers fine-grained synthetic tree annotations to improve segmentation on real coarse-labeled forest images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles synthetic-to-real transfer for tree instance segmentation in a setting where synthetic data offers fine-grained trunk and crown labels but real data carries only coarse whole-tree labels. It introduces MGTD, a mixed-granularity dataset with 53k synthetic and 3.6k real images, and a four-stage protocol that separates domain shift from granularity mismatch. The key method, granularity-aware distillation, merges logits from fine-grained synthetic teachers and unifies their masks to pass structural priors to a student trained on coarse labels, yielding higher mask average precision, especially for small and distant trees. The result matters because it shows how detailed synthetic data can compensate for limited real annotations in practical forestry applications.
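
The four-stage protocol is only sketched in the abstract; the phase names visible in the figure captions (Phases 1–3 are named there) suggest the sequence below. A minimal sketch with phase roles inferred, not quoted from the paper:

```python
# Hedged sketch of the four-stage protocol; phase roles are inferred from
# the figure captions, not quoted from the paper itself.
PROTOCOL = [
    ("phase_1", "train teachers on synthetic data with fine trunk/crown labels"),
    ("phase_2", "test synthetic-trained models directly on real images to expose the domain gap"),
    ("phase_3", "train a baseline directly on real coarse tree labels"),
    ("phase_4", "granularity-aware distillation from synthetic teachers to the coarse-label student"),
]

def summarize(protocol):
    """Render the protocol as one line per phase."""
    return "\n".join(f"{name}: {role}" for name, role in protocol)
```

Reading the phases this way, Phases 2 and 3 act as controls that isolate domain shift and label coarseness respectively, so any Phase 4 gain can be attributed to the distillation step.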

Core claim

The authors establish that granularity-aware distillation, which performs logit-space merging and mask unification to transfer structural priors from fine-grained synthetic teachers to coarse-label students, yields consistent improvements in mask AP on real forest images despite domain shift and label coarseness.

What carries the argument

Granularity-aware distillation via logit-space merging and mask unification to align fine synthetic priors with coarse real labels.
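
The abstract does not spell out the merge operator or the distillation objective, so the following is a sketch under stated assumptions: a log-sum-exp merge of per-pixel trunk and crown logits, mask union for unification, and an L2 distillation penalty. All function names are hypothetical.

```python
import numpy as np

def merge_fine_logits(trunk_logits, crown_logits):
    # Log-sum-exp merge of per-pixel trunk and crown teacher logits into a
    # single coarse "tree" logit map; one plausible realization of
    # logit-space merging, not the paper's confirmed operator.
    stacked = np.stack([trunk_logits, crown_logits])
    m = stacked.max(axis=0)
    return m + np.log(np.exp(stacked - m).sum(axis=0))

def unify_masks(trunk_mask, crown_mask):
    # Union of binary trunk/crown instance masks into one whole-tree mask,
    # matching the coarse "Tree" label granularity of the real data.
    return np.logical_or(trunk_mask, crown_mask)

def distill_loss(student_logit, merged_teacher_logit):
    # Simple L2 distillation penalty between the student's coarse tree
    # logit and the merged teacher logit (the paper may instead use a
    # KL-divergence or mask-level objective).
    return float(np.mean((student_logit - merged_teacher_logit) ** 2))
```

Log-sum-exp is a natural choice here because it behaves like a soft maximum: a pixel confidently labeled as either trunk or crown by a teacher stays confidently "tree" after merging.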

If this is right

  • Consistent gains in mask AP for tree instance segmentation on real data.
  • Particular benefits for detecting small and distant trees.
  • Provides an isolated testbed for studying granularity mismatch in sim-to-real transfer.
  • Enables better use of synthetic data in scenarios with limited real labeling resources.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method might extend to other segmentation tasks with hierarchical or multi-level labels, such as in urban scene parsing.
  • Combining this with other domain adaptation techniques could further reduce the performance gap.
  • If more detailed real labels become available through semi-supervised means, they could be integrated into the unification step for additional gains.

Load-bearing premise

Structural priors learned from fine-grained synthetic annotations about tree trunks and crowns remain transferable and beneficial even when the target real labels are coarse and the images come from a different domain.

What would settle it

Train a model solely on the real coarse labels and compare its mask AP to the distilled model's on the same real test set; if the distilled model shows no gain, or degrades, the claim is falsified.
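
That comparison can be sketched with a lightweight proxy for the mask-AP protocol: mask IoU, greedy matching at an IoU threshold, and a paired per-image gain. These helpers are illustrative, not the paper's evaluation code; full mask AP averages over IoU thresholds and recall levels, which this omits.

```python
import numpy as np

def mask_iou(a, b):
    # Intersection-over-union of two boolean instance masks.
    a, b = a.astype(bool), b.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 0.0

def matched_recall(pred_masks, gt_masks, thr=0.5):
    # Greedy one-to-one matching of predictions to ground truth at an IoU
    # threshold; recall here stands in for the full mask-AP computation.
    matched = set()
    for p in pred_masks:
        for i, g in enumerate(gt_masks):
            if i not in matched and mask_iou(p, g) >= thr:
                matched.add(i)
                break
    return len(matched) / len(gt_masks) if gt_masks else 0.0

def distillation_gain(baseline_scores, distilled_scores):
    # Paired per-image comparison on the same real test set; a
    # non-positive mean gain would falsify the distillation claim.
    return float(np.mean(np.asarray(distilled_scores) - np.asarray(baseline_scores)))
```

The paired design matters: both models must be scored on identical real test images so the gain reflects the distillation step rather than test-set variation.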

Figures

Figures reproduced from arXiv: 2604.13722 by Anandatirtha JS, Anmol Ashri, Atef Tej, Karsten Berns, Pankaj Deoli.

Figure 1: Phase 1: Instance segmentation results on synthetic validation data. Top row …
Figure 2: Phase 2: Domain transfer. Comparison of predictions obtained from …
Figure 3: Qualitative comparison of real-only training (Phase 3) and granularity-aware distillation …
Figure 1: Sample images from the MGTD dataset. Includes both the real and simulated …
Figure 2: Phase 1: Instance segmentation results on Tree Trunk examples. Each row shows the RGB image, model prediction, and ground truth mask.
Figure 3: Phase 1: Instance segmentation results on Whole tree examples. Each row shows the RGB image, model prediction, and ground truth mask.
Figure 4: Phase 2: Domain transfer (Tree Trunk → Real Trees).
Figure 5: Phase 2: Domain transfer (Whole Tree → Real Trees).
Figure 6: Phase 3: Instance segmentation (Mask R-CNN with Swin-T backbone) results on real trees (trained directly on real data). Each row shows the RGB image, prediction, and ground truth.
Figure 7: Phase 1: Qualitative analysis of yolov11m (trained on simulated tree trunks) on the simulated tree trunks val set.
Figure 8: Phase 1: Qualitative analysis of yolov8m (trained on simulated whole trees) on the simulated whole trees val set. The predictions align closely with the ground truth annotations, capturing both near- and far-field trees. Occasional errors occur in cases of severe occlusion or trees with very thin stems, where detections are sometimes fragmented …
Figure 9: Phase 2: Domain gap when the best simulated models (tree trunk and whole tree) were directly tested on real tree images. Qualitative examples further highlight these quantitative differences: predictions on real images reveal that the trunk-trained model often detects only a limited subset of trees, focusing on highly salient or well-lit stems while missing thinner or background trunks, particularly under he…
read the original abstract

We address the challenge of synthetic-to-real transfer in forestry perception where real data have only coarse Tree labels while synthetic data provide fine-grained trunk/crown annotations. We introduce MGTD, a mixed-granularity dataset with 53k synthetic and 3.6k real images, and a four-stage protocol isolating domain shift and granularity mismatch. Our core contribution is granularity-aware distillation, which transfers structural priors from fine-grained synthetic teachers to a coarse-label student via logit-space merging and mask unification. Experiments show consistent mask AP gains, especially for small/distant trees, establishing a testbed for Sim-Real transfer under label granularity constraints.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces MGTD, a mixed-granularity dataset with 53k synthetic images providing fine-grained trunk/crown annotations and 3.6k real images with coarse tree labels. It proposes a four-stage protocol that isolates domain shift from granularity mismatch, along with granularity-aware distillation that transfers structural priors via logit-space merging and mask unification from a fine-grained synthetic teacher to a coarse-label student. Experiments report consistent mask AP gains, with particular benefits for small and distant trees.

Significance. If the reported gains prove robust, the work provides a practical testbed and technique for sim-to-real transfer in forestry instance segmentation under realistic label-granularity constraints. The isolation of factors in the protocol and the emphasis on small/distant trees align with application needs in forest inventory and perception.

minor comments (3)
  1. Abstract: The four-stage protocol and logit-space merging/mask unification steps are described at a high level; a diagram or pseudocode in §3 would clarify how fine-grained priors survive the unification without introducing label-induced bias.
  2. Abstract: No numerical AP values, baseline comparisons, or statistical tests are mentioned; the full experiments section should include these to substantiate the 'consistent gains' claim.
  3. Abstract: Consider spelling out MGTD on first use and confirming whether the dataset will be released publicly, as it is positioned as a core contribution.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. We appreciate the recognition that the MGTD dataset and four-stage protocol provide a practical testbed for isolating domain shift from granularity mismatch, and that granularity-aware distillation offers a useful technique for transferring structural priors to coarse real labels, with benefits for small and distant trees.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an empirical method consisting of a new mixed-granularity dataset and a four-stage transfer protocol that applies standard distillation and domain-adaptation techniques to tree instance segmentation. No mathematical derivations, first-principles predictions, or equations are described in the provided text. The central claims rest on reported experimental mask AP improvements rather than any reduction of outputs to fitted inputs or self-referential definitions by construction. No load-bearing self-citations or ansatz smuggling are visible; the approach is self-contained against external benchmarks and does not invoke uniqueness theorems or prior author results to force its conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that synthetic fine-grained annotations encode transferable structural priors and that logit merging plus mask unification can bridge granularity without introducing new biases.

pith-pipeline@v0.9.0 · 5413 in / 1142 out tokens · 28805 ms · 2026-05-10T13:52:10.424947+00:00 · methodology


Reference graph

Works this paper leans on

21 extracted references · 8 canonical work pages · 1 internal anchor

  1. Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Xu, J., Zhang, Z., Cheng, D., Zhu, C., Cheng, T., Zhao, Q., Li, B., Lu, X., Zhu, R., Wu, Y., Dai, J., Wang, J., Shi, J., Ouyang, W., Loy, C.C., Lin, D.: MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)
  2. Deng, J., Li, W., Chen, Y., Duan, L.: Unbiased mean teacher for cross-domain object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4091–4101 (2021)
  3. Deoli, P., Deshpande, S.A., Vierling, A., Berns, K.: Exploring image fusion techniques for off-road semantic segmentation in harsh lighting conditions: a multispectral imagery analysis. In: 2024 21st International Conference on Ubiquitous Robots (UR). pp. 566–573 (2024). https://doi.org/10.1109/UR61395.2024.10597528
  4. Dubey, A., Gupta, O., Raskar, R., Naik, N.: Maximum entropy fine-grained classification. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. pp. 635–645. NIPS'18, Curran Associates Inc., Red Hook, NY, USA (2018)
  5. Feng, Z., She, Y., Keshav, S.: Spread: A large-scale, high-fidelity synthetic dataset for multiple forest vision tasks. Ecological Informatics 87, 103085 (2025)
  6. Grondin, V., Pomerleau, F., Giguère, P.: Training deep learning algorithms on synthetic forest images for tree detection. In: ICRA 2022 Workshop in Innovation in Forestry Robotics: Research and Industry Adoption (2022)
  7. Gu, X., Lin, T.Y., Kuo, W., Cui, Y.: Open-vocabulary object detection via vision and language knowledge distillation. arXiv preprint arXiv:2104.13921 (2021)
  8. Jiang, P., Osteen, P., Wigness, M., Saripalli, S.: RELLIS-3D dataset: Data, benchmarks and analysis. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). pp. 1110–1116. IEEE (2021)
  9. Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., Dollár, P., Girshick, R.: Segment anything. arXiv:2304.02643 (2023)
  10. Lagos, J., Lempiö, U., Rahtu, E.: FinnWoodlands dataset. In: Scandinavian Conference on Image Analysis. pp. 95–110. Springer (2023)
  11. Li, R., Sun, G., Wang, S., Tan, T., Xu, F.: Tree trunk detection in urban scenes using a multiscale attention-based deep learning method. Ecological Informatics 77, 102215 (2023). https://doi.org/10.1016/j.ecoinf.2023.102215
  12. Puliti, S., Pearse, G., Surový, P., Wallace, L., Hollaus, M., Wielgosz, M., Astrup, R.: FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees. arXiv preprint arXiv:2309.01279 (2023)
  13. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2016)
  14. Steininger, D., Simon, J., Trondl, A., Murschitz, M.: TimberVision: A multi-task dataset and framework for log-component segmentation and tracking in autonomous forestry operations. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 5601–5610 (2025). https://doi.org/10.1109/WACV61041.2025.00547
  15. Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems 30 (2017)
  16. Tranheden, W., Olsson, V., Pinto, J., Svensson, L.: DACS: Domain adaptation via cross-domain mixed sampling. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1379–1389 (2021)
  17. Tsai, Y.H., Hung, W.C., Schulter, S., Sohn, K., Yang, M.H., Chandraker, M.: Learning to adapt structured output space for semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
  18. Weinstein, B., Marconi, S., Zare, A., Bohlman, S., Graves, S., Singh, A., White, E.: NEON tree crowns dataset (2020)
  19. Wigness, M., Eum, S., Rogers, J.G., Han, D., Kwon, H.: A RUGD dataset for autonomous navigation and visual perception in unstructured outdoor environments. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 5000–5007. IEEE (2019)
  20. Zhao, B., Feng, J., Wu, X., Yan, S.: A survey on deep learning-based fine-grained object classification and semantic segmentation. International Journal of Automation and Computing 14 (01 2017). https://doi.org/10.1007/s11633-017-1053-3
  21. Zhou, X., Girdhar, R., Joulin, A., Krähenbühl, P., Misra, I.: Detecting twenty-thousand classes using image-level supervision. In: ECCV (2022)