TowerDataset: A Heterogeneous Benchmark for Transmission Corridor Segmentation with a Global-Local Fusion Framework
Pith reviewed 2026-05-10 06:54 UTC · model grok-4.3
The pith
TowerDataset provides 661 long real-world scenes and a 22-class taxonomy to benchmark fine-grained segmentation of heterogeneous transmission corridors, supported by a global-local fusion framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TowerDataset establishes a heterogeneous benchmark of 661 real transmission corridor scenes with a 22-class taxonomy that preserves long extents and long-tail distributions, while the global-local fusion framework combines whole-scene topological context with local geometric precision to improve recognition of rare and confusing components in point cloud segmentation.
What carries the argument
The global-local fusion framework. A whole-scene branch, trained with NoCrop and prototypical contrastive learning, models long-range topology; a block-wise local branch retains fine geometric structures; and geometric validation fuses and refines the two predictions.
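As a rough illustration of the fuse-and-refine step, the sketch below merges per-point labels from the two branches. The abstract does not say how geometric validation works, so everything here is an assumption made for illustration: the function name fuse_predictions, the rule that the whole-scene label wins unless the local label passes a check, and the toy height test for a hypothetical insulator class.

```python
from typing import Callable, List, Sequence


def fuse_predictions(
    global_labels: Sequence[int],
    local_labels: Sequence[int],
    is_valid: Callable[[int, int], bool],
) -> List[int]:
    """Fuse per-point labels from a whole-scene and a local branch.

    Assumed rule (not from the paper): where the branches agree, keep the
    label; where they disagree, take the local label only if it passes the
    geometric check, otherwise fall back to the whole-scene label.
    """
    if len(global_labels) != len(local_labels):
        raise ValueError("both branches must label the same points")
    fused = []
    for idx, (g, l) in enumerate(zip(global_labels, local_labels)):
        if g != l and is_valid(idx, l):
            fused.append(l)
        else:
            fused.append(g)
    return fused


# Toy scene: class ids and heights are made up for illustration.
INSULATOR = 2
heights = [0.5, 15.0, 20.0, 3.0, 2.0]  # metres above ground


def height_check(idx: int, label: int) -> bool:
    # Hypothetical validation: insulators may only appear above 10 m.
    return label != INSULATOR or heights[idx] > 10.0


global_pred = [0, 1, 1, 2, 1]
local_pred = [0, 2, 1, 1, 2]
fused = fuse_predictions(global_pred, local_pred, height_check)
print(fused)
```

In this toy case the local branch overrides the whole-scene label at two points and is rejected at the last one, where an insulator label sits too close to the ground. A real validation step would presumably use richer geometry than a single height threshold.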
If this is right
- Standardized evaluation on TowerDataset will expose the gap between current methods and requirements for long heterogeneous corridors.
- The fusion design allows models to maintain performance on both common corridor elements and safety-critical rare components.
- Public release of the dataset and splits will enable consistent comparison of future segmentation techniques for transmission inspection.
- The framework's handling of long-range dependencies suggests it can scale to other extended linear infrastructure scenes.
Where Pith is reading between the lines
- If the taxonomy aligns with operational inspection needs, the benchmark could support development of end-to-end systems that flag maintenance issues directly from point clouds.
- The separation of global topology learning from local detail processing may transfer to segmentation of other long, variable structures such as pipelines or rail tracks.
- Independent testing on additional corridors outside the 661 scenes would reveal whether the reported robustness holds under broader geographic and seasonal variation.
Load-bearing premise
The 661 scenes and 22-class taxonomy capture enough real-world variability that the global-local fusion preserves complementary cues without creating new errors on rare classes.
What would settle it
An experiment in which the fusion framework produces higher error rates than single-branch baselines on rare classes or on corridor scenes with different topologies would indicate that the approach does not reliably integrate global and local information.
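One way to run this comparison is to compute per-class IoU for the fused output and for a single-branch baseline, then list the rare classes on which fusion scores lower. The sketch below is a minimal pure-Python version; the function names and the toy labels are illustrative and do not come from the paper, whose evaluation protocol is not reproduced on this page.

```python
from typing import List, Optional, Sequence


def per_class_iou(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> List[Optional[float]]:
    """Return IoU = TP / (TP + FP + FN) per class, or None if a class is absent."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[t] += 1  # true positive for class t
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    return [inter[c] / union[c] if union[c] else None for c in range(num_classes)]


def rare_class_regressions(
    fused_iou: Sequence[Optional[float]],
    baseline_iou: Sequence[Optional[float]],
    rare_classes: Sequence[int],
) -> List[int]:
    """List the rare classes where the fused model scores below the baseline."""
    worse = []
    for c in rare_classes:
        f, b = fused_iou[c], baseline_iou[c]
        if f is not None and b is not None and f < b:
            worse.append(c)
    return worse


# Toy labels: class 0 is common, classes 1 and 2 are treated as rare.
target = [0, 0, 0, 0, 1, 1, 2, 2]
baseline_pred = [0, 0, 0, 0, 1, 1, 2, 0]
fused_pred = [0, 0, 0, 0, 1, 0, 2, 2]

baseline_iou = per_class_iou(baseline_pred, target, 3)
fused_iou = per_class_iou(fused_pred, target, 3)
print(rare_class_regressions(fused_iou, baseline_iou, rare_classes=[1, 2]))
```

A non-empty result on held-out corridors, repeated across seeds, would be the kind of evidence described above; an empty result on a toy example like this one proves nothing by itself.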
Original abstract
Fine-grained semantic segmentation of transmission-corridor point clouds is fundamental for intelligent power-line inspection. However, current progress is limited by realistic data scarcity and the difficulty of modeling global corridor structure and local geometric details in long, heterogeneous scenes. Existing public datasets usually provide only a few coarse categories or short cropped scenes which overlook long-range structural dependencies, severe long-tail distributions, and subtle distinctions among safety-critical components. As a result, current methods are difficult to evaluate under realistic inspection settings, and their ability to preserve and integrate complementary global and local cues remains unclear. To address the above challenges, we introduce TowerDataset, a heterogeneous benchmark for transmission-corridor segmentation. TowerDataset contains 661 real-world scenes and about 2.466 billion points. It preserves long corridor extents, defines a fine-grained 22-class taxonomy, and provides standardized splits and evaluation protocols. In addition, we present a global-local fusion framework which preserves and fuses whole-scene and local-detail information. A whole-scene branch with NoCrop training and prototypical contrastive learning captures long-range topology and contextual dependencies. A block-wise local branch retains fine geometric structures. Both predictions are then fused and refined by geometric validation. This design allows the model to exploit both global relationships and local shape details when recognizing rare and confusing components. Experiments on TowerDataset and two public benchmarks demonstrate the challenge of the proposed benchmark and the robustness of our framework in real, complex, and heterogeneous transmission-corridor scenes. The dataset will be released soon at https://huggingface.co/datasets/tccx18/Towerdataset/tree/main.
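The abstract names prototypical contrastive learning but gives no loss. As a hedged point of reference, the sketch below implements a generic prototype-based objective in the style of prototypical networks: each class prototype is the mean feature of that class, and each point is scored by a softmax over negative squared distances to the prototypes. This is a stand-in that may differ from the paper's actual formulation; all function names and the temperature parameter are assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_prototypes(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Compute one prototype per class as the mean of that class's features."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_loss(
    features: Sequence[Vector],
    labels: Sequence[int],
    prototypes: Dict[int, List[float]],
    temperature: float = 1.0,
) -> float:
    """Mean cross-entropy over softmax(-squared distance / temperature)."""
    total = 0.0
    for f, y in zip(features, labels):
        logits = {c: -squared_distance(f, p) / temperature for c, p in prototypes.items()}
        m = max(logits.values())  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_z - logits[y]
    return total / len(features)


# Toy features: well-separated classes versus classes that overlap.
labels = [0, 0, 1, 1]
separated = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]]
overlapping = [[0.0, 0.0], [5.0, 5.0], [0.0, 0.1], [5.0, 5.1]]

loss_separated = prototype_loss(separated, labels, class_prototypes(separated, labels))
loss_overlapping = prototype_loss(overlapping, labels, class_prototypes(overlapping, labels))
print(loss_separated, loss_overlapping)
```

The loss is close to zero when each class forms a tight cluster and grows when classes share the same region of feature space, which is the pressure such an objective would put on rare, easily confused components.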
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TowerDataset, a heterogeneous benchmark containing 661 real-world transmission corridor scenes with approximately 2.466 billion points and a fine-grained 22-class taxonomy, along with standardized splits and evaluation protocols. It also proposes a global-local fusion framework consisting of a whole-scene NoCrop branch with prototypical contrastive learning, a block-wise local branch, and fusion via geometric validation to capture long-range topology and local geometric details. Experiments on TowerDataset and two public benchmarks are claimed to demonstrate the benchmark's challenges and the framework's robustness in complex, heterogeneous scenes.
Significance. If the dataset is publicly released at the stated scale and the empirical claims are supported by detailed metrics, this work could meaningfully advance fine-grained point cloud segmentation for power-line inspection by addressing long-scene dependencies and long-tail distributions that existing datasets overlook. The framework's explicit design for preserving complementary global and local cues is a relevant technical contribution. Credit is given for the intent to release the data with standardized protocols.
Major comments (2)
- [Abstract] The statement that 'Experiments on TowerDataset and two public benchmarks demonstrate the challenge of the proposed benchmark and the robustness of our framework' is unsupported, as the manuscript provides no quantitative results, mIoU values, per-class metrics, error bars, or tables.
- [Methods and Experiments] No ablation studies isolate the contribution of the global-local fusion components (NoCrop training, prototypical contrastive learning, block-wise local branch, geometric validation) on rare classes within the 22-class taxonomy, leaving the claim that the framework 'preserves and fuses' cues 'without introducing new failure modes' untested despite the benchmark's explicit motivation around long-tail distributions and safety-critical components.
Minor comments (2)
- [Abstract] The Hugging Face link uses 'Towerdataset' while the title and text use 'TowerDataset'; ensure consistent capitalization.
- [Abstract] The point count is phrased as 'about 2.466 billion points'; provide the exact total or a precise range for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review of our manuscript. We appreciate the recognition of the benchmark's potential impact and the framework's design intent. We address each major comment below and outline the revisions we will make to strengthen the paper.
- Referee: [Abstract] The statement that 'Experiments on TowerDataset and two public benchmarks demonstrate the challenge of the proposed benchmark and the robustness of our framework' is unsupported, as the manuscript provides no quantitative results, mIoU values, per-class metrics, error bars, or tables.
  Authors: We agree that the abstract claim is currently unsupported by quantitative evidence in the manuscript. The experiments section describes the evaluation protocol and setup on TowerDataset and the two public benchmarks but does not present the corresponding numerical results, tables, mIoU scores, per-class metrics, or error bars. We will revise the abstract to accurately reflect the experimental contributions without overstating the validation and will add a complete results section with all requested quantitative metrics, tables, and analysis in the revised manuscript.
  Revision: yes
- Referee: [Methods and Experiments] No ablation studies isolate the contribution of the global-local fusion components (NoCrop training, prototypical contrastive learning, block-wise local branch, geometric validation) on rare classes within the 22-class taxonomy, leaving the claim that the framework 'preserves and fuses' cues 'without introducing new failure modes' untested despite the benchmark's explicit motivation around long-tail distributions and safety-critical components.
  Authors: We acknowledge that the manuscript lacks dedicated ablation studies isolating each fusion component's contribution specifically on rare classes in the 22-class taxonomy. While the overall framework motivation addresses long-tail distributions and safety-critical elements, the individual effects of NoCrop training, prototypical contrastive learning, the block-wise local branch, and geometric validation on these classes, and confirmation that fusion introduces no new failure modes, are not empirically isolated. We will add targeted ablation experiments in the revision, with per-class metrics on rare categories, to directly test and support these claims.
  Revision: yes
Circularity Check
No circularity: empirical dataset release and architecture description with no self-referential derivations
The paper introduces TowerDataset (661 scenes, 22-class taxonomy, ~2.466B points) and describes a global-local fusion framework (NoCrop whole-scene branch with prototypical contrastive learning, block-wise local branch, geometric validation fusion). No equations, fitted parameters, or derivation steps are present that could reduce to inputs by construction. The central claims rest on experimental results on the new benchmark plus two public sets; these are falsifiable empirical outcomes rather than algebraic identities or self-cited uniqueness theorems. Self-citations, if any, are not load-bearing for any claimed derivation. This is a standard empirical benchmark paper whose validity hinges on data quality and ablation completeness, not on circular reasoning.