arxiv: 2604.16459 · v1 · submitted 2026-04-08 · 📡 eess.AS · cs.AI· cs.CV· cs.LG· cs.SD· eess.SP

Recognition: 2 theorem links

· Lean Theorem

Deep Hierarchical Knowledge Loss for Fault Intensity Diagnosis

Yu Sha , Shuiping Gou , Bo Liu , Haofan Lu , Ningtao Liu , Jiahui Fu , Horst Stoecker , Domagoj Vnucec

show 3 more authors

Nadine Wetzstein Andreas Widl Kai Zhou

Authors on Pith no claims yet

Pith reviewed 2026-05-10 18:21 UTC · model grok-4.3

classification 📡 eess.AS cs.AIcs.CVcs.LGcs.SDeess.SP

keywords fault intensity diagnosishierarchical lossdeep learningfault diagnosissubtle fault recognitionindustrial datasetstree constraints

0 comments

The pith

A deep hierarchical knowledge loss framework models fault class dependencies to improve diagnosis of subtle faults.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a deep hierarchical knowledge loss to incorporate dependencies among target classes that prior fault intensity diagnosis methods overlooked. It builds a hierarchical tree loss using positive and negative constraints to map same-attribute classes together and adds a group tree triplet loss with dynamic margins derived from tree distances. Adaptive weighting schemes based on tree height and a focal variant increase flexibility. Tests across four industrial datasets show the combined losses raise accuracy on hard-to-distinguish faults compared with recent methods.

Core claim

The joint hierarchical tree loss and group tree triplet loss produce hierarchical consistent representations and predictions, which measurably raise recognition rates for subtle faults by enforcing tree-based positive and negative constraints plus boundary structural knowledge.

What carries the argument

The deep hierarchical knowledge loss that combines a tree-based positive/negative hierarchical constraint loss with a group tree triplet loss whose margin adapts to tree distance.

Load-bearing premise

Fault classes have stable hierarchical dependencies that a fixed tree structure can capture without creating new errors on unseen data.

What would settle it

A new industrial dataset where the assumed class tree does not match actual fault relationships and the method shows no accuracy gain or loses ground to non-hierarchical baselines.

Figures

Figures reproduced from arXiv: 2604.16459 by Andreas Widl, Bo Liu, Domagoj Vnucec, Haofan Lu, Horst Stoecker, Jiahui Fu, Kai Zhou, Nadine Wetzstein, Ningtao Liu, Shuiping Gou, Yu Sha.

**Figure 2.** Figure 2: Evolution schematic of deep hierarchical knowledge loss (DHK), comprising hierarchical tree loss [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Visualisation of the learned deep feature distribu [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Results of different evaluation metrics from various ablation experiments on Cavitation-Short. (a)-(b) and (d) are all [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Fault intensity diagnosis (FID) plays a pivotal role in intelligent manufacturing while neglecting dependencies among target classes hinders its practical deployment. This paper introduces a novel and general framework with deep hierarchical knowledge loss (DHK) to achieve hierarchical consistent representation and prediction. We develop a novel hierarchical tree loss to enable a holistic mapping of same-attribute classes, leveraging tree-based positive and negative hierarchical knowledge constraints. We further design a focal hierarchical tree loss to enhance its extensibility and devise two adaptive weighting schemes based on tree height. In addition, we propose a group tree triplet loss with hierarchical dynamic margin by incorporating hierarchical group concepts and tree distance to model boundary structural knowledge across classes. The joint two losses significantly improve the recognition of subtle faults. Extensive experiments are performed on four real-world datasets from various industrial domains (three cavitation datasets from SAMSON AG and one publicly available dataset) for FID, all showing superior results and outperforming recent state-of-the-art FID methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a DHK loss framework with tree constraints and dynamic triplet margins for fault intensity diagnosis, but the fixed hierarchy lacks validation and the gains are hard to attribute without stronger ablations.

read the letter

The main thing to know is that this work builds a Deep Hierarchical Knowledge Loss setup to enforce consistency across fault classes that share attributes or severity levels. It combines a hierarchical tree loss using positive and negative constraints, a focal variant with height-based weighting, and a group tree triplet loss that sets margins by tree distance. The claim is that the joint losses help spot subtle faults better than prior methods on industrial data.

Referee Report

3 major / 3 minor

Summary. The paper introduces a Deep Hierarchical Knowledge Loss (DHK) framework for fault intensity diagnosis (FID) that incorporates hierarchical class dependencies via a hierarchical tree loss (with positive/negative constraints), a focal hierarchical tree loss (with tree-height adaptive weighting), and a group tree triplet loss (with tree-distance dynamic margins). The central claim is that jointly optimizing these losses yields hierarchical-consistent representations that significantly improve subtle fault recognition, with extensive experiments on four real-world datasets (three cavitation datasets and one public) demonstrating outperformance over recent SOTA FID methods.

Significance. If the hierarchical dependencies are stable and the fixed tree accurately captures them without introducing spurious correlations, the DHK framework offers a principled way to embed structural knowledge into deep learning losses for FID, potentially improving generalization in industrial settings where fault intensities form natural hierarchies. The multi-dataset evaluation is a positive aspect, and the combination of focal weighting and dynamic margins addresses class imbalance and boundary issues in a targeted manner.

major comments (3)

[§3.2] §3.2 (Hierarchical Tree Construction): The fixed tree structure underlying all positive/negative constraints, focal weighting, and dynamic margins is presented without justification, sensitivity analysis to alternative hierarchies (flat or differently branched), or validation against domain-derived dependencies; this is load-bearing because incorrect groupings would enforce spurious correlations rather than genuine knowledge, directly undermining the claim that the joint losses 'significantly improve' recognition beyond dataset-specific artifacts.
[§4.2–4.3] §4.2–4.3 (Experimental Validation): While tables report superior accuracy/F1 on the four datasets, the manuscript lacks ablation studies isolating the contribution of each loss component (e.g., removing the group triplet loss), statistical significance tests (paired t-tests or Wilcoxon with p-values across runs), and standard deviations from multiple random seeds; without these, it is impossible to confirm that reported gains are robust rather than due to hyperparameter tuning or post-hoc tree selection.
[Eq. (8)–(10)] Eq. (8)–(10) (Adaptive Weighting and Dynamic Margins): The tree-height adaptive scheme and tree-distance margins assume a balanced or correctly scaled hierarchy; if leaf depths vary substantially across the manually defined tree, the weighting can over- or under-emphasize certain constraints, yet no analysis of height distribution or margin sensitivity is provided, risking unstable training on new fault distributions.

minor comments (3)

[Figure 2] Figure 2 and §3.1: The diagram of the DHK framework would benefit from explicit annotation of which loss terms operate on which embeddings (e.g., labeling the triplet sampling strategy).
[§2] §2 (Related Work): The comparison to prior hierarchical loss methods (e.g., in image classification) is brief; adding 2–3 sentences on how DHK differs from existing tree-based or metric-learning approaches would clarify novelty.
[§3] Notation in §3: The symbols for positive/negative pairs (e.g., P_h, N_h) and the exact definition of tree distance d_T should be introduced once in a table or early equation to avoid repeated inline definitions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments have identified areas where additional justification and validation will strengthen the manuscript. We address each major comment below and will incorporate the suggested revisions.

read point-by-point responses

Referee: [§3.2] §3.2 (Hierarchical Tree Construction): The fixed tree structure underlying all positive/negative constraints, focal weighting, and dynamic margins is presented without justification, sensitivity analysis to alternative hierarchies (flat or differently branched), or validation against domain-derived dependencies; this is load-bearing because incorrect groupings would enforce spurious correlations rather than genuine knowledge, directly undermining the claim that the joint losses 'significantly improve' recognition beyond dataset-specific artifacts.

Authors: We appreciate this observation. The tree structure is derived from domain knowledge of fault intensity hierarchies in cavitation processes, where classes share physical attributes such as severity levels and operational conditions, as informed by the industrial datasets from SAMSON AG. In the revision, we will expand §3.2 to provide explicit justification for the tree construction, including its grounding in real-world fault dependencies. We will also add a sensitivity analysis comparing the proposed hierarchy against flat and alternative branched structures to confirm that performance gains are not artifacts of the specific tree. revision: yes
Referee: [§4.2–4.3] §4.2–4.3 (Experimental Validation): While tables report superior accuracy/F1 on the four datasets, the manuscript lacks ablation studies isolating the contribution of each loss component (e.g., removing the group triplet loss), statistical significance tests (paired t-tests or Wilcoxon with p-values across runs), and standard deviations from multiple random seeds; without these, it is impossible to confirm that reported gains are robust rather than due to hyperparameter tuning or post-hoc tree selection.

Authors: We agree that these elements are necessary for demonstrating robustness. In the revised manuscript, we will include ablation studies that systematically isolate each DHK component (hierarchical tree loss, focal hierarchical tree loss, and group tree triplet loss). We will also report standard deviations from multiple random seeds (at least five runs per experiment) and add statistical significance tests, such as paired t-tests or Wilcoxon signed-rank tests with p-values, to validate that the observed improvements are statistically meaningful. revision: yes
Referee: [Eq. (8)–(10)] Eq. (8)–(10) (Adaptive Weighting and Dynamic Margins): The tree-height adaptive scheme and tree-distance margins assume a balanced or correctly scaled hierarchy; if leaf depths vary substantially across the manually defined tree, the weighting can over- or under-emphasize certain constraints, yet no analysis of height distribution or margin sensitivity is provided, risking unstable training on new fault distributions.

Authors: We acknowledge the potential sensitivity to tree depth variations. Our current trees exhibit relatively uniform depths (typically 3–4 levels) due to the natural structure of fault intensity classes. In the revision, we will add an explicit analysis of the height distribution across the trees used in our experiments. We will further include sensitivity studies on the adaptive weighting coefficients and dynamic margin parameters to demonstrate training stability and performance consistency under different configurations. revision: yes

Circularity Check

0 steps flagged

No circularity: new loss functions are additive constructions, not reductions to inputs

full rationale

The paper defines a novel DHK framework consisting of a hierarchical tree loss (positive/negative constraints on same-attribute classes), a focal hierarchical tree loss (with tree-height adaptive weighting), and a group tree triplet loss (with tree-distance dynamic margins). These are presented as original constructions in the method, without any equations or claims that reduce by definition to fitted parameters, prior predictions, or self-cited uniqueness theorems. The joint losses are applied to a fixed tree (presumably domain-defined) to produce representations, then validated empirically on four datasets; no step equates the output metric to the input tree or loss definitions themselves. This is a standard methods contribution with independent empirical content, so no load-bearing circularity exists.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review prevents exhaustive extraction; the central design rests on the unstated premise that fault classes admit a useful tree hierarchy and that the new loss terms will improve representation learning without additional regularization.

axioms (1)

domain assumption Fault intensity classes possess stable hierarchical relationships that can be encoded in a tree structure for positive and negative constraints.
Invoked by the design of the hierarchical tree loss and group triplet loss.

invented entities (1)

Deep Hierarchical Knowledge Loss (DHK) framework no independent evidence
purpose: To achieve hierarchical consistent representation and prediction for fault intensity diagnosis.
New composite loss introduced in the paper.

pith-pipeline@v0.9.0 · 5507 in / 1240 out tokens · 43360 ms · 2026-05-10T18:21:07.628911+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

69 extracted references · 69 canonical work pages · 2 internal anchors

[1]

Zhicheng An, Zhexu Gu, Li Yu, Ke Tu, Zhengwei Wu, Binbin Hu, Zhiqiang Zhang, Lihong Gu, and Jinjie Gu. 2024. DDCDR: A Disentangle-based Distillation Framework for Cross-Domain Recommendation. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4764–4773

work page 2024
[2]

Yuanchen Bei, Weizhi Chen, Hao Chen, Sheng Zhou, Carl Yang, Jiapei Fan, Longtao Huang, and Jiajun Bu. 2025. Correlation-aware graph convolutional networks for multi-label node classification. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 37–48

work page 2025
[3]

Long Cui, Xincheng Tian, Qingzhe Wei, and Yan Liu. 2024. A self-attention based contrastive learning method for bearing fault diagnosis.Expert Systems with Applications238 (2024), 121645

work page 2024
[4]

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xi- aohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929(2020)

work page internal anchor Pith review Pith/arXiv arXiv 2020
[5]

Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen, Min Wu, Chee-Keong Kwoh, Xiaoli Li, and Cuntai Guan. 2023. Self-supervised contrastive representa- tion learning for semi-supervised time-series classification.IEEE Transactions on Pattern Analysis and Machine Intelligence(2023)

work page 2023
[6]

Yuechun Gu, Jiajie He, and Keke Chen. 2025. Adaptive domain inference attack with concept hierarchy. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 413–424

work page 2025
[7]

Chang Guo, Zhibin Zhao, Jiaxin Ren, Shibin Wang, Yilong Liu, and Xuefeng Chen

work page
[8]

Causal explaining guided domain generalization for rotating machinery intelligent fault diagnosis.Expert Systems with Applications243 (2024), 122806

work page 2024
[9]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778

work page 2016
[10]

Neil He, Menglin Yang, and Rex Ying. 2025. Lorentzian residual neural networks. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 436–447

work page 2025
[11]

Yuan He, Moy Yuan, Jiaoyan Chen, and Ian Horrocks. 2024. Language models as hierarchy encoders.Advances in Neural Information Processing Systems37 (2024), 14690–14711

work page 2024
[12]

Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingx- ing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, et al. 2019. Searching for mobilenetv3. InProceedings of the IEEE/CVF International Confer- ence on Computer Vision. 1314–1324

work page 2019
[13]

Qiao Hu, Yi Zhou, Shixing Wang, and Futao Wang. 2020. Machine learning and fractal theory models for landslide susceptibility mapping: Case study from the Jinsha River Basin.Geomorphology351 (2020), 106975

work page 2020
[14]

Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger

work page
[15]

InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition

Densely connected convolutional networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4700–4708

work page
[16]

Jaeseok Jang and Hyuk-Yoon Kwon. 2024. Are Multiple Instance Learning Algorithms Learnable for Instances?Advances in Neural Information Processing Systems37 (2024), 10575–10612

work page 2024
[17]

Mohd Javaid, Abid Haleem, Ravi Pratap Singh, and Rajiv Suman. 2022. Artificial intelligence applications for industry 4.0: A literature-based study.Journal of Industrial Integration and Management7, 01 (2022), 83–111

work page 2022
[18]

Harshavardhan Kamarthi, Aditya B Sasanur, Xinjie Tong, Xingyu Zhou, James Peters, Joe Czyzyk, and B Aditya Prakash. 2024. Large scale hierarchical industrial demand time-series forecasting incorporating sparsity. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 5230–5239

work page 2024
[19]

Siddhant Kharbanda, Devaansh Gupta, Erik Schultheis, Atmadeep Banerjee, Cho- Jui Hsieh, and Rohit Babbar. 2024. Gandalf: Learning label-label correlations in extreme multi-label classification via label features. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1360–1371

work page 2024
[20]

Christian Lessmeier, James Kuria Kimotho, Detmar Zimmer, and Walter Sextro

work page
[21]

InPHM Society European Conference, Vol

Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification. InPHM Society European Conference, Vol. 3

work page
[22]

Chuanjiang Li, Shaobo Li, Huan Wang, Fengshou Gu, and Andrew D Ball. 2023. Attention-based deep meta-transfer learning for few-shot fine-grained fault diagnosis.Knowledge-Based Systems264 (2023), 110345

work page 2023
[23]

Gaoyang Li, Haiyi Sun, Jiachao He, Xuhui Ding, Wenkun Zhu, Caiyan Qin, Xuelan Zhang, Xinwu Zhou, Bin Yang, and Yuting Guo. 2024. Deep learning, numerical, and experimental methods to reveal hydrodynamics performance and cavitation development in centrifugal pump.Expert Systems with Applications 237 (2024), 121604

work page 2024
[24]

Jiahao Li, Huandong Wang, and Xinlei Chen. 2024. Physics-informed neural ode for post-disaster mobility recovery. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1587–1598

work page 2024
[25]

Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hong- sheng Li, and Yu Qiao. 2023. Uniformer: Unifying convolution and self-attention for visual recognition.IEEE Transactions on Pattern Analysis and Machine Intelli- gence45, 10 (2023), 12581–12600

work page 2023
[26]

Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, and Yi Yang. 2022. Deep Hierarchical Semantic Segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 1246–1257

work page 2022
[27]

Ruikun Li, Huandong Wang, Jinghua Piao, Qingmin Liao, and Yong Li. 2024. Predicting long-term dynamics of complex networks via identifying skeleton in hyperbolic space. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1655–1666

work page 2024
[28]

Yibin Li, Yan Song, Lei Jia, Shengyao Gao, Qiqiang Li, and Meikang Qiu. 2020. Intelligent fault diagnosis by fusing domain adversarial training and maximum mean discrepancy via ensemble learning.IEEE Transactions on Industrial Infor- matics17, 4 (2020), 2833–2841

work page 2020
[29]

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF International Conference on Computer Vision. 10012–10022

work page 2021
[30]

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. 2022. A convnet for the 2020s. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11976–11986

work page 2022
[31]

Weining Lu, Bin Liang, Yu Cheng, Deshan Meng, Jun Yang, and Tao Zhang. 2016. Deep model based domain adaptation for fault diagnosis.IEEE Transactions on Industrial Electronics64, 3 (2016), 2296–2305

work page 2016
[32]

Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. 2018. Shufflenet v2: Practical guidelines for efficient cnn architecture design. InProceedings of the European Conference on Computer Vision (ECCV). 116–131

work page 2018
[33]

Oded Maron and Tomás Lozano-Pérez. 1997. A framework for multiple-instance learning.Advances in Neural Information Processing Systems10 (1997)

work page 1997
[34]

Juhong Min, Yucheng Zhao, Chong Luo, and Minsu Cho. 2022. Peripheral vision transformer.Advances in Neural Information Processing Systems35 (2022), 32097– 32111

work page 2022
[35]

Arta Mohammad-Alikhani, Babak Nahid-Mobarakeh, and Min-Fu Hsieh. 2023. One-Dimensional LSTM-Regulated Deep Residual Network for Data-Driven Fault Detection in Electric Machines.IEEE Transactions on Industrial Electronics (2023)

work page 2023
[36]

Jun Pan, Yanyang Zi, Jinglong Chen, Zitong Zhou, and Biao Wang. 2017. Lift- ingNet: A novel deep learning network with layerwise feature learning from noisy mechanical data for fault classification.IEEE Transactions on Industrial Electronics65, 6 (2017), 4973–4982

work page 2017
[37]

Tongyang Pan, Jinglong Chen, Zitong Zhou, Changlei Wang, and Shuilong He

work page
[38]

A novel deep learning network via multiscale inner product with locally connected feature extraction for intelligent fault detection.IEEE Transactions on Industrial Informatics15, 9 (2019), 5119–5128

work page 2019
[39]

Milton S Plesset and Andrea Prosperetti. 1977. Bubble dynamics and cavitation. Annual Review of Fluid Mechanics9, 1 (1977), 145–185. Yu Sha et al

work page 1977
[40]

Shaowei Rao, Guoping Zou, Shiyou Yang, and Sami Barmada. 2024. A feature se- lection and ensemble learning based methodology for transformer fault diagnosis. Applied Soft Computing150 (2024), 111072

work page 2024
[41]

T-YLPG Ross and GKHP Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2980–2988

work page 2017
[42]

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang- Chieh Chen. 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510–4520

work page 2018
[43]

Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. Facenet: A unified embedding for face recognition and clustering. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 815–823

work page 2015
[44]

Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, et al. 2022. A multi-task learning for cavitation detection and cavitation intensity recognition of valve acoustic signals.Engineering Applications of Artificial Intelligence113 (2022), 104904

work page 2022
[45]

Yu Sha, Shuiping Gou, Bo Liu, Johannes Faber, Ningtao Liu, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, et al

work page
[46]

InProceedings of the 30th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining

Hierarchical Knowledge Guided Fault Intensity Diagnosis of Complex Industrial Systems. InProceedings of the 30th ACM SIGKDD Conference on Knowl- edge Discovery and Data Mining. 5657–5668

work page
[47]

Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional net- works for large-scale image recognition.arXiv preprint arXiv:1409.1556(2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[48]

Baoye Song, Yiyan Liu, Jingzhong Fang, Weibo Liu, Maiying Zhong, and Xiaohui Liu. 2024. An optimized CNN-BiLSTM network for bearing fault diagnosis under multiple working conditions with limited training samples.Neurocomputing574 (2024), 127284

work page 2024
[49]

Matthias Springstein, Stefanie Schneider, Javad Rahnama, Julian Stalter, Maxim- ilian Kristen, Eric Müller-Budack, and Ralph Ewerth. 2024. Visual Narratives: Large-scale Hierarchical Classification of Art-historical Images. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 7220–7230

work page 2024
[50]

Xinjie Sun, Kai Zhang, Shuanghong Shen, Fei Wang, Yuxiang Guo, and Qi Liu

work page
[51]

Target hierarchy-guided knowledge tracing: Fine-grained knowledge state modeling.Expert Systems with Applications251 (2024), 123898

work page 2024
[52]

Youchen Sun, Zhu Sun, Yingpeng Du, Jie Zhang, and Yew Soon Ong. 2024. Self-Supervised Denoising through Independent Cascade Graph Augmentation for Robust Social Recommendation. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2806–2817

work page 2024
[53]

Xu Wang, Jiangxia Cao, Zhiyi Fu, Kun Gai, and Guorui Zhou. 2025. Home: Hier- archy of multi-gate experts for multi-task learning at kuaishou. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 2638–2647

work page 2025
[54]

Yiming Xiao, Haidong Shao, Jie Wang, Shen Yan, and Bin Liu. 2024. Bayesian variational transformer: A generalizable model for rotating machinery fault diagnosis.Mechanical Systems and Signal Processing207 (2024), 110936

work page 2024
[55]

Fuchao Yang, Jianhong Cheng, Hui Liu, Yongqiang Dong, Yuheng Jia, and Junhui Hou. 2025. Mixed blessing: Class-wise embedding guided instance-dependent partial label learning. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 1763–1772

work page 2025
[56]

Menglin Yang, Harshit Verma, Delvin Ce Zhang, Jiahong Liu, Irwin King, and Rex Ying. 2024. Hypformer: Exploring efficient transformer fully in hyperbolic space. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3770–3781

work page 2024
[57]

Raanan Yehezkel Rohekar, Yaniv Gurwicz, Shami Nisimov, and Gal Novik. 2019. Modeling uncertainty by learning a hierarchy of deep neural connections.Ad- vances in Neural Information Processing Systems32 (2019)

work page 2019
[58]

Shihang Yu, Min Wang, Shanchen Pang, Limei Song, Xue Zhai, and Yawu Zhao

work page
[59]

TDMSAE: A transferable decoupling multi-scale autoencoder for me- chanical fault diagnosis.Mechanical Systems and Signal Processing185 (2023), 109789

work page 2023
[60]

Xiao Yu, Youjie Wang, Zhongting Liang, Haidong Shao, Kun Yu, and Wanli Yu

work page
[61]

An Adaptive Domain Adaptation Method for Rolling Bearings’ Fault Diag- nosis Fusing Deep Convolution and Self-Attention Networks.IEEE Transactions on Instrumentation and Measurement72 (2023), 1–14

work page 2023
[62]

Tonghe Zhang, Yongxing Song, Qiang Liu, Yi Ge, Linhua Zhang, and Jingting Liu. 2024. Cavitation state recognition method of centrifugal pump based on multi-dimensional feature fusion and convolutional gate recurrent unit.Physics of Fluids36, 10 (2024)

work page 2024
[63]

Yunyi Zhang, Ming Zhong, Siru Ouyang, Yizhu Jiao, Sizhe Zhou, Linyi Ding, and Jiawei Han. 2024. Automated mining of structured knowledge from text in the era of large language models. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 6644–6654

work page 2024
[64]

Yifei Zhang, Hao Zhu, Menglin Yang, Jiahong Liu, Rex Ying, Irwin King, and Piotr Koniusz. 2025. Understanding and mitigating hyperbolic dimensional collapse in graph contrastive learning. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1. 1984–1995

work page 2025
[65]

Zheng Zhang, Allen Zhang, Ruth Nelson, Giorgio Ascoli, and Liang Zhao. 2024. Representation Learning of Geometric Trees. InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4374–4385

work page 2024
[66]

Minghang Zhao, Shisheng Zhong, Xuyun Fu, Baoping Tang, Shaojiang Dong, and Michael Pecht. 2020. Deep residual networks with adaptively parametric rectifier linear units for fault diagnosis.IEEE Transactions on Industrial Electronics 68, 3 (2020), 2587–2597

work page 2020
[67]

Yongwei Zhao, Zidong Du, Qi Guo, Shaoli Liu, Ling Li, Zhiwei Xu, Tianshi Chen, and Yunji Chen. 2019. Cambricon-F: machine learning computers with fractal von Neumann architecture. InProceedings of the 46th International Symposium on Computer Architecture. 788–801. Deep Hierarchical Knowledge Loss for Fault Intensity Diagnosis Contents of Appendix A Prelim...

work page 2019
[68]

A D I B E F C G H J K L M anchor positive negative K F G I LMD J K F G I LMD J K F G I LMD J K F G I LMD J e.g

Therefore, we have𝑚 𝜎 ∈ (0,1]. A D I B E F C G H J K L M anchor positive negative K F G I LMD J K F G I LMD J K F G I LMD J K F G I LMD J e.g. Figure A2: Schematic diagram of the maximum and mini- mum boundaries for anchor sample nodes, positive sample nodes and negative sample nodes. The left part shows a given hierarchical tree and the right part provid...

work page 2070
[69]

traffic scene

From Table 11, It can clearly be seen that there is almost no difference in the inference time between DHK and CCE. E Discussion In this section, we reflect on the key assumptions underlying our method (DHK), discuss its limitations, extensibility and consider the broader impact of our work. In addition, we also outline potential directions for future imp...

work page