pith.

arXiv: 2605.26984 · v1 · pith:HWUIMKSM · submitted 2026-05-26 · 💻 cs.LG

TED: Related Party Transaction guided Tax Evasion Detection on Heterogeneous Graph

Pith reviewed 2026-06-29 19:23 UTC · model grok-4.3

classification 💻 cs.LG
keywords: tax evasion detection, heterogeneous graph, graph neural network, related party transaction, hierarchical attention, machine learning, risk management

The pith

A heterogeneous graph model guided by related party transactions detects tax evasion better than statistical features alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models tax scenarios as heterogeneous graphs to capture interactive signals between companies instead of relying only on individual statistical features. It proposes the TED graph neural network that filters low-level noise through related party transaction groups and applies hierarchical attention to extract deeper structure and semantics. The method is deployed in a real tax bureau risk system and evaluated on two human-labeled datasets, where it outperforms prior state-of-the-art approaches. A sympathetic reader would care because improved detection could reduce government revenue losses and support fair economic competition.

Core claim

By representing the tax evasion detection problem on a heterogeneous graph and designing the TED model to filter via related party transaction groups while using hierarchical attention for semantic information, the approach extracts comprehensive interactive signals and significantly outperforms existing methods on two real-world labeled tax datasets.

What carries the argument

The TED graph neural network, which filters noise with related party transaction groups and applies hierarchical attention to capture deeper structure and semantics in the heterogeneous tax graph.
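The paper's exact grouping and filtering rule is not spelled out on this page. A minimal sketch of the filtering idea, assuming (hypothetically) that a related party group is a connected component of "control" edges such as shared shareholders or legal representatives, and that filtering keeps only transaction edges whose two endpoints fall in the same group:

```python
def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def related_party_groups(control_edges):
    """Group entities linked by control relations (hypothetical edge type).

    control_edges: iterable of (a, b) pairs, e.g. (person, company).
    Returns a dict mapping each entity to its group id (union-find root).
    """
    parent = {}
    for a, b in control_edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    return {x: find(parent, x) for x in parent}


def filter_transactions(transactions, groups):
    """Keep only transactions between members of the same related party group.

    transactions: iterable of (seller, buyer, amount) tuples. Edges touching an
    entity outside every group, or crossing groups, are dropped as noise.
    """
    kept = []
    for seller, buyer, amount in transactions:
        gs, gb = groups.get(seller), groups.get(buyer)
        if gs is not None and gs == gb:
            kept.append((seller, buyer, amount))
    return kept


if __name__ == "__main__":
    control = [("P1", "C1"), ("P1", "C2"), ("P2", "C3")]
    tx = [("C1", "C2", 500.0), ("C2", "C3", 80.0), ("C3", "C4", 10.0)]
    groups = related_party_groups(control)
    print(filter_transactions(tx, groups))  # [('C1', 'C2', 500.0)]
```

Union-find keeps grouping near-linear in the number of control edges; whether TED treats cross-group trades as noise or as a separate relation type is an open assumption here.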

If this is right

  • Interactive company relations captured in the graph improve detection accuracy over isolated statistical features.
  • The model integrates directly into existing tax bureau risk management systems for operational use.
  • Filtering with related party groups effectively reduces noise in complex transaction data.
  • Hierarchical attention extracts semantic patterns hidden in transaction structures.
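The hierarchical attention in the last bullet is not specified in detail here. A minimal NumPy sketch in the style of the heterogeneous graph attention network (HAN), which is an assumption rather than TED's confirmed formulation: node-level attention aggregates neighbours within each relation type, then semantic-level attention weights the per-relation embeddings. Weights are random placeholders, not trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def node_level(h, adj, W, a):
    """GAT-style attention over neighbours for one relation type.

    h: (N, F) features; adj: (N, N) 0/1 adjacency WITH self-loops, so every
    row has at least one neighbour; W: (F, D) projection; a: (2*D,) vector.
    """
    z = h @ W                              # (N, D) projected features
    d = z.shape[1]
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]  # (N, N) raw scores e_ij
    e = np.where(e > 0, e, 0.2 * e)        # LeakyReLU with slope 0.2
    e = np.where(adj > 0, e, -np.inf)      # mask non-neighbours
    alpha = softmax(e, axis=1)             # normalise over neighbours j of i
    return alpha @ z                       # (N, D) aggregated embeddings


def semantic_level(zs, W_s, b_s, q):
    """Fuse per-relation embeddings zs (list of (N, D)) with learned weights."""
    scores = np.array([np.mean(np.tanh(z @ W_s + b_s) @ q) for z in zs])
    beta = softmax(scores)                 # one importance weight per relation
    return sum(b * z for b, z in zip(beta, zs)), beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, f, d = 5, 4, 3
    h = rng.normal(size=(n, f))
    # Two hypothetical relation types inside one related party group.
    adj_trade = np.eye(n)
    adj_trade[0, 1] = adj_trade[1, 0] = 1.0
    adj_ctrl = np.eye(n)
    adj_ctrl[2, 3] = adj_ctrl[3, 2] = 1.0
    zs = [node_level(h, adj, rng.normal(size=(f, d)), rng.normal(size=2 * d))
          for adj in (adj_trade, adj_ctrl)]
    fused, beta = semantic_level(zs, rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d))
    print(fused.shape, beta)  # (5, 3) and two relation weights that sum to 1
```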

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same filtering approach on relational groups could extend to fraud detection in banking or insurance networks.
  • Performance would likely drop if transaction data is too sparse to form meaningful related party groups.
  • Adding temporal layers to the graph could track evolving evasion patterns over time.
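The sparsity concern in the second bullet can be checked before training. A minimal diagnostic sketch, assuming a hypothetical `groups` mapping from entity to group id (for example, from a union-find over control relations): it reports the fraction of companies that belong to a group containing at least `min_size` companies.

```python
from collections import Counter


def group_coverage(groups, companies, min_size=2):
    """Fraction of companies in a related party group with >= min_size companies.

    groups: dict entity -> group id (hypothetical input); entities absent from
    the dict belong to no group. Only companies are counted toward group size.
    """
    companies = list(companies)
    if not companies:
        return 0.0
    sizes = Counter(groups[c] for c in companies if c in groups)
    covered = sum(1 for c in companies if c in groups and sizes[groups[c]] >= min_size)
    return covered / len(companies)


if __name__ == "__main__":
    groups = {"P1": "g1", "C1": "g1", "C2": "g1", "C3": "g2"}
    print(group_coverage(groups, ["C1", "C2", "C3", "C4"]))  # 0.5
```

A low value would suggest that group-based filtering leaves most companies with no relational signal at all.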

Load-bearing premise

That related party transaction groups contain interactive signals that improve evasion detection beyond what statistical company features already provide.
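One way to isolate this premise is an ablation that trains the same classifier on statistical features alone and on those features plus a graph-derived signal. A minimal sketch of the simplest such signal, assuming a hypothetical 0/1 related party adjacency matrix: append the mean feature vector of each company's related party neighbours. The classifier itself is left out.

```python
import numpy as np


def neighbour_mean_features(X, adj):
    """Append the mean statistical features of each company's related parties.

    X: (N, F) per-company statistical features.
    adj: (N, N) 0/1 related party links without self-loops (hypothetical input).
    Companies with no related parties get an all-zero aggregate.
    """
    deg = adj.sum(axis=1, keepdims=True)               # (N, 1) neighbour counts
    agg = np.divide(adj @ X, deg,
                    out=np.zeros_like(X, dtype=float),
                    where=deg > 0)                     # mean over neighbours
    return np.hstack([X, agg])                         # (N, 2F)


if __name__ == "__main__":
    X = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
    adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
    print(neighbour_mean_features(X, adj))
```

Here company 0 gains `[4, 3]`, the mean of companies 1 and 2. If a model trained on the augmented matrix shows no gain over one trained on `X` alone, the premise would not be supported for that dataset.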

What would settle it

If the TED model shows no significant improvement over the strongest statistical or graph baselines when re-evaluated on the same two human-labeled real tax datasets, the central performance claim would be falsified.
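"Significant" here needs a concrete test, and the page does not say which one the paper uses. A minimal sketch of one common choice, a paired bootstrap over test items, assuming both models' predictions on the same labelled set are available. Accuracy keeps the block self-contained; F1 or AUC can be passed as `metric` instead.

```python
import numpy as np


def paired_bootstrap(y_true, pred_a, pred_b, metric, n_boot=2000, seed=0):
    """Paired bootstrap for metric(A) - metric(B) on one labelled test set.

    Both models are scored on the same resampled items in every round, so the
    comparison stays paired. Returns the observed gap and the fraction of
    resamples in which A fails to beat B (an approximate one-sided p-value).
    """
    y_true, pred_a, pred_b = (np.asarray(v) for v in (y_true, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    n = len(y_true)
    observed = metric(y_true, pred_a) - metric(y_true, pred_b)
    not_better = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample n items with replacement
        gap = metric(y_true[idx], pred_a[idx]) - metric(y_true[idx], pred_b[idx])
        if gap <= 0:
            not_better += 1
    return observed, not_better / n_boot


def accuracy(y, p):
    return float(np.mean(y == p))
```

A small returned fraction (for example below 0.05) on both datasets would support the claim, while a large one against the strongest baseline would count against it.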

Original abstract

Tax evasion causes severe losses of government revenues and disturbs the economic order of fair competition. To help alleviate this problem, the latest tax evasion detection solutions utilize expert knowledge to extract features and then train classifiers to determine whether a company is suspected of tax evasion. However, existing solutions mainly focus on the statistical features of the company, but fail to exploit the rich interactive information in tax scenarios, which affect the detection performance. In this paper, we first model the tax scenario as a heterogeneous graph and study the tax evasion detection problem under the heterogeneous graph model. To improve the performance of tax evasion detection, a novel graph neural network model is proposed to extract the comprehensive information of heterogeneous graphs. Specifically, we use heterogeneous and complex related party transaction groups to filter low-level noise information. Moreover, a hierarchical attention mechanism is designed to capture the deeper structure and semantic information hidden in the related party transaction group. We apply our method to the real risk management system of the tax bureau, and evaluate it on two human-labeled real-world tax datasets. The results demonstrate that our method significantly outperforms the state-of-the-art in the tax evasion detection task.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes TED, a heterogeneous graph neural network for tax evasion detection. It models tax scenarios as heterogeneous graphs, filters low-level noise using related party transaction groups, and applies a hierarchical attention mechanism to capture deeper structural and semantic information. The method is deployed in a real tax bureau risk management system and evaluated on two human-labeled real-world tax datasets, with the abstract asserting significant outperformance over state-of-the-art methods.

Significance. If validated with concrete metrics, the work could advance practical tax evasion detection by incorporating interactive signals from related-party graphs beyond purely statistical company features, with direct applicability to government systems. The real-world deployment and human-labeled datasets represent potential strengths for impact assessment.

major comments (2)
  1. [Abstract] The central claim that the method 'significantly outperforms the state-of-the-art in the tax evasion detection task' on two real datasets is unsupported by any quantitative results (precision, recall, F1, AUC), baseline descriptions, statistical tests, or error analysis, even though this claim is load-bearing for the empirical contribution.
  2. [Abstract] No ablation studies or comparisons isolate the contribution of the heterogeneous graph modeling, related party transaction filtering, or hierarchical attention versus standard statistical features alone, leaving the weakest assumption untested.
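The first comment names the metrics a reader would expect. A minimal sketch of how they are computed for a binary evader/non-evader label, assuming hard predictions for precision, recall, and F1 and continuous risk scores for ROC AUC:

```python
import numpy as np


def classification_report(y_true, y_pred, scores):
    """Precision, recall, F1 from hard labels; ROC AUC from continuous scores.

    AUC uses the Mann-Whitney form: the probability that a random positive
    (evader) is scored above a random negative, with ties counted as 0.5.
    The pairwise matrix is O(P * N) in memory, which is fine for a sketch.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.asarray(scores, dtype=float)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"precision": precision, "recall": recall, "f1": f1, "auc": auc}


if __name__ == "__main__":
    print(classification_report([1, 1, 0, 0], [1, 0, 1, 0], [0.9, 0.4, 0.6, 0.1]))
    # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'auc': 0.75}
```

Because evaders are usually a small minority, precision, recall, and AUC are more informative here than plain accuracy.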

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting issues in the abstract. We will revise the abstract to include quantitative results from the experiments and better articulate the contributions of the model components.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the method 'significantly outperforms the state-of-the-art in the tax evasion detection task' on two real datasets is unsupported by any quantitative results (precision, recall, F1, AUC), baseline descriptions, statistical tests, or error analysis, even though this claim is load-bearing for the empirical contribution.

    Authors: We agree the abstract should be strengthened with concrete metrics. The full manuscript reports experimental results on two human-labeled datasets, including comparisons against state-of-the-art baselines with precision, recall, F1, and AUC scores. We will revise the abstract to include key quantitative results, baseline names, and a brief note on the evaluation setup. (Revision: yes.)

  2. Referee: [Abstract] No ablation studies or comparisons isolate the contribution of the heterogeneous graph modeling, related party transaction filtering, or hierarchical attention versus standard statistical features alone, leaving the weakest assumption untested.

    Authors: We agree the abstract does not isolate component contributions. The manuscript evaluates the full model against statistical-feature baselines and includes related-party transaction filtering as a core design choice, but dedicated ablations for each element (heterogeneous modeling, filtering, hierarchical attention) are not explicitly summarized in the abstract. We will revise the abstract to reference the comparative experiments and, if space permits, note the incremental value of the graph-based components. (Revision: partial.)

Circularity Check

0 steps flagged

No circularity; empirical GNN model with external evaluation

Full rationale

The paper proposes a heterogeneous-graph GNN for tax-evasion detection that filters via related-party groups and applies hierarchical attention. No equations, derivations, or parameter-fitting steps are described that reduce any claimed performance metric to the inputs by construction. The central claim is outperformance on two human-labeled real-world datasets after deployment in a live tax system; this is an externally falsifiable empirical assertion, not a self-referential prediction or a uniqueness theorem resting on self-citation. No self-definitional patterns, fitted inputs relabeled as predictions, or smuggled ansätze appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that relational patterns in tax data are best captured by heterogeneous graphs and that related-party grouping removes noise without discarding signal; no free parameters or invented entities are specified in the abstract.

axioms (1)
  • Domain assumption: heterogeneous graph neural networks can extract useful signals from company transaction and relation data for classification tasks.
    Invoked as the basis for modeling tax scenarios as graphs and applying GNNs

pith-pipeline@v0.9.1-grok · 5738 in / 1158 out tokens · 27670 ms · 2026-06-29T19:23:39.815995+00:00 · methodology

