pith. machine review for the scientific record.

arxiv: 2605.02041 · v1 · submitted 2026-05-03 · 💻 cs.LG

TIJERE: A Novel Threat Intelligence Joint Extraction Model Based on Analyst Expert Knowledge

Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3

classification 💻 cs.LG
keywords: extraction, tijere, cybersecurity, entity, intelligence, joint, threat, dataset

The pith

TIJERE uses a multisequence labeling representation and SecureBERT+ to reach F1 scores above 0.93 for NER and 0.98 for relation extraction on DNRTI-JE, a new jointly labeled cybersecurity dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on turning unstructured threat intelligence reports into structured data such as knowledge graphs. Current methods often mix up features, struggle with unclear language, and have trouble with overlapping relations. TIJERE treats the task as creating separate label sequences for every possible pair of entities instead of one shared sequence. It adds expert analyst features to better capture position, context, and meaning. The model also relies on SecureBERT+, a language model that has been further trained on cybersecurity documents to reduce ambiguity in domain-specific text. To support testing, the authors created DNRTI-JE, described as the first public dataset with joint entity and relation labels for this field. On this dataset the model reports very high accuracy and beats prior approaches.
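
The pair-wise labeling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag format, the helper name multisequence_labels, the entity types, and the relation names are all assumptions made for illustration, and the expert analyst features are left out. The sketch only shows why giving each ordered entity pair its own tag sequence lets one entity take part in several relations without tag conflicts.

```python
from itertools import permutations


def multisequence_labels(tokens, entities, relations):
    """Build one tag sequence per ordered (head, tail) entity pair.

    tokens:    list of str
    entities:  list of (start, end_exclusive, entity_type) spans
    relations: dict mapping (head_index, tail_index) -> relation name
    The tag layout "B/I-<type>-<role>-<relation>" is a hypothetical scheme.
    """
    sequences = {}
    for head, tail in permutations(range(len(entities)), 2):
        relation = relations.get((head, tail), "NONE")
        tags = ["O"] * len(tokens)
        for role, index in (("H", head), ("T", tail)):
            start, end, entity_type = entities[index]
            for pos in range(start, end):
                prefix = "B" if pos == start else "I"
                tags[pos] = f"{prefix}-{entity_type}-{role}-{relation}"
        sequences[(head, tail)] = tags
    return sequences


# Hypothetical sentence in which one entity heads two relations.
tokens = ["APT28", "used", "X-Agent", "against", "NATO"]
entities = [(0, 1, "HackOrg"), (2, 3, "Tool"), (4, 5, "Org")]
relations = {(0, 1): "uses", (0, 2): "targets"}

seqs = multisequence_labels(tokens, entities, relations)
for pair, tags in seqs.items():
    print(pair, tags)
```

Here APT28 heads two relations, and each relation lives in its own sequence, so no token ever needs more than one tag within a sequence. In this sketch the number of sequences grows quadratically with the number of entities in a sentence, which is one cost of the formulation.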

Core claim

Empirical evaluations on the curated DNRTI-JE dataset demonstrate that TIJERE achieves state-of-the-art performance, with F1-scores exceeding 0.93 for NER and 0.98 for RE, outperforming existing methods.
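
For readers who want to see what an F1 figure of this kind summarizes, the following is a minimal sketch of exact-match precision, recall, and F1 over predicted entity spans. The averaging scheme and matching criterion used in the paper are not stated in the abstract, so the exact-match rule, the function name precision_recall_f1, and the example spans are assumptions.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over sets of (start, end, label) items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted entity spans for one sentence.
gold = {(0, 1, "HackOrg"), (2, 3, "Tool"), (4, 5, "Org")}
pred = {(0, 1, "HackOrg"), (2, 3, "Tool"), (3, 5, "Org")}  # last span has a wrong boundary

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under exact matching, a single boundary error counts as both a false positive and a false negative, so reported scores depend on which matching rule is applied.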

Load-bearing premise

That the multisequence labeling representation combined with expert domain features and SecureBERT+ sufficiently resolves feature confusion, language ambiguity, and overlapping relations without overfitting to the new dataset or introducing selection bias in evaluation.

Figures

Figures reproduced from arXiv: 2605.02041 by Inoussa Mouiche, Sherif Saad.

Figure 1: A sample schema showing entity relationships within the dataset.
Figure 2: Sample data format in the DNRTI-JE dataset for joint extraction.
Figure 3: Graphical example of an annotated sentence containing overlapping relations.
Figure 4: Multisequence labeling representation.
Figure 5: TIJERE Model’s Architecture.
Figure 6: Training and validation losses and accuracies of SecureBERT+-BiGRU-CRF, demonstrating its convergence.
Figure 7: Model execution time.

Mouiche and Saad: Preprint submitted to Elsevier.
Original abstract

The extraction of entities and relationships from threat intelligence reports into structured formats, such as cybersecurity knowledge graphs, is essential for automated threat analysis, detection, and mitigation. However, existing joint extraction methods struggle with feature confusion, language ambiguity, noise propagation, and overlapping relations, resulting in low accuracy and poor model performance. This paper presents TIJERE, an innovative joint entity and relation extraction framework that formulates joint extraction as a multisequence labeling representation (MSLR) problem. Specifically, separate sequences are generated for each entity pair. Unlike prior tagging schemes, MSLR integrates expert domain features to enrich positional, contextual, and semantic representations of entities, thereby enhancing feature distinction and classification accuracy. Additionally, TIJERE reduces language ambiguity and enhances domain-specific generalization by leveraging SecureBERT+, a contextual language model fine-tuned on cybersecurity text. This improves both named entity recognition (NER) and relation extraction (RE). This paper also introduces DNRTI-JE, the first publicly available jointly labeled dataset for cybersecurity entity and RE, filling a crucial gap in cyber threat intelligence automation. Empirical evaluations on the curated DNRTI-JE dataset demonstrate that TIJERE achieves state-of-the-art performance, with F1-scores exceeding 0.93 for NER and 0.98 for RE, outperforming existing methods. Together, TIJERE and the standardized benchmarking DNRTI-JE dataset enable high-performance cybersecurity intelligence extraction, with transferable applications in healthcare, finance, and bioinformatics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes TIJERE, a joint NER and RE framework for cybersecurity threat intelligence reports. It reformulates joint extraction as a multisequence labeling representation (MSLR) problem that generates a separate sequence per entity pair, augments these with analyst expert domain features for positional/contextual/semantic enrichment, and employs SecureBERT+ (a cybersecurity-fine-tuned language model) to reduce ambiguity and improve generalization. The authors also introduce DNRTI-JE, described as the first public jointly labeled dataset for this task. Empirical results on the curated DNRTI-JE dataset report F1 scores exceeding 0.93 for NER and 0.98 for RE, claimed to be state-of-the-art over existing methods.

Significance. The release of the DNRTI-JE dataset is a clear positive contribution, as it supplies a standardized, publicly available benchmark for joint entity-relation extraction in the cybersecurity domain where such resources have been scarce. If the reported performance is shown to be robust (via proper splits, ablations, and baselines), the MSLR-plus-expert-features design offers a plausible way to mitigate overlapping relations and feature confusion in technical text, with potential transfer to other specialized domains. The combination of domain-adapted pretraining and explicit expert knowledge injection is a reasonable engineering approach for this application.

major comments (2)
  1. [Abstract and §5] Abstract and §5 (Experiments): The central SOTA claim rests on F1 > 0.93 (NER) and > 0.98 (RE) on the newly introduced DNRTI-JE dataset, yet no information is supplied on train/test split methodology (report-level vs. instance-level), baseline implementations, statistical significance tests, error bars, or ablation isolating the contribution of MSLR versus expert features versus SecureBERT+. This is load-bearing because an RE F1 this high could arise from selection or leakage artifacts rather than genuine disambiguation of overlapping relations.
  2. [§3] §3 (Method), MSLR definition: The claim that generating one sequence per entity pair plus expert features 'resolves feature confusion and overlapping relations' lacks a concrete walk-through or counter-example showing how the tagging scheme avoids the very noise propagation and ambiguity problems diagnosed in prior work; without this, it is unclear whether the performance gain is architectural or an artifact of how entity pairs were chosen for sequence construction.
minor comments (2)
  1. [Abstract] The abstract refers to 'curated DNRTI-JE' without a pointer to the exact curation protocol, annotation guidelines, or inter-annotator agreement statistics; these details belong in §4.1 or an appendix for reproducibility.
  2. [§3] Notation for the MSLR sequences and the integration of expert features could be formalized with a small equation or pseudocode block to make the distinction from prior tagging schemes (e.g., standard BIO or span-based) immediately clear.
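
The report-level versus instance-level distinction raised in major comment 1 can be made concrete with a small sketch. Everything in it is hypothetical: the paper's actual split procedure is not described in the material reviewed here, and the report_id field, the function names, and the synthetic data are assumptions. The point is only that when many pair sequences come from the same report, shuffling individual instances places fragments of one report on both sides of the split.

```python
import random
from collections import defaultdict


def instance_level_split(instances, test_frac=0.2, seed=0):
    """Shuffle individual instances, ignoring which report they came from."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def report_level_split(instances, test_frac=0.2, seed=0):
    """Assign whole reports to train or test so no report is shared."""
    by_report = defaultdict(list)
    for inst in instances:
        by_report[inst["report_id"]].append(inst)
    report_ids = sorted(by_report)
    random.Random(seed).shuffle(report_ids)
    cut = int(len(report_ids) * (1 - test_frac))
    train = [i for rid in report_ids[:cut] for i in by_report[rid]]
    test = [i for rid in report_ids[cut:] for i in by_report[rid]]
    return train, test


def shared_reports(train, test):
    """Report ids that appear on both sides of a split."""
    return {i["report_id"] for i in train} & {i["report_id"] for i in test}


# Synthetic data: 10 hypothetical reports with 20 pair sequences each.
instances = [{"report_id": r, "pair_id": p} for r in range(10) for p in range(20)]

tr_i, te_i = instance_level_split(instances)
tr_r, te_r = report_level_split(instances)
print("instance-level shared reports:", len(shared_reports(tr_i, te_i)))
print("report-level shared reports:  ", len(shared_reports(tr_r, te_r)))
```

A check like shared_reports is cheap to run on any released split, and an empty result for the report-level case is the property the comment asks the authors to document.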

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claim rests on the assumption that MSLR plus expert features plus SecureBERT+ fine-tuning will improve distinction and generalization; these are adaptations of standard NLP techniques whose effectiveness is asserted rather than derived from first principles.

free parameters (1)
  • Model hyperparameters and fine-tuning settings
    Typical in neural network training but not enumerated in the abstract; their values affect the reported F1 scores.

pith-pipeline@v0.9.0 · 5568 in / 1270 out tokens · 23613 ms · 2026-05-08T19:33:55.044902+00:00 · methodology

