TIJERE: A Novel Threat Intelligence Joint Extraction Model Based on Analyst Expert Knowledge
Pith reviewed 2026-05-08 19:33 UTC · model grok-4.3
The pith
TIJERE uses a multisequence labeling representation (MSLR) and SecureBERT+ to reach F1 scores above 0.93 for NER and 0.98 for relation extraction on DNRTI-JE, a new jointly labeled cybersecurity dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Empirical evaluations on the curated DNRTI-JE dataset demonstrate that TIJERE achieves state-of-the-art performance, with F1-scores exceeding 0.93 for NER and 0.98 for RE, outperforming existing methods.
Load-bearing premise
That the multisequence labeling representation combined with expert domain features and SecureBERT+ sufficiently resolves feature confusion, language ambiguity, and overlapping relations without overfitting to the new dataset or introducing selection bias in evaluation.
Original abstract
The extraction of entities and relationships from threat intelligence reports into structured formats, such as cybersecurity knowledge graphs, is essential for automated threat analysis, detection, and mitigation. However, existing joint extraction methods struggle with feature confusion, language ambiguity, noise propagation, and overlapping relations, resulting in low accuracy and poor model performance. This paper presents TIJERE, an innovative joint entity and relation extraction framework that formulates joint extraction as a multisequence labeling representation (MSLR) problem. Specifically, separate sequences are generated for each entity pair. Unlike prior tagging schemes, MSLR integrates expert domain features to enrich positional, contextual, and semantic representations of entities, thereby enhancing feature distinction and classification accuracy. Additionally, TIJERE reduces language ambiguity and enhances domain-specific generalization by leveraging SecureBERT+, a contextual language model fine-tuned on cybersecurity text. This improves both named entity recognition (NER) and relation extraction (RE). This paper also introduces DNRTI-JE, the first publicly available jointly labeled dataset for cybersecurity entity and RE, filling a crucial gap in cyber threat intelligence automation. Empirical evaluations on the curated DNRTI-JE dataset demonstrate that TIJERE achieves state-of-the-art performance, with F1-scores exceeding 0.93 for NER and 0.98 for RE, outperforming existing methods. Together, TIJERE and the standardized benchmarking DNRTI-JE dataset enable high-performance cybersecurity intelligence extraction, with transferable applications in healthcare, finance, and bioinformatics.
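The abstract's statement that "separate sequences are generated for each entity pair" can be illustrated with a minimal sketch. It is not the paper's actual tagging scheme: the tag format (`B/I-<type>-<role>-<relation>`), the function name `build_pair_sequences`, the entity types, and the relation labels are all assumptions made for illustration, and the expert domain features are left out entirely.

```python
from itertools import permutations

def build_pair_sequences(tokens, entities, relations):
    """Return one tag sequence per ordered entity pair (hypothetical scheme).

    tokens:    list of str
    entities:  list of (start, end, type), end exclusive
    relations: dict mapping (head_index, tail_index) -> relation label
    """
    sequences = {}
    for h, t in permutations(range(len(entities)), 2):
        rel = relations.get((h, t), "NONE")
        tags = ["O"] * len(tokens)
        # Mark the head (H) and tail (T) spans; all other tokens stay "O".
        for role, idx in (("H", h), ("T", t)):
            start, end, etype = entities[idx]
            for pos in range(start, end):
                prefix = "B" if pos == start else "I"
                tags[pos] = f"{prefix}-{etype}-{role}-{rel}"
        sequences[(h, t)] = tags
    return sequences

# Toy sentence with made-up entity types and relation labels.
tokens = ["APT28", "used", "X-Agent", "against", "NATO"]
entities = [(0, 1, "HackOrg"), (2, 3, "Tool"), (4, 5, "Org")]
relations = {(0, 1): "uses", (0, 2): "targets"}

pair_sequences = build_pair_sequences(tokens, entities, relations)
for pair, tags in pair_sequences.items():
    print(pair, tags)
```

In this toy example "APT28" takes part in two relations. Because each pair gets its own sequence, the two relations never compete for a single tag on the same token, which is the usual difficulty with overlapping relations in single-sequence tagging.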
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TIJERE, a joint NER and RE framework for cybersecurity threat intelligence reports. It reformulates joint extraction as a multisequence labeling representation (MSLR) problem that generates a separate sequence per entity pair, augments these with analyst expert domain features for positional/contextual/semantic enrichment, and employs SecureBERT+ (a cybersecurity-fine-tuned language model) to reduce ambiguity and improve generalization. The authors also introduce DNRTI-JE, described as the first public jointly labeled dataset for this task. Empirical results on the curated DNRTI-JE dataset report F1 scores exceeding 0.93 for NER and 0.98 for RE, claimed to be state-of-the-art over existing methods.
Significance. The release of the DNRTI-JE dataset is a clear positive contribution, as it supplies a standardized, publicly available benchmark for joint entity-relation extraction in the cybersecurity domain where such resources have been scarce. If the reported performance is shown to be robust (via proper splits, ablations, and baselines), the MSLR-plus-expert-features design offers a plausible way to mitigate overlapping relations and feature confusion in technical text, with potential transfer to other specialized domains. The combination of domain-adapted pretraining and explicit expert knowledge injection is a reasonable engineering approach for this application.
major comments (2)
- [Abstract and §5] Abstract and §5 (Experiments): The central SOTA claim rests on F1 > 0.93 (NER) and > 0.98 (RE) on the newly introduced DNRTI-JE dataset, yet no information is supplied on train/test split methodology (report-level vs. instance-level), baseline implementations, statistical significance tests, error bars, or ablations isolating the contributions of MSLR, expert features, and SecureBERT+. This is load-bearing because a high RE F1 could arise from selection or leakage artifacts rather than from genuine disambiguation of overlapping relations.
- [§3] §3 (Method), MSLR definition: The claim that generating one sequence per entity pair plus expert features 'resolves feature confusion and overlapping relations' lacks a concrete walk-through or counter-example showing how the tagging scheme avoids the very noise propagation and ambiguity problems diagnosed in prior work; without this, it is unclear whether the performance gain is architectural or an artifact of how entity pairs were chosen for sequence construction.
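The distinction between report-level and instance-level splitting raised in the first major comment can be sketched as follows. This is a generic illustration, not the authors' procedure: the `report_id` field, the function name `report_level_split`, and the toy data are assumptions, and nothing here says whether DNRTI-JE ships with report identifiers.

```python
import random
from collections import defaultdict

def report_level_split(instances, test_fraction=0.2, seed=0):
    """Split so that every instance from a given report lands on one side only."""
    by_report = defaultdict(list)
    for inst in instances:
        by_report[inst["report_id"]].append(inst)

    # Shuffle report ids, not individual instances, to avoid cross-split leakage.
    report_ids = sorted(by_report)
    random.Random(seed).shuffle(report_ids)
    n_test = max(1, round(len(report_ids) * test_fraction))
    test_ids = set(report_ids[:n_test])

    train = [i for rid in report_ids if rid not in test_ids for i in by_report[rid]]
    test = [i for rid in report_ids if rid in test_ids for i in by_report[rid]]
    return train, test

# Toy data: 20 sentences drawn from 5 hypothetical reports.
instances = [{"report_id": f"r{n % 5}", "text": f"sentence {n}"} for n in range(20)]
train, test = report_level_split(instances)

train_reports = {i["report_id"] for i in train}
test_reports = {i["report_id"] for i in test}
print(len(train), len(test), train_reports.isdisjoint(test_reports))
```

An instance-level split would instead shuffle the 20 sentences directly, so sentences from the same report (often sharing entity names and phrasing) could appear in both train and test and inflate the measured F1.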
minor comments (2)
- [Abstract] The abstract refers to 'curated DNRTI-JE' without a pointer to the exact curation protocol, annotation guidelines, or inter-annotator agreement statistics; these details belong in §4.1 or an appendix for reproducibility.
- [§3] Notation for the MSLR sequences and the integration of expert features could be formalized with a small equation or pseudocode block to make the distinction from prior tagging schemes (e.g., standard BIO or span-based) immediately clear.
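As an illustration of the kind of formalization the second minor comment asks for, one possible notation is given below. It is hypothetical and not taken from the paper: the symbols $\mathcal{P}$, $\mathcal{T}$, $h_i$, and $\phi$ are assumptions introduced here, with $h_i$ standing for a contextual token embedding (for example from SecureBERT+) and $\phi(p, i)$ for the expert feature vector of token $i$ with respect to pair $p$.

```latex
% Hypothetical notation, not taken from the paper.
\begin{align*}
x &= (x_1, \dots, x_n)
  && \text{input tokens} \\
\mathcal{P} &= \{(e_h, e_t) : e_h \neq e_t\}
  && \text{candidate ordered entity pairs} \\
y^{(p)} &= \bigl(y^{(p)}_1, \dots, y^{(p)}_n\bigr) \in \mathcal{T}^n
  && \text{one tag sequence per pair } p \in \mathcal{P} \\
\hat{y}^{(p)}_i &= \arg\max_{t \in \mathcal{T}} \; P\bigl(t \mid h_i, \phi(p, i)\bigr)
  && \text{per-token prediction for pair } p
\end{align*}
```

Under this notation, a standard BIO scheme corresponds to a single sequence $y \in \mathcal{T}^n$ per sentence, whereas a multisequence scheme produces $|\mathcal{P}|$ sequences, one per candidate pair.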
Axiom & Free-Parameter Ledger
free parameters (1)
- Model hyperparameters and fine-tuning settings