GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

Fei Shao; Liangyi Huang; Mengshi Zhang; Shang Ma; Xusheng Xiao; Yanfang Ye; Zichen Liu; Zihao Chen

arxiv: 2605.16714 · v1 · pith:QMJXUZGWnew · submitted 2026-05-15 · 💻 cs.AI · cs.CR

Pith reviewed 2026-05-20 17:18 UTC · model grok-4.3

keywords: security knowledge graphs, cyber threat intelligence, knowledge graph construction, task bank rewards, LLM graph extraction, CTI analysis, reinforcement learning for extraction

The pith

GRID turns CTI text into security knowledge graphs by training on a task bank of multiple-choice questions and regex targets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GRID as a way to build computable security knowledge graphs from long cyber threat intelligence articles. It first creates traceable alignments between texts and graphs through extraction followed by knowledge-graph-conditioned revision. These alignments become supervision for a task bank that converts the full document-to-graph problem into four-option multi-select questions with triple-level regex matching targets. The model trained with the resulting task-bank rewards reaches higher source-averaged recall and Avg F1 than the model trained with end-to-end LLM-as-judge rewards (64.91% vs. 53.85% recall; 68.53% vs. 58.06% Avg F1), and the abstract reports lower token usage and deployment cost. The approach also allows the rewards to be precomputed once and reused across multiple training runs.

Core claim

GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. The Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1 on 249 CTI articles from five sources, while the End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1.

What carries the argument

The Task-bank Reward model trained on a scripted bank of four-option multi-select questions paired with triple-level regex matching targets derived from offline article-graph alignments.
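
To make the reward structure concrete, here is a minimal Python sketch of how a single task-bank item might be scored. The source only states that items pair four-option multi-select questions with triple-level regex targets. The `TaskItem` fields, the exact-set-match rule for the question, the fraction-of-patterns-matched rule for triples, the equal weighting, and the entity names in the example are all assumptions made for illustration, not the authors' implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class TaskItem:
    """Hypothetical task-bank item; field names are illustrative only."""
    question: str
    options: dict[str, str]       # four options keyed "A".."D"
    correct: frozenset[str]       # multi-select answer key
    triple_patterns: list[str]    # one regex per target triple


def choice_score(selected: set[str], item: TaskItem) -> float:
    # Assumed rule: full credit only when the selected set equals the key.
    return 1.0 if frozenset(selected) == item.correct else 0.0


def triple_score(output: str, item: TaskItem) -> float:
    # Fraction of target triples whose regex matches somewhere in the output.
    if not item.triple_patterns:
        return 0.0
    hits = sum(
        1 for p in item.triple_patterns
        if re.search(p, output, flags=re.IGNORECASE)
    )
    return hits / len(item.triple_patterns)


def task_bank_reward(selected: set[str], output: str, item: TaskItem) -> float:
    # Assumed equal weighting of the two components.
    return 0.5 * choice_score(selected, item) + 0.5 * triple_score(output, item)


# Invented example data; none of these names come from the paper.
item = TaskItem(
    question="Which relations does the passage state?",
    options={
        "A": "ExampleActor uses ExampleRAT",
        "B": "ExampleRAT targets Linux",
        "C": "ExampleActor exploits CVE-2099-0001",
        "D": "ExampleRAT is a firewall",
    },
    correct=frozenset({"A", "C"}),
    triple_patterns=[
        r"ExampleActor\s*\|\s*uses\s*\|\s*ExampleRAT",
        r"ExampleActor\s*\|\s*exploits\s*\|\s*CVE-2099-0001",
    ],
)
model_output = "ExampleActor | uses | ExampleRAT\nExampleActor | targets | healthcare"
reward = task_bank_reward({"A", "C"}, model_output, item)
print(reward)
```

Because both components are plain set and regex checks, the score for a given output is deterministic, which is one plausible reading of why the paper describes these rewards as more stable than LLM-as-judge scoring.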

If this is right

Task-bank rewards can be built once offline and reused across later post-training runs.
The Task-bank Reward model achieves the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost.
The End2End Reward model with LLM-as-judge rewards reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1.
Choice-only Reward and End2End SFT without RL produce weaker results than the task-bank approach.
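
The reported figures invite a quick arithmetic check. The harmonic mean of the source-averaged precision and recall comes out near 73.47 for the Task-bank model and 63.35 for the End2End model, not the reported 68.53 and 58.06. That gap is consistent with (though it does not prove) Avg F1 being an average of per-source or per-article F1 values rather than an F1 computed from the averaged precision and recall. The sketch below reproduces the check; the per-source scores in its second half are invented purely to show the mechanics.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract.
task_bank = {"precision": 84.62, "recall": 64.91, "avg_f1": 68.53}
end2end = {"precision": 76.91, "recall": 53.85, "avg_f1": 58.06}

for name, m in (("Task-bank", task_bank), ("End2End", end2end)):
    from_means = f1(m["precision"], m["recall"])
    print(f"{name}: F1 of averaged P/R = {from_means:.2f}, "
          f"reported Avg F1 = {m['avg_f1']:.2f}")

# Hypothetical per-source (precision, recall) pairs, NOT from the paper,
# showing how averaging per-source F1 differs from F1 of the averages.
per_source = [(95.0, 40.0), (80.0, 80.0), (70.0, 60.0)]
mean_p = sum(p for p, _ in per_source) / len(per_source)
mean_r = sum(r for _, r in per_source) / len(per_source)
mean_of_f1 = sum(f1(p, r) for p, r in per_source) / len(per_source)
f1_of_means = f1(mean_p, mean_r)
print(f"mean of per-source F1 = {mean_of_f1:.2f}, F1 of means = {f1_of_means:.2f}")
```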

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Precomputed task banks could be shared across research groups to train security-specific extractors without repeating expensive alignment steps.
The same offline-to-online reward pattern might apply to knowledge graph construction in other narrow technical domains such as biomedical literature.
If the alignments remain stable, the method could support iterative refinement where new articles update an existing graph without full retraining.

Load-bearing premise

The graph extraction and knowledge-graph-conditioned text revision steps produce sufficiently accurate and unbiased article-graph alignments to serve as reliable supervision targets for the downstream task-bank training.

What would settle it

Running the Task-bank Reward model on a fresh collection of CTI articles outside the original 249 and finding that its source-averaged recall falls below the End2End model's recall would falsify the performance advantage.

Figures

Figures reproduced from arXiv: 2605.16714 by Fei Shao, Liangyi Huang, Mengshi Zhang, Shang Ma, Xusheng Xiao, Yanfang Ye, Zichen Liu, Zihao Chen.

**Figure 2.** Overview of GRID.

Algorithm 1: Automatic Annotation of Article-Graph Alignment Data

Input: raw CTI article a, traceable extraction prompt P_trace, revision prompt P_rev

Output: article-graph alignment (a′, G′) consisting of revised article a′ and text-grounded knowledge graph G′

1. G ← LLMExtract(a, P_trace)
2. Parse G into entity list E and relation list R
3. foreach r ∈ R do
4. keep sentence-local subject/…
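
The visible steps of Algorithm 1 can be sketched in Python as follows. The excerpt is truncated at step 4, so everything past "foreach r ∈ R" here is a placeholder: the rule that a relation is kept when a single sentence mentions both its subject and its object is an assumption suggested by the phrase "sentence-local", not the authors' stated filter. The JSON graph schema, the `fake_llm` stand-in, and the entity names are likewise hypothetical, and the revision step that produces a′ is omitted.

```python
import json
import re
from typing import Callable


def split_sentences(article: str) -> list[str]:
    # Naive splitter on ., ! or ? followed by whitespace; real CTI text needs more care.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]


def extract_alignment(article: str, llm_extract: Callable[[str], str]):
    """Steps 1-3 of Algorithm 1, plus an assumed sentence-local filter."""
    graph = json.loads(llm_extract(article))     # step 1: G <- LLMExtract(a, P_trace)
    entities = graph.get("entities", [])         # step 2: entity list E
    relations = graph.get("relations", [])       #         relation list R
    sentences = split_sentences(article)
    kept = []
    for rel in relations:                        # step 3: foreach r in R
        # Assumption: a relation is traceable when one sentence mentions
        # both its subject and its object.
        if any(rel["subject"] in s and rel["object"] in s for s in sentences):
            kept.append(rel)
    return entities, kept


def fake_llm(article: str) -> str:
    # Stand-in for a real model call; returns a fixed JSON graph.
    return json.dumps({
        "entities": ["ExampleActor", "ExampleRAT", "ExampleBank"],
        "relations": [
            {"subject": "ExampleActor", "predicate": "uses", "object": "ExampleRAT"},
            {"subject": "ExampleActor", "predicate": "targets", "object": "ExampleBank"},
        ],
    })


text = "ExampleActor deployed ExampleRAT in March. ExampleBank reported an outage."
entities, kept = extract_alignment(text, fake_llm)
print(kept)
```

In this toy run the "targets" relation is dropped because no single sentence mentions both ExampleActor and ExampleBank, which illustrates the kind of grounding check a traceable alignment would need.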

**Figure 3.** RQ3 ablation curves up to 225 steps. Left: RL training reward (first-order EMA, …
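
The caption mentions first-order EMA smoothing of the training reward. As a reminder of what that operation does, here is a minimal sketch of the standard recurrence s_0 = x_0, s_t = α·x_t + (1 − α)·s_{t−1}. The smoothing factor and the reward trace are made up; the caption does not state the value the authors used.

```python
def ema(values: list[float], alpha: float = 0.1) -> list[float]:
    """First-order EMA: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: list[float] = []
    for x in values:
        # Seed the recurrence with the first observation.
        prev = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1.0 - alpha) * prev)
    return smoothed


raw_rewards = [0.0, 1.0, 1.0, 0.0]   # made-up reward trace
print(ema(raw_rewards, alpha=0.5))   # [0.0, 0.5, 0.75, 0.375]
```

Smaller values of `alpha` give smoother but more lagged curves, so comparisons between smoothed ablation curves depend on the factor chosen.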

**Figure 4.** Prompt blocks for the GRID extractor.

C Judge Rules for Automatic Evaluation: The automatic evaluator scores precision and recall under an explicit rule set rather than unconstrained semantic similarity …

Original abstract

Security knowledge graphs can provide computable external memory for security agents, but constructing them from long-form cyber threat intelligence (CTI) remains difficult: LLMs often lack grounded security-domain knowledge, and end-to-end document-to-graph training is hard to supervise with cheap, stable rewards. We present GRID (Graph Representation of Intelligence Data), an end-to-end framework for security text knowledge graph construction. GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. Using this supervision pipeline, we train two Qwen3-4B-Instruct-2507-based 4B extractors: a primary Task-bank Reward model and a secondary End2End Reward model with LLM-as-judge precision/recall rewards. On 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1, achieving the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost. The End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Further analyses show that task-bank rewards can be built once offline and reused across later post-training runs, outperforming online End2End LLM-as-judge reward and weaker alternatives such as Choice-only Reward and End2End SFT without RL.
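
The claim that task-bank rewards can be built once offline and reused is easy to picture as a build-then-load pattern. The sketch below is one way it might look, assuming a JSON Lines file of regex targets per article; the file layout, function names, pipe-delimited triple format, and example triples are hypothetical and not taken from the paper.

```python
import json
import re
import tempfile
from pathlib import Path


def build_task_bank(alignments: list[dict]) -> list[dict]:
    # Assumed offline step: turn each aligned triple into a regex target.
    bank = []
    for item in alignments:
        patterns = [
            r"\s*\|\s*".join(re.escape(part) for part in triple)
            for triple in item["triples"]
        ]
        bank.append({"article_id": item["article_id"], "patterns": patterns})
    return bank


def save_bank(bank: list[dict], path: Path) -> None:
    with path.open("w", encoding="utf-8") as fh:
        for record in bank:
            fh.write(json.dumps(record) + "\n")


def load_bank(path: Path) -> dict[str, list[str]]:
    with path.open(encoding="utf-8") as fh:
        return {r["article_id"]: r["patterns"] for r in map(json.loads, fh)}


def reward(article_id: str, output: str, bank: dict[str, list[str]]) -> float:
    # Scripted reward: no model call is needed at training time.
    patterns = bank[article_id]
    if not patterns:
        return 0.0
    return sum(bool(re.search(p, output)) for p in patterns) / len(patterns)


# Invented alignment data for illustration.
alignments = [{
    "article_id": "doc-1",
    "triples": [("ExampleActor", "uses", "ExampleRAT"),
                ("ExampleRAT", "targets", "ExampleBank")],
}]
path = Path(tempfile.mkdtemp()) / "task_bank.jsonl"
save_bank(build_task_bank(alignments), path)   # done once, offline

bank = load_bank(path)                         # reloaded by each training run
score = reward("doc-1", "ExampleActor | uses | ExampleRAT", bank)
print(score)
```

The expensive part (producing the alignments) happens before `save_bank`; each later post-training run only pays for file loading and regex matching, whereas an online LLM-as-judge reward would need a model call per rollout.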

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GRID's task-bank supervision for security KG extraction gives better recall than end-to-end LLM judging but rests on unvalidated pipeline alignments.

The main thing to know is that this paper introduces a supervision pipeline for building security knowledge graphs from threat reports. It creates article-graph alignments through extraction and revision steps, then uses those alignments to train via a bank of four-option questions and regex matching instead of scoring entire graphs with an LLM. What is new is the combination of traceable alignments with a scripted, reusable task bank for rewards.

The paper does well in showing concrete results across multiple sources such as CASIE and MalKG. The task-bank model reaches 84.62% source-averaged precision and 64.91% source-averaged recall on 249 articles, beating the authors' end-to-end reward model on recall, and the abstract claims lower token usage.

The soft spots are around the quality of the supervision itself. The alignments come from the authors' own graph extraction and KG-conditioned revision, with no mention of human validation or agreement metrics. That means the reported performance could largely reflect how well the model learns to copy the pipeline's outputs rather than how accurately it extracts triples from the text. Without error analysis on those targets, it is hard to know how much the numbers overstate real-world extraction quality.

This work is for people in cybersecurity NLP who want practical tools to turn unstructured reports into computable graphs. Readers focused on reward design for LLM post-training or domain-specific extraction would get the most out of the comparisons and the offline task bank idea.

It deserves a serious referee because the method is clearly described and the experiments cover several datasets with direct comparisons. The central claim about more stable rewards holds up as a practical alternative, even if the absolute accuracy depends on the initial alignments. I recommend sending this to peer review. The idea is solid enough to benefit from detailed feedback on the validation steps and implementation details.

Referee Report

1 major / 2 minor

Summary. The manuscript presents GRID, an end-to-end framework for constructing security knowledge graphs from long-form CTI articles. It first generates supervision via graph extraction and knowledge-graph-conditioned text revision to produce traceable article-graph alignments, then converts document-to-graph learning into a task bank of four-option multi-select questions paired with triple-level regex matching. Two Qwen3-4B-based models are trained: a primary Task-bank Reward model and a secondary End2End Reward model using LLM-as-judge rewards. On 249 articles drawn from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank model reports 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1 while using fewer tokens; the End2End model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Additional analyses claim that task-bank rewards can be precomputed offline and reused.

Significance. If the pipeline-generated alignments prove reliable, the work supplies a concrete, lower-cost alternative to repeated LLM-as-judge scoring for supervising security KG extraction. The offline reusability of the task-bank rewards and the reporting of source-averaged metrics across five sources (the authors' own GRID collection plus CASIE, CTINexus, MalKG, and SecureNLP) are practical strengths that could aid reproducibility in the security-AI community.

major comments (1)

[Abstract] Abstract and supervision pipeline description: the central performance numbers (84.62% precision, 64.91% recall) are obtained by training on targets produced by the paper's own graph extraction plus KG-conditioned text revision steps. No human validation, inter-annotator agreement scores, or error analysis of these alignments is reported, so the metrics may largely reflect consistency with the pipeline rather than independent ground-truth quality.

minor comments (2)

The abstract states that the Task-bank model achieves 'lower token usage and deployment cost' but supplies no quantitative token counts, latency figures, or cost comparisons against the baselines.
Details on data splits, baseline implementations, statistical significance testing, and per-source error analysis are absent from the abstract and would strengthen verifiability of the cross-dataset claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the supervision pipeline and its implications for the reported metrics. We address the concern point by point below, acknowledging the limitation while clarifying the design rationale and proposed revisions.

Referee: [Abstract] Abstract and supervision pipeline description: the central performance numbers (84.62% precision, 64.91% recall) are obtained by training on targets produced by the paper's own graph extraction plus KG-conditioned text revision steps. No human validation, inter-annotator agreement scores, or error analysis of these alignments is reported, so the metrics may largely reflect consistency with the pipeline rather than independent ground-truth quality.

Authors: We agree that the performance numbers measure consistency with targets generated by our graph extraction and KG-conditioned text revision pipeline, rather than agreement with independently human-annotated ground truth. This automated approach was chosen to produce scalable, traceable article-graph alignments without the prohibitive cost of manual security KG annotation. The evaluation uses source-averaged metrics across 249 articles from five sources (our own GRID collection plus CASIE, CTINexus, MalKG, and SecureNLP) to demonstrate generalization beyond the training sources. We will revise the manuscript to add an explicit limitations subsection discussing the pipeline-generated supervision, include a preliminary error analysis of alignment quality on a held-out sample, and note the absence of inter-annotator agreement as a point for future human-expert validation. These changes will not alter the core claims regarding task-bank reward stability and offline reusability.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; performance on held-out external datasets

The paper generates article-graph alignments via its graph extraction and KG-conditioned revision pipeline to create supervision targets for task-bank training. It then reports precision/recall/F1 for the trained Task-bank Reward model on 249 held-out CTI articles drawn from external sources (CASIE, CTINexus, MalKG, SecureNLP) plus its own GRID collection. No equations, definitions, or self-citations are shown that reduce the reported source-averaged scores to quantities fitted or generated within the same run. The evaluation therefore retains independent empirical content against external benchmarks rather than collapsing to an in-sample consistency check. This warrants a low circularity score; concerns about unvalidated alignment quality belong to correctness risk, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard assumptions about LLM fine-tuning and the availability of an ontology for guidance; no free parameters or new entities are explicitly introduced in the abstract.

axioms (1)

domain assumption: LLMs can be improved for structured extraction tasks via reinforcement learning with task-specific rewards derived from question answering and pattern matching.
Invoked when the paper states that the task-bank approach yields more stable rewards than LLM-as-judge scoring.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "GRID first constructs security-domain supervision from security-related CTI articles in an unsupervised manner by constructing traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then reformulates document-to-graph learning into a scripted task bank that combines four-option multi-select questions with triple-level regex matching targets"

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "The ontology is not used as a post-hoc label list. Instead, it guides the LLM to recover entity relations that are not directly stated through explicit verbs or relative clauses, by providing typed entity categories, normalized relation families, aliases, and hierarchy constraints."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
