GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction
Pith reviewed 2026-05-20 17:18 UTC · model grok-4.3
The pith
GRID turns CTI text into security knowledge graphs by training on a task bank of multiple-choice questions and regex targets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. The Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1 on 249 CTI articles from five sources, while the End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1.
What carries the argument
The Task-bank Reward model trained on a scripted bank of four-option multi-select questions paired with triple-level regex matching targets derived from offline article-graph alignments.
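To make the two reward signals concrete, the following is a minimal sketch of how a scripted task bank could score a model's answers without calling an LLM judge. It is not the paper's implementation: the class and function names (`ChoiceTask`, `RegexTask`, `choice_reward`, `regex_reward`), the exact-match rule for multi-select answers, the pipe-delimited triple format, and the placeholder entities (`ExampleGroup`, `ExampleRAT`, `ExampleVuln`) are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ChoiceTask:
    """Hypothetical four-option multi-select question with its answer key."""
    question: str
    options: dict[str, str]      # option letter -> option text
    correct: frozenset[str]      # letters that should be selected


@dataclass(frozen=True)
class RegexTask:
    """Hypothetical regex target: one pattern per expected triple."""
    patterns: tuple[str, ...]


def choice_reward(task: ChoiceTask, selected: set[str]) -> float:
    # Assumed scoring rule: full credit only when the selected set
    # equals the answer key exactly; no partial credit.
    return 1.0 if frozenset(selected) == task.correct else 0.0


def regex_reward(task: RegexTask, output_lines: list[str]) -> float:
    # Fraction of target triples whose pattern matches at least one
    # line of the model output (case-insensitive).
    if not task.patterns:
        return 0.0
    hits = sum(
        any(re.search(p, line, flags=re.IGNORECASE) for line in output_lines)
        for p in task.patterns
    )
    return hits / len(task.patterns)


if __name__ == "__main__":
    q = ChoiceTask(
        question="Which statements does the article support?",
        options={"A": "s1", "B": "s2", "C": "s3", "D": "s4"},
        correct=frozenset({"A", "C"}),
    )
    t = RegexTask(patterns=(
        r"ExampleGroup\s*\|\s*uses\s*\|\s*ExampleRAT",
        r"ExampleRAT\s*\|\s*exploits\s*\|\s*ExampleVuln",
    ))
    lines = ["ExampleGroup | uses | ExampleRAT",
             "ExampleRAT | targets | ExampleHost"]
    print(choice_reward(q, {"A", "C"}))  # 1.0
    print(regex_reward(t, lines))        # 0.5
```

Because both functions are deterministic and depend only on stored keys and patterns, a bank like this can be built once and replayed across training runs, which is the property the review highlights.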
If this is right
- Task-bank rewards can be built once offline and reused across later post-training runs.
- The Task-bank Reward model achieves the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost.
- The End2End Reward model with LLM-as-judge rewards reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1.
- Choice-only Reward and End2End SFT without RL produce weaker results than the task-bank approach.
Where Pith is reading between the lines
- Precomputed task banks could be shared across research groups to train security-specific extractors without repeating expensive alignment steps.
- The same offline-to-online reward pattern might apply to knowledge graph construction in other narrow technical domains such as biomedical literature.
- If the alignments remain stable, the method could support iterative refinement where new articles update an existing graph without full retraining.
Load-bearing premise
The graph extraction and knowledge-graph-conditioned text revision steps produce sufficiently accurate and unbiased article-graph alignments to serve as reliable supervision targets for the downstream task-bank training.
What would settle it
Running the Task-bank Reward model on a fresh collection of CTI articles outside the original 249 and finding that its source-averaged recall falls below the End2End model's recall would falsify the performance advantage.
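Checking this condition depends on how "source-averaged" is computed. A minimal sketch follows, assuming that precision, recall, and F1 are computed per source and then averaged with equal weight per source; the page does not spell out the procedure. The function names and the toy counts are hypothetical. The assumption is at least consistent with the reported figures: the harmonic mean of 84.62 and 64.91 is about 73.47, not 68.53, so the reported Avg F1 cannot be the F1 of the averaged precision and recall.

```python
from statistics import mean


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from triple-level counts for one source."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f1(p, r)


def source_averaged(
    counts: dict[str, tuple[int, int, int]],
) -> tuple[float, float, float]:
    """Average per-source P, R, F1 with equal weight per source (assumed)."""
    per_source = [prf(*c) for c in counts.values()]
    return (
        mean(s[0] for s in per_source),
        mean(s[1] for s in per_source),
        mean(s[2] for s in per_source),
    )


if __name__ == "__main__":
    # Toy (tp, fp, fn) counts for two made-up sources, not from the paper.
    toy = {"source_a": (8, 2, 2), "source_b": (1, 0, 3)}
    avg_p, avg_r, avg_f = source_averaged(toy)
    print(avg_p, avg_r, avg_f)   # averaged per-source scores
    print(f1(avg_p, avg_r))      # F1 of the averages, generally different
    print(round(f1(84.62, 64.91), 2))  # 73.47
```

Under this reading, a replication on fresh articles would compare the two models' averaged per-source recall, so a single large source could not dominate the outcome.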
Original abstract
Security knowledge graphs can provide computable external memory for security agents, but constructing them from long-form cyber threat intelligence (CTI) remains difficult: LLMs often lack grounded security-domain knowledge, and end-to-end document-to-graph training is hard to supervise with cheap, stable rewards. We present GRID (Graph Representation of Intelligence Data), an end-to-end framework for security text knowledge graph construction. GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. Using this supervision pipeline, we train two Qwen3-4B-Instruct-2507-based 4B extractors: a primary Task-bank Reward model and a secondary End2End Reward model with LLM-as-judge precision/recall rewards. On 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1, achieving the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost. The End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Further analyses show that task-bank rewards can be built once offline and reused across later post-training runs, outperforming online End2End LLM-as-judge reward and weaker alternatives such as Choice-only Reward and End2End SFT without RL.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents GRID, an end-to-end framework for constructing security knowledge graphs from long-form CTI articles. It first generates supervision via graph extraction and knowledge-graph-conditioned text revision to produce traceable article-graph alignments, then converts document-to-graph learning into a task bank of four-option multi-select questions paired with triple-level regex matching. Two Qwen3-4B-based models are trained: a primary Task-bank Reward model and a secondary End2End Reward model using LLM-as-judge rewards. On 249 articles drawn from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank model reports 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1 while using fewer tokens; the End2End model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Additional analyses claim that task-bank rewards can be precomputed offline and reused.
Significance. If the pipeline-generated alignments prove reliable, the work supplies a concrete, lower-cost alternative to repeated LLM-as-judge scoring for supervising security KG extraction. The offline reusability of the task-bank rewards and the reporting of source-averaged metrics across five sources (the paper's own GRID collection plus four external datasets) are practical strengths that could aid reproducibility in the security-AI community.
major comments (1)
- [Abstract] Abstract and supervision pipeline description: the central performance numbers (84.62% precision, 64.91% recall) are obtained by training on targets produced by the paper's own graph extraction plus KG-conditioned text revision steps. No human validation, inter-annotator agreement scores, or error analysis of these alignments is reported, so the metrics may largely reflect consistency with the pipeline rather than independent ground-truth quality.
minor comments (2)
- The abstract states that the Task-bank model achieves 'lower token usage and deployment cost' but supplies no quantitative token counts, latency figures, or cost comparisons against the baselines.
- Details on data splits, baseline implementations, statistical significance testing, and per-source error analysis are absent from the abstract and would strengthen verifiability of the cross-dataset claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the supervision pipeline and its implications for the reported metrics. We address the concern point by point below, acknowledging the limitation while clarifying the design rationale and proposed revisions.
- Referee: [Abstract] Abstract and supervision pipeline description: the central performance numbers (84.62% precision, 64.91% recall) are obtained by training on targets produced by the paper's own graph extraction plus KG-conditioned text revision steps. No human validation, inter-annotator agreement scores, or error analysis of these alignments is reported, so the metrics may largely reflect consistency with the pipeline rather than independent ground-truth quality.
Authors: We agree that the performance numbers measure consistency with targets generated by our graph extraction and KG-conditioned text revision pipeline rather than agreement with independently human-annotated ground truth. This automated approach was chosen to produce scalable, traceable article-graph alignments without the prohibitive cost of manual security KG annotation. The evaluation uses source-averaged metrics across 249 articles from five sources (GRID, CASIE, CTINexus, MalKG, SecureNLP) to demonstrate generalization beyond the training sources. We will revise the manuscript to add an explicit limitations subsection discussing the pipeline-generated supervision, include a preliminary error analysis of alignment quality on a held-out sample, and note the absence of inter-annotator agreement as a point for future human-expert validation. These changes will not alter the core claims regarding task-bank reward stability and offline reusability.
Revision: partial
Circularity Check
No significant circularity; performance on held-out external datasets
The paper generates article-graph alignments via its graph extraction and KG-conditioned revision pipeline to create supervision targets for task-bank training. It then reports precision/recall/F1 for the trained Task-bank Reward model on 249 held-out CTI articles drawn from external sources (CASIE, CTINexus, MalKG, SecureNLP) plus its own GRID collection. No equations, definitions, or self-citations are shown that reduce the reported source-averaged scores to quantities fitted or generated within the same run. The evaluation therefore retains independent empirical content against external benchmarks rather than collapsing to an in-sample consistency check. This warrants a low circularity score; concerns about unvalidated alignment quality belong to correctness risk, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can be improved for structured extraction tasks via reinforcement learning with task-specific rewards derived from question answering and pattern matching.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
GRID first constructs security-domain supervision from security-related CTI articles in an unsupervised manner by constructing traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then reformulates document-to-graph learning into a scripted task bank that combines four-option multi-select questions with triple-level regex matching targets
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
The ontology is not used as a post-hoc label list. Instead, it guides the LLM to recover entity relations that are not directly stated through explicit verbs or relative clauses, by providing typed entity categories, normalized relation families, aliases, and hierarchy constraints.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.