Recognition: unknown
Automatic Labeling for Entity Extraction in Cyber Security
read the original abstract
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus ($\sim$750,000 words) and achieve near perfect precision, recall, and accuracy, with training times under 17 seconds.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
What Are Adversaries Doing? Automating Tactics, Techniques, and Procedures Extraction: A Systematic Review
Systematic review of 80 papers shows TTP extraction shifting to transformer and LLM methods but limited by narrow datasets, single-label focus, and low reproducibility.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.