ThreatCore: A Benchmark for Explicit and Implicit Threat Detection

Carlo Bardazzi; Davide Bruni; Maurizio Tesconi

arXiv: 2605.10563 · v1 · submitted 2026-05-11 · cs.CL · cs.AI


Davide Bruni, Carlo Bardazzi, Maurizio Tesconi

Pith reviewed 2026-05-12 05:23 UTC · model grok-4.3


keywords: threat detection, implicit threats, explicit threats, benchmark, semantic role labeling, natural language processing, toxicity


The pith

The ThreatCore benchmark reveals that implicit threats are substantially harder for models to detect than explicit threats.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ThreatCore, a benchmark dataset for fine-grained threat detection in natural language that distinguishes explicit threats, implicit threats, and non-threats. It aggregates existing datasets and re-annotates them under a unified definition of threat, and adds synthetic examples to better cover implicit cases. Evaluations of Perspective API, zero-shot classifiers, and recent language models show that implicit threats pose greater challenges. Incorporating Semantic Role Labeling as an intermediate representation can improve performance by clarifying the structure of harmful intent. The result is a more consistent foundation for studying threat detection than approaches that conflate it with toxicity or hate speech.

Core claim

ThreatCore is constructed by aggregating multiple publicly available resources and systematically re-annotating them under a unified operational definition of threat, revealing substantial inconsistencies across existing labels. Synthetic examples are added and validated to cover underrepresented implicit threats. Evaluations show that implicit threats remain substantially harder to detect than explicit ones, and that incorporating Semantic Role Labeling as an intermediate representation can improve performance by making the structure of harmful intent more explicit.
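The "substantial inconsistencies across existing labels" can be made concrete as label-transition statistics: for each source dataset, how often an original label maps to each unified label. A minimal sketch, assuming every re-annotated item keeps its original label next to its new one; the binary source labels `threat`/`not_threat` and the toy pairs are hypothetical, not data from the paper.

```python
from collections import Counter, defaultdict

# Unified label set, following the three classes named in the paper.
UNIFIED = ("explicit", "implicit", "non_threat")

def transition_table(pairs):
    """Count original -> unified label transitions.

    `pairs` holds (original_label, unified_label) tuples for one source dataset.
    Returns {original_label: Counter({unified_label: count})}.
    """
    table = defaultdict(Counter)
    for original, unified in pairs:
        if unified not in UNIFIED:
            raise ValueError(f"unknown unified label: {unified!r}")
        table[original][unified] += 1
    return table

def relabel_rate(table, original, kept):
    """Share of items carrying `original` whose unified label is not in `kept`."""
    row = table.get(original, Counter())
    total = sum(row.values())
    if total == 0:
        return 0.0
    changed = sum(count for label, count in row.items() if label not in kept)
    return changed / total

# Hypothetical toy data for one source dataset with binary labels.
pairs = [
    ("threat", "explicit"), ("threat", "explicit"),
    ("threat", "implicit"), ("threat", "non_threat"),
    ("not_threat", "non_threat"), ("not_threat", "non_threat"),
    ("not_threat", "non_threat"), ("not_threat", "implicit"),
]
table = transition_table(pairs)
for original, row in table.items():
    print(original, dict(row))
# Original "threat" items that the unified definition rejects:
print(relabel_rate(table, "threat", {"explicit", "implicit"}))
# Original "not_threat" items that turn out to be (implicit) threats:
print(relabel_rate(table, "not_threat", {"non_threat"}))
```

Reporting both directions matters: items dropped from "threat" and items promoted out of "not_threat" shift the class boundary in opposite ways.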

What carries the argument

The ThreatCore dataset, with its unified annotation of explicit threats, implicit threats, and non-threats, combined with Semantic Role Labeling as an intermediate representation that makes the structure of harmful intent explicit.
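The summary does not say which SRL system or prompt format the authors use. As a rough illustration of "SRL as an intermediate representation", the sketch below assumes PropBank-style frames (predicate plus ARG0/ARG1/... spans) produced by some external SRL parser, and serializes them into a hypothetical classification prompt for a language model. `SRLFrame`, `build_prompt`, and the template wording are invented for illustration; the frames are hand-written stand-ins for parser output.

```python
from dataclasses import dataclass, field

@dataclass
class SRLFrame:
    """One predicate-argument structure, PropBank-style (assumed format)."""
    predicate: str
    args: dict = field(default_factory=dict)  # e.g. {"ARG0": "I", "ARG1": "you"}

LABELS = ("explicit threat", "implicit threat", "non-threat")

def format_frames(frames):
    """Serialize frames as '- predicate(ROLE=span; ...)' lines."""
    lines = []
    for frame in frames:
        roles = "; ".join(f"{role}={span}" for role, span in sorted(frame.args.items()))
        lines.append(f"- {frame.predicate}({roles})")
    return "\n".join(lines)

def build_prompt(text, frames):
    """Hypothetical prompt: the message, its SRL frames, then the label set."""
    return (
        "Classify the message as one of: " + ", ".join(LABELS) + ".\n"
        f"Message: {text}\n"
        "Semantic roles (who does what to whom):\n"
        f"{format_frames(frames)}\n"
        "Answer with the label only."
    )

# Hand-written frames standing in for an SRL parser's output.
frames = [
    SRLFrame("know", {"ARG0": "I", "ARG1": "where your kids go to school"}),
    SRLFrame("go", {"ARG1": "your kids", "ARG4": "to school"}),
]
print(build_prompt("I know where your kids go to school.", frames))
```

The design intent is that the frames expose the agent and the target explicitly, which is the information an implicit threat tends to leave unstated on the surface.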

If this is right

Implicit threats are substantially harder to detect than explicit threats for current models.
Incorporating Semantic Role Labeling can improve threat detection performance.
Existing datasets show substantial inconsistencies when re-annotated under a single threat definition.
Threat detection should be treated separately from broader categories like toxicity and hate speech.
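The first consequence is typically measured with per-class scores rather than a single accuracy. A minimal sketch of one-vs-rest precision, recall, and F1 over the three ThreatCore classes, assuming gold and predicted labels are plain strings; the predictions are invented to illustrate the pattern (implicit threats mislabeled as non-threats), not results from the paper.

```python
LABELS = ("explicit", "implicit", "non_threat")

def per_class_scores(gold, pred, labels=LABELS):
    """One-vs-rest precision, recall, and F1 for each label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Invented predictions: all explicit threats are caught, while three of the
# four implicit threats are mislabeled as non-threats.
gold = ["explicit"] * 4 + ["implicit"] * 4 + ["non_threat"] * 4
pred = ["explicit"] * 4 + ["implicit"] + ["non_threat"] * 3 + ["non_threat"] * 4
scores = per_class_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label:<11} P={p:.2f} R={r:.2f} F1={f:.2f}")
macro_f1 = sum(f for _, _, f in scores.values()) / len(scores)
gap = scores["explicit"][2] - scores["implicit"][2]
print(f"macro-F1={macro_f1:.3f}  explicit-implicit F1 gap={gap:.2f}")
```

Note that the missed implicit threats also lower non-threat precision, since they land in that class as false positives.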

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Moderation tools could benefit from pipelines that first apply semantic role labeling before threat classification.
The benchmark could be used to develop specialized models for detecting veiled threats in online discourse.
Similar re-annotation approaches might resolve inconsistencies in other NLP safety tasks like detecting misinformation or manipulation.
Extending the dataset with more diverse synthetic examples could further test model robustness to indirect language.

Load-bearing premise

A single unified operational definition of threat can be applied consistently across diverse datasets and synthetic examples without introducing systematic bias.

What would settle it

The claims would be undermined if multiple independent teams re-annotating the same source datasets under the unified definition obtained significantly different labels, or if models using Semantic Role Labeling showed no improvement on implicit threat detection in replication experiments.
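The first test can be operationalized as chance-corrected agreement between two independent re-annotations of the same items. A minimal sketch of Cohen's kappa written from scratch; the two label lists are hypothetical, and what kappa value counts as "significantly different" is not fixed in the text and would have to be set in advance.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's marginal label rates.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        # Both used one identical label throughout; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical re-annotations of the same eight items by two independent teams.
team_a = ["explicit", "explicit", "implicit", "implicit",
          "non_threat", "non_threat", "non_threat", "implicit"]
team_b = ["explicit", "explicit", "implicit", "non_threat",
          "non_threat", "non_threat", "implicit", "implicit"]
print(round(cohen_kappa(team_a, team_b), 3))
```

Computing kappa separately on the items either team marks as implicit would show whether disagreement concentrates where the paper's main claim lives.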

Figures

Figures reproduced from arXiv: 2605.10563 by Carlo Bardazzi, Davide Bruni, Maurizio Tesconi.

**Figure 3.** Process of using Semantic Role Labeling with LLMs.

Original abstract

Threat detection in Natural Language Processing lacks consistent definitions and standardized benchmarks, and is often conflated with broader phenomena such as toxicity, hate speech, or offensive language. In this work, we introduce ThreatCore, a publicly available benchmark dataset for fine-grained threat detection that distinguishes between explicit threats, implicit threats, and non-threats. The dataset is constructed by aggregating multiple publicly available resources and systematically re-annotating them under a unified operational definition of threat, revealing substantial inconsistencies across existing labels. To improve the coverage of underrepresented cases, particularly implicit threats, we further augment the dataset with synthetic examples, which are manually validated using the same annotation protocol adopted for the re-annotation of the public datasets, ensuring consistency across all data sources. We evaluate Perspective API, zero-shot classifiers, and recent language models on ThreatCore, showing that implicit threats remain substantially harder to detect than explicit ones. Our results also indicate that incorporating Semantic Role Labeling as an intermediate representation can improve performance by making the structure of harmful intent more explicit. Overall, ThreatCore provides a more consistent benchmark for studying fine-grained threat detection and highlights the challenges that current models still face in identifying indirect expressions of harmful intent.
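Perspective API is one of the evaluated baselines. To illustrate why a single-score tool fits awkwardly with a three-way scheme, the sketch below prepares (but does not send) a request for Perspective's `THREAT` attribute and thresholds the returned summary score. The 0.5 threshold and the collapse to two labels are assumptions, not the paper's evaluation protocol, and the response dict is hand-written in the shape of the API's JSON rather than real output.

```python
import json
import urllib.request

ENDPOINT = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text, api_key):
    """Prepare a Perspective API request for the THREAT attribute (not sent here)."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"THREAT": {}},
    }
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def threat_label(response, threshold=0.5):
    """Map the THREAT summary score to a binary label (threshold is an assumption).

    A single score cannot by itself separate explicit from implicit threats.
    """
    score = response["attributeScores"]["THREAT"]["summaryScore"]["value"]
    return "threat" if score >= threshold else "non_threat"

# Hand-written response dict shaped like the API's JSON output.
sample = {"attributeScores": {"THREAT": {"summaryScore": {"value": 0.82, "type": "PROBABILITY"}}}}
print(threat_label(sample))
req = build_request("I know where you live.", "YOUR_API_KEY")
print(req.get_method(), req.full_url.split("?")[0])
```

Sending the request would need a valid key, for example via `urllib.request.urlopen(req)`; the binary output would then have to be compared against both threat classes of ThreatCore.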

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

ThreatCore gives a practical new benchmark for explicit versus implicit threats but the re-annotation step leaves the main performance gap claim on shaky ground without more validation numbers.


The paper's core contribution is ThreatCore, built by pooling several public datasets, re-labeling all of them under one operational definition of threat, and adding synthetic implicit examples that are checked with the same rules. The authors run Perspective API, zero-shot classifiers, and recent language models on it and report that implicit threats remain substantially harder to catch, with semantic role labeling helping by surfacing the structure of intent more clearly. They also note that the original labels were inconsistent across sources, which is a real issue in this corner of NLP safety work. The unified re-annotation plus targeted synthetic fill-in is the genuinely new piece, and releasing the whole resource publicly is the practical upside: people working on content moderation or model evaluation get a single place to test fine-grained threat distinctions instead of relying on toxicity or hate-speech proxies. The evaluation setup is straightforward and covers a reasonable range of current tools.

The soft spot is the load-bearing premise identified earlier. Implicit threats are definitionally fuzzy, so forcing a single definition onto older datasets risks shifting the boundary in ways that exaggerate the explicit/implicit gap rather than measuring it cleanly. The abstract mentions the inconsistencies but gives no inter-annotator agreement figures, no breakdown of how many labels changed, and no error bars or concrete metrics for the performance difference. The synthetic examples are manually validated under the same protocol, but again without numbers on rejection rates or edge-case handling. If those details are missing or weak in the full paper, the headline result remains hard to trust.

This is for researchers who need a testbed that separates threat types rather than lumping everything under toxicity. A reader building or auditing safety classifiers would find the dataset useful once the annotation quality is shown to be solid.
It deserves peer review because a cleaned-up benchmark in this area would help, but the referees should press hard on the re-annotation validation and the size of the reported gaps before accepting the claims at face value.

Referee Report

3 major / 2 minor

Summary. The paper introduces ThreatCore, a publicly available benchmark for fine-grained threat detection that distinguishes explicit threats, implicit threats, and non-threats. It is constructed by aggregating multiple existing public datasets, re-annotating them under a single unified operational definition of threat (which reveals inconsistencies in the original labels), and augmenting with synthetic examples that are manually validated using the same protocol. Evaluations of Perspective API, zero-shot classifiers, and recent language models on this benchmark show that implicit threats remain substantially harder to detect than explicit ones, while incorporating Semantic Role Labeling as an intermediate representation improves performance by making the structure of harmful intent more explicit.

Significance. If the dataset construction proves reliable, ThreatCore would address a clear gap in NLP by providing a standardized, fine-grained benchmark that avoids conflating threats with toxicity or hate speech. The empirical results on the relative difficulty of implicit threats and the utility of SRL as an intermediate step offer concrete directions for improving detection systems. Public release of the benchmark is a positive contribution that could enable reproducible follow-up work.

major comments (3)

[Section 3] Dataset construction: The central claim that ThreatCore is a consistent gold standard rests on the re-annotation process under a unified definition. However, while the abstract notes original-label inconsistencies, no quantitative evidence such as inter-annotator agreement scores, label-transition statistics, or agreement between original and new annotations is reported. This is load-bearing because implicit threats are definitionally ambiguous; without these metrics it is impossible to verify that the unified protocol eliminates rather than relocates systematic biases.
[Section 4] Experiments and results: The headline findings, that implicit threats are substantially harder to detect and that SRL improves performance, are presented without specific metrics, confidence intervals, statistical significance tests, or ablation details in the abstract and high-level description. This prevents assessment of whether the performance gap is robust or an artifact of the re-annotation choices.
[Section 3.2] Synthetic augmentation: The paper states that synthetic implicit-threat examples are manually validated under the same protocol, but provides no details on the generation method, validation criteria, or checks for distributional shift relative to the re-annotated real data. This matters because any systematic difference in how implicit intent is expressed in synthetic text could artificially widen the explicit/implicit performance gap.
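For the ablation the second comment asks about, a paired bootstrap over items is one standard way to attach a confidence interval to an SRL gain. A minimal sketch, assuming per-item 0/1 correctness for a baseline and an SRL-augmented model on the same implicit-threat items; the vectors are invented, and the share of resamples with no gain is only a rough one-sided bootstrap p-value, not a replacement for a pre-registered test.

```python
import random

def paired_bootstrap(correct_base, correct_srl, n_boot=2000, alpha=0.05, seed=0):
    """Paired bootstrap for the accuracy gain of an SRL-augmented model over a baseline.

    Inputs are per-item 0/1 correctness on the same items.
    Returns (point_estimate, (ci_low, ci_high), share_of_resamples_with_no_gain).
    """
    if len(correct_base) != len(correct_srl) or not correct_base:
        raise ValueError("need two equal-length, non-empty correctness vectors")
    n = len(correct_base)
    diffs = [s - b for b, s in zip(correct_base, correct_srl)]  # per-item gain in {-1, 0, 1}
    point = sum(diffs) / n
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample items with replacement, keeping each item's pair together.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    k = int(n_boot * alpha / 2)
    ci = (means[k], means[n_boot - 1 - k])  # percentile interval
    no_gain = sum(m <= 0 for m in means) / n_boot
    return point, ci, no_gain

# Invented correctness vectors for 40 implicit-threat items.
base = [1] * 20 + [0] * 20
srl = [1] * 18 + [0] * 2 + [1] * 10 + [0] * 10
point, (lo, hi), p = paired_bootstrap(base, srl)
print(f"gain={point:.3f}  95% CI=({lo:.3f}, {hi:.3f})  no-gain share={p:.4f}")
```

Pairing matters here: resampling the per-item differences keeps each item's baseline and SRL outcomes together, which is what makes the interval reflect the ablation rather than two independent samples.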

minor comments (2)

[Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., F1 scores for explicit vs. implicit) to support the claims.
Ensure all source datasets are cited with full references and DOIs or URLs in the main text.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the paper accordingly to improve transparency and rigor.


Referee: [Section 3] Dataset construction: The central claim that ThreatCore is a consistent gold standard rests on the re-annotation process under a unified definition. However, while the abstract notes original-label inconsistencies, no quantitative evidence such as inter-annotator agreement scores, label-transition statistics, or agreement between original and new annotations is reported. This is load-bearing because implicit threats are definitionally ambiguous; without these metrics it is impossible to verify that the unified protocol eliminates rather than relocates systematic biases.

Authors: We agree that quantitative evidence for the re-annotation reliability is essential. In the revised manuscript, we will add inter-annotator agreement scores (using Fleiss' kappa) for the re-annotation process, a table of label-transition statistics showing mappings from original to unified labels, and agreement rates between original and new annotations. These will be included in Section 3 to substantiate the consistency of the unified protocol. revision: yes
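Since the response names Fleiss' kappa, here is a minimal from-scratch sketch of it for more than two annotators. It assumes a count matrix where each row is an item and each column a ThreatCore class, with the same number of ratings per item; the counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] = number of annotators who put item i in category j;
    every item must receive the same number of ratings (>= 2).
    """
    if not counts:
        raise ValueError("no items")
    n_items, n_raters, n_cats = len(counts), sum(counts[0]), len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters or len(row) != n_cats for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    # Overall share of ratings falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Per-item agreement: fraction of annotator pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_item) / n_items
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        return 1.0  # every rating in one category; kappa undefined
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical: 3 annotators, 4 items, columns = (explicit, implicit, non_threat).
counts = [[3, 0, 0], [0, 2, 1], [0, 1, 2], [0, 0, 3]]
print(round(fleiss_kappa(counts), 3))
```

Reporting kappa per source dataset, and not only pooled, would show whether the unified definition is applied equally well everywhere.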
Referee: [Section 4] Experiments and results: The headline findings, that implicit threats are substantially harder to detect and that SRL improves performance, are presented without specific metrics, confidence intervals, statistical significance tests, or ablation details in the abstract and high-level description. This prevents assessment of whether the performance gap is robust or an artifact of the re-annotation choices.

Authors: The full paper already reports specific metrics, confidence intervals, statistical tests, and SRL ablations in Section 4 and its tables. To address the concern, we will revise the abstract and the opening of Section 4 to include key quantitative highlights (e.g., F1 gaps and SRL gains) with references to the detailed statistics and tests in the body. revision: partial
Referee: [Section 3.2] Synthetic augmentation: The paper states that synthetic implicit-threat examples are manually validated under the same protocol, but provides no details on the generation method, validation criteria, or checks for distributional shift relative to the re-annotated real data. This matters because any systematic difference in how implicit intent is expressed in synthetic text could artificially widen the explicit/implicit performance gap.

Authors: We agree that additional details are needed for transparency. In the revised Section 3.2, we will describe the generation method (prompt templates and models used), the exact validation criteria applied by annotators, and distributional shift analyses (e.g., comparisons of length, lexical diversity, and embedding similarity between synthetic and real implicit examples). revision: yes
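Two of the promised distributional-shift checks, length and lexical diversity, need no model at all. A minimal sketch, assuming a simple regex tokenizer and adding vocabulary overlap as a third cheap signal; all example sentences are hypothetical, and embedding similarity is omitted because it depends on a choice of encoder.

```python
import re
from statistics import mean

WORD = re.compile(r"\w+")  # simplistic tokenizer (assumption)

def tokenize(text):
    return WORD.findall(text.lower())

def corpus_stats(texts):
    """Mean length in tokens and type-token ratio over a list of texts."""
    token_lists = [tokenize(t) for t in texts]
    flat = [w for toks in token_lists for w in toks]
    return {
        "mean_len": mean(len(toks) for toks in token_lists),
        "ttr": len(set(flat)) / len(flat) if flat else 0.0,
    }

def vocab_jaccard(texts_a, texts_b):
    """Jaccard overlap between the vocabularies of two corpora."""
    va = {w for t in texts_a for w in tokenize(t)}
    vb = {w for t in texts_b for w in tokenize(t)}
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0

# Hypothetical implicit-threat examples from each source.
real = ["I know where you live.",
        "It would be a shame if something happened to your car."]
synthetic = ["It would be a shame if your family got hurt.",
             "Nice shop you have here."]
print(corpus_stats(real), corpus_stats(synthetic))
print(round(vocab_jaccard(real, synthetic), 3))
```

Type-token ratio falls as corpora grow, so comparing subsets of very different sizes calls for a length-normalized variant or equal-size samples.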

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark construction without derivations or fitted predictions


The paper is an empirical dataset paper that aggregates existing resources, applies a unified annotation protocol, adds validated synthetic examples, and evaluates off-the-shelf models. No equations, no parameter fitting to subsets followed by 'predictions', no self-citation chains supporting a central derivation, and no ansatz or uniqueness claims. The performance gap between explicit and implicit threats is measured directly on the constructed data rather than derived from prior results by the same authors. This is the standard honest outcome for benchmark papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the consistency of the annotation protocol and the representativeness of aggregated plus synthetic data.

axioms (1)

Domain assumption: A unified operational definition of threat exists that can be consistently applied across multiple existing datasets.
The dataset construction relies on systematic re-annotation under this single definition.

pith-pipeline@v0.9.0 · 5503 in / 1175 out tokens · 55568 ms · 2026-05-12T05:23:14.184004+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

Relation between the paper passage and the cited Recognition theorem.

We adopt the following definition: Threat: A statement or phrase intended to announce harm, injury, or punishment to a target...
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

incorporating Semantic Role Labeling as an intermediate representation can improve performance by making the structure of harmful intent more explicit

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
