RiskNet: A large-scale dataset of AI risk incidents from news with alignment and multi-dimensional annotations
Pith reviewed 2026-06-27 18:35 UTC · model grok-4.3
The pith
RiskNet turns large-scale multilingual news into aligned clusters of AI risk incidents with multi-dimensional labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a four-stage pipeline (AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification), applied to large-scale multilingual news sources, yields a resource that organizes dispersed reports into incident-centered records with aligned incident clusters and annotated benchmark subsets. That resource is meant to enable downstream research on AI safety, governance, and risk analysis, including longitudinal analyses of harms.
What carries the argument
The structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification that converts raw news into incident-centered records and benchmark datasets.
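As a rough illustration of how four such stages compose, the Python sketch below wires them together as pluggable functions. Everything in it is hypothetical: the `Report` and `Incident` schemas, the `tokens` helper, and the toy keyword rules are stand-ins invented for this example, not the paper's models, taxonomy, or data format.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Report:
    """One news report (hypothetical schema)."""
    url: str
    title: str
    text: str


@dataclass
class Incident:
    """One incident-centered record: aligned reports plus labels (hypothetical schema)."""
    key: str
    reports: list = field(default_factory=list)
    labels: dict = field(default_factory=dict)


def tokens(report: Report) -> set:
    """Lowercased alphabetic tokens from the title and body."""
    return set(re.findall(r"[a-z]+", f"{report.title} {report.text}".lower()))


def run_pipeline(reports, identify, screen, align_key, classify):
    """Compose the four stages; each stage is passed in as a plain function."""
    # Stages 1 and 2: keep reports that are AI risk news and describe a concrete event.
    kept = [r for r in reports if identify(r) and screen(r)]
    # Stage 3: group reports that share an alignment key into one incident.
    clusters = defaultdict(list)
    for r in kept:
        clusters[align_key(r)].append(r)
    # Stage 4: attach multi-dimensional labels to each incident.
    return [Incident(key=k, reports=rs, labels=classify(rs)) for k, rs in clusters.items()]


# Toy inputs and toy stage functions (assumptions, not the paper's methods).
reports = [
    Report("https://example.com/a", "Chatbot leak exposes user data", "A chatbot leak exposed chat logs."),
    Report("https://example.org/b", "Company confirms chatbot leak", "The company confirmed the leak."),
    Report("https://example.net/c", "Local team wins match", "A sports story."),
]
incidents = run_pipeline(
    reports,
    identify=lambda r: "chatbot" in tokens(r),
    screen=lambda r: "leak" in tokens(r),
    align_key=lambda r: "chatbot-leak",
    classify=lambda rs: {"harm_type": "privacy", "n_reports": len(rs)},
)
```

Passing the stages in as functions keeps the composition separate from any particular classifier or clustering method, which is where the accuracy questions raised in this review actually live.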
Load-bearing premise
The structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification produces accurate, representative, and unbiased incident records from news sources.
What would settle it
An independent audit finding that a substantial fraction of the aligned incident clusters merge unrelated events, or that the dataset systematically omits documented AI risk cases present in the source news.
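One way such an audit could be quantified is to draw a random sample of clusters, have reviewers flag those that merge unrelated events, and report the flagged fraction with a confidence interval. The sketch below assumes a simple random sample and uses the Wilson score interval; the function names and the counts in the example are hypothetical and do not come from the paper.

```python
import math
import random


def sample_clusters(cluster_ids, k, seed=0):
    """Draw k cluster ids uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(cluster_ids), k)


def wilson_interval(flagged, n, z=1.96):
    """Wilson score interval for the proportion flagged/n (z=1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = flagged / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical audit: 200 sampled clusters, 12 judged to merge unrelated events.
audited = sample_clusters(range(10_000), 200)
low, high = wilson_interval(12, len(audited))
```

If the lower bound of the interval sits above whatever error rate a downstream task can tolerate, the audit would count against the core claim; the tolerable rate itself is a judgment the paper would need to state.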
Original abstract
As artificial intelligence (AI) systems are increasingly deployed across socially consequential domains, reports of AI-related harms and failures have grown in frequency and diversity. Although existing governance frameworks articulate high-level principles for responsible AI, large-scale empirical resources for tracking and analyzing real-world AI risk incidents remain limited. Existing incident collections are often manually curated, relatively small in scale, and insufficient for continuous, data-driven monitoring and downstream computational analysis. To address this need, we present RiskNet, a large-scale dataset of AI risk incidents constructed from large-scale multilingual news sources. RiskNet applies a structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification. The resulting resource organizes dispersed news reports into incident-centered records and provides benchmark datasets for event classification, incident alignment, and incident-level risk labeling. In its current release, RiskNet covers hundreds of millions of source records and yields a large-scale collection of AI risk-related reports, including aligned incident clusters and annotated benchmark subsets. The dataset is also accessible through an online platform for browsing and exploration. We describe the data sources, processing workflow, taxonomy design, and technical validation of the resource. RiskNet is intended to support downstream research on AI safety, governance, risk analysis, and benchmarking, as well as longitudinal and cross-source analyses of AI-related harms. By providing a structured and reusable empirical resource, RiskNet helps bridge the gap between high-level governance principles and the documented realities of AI risk incidents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents RiskNet, a large-scale dataset of AI risk incidents constructed from multilingual news sources using a structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification. The resource organizes dispersed news reports into incident-centered records and provides benchmark datasets for event classification, incident alignment, and incident-level risk labeling, along with an online platform. The paper describes the data sources, processing workflow, taxonomy design, and technical validation of the resource to support AI safety, governance, and risk analysis research.
Significance. If the pipeline is validated to produce accurate and representative records, RiskNet would be a significant contribution by supplying a large-scale empirical resource (from hundreds of millions of source records) that addresses the limitations of small, manually curated incident collections. This could enable new data-driven, longitudinal, and cross-source analyses in AI risk monitoring and benchmarking.
major comments (2)
- [Technical validation section] As referenced in the abstract and the processing workflow description, the manuscript states that it describes the technical validation but supplies no quantitative metrics: no precision/recall for news identification, no inter-annotator agreement for annotations or alignment, and no error analysis for any pipeline stage. This is load-bearing for the central claim that the dataset yields accurate incident records.
- [Processing workflow] The claim that the structured pipeline produces representative and unbiased incident records from news sources lacks supporting evidence such as bias mitigation steps, comparison to existing curated datasets, or validation against ground-truth incidents. Without this, the utility for downstream tasks cannot be assessed.
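The metrics requested in the first major comment are standard and cheap to compute once a labeled sample exists. The sketch below shows precision, recall, and F1 for a binary identification stage, and Cohen's kappa for agreement between two annotators. The label lists are made-up toy data, not results from the paper.

```python
from collections import Counter


def precision_recall_f1(gold, pred):
    """Binary precision, recall, and F1; labels are truthy for 'AI risk news'."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[l] * count_b[l] for l in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): gold vs. predicted flags, and two annotators' category labels.
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(gold, pred)

annotator_a = ["privacy", "bias", "bias", "safety"]
annotator_b = ["privacy", "bias", "safety", "safety"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

For multi-dimensional labels, kappa would be reported per dimension, since agreement on one axis of the taxonomy says little about agreement on another.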
minor comments (1)
- [Abstract] The mention of 'technical validation' would be strengthened by previewing at least one key quantitative result, so that readers can assess the pipeline's performance immediately.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for stronger empirical support in the technical validation and processing workflow sections. We address each major comment below and commit to revisions that strengthen the manuscript without overstating current content.
Point-by-point responses
- Referee: [Technical validation section] The manuscript states that it describes the technical validation but supplies no quantitative metrics such as precision/recall for news identification, inter-annotator agreement for annotations or alignment, or error analysis for any pipeline stage. This is load-bearing for the central claim that the dataset yields accurate incident records.
  Authors: We agree that the current manuscript provides only a qualitative description of the validation process without numerical metrics. In the revised version we will add precision/recall estimates for the news identification stage, inter-annotator agreement scores for annotation and alignment tasks, and an error analysis for each pipeline component to directly support the accuracy claims.
  Revision: yes
- Referee: [Processing workflow] The claim that the structured pipeline produces representative and unbiased incident records from news sources lacks supporting evidence such as bias mitigation steps, comparison to existing curated datasets, or validation against ground-truth incidents; without this, the utility for downstream tasks cannot be assessed.
  Authors: We acknowledge the absence of explicit bias mitigation details and external comparisons in the current text. The revised manuscript will expand the workflow section to document source diversity measures, filtering criteria, and a comparison against existing curated AI incident collections. Direct validation against comprehensive ground-truth incident lists remains constrained by the lack of such external references, which we will note as a limitation while emphasizing the utility of the released benchmark subsets.
  Revision: partial
- Unresolved: direct validation against comprehensive ground-truth incident records, because no complete external ground-truth collection of AI risk incidents exists for comparison.
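Even without a complete ground truth, the comparison against curated collections that the rebuttal promises can be approximated as a coverage check: for each curated incident, ask whether any of its cited report URLs also appears in the news-derived dataset. The sketch below assumes URL matching is an acceptable proxy and uses a deliberately crude normalization. The incident ids, URLs, and function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Crude canonical form: lowercase host without 'www.', path without trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Scheme, query string, and fragment are dropped on purpose.
    return host + parts.path.rstrip("/")


def coverage(curated: dict, dataset_urls) -> tuple:
    """Return (covered ids, missed ids, covered fraction) for curated incidents."""
    seen = {normalize_url(u) for u in dataset_urls}
    covered = sorted(
        incident_id
        for incident_id, urls in curated.items()
        if any(normalize_url(u) in seen for u in urls)
    )
    missed = sorted(set(curated) - set(covered))
    fraction = len(covered) / len(curated) if curated else 0.0
    return covered, missed, fraction


# Hypothetical curated incidents mapped to the report URLs they cite.
curated = {
    "inc-1": ["https://www.example.com/story/"],
    "inc-2": ["https://example.org/other"],
}
dataset_urls = ["http://example.com/story?ref=x"]
covered, missed, fraction = coverage(curated, dataset_urls)
```

A coverage figure like this only bounds omission relative to the curated collection, which has its own selection biases, so it complements rather than replaces the limitation noted above.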
Circularity Check
No significant circularity
The paper describes construction of a news-derived dataset via an identification, screening, alignment, and classification pipeline. No equations, parameter fitting, predictions, or first-principles derivations are claimed. The contribution is empirical resource creation with technical validation described; no load-bearing step reduces to self-definition, fitted inputs renamed as predictions, or self-citation chains. This matches the default expectation for non-derivational dataset papers.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: multilingual news sources contain sufficient and representative coverage of AI risk incidents.