Evaluating Assurance Cases as Text-Attributed Graphs for Structure and Provenance Analysis

Dusica Marijan; Fariz Ikhwantri

arxiv: 2604.20577 · v2 · submitted 2026-04-22 · 💻 cs.SE · cs.LG

Evaluating Assurance Cases as Text-Attributed Graphs for Structure and Provenance Analysis

Fariz Ikhwantri , Dusica Marijan This is my paper

Pith reviewed 2026-05-09 23:48 UTC · model grok-4.3

classification 💻 cs.SE cs.LG

keywords assurance casesgraph neural networkslink predictionprovenance detectionLLM-generatedsafety argumentstext-attributed graphs

0 comments

The pith

Assurance cases modeled as text-attributed graphs let graph neural networks predict argument links and detect LLM authorship.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper converts assurance cases into graphs where nodes hold text for claims and evidence while edges show relationships. Graph neural networks then learn to predict missing links between these elements and to classify entire cases as human-written or generated by large language models. This approach works because the graph structure captures how arguments connect in hierarchies. Experiments report solid results on both tasks and note that LLM cases tend to follow different linking patterns than human ones. The work supplies a public dataset to support further study of structure and origin in safety documents.

Core claim

Assurance cases are turned into text-attributed graphs so that graph neural networks can perform link prediction at ROC-AUC 0.760 on real cases while generalizing across domains and semi-supervised regimes. The same models classify human-authored versus LLM-generated cases at F1 0.94 and expose distinct hierarchical linking patterns in the LLM outputs. Existing GNN explanation methods align only moderately with the true argument structure.

What carries the argument

Text-attributed graphs of assurance cases, with nodes carrying text descriptions of argument elements and edges encoding relationships, fed to graph neural networks for link prediction and provenance classification.

If this is right

Link prediction models can help complete or audit the logical connections inside assurance cases.
Provenance classification can flag LLM-generated cases for extra human review in regulated safety work.
The observed difference in hierarchical patterns indicates current LLMs produce more uniform argument structures.
Moderate faithfulness of GNN explanations shows a remaining gap between model reasoning and actual argument flow.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph treatment could be tried on other regulated documents such as legal or medical arguments to spot AI assistance.
Prompt designers might use the linking-pattern difference to make future LLM outputs closer to human variety.
If adopted at scale the method could support standards that require visible human oversight of AI-drafted safety cases.

Load-bearing premise

Turning assurance cases into graphs with text on nodes keeps the essential semantic links and origin signals intact.

What would settle it

A new collection of LLM-generated assurance cases deliberately prompted to copy human hierarchical linking statistics, tested to see whether classification F1 drops well below 0.94.

Figures

Figures reproduced from arXiv: 2604.20577 by Dusica Marijan, Fariz Ikhwantri.

**Figure 1.** Figure 1: Comparison of human-authored and LLMgenerated assurance cases. The LLM-generated case (right) shows a different hierarchical linking pattern and node distribution compared to the human-authored ground truth (left). Different colours represent heterogeneous node types (e.g., Goal, Strategy, Evidence) in the GSN notation. These assurance cases typically comprise a hierarchical argument structure, consisti… view at source ↗

**Figure 2.** Figure 2: Overview of graph evaluation framework on Assur [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Node Importance Distribution by Type. Importance scores are obtained by GNNExplainer attributions from UniGraph [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: GNNExplainer output for UniGraph model showing [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

An assurance case is a structured argument document that justifies claims about a system's requirements or properties, which are supported by evidence. In regulated domains, these are crucial for meeting compliance and safety requirements to industry standards. We propose a graph diagnostic framework for analysing the structure and provenance of assurance cases. We focus on two main tasks: (1) link prediction, to learn and identify connections between argument elements, and (2) graph classification, to differentiate between assurance cases created by a state-of-the-art large language model and those created by humans, aiming to detect bias. We compiled a publicly available dataset of assurance cases, represented as graphs with nodes and edges, supporting both link prediction and provenance analysis. Experiments show that graph neural networks (GNNs) achieve strong link prediction performance (ROC-AUC 0.760) on real assurance cases and generalise well across domains and semi-supervised settings. For provenance detection, GNNs effectively distinguish human-authored from LLM-generated cases (F1 0.94). We observed that LLM-generated assurance cases have different hierarchical linking patterns compared to human-authored cases. Furthermore, existing GNN explanation methods show only moderate faithfulness, revealing a gap between predicted reasoning and the true argument structure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper releases a new public graph dataset of assurance cases and shows GNNs can do link prediction plus human-vs-LLM classification, but dataset construction details are missing so the provenance results are hard to trust.

read the letter

The main takeaway is a new publicly available dataset of assurance cases converted to text-attributed graphs, with GNN experiments on link prediction (ROC-AUC 0.76) and provenance classification that separates human from LLM-generated cases at F1 0.94. They also note LLM cases show different hierarchical linking patterns and check that existing GNN explanations are only moderately faithful to the argument structure. That dataset release and the two concrete tasks are the actual new pieces here. The numbers look plausible for a first pass at these problems in the assurance-case domain, and the semi-supervised generalization claim is worth having on record. Making the graphs public lets others test the same setup without starting from scratch. The soft spot is exactly what the stress-test flagged: we get almost no information on how the LLM cases were generated or how the text was turned into nodes, edges, and attributes. If the prompts or post-processing produced shallower or more regular trees than real human cases, the classifier could be learning synthesis artifacts instead of true provenance signals. Graph construction choices matter a lot here because any leakage of surface text features would inflate the F1. There are also no baselines reported for the classification task and no error analysis or statistical tests on the link-prediction results. The moderate explanation faithfulness already hints that the model is not fully capturing the intended argument structure. This paper is for people working on safety-critical systems, compliance documentation, or graph ML applied to structured technical text. A reader who needs a starting dataset for assurance-case graphs or who wants to extend provenance detection will find usable material. It is coherent enough and grounded enough in a real application area to deserve a serious referee, even though the methods section will need substantial expansion for reproducibility.

Referee Report

3 major / 2 minor

Summary. The paper proposes representing assurance cases as text-attributed graphs to support structure and provenance analysis via graph neural networks. It introduces a public dataset and evaluates two tasks: link prediction on real assurance cases (ROC-AUC 0.76, with generalization across domains and semi-supervised settings) and binary graph classification to distinguish human-authored from LLM-generated cases (F1 0.94). The authors report that LLM-generated cases exhibit different hierarchical linking patterns and that existing GNN explanation methods show only moderate faithfulness to the underlying argument structure.

Significance. If the central empirical claims hold after addressing methodological gaps, the work would be significant for regulated software engineering domains where assurance cases are mandatory. It provides an automated, graph-based approach to detect potential provenance issues and structural differences, along with a publicly available dataset that could enable further research. The reported generalization and the observation of linking pattern differences are strengths, though their reliability hinges on unbiased dataset construction.

major comments (3)

[Dataset compilation] Dataset construction (Section on dataset compilation): The process for creating the LLM-generated assurance cases is described at too high a level. Specifics on the LLM model, prompt templates, temperature, sampling strategy, and any post-processing or filtering steps are absent. This is load-bearing for the provenance classification result (F1 0.94), because the GNN could be learning synthesis artifacts (e.g., shallower or more uniform trees) rather than genuine human vs. LLM structural differences.
[Representation as text-attributed graphs] Graph construction details (Section on representation as text-attributed graphs): The conversion of assurance cases into nodes, edges, and text attributes is not specified in sufficient detail (e.g., how implicit links are encoded, what text is attached to nodes/edges, embedding method, or handling of hierarchical vs. cross-reference edges). Without this, it is impossible to assess whether the reported link-prediction ROC-AUC of 0.76 and classification performance truly reflect preserved semantic and provenance signals.
[Experiments] Experimental evaluation (Experiments section): The manuscript reports concrete performance numbers but omits baselines, statistical significance tests, error analysis, dataset size/split details, and ablation studies. These omissions make it difficult to verify the claims of strong performance and cross-domain generalization for both tasks.

minor comments (2)

[Abstract] The abstract states that GNNs 'generalise well across domains' but provides no quantitative breakdown by domain or explicit list of domains represented in the dataset.
Notation for graph components (nodes, edges, attributes) and the precise definition of 'hierarchical linking patterns' could be introduced earlier and used consistently to improve readability for readers outside the GNN community.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We believe the suggested clarifications will improve the paper's reproducibility and clarity. We address each major comment below.

read point-by-point responses

Referee: [Dataset compilation] Dataset construction (Section on dataset compilation): The process for creating the LLM-generated assurance cases is described at too high a level. Specifics on the LLM model, prompt templates, temperature, sampling strategy, and any post-processing or filtering steps are absent. This is load-bearing for the provenance classification result (F1 0.94), because the GNN could be learning synthesis artifacts (e.g., shallower or more uniform trees) rather than genuine human vs. LLM structural differences.

Authors: We concur that greater specificity is required here to substantiate the provenance classification results and mitigate concerns over potential artifacts. Accordingly, the revised manuscript will provide comprehensive details on the LLM generation process, including the model (GPT-4), complete prompt templates (to be included in an appendix), temperature (set to 0.7), sampling method (top-p sampling with p=0.9), and post-processing (automatic filtering for valid JSON structure followed by manual review for argument coherence). We will also include comparative statistics on graph properties such as average depth and branching factor to demonstrate that observed differences reflect substantive variations rather than superficial synthesis traits. revision: yes
Referee: [Representation as text-attributed graphs] Graph construction details (Section on representation as text-attributed graphs): The conversion of assurance cases into nodes, edges, and text attributes is not specified in sufficient detail (e.g., how implicit links are encoded, what text is attached to nodes/edges, embedding method, or handling of hierarchical vs. cross-reference edges). Without this, it is impossible to assess whether the reported link-prediction ROC-AUC of 0.76 and classification performance truly reflect preserved semantic and provenance signals.

Authors: We appreciate this observation and will enhance the representation section with precise specifications. Nodes correspond to individual argument elements (e.g., claims, strategies, evidence), each attributed with its original textual content. Edges encode 'supportedBy' relations for the hierarchical argument structure and 'inContextOf' for cross-references. Implicit links are derived directly from the assurance case's documented relationships. Text attributes are embedded using the all-MiniLM-L6-v2 sentence transformer model. The revised text will include a step-by-step description and pseudocode for the graph construction pipeline, distinguishing hierarchical from cross-reference edges. revision: yes
Referee: [Experiments] Experimental evaluation (Experiments section): The manuscript reports concrete performance numbers but omits baselines, statistical significance tests, error analysis, dataset size/split details, and ablation studies. These omissions make it difficult to verify the claims of strong performance and cross-domain generalization for both tasks.

Authors: We agree that these methodological details are important for validating our empirical claims. In the updated Experiments section, we will incorporate: baseline methods such as random guessing, feature-based classifiers without graph structure, and traditional ML models; statistical significance testing using bootstrap resampling or t-tests with reported p-values; qualitative error analysis highlighting representative failure cases for both tasks; explicit dataset statistics including the number of graphs (120 human-authored and 120 LLM-generated for classification, with domain-specific counts), and the train/validation/test splits (70%/15%/15%); and ablation experiments varying the use of text embeddings and edge types. New tables and figures will present these results to support the reported performance and generalization. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ML evaluation on compiled dataset

full rationale

The paper describes compiling a dataset of assurance cases represented as text-attributed graphs, then applying GNNs to perform link prediction (ROC-AUC 0.760) and binary classification between human and LLM-generated cases (F1 0.94). All reported results are standard experimental metrics obtained by training models on external data splits and evaluating on held-out examples. No equations, self-definitional relations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided abstract or description. The derivation chain consists of data construction followed by independent model training and evaluation, with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions of graph representation learning and the premise that assurance-case text can be faithfully converted to attributed graphs without loss of argument semantics. No new entities are postulated.

axioms (2)

domain assumption Assurance cases can be losslessly represented as text-attributed graphs where nodes are argument elements and edges capture logical connections.
Invoked in the proposal of the graph diagnostic framework and dataset construction.
domain assumption GNNs trained on these graphs can generalize across domains and in semi-supervised settings.
Stated as an experimental outcome but treated as a modeling assumption for the framework.

pith-pipeline@v0.9.0 · 5517 in / 1488 out tokens · 37652 ms · 2026-05-09T23:48:06.568203+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 3 internal anchors

[1]

Adelard. 2024. Claims-Arguments-Evidence (CAE). https://www.adelard.com/ asce/cae. Accessed July 2025

work page 2024
[2]

Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, and Marinka Zitnik. 2023. Evaluating explainability for graph neural networks.Scientific Data10, 1 (2023), 144

work page 2023
[3]

Ankit Agrawal, Seyedehzahra Khoshmanesh, Michael Vierhauser, Mona Rahimi, Jane Cleland-Huang, and Robyn Lutz. 2019. Leveraging artifact trees to evolve and reuse safety cases. InProceedings of the 41st International Conference on Software Engineering(Montreal, Quebec, Canada)(ICSE ’19). IEEE Press, New York, NY, USA, 1222–1233. doi:10.1109/ICSE.2019.00124...

work page doi:10.1109/icse.2019.00124 2019
[4]

Alexander Ahlbrecht, Jasper Sprockhoff, and Umut Durak. 2024. A system- theoretic assurance framework for safety-driven systems engineering: A System- Theoretic Assurance Framework...Softw. Syst. Model.24, 1 (Sept. 2024), 253–270. doi:10.1007/s10270-024-01209-6

work page doi:10.1007/s10270-024-01209-6 2024
[5]

Kenza Amara, Rex Ying, Zitao Zhang, Zhihao Han, Yinan Shan, Ulrik Bran- des, Sebastian Schemm, and Ce Zhang. 2024. GraphFramEx: Towards Sys- tematic Evaluation of Explainability Methods for Graph Neural Networks. arXiv:2206.09677 [cs.LG] https://arxiv.org/abs/2206.09677

work page arXiv 2024
[6]

Ewen Denney, Ganesh Pai, and Ibrahim Habli. 2015. Dynamic safety cases for through-life safety assurance. InProceedings of the 37th International Conference on Software Engineering - Volume 2(Florence, Italy)(ICSE ’15). IEEE Press, New York, NY, USA, 587–590

work page 2015
[7]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy...

work page 2019
[8]

Romina Etezadi, Sallam Abualhaija, Chetan Arora, and Lionel Briand. 2025. Classification or Prompting: A Case Study on Legal Requirements Traceabil- ity. arXiv:2502.04916 [cs.SE] https://arxiv.org/abs/2502.04916

work page arXiv 2025
[9]

Matthias Fey, Jinu Sunil, Akihiro Nitta, Rishi Puri, Manan Shah, Blaž Stojanovič, Ramona Bendias, Alexandria Barghi, Vid Kocijan, Zecheng Zhang, et al . 2025. Pyg 2.0: Scalable learning on real world graphs, In Temporal Graph Learning Workshop @ KDD.arXiv e-prints, arXiv–2507

work page 2025
[10]

Schoenholz, Patrick F

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for Quantum chemistry. InProceedings of the 34th International Conference on Machine Learning - Volume 70(Sydney, NSW, Australia)(ICML’17). JMLR.org, 1263–1272

work page 2017
[11]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models.arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[12]

Provider of Services for Urban Air Mobility (PSU) Prototype Simulation (X5) Final Report,

Mallory S. Graydon and Sarah M. Lehman. 2025.Examining Proposed Uses of LLMs to Produce or Assess Assurance Arguments. Technical Memorandum (TM) 20250001849. Langley Research Center, NASA. https://ntrs.nasa.gov/api/ citations/20250001849/downloads/NASA-TM-20250001849.pdf Public release; work of the U.S. Government

work page arXiv 2025
[13]

Hamilton, Rex Ying, and Jure Leskovec

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. InProceedings of the 31st International Conference on Neural Information Processing Systems(Long Beach, California, USA)(NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035

work page 2017
[14]

Yufei He, Yuan Sui, Xiaoxin He, and Bryan Hooi. 2025. UniGraph: Learning a Uni- fied Cross-Domain Foundation Model for Text-Attributed Graphs. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (Toronto ON, Canada)(KDD ’25). Association for Computing Machinery, New York, NY, USA, 448–459. doi:10.1145/3690624.3709277

work page doi:10.1145/3690624.3709277 2025
[15]

Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. 2025. GRAG: Graph Retrieval-Augmented Generation. InFindings of the Association for Computational Linguistics: NAACL 2025, Luis Chiruzzo, Alan Ritter, and Lu Wang (Eds.). Association for Computational Linguistics, Albuquerque, New Mexico, 4145–4157. https://aclanthology.org/2025.findi...

work page 2025
[16]

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. Gpt-4o system card.arXiv preprint arXiv:2410.21276(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[17]

Fariz Ikhwantri and Dusica Marijan. 2025. Explainable Compliance Detec- tion with Multi-Hop Natural Language Inference on Assurance Case Structure. arXiv:2506.08713 [cs.CL]

work page arXiv 2025
[18]

Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. 2024. Large Language Models on Graphs: A Comprehensive Survey.IEEE Transactions on Knowledge and Data Engineering36, 12 (2024), 8622–8642. doi:10.1109/TKDE. 2024.3469578

work page doi:10.1109/tkde 2024
[19]

Tim Kelly and Rob Weaver. 2004. The goal structuring notation–a safety argument notation. InProceedings of the dependable systems and networks 2004 workshop on assurance cases, Vol. 6. Citeseer Princeton, NJ

work page 2004
[20]

1999.Arguing safety: a systematic approach to managing safety cases

Timothy Patrick Kelly et al . 1999.Arguing safety: a systematic approach to managing safety cases. Ph. D. Dissertation. University of York York, UK

work page 1999
[21]

Kipf and Max Welling

Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJU4ayYgl

work page 2017
[22]

Zemin Liu, Xingtong Yu, Yuan Fang, and Xinming Zhang. 2023. GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks. In Proceedings of the ACM Web Conference 2023(Austin, TX, USA)(WWW ’23). Association for Computing Machinery, New York, NY, USA, 417–428. doi:10. 1145/3543507.3583386

work page arXiv 2023
[23]

Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, and Yang Zhang. 2023. Generated graph detection. InInternational Conference on Machine Learning. PMLR, 23412–23428

work page 2023
[24]

Mazen Mohamad, Jan-Philipp Steghöfer, and Riccardo Scandariato. 2021. Security assurance cases—state of the art of an emerging approach.Empirical software engineering26, 4 (2021), 70

work page 2021
[25]

Anitha Murugesan, Isaac Wong, Joaquín Arias, Robert Stroud, Srivatsan Varadara- jan, Elmer Salazar, Gopal Gupta, Robin Bloomfield, and John Rushby. 2024. Au- tomating semantic analysis of system assurance cases using goal-directed ASP. Theory and Practice of Logic Programming24, 4 (2024), 805–824

work page 2024
[26]

Joakim Nivre. 2010. Dependency parsing.Language and Linguistics Compass4, 3 (2010), 138–152

work page 2010
[27]

Belle, Song Wang, Segla Kpodjedo, Timothy C

Oluwafemi Odu, Alvine B. Belle, Song Wang, Segla Kpodjedo, Timothy C. Leth- bridge, and Hadi Hemmati. 2025. Automatic instantiation of assurance cases from patterns using large language models.Journal of Systems and Software222 (2025), 112353. doi:10.1016/j.jss.2025.112353

work page doi:10.1016/j.jss.2025.112353 2025
[28]

Ross, Mark Winstead, and Michael McEvilley

Ronald S. Ross, Mark Winstead, and Michael McEvilley. 2022. Engineering Trustworthy Secure Systems. doi:10.6028/NIST.SP.800-160v1r1

work page doi:10.6028/nist.sp.800-160v1r1 2022
[29]

2015.Un- derstanding and evaluating assurance cases

John Rushby, Xidong Xu, Murali Rangarajan, and Thomas L Weaver. 2015.Un- derstanding and evaluating assurance cases. Technical Report. Langley Research Center, National Aeronautics and Space Administration

work page 2015
[30]

Mithila Sivakumar, Alvine Boaye Belle, Jinjun Shan, and Kimya Khakzad Sha- handashti. 2023. GPT-4 and Safety Case Generation: An Exploratory Analysis. arXiv:2312.05696 [cs.SE] https://arxiv.org/abs/2312.05696

work page arXiv 2023
[31]

Mithila Sivakumar, Alvine B Belle, Jinjun Shan, and Kimya Khakzad Shahandashti

work page
[32]

Prompting GPT–4 to support automatic safety case generation.Expert Systems with Applications255 (2024), 124653

work page 2024
[33]

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview. net/forum?id=rJXMpikCZ

work page 2018
[34]

Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention Network for Recommendation. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(Anchorage, AK, USA)(KDD ’19). Association for Computing Machinery, New York, NY, USA, 950–958. doi:10.1145/3292500.3330989

work page doi:10.1145/3292500.3330989 2019
[35]

Francis Rhys Ward and Ibrahim Habli. 2020. An assurance case pattern for the interpretability of machine learning in safety-critical systems. InInternational Conference on Computer Safety, Reliability, and Security. Springer, Springer, 395– 407

work page 2020
[36]

Weinstock, Howard F

Charles B. Weinstock, Howard F. Lipson, and John B. Goodenough. 2007.Arguing Security: Creating Security Assurance Cases. Technical Report. Software Engi- neering Institute, Carnegie Mellon University. https://www.sei.cmu.edu/library/ arguing-security-creating-security-assurance-cases/ Accessed: 2025-10-09

work page 2007
[37]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, et al. 2020. Transformers: State-of-the-Art Natural Language Processing. InPro- ceedings of the 2020 Conference on Empirical Methods in Natural Language Process- ing: System Demonstrations, Qun Liu and David Schlangen (Eds.). Association for Computational Linguistics, Online, 38–...

work page doi:10.18653/v1/2020.emnlp-demos.6 2020
[38]

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. 2020. A comprehensive survey on graph neural networks.IEEE transactions on neural networks and learning systems32, 1 (2020), 4–24

work page 2020
[39]

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Cheng- peng Li, Chengyuan Li, Dayiheng Liu, et al. 2024. Qwen2 Technical Report.arXiv preprint arXiv:2407.10671(2024)

work page internal anchor Pith review arXiv 2024
[40]

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. Graph convolutional networks for text classification. InProceedings of the Thirty-Third AAAI Conference on Arti- ficial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence(Honolulu, Hawaii, USA)...

work page doi:10.1609/aaai.v33i01.33017370 2019
[41]

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. Association for Computing Machinery, New York, NY, USA, 974–983

work page 2018
[42]

Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec

work page
[43]

Gnnexplainer: Generating explanations for graph neural networks.Ad- vances in neural information processing systems32 (2019)

work page 2019
[44]

Marinka Zitnik, Monica Agrawal, and Jure Leskovec. 2018. Modeling polyphar- macy side effects with graph convolutional networks.Bioinformatics34, 13 (06 2018), i457–i466. doi:10.1093/bioinformatics/bty294

work page doi:10.1093/bioinformatics/bty294 2018

[1] [1]

Adelard. 2024. Claims-Arguments-Evidence (CAE). https://www.adelard.com/ asce/cae. Accessed July 2025

work page 2024

[2] [2]

Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, and Marinka Zitnik. 2023. Evaluating explainability for graph neural networks.Scientific Data10, 1 (2023), 144

work page 2023

[3] [3]

Ankit Agrawal, Seyedehzahra Khoshmanesh, Michael Vierhauser, Mona Rahimi, Jane Cleland-Huang, and Robyn Lutz. 2019. Leveraging artifact trees to evolve and reuse safety cases. InProceedings of the 41st International Conference on Software Engineering(Montreal, Quebec, Canada)(ICSE ’19). IEEE Press, New York, NY, USA, 1222–1233. doi:10.1109/ICSE.2019.00124...

work page doi:10.1109/icse.2019.00124 2019

[4] [4]

Alexander Ahlbrecht, Jasper Sprockhoff, and Umut Durak. 2024. A system- theoretic assurance framework for safety-driven systems engineering: A System- Theoretic Assurance Framework...Softw. Syst. Model.24, 1 (Sept. 2024), 253–270. doi:10.1007/s10270-024-01209-6

work page doi:10.1007/s10270-024-01209-6 2024

[5] [5]

Kenza Amara, Rex Ying, Zitao Zhang, Zhihao Han, Yinan Shan, Ulrik Bran- des, Sebastian Schemm, and Ce Zhang. 2024. GraphFramEx: Towards Sys- tematic Evaluation of Explainability Methods for Graph Neural Networks. arXiv:2206.09677 [cs.LG] https://arxiv.org/abs/2206.09677

work page arXiv 2024

[6] [6]

Ewen Denney, Ganesh Pai, and Ibrahim Habli. 2015. Dynamic safety cases for through-life safety assurance. InProceedings of the 37th International Conference on Software Engineering - Volume 2(Florence, Italy)(ICSE ’15). IEEE Press, New York, NY, USA, 587–590

work page 2015

[7] [7]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Jill Burstein, Christy...

work page 2019

[8] [8]

Romina Etezadi, Sallam Abualhaija, Chetan Arora, and Lionel Briand. 2025. Classification or Prompting: A Case Study on Legal Requirements Traceabil- ity. arXiv:2502.04916 [cs.SE] https://arxiv.org/abs/2502.04916

work page arXiv 2025

[9] [9]

Matthias Fey, Jinu Sunil, Akihiro Nitta, Rishi Puri, Manan Shah, Blaž Stojanovič, Ramona Bendias, Alexandria Barghi, Vid Kocijan, Zecheng Zhang, et al . 2025. Pyg 2.0: Scalable learning on real world graphs, In Temporal Graph Learning Workshop @ KDD.arXiv e-prints, arXiv–2507

work page 2025

[10] [10]

Schoenholz, Patrick F

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for Quantum chemistry. InProceedings of the 34th International Conference on Machine Learning - Volume 70(Sydney, NSW, Australia)(ICML’17). JMLR.org, 1263–1272

work page 2017

[11] [11]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. 2024. The llama 3 herd of models.arXiv preprint arXiv:2407.21783 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[12] [12]

Provider of Services for Urban Air Mobility (PSU) Prototype Simulation (X5) Final Report,

Mallory S. Graydon and Sarah M. Lehman. 2025.Examining Proposed Uses of LLMs to Produce or Assess Assurance Arguments. Technical Memorandum (TM) 20250001849. Langley Research Center, NASA. https://ntrs.nasa.gov/api/ citations/20250001849/downloads/NASA-TM-20250001849.pdf Public release; work of the U.S. Government

work page arXiv 2025

[13] [13]

Hamilton, Rex Ying, and Jure Leskovec

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. InProceedings of the 31st International Conference on Neural Information Processing Systems(Long Beach, California, USA)(NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035

work page 2017

[14] [14]

Yufei He, Yuan Sui, Xiaoxin He, and Bryan Hooi. 2025. UniGraph: Learning a Uni- fied Cross-Domain Foundation Model for Text-Attributed Graphs. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1 (Toronto ON, Canada)(KDD ’25). Association for Computing Machinery, New York, NY, USA, 448–459. doi:10.1145/3690624.3709277

work page doi:10.1145/3690624.3709277 2025

[15] [15]

Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. 2025. GRAG: Graph Retrieval-Augmented Generation. InFindings of the Association for Computational Linguistics: NAACL 2025, Luis Chiruzzo, Alan Ritter, and Lu Wang (Eds.). Association for Computational Linguistics, Albuquerque, New Mexico, 4145–4157. https://aclanthology.org/2025.findi...

work page 2025

[16] [16]

Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. Gpt-4o system card.arXiv preprint arXiv:2410.21276(2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[17] [17]

Fariz Ikhwantri and Dusica Marijan. 2025. Explainable Compliance Detec- tion with Multi-Hop Natural Language Inference on Assurance Case Structure. arXiv:2506.08713 [cs.CL]

work page arXiv 2025

[18] [18]

Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. 2024. Large Language Models on Graphs: A Comprehensive Survey.IEEE Transactions on Knowledge and Data Engineering36, 12 (2024), 8622–8642. doi:10.1109/TKDE. 2024.3469578

work page doi:10.1109/tkde 2024

[19] [19]

Tim Kelly and Rob Weaver. 2004. The goal structuring notation–a safety argument notation. InProceedings of the dependable systems and networks 2004 workshop on assurance cases, Vol. 6. Citeseer Princeton, NJ

work page 2004

[20] [20]

1999.Arguing safety: a systematic approach to managing safety cases

Timothy Patrick Kelly et al . 1999.Arguing safety: a systematic approach to managing safety cases. Ph. D. Dissertation. University of York York, UK

work page 1999

[21] [21]

Kipf and Max Welling

Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https://openreview.net/forum?id=SJU4ayYgl

work page 2017

[22] [22]

Zemin Liu, Xingtong Yu, Yuan Fang, and Xinming Zhang. 2023. GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks. In Proceedings of the ACM Web Conference 2023(Austin, TX, USA)(WWW ’23). Association for Computing Machinery, New York, NY, USA, 417–428. doi:10. 1145/3543507.3583386

work page arXiv 2023

[23] [23]

Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, and Yang Zhang. 2023. Generated graph detection. InInternational Conference on Machine Learning. PMLR, 23412–23428

work page 2023

[24] [24]

Mazen Mohamad, Jan-Philipp Steghöfer, and Riccardo Scandariato. 2021. Security assurance cases—state of the art of an emerging approach.Empirical software engineering26, 4 (2021), 70

work page 2021

[25] [25]

Anitha Murugesan, Isaac Wong, Joaquín Arias, Robert Stroud, Srivatsan Varadara- jan, Elmer Salazar, Gopal Gupta, Robin Bloomfield, and John Rushby. 2024. Au- tomating semantic analysis of system assurance cases using goal-directed ASP. Theory and Practice of Logic Programming24, 4 (2024), 805–824

work page 2024

[26] [26]

Joakim Nivre. 2010. Dependency parsing.Language and Linguistics Compass4, 3 (2010), 138–152

work page 2010

[27] [27]

Belle, Song Wang, Segla Kpodjedo, Timothy C

Oluwafemi Odu, Alvine B. Belle, Song Wang, Segla Kpodjedo, Timothy C. Leth- bridge, and Hadi Hemmati. 2025. Automatic instantiation of assurance cases from patterns using large language models.Journal of Systems and Software222 (2025), 112353. doi:10.1016/j.jss.2025.112353

work page doi:10.1016/j.jss.2025.112353 2025

[28] [28]

Ross, Mark Winstead, and Michael McEvilley

Ronald S. Ross, Mark Winstead, and Michael McEvilley. 2022. Engineering Trustworthy Secure Systems. doi:10.6028/NIST.SP.800-160v1r1

work page doi:10.6028/nist.sp.800-160v1r1 2022

[29] [29]

2015.Un- derstanding and evaluating assurance cases

John Rushby, Xidong Xu, Murali Rangarajan, and Thomas L Weaver. 2015.Un- derstanding and evaluating assurance cases. Technical Report. Langley Research Center, National Aeronautics and Space Administration

work page 2015

[30] [30]

Mithila Sivakumar, Alvine Boaye Belle, Jinjun Shan, and Kimya Khakzad Sha- handashti. 2023. GPT-4 and Safety Case Generation: An Exploratory Analysis. arXiv:2312.05696 [cs.SE] https://arxiv.org/abs/2312.05696

work page arXiv 2023

[31] [31]

Mithila Sivakumar, Alvine B Belle, Jinjun Shan, and Kimya Khakzad Shahandashti

work page

[32] [32]

Prompting GPT–4 to support automatic safety case generation.Expert Systems with Applications255 (2024), 124653

work page 2024

[33] [33]

Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net. https://openreview. net/forum?id=rJXMpikCZ

work page 2018

[34] [34]

Xiang Wang, Xiangnan He, Yixin Cao, Meng Liu, and Tat-Seng Chua. 2019. KGAT: Knowledge Graph Attention Network for Recommendation. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(Anchorage, AK, USA)(KDD ’19). Association for Computing Machinery, New York, NY, USA, 950–958. doi:10.1145/3292500.3330989

work page doi:10.1145/3292500.3330989 2019

[35] [35]

Francis Rhys Ward and Ibrahim Habli. 2020. An assurance case pattern for the interpretability of machine learning in safety-critical systems. InInternational Conference on Computer Safety, Reliability, and Security. Springer, Springer, 395– 407

work page 2020

[36] [36]

Weinstock, Howard F

Charles B. Weinstock, Howard F. Lipson, and John B. Goodenough. 2007.Arguing Security: Creating Security Assurance Cases. Technical Report. Software Engi- neering Institute, Carnegie Mellon University. https://www.sei.cmu.edu/library/ arguing-security-creating-security-assurance-cases/ Accessed: 2025-10-09

work page 2007

[37] [37]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, et al. 2020. Transformers: State-of-the-Art Natural Language Processing. InPro- ceedings of the 2020 Conference on Empirical Methods in Natural Language Process- ing: System Demonstrations, Qun Liu and David Schlangen (Eds.). Association for Computational Linguistics, Online, 38–...

work page doi:10.18653/v1/2020.emnlp-demos.6 2020

[38] [38]

Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. 2020. A comprehensive survey on graph neural networks.IEEE transactions on neural networks and learning systems32, 1 (2020), 4–24

work page 2020

[39] [39]

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Cheng- peng Li, Chengyuan Li, Dayiheng Liu, et al. 2024. Qwen2 Technical Report.arXiv preprint arXiv:2407.10671(2024)

work page internal anchor Pith review arXiv 2024

[40] [40]

Liang Yao, Chengsheng Mao, and Yuan Luo. 2019. Graph convolutional networks for text classification. InProceedings of the Thirty-Third AAAI Conference on Arti- ficial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence(Honolulu, Hawaii, USA)...

work page doi:10.1609/aaai.v33i01.33017370 2019

[41] [41]

Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. InProceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. Association for Computing Machinery, New York, NY, USA, 974–983

work page 2018

[42] [42]

Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec

work page

[43] [43]

Gnnexplainer: Generating explanations for graph neural networks.Ad- vances in neural information processing systems32 (2019)

work page 2019

[44] [44]

Marinka Zitnik, Monica Agrawal, and Jure Leskovec. 2018. Modeling polyphar- macy side effects with graph convolutional networks.Bioinformatics34, 13 (06 2018), i457–i466. doi:10.1093/bioinformatics/bty294

work page doi:10.1093/bioinformatics/bty294 2018