Automating Cloud Security and Forensics Through a Secure-by-Design Generative AI Framework
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
As cloud environments become increasingly complex, cybersecurity and forensic investigations must evolve to meet emerging threats. Large Language Models (LLMs) have shown promise in automating log analysis and reasoning tasks, yet they remain vulnerable to prompt injection attacks and lack forensic rigor. To address these dual challenges, we propose a unified, secure-by-design GenAI framework that integrates PromptShield and the Cloud Investigation Automation Framework (CIAF). PromptShield proactively defends LLMs against adversarial prompts using ontology-driven validation that standardizes user inputs and mitigates manipulation. CIAF streamlines cloud forensic investigations through structured, ontology-based reasoning across all six phases of the forensic process. We evaluate our system on real-world datasets from AWS and Microsoft Azure, demonstrating substantial improvements in both LLM security and forensic accuracy. Experimental results show PromptShield boosts classification performance under attack conditions, achieving precision, recall, and F1 scores above 93%, while CIAF enhances ransomware detection accuracy in cloud logs using Likert-transformed performance features. Our integrated framework advances the automation, interpretability, and trustworthiness of cloud forensics and LLM-based systems, offering a scalable foundation for real-time, AI-driven incident response across diverse cloud infrastructures.
fields
cs.CL 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
citing papers explorer
- ProvenAI: Provenance-Native Traces of Evidence in Generated Answers
ProvenAI measures transparency in multi-hop QA via answer correctness, citation fidelity, and ablation-based document influence on HotpotQA, reporting 53.53% accuracy and 71.55% fidelity while identifying a citation-influence gap.