UntrustVul: An Automated Approach for Identifying Untrustworthy Alerts in Vulnerability Detection Models
Pith reviewed 2026-05-23 00:05 UTC · model grok-4.3
The pith
UntrustVul identifies untrustworthy vulnerability predictions by flagging lines that neither match historical vulnerability patterns nor influence any lines that do through dependency graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UntrustVul detects untrustworthy alerts by defining a line as vulnerability-irrelevant when it does not resemble patterns from historical vulnerable lines and all its successors in the data and control dependency graph are likewise vulnerability-irrelevant. This rule lets the method label a model prediction as untrustworthy without needing ground-truth labels on the current example. Evaluation on 115K predictions from four models across BigVul, MegaVul, SARD, and PrimeVul datasets yields AUC scores of 70%-88% and F1-scores of 82%-94%, exceeding prior methods by 6%-59% in AUC and 13%-92% in F1-score.
What carries the argument
The recursive definition of a vulnerability-irrelevant line: mismatch with historical patterns plus irrelevance of all successors in the data and control dependency graph.
If this is right
- Developers can skip manual inspection of lines marked untrustworthy and focus effort on the remaining alerts.
- Incorrect patching decisions triggered by irrelevant highlighted lines become less likely.
- The same filtering step can be applied to any vulnerability detection model that outputs line-level alerts.
- Performance holds across multiple public datasets and detector architectures without retraining the original models.
Where Pith is reading between the lines
- The same pattern-plus-graph test could be tried on other ML tasks that produce line-level explanations, such as defect prediction.
- If dependency graphs miss indirect influences, some truly relevant lines may still be flagged as irrelevant.
Load-bearing premise
Patterns drawn from historical vulnerable lines together with data and control dependency graphs can separate vulnerability-irrelevant lines from relevant ones without access to ground-truth labels on the current prediction.
What would settle it
A manually labeled set of model predictions where each highlighted line has been checked by experts for actual relevance to the vulnerability; if UntrustVul's untrustworthy flags show low agreement with these labels, the central claim fails.
Figures
read the original abstract
Machine learning (ML) has shown promise in vulnerability detection, but ML detectors may rely on irrelevant code features, causing them to highlight non-vulnerable lines as suspicious. Such misleading predictions increase developers' manual effort and may lead to incorrect patching strategies, motivating the need to identify untrustworthy predictions automatically. We present UntrustVul, an approach for detecting untrustworthy vulnerability predictions by identifying suspicious lines that are inherently unrelated to vulnerabilities. UntrustVul leverages patterns from historical vulnerable lines and flags predictions as untrustworthy when the highlighted lines neither match known vulnerability patterns nor influence lines that do. A line is considered vulnerability-irrelevant if it does not resemble historical vulnerabilities and all its successors in the data and control dependency graph are also vulnerability-irrelevant. The approach is designed conservatively to minimise misclassifying trustworthy predictions as untrustworthy. We evaluate UntrustVul on 115K predictions from four models across the BigVul, MegaVul, SARD, and PrimeVul datasets. Results show that UntrustVul achieves AUC scores of 70%-88% and F1-scores of 82%-94%, outperforming existing approaches by 6%-59% in AUC and 13%-92% in F1-score.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces UntrustVul, an automated method to flag untrustworthy vulnerability predictions from ML detectors. It identifies suspicious lines as untrustworthy if they neither resemble historical vulnerable lines nor influence (via PDG successors) lines that do, using a conservative recursive definition based on data/control dependency graphs. Evaluation on 115K predictions from four models across BigVul, MegaVul, SARD, and PrimeVul reports AUC of 70-88% and F1 of 82-94%, outperforming baselines by 6-59% AUC and 13-92% F1.
Significance. If the operationalization of resemblance and historical/test separation can be made reproducible without leakage, the approach could meaningfully reduce developer effort on false-positive alerts in vulnerability detection. The conservative design to avoid misclassifying trustworthy predictions is a methodological strength, and the scale of the evaluation (four datasets, four models) is appropriate for the claim.
major comments (3)
- [Abstract] Abstract and method description: the central performance claims (AUC 70-88%, outperformance margins) rest on an unspecified definition of 'resemble historical vulnerabilities' and on the exact procedure for extracting historical patterns; without an equation, feature set, or similarity metric, it is impossible to verify that the 115K test predictions are strictly separated from the historical set.
- [Evaluation] Evaluation section: no details are given on baseline implementations, exact PDG construction algorithm, or how historical data is partitioned from the evaluated predictions on BigVul/MegaVul/SARD/PrimeVul; this directly undermines the reported 6-59% AUC gains.
- [Approach] The recursive definition of vulnerability-irrelevant lines (a line is irrelevant if it does not resemble history and all successors are irrelevant) is load-bearing for the conservative claim, yet no pseudocode, termination condition, or handling of cycles in the PDG is provided.
minor comments (2)
- [Approach] Notation for PDG successors and 'resemble' should be formalized with a small example in the method section.
- [Results] Table or figure reporting per-dataset and per-model breakdowns would clarify whether the aggregate 70-88% AUC holds uniformly.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve reproducibility and clarity.
read point-by-point responses
-
Referee: [Abstract] Abstract and method description: the central performance claims (AUC 70-88%, outperformance margins) rest on an unspecified definition of 'resemble historical vulnerabilities' and on the exact procedure for extracting historical patterns; without an equation, feature set, or similarity metric, it is impossible to verify that the 115K test predictions are strictly separated from the historical set.
Authors: We agree that the description of resemblance and historical pattern extraction requires more precision to allow verification of no leakage. In the revision we will add the exact feature set, similarity metric, and partitioning procedure used to separate historical data from the 115K test predictions. revision: yes
-
Referee: [Evaluation] Evaluation section: no details are given on baseline implementations, exact PDG construction algorithm, or how historical data is partitioned from the evaluated predictions on BigVul/MegaVul/SARD/PrimeVul; this directly undermines the reported 6-59% AUC gains.
Authors: We acknowledge the missing implementation details. The revised manuscript will specify baseline implementations, the PDG construction algorithm and tool, and the precise historical/test partitioning method applied to each of the four datasets. revision: yes
-
Referee: [Approach] The recursive definition of vulnerability-irrelevant lines (a line is irrelevant if it does not resemble history and all successors are irrelevant) is load-bearing for the conservative claim, yet no pseudocode, termination condition, or handling of cycles in the PDG is provided.
Authors: We will add pseudocode for the recursive procedure, explicitly state the termination condition, and describe cycle handling (via a visited-node set) to prevent infinite recursion. revision: yes
Circularity Check
No circularity; empirical evaluation on external benchmarks
full rationale
The paper presents UntrustVul as a rule-based heuristic that flags lines as vulnerability-irrelevant when they neither resemble historical vulnerable lines nor propagate influence via PDG successors. No equations, fitted parameters, or self-referential definitions appear in the provided text; performance (AUC/F1 on 115K predictions across BigVul/MegaVul/SARD/PrimeVul) is reported as direct empirical measurement against external datasets. No self-citation chains, ansatz smuggling, or renaming of known results are present. The derivation is therefore self-contained against the stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Historical vulnerable lines provide representative patterns for identifying irrelevant lines in new code.
- domain assumption Data and control dependency graphs accurately reflect influence relationships relevant to vulnerabilities.
Reference graph
Works this paper leans on
-
[1]
Software Assurance Reference Dataset
2017. Software Assurance Reference Dataset. https://samate.nist.gov/SARD
work page 2017
-
[2]
2023. Joern. https://joern.io
work page 2023
-
[3]
Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission. In Proceedings of the 21th ACM SIGKDD In- ternational Conference on Knowledge Discovery and Data Mining (Sydney, NSW, Australia) (KDD ’15). Association for Computing Machine...
-
[4]
Deep learning based vulnerability detection: Are we there yet?
S. Chakraborty, R. Krishna, Y. Ding, and B. Ray. 2022. Deep Learning Based Vulnerability Detection: Are We There Yet? IEEE Transactions on Software Engi- neering 48, 09 (sep 2022), 3280–3296. doi:10.1109/TSE.2021.3087402
-
[5]
Baijun Cheng, Shengming Zhao, Kailong Wang, Meizhen Wang, Guangdong Bai, Ruitao Feng, Yao Guo, Lei Ma, and Haoyu Wang. 2024. Beyond Fidelity: Explaining Vulnerability Localization of Learning-Based Detectors. ACM Trans. Softw. Eng. Methodol. 33, 5, Article 127 (June 2024), 33 pages. doi:10.1145/3641543
-
[6]
Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, and Yulei Sui. 2021. DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network. ACM Trans. Softw. Eng. Methodol. 30, 3, Article 38 (April 2021), 33 pages. doi:10. 1145/3436877
work page 2021
-
[7]
Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guan- dong Xu, and Hai Jin. 2024. Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation. In Proceedings of the 33rd ACM SIGSOFT Inter- national Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York,...
-
[8]
Roland Croft, M. Ali Babar, and M. Mehdi Kholoosi. 2023. Data Quality for Software Vulnerability Datasets. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE) . 121–133. doi:10.1109/ICSE48619.2023.00022
-
[9]
Mengnan Du, Ruixiang Tang, Weijie Fu, and Xia Hu. 2022. Towards Debiasing DNN Models from Spurious Feature Influence.Proceedings of the AAAI Conference on Artificial Intelligence 36, 9 (Jun. 2022), 9521–9528. doi:10.1609/aaai.v36i9.21185
-
[10]
Jiahao Fan, Yi Li, Shaohua Wang, and Tien N. Nguyen. 2020. A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries. In Proceedings of the 17th International Conference on Mining Software Repositories (Seoul, Republic of Korea) (MSR ’20). Association for Computing Machinery, New York, NY, USA, 508–512. doi:10.1145/3379597.3387501
-
[11]
Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020 . Association for Computa- tional Linguistics, Online, 1536–1547. doi:10.18...
-
[12]
Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. 1987. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. 9, 3 (July 1987), 319–349. doi:10.1145/24039.24041
-
[13]
Michael Fu and Chakkrit Tantithamthavorn. 2022. LineVul: a transformer- based line-level vulnerability prediction. In Proceedings of the 19th Interna- tional Conference on Mining Software Repositories (Pittsburgh, Pennsylvania) (MSR ’22). Association for Computing Machinery, New York, NY, USA, 608–620. doi:10.1145/3524842.3528452
-
[14]
Tom Ganz, Martin Härterich, Alexander Warnecke, and Konrad Rieck. 2021. Explaining Graph Neural Networks for Vulnerability Discovery. In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security (Virtual Event, Republic of Korea) (AISec ’21). Association for Computing Machinery, New York, NY, USA, 145–156. doi:10.1145/3474369.3486866
-
[15]
Shuzheng Gao, Cuiyun Gao, Chaozheng Wang, Jun Sun, David Lo, and Yue Yu. 2023. Two Sides of the Same Coin: Exploiting the Impact of Identifiers in Neural Code Comprehension. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). 1933–1945. doi:10.1109/ICSE48619.2023.00164
-
[16]
Vera Liao, Yunfeng Zhang, Rachel Bellamy, and Klaus Mueller
Bhavya Ghai, Q. Vera Liao, Yunfeng Zhang, Rachel Bellamy, and Klaus Mueller
-
[17]
Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers. Proc. ACM Hum.-Comput. Interact. 4, CSCW3, Article 235 (jan 2021), 28 pages. doi:10.1145/3432934
-
[18]
Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified Cross-Modal Pre-training for Code Representation. In Pro- ceedings of the 60th Annual Meeting of the Association for Computational Linguis- tics (Volume 1: Long Papers) . Association for Computational Linguistics, Dublin, Ireland, 7212–7225. doi:10.18653/v1/2022.a...
-
[19]
Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et al. 2020. Graphcodebert: Pre-training code representations with data flow. arXiv preprint arXiv:2009.08366 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[20]
LineVD: Statement- level vulnerability detection using graph neural networks,
David Hin, Andrey Kan, Huaming Chen, and M. Ali Babar. 2022. LineVD: statement-level vulnerability detection using graph neural networks. In Pro- ceedings of the 19th International Conference on Mining Software Repositories (Pittsburgh, Pennsylvania) (MSR ’22). Association for Computing Machinery, New York, NY, USA, 596–607. doi:10.1145/3524842.3527949
-
[21]
Yutao Hu, Suyuan Wang, Wenke Li, Junru Peng, Yueming Wu, Deqing Zou, and Hai Jin. 2023. Interpreters for GNN-Based Vulnerability Detection: Are We There Yet?. InProceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (Seattle, WA, USA) (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 1407–1419. doi...
-
[22]
Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, and Yi Chang. 2023. GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks. IEEE Transactions on Knowledge and Data Engineering 35, 7 (2023), 6968–6972. doi:10.1109/TKDE.2022.3187455
-
[23]
Shih-Cheng Huang, Akshay S. Chaudhari, Curtis P. Langlotz, Nigam Shah, Serena Yeung, and Matthew P. Lungren. 2022. Developing medical imaging AI for emerging infectious diseases. Nature Communications 13, 1 (18 Nov 2022), 7060. doi:10.1038/s41467-022-34234-4
-
[24]
Erik Imgrund, Tom Ganz, Martin Härterich, Lukas Pirch, Niklas Risse, and Konrad Rieck. 2023. Broken Promises: Measuring Confounding Effects in Learning- based Vulnerability Discovery. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security(Copenhagen, Denmark)(AISec ’23). Association for Computing Machinery, New York, NY, USA, 149–...
-
[25]
L. Arnold Johnson, Kelley L. Dempsey, Ronald S. Ross, Sarbari Gupta, and Dennis Bailey. 2011. Guide for Security-Focused Configuration Management of Information Systems. Technical Report. Gaithersburg, MD, USA
work page 2011
-
[26]
Davinder Kaur, Suleyman Uslu, Kaley J. Rittichier, and Arjan Durresi. 2022. Trust- worthy Artificial Intelligence: A Review. ACM Comput. Surv. 55, 2, Article 39 (jan 2022), 38 pages. doi:10.1145/3491209
-
[27]
Lena Kästner, Markus Langer, Veronika Lazar, Astrid Schomäcker, Timo Speith, and Sarah Sterz. 2021. On the Relation of Trust and Explainability: Why to Engineer for Trustworthiness. InIEEE 29th International Requirements Engineering Conference Workshops. 169–175. doi:10.1109/REW53955.2021.00031
-
[28]
Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2019. Unmasking Clever Hans predic- tors and assessing what machines really learn. Nature Communications 10, 1 (11 Mar 2019), 1096. doi:10.1038/s41467-019-08987-4
-
[29]
Florin Leon, Sabina-Adriana Floria, and Costin Bădică. 2017. Evaluating the effect of voting methods on ensemble-based classification. In 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA) . 1–6. doi:10.1109/INISTA.2017.8001122
-
[30]
Bo Li, Peng Qi, Bo Liu, Shuai Di, Jingen Liu, Jiquan Pei, Jinfeng Yi, and Bowen Zhou. 2023. Trustworthy AI: From Principles to Practices. ACM Comput. Surv. 55, 9, Article 177 (jan 2023), 46 pages. doi:10.1145/3555803
-
[31]
Jiwei Li, Will Monroe, and Dan Jurafsky. 2016. Understanding Neural Networks through Representation Erasure. CoRR abs/1612.08220 (2016). http://dblp.uni- trier.de/db/journals/corr/corr1612.html#LiMJ16a
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[32]
Yi Li, Shaohua Wang, and Tien N. Nguyen. 2021. Vulnerability detection with fine-grained interpretations. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Athens, Greece) (ESEC/FSE 2021). Association for Comput- ing Machinery, New York, NY, USA, 292–303. doi:...
-
[33]
Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2022. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Transactions on Dependable and Secure Computing 19, 4 (2022), 2244–2258. doi:10.1109/TDSC.2021.3051525
-
[34]
Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. In Proceedings 2018 Network and Distributed System Security Symposium (NDSS 2018) . Internet Society. doi:10.14722/ndss.2018.23158 11
-
[35]
Qinghua Lu, Liming Zhu, Xiwei Xu, Jon Whittle, and Zhenchang Xing. 2022. Towards a roadmap on software engineering for responsible AI. InProceedings of the 1st International Conference on AI Engineering: Software Engineering for AI (Pittsburgh, Pennsylvania) (CAIN ’22). Association for Computing Machinery, New York, NY, USA, 101–112. doi:10.1145/3522664.3528607
-
[36]
Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. 2020. Parameterized Explainer for Graph Neural Network. In Advances in Neural Information Processing Systems , Vol. 33. Curran Associates, Inc., 19620–19631. https://proceedings.neurips.cc/paper_files/paper/2020/file/ e37b08dd3015330dcbb5d6663667b8b8-Paper.pdf
work page 2020
-
[37]
Silverio Martínez-Fernández, Justus Bogner, Xavier Franch, Marc Oriol, Julien Siebert, Adam Trendowicz, Anna Maria Vollmer, and Stefan Wagner. 2022. Soft- ware Engineering for AI-Based Systems: A Survey. ACM Trans. Softw. Eng. Methodol. 31, 2, Article 37e (April 2022), 59 pages. doi:10.1145/3487043
-
[38]
Anh Nguyen, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . 427–436. doi:10.1109/CVPR.2015.7298640
-
[39]
Tien N. Nguyen and Raymond Choo. 2022. Human-in-the-loop XAI-enabled vulnerability detection, investigation, and mitigation. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (Mel- bourne, Australia) (ASE ’21). IEEE Press, 1210–1212. doi:10.1109/ASE51524.2021. 9678840
-
[40]
Chao Ni, Liyu Shen, Xiaohu Yang, Yan Zhu, and Shaohua Wang. 2024. MegaVul: A C/C++ Vulnerability Dataset with Comprehensive Code Representations. In 2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR). 738–742
work page 2024
-
[41]
Chao Ni, Xin Yin, Kaiwen Yang, Dehai Zhao, Zhenchang Xing, and Xin Xia. 2023. Distinguishing Look-Alike Innocent and Vulnerable Code by Subtle Semantic Representation Learning and Explanation. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (San Francisco, CA, USA) (ESE...
-
[42]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics . Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 311–318. doi:10. 3115/1073083.1073135
-
[43]
Pope, Soheil Kolouri, Mohammad Rostami, Charles E
Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, and Heiko Hoffmann. 2019. Explainability Methods for Graph Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2019
-
[44]
Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, and Wei Le. 2024. Towards Causal Deep Learning for Vulnerability Detec- tion. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (Lisbon, Portugal) (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 153, 11 pages. doi:1...
-
[45]
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, USA, 1135–1144. doi:10.1145/2939672.2939778
-
[46]
Vincenzo Riccio, Gunel Jahangirova, Andrea Stocco, Nargiz Humbatova, Michael Weiss, and Paolo Tonella. 2020. Testing machine learning based systems: a systematic mapping. Empirical Software Engineering 25, 6 (01 Nov 2020), 5193–
work page 2020
-
[47]
doi:10.1007/s10664-020-09881-0
- [48]
-
[49]
Abhik Roychoudhury, Corina Pasareanu, Michael Pradel, and Baishakhi Ray
-
[50]
Agen- tic ai software engineer: Programming with trust,
AI Software Engineer: Programming with Trust. arXiv:2502.13767 [cs.SE] https://arxiv.org/abs/2502.13767
-
[51]
Patrick Schramowski, Wolfgang Stammer, Stefano Teso, Anna Brugger, Franziska Herbert, Xiaoting Shao, Hans-Georg Luigs, Anne-Katrin Mahlein, and Kristian Kersting. 2020. Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nature Machine Intelligence 2, 8 (01 Aug 2020), 476–486. doi:10.1038/s42256-020-0212-3
-
[52]
Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedan- tam, Devi Parikh, and Dhruv Batra
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedan- tam, Devi Parikh, and Dhruv Batra. 2017. Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE Inter- national Conference on Computer Vision (ICCV)
work page 2017
-
[53]
Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv:1312.6034 [cs.CV] https://arxiv.org/abs/1312.6034
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[54]
Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An Empirical Study of Deep Learning Models for Vulnerability Detection. In Pro- ceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia) (ICSE ’23). IEEE Press, 2237–2248. doi:10.1109/ICSE48619. 2023.00188
-
[55]
Benjamin Steenhoek, Kalpathy Sivaraman, Renata Saldivar Gonzalez, Yevhen Mohylevskyy, Roshanak Zilouchian Moghaddam, and Wei Le. 2025. Closing the Gap: A User Study on the Real-world Usefulness of AI-powered Vulnerability Detection & Repair in the IDE. arXiv:2412.14306 [cs.SE] https://arxiv.org/abs/ 2412.14306
-
[56]
Russell Stewart, Mykhaylo Andriluka, and Andrew Y. Ng. 2016. End-To-End People Detection in Crowded Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
work page 2016
-
[57]
Szymon Stradowski and Lech Madeyski. 2024. Interpretability/Explainability Ap- plied to Machine Learning Software Defect Prediction: An Industrial Perspective. IEEE Software (2024), 1–8. doi:10.1109/MS.2024.3505544
-
[58]
Scott Thiebes, Sebastian Lins, and Ali Sunyaev. 2021. Trustworthy artificial intelligence. Electronic Markets 31, 2 (01 Jun 2021), 447–464. doi:10.1007/s12525- 020-00441-4
- [59]
-
[60]
Minh Vu and My T. Thai. 2020. PGM-Explainer: Probabilistic Graph- ical Model Explanations for Graph Neural Networks. In Advances in Neural Information Processing Systems , Vol. 33. Curran Associates, Inc., 12225–12235. https://proceedings.neurips.cc/paper_files/paper/2020/file/ 8fb134f258b1f7865a6ab2d935a897c9-Paper.pdf
work page 2020
-
[61]
Tan Wang, Chang Zhou, Qianru Sun, and Hanwang Zhang. 2021. Causal Atten- tion for Unbiased Visual Recognition. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) . 3091–3100
work page 2021
-
[62]
John Wieting, Taylor Berg-Kirkpatrick, Kevin Gimpel, and Graham Neubig. 2019. Beyond BLEU: Training Neural Machine Translation with Semantic Similarity. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4344–4355. doi:10.18653/v1/P19-1427
- [63]
-
[64]
Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec
-
[65]
In Advances in Neural Information Processing Systems , Vol
GNNExplainer: Generating Explanations for Graph Neural Net- works. In Advances in Neural Information Processing Systems , Vol. 32. Cur- ran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2019/file/ d80b7040b773199015de6d3b4293c8ff-Paper.pdf
work page 2019
- [66]
-
[67]
Yue Zhang, David Defazio, and Arti Ramesh. 2021. RelEx: A Model-Agnostic Relational Model Explainer. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (Virtual Event, USA) (AIES ’21). Association for Computing Machinery, New York, NY, USA, 1042–1049. doi:10.1145/3461702.3462562
-
[68]
Yunhui Zheng, Saurabh Pujar, Burn Lewis, Luca Buratti, Edward Epstein, Bo Yang, Jim Laredo, Alessandro Morari, and Zhong Su. 2021. D2A: A Dataset Built for AI- Based Vulnerability Detection Methods Using Differential Analysis. InProceedings of the ACM/IEEE 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP ’...
work page 2021
-
[69]
Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. De- vign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. In Advances in Neural Information Pro- cessing Systems, Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/ paper_files/paper/2019/file/49265d2447bc3b...
work page 2019
-
[70]
Deqing Zou, Yutao Hu, Wenke Li, Yueming Wu, Haojun Zhao, and Hai Jin
-
[71]
IEEE Transactions on Dependable and Secure Computing (2022), 1–12
mVulPreter: A Multi-Granularity Vulnerability Detection System With Interpretations. IEEE Transactions on Dependable and Secure Computing (2022), 1–12. doi:10.1109/TDSC.2022.3199769 12
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.