Automatic feature learning for vulnerability prediction

Aditya Ghose; Hoa Khanh Dam; John Grundy; Shien Wee Ng; Trang Pham; Truyen Tran

arxiv: 1708.02368 · v1 · pith:RYWEKD4Onew · submitted 2017-08-08 · 💻 cs.SE



keywords: code, prediction, features, learning, models, semantic, syntactic, variety


Code flaws or vulnerabilities are prevalent in software systems and can cause a variety of problems, including deadlock, information loss, and system failure. A variety of approaches have been developed to detect the most likely locations of such vulnerabilities in large code bases. Most of them rely on manually designed features (e.g. complexity metrics or frequencies of code tokens) that represent the characteristics of the code. However, all of them struggle to capture both the semantic and the syntactic representation of source code, an important capability for building accurate prediction models. In this paper, we describe a new approach, built upon the deep learning Long Short-Term Memory (LSTM) model, that automatically learns both semantic and syntactic features of code. Our evaluation on 18 Android applications demonstrates that the predictive power obtained from our learned features is equal or even superior to that achieved by state-of-the-art vulnerability prediction models: a 3% to 58% improvement for within-project prediction and 85% for cross-project prediction.
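The abstract's central idea, running an LSTM over code tokens and using its hidden states as learned features, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class name `TinyLSTM`, the toy vocabulary, the dimensions, the random untrained weights, and the mean-pooling step are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.index = {tok: i for i, tok in enumerate(vocab)}
        # One embedding row per known token, plus a final row for unknown tokens.
        self.embed = [[rng.uniform(-0.5, 0.5) for _ in range(embed_dim)]
                      for _ in range(len(vocab) + 1)]
        in_dim = embed_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                      for _ in range(hidden_dim)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def _gate(self, name, z, activation):
        return [activation(sum(w * x for w, x in zip(row, z)) + b)
                for row, b in zip(self.w[name], self.b[name])]

    def features(self, tokens):
        """Return the mean of the hidden states over the token sequence."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        total = [0.0] * self.hidden_dim
        for tok in tokens:
            x = self.embed[self.index.get(tok, len(self.index))]
            z = x + h  # concatenate the token embedding and previous hidden state
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            o = self._gate("o", z, sigmoid)
            g = self._gate("c", z, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            total = [t + hj for t, hj in zip(total, h)]
        n = max(len(tokens), 1)
        return [t / n for t in total]


# Hypothetical token vocabulary and a tokenized code fragment.
vocab = ["if", "(", ")", "{", "}", "return", "<id>", "<num>"]
model = TinyLSTM(vocab)
tokens = ["if", "(", "<id>", ")", "{", "return", "<num>", "}"]
vec = model.features(tokens)
print(vec)
```

In a real setting the weights would be trained (for example, to predict the next code token) and the resulting vector would be passed to a classifier; here the weights are random, so `vec` only demonstrates the shape of the computation.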


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Automatic Repair and Type Binding of Undeclared Variables using Neural Networks
cs.SE · 2019-07 · unverdicted · novelty 4.0

Neural network trained on AST structural details repairs undeclared variable errors and infers types, reporting 81% success on location/identification and 80% on types for 1059 programs in the prutor dataset.