Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

Abraham Heifets; Izhar Wallach

arxiv: 1706.06619 · v2 · pith:7L7DXX5Znew · submitted 2017-06-20 · 🧬 q-bio.QM · cs.LG · stat.ML



keywords: ligand-based · benchmarks · classification · measure · methods · overfitting · performance · rather


Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, it may be that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
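
The abstract does not define AVE, so the following is only a minimal sketch of how a training-validation redundancy score that considers both actives and inactives might be computed; it should not be read as the authors' implementation. It assumes fingerprints are represented as sets of "on" bit indices, uses Tanimoto distance, and averages nearest-neighbor coverage over a grid of distance thresholds. The function names (`tanimoto_distance`, `nn_fraction`, `redundancy_bias`) and the threshold grid are hypothetical choices made for illustration.

```python
from typing import FrozenSet, Optional, Sequence

Fingerprint = FrozenSet[int]  # assumed representation: indices of set bits


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nn_fraction(
    queries: Sequence[Fingerprint],
    reference: Sequence[Fingerprint],
    thresholds: Sequence[float],
) -> float:
    """Average, over thresholds d, of the fraction of query molecules whose
    nearest neighbor in `reference` lies within distance d."""
    nearest = [min(tanimoto_distance(q, r) for r in reference) for q in queries]
    total = 0.0
    for d in thresholds:
        total += sum(1 for x in nearest if x <= d) / len(nearest)
    return total / len(thresholds)


def redundancy_bias(
    val_actives: Sequence[Fingerprint],
    val_inactives: Sequence[Fingerprint],
    train_actives: Sequence[Fingerprint],
    train_inactives: Sequence[Fingerprint],
    thresholds: Optional[Sequence[float]] = None,
) -> float:
    """Illustrative redundancy score (an assumption, not the paper's code).

    It is large when validation actives sit closer to training actives than
    to training inactives, and validation inactives sit closer to training
    inactives than to training actives. A value near zero means
    nearest-neighbor lookup alone gives little advantage on this split.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(101)]  # hypothetical grid on [0, 1]
    aa = nn_fraction(val_actives, train_actives, thresholds)
    ai = nn_fraction(val_actives, train_inactives, thresholds)
    ii = nn_fraction(val_inactives, train_inactives, thresholds)
    ia = nn_fraction(val_inactives, train_actives, thresholds)
    return (aa - ai) + (ii - ia)
```

Under these assumptions, a split whose validation molecules duplicate their same-class training molecules scores close to the maximum of 2, while a split whose validation molecules are equally far from both training classes scores 0. A memorizing classifier would be expected to do well on the former and gain nothing on the latter.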
