pith. sign in

arxiv: 2606.03422 · v1 · pith:WON5QLMLnew · submitted 2026-06-02 · 💻 cs.CE

HonestAffinity: Leak-Aware Evaluation of Protein and Pocket Priors for Binding Affinity Prediction

classification 💻 cs.CE
keywords bestcasf-2016no-leakpearsonpriorssplitsachievesaffinity
0
0 comments X
read the original abstract

Sequence-based deep learning offers a scalable alternative to structure-based scoring for protein-ligand binding affinity prediction. However, progress is hard to interpret when architectural priors are evaluated on canonical PDBbind-style splits that leak similarity classes across folds. We present HonestAffinity, a compact 1D-input predictor to isolate two priors under a leak-aware protocol: frozen ESM-2 (650M) protein embeddings and a learned binary pocket-position marker. We evaluate a multi-scale convolutional/Transformer template in three variants: HonestAffinity-Pocket, HonestAffinity-NoPocket, and HonestAffinity-Pocket-NoESM. All three train on 11,513 LP-PDBBind complexes in ~3 GPU-hours. We benchmark against five baselines on the LP-PDBBind 3-tier no-leak hold-out, CASF-2016, and a CASF-2016 non-train subset. Our central finding is a split-conditioned reversal rather than a uniformly best prior: HonestAffinity-Pocket achieves the best mean Pearson R on validation and CASF-2016 splits, whereas HonestAffinity-Pocket-NoESM achieves the best mean Pearson R on every strict LP no-leak tier (test_cl1-cl3). Both the pocket marker and ESM-2 input improve performance on familiar splits but reduce Pearson R on strict no-leak tiers. We argue models should report paired canonical and leak-proof ablations, and that deployment-regime-matched variants better describe these reversals than a single default. Code and scripts are linked in the footnote; checkpoints will be released upon acceptance.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.