{"paper":{"title":"Advancing Ligand-based Virtual Screening and Molecular Generation with Pretrained Molecular Embedding Distance","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Pretrained embedding distances serve as an effective training-free measure of molecular similarity for virtual screening and generation.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Shiyun Wa, Simone Sciabola, Ye Wang, Yifei Wang","submitted_at":"2026-04-27T13:43:20Z","abstract_excerpt":"Molecular similarity plays a central role in ligand-based drug discovery, such as virtual screening, analog searching, and goal-directed molecular generation. However, traditional similarity measures, ranging from fingerprint-based Tanimoto coefficients to 3D shape overlays, are often computationally expensive at scale or rely on hand-crafted molecular descriptors. Meanwhile, many deep learning approaches to similarity-aware design still depend on similarity-specific supervision or costly data curation, limiting their generality across targets. In this work, we propose pretrained embedding dis"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"pretrained embedding distance (PED) ... exhibits distinct correlations with traditional similarity metrics, and performs effectively in both ranking molecules for virtual screening and guiding molecular generation via reward design.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That embeddings from general pretrained molecular models already capture the structural information needed for effective similarity measurement across targets, without requiring task-specific supervision or data curation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Pretrained molecular embedding distances provide an effective similarity metric for ligand-based virtual screening and molecular generation without task-specific training.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Pretrained embedding distances serve as an effective training-free measure of molecular similarity for virtual screening and generation.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ac7258a12070b22ffdbcf4b5de50cbeec7a0a425d0ec7fe65daba0cd3f585195"},"source":{"id":"2604.24474","kind":"arxiv","version":2},"verdict":{"id":"cec7f059-88f4-4cd8-8e13-1034409184de","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-08T04:00:37.208696Z","strongest_claim":"pretrained embedding distance (PED) ... exhibits distinct correlations with traditional similarity metrics, and performs effectively in both ranking molecules for virtual screening and guiding molecular generation via reward design.","one_line_summary":"Pretrained molecular embedding distances provide an effective similarity metric for ligand-based virtual screening and molecular generation without task-specific training.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That embeddings from general pretrained molecular models already capture the structural information needed for effective similarity measurement across targets, without requiring task-specific supervision or data curation.","pith_extraction_headline":"Pretrained embedding distances serve as an effective training-free measure of molecular similarity for virtual screening and generation."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.24474/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-21T06:39:47.405703Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T22:03:38.125400Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"1fb529372e33ec1760357765bdc04df50ce403fbfc3e38846e3485244e43f43e"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}