Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Chao Guo; Cong Wang; Fei Lin; Fei-Yue Wang; Gen Luo; Ji Dai; Tengchao Zhang; Xiaotong Yu; Xue Yang; Yining Jiang

arxiv: 2506.10912 · v4 · pith:SKMASNE3new · submitted 2025-06-12 · 💻 cs.AI · cs.CL

Fei Lin, Ziyang Gong, Cong Wang, Tengchao Zhang, Yonglin Tian, Yining Jiang, Ji Dai, Chao Guo, Xiaotong Yu, Xue Yang, Gen Luo, Fei-Yue Wang

classification 💻 cs.AI cs.CL

keywords: toxicity, molecular, MLLMs, evaluation, repair, task, capabilities, design

Toxicity remains a leading cause of early-stage drug development failure. Despite advances in molecular design and property prediction, the task of molecular toxicity repair, generating structurally valid molecular alternatives with reduced toxicity, has not yet been systematically defined or benchmarked. To fill this gap, we introduce ToxiMol, the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) focused on molecular toxicity repair. We construct a standardized dataset covering 11 primary tasks and 660 representative toxic molecules spanning diverse mechanisms and granularities. We design a prompt annotation pipeline with mechanism-aware and task-adaptive capabilities, informed by expert toxicological knowledge. In parallel, we propose an automated evaluation framework, ToxiEval, which integrates toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity into a high-throughput evaluation chain for repair success. We systematically assess 43 mainstream general-purpose MLLMs and conduct multiple ablation studies to analyze key issues, including evaluation metrics, candidate diversity, and failure attribution. Experimental results show that although current MLLMs still face significant challenges on this task, they begin to demonstrate promising capabilities in toxicity understanding, semantic constraint adherence, and structure-aware editing.
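The abstract describes ToxiEval as a chain that combines toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity into a single repair-success decision. A minimal sketch of such a chain is given here, assuming each score has already been computed by some external predictor. The class names, field names, threshold values, and the rule that every criterion must pass are illustrative assumptions, not the settings or logic reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateScores:
    """Precomputed scores for one candidate molecule (hypothetical fields)."""
    is_valid: bool        # the generated structure parses as a valid molecule
    toxicity_prob: float  # predicted probability of toxicity at the target endpoint
    sa_score: float       # synthetic accessibility, lower means easier to make
    qed: float            # drug-likeness in [0, 1], higher is better
    similarity: float     # structural similarity to the original toxic molecule


@dataclass(frozen=True)
class Thresholds:
    """Placeholder cut-offs, chosen only for illustration."""
    max_toxicity_prob: float = 0.5
    max_sa_score: float = 6.0
    min_qed: float = 0.5
    min_similarity: float = 0.4


def failed_criteria(s: CandidateScores, t: Thresholds = Thresholds()) -> list[str]:
    """Return the names of the criteria the candidate fails, in a fixed order."""
    checks = [
        ("validity", s.is_valid),
        ("toxicity", s.toxicity_prob < t.max_toxicity_prob),
        ("synthetic_accessibility", s.sa_score <= t.max_sa_score),
        ("drug_likeness", s.qed >= t.min_qed),
        ("similarity", s.similarity >= t.min_similarity),
    ]
    return [name for name, passed in checks if not passed]


def repair_succeeds(s: CandidateScores, t: Thresholds = Thresholds()) -> bool:
    """Assumed rule: a repair counts as successful only if every criterion passes."""
    return not failed_criteria(s, t)


good = CandidateScores(True, 0.12, 3.1, 0.71, 0.62)
bad = CandidateScores(True, 0.81, 3.1, 0.30, 0.62)
print(repair_succeeds(good), failed_criteria(good))
print(repair_succeeds(bad), failed_criteria(bad))
```

Returning the list of failed criteria, rather than only a boolean, is one way to support the kind of failure attribution the abstract mentions.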

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MolDeTox: Evaluating Language Model's Stepwise Fragment Editing for Molecular Detoxification
cs.AI 2026-05 unverdicted novelty 6.0

MolDeTox is a new benchmark that shows fragment-level stepwise editing by LLMs and VLMs improves structural validity and detoxification quality over prior toxicity-focused evaluations.

ToxiEval-ZKP: A Structure-Private Verification Framework for Molecular Toxicity Repair Tasks
cs.CR 2025-08 unverdicted novelty 5.0

ToxiEval-ZKP applies zero-knowledge proofs to enable private verification that generative AI molecules meet multidimensional toxicity repair criteria.