Coley, Cao Xiao, Jimeng Sun, and Marinka Zitnik
9 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: dataset 2
Citation-polarity summary: use dataset 2
Verdicts: UNVERDICTED 9
citing papers explorer
- BioXArena: Benchmarking LLM Agents on Multi-Modal Biomedical Machine Learning Tasks
  BioXArena benchmarks LLM agents on generating end-to-end ML pipelines for 76 multi-modal biomedical tasks, with MLEvolve plus Gemini-3.1-Pro scoring highest at 0.666.
- Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction
  Chem-GMNet uses sphere-native embeddings, DualSKA attention, and SH-FFN layers to match or beat ChemBERTa-2 on MoleculeNet tasks with fewer parameters and sometimes no pretraining.
- How Creative Are Large Language Models in Generating Molecules?
  Large language models exhibit distinct creative patterns in molecule generation, including higher constraint satisfaction when more constraints are added, and this is the first work to reframe molecule generation abilities as creativity.
- Tabular foundation models for in-context prediction of molecular properties
  Tabular foundation models achieve high accuracy in molecular property prediction through in-context learning, with up to 100% win rates on MoleculeACE tasks when paired with CheMeleon embeddings.
- Suiren-1.0 Technical Report: A Family of Molecular Foundation Models
  Suiren-1.0 is a family of three molecular foundation models (Base, Dimer, ConfAvg) pre-trained on 70M+ DFT samples and distilled to achieve claimed state-of-the-art performance on quantum property prediction tasks from 2D inputs.
- GraphPINE: Graph Importance Propagation for Interpretable Drug Response Prediction
  GraphPINE is a GNN architecture that initializes node importance from prior knowledge graphs and propagates updates via an importance propagation layer for interpretable drug response prediction on over 5,000 genes and 952 drugs.
- Data Collaboration Analysis with Orthonormal Basis Selection and Alignment
  Orthonormal Data Collaboration (ODC) enforces orthonormal secret and target bases so that alignment reduces to the Orthogonal Procrustes problem, yielding O(acl^2) complexity, orthogonal concordance, and downstream performance invariant to the choice of target basis.
- Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction
  A benchmark across 156 comparisons finds classical ML models win 116 times while larger pretrained and LLM models win far fewer, showing predictive performance depends on model-task fit rather than scale.
- Quantum-inspired Reinforcement Learning for Synthesizable Drug Design
  Reinforcement learning with a quantum-inspired simulated annealing policy neural network is applied to synthesizable molecular optimization and reports competitive results against genetic algorithm baselines on the PMO benchmark with a 10K query budget.