NMRAgent is an evidential LLM agent for NMR-based molecular structure elucidation that improves accuracy on novel scaffolds and demonstrates utility on real natural products.
Mixed citations
ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction
Mixed citation behavior. Most common role is background (40%).
representative citing papers
scCycleMol adds a learnable circular cell-cycle head with closed-loop supervision from predicted treated expression, yielding higher R² on SciPlex3 gene predictions and improved phase accuracy versus ChemCPA baselines.
LOGICA adds context to pretrained biological LMs via logit-space contrastive alignment with gated adapters, improving AUC on held-out drug-resistance mutation ranking from ~0.55 to ~0.65 while preserving token likelihoods.
MolGram integrates a conditional n-gram memory module into molecular language models to address locality gaps in SMILES tokenization, improving performance on generation, forward prediction, and retrosynthesis while outperforming 3x larger baselines.
Chem-GMNet uses sphere-native embeddings, DualSKA attention, and SH-FFN layers to match or beat ChemBERTa-2 on MoleculeNet tasks with fewer parameters and sometimes no pretraining.
FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.
Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.
Pre-training improves CLMs' encoding of molecular substructures especially in upper layers while fine-tuning selectively modifies task-relevant substructures more than others.
Decentralized AI agent teams self-organize around hypotheses, critique proposals, and share knowledge to outperform single-agent baselines on biomedical ML, language-model optimization, and protein fitness tasks.
Drug-blind cancer sensitivity prediction is limited by evaluation metric and training distribution rather than drug representation complexity.
PolyLM fine-tunes a 9B-parameter LLM on 185k papers to predict polymer properties from text alone, achieving median R² of 0.74 on 68k held-out samples.
Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.
A benchmark across 78 endpoint-split entries finds classical ML taking 47.4% of best-performing results over pretrained models, GNNs, and LLMs, with performance depending on model-task-split fit rather than scale.
NOSE aligns molecular, receptor, and linguistic modalities in a shared embedding space via tri-modal orthogonal contrastive learning and weak positive samples, achieving SOTA performance and zero-shot generalization on olfactory tasks.
FlexMS is a new flexible benchmarking framework that lets researchers dynamically combine deep learning architectures and evaluate their mass spectrum prediction performance on public metabolomics datasets using multiple metrics and retrieval tasks.
MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.
SmellNet supplies 828k gas-sensor time series across 50 substances plus 43 mixtures; ScentFormer reaches 63.3% top-1 accuracy on classification and 50.2% top-1@0.1 on mixture prediction.
ChemCrow augments LLMs with 18 expert chemistry tools to autonomously plan and execute syntheses and guide molecular discoveries in organic synthesis, drug discovery, and materials design.
MolE-RAG is a training-free RAG framework that augments LLMs with literature, molecular context, and structural analogs to improve performance on nine molecular property prediction tasks.
A tabular foundation model pipeline with ETF preprocessing transfers across 7 modalities on 95 datasets, matching lightweight tuned baselines on frozen features at much higher speed while providing calibration for deployment.
SPADE selects ligands more efficiently than deep learning or Bayesian optimization, needing fewer tests on average to identify high-quality drug candidates for novel proteins.
Active learning for chemical reaction extraction frequently produces non-monotonic learning curves and fails to deliver stable gains over random sampling because of strong pretraining, structured CRF decoding, and label sparsity.
Lit2Vec delivers a documented, reproducible pipeline that extracts and annotates a large licensed chemistry paper corpus from S2ORC with paragraph embeddings and subfield labels.
GLACIER combines graph, SMILES, and descriptor encoders with Finsler fusion and contrastive distillation to produce an efficient multimodal model for molecular property prediction.