ChemBERTa: Large-Scale Self-Supervised Pretraining for Molecular Property Prediction

Seyone Chithrananda, Gabriel Grand, Bharath Ramsundar · 2020 · arXiv 2010.09885

Mixed citation behavior. Most common role is background (40%).

26 Pith papers citing it

Background: 40% of classified citations

citation-role summary

background 2 · method 2 · baseline 1

citation-polarity summary

background 2 · use method 2 · baseline 1

representative citing papers

Towards Generalizable and Evidential Nuclear Magnetic Resonance-Based Molecular Structure Elucidation via Large Language Model Agent

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

NMRAgent is an evidential LLM agent for NMR-based molecular structure elucidation that improves accuracy on novel scaffolds and demonstrates utility on real natural products.

Modeling Cell-Cycle-Aware Single-Cell Drug Perturbation Responses

q-bio.QM · 2026-06-29 · unverdicted · novelty 7.0

scCycleMol adds a learnable circular cell-cycle head with closed-loop supervision from predicted treated expression, yielding higher r-squared on SciPlex3 gene predictions and improved phase accuracy versus ChemCPA baselines.

Contextualizing Biological Language Models across Modalities via Logit-Space Contrastive Alignment

cs.LG · 2026-06-17 · unverdicted · novelty 7.0 · 2 refs

LOGICA adds context to pretrained biological LMs via logit-space contrastive alignment with gated adapters, improving AUC on held-out drug-resistance mutation ranking from ~0.55 to ~0.65 while preserving token likelihoods.

Augmenting Molecular Language Models with Local $n$-gram Memory

cs.CL · 2026-06-10 · unverdicted · novelty 7.0

MolGram integrates a conditional n-gram memory module into molecular language models to address locality gaps in SMILES tokenization, improving performance on generation, forward prediction, and retrosynthesis while outperforming 3x larger baselines.

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Chem-GMNet uses sphere-native embeddings, DualSKA attention, and SH-FFN layers to match or beat ChemBERTa-2 on MoleculeNet tasks with fewer parameters and sometimes no pretraining.

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.

From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.

Probing Chemical Language Models: Effects of Pre-training and Fine-tuning

cs.LG · 2026-07-02 · unverdicted · novelty 6.0

Pre-training improves CLMs' encoding of molecular substructures especially in upper layers while fine-tuning selectively modifies task-relevant substructures more than others.

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

Decentralized AI agent teams self-organize around hypotheses, critique proposals, and share knowledge to outperform single-agent baselines on biomedical ML, language-model optimization, and protein fitness tasks.

Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Drug-blind cancer sensitivity prediction is limited by evaluation metric and training distribution rather than drug representation complexity.

Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

PolyLM fine-tunes a 9B-parameter LLM on 185k papers to predict polymer properties from text alone, achieving median R² of 0.74 on 68k held-out samples.

Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.

Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

cs.LG · 2026-04-29 · unverdicted · novelty 6.0 · 2 refs

Benchmark across 78 endpoint-split entries finds classical ML winning 47.4% of best performances over pretrained models, GNNs, and LLMs, with performance depending on model-task-split fit rather than scale.

NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning

cs.CL · 2026-04-12 · unverdicted · novelty 6.0

NOSE aligns molecular, receptor, and linguistic modalities in a shared embedding space via tri-modal orthogonal contrastive learning and weak positive samples, achieving SOTA performance and zero-shot generalization on olfactory tasks.

FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

cs.AI · 2026-02-26 · unverdicted · novelty 6.0

FlexMS is a new flexible benchmarking framework that lets researchers dynamically combine deep learning architectures and evaluate their mass spectrum prediction performance on public metabolomics datasets using multiple metrics and retrieval tasks.

Foundation Models for Discovery and Exploration in Chemical Space

physics.chem-ph · 2025-10-20 · unverdicted · novelty 6.0

MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.

SmellNet: A Large-scale Dataset for Real-world Smell Recognition

cs.AI · 2025-05-30 · unverdicted · novelty 6.0

SmellNet supplies 828k gas-sensor time series across 50 substances plus 43 mixtures; ScentFormer reaches 63.3% top-1 accuracy on classification and 50.2% top-1@0.1 on mixture prediction.

ChemCrow: Augmenting large-language models with chemistry tools

physics.chem-ph · 2023-04-11 · conditional · novelty 6.0

ChemCrow augments LLMs with 18 expert chemistry tools to autonomously plan and execute syntheses and guide molecular discoveries in organic synthesis, drug discovery, and materials design.

MolE-RAG: Molecular Structure-Enhanced Retrieval-Augmented Generation for Chemistry

cs.LG · 2026-06-04 · unverdicted · novelty 5.0

MolE-RAG is a training-free RAG framework that augments LLMs with literature, molecular context, and structural analogs to improve performance on nine molecular property prediction tasks.

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

cs.LG · 2026-06-01 · unverdicted · novelty 5.0

A tabular foundation model pipeline with ETF preprocessing transfers across 7 modalities on 95 datasets, matching lightweight tuned baselines on frozen features at much higher speed while providing calibration for deployment.

SPADE: Faster Drug Discovery by Learning from Sparse Data

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

SPADE selects ligands more efficiently than deep learning or Bayesian optimization, needing fewer tests on average to identify high-quality drug candidates for novel proteins.

When Active Learning Falls Short: An Empirical Study on Chemical Reaction Extraction

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

Active learning for chemical reaction extraction frequently produces non-monotonic learning curves and fails to deliver stable gains over random sampling because of strong pretraining, structured CRF decoding, and label sparsity.

Lit2Vec: A Reproducible Workflow for Building a Legally Screened Chemistry Corpus from S2ORC for Downstream Retrieval and Text Mining

cs.DB · 2026-04-14 · unverdicted · novelty 5.0

Lit2Vec delivers a documented, reproducible pipeline that extracts and annotates a large licensed chemistry paper corpus from S2ORC with paragraph embeddings and subfield labels.

GLACIER: A Multimodal Student-Teacher Foundation Model for Molecular Property Prediction

cs.LG · 2026-06-09 · unverdicted · novelty 4.0

GLACIER combines graph, SMILES, and descriptor encoders with Finsler fusion and contrastive distillation to produce an efficient multimodal model for molecular property prediction.
