
ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction

(30) Chithrananda, S · 2020 · arXiv 2010.09885

Mixed citation behavior. Most common role is background (40%).

17 Pith papers citing it



citation-role summary

background 2 · method 2 · baseline 1

citation-polarity summary

background 2 · use method 2 · baseline 1

representative citing papers

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Chem-GMNet uses sphere-native embeddings, DualSKA attention, and SH-FFN layers to match or beat ChemBERTa-2 on MoleculeNet tasks with fewer parameters and sometimes no pretraining.

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

FORGE reformulates molecular optimization as context-aware fragment ranking and replacement using mined low-to-high edit pairs, outperforming larger language models and graph methods on standard benchmarks.

From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Chirality emerges in SMILES translation models through an abrupt encoder-centered reorganization of representations after a long plateau, identified via checkpoint analysis and ablation.

Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Drug-blind cancer sensitivity prediction is limited by evaluation metric and training distribution rather than drug representation complexity.

Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

PolyLM fine-tunes a 9B-parameter LLM on 185k papers to predict polymer properties from text alone, achieving median R² of 0.74 on 68k held-out samples.

Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces

cs.LG · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Chemically meaningful steering for properties like cLogP and TPSA emerges in entangled Transformer-VAE latent spaces only after controlling for SELFIES representation confounds through residualization and decoded traversals.

NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning

cs.CL · 2026-04-12 · unverdicted · novelty 6.0

NOSE aligns molecular, receptor, and linguistic modalities in a shared embedding space via tri-modal orthogonal contrastive learning and weak positive samples, achieving SOTA performance and zero-shot generalization on olfactory tasks.

FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

cs.AI · 2026-02-26 · unverdicted · novelty 6.0

FlexMS is a new flexible benchmarking framework that lets researchers dynamically combine deep learning architectures and evaluate their mass spectrum prediction performance on public metabolomics datasets using multiple metrics and retrieval tasks.

Foundation Models for Discovery and Exploration in Chemical Space

physics.chem-ph · 2025-10-20 · unverdicted · novelty 6.0

MIST models up to 10x larger than prior work, fine-tuned on over 400 structure-property tasks, match or exceed SOTA on benchmarks and demonstrate zero-shot olfactory perception mapping consistent with hyperbolic geometry.

SmellNet: A Large-scale Dataset for Real-world Smell Recognition

cs.AI · 2025-05-30 · unverdicted · novelty 6.0

SmellNet supplies 828k gas-sensor time series across 50 substances plus 43 mixtures; ScentFormer reaches 63.3% top-1 accuracy on classification and 50.2% top-1@0.1 on mixture prediction.

ChemCrow: Augmenting large-language models with chemistry tools

physics.chem-ph · 2023-04-11 · conditional · novelty 6.0

ChemCrow augments LLMs with 18 expert chemistry tools to autonomously plan and execute syntheses and guide molecular discoveries in organic synthesis, drug discovery, and materials design.

SPADE: Faster Drug Discovery by Learning from Sparse Data

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

SPADE selects ligands more efficiently than deep learning or Bayesian optimization, needing fewer tests on average to identify high-quality drug candidates for novel proteins.

Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

cs.LG · 2026-04-29 · unverdicted · novelty 5.0 · 2 refs

A benchmark across 156 comparisons finds classical ML models win 116 times while larger pretrained and LLM models win far fewer, showing predictive performance depends on model-task fit rather than scale.

When Active Learning Falls Short: An Empirical Study on Chemical Reaction Extraction

cs.LG · 2026-04-21 · unverdicted · novelty 5.0

Active learning for chemical reaction extraction frequently produces non-monotonic learning curves and fails to deliver stable gains over random sampling because of strong pretraining, structured CRF decoding, and label sparsity.

Lit2Vec: A Reproducible Workflow for Building a Legally Screened Chemistry Corpus from S2ORC for Downstream Retrieval and Text Mining

cs.DB · 2026-04-14 · unverdicted · novelty 5.0

Lit2Vec delivers a documented, reproducible pipeline that extracts and annotates a large licensed chemistry paper corpus from S2ORC with paragraph embeddings and subfield labels.

Regression with Large Language Models for Materials and Molecular Property Prediction

cond-mat.mtrl-sci · 2024-09-09 · unverdicted · novelty 4.0

Fine-tuned LLaMA 3 achieves regression performance on QM9 molecular properties and 28 materials properties from composition strings that rivals random forests but is 5-10x worse than specialized models using atomic coordinates.

Machine Learning Based Prediction of Proton Conductivity in Metal-Organic Frameworks

cond-mat.mtrl-sci · 2024-06-18 · unverdicted · novelty 4.0

A newly built database of proton-conductive MOFs supports descriptor and transformer machine learning models that predict conductivity with MAE 0.91.

citing papers explorer

Showing 17 of 17 citing papers.

Chem-GMNet: A Sphere-Native Geometric Transformer for Molecular Property Prediction

cs.LG · 2026-05-13 · unverdicted · none · ref 6

FORGE: Fragment-Oriented Ranking and Generation for Context-Aware Molecular Optimization

cs.LG · 2026-05-11 · unverdicted · none · ref 22

From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models

cs.LG · 2026-05-11 · unverdicted · none · ref 11

Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

cs.LG · 2026-05-20 · unverdicted · none · ref 8

Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

cs.LG · 2026-05-07 · unverdicted · none · ref 5

Molecules Meet Language: Confound-Aware Representation Learning and Chemical Property Steering in Transformer-VAE Latent Spaces

cs.LG · 2026-05-07 · unverdicted · none · ref 12 · 2 links

NOSE: Neural Olfactory-Semantic Embedding with Tri-Modal Orthogonal Contrastive Learning

cs.CL · 2026-04-12 · unverdicted · none · ref 11

FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

cs.AI · 2026-02-26 · unverdicted · none · ref 9

Foundation Models for Discovery and Exploration in Chemical Space

physics.chem-ph · 2025-10-20 · unverdicted · none · ref 107

SmellNet: A Large-scale Dataset for Real-world Smell Recognition

cs.AI · 2025-05-30 · unverdicted · none · ref 9

ChemCrow: Augmenting large-language models with chemistry tools

physics.chem-ph · 2023-04-11 · conditional · none · ref 30

SPADE: Faster Drug Discovery by Learning from Sparse Data

cs.LG · 2026-05-06 · unverdicted · none · ref 3

Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction

cs.LG · 2026-04-29 · unverdicted · none · ref 7 · 2 links

When Active Learning Falls Short: An Empirical Study on Chemical Reaction Extraction

cs.LG · 2026-04-21 · unverdicted · none · ref 38

Lit2Vec: A Reproducible Workflow for Building a Legally Screened Chemistry Corpus from S2ORC for Downstream Retrieval and Text Mining

cs.DB · 2026-04-14 · unverdicted · none · ref 12

Regression with Large Language Models for Materials and Molecular Property Prediction

cond-mat.mtrl-sci · 2024-09-09 · unverdicted · none · ref 1

Machine Learning Based Prediction of Proton Conductivity in Metal-Organic Frameworks

cond-mat.mtrl-sci · 2024-06-18 · unverdicted · none · ref 4
