MoleCode unlocks structural intelligence in large language models
Pith reviewed 2026-05-19 21:34 UTC · model grok-4.3
The pith
MoleCode makes molecular topology directly readable, editable and auditable by LLMs instead of hidden in SMILES strings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MoleCode is an LLM-native, training-free, graph-explicit molecular language in which all molecular components are represented as typed entities with persistent identifiers and explicit relations. This makes molecular topology directly readable, editable and auditable within the language context, allowing an LLM to operate on structure rather than recover it from syntax. Across reasoning, editing, generation and analysis tasks, the shift improves frontier models most when structural access limits performance, replaces long reconstruction traces with shorter, chemically directed reasoning, and supports localized, property-aligned edits that preserve similarity to the starting compounds.
What carries the argument
A Subgraph-Node-Edge grammar that supplies typed entities with persistent identifiers and explicit relations, so that topology becomes first-class content inside LLM prompts.
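To make the idea concrete, here is a minimal Python sketch of a graph-explicit record built from the three primitives the paper names. The class names, field names, identifier scheme (`a1`, `b1`, `mol1`) and the rendered text layout are all illustrative assumptions; the paper's actual MoleCode syntax is not reproduced on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str      # persistent identifier (hypothetical naming scheme)
    kind: str    # entity type, e.g. "atom"
    label: str   # e.g. the element symbol


@dataclass(frozen=True)
class Edge:
    id: str
    kind: str    # relation type, e.g. "single"
    src: str     # id of one endpoint node
    dst: str     # id of the other endpoint node


@dataclass
class Subgraph:
    id: str
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def neighbors(self, node_id):
        """Return the ids of nodes bonded to node_id, read directly from edges."""
        out = []
        for e in self.edges:
            if e.src == node_id:
                out.append(e.dst)
            elif e.dst == node_id:
                out.append(e.src)
        return sorted(out)

    def render(self):
        """Serialize to plain text that could be pasted into a prompt."""
        lines = [f"subgraph {self.id}"]
        lines += [f"  node {n.id} {n.kind} {n.label}" for n in self.nodes]
        lines += [f"  edge {e.id} {e.kind} {e.src} {e.dst}" for e in self.edges]
        return "\n".join(lines)


# Ethanol (SMILES "CCO"), heavy atoms only.
ethanol = Subgraph(
    id="mol1",
    nodes=[Node("a1", "atom", "C"), Node("a2", "atom", "C"), Node("a3", "atom", "O")],
    edges=[Edge("b1", "single", "a1", "a2"), Edge("b2", "single", "a2", "a3")],
)
print(ethanol.render())
```

In this toy form, a connectivity question such as "what is bonded to `a2`?" is a lookup over listed edges rather than a parse of the string, which is the kind of structural access the paper argues for.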
If this is right
- Molecular optimization produces localized edits that preserve structural similarity while aligning with target properties.
- Inference effort moves from lengthy implicit reconstruction to shorter, chemically directed reasoning over explicit atoms and bonds.
- The grammar extends without change to polymers, Markush structures, mechanism-style transformations and documents that interleave text with chemical images.
- Performance improvements concentrate on unfamiliar molecules, topology-sensitive operations, larger structures and repetitive polymers.
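One way to picture "localized edits that preserve structural similarity" is an identity-keyed overlap score between the graph before and after an edit. The sketch below is an assumption-laden stand-in: it uses a Jaccard (Tanimoto-style) ratio over node and edge records keyed by hypothetical persistent identifiers, not the fingerprint-based similarity a cheminformatics toolkit would compute, and this page does not say which measure the paper uses.

```python
def features(nodes, edges):
    """Flatten a graph into a set of identity-keyed records."""
    node_feats = {("node", nid, label) for nid, label in nodes.items()}
    edge_feats = {
        ("edge", eid, kind, src, dst) for eid, (kind, src, dst) in edges.items()
    }
    return node_feats | edge_feats


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Ethanol, heavy atoms only (hypothetical ids).
nodes = {"a1": "C", "a2": "C", "a3": "O"}
edges = {"b1": ("single", "a1", "a2"), "b2": ("single", "a2", "a3")}

# Localized edit: change only atom a3 from O to N; every other id is untouched.
edited_nodes = dict(nodes, a3="N")

similarity = jaccard(features(nodes, edges), features(edited_nodes, edges))
print(round(similarity, 3))  # 0.667
```

Four of the six distinct records are shared, so a single-atom substitution leaves most of the keyed structure intact.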
Where Pith is reading between the lines
- Explicit structural interfaces may prove useful for any relational scientific object whose topology is currently decoded from linear text.
- The approach could be tested on protein or materials graphs to check whether the same reduction in reconstruction overhead appears.
- Auditability of edits inside the prompt may lower error rates in multi-step chemical planning by making each structural change traceable.
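The traceability point can be illustrated with a small diff keyed on persistent identifiers. This is a hypothetical sketch: the function name `diff_by_id` and the dictionary layout are assumptions made for illustration, not part of MoleCode.

```python
def diff_by_id(before, after):
    """Compare two id -> record mappings and report what changed."""
    added = sorted(k for k in after if k not in before)
    removed = sorted(k for k in before if k not in after)
    changed = sorted(
        (k, before[k], after[k])
        for k in before
        if k in after and before[k] != after[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Atom records before and after one editing step (hypothetical ids).
step0 = {"a1": "C", "a2": "C", "a3": "O"}
step1 = {"a1": "C", "a2": "C", "a3": "N", "a4": "C"}

audit = diff_by_id(step0, step1)
print(audit)
```

Because identifiers persist across steps, each entry in the diff points at a specific atom, which is what would make a multi-step plan checkable step by step.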
Load-bearing premise
Frontier LLMs can immediately exploit the explicit Subgraph-Node-Edge grammar in ordinary prompts for better reasoning without training or fine-tuning, and observed gains come specifically from structural access rather than prompt length or other factors.
What would settle it
A controlled comparison of MoleCode versus length-matched SMILES prompts on identical tasks that finds no performance difference once prompt length and wording are equalized.
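A crude version of the length-matching step could look like the sketch below. It makes several assumptions: whitespace splitting stands in for the target model's tokenizer, the `[pad]` filler token and both prompts are invented for illustration, and the graph-style prompt is a hypothetical layout rather than real MoleCode. Neutral padding equalizes length only; it does not equalize wording or chemical detail, which a full control would also need.

```python
def token_count(text):
    # Crude proxy: whitespace-delimited tokens. A real study would count
    # tokens with the tokenizer of the model under test.
    return len(text.split())


def pad_to_length(prompt, target_tokens, filler="[pad]"):
    """Append neutral filler tokens until the prompt reaches target_tokens."""
    missing = target_tokens - token_count(prompt)
    if missing <= 0:
        return prompt
    return prompt + "\n" + " ".join([filler] * missing)


smiles_prompt = "Count the rings in this molecule: Cc1ccccc1"
graph_prompt = (
    "Count the rings in this molecule:\n"
    "node a1 C\n"
    "node a2 C\n"
    "edge b1 single a1 a2"
)  # truncated, hypothetical graph-explicit layout

target = max(token_count(smiles_prompt), token_count(graph_prompt))
matched_smiles = pad_to_length(smiles_prompt, target)
matched_graph = pad_to_length(graph_prompt, target)

print(token_count(matched_smiles), token_count(matched_graph))
```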
Original abstract
Molecules are graphs, but large language models (LLMs) are usually asked to reason about them through linear strings. The most popular molecular representation, SMILES, compresses atoms, bonds, branches and rings into a compact sequence in which topology is implicit, forcing LLMs to reconstruct molecular structure before performing the requested chemical operation. Here we introduce MoleCode, an LLM-native, training-free, graph-explicit molecular language in which all molecular components are represented as typed entities with persistent identifiers and explicit relations. MoleCode makes molecular topology directly readable, editable and auditable within the language context, allowing an LLM to operate on structure rather than recover it from syntax. Across molecular reasoning, editing, generation and analysis tasks, this representational shift improves frontier LLMs most strongly when structural access is limiting: unfamiliar molecules, topology-sensitive operations, larger structures and repetitive polymers. It also changes how inference is allocated, replacing long reasoning traces devoted to implicit structural reconstruction with shorter, more chemically directed reasoning over explicit atoms and bonds. In molecular optimization, this enables localized, property-aligned edits that preserve structural similarity to the starting compounds. The same Subgraph-Node-Edge grammar extends beyond small molecules to polymers, Markush structures, mechanism-style transformations and interleaved scientific documents, including research articles and patent disclosures in which chemical information is distributed across text and images. These results suggest that the interface between scientific objects and LLMs should not treat structure as something to be decoded from text. When the object of reasoning is relational, the structure itself should be part of the language.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MoleCode, a training-free, graph-explicit molecular representation using a Subgraph-Node-Edge grammar that assigns persistent identifiers and explicit relations to atoms, bonds, subgraphs and other components. It claims this allows frontier LLMs to directly read, edit and reason over molecular topology instead of implicitly reconstructing structure from linear strings such as SMILES, yielding performance gains on reasoning, editing, generation and analysis tasks (especially for unfamiliar molecules, topology-sensitive operations and larger or polymeric structures) while also extending the grammar to Markush structures, mechanisms and interleaved scientific documents.
Significance. If the empirical claims are substantiated with controlled experiments, the work would offer a practical, immediately usable interface for injecting explicit relational structure into LLM prompts for chemistry and biology. The training-free character and the extension to polymers and document-scale chemical information are notable strengths that could influence how structural objects are represented in scientific LLM applications.
major comments (2)
- [Abstract] Abstract and Results: the manuscript asserts quantitative improvements on molecular tasks but supplies no metrics, baselines, error bars, or task definitions, preventing any assessment of effect size or reproducibility.
- [Results] The central claim that observed gains arise specifically from the explicit Subgraph-Node-Edge grammar (rather than longer or more detailed prompts) is load-bearing yet untested; no ablation that holds total token count and surface-level chemical detail fixed while varying only the node/edge identifiers and relations is described.
minor comments (2)
- Provide a concise formal grammar or BNF for the Subgraph-Node-Edge syntax together with a side-by-side comparison to SMILES for a small molecule containing a ring and a branch.
- Clarify how persistent identifiers are maintained across multi-turn editing sessions and how the representation scales to very large polymers without token explosion.
Simulated Author's Rebuttal
We thank the referee for the constructive review and the opportunity to clarify the empirical basis of our claims. We address each major comment below and outline targeted revisions to improve quantitative reporting and experimental controls.
Point-by-point responses
- Referee: [Abstract] Abstract and Results: the manuscript asserts quantitative improvements on molecular tasks but supplies no metrics, baselines, error bars, or task definitions, preventing any assessment of effect size or reproducibility.
  Authors: We agree that the abstract and results presentation would benefit from greater quantitative detail. The manuscript reports comparative performance on reasoning, editing, generation and analysis tasks against SMILES baselines, with gains most pronounced for topology-sensitive and larger structures. To address reproducibility concerns, the revised manuscript will add explicit task definitions (e.g., success rate for localized edits, accuracy on ring-counting and connectivity queries), report error bars from multiple independent runs, and include a summary table of effect sizes and baselines in both the abstract and results sections.
  Revision: yes
- Referee: [Results] The central claim that observed gains arise specifically from the explicit Subgraph-Node-Edge grammar (rather than longer or more detailed prompts) is load-bearing yet untested; no ablation that holds total token count and surface-level chemical detail fixed while varying only the node/edge identifiers and relations is described.
  Authors: This is a fair and important point. While our experiments contrast MoleCode against standard SMILES prompting, they do not include a controlled ablation that equalizes token count and surface-level chemical description while isolating the effect of persistent node/edge identifiers and explicit relations. We will add such an ablation in the revision, using length-matched prompt variants that expand SMILES with equivalent textual detail, to quantify the incremental benefit attributable to the structured grammar.
  Revision: yes
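Once such an ablation is run on identical items, the two conditions can be compared with a paired test. The sketch below uses an exact McNemar test on discordant pairs; the choice of test is an assumption (the rebuttal does not name one), and the boolean lists are toy placeholders, not results from the paper.

```python
from math import comb


def mcnemar_exact(a_correct, b_correct):
    """Two-sided exact McNemar test on paired per-item correctness.

    Returns (only_a, only_b, p_value), where only_a counts items that
    condition A got right and condition B got wrong, and vice versa.
    """
    if len(a_correct) != len(b_correct):
        raise ValueError("conditions must be scored on the same items")
    only_a = sum(1 for a, b in zip(a_correct, b_correct) if a and not b)
    only_b = sum(1 for a, b in zip(a_correct, b_correct) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0  # no discordant pairs, no evidence either way
    k = min(only_a, only_b)
    # Under the null, discordant pairs split like fair coin flips.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Toy placeholders only: per-item correctness under two prompt conditions.
graph_explicit = [True, True, True, False, True, True]
length_matched_smiles = [True, False, False, False, True, False]

only_graph, only_smiles, p = mcnemar_exact(graph_explicit, length_matched_smiles)
print(only_graph, only_smiles, p)  # 3 0 0.25
```

Only items where the two conditions disagree carry information here, which is why a length-matched control needs enough items to produce a meaningful number of discordant pairs.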
Circularity Check
No circularity: paper introduces new representational grammar with empirical claims only
The manuscript proposes MoleCode as a new Subgraph-Node-Edge grammar for molecular structures to make topology explicit in LLM prompts. No mathematical derivations, fitted parameters, predictions, or equations are present in the provided text. The central claims rest on empirical performance improvements across tasks rather than any reduction of outputs to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked to justify the grammar itself. The contribution is a proposed interface change whose benefits are asserted through task results, not derived from prior fitted quantities or self-referential definitions. This is a standard non-circular empirical proposal of a representational format.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Frontier LLMs can directly utilize explicit graph representations with typed entities and relations in their context for reasoning without additional training.
invented entities (1)
- MoleCode: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: MoleCode is built mainly from three primitives: Subgraph, Node, and Edge. Subgraphs define structural scopes, nodes encode typed entities with persistent identifiers, and edges encode explicit relations between nodes.
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: MoleCode makes molecular topology directly readable, editable and auditable within the language context, allowing an LLM to operate on structure rather than recover it from syntax.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.