Pith Number

pith:AJOP2LMO

pith:2026:AJOP2LMO45JELL324FTHE7XWUD

not attested · not anchored · not stored · refs resolved

MoleCode unlocks structural intelligence in large language models

Boxuan Zhao, Chen Liu, Fanyang Mo, Hao Li, Jixiang Zhao, Kaiqing Lin, Liuzhenghao Lv, Li Yuan, Shanzhuo Zhang, Yimi Wang, Zhiyuan Yan

MoleCode makes molecular topology directly readable, editable and auditable by LLMs instead of hidden in SMILES strings.

arxiv:2605.16480 v1 · 2026-05-15 · q-bio.BM · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{AJOP2LMO45JELL324FTHE7XWUD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MoleCode makes molecular topology directly readable, editable and auditable within the language context, allowing an LLM to operate on structure rather than recover it from syntax.

C2 · weakest assumption

That frontier LLMs can immediately leverage the explicit Subgraph-Node-Edge grammar in prompts for improved reasoning without any training or fine-tuning, and that observed gains stem specifically from structural access rather than prompt length or other variables.

C3 · one-line summary

MoleCode is a training-free, LLM-native representation that makes molecular graphs with explicit atoms, bonds, and topology directly readable and editable in language models, improving structural tasks over implicit string encodings.

References

60 extracted · 60 resolved · 2 Pith anchors

[1] A survey on large language models in biology and chemistry (2025)

[2] Large language models as molecular design engines (2024)

[3] Llamo: Large language model-based molecular graph assistant (2024)

[4] A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists (2025)

[5] arXiv preprint arXiv:2204.11817 (2022)

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-20T00:02:24.235032Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

025cfd2d8ee75245af7ae166727ef6a0ff5287b2feb8be4d271e69c004e7c266
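The identifier and the canonical hash on this page appear to be related: Base32-encoding the SHA-256 digest (standard RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number shown at the top of this record. This is an observation from the values printed here, not a documented rule. The helper name `pith_id_from_hash` and the default length of 26 are assumptions for illustration.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "025cfd2d8ee75245af7ae166727ef6a0ff5287b2feb8be4d271e69c004e7c266"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: Base32-encode a hex digest and keep a prefix.

    Assumes the identifier is a truncated RFC 4648 Base32 encoding of the
    digest; this matches the values on this page but is not confirmed by it.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


print(pith_id_from_hash(CANONICAL_HASH))  # AJOP2LMO45JELL324FTHE7XWUD
```

If the relationship holds in general, a mirror could check that an identifier and a canonical hash belong together without contacting the resolver.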

Aliases

arxiv: 2605.16480 · arxiv_version: 2605.16480v1 · doi: 10.48550/arxiv.2605.16480 · pith_short_12: AJOP2LMO45JE · pith_short_16: AJOP2LMO45JELL32 · pith_short_8: AJOP2LMO
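The short aliases listed above are plain prefixes of the full identifier, and the DOI alias follows the arXiv identifier. A minimal sketch of how a client might regenerate them for a consistency check is shown below; the function names `short_aliases` and `arxiv_aliases` are hypothetical, and the lowercase DOI form simply mirrors what this page prints.

```python
FULL_ID = "AJOP2LMO45JELL324FTHE7XWUD"


def short_aliases(full_id: str, lengths=(8, 12, 16)) -> dict:
    """Hypothetical helper: build pith_short_N aliases as N-character prefixes."""
    return {f"pith_short_{n}": full_id[:n] for n in lengths}


def arxiv_aliases(arxiv_id: str, version: int) -> dict:
    """Hypothetical helper: build the arXiv-derived aliases shown on this page."""
    return {
        "arxiv": arxiv_id,
        "arxiv_version": f"{arxiv_id}v{version}",
        "doi": f"10.48550/arxiv.{arxiv_id}",  # lowercase, as printed here
    }


aliases = {**arxiv_aliases("2605.16480", 1), **short_aliases(FULL_ID)}
for key in sorted(aliases):
    print(f"{key}: {aliases[key]}")
```

Because every short form is a prefix, a resolver that receives `AJOP2LMO` only needs a prefix lookup; how collisions between short forms are handled is not stated on this page.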

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

# Fetch the record, extract canonical_record, re-serialize it with sorted keys
# and compact separators, then print the SHA-256 of the resulting UTF-8 bytes.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AJOP2LMO45JELL324FTHE7XWUD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 025cfd2d8ee75245af7ae166727ef6a0ff5287b2feb8be4d271e69c004e7c266
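The hashing step of the pipeline above can also be written as a small offline function. The sketch below uses the same `json.dumps` settings as the one-liner (sorted keys, compact separators, no ASCII escaping) and shows why key order in the input does not affect the result. The function name `canonical_sha256` and the two sample dictionaries are illustrative assumptions; the sketch does not fetch anything and does not claim to reproduce the hash printed on this page.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically and return its SHA-256 hex digest."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order in the input no longer matters
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two illustrative records with the same content but different key order.
a = {"source": {"id": "2605.16480", "kind": "arxiv", "version": 1}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"version": 1, "kind": "arxiv", "id": "2605.16480"}}

print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check a downloaded record, load it with `json.load` and compare the returned digest with the canonical hash listed on this page.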

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "78e6f7213125c1641475524bffea4e4df0e068cb519dbc53f2b99d300786c270",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "q-bio.BM",
    "submitted_at": "2026-05-15T17:44:27Z",
    "title_canon_sha256": "94ca4ad630776d953f42e36b87e77b7d7f40f80707668682182271bc1a939dae"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16480",
    "kind": "arxiv",
    "version": 1
  }
}