pith:UP7WBPUX
SMolLM: Small Language Models Learn Small Molecular Grammar
A 53K-parameter transformer generates valid SMILES by resolving constraints in fixed order: brackets first, rings second, valence last.
arxiv:2605.06322 v2 · 2026-05-07 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UP7WBPUX2CQAMH4P73KNXMGHMF}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
the same block resolves SMILES constraints across passes in a fixed order: brackets first, rings second, and valence last, as shown by error classification, linear probing, and sparse autoencoders. A systematic ablation across attention heads and passes further localizes the first bracket-matching step to a single attention head.
That linear probing, sparse autoencoders, and error classification reveal the actual causal computation rather than surface correlations, and that high validity on the benchmark reflects genuine grammar learning instead of dataset-specific pattern matching.
A 53K-parameter model generates 95% valid SMILES on ZINC-250K, outperforming larger models, by resolving chemical constraints in fixed order: brackets first, rings second, valence last.
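To make the first two constraint types in the claims above concrete, here is a minimal Python sketch of what "brackets" and "rings" mean at the string level. It is not the paper's method or the model's mechanism. The function names `brackets_balanced` and `rings_paired` are hypothetical, and the sketch assumes "brackets" covers both branch parentheses and square-bracket atoms. Valence is left out because checking it needs atom-level chemistry, typically handled by a cheminformatics toolkit.

```python
def brackets_balanced(smiles: str) -> bool:
    """Return True if () and [] are balanced and properly nested."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def rings_paired(smiles: str) -> bool:
    """Return True if every ring-closure label outside [...] occurs in pairs."""
    open_labels = set()
    in_atom = False  # digits inside [...] are isotopes, H counts, or charges
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_atom = True
        elif ch == "]":
            in_atom = False
        elif not in_atom:
            label = None
            if ch.isdigit():
                label = int(ch)
            elif ch == "%" and len(smiles[i + 1:i + 3]) == 2 and smiles[i + 1:i + 3].isdigit():
                # Two-digit ring labels are written as %nn.
                label = int(smiles[i + 1:i + 3])
                i += 2
            if label is not None:
                # First occurrence opens the ring, second closes it.
                open_labels ^= {label}
        i += 1
    return not open_labels


if __name__ == "__main__":
    for s in ["c1ccccc1", "CC(C", "C1CCC", "[13C]"]:
        print(s, brackets_balanced(s), rings_paired(s))
```

A string can pass both checks and still be chemically invalid, which is why valence is a separate, later constraint in the ordering the claims describe.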
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-29T01:04:37.579711Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
a3ff60be97d0a0061f8ffed4dbb0c7615c503b048727f55fdc38c17610cdc4f0
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UP7WBPUX2CQAMH4P73KNXMGHMF \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a3ff60be97d0a0061f8ffed4dbb0c7615c503b048727f55fdc38c17610cdc4f0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f906cd36d6791d7e91c9eae6e3e7c4a7664a8909544df1b45fecd4d6a4c818a9",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-07T14:21:26Z",
    "title_canon_sha256": "032d9d01a815da6e9f78a3e10eee32882c81e27900146a05d524f60efde6fba6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.06322",
    "kind": "arxiv",
    "version": 2
  }
}
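The hashing step of the verification command can also be written as a standalone Python function. This is a minimal sketch that mirrors the one-liner shown earlier (sorted keys, compact separators, non-ASCII characters kept, UTF-8 encoding). The function name `canonical_sha256` is hypothetical and is not part of any Pith tooling.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a JSON-compatible record in a key-order-independent way."""
    canonical = json.dumps(
        record,
        sort_keys=True,         # fixed key order at every nesting level
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Toy records (not the real Pith record): same content, different key order.
    a = {"source": {"kind": "arxiv", "version": 2}, "schema_version": "1.0"}
    b = {"schema_version": "1.0", "source": {"version": 2, "kind": "arxiv"}}
    print(canonical_sha256(a) == canonical_sha256(b))
```

Because the keys are sorted and whitespace is removed before hashing, the indentation and key order of the displayed JSON should not affect the digest. To check the real record, parse the fetched `canonical_record` with `json.loads` and compare the result against the canonical hash listed on this page.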