Pith Number

pith:UP7WBPUX

pith:2026:UP7WBPUX2CQAMH4P73KNXMGHMF
not attested · not anchored · not stored · refs pending

SMolLM: Small Language Models Learn Small Molecular Grammar

Akhil Jindal, Harang Ju

A 53K-parameter transformer generates valid SMILES by resolving constraints in fixed order: brackets first, rings second, valence last.

arxiv:2605.06322 v2 · 2026-05-07 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{UP7WBPUX2CQAMH4P73KNXMGHMF}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

the same block resolves SMILES constraints across passes in a fixed order: brackets first, rings second, and valence last, as shown by error classification, linear probing, and sparse autoencoders. A systematic ablation across attention heads and passes further localizes the first bracket-matching step to a single attention head.

C2 · weakest assumption

That linear probing, sparse autoencoders, and error classification reveal the actual causal computation rather than surface correlations, and that high validity on the benchmark reflects genuine grammar learning instead of dataset-specific pattern matching.

C3 · one-line summary

A 53K-parameter model generates 95% valid SMILES on ZINC-250K, outperforming larger models, by resolving chemical constraints in fixed order: brackets first, rings second, valence last.

Receipt and verification
First computed: 2026-05-29T01:04:37.579711Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

a3ff60be97d0a0061f8ffed4dbb0c7615c503b048727f55fdc38c17610cdc4f0

Aliases

arxiv: 2605.06322 · arxiv_version: 2605.06322v2 · doi: 10.48550/arxiv.2605.06322 · pith_short_12: UP7WBPUX2CQA · pith_short_16: UP7WBPUX2CQAMH4P · pith_short_8: UP7WBPUX
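The short aliases above are plain prefixes of the full 26-character Pith Number. For this record, the full number also coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. That second relationship is an observation drawn from the values on this page, not a rule stated by the schema, and the helper names below are hypothetical. A minimal sketch:

```python
import base64

# Values copied from this page.
PITH_NUMBER = "UP7WBPUX2CQAMH4P73KNXMGHMF"
CANONICAL_HASH = "a3ff60be97d0a0061f8ffed4dbb0c7615c503b048727f55fdc38c17610cdc4f0"


def short_aliases(pith_number: str) -> dict[str, str]:
    """Return the 8-, 12- and 16-character prefixes (hypothetical helper)."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex digest and keep the first `length` characters.

    Assumption: this derivation is inferred from a single record,
    not taken from a published specification.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


aliases = short_aliases(PITH_NUMBER)
derived = base32_prefix(CANONICAL_HASH)

print(aliases)
print("base32 prefix equals Pith Number:", derived == PITH_NUMBER)
```

Treat the base32 check as a sanity test for this record only; other records may follow a different derivation.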
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/UP7WBPUX2CQAMH4P73KNXMGHMF \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: a3ff60be97d0a0061f8ffed4dbb0c7615c503b048727f55fdc38c17610cdc4f0
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f906cd36d6791d7e91c9eae6e3e7c4a7664a8909544df1b45fecd4d6a4c818a9",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-07T14:21:26Z",
    "title_canon_sha256": "032d9d01a815da6e9f78a3e10eee32882c81e27900146a05d524f60efde6fba6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.06322",
    "kind": "arxiv",
    "version": 2
  }
}
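The canonical record above can also be hashed offline, without the network call. The sketch below mirrors the serialization used in the shell one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes). It assumes the JSON shown on this page is exactly the `canonical_record` the API returns; `canonical_sha256` is a hypothetical helper name, and the result is compared against the page's hash rather than asserted.

```python
import hashlib
import json

# Canonical record copied from the JSON shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "f906cd36d6791d7e91c9eae6e3e7c4a7664a8909544df1b45fecd4d6a4c818a9",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-07T14:21:26Z",
        "title_canon_sha256": "032d9d01a815da6e9f78a3e10eee32882c81e27900146a05d524f60efde6fba6",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.06322", "kind": "arxiv", "version": 2},
}

# Canonical hash as listed on this page.
EXPECTED = "a3ff60be97d0a0061f8ffed4dbb0c7615c503b048727f55fdc38c17610cdc4f0"


def canonical_sha256(record: dict) -> str:
    """SHA-256 over the compact, key-sorted JSON form of the record."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches page hash:", digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the record's fields were written. A mismatch would suggest the page rendering differs from the stored record.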