pith. sign in

arxiv: 2606.23443 · v1 · pith:2ZTXNN37new · submitted 2026-06-22 · 💻 cs.LG · cs.AI· physics.chem-ph

What Does a Chemical Language Model Know About Molecules?

Pith reviewed 2026-06-26 09:22 UTC · model grok-4.3

classification 💻 cs.LG cs.AIphysics.chem-ph
keywords chemical language modelssparse autoencodersmolecular representationsSMILES stringsmodel interpretabilityposition trackingsubstructure encoding
0
0 comments X

The pith

Sparse autoencoders show chemical language models parse SMILES positions in early layers and encode substructures and pharmacological features later.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses sparse autoencoders on the encoder-only model MolFormer to examine how it constructs representations of molecules from SMILES strings layer by layer. It establishes that early layers depend on position-tracking latents to handle molecular grammar, while later layers develop features for atoms within substructures and properties relevant to pharmacology. It further demonstrates that non-canonical SMILES strings trigger larger representation shifts than invalid ones, primarily through disruption of the position latents that then propagates forward. This provides a mechanistic view that goes beyond the common assumption of purely syntactic learning in such models.

Core claim

Early layers in MolFormer rely on position-tracking latents to parse molecular grammar in SMILES, while later layers encode atom-in-substructure and pharmacologically relevant features; non-canonical SMILES produce more disruptive representation shifts than invalid SMILES because position-latent disruption propagates across layers.

What carries the argument

Sparse autoencoders applied across layers of MolFormer to recover and interpret position-tracking and semantic latents from molecular string representations.

If this is right

  • Representation building in chemical language models proceeds from syntactic position tracking to semantic substructure encoding.
  • Non-canonical SMILES strings disrupt internal processing more than invalid ones due to position-latent effects.
  • Pharmacologically relevant features emerge in deeper layers rather than being present from the start.
  • Interactive visualization of SAE activations can reveal how specific molecular strings are processed internally.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This layer-wise progression suggests that interventions on early position latents could improve robustness to SMILES variations.
  • The findings may generalize to other string-based molecular encoders if similar position-tracking mechanisms are recovered.
  • Pharmacological feature emergence in later layers could guide targeted fine-tuning for drug-related tasks.

Load-bearing premise

The latents extracted by the sparse autoencoders reflect actual features used by the model rather than being artifacts of the autoencoder training process.

What would settle it

A controlled intervention that disables the position-tracking latents in early layers and measures whether SMILES parsing accuracy drops specifically for canonical versus non-canonical strings.

Figures

Figures reproduced from arXiv: 2606.23443 by Christian Kenneth, Etowah Adams, Gerard JP van Westen, Liam Bai.

Figure 1
Figure 1. Figure 1: Interpretable Molecular Features. (a) SAE latents classified by dominant token type (D ≥ 0.6) across layers. (b) Number of unique interpretable latents per layer matching their top-1 atom-related concept. (c) Examples of atom-related concepts encoded in latents across layers. ring indices that number rings, branches marking branching points, bond notations representing interatomic connectiv￾ity (e.g., = fo… view at source ↗
Figure 2
Figure 2. Figure 2: Syntax Parsing Features. Position Latents: L1/f/452 shows increasing activations on token C toward the final token; L6/f/963 shows decreasing activations on branch-closing token ) toward the end; and L9/f/1494 shows decreasing activations on the (=O) motif, possibly tracking double-bonded oxygens. Anything-But-X Latents: L1/f/2597 (pzero = 0.99) does not activate on aromatic carbons (pctx = 1.0), ring indi… view at source ↗
Figure 4
Figure 4. Figure 4: Position Latents Decrease Toward Later Layers. Dis￾tribution of absolute Fisher-averaged Spearman coefficients across layers. Peaks near |ρ¯| ≈ 1 at early layers indicate a significant presence of position latents, while later layers show a flattened distribution skewed toward |ρ¯| ≈ 0, suggesting that fewer position latents are present and contribute less positional information to the layer’s internal rep… view at source ↗
Figure 3
Figure 3. Figure 3: Position Latents Shift Learned Representations. (a) Median cosine and Jaccard similarities between max-pooled SAE activations of canonical and non-canonical SMILES pairs. (b) Number of SAE latents with |SMD| ≥ 0.8 classified as non-positional and positional across layers for the augmented SMILES. (c) Examples of features with significant SMD in non-canonical and valence error cases. physicochemical descrip… view at source ↗
Figure 5
Figure 5. Figure 5: Linear Probe Identifies Pharmacologically Relevant SAE Latents. We show the top 3 latents by standardized linear probe coefficient, z(βf ), averaged across triplicates, for (a) human intestinal absorption (HIA), (b) CYP2C9 inhibition, and (c) Ames mutagenicity classification; and (d) top 3 positive and top 2 negative latents for the lipophilicity regression task. (the latter seen in Figure 6c), both well-k… view at source ↗
Figure 6
Figure 6. Figure 6: More Examples of Syntax Parsing Features. Position Latents: L1/f/1327 shows decreasing activations on token c in the first few positions; L3/f/1803 shows decreasing activations on multiple atom tokens; and L12/f/559 shows decreasing activations across multiple tokens. Anything-But-X Latents: L3/f/172 (pzero = 0.97) does not activate on nitrogens (pctx = 0.99), oxygens (pctx = 0.99), branch openings (pctx =… view at source ↗
Figure 7
Figure 7. Figure 7: Distribution of Similarity Measures for Non-canonical and Invalid SMILES. Generally, we find that the median Jaccard and cosine similarities for invalid SMILES are higher compared to non-canonical SMILES, suggesting that latent representations are more robust to invalid SMILES than to non-canonical SMILES. This is likely because invalid SMILES retain the canonical token order, whereas non-canonical SMILES … view at source ↗
Figure 8
Figure 8. Figure 8: Non-canonical SMILES: Significant Latents Increase Across Layers. Grey contour areas show the distribution of latent |SMD| and Spearman correlation to token position ρ¯; colored points mark significant latents (|SMD| ≥ 0.8) from max-pooled activations, with blue indicating higher |SMD| for canonical and red for non-canonical SMILES. Canonical latents consistently cluster near ρ¯ ≈ −1.0, indicating position… view at source ↗
Figure 9
Figure 9. Figure 9: Invalid SMILES: Valence Errors Have the Most Significant Latents in Early Layers. Grey contour areas show the distribution of latent |SMD| and Spearman correlation to token position ρ¯; colored points mark significant latents (|SMD| ≥ 0.8) from max-pooled activations, with blue indicating higher |SMD| for valid and red for invalid SMILES. Unlike augmented SMILES case, high-SMD latents across all error type… view at source ↗
Figure 10
Figure 10. Figure 10: Early-Layer Position Latents Affect Final-Layer Molecular Representations. The y-axis represents the number of SAE latents with |SMD| ≥ 0.8, and the grey markers indicate the ablated position latents, for which the count is expected to be zero. Activation patching reverses this trend, with layer 3 surging significantly for both positional and non-positional latents upon ablating layer 1. Ablating layers 1… view at source ↗
Figure 11
Figure 11. Figure 11: Pharmacological Encoding Is Shared Across Multiple SAE Latents. Shown here is the change in linear probe performance upon ablating latents ranked by z(βf ) – top-k, bottom-k, and top-k by absolute value – on the test set of each data split. Ablating the top-3, bottom-3, or even top-3-absolute latents does not significantly degrade performance, whereas ablating 512 or more latents does, indicating that tas… view at source ↗
Figure 12
Figure 12. Figure 12: SAE Latents Align with Morgan Fingerprint Features. Two molecules from the AMES dataset show that concepts encoded in the SAE latents can correspond to ECFP2 bits. A single SAE latent may be correspond to a fraction of, exactly one, or many ECFP2 bits. Moreover, SAE latents offer not only binary representations but also activation magnitudes, as shown by L9/f/2807, which activates strongly at the [N+] ato… view at source ↗
read the original abstract

Chemical language models (cLMs) are widely assumed to learn surface-level syntactic patterns rather than learning meaningful molecular semantics. Here, we apply sparse autoencoders (SAEs) to MolFormer, an encoder-only cLM, to mechanistically examine how molecular representations are built across layers. We discover that early layers rely on position-tracking latents to parse molecular grammar, while later layers encode atom-in-substructure and pharmacologically relevant features. Additionally, we show that non-canonical SMILES produce more disruptive representation shifts than invalid SMILES, driven by position-latent disruption propagating across layers. To support further exploration, we develop InterMol, an interactive visualizer for SAE activations on molecular strings and structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper applies sparse autoencoders (SAEs) to MolFormer, an encoder-only chemical language model, to examine layer-wise construction of molecular representations from SMILES strings. It claims that early layers rely on position-tracking latents to parse molecular grammar, later layers encode atom-in-substructure and pharmacologically relevant features, and that non-canonical SMILES induce larger representation shifts than invalid SMILES via propagation of position-latent disruptions. An interactive visualizer called InterMol is introduced to support exploration of SAE activations.

Significance. If the recovered latents are shown to be causally relevant to MolFormer's computations, the work would provide a mechanistic account of how cLMs move from syntactic to semantic processing of molecules, directly addressing a common assumption in the field. The release of InterMol constitutes a concrete contribution for reproducibility and further interpretability studies.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (results on layer-wise latents): the central claims that early layers use position-tracking latents while later layers encode pharmacologically relevant features rest on the assumption that SAE directions correspond to genuine model features; no quantitative metrics, labeling criteria, or validation controls (e.g., ablation or causal patching) are supplied to support this mapping.
  2. [§5] §5 (non-canonical vs. invalid SMILES experiments): the claim that non-canonical SMILES produce more disruptive shifts via position-latent propagation requires evidence that the observed activation changes are driven by the identified position latents rather than other factors; the manuscript provides no feature-ablation or intervention results to establish this causal link.
minor comments (2)
  1. [§3] Figure captions and §3 (SAE training details): sparsity and dictionary-size hyperparameters are not stated explicitly, making it difficult to assess reproducibility of the reported latents.
  2. [§6] The InterMol tool description would benefit from a brief statement of its input/output format and any limitations on the molecules it can visualize.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments on our manuscript. We address each major comment below, clarifying our methodological approach while acknowledging areas where additional discussion or qualification is warranted.

read point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (results on layer-wise latents): the central claims that early layers use position-tracking latents while later layers encode pharmacologically relevant features rest on the assumption that SAE directions correspond to genuine model features; no quantitative metrics, labeling criteria, or validation controls (e.g., ablation or causal patching) are supplied to support this mapping.

    Authors: The interpretations in §4 are derived from consistent activation patterns of SAE latents on targeted molecular inputs, visualized and categorized via the InterMol tool. Labeling was performed by inspecting high-activating examples for syntactic (position) versus semantic (substructure/pharmacophore) properties. We agree that explicit labeling criteria and a clearer statement of the monosemanticity assumption would strengthen the presentation. We will revise §4 and the abstract to include these criteria and add a limitations paragraph noting the absence of quantitative metrics or causal interventions such as ablation or patching. revision: partial

  2. Referee: [§5] §5 (non-canonical vs. invalid SMILES experiments): the claim that non-canonical SMILES produce more disruptive shifts via position-latent propagation requires evidence that the observed activation changes are driven by the identified position latents rather than other factors; the manuscript provides no feature-ablation or intervention results to establish this causal link.

    Authors: Section 5 reports larger activation shifts for non-canonical SMILES that align with early-layer position-latent disruptions propagating forward, based on layer-wise comparison of SAE activations. We concur that this constitutes correlational evidence rather than a direct causal demonstration via ablation or intervention. We will revise the text in §5 to frame the position-latent propagation as a supported hypothesis from the observed patterns, while explicitly noting the lack of feature-ablation results as a limitation. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical SAE analysis with no derivations or self-referential predictions

full rationale

The paper is an empirical interpretability study applying sparse autoencoders to MolFormer activations on SMILES strings. Claims about layer-wise latents (position-tracking in early layers, atom-in-substructure and pharmacological features in later layers) and representation shifts under non-canonical vs. invalid SMILES are presented as observations from activation patterns and visualizations, not as quantities derived from equations or fitted parameters that reduce to the inputs by construction. No self-citations load-bearing the central claims, no uniqueness theorems, no ansatzes smuggled in, and no renaming of known results as new derivations. The work is self-contained against external benchmarks in the sense that its findings rest on direct measurement rather than any internal definition that equates result to input.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The analysis relies on standard SAE techniques whose hyperparameters are not detailed here.

pith-pipeline@v0.9.1-grok · 5651 in / 1106 out tokens · 27867 ms · 2026-06-26T09:22:27.187970+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

21 extracted references · 3 canonical work pages

  1. [1]

    URL https://www.biorxiv.org/content/ early/2025/06/18/2025.02.06.636901

    doi: 10.1101/2025.02.06.636901. URL https://www.biorxiv.org/content/ early/2025/06/18/2025.02.06.636901. Ahmad, W., Simon, E., Chithrananda, S., Grand, G., and Ramsundar, B. Chemberta-2: Towards chemical foun- dation models,

  2. [2]

    Bajorath, J

    URL https://arxiv.org/ abs/2209.01712. Bajorath, J. Chemical and biological language models in molecular design: opportunities, risks and scientific rea- soning.Future Sci. OA, 10(1):FSO957, May

  3. [3]

    Brinkmann, H., Argante, A., ter Steege, H., and Grisoni, F

    URL https://arxiv.org/abs/1703.07076. Brinkmann, H., Argante, A., ter Steege, H., and Grisoni, F. Going beyond smiles enumeration for data augmentation in generative drug discovery.Digital Discovery, 4:2752– 2764,

  4. [4]

    URL http: //dx.doi.org/10.1039/D5DD00028A

    doi: 10.1039/D5DD00028A. URL http: //dx.doi.org/10.1039/D5DD00028A. Brixi, G., Durrant, M. G., Ku, J., Poli, M., Brockman, G., Chang, D., Gonzalez, G. A., King, S. H., Li, D. B., Mer- chant, A. T., Naghipourfar, M., Nguyen, E., Ricci-Tam, C., Romero, D. W., Sun, G., Taghibakshi, A., V orontsov, A., Yang, B., Deng, M., Gorton, L., Nguyen, N., Wang, N. K., ...

  5. [5]

    Genome modeling and design across all domains of life with Evo 2

    doi: 10.1101/2025.02.18.638918. URL https://www.biorxiv.org/content/ early/2025/02/21/2025.02.18.638918. Chithrananda, S., Grand, G., and Ramsundar, B. Chem- berta: Large-scale self-supervised pretraining for molecu- lar property prediction,

  6. [6]

    org/abs/2010.09885

    URL https://arxiv. org/abs/2010.09885. Cohen, J., Hasson, A. G., and Tanovic, S. Unveiling la- tent knowledge in chemistry language models through sparse autoencoders,

  7. [7]

    org/abs/2512.08077

    URL https://arxiv. org/abs/2512.08077. Dudek, A., Dejnaka, E., Sulecka-Zadka, J., Perz, M., Krawczyk-Łebek, A., Kostrzewa-Susłow, E., Pruchnik, 8 What Does a Chemical Language Model Know About Molecules? H., and Pawlak, A. Bromo- and chloro-substituted flavones induce apoptosis and modulate cell death path- ways in canine lymphoma and leukemia cells - a c...

  8. [8]

    Fender, I., Gut, J

    URLhttps://arxiv.org/abs/2510.08638. Fender, I., Gut, J. A., and Lemmin, T. Beyond performance: how design choices shape chemical language models.J. Cheminform., 17(1):173, November

  9. [9]

    Grisoni, F

    URL https: //arxiv.org/abs/2406.04093. Grisoni, F. Chemical language models for de novo drug design: Challenges and opportunities.Curr . Opin. Struct. Biol., 79(102527):102527, April

  10. [10]

    Huang, K., Fu, T., Gao, W., Zhao, Y ., Roohani, Y ., Leskovec, J., Coley, C

    URL https://arxiv.org/abs/ 2509.14252. Huang, K., Fu, T., Gao, W., Zhao, Y ., Roohani, Y ., Leskovec, J., Coley, C. W., Xiao, C., Sun, J., and Zitnik, M. Ther- apeutics data commons: Machine learning datasets and tasks for drug discovery and development,

  11. [11]

    Huang, K., Fu, T., Gao, W., Zhao, Y ., Roohani, Y ., Leskovec, J., Coley, C

    URL https://arxiv.org/abs/2102.09548. Huang, K., Fu, T., Gao, W., Zhao, Y ., Roohani, Y ., Leskovec, J., Coley, C. W., Xiao, C., Sun, J., and Zitnik, M. Artifi- cial intelligence foundation for therapeutic science.Nat. Chem. Biol., 18(10):1033–1036, October

  12. [12]

    Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B

    URL https://arxiv.org/abs/2505.07139. Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B. A., Thiessen, P. A., Yu, B., Zaslavsky, L., Zhang, J., and Bolton, E. E. PubChem 2019 update: improved access to chemical data.Nucleic Acids Res., 47(D1):D1102–D1109, January

  13. [13]

    F., Probst, D., Ujihara, K., Pahl, A., Godin, G., and Lehtivarjo, J

    Landrum, G., Tosco, P., Kelley, B., Rodriguez, R., Cos- grove, D., Vianello, R., sriniker, Gedeck, P., Jones, G., Kawashima, E., NadineSchneider, Nealschneider, D., Dalke, A., tadhurst-cdd, Swain, M., Cole, B., Turk, S., Savelev, A., Maeder, N., Vaucher, A., W´ojcikowski, M., Faara, H., Take, I., Walker, R., Scalfani, V . F., Probst, D., Ujihara, K., Pahl...

  14. [14]

    org/abs/2411.12886

    URL https://arxiv. org/abs/2411.12886. Schoenmaker, L., B´equignon, O. J. M., Jespers, W., and van Westen, G. J. P. UnCorrupt SMILES: a novel approach to de novo design.J. Cheminform., 15(1):22, February

  15. [15]

    org/abs/2506.15679

    URL https://arxiv. org/abs/2506.15679. Templeton, A., Conerly, T., Marcus, J., Lindsey, J., Bricken, T., Chen, B., Pearce, A., Citro, C., Ameisen, E., Jones, A., Cunningham, H., Turner, N. L., McDougall, C., MacDiarmid, M., Freeman, C. D., Sumers, T. R., Rees, E., Batson, J., Jermyn, A., Carter, S., Olah, C., and Henighan, T. Scaling monosemanticity: Ex- ...

  16. [16]

    Tsui, D., Talreja, K., Saeedi, D., and Aghazadeh, A

    URL https: //transformer-circuits.pub/2024/ scaling-monosemanticity/index.html. Tsui, D., Talreja, K., Saeedi, D., and Aghazadeh, A. Protein circuit tracing via cross-layer transcoders,

  17. [17]

    Varadi, K., Marosi, M., and Antal, P

    URL https://arxiv.org/abs/2602.12026. Varadi, K., Marosi, M., and Antal, P. Circuits, features, and heuristics in molecular transformers,

  18. [18]

    Veith, H., Southall, N., Huang, R., James, T., Fayne, D., Artemenko, N., Shen, M., Inglese, J., Austin, C

    URL https://arxiv.org/abs/2512.09757. Veith, H., Southall, N., Huang, R., James, T., Fayne, D., Artemenko, N., Shen, M., Inglese, J., Austin, C. P., Lloyd, D. G., and Auld, D. S. Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries.Nat. Biotechnol., 27(11):1050–1055, November

  19. [19]

    Code Availability The code to reproduce the experiments is available athttps://github.com/ckennetha/intermol

    10 What Does a Chemical Language Model Know About Molecules? Appendix A. Code Availability The code to reproduce the experiments is available athttps://github.com/ckennetha/intermol. B. Related Work Mechanistic Interpretability of Chemical Language Models.Understanding how cLMs work internally may help in designing better cLMs that enable the generation o...

  20. [20]

    Normalization.For ease of interpretation, we normalized SAE latents using ∼250,000 molecular SMILES randomly sampled from the SAE training dataset

    and PubChem (Kim et al., 2019), the same data pool used to train MolFormer-XL. Normalization.For ease of interpretation, we normalized SAE latents using ∼250,000 molecular SMILES randomly sampled from the SAE training dataset. Following Simon & Zou (2025), for each latent in each SAE, we used its maximum activation to rescale the encoder and decoder weigh...

  21. [21]

    Table 1.Invalid SMILES Generation.Error variations are randomly introduced for each error type with examples of resulting invalid SMILES

    This yields around 300,000 valid/invalid pairs. Table 1.Invalid SMILES Generation.Error variations are randomly introduced for each error type with examples of resulting invalid SMILES. Red tokens on the left indicate deletions, while those on the right indicate substitutions or insertions. Error type Variation Invalid SMILES Rings Remove a ring indexc1cc...