Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics
Pith reviewed 2026-06-27 10:53 UTC · model grok-4.3
The pith
A foundation model for continuum dynamics recruits SAE features in piecewise-consistent but physically unaligned patterns across shear-flow setups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across multiple shear-flow setups, the model shows evidence of piecewise consistency: subsets of SAE features recur in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. Some of the observed discrepancies between numerical-simulation and emulator outputs can be connected to changes in the usage of specific SAE features.
What carries the argument
Sparse autoencoder features triaged by enstrophy in one selected layer of the Walrus foundation model for continuum dynamics.
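For readers unfamiliar with the probe, the following is a minimal sketch of a sparse autoencoder forward pass. It assumes a generic ReLU encoder with an L1 sparsity penalty, which may differ from the variant used in the paper; the dimensions, the random weights, and the name `sae_forward` are illustrative assumptions, not details taken from the Walrus analysis.

```python
import numpy as np


def sae_forward(acts, w_enc, b_enc, w_dec, b_dec, l1_coeff=1e-3):
    """Encode activations into sparse features, decode them, and score the fit.

    Generic ReLU sparse autoencoder (an assumption, not the paper's exact variant).
    acts: (n_tokens, d_model) activations from one layer.
    """
    # Encoder: centre on the decoder bias, project up, keep positive part only.
    feats = np.maximum(0.0, (acts - b_dec) @ w_enc + b_enc)
    # Decoder: linear map back to the activation space.
    recon = feats @ w_dec + b_dec
    # Loss: mean squared reconstruction error plus L1 sparsity penalty.
    mse = np.mean(np.sum((acts - recon) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(feats), axis=-1))
    return feats, recon, float(mse + l1_coeff * l1)


# Toy dimensions and random (untrained) weights, purely for illustration.
rng = np.random.default_rng(0)
d_model, n_feats, n_tokens = 16, 64, 8
acts = rng.normal(size=(n_tokens, d_model))
w_enc = rng.normal(scale=0.1, size=(d_model, n_feats))
w_dec = rng.normal(scale=0.1, size=(n_feats, d_model))
b_enc = np.zeros(n_feats)
b_dec = np.zeros(d_model)

feats, recon, loss = sae_forward(acts, w_enc, b_enc, w_dec, b_dec)
print(feats.shape, recon.shape)
```

In practice the weights would be trained on activations from the selected layer, and the feature dictionary would be far wider (the paper reports over 20,000 features).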
If this is right
- Subsets of SAE features recur in similar roles across different shear-flow parameter values.
- The observed consistency remains only piecewise and does not align with standard physical decompositions.
- Some systematic output discrepancies between simulation and emulation are traceable to changes in particular SAE feature usage.
- Single-layer SAE analysis leaves open how to separate stable internal structure from analysis artifacts.
Where Pith is reading between the lines
- Using additional or combined physical metrics for triage might expose whether low-enstrophy features carry overlooked mechanistic information.
- The intermittent consistency could indicate that the model develops effective yet non-physical internal representations for continuum tasks.
- Extending the same probing approach to other layers or to different foundation models would test whether these interpretability issues are widespread.
Load-bearing premise
Enstrophy supplies a sufficient and unbiased filter for selecting important SAE features from over 20,000 without missing low-enstrophy but mechanistically relevant ones or creating selection artifacts that alter the reported consistency and discrepancy patterns.
What would settle it
Repeating the triage and comparison with a different physical quantity, such as integrated kinetic energy instead of enstrophy, would settle it: if the resulting feature sets mapped cleanly onto physical decompositions with stable roles across all setups, the claim that the structure is intermittent and does not map onto such decompositions would be falsified.
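The two diagnostics named here can be computed directly from a velocity field. The sketch below assumes a two-dimensional, doubly periodic grid, second-order central differences, and the convention of a factor 1/2 in both integrals; the function names and the test field are hypothetical and are not taken from the paper.

```python
import numpy as np


def vorticity(u, v, dx, dy):
    """Return omega = dv/dx - du/dy using periodic central differences.

    Arrays are indexed [x, y]; periodicity in both directions is assumed.
    """
    dv_dx = (np.roll(v, -1, axis=0) - np.roll(v, 1, axis=0)) / (2.0 * dx)
    du_dy = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / (2.0 * dy)
    return dv_dx - du_dy


def enstrophy(u, v, dx, dy):
    """Integrated enstrophy: 0.5 * sum(omega^2) * cell area."""
    w = vorticity(u, v, dx, dy)
    return 0.5 * np.sum(w ** 2) * dx * dy


def kinetic_energy(u, v, dx, dy):
    """Integrated kinetic energy: 0.5 * sum(u^2 + v^2) * cell area."""
    return 0.5 * np.sum(u ** 2 + v ** 2) * dx * dy


# Hypothetical test field: a simple periodic shear layer u = sin(y), v = 0.
n = 64
length = 2.0 * np.pi
dx = dy = length / n
x = np.arange(n) * dx
y = np.arange(n) * dy
X, Y = np.meshgrid(x, y, indexing="ij")
u = np.sin(Y)
v = np.zeros_like(u)

ens = enstrophy(u, v, dx, dy)
ke = kinetic_energy(u, v, dx, dy)
print(ens, ke)
```

For this test field on a 2π-periodic box, both integrals equal π² analytically; the finite-difference enstrophy falls slightly below that value at this resolution.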
Original abstract
Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior, is the internal behaviour consistent with known physics, and how does it relate to where the emulator succeeds or fails? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles. We apply a sparse autoencoder (SAE) to probe a selected layer, and address the practical challenge of triaging a large feature set (over 20,000) using enstrophy as a physically grounded metric. As a deliberately simple testbed, we focus on shear flow and compare feature recruitment across multiple shear-flow setups, i.e. parameter values in the numerical simulation. Across setups we find evidence of piecewise consistency, with subsets of features recurring in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. In parallel, direct comparisons between numerical simulation and the emulator reveal systematic output-level discrepancies, including regimes where energy/structures become too diffuse or too localized. We connect parts of these discrepancies to changes in specific SAE feature usage. Our work highlights open questions for scientific foundation models: how to robustly prioritize mechanistically meaningful features, how to separate stable structure from analysis artifacts (including single-layer and SAE limitations), and how to use established benchmarks to decide when "different" internal representations are genuinely informative rather than merely effective.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper applies sparse autoencoders to a selected layer of the Walrus foundation model for continuum dynamics, using enstrophy to triage >20,000 features in shear-flow simulations across parameter values. It reports piecewise consistency in recurring feature roles that is intermittent and does not map cleanly to standard physical decompositions, while linking some output discrepancies (diffuse vs. localized structures) between simulation and emulator to changes in specific SAE feature usage.
Significance. If the central empirical observations hold after addressing triage validation, the work usefully surfaces open methodological questions for interpretability in scientific foundation models: robust prioritization of mechanistically meaningful features, separation of stable structure from single-layer/SAE artifacts, and criteria for when internal differences are informative. The deliberate choice of a simple shear-flow testbed and external physical metric (enstrophy) is a strength for grounding the analysis.
Major comments (2)
- [Methods] The enstrophy triage procedure (described in the methods) is load-bearing for the claims of piecewise consistency and discrepancy-feature linkages, yet the manuscript supplies no quantitative check (e.g., overlap with a non-enstrophy metric, recall of known vorticity features, or sensitivity analysis) that the threshold preserves the relevant mechanistic subspace rather than systematically excluding low-enstrophy but causally important features such as subtle boundary or gradient encodings.
- [Results] The abstract and results sections state findings of 'piecewise consistency' and causal connections to output discrepancies without accompanying quantitative support (overlap fractions, statistical tests, ablation on feature subsets, or error bars), which prevents assessment of effect sizes and reproducibility of the intermittency observation.
Minor comments (1)
- Notation for SAE feature indices and the precise layer chosen should be defined explicitly on first use to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which identify important opportunities to strengthen the methodological grounding and quantitative presentation of our case study. We respond to each major comment below.
- Referee: [Methods] The enstrophy triage procedure (described in the methods) is load-bearing for the claims of piecewise consistency and discrepancy-feature linkages, yet the manuscript supplies no quantitative check (e.g., overlap with a non-enstrophy metric, recall of known vorticity features, or sensitivity analysis) that the threshold preserves the relevant mechanistic subspace rather than systematically excluding low-enstrophy but causally important features such as subtle boundary or gradient encodings.
Authors: We agree that the triage procedure would benefit from additional validation. Enstrophy was selected because it is a physically natural metric for the vorticity-dominated shear-flow testbed. In the revision we will add a sensitivity analysis across threshold values and report feature overlap with a secondary metric (kinetic energy) to check whether low-enstrophy but potentially relevant encodings are excluded. This will make explicit the extent to which the selected subspace is preserved.
Revision: yes
- Referee: [Results] The abstract and results sections state findings of 'piecewise consistency' and causal connections to output discrepancies without accompanying quantitative support (overlap fractions, statistical tests, ablation on feature subsets, or error bars), which prevents assessment of effect sizes and reproducibility of the intermittency observation.
Authors: The reported patterns are qualitative observations drawn from the deliberately limited shear-flow testbed. We will add overlap fractions for recurring features across parameter values and include a limited ablation on the most frequently recruited feature subsets to quantify their contribution to the observed output discrepancies. Because the intermittency itself is the central empirical finding, formal statistical tests are not straightforward, but we will clarify the exploratory character of the results and note reproducibility across the tested configurations.
Revision: yes
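The checks promised in these responses could take roughly the following form. This is a sketch under stated assumptions: each feature is given one scalar score per metric or per setup, triage keeps the top k features, and agreement is measured with the Jaccard index. The scores below are synthetic, and `top_k`, `jaccard`, and `overlap_sweep` are hypothetical helpers rather than the authors' code.

```python
from itertools import combinations

import numpy as np


def top_k(scores, k):
    """Indices of the k highest-scoring features, returned as a set."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_sweep(scores_a, scores_b, ks):
    """Overlap between two triage metrics as the cutoff k is varied."""
    return {k: jaccard(top_k(scores_a, k), top_k(scores_b, k)) for k in ks}


# Synthetic per-feature scores (hypothetical; real scores would come from
# the enstrophy-based and kinetic-energy-based triage of SAE features).
rng = np.random.default_rng(0)
n_feats = 1000
enstrophy_score = rng.random(n_feats)
energy_score = enstrophy_score + 0.1 * rng.normal(size=n_feats)

# Sensitivity of metric agreement to the triage cutoff.
sweep = overlap_sweep(enstrophy_score, energy_score, ks=[10, 50, 200])

# Overlap fractions of the selected features between pairs of setups.
setup_scores = {name: rng.random(n_feats) for name in ("setup_a", "setup_b", "setup_c")}
pairwise = {
    (a, b): jaccard(top_k(setup_scores[a], 50), top_k(setup_scores[b], 50))
    for a, b in combinations(setup_scores, 2)
}
print(sweep)
print(pairwise)
```

A top-k cutoff is used here instead of a score threshold only because it keeps the selected sets the same size, which makes the overlap values easier to compare; a threshold sweep would follow the same pattern.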
Circularity Check
No circularity: empirical case study with external physical triage metric
The paper is an observational interpretability case study applying SAE probes to a pre-trained foundation model and triaging >20k features via the external physical quantity enstrophy. No derivation chain, fitted-parameter predictions, self-definitional steps, or load-bearing self-citations exist. Claims of piecewise consistency and discrepancy linkage rest on direct comparisons to numerical simulations and feature activation patterns, not on any reduction to the paper's own inputs or prior author work by construction. The enstrophy triage is an analysis choice whose adequacy is debatable on methodological grounds but does not create circularity.