Order Is Not Control

Gareth Seneque; Jeffrey Molendijk; Lap-Hang Ho; Nafise Erfanian Saeedi; Tim Elson

arXiv: 2606.12923 · v1 · pith:ROVG6VBFnew · submitted 2026-06-11 · 💻 cs.LG · cs.AI · cs.CL

Pith reviewed 2026-06-27 07:13 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL

keywords receiver-gated response law · control versus order · AI alignment · neural perturbation · driven-dissipative systems · stochastic operators · biological control · LLM steering

The pith

Control requires a receiver-gated response law rather than order alone, as shown by consistent patterns across biological and LLM systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that order-inducing objects in AI alignment and neural studies do not amount to control. Instead, control depends on a receiver-gated response law, defined as a denominator-indexed operator that takes material state, action or drive, bath, and receiver state as inputs to produce response displacement, sinks, effort, and basin projection. This law appears in mouse ALM, C. elegans, zebrafish, LLM output panels, adapters, and stochastic operators, where interventions prove local and context-dependent. A reader would care because it reframes steering and perturbation work around measurable response operators instead of assuming order produces controllable outcomes. The result is a driven-dissipative account in which drives act through prepared media to yield admitted movement or overdrive.

Core claim

Order is not control. Control requires a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. The law is identified across biological panels (mouse ALM, C. elegans, zebrafish) and LLM panels, where response vectors are predictable at 72.8-73.7 percent component-sign accuracy (rising to 84.3-84.8 percent on nonzero components) and held-out observers predict system-effect and target/oracle families at 93.6 percent and 91.7 percent accuracy. Interventions are admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, receiver state, action port, and comparator.
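To make the headline metric concrete, here is a minimal sketch of how a component-sign accuracy could be computed, both over all components and over nonzero components only. The function names, the `tol` parameter, and the choice to define "nonzero" by the observed component are assumptions for illustration; the page does not state how the paper computes either figure.

```python
from typing import Sequence, Tuple


def sign(x: float, tol: float = 0.0) -> int:
    """Return -1, 0, or +1; values within tol of zero count as 0 (tol is an assumption)."""
    if x > tol:
        return 1
    if x < -tol:
        return -1
    return 0


def component_sign_accuracy(
    predicted: Sequence[Sequence[float]],
    observed: Sequence[Sequence[float]],
    tol: float = 0.0,
) -> Tuple[float, float]:
    """Return (overall, nonzero_only) sign-agreement rates.

    overall:      fraction of all components whose predicted sign matches the observed sign.
    nonzero_only: the same fraction restricted to components whose observed sign is nonzero
                  (assumed reading of "nonzero components").
    """
    total = hits = nz_total = nz_hits = 0
    for p_vec, o_vec in zip(predicted, observed):
        for p, o in zip(p_vec, o_vec):
            match = sign(p, tol) == sign(o, tol)
            total += 1
            hits += match
            if sign(o, tol) != 0:
                nz_total += 1
                nz_hits += match
    overall = hits / total if total else float("nan")
    nonzero_only = nz_hits / nz_total if nz_total else float("nan")
    return overall, nonzero_only


# Hypothetical response vectors, not data from the paper.
overall, nonzero_only = component_sign_accuracy(
    predicted=[[1.0, -2.0, 0.5, 0.0]],
    observed=[[2.0, -1.0, 0.0, 0.0]],
)
```

In this toy input the only miss is a component observed as zero, so the nonzero-only rate is higher than the overall rate, which is the direction of the gap the page reports.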

What carries the argument

The receiver-gated response law, a denominator-indexed operator that maps material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection.
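The input and output roles of the operator can be written down as a type signature. The sketch below is only that: the class names, the `Denominator` fields (borrowed loosely from the Figure 3 caption), and the toy gating rule are all hypothetical and are not the paper's law.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Denominator:
    """Fixed measurement context that indexes the operator (illustrative fields)."""
    prompt_family: str
    decode_bath: str
    comparator: str


@dataclass(frozen=True)
class Inputs:
    material_state: Vector
    drive: Vector
    bath: Vector
    receiver_state: Vector


@dataclass(frozen=True)
class Response:
    displacement: Vector
    sinks: Vector
    effort: float
    basin_projection: float


# A response law, in this sketch, is any function with this signature.
ResponseLaw = Callable[[Denominator, Inputs], Response]


def toy_gated_law(denominator: Denominator, x: Inputs) -> Response:
    """Toy stand-in, NOT the paper's law: the receiver state sets a gate in [0, 1].

    The denominator is accepted but unused here; a real law would be indexed by it.
    """
    gate = max(0.0, min(1.0, sum(x.receiver_state)))
    displacement = tuple(gate * d for d in x.drive)         # admitted part of the drive
    sinks = tuple((1.0 - gate) * abs(d) for d in x.drive)   # rejected part, routed to sinks
    effort = sum(abs(d) for d in x.drive)                   # total drive magnitude
    basin_projection = sum(displacement)                    # scalar readout of movement
    return Response(displacement, sinks, effort, basin_projection)


# Hypothetical usage: a half-open receiver admits half of each drive component.
example = toy_gated_law(
    Denominator("demo-prompts", "greedy", "base"),
    Inputs(material_state=(0.0,), drive=(1.0, -2.0), bath=(0.0,), receiver_state=(0.5,)),
)
```

The point of the signature is that the same drive can yield different responses when the receiver state changes, which is what "receiver-gated" means at the level of roles.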

If this is right

An intervention can be admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, receiver state, action port, and comparator.
Control is assigned when finite effort moves a target or outcome-readout class under the same denominator while damage, null/evasive, invalid format, overdrive, and unnecessary effort stay bounded.
Constitution-conditioned adapters reshape susceptibility as prepared media.
Stochastic-operator panels separate measured opportunity from deployable action policies.
The evidence supports local admitted control and measurable stochastic response operators at the mesoscopic level.
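The control-assignment criterion in the list above reads like a predicate, and a rough sketch of one follows. Every field name and threshold here is an assumption for illustration; the page gives no numeric bounds, and the single shared `max_sink_rate` is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelOutcome:
    denominator: str        # label of the fixed comparison context
    target_rate: float      # rate at which the target/outcome-readout class is reached
    effort: float           # effort spent on the intervention
    damage: float
    null_evasive: float
    invalid_format: float
    overdrive: float


@dataclass(frozen=True)
class Bounds:
    min_target_gain: float  # smallest movement that counts as moving the target
    max_effort: float       # effort above this counts as unnecessary
    max_sink_rate: float    # shared cap on each sink channel (hypothetical simplification)


def control_assigned(baseline: PanelOutcome, treated: PanelOutcome, bounds: Bounds) -> bool:
    """True when the target moves under the same denominator with sinks and effort bounded."""
    if baseline.denominator != treated.denominator:
        raise ValueError("outcomes must share a denominator to be compared")
    moved = treated.target_rate - baseline.target_rate >= bounds.min_target_gain
    effort_ok = treated.effort <= bounds.max_effort
    sinks_ok = all(
        rate <= bounds.max_sink_rate
        for rate in (treated.damage, treated.null_evasive, treated.invalid_format, treated.overdrive)
    )
    return moved and effort_ok and sinks_ok


# Hypothetical numbers, not results from the paper.
bounds = Bounds(min_target_gain=0.1, max_effort=2.0, max_sink_rate=0.05)
baseline = PanelOutcome("d1", 0.40, 0.0, 0.0, 0.0, 0.0, 0.0)
clean = PanelOutcome("d1", 0.70, 1.0, 0.02, 0.01, 0.0, 0.03)
overdriven = PanelOutcome("d1", 0.70, 1.0, 0.02, 0.01, 0.0, 0.20)
```

Here `clean` passes and `overdriven` fails even though both move the target by the same amount, which mirrors the claim that target movement alone does not establish control.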

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The local character of the response law implies that alignment techniques may need to prepare specific receiver states rather than rely on global order.
If the same denominator-indexed structure holds across domains, perturbation experiments could be designed to test transfer of control metrics from biological to artificial systems without assuming coordinate identity.
The separation of opportunity from deployable policies in stochastic panels suggests a route to quantify when an LLM intervention remains within bounded effort.
Future tests could check whether adapter conditioning consistently alters the sign and saturation behavior of generated responses under fixed drives.

Load-bearing premise

The biological panels and LLM panels demonstrate instances of the same receiver-gated response law.

What would settle it

If held-out observers in the LLM panels fail to predict system-effect and target/oracle families at the reported 93.6 percent and 91.7 percent accuracy, or if biological interventions do not exhibit the local admitted, saturated, or overdriven patterns under varied receiver states.

Figures

Figures reproduced from arXiv: 2606.12923 by Gareth Seneque, Jeffrey Molendijk, Lap-Hang Ho, Nafise Erfanian Saeedi, Tim Elson.

**Figure 1.** Receiver-gated response law and evidence key. Candidate drives become control only when a measured receiver admits them into target-basin movement with sink and effort channels bounded. The figure defines the response-chain object rather than a single example: the same denominator-conditioned structure is measured at biological, generated-output, adapter, and stochastic-operator ports. Local semantic-repai…

**Figure 2.** Cross-substrate response-law bridge. The bridge is role-level: biological, LLM-output, and adapter-media systems instantiate the same denominator-indexed response-law roles, not shared raw coordinates or universal control. The biological column shows physical perturbation-substrate response operators: material condition, drive, bath or protocol, receiver, response displacement, sink routing, held-out respo…

**Figure 3.** LLM generated-output response dynamics under matched denominators. Visible semantic fields and finite action probes are locally admitted and can overdrive when dose or budget increases. The generated-output and finite-dose panels fix prompt family, model/material state, decode bath, visible-field action, text-only completion rubric, and comparator. Scheduler validation supports local field selection, while…

**Figure 4.** Adapters as prepared response media. Constitution-conditioned adapters change how the same visible fields are admitted into generated-output basins. The denominator fixes base model family, frozen adapter state, prompt/bath, visible-field action, text-only completion rubric, and matched base-to-adapter pairing. The key scale is the 384-cell response tensor and 288 matched pairs summarised in Section 5 and …

**Figure 5.** Evidence ladder for predictive response laws and the controller limits. Panel A shows response-vector sign prediction across the four main material conditions. Panel B shows non-endpoint held-out observer prediction for system-effect and target/oracle binary families. Panel C separates clean local-control pockets from sink/overdrive, missing-projection, and sign-changing classes. Panel D compares the teste…

Original abstract

AI alignment, interpretability, steering, and neural perturbation studies identify order-inducing objects. We argue that order is not control. Control requires a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. We identify it across biological, LLM, adapter, and stochastic-operator panels. The laws are local: an intervention can be admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, receiver state, action port, and comparator. Control is assigned when finite effort moves a target or outcome-readout class under the same denominator while damage, null/evasive, invalid format, overdrive, and unnecessary effort stay bounded. Mouse ALM, C. elegans, and zebrafish panels provide physical response-operator evidence while excluding coordinate identity and controller conclusions. LLM panels show generated-output response laws: across four material conditions, response vectors are predictable at 72.8-73.7% component-sign accuracy, rising to 84.3-84.8% on nonzero components; held-out observers predict system-effect and target/oracle families at 93.6% and 91.7% accuracy. Constitution-conditioned adapters reshape susceptibility as prepared media, and stochastic-operator panels separate measured opportunity from deployable action policies. This gives a driven-dissipative response-system account at the mesoscopic control level: drives act through prepared media, baths, and receivers, producing admitted movement, impedance, sinks, or overdrive. The evidence supports local admitted control and measurable stochastic response operators, while leaving deployable pre-generation control, hidden/logit causal sufficiency, biological-to-LLM coordinate identity, and literal thermodynamic quantities outside scope.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines control via a new receiver-gated operator and reports LLM prediction numbers, but the cross-domain identity claim is not backed by a demonstrated structural equivalence.

The main point to take away is that this work introduces a specific definition of control as a denominator-indexed, receiver-gated response law and applies it to both biological perturbation data and LLM output panels. The distinction from mere order is the central framing.

What the paper does is spell out how an intervention can be admitted, saturated, sign-changing, leaky, or overdriven depending on medium, bath, and receiver state. The LLM section gives concrete component-sign accuracies of 72.8-73.7% overall and 84.3-84.8% on nonzero components, plus held-out prediction accuracies of 93.6% and 91.7%. The biological panels from mouse ALM, C. elegans, and zebrafish are used to supply physical response-operator examples while ruling out coordinate identity.
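One way to picture the intervention regimes is as labels assigned to a dose-response series measured under a fixed denominator. The sketch below is a guess at what such a labelling could look like: the rules, their ordering, the thresholds `tol` and `sink_bound`, and the extra "not admitted" label are all assumptions, and "leaky" is left out because the page gives no hint of how to separate it from the other sink-related regimes.

```python
from typing import Sequence


def classify_regime(
    responses: Sequence[float],
    sinks: Sequence[float],
    tol: float = 0.05,
    sink_bound: float = 0.2,
) -> str:
    """Label a dose-response series ordered by increasing dose (hypothetical rules).

    responses: target displacement at each dose.
    sinks:     total sink rate (damage, null, invalid format, ...) at each dose.
    """
    if len(responses) != len(sinks) or len(responses) < 2:
        raise ValueError("need two or more doses with matching response and sink series")
    # Both clearly positive and clearly negative responses occur across doses.
    if any(r > tol for r in responses) and any(r < -tol for r in responses):
        return "sign-changing"
    # The highest dose pushes the sink channels past their bound.
    if sinks[-1] > sink_bound:
        return "overdriven"
    # No dose moves the target beyond tolerance.
    if all(abs(r) <= tol for r in responses):
        return "not admitted"
    # The last dose step no longer changes the response.
    if abs(responses[-1] - responses[-2]) <= tol:
        return "saturated"
    return "admitted"


# Hypothetical series, not measurements from the paper.
labels = [
    classify_regime([0.1, 0.3, 0.5], [0.0, 0.0, 0.05]),
    classify_regime([0.1, 0.4, 0.42], [0.0, 0.0, 0.1]),
    classify_regime([0.3, -0.2, 0.1], [0.0, 0.0, 0.0]),
    classify_regime([0.1, 0.4, 0.3], [0.0, 0.1, 0.5]),
]
```

Changing `tol` or `sink_bound` changes the labels, which is a small reminder that any such classification is relative to the chosen comparator and bounds.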

The soft spot is that the cross-domain claim requires the same functional operator to be instantiated in both sets of panels, yet the abstract supplies only statistical predictability under four material conditions rather than an explicit common expression or reduction showing matching indexing, saturation rules, or leakage behavior. Without that, the inference that order-inducing objects are not control does not fully follow from the reported numbers. The definition also carries some circularity risk since control is defined in terms of the operator being identified.

This is for readers working on alignment, interpretability, or mesoscopic response models who are open to a driven-dissipative framing. The thinking is clear on its own terms even if the evidence for operator identity is limited. It deserves peer review so the methods and any full operator derivations can be checked.

Referee Report

2 major / 2 minor

Summary. The manuscript argues that order is not control. Control is defined as requiring a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. The paper claims to identify this law in biological panels (mouse ALM, C. elegans, zebrafish) providing physical response-operator evidence and in LLM panels, reporting 72.8-73.7% component-sign accuracy (84.3-84.8% on nonzero components) across four material conditions, plus held-out prediction accuracies of 93.6% for system-effect and 91.7% for target/oracle families. It concludes that the laws are local (admitted, saturated, sign-changing, leaky, or overdriven depending on conditions), control is assigned when finite effort moves a target under the same denominator with bounded damage/null/overdrive, and this yields a driven-dissipative mesoscopic account while excluding coordinate identity, controller conclusions, and pre-generation control.

Significance. If the central claim holds, the work supplies a quantifiable, domain-spanning distinction between order-inducing objects and control at the mesoscopic level, with direct relevance to AI alignment, interpretability, and neural perturbation studies. The LLM panel accuracies constitute concrete, falsifiable measurements of response predictability under material conditions, and the exclusion of certain inferences (e.g., coordinate identity) is explicitly scoped. The approach of treating adapters as prepared media and separating opportunity from policy is a constructive modeling choice.

major comments (2)

[Abstract] The inference that the identical denominator-indexed operator operates across biological and LLM panels is not supported by an explicit common operator expression, reduction, or equivalence proof showing that the functional form, indexing, saturation/leakage rules, and basin-projection mapping match between the physical response operators and the LLM-generated output laws. The reported component-sign and held-out accuracies establish only statistical predictability of outputs under four material conditions; they do not demonstrate structural identity of the operator.
[Abstract] The definition of control as necessarily requiring the receiver-gated response law is introduced axiomatically. No independent criterion or comparison to standard control-theoretic notions (e.g., feedback, observability, or dissipativity) is supplied to establish that this specific mapping is required for control rather than being a sufficient but non-necessary characterization; this renders the claim that order-inducing objects are not control dependent on acceptance of the new definition.
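The first major comment can be illustrated with a toy case: two response laws with different functional forms can agree on every component sign, so sign-level predictability alone cannot tell them apart. The two laws, the gate value, and the drive grid below are invented for illustration and are unrelated to the paper's panels.

```python
import math
from typing import Callable, Sequence, Tuple

Law = Callable[[float, float], float]


def linear_law(drive: float, gate: float) -> float:
    """Toy law A: response grows without bound."""
    return gate * drive


def saturating_law(drive: float, gate: float) -> float:
    """Toy law B: response saturates at +/-1."""
    return math.tanh(gate * drive)


def sign(x: float) -> int:
    return (x > 0) - (x < 0)


def compare_laws(law_a: Law, law_b: Law, drives: Sequence[float], gate: float) -> Tuple[float, float]:
    """Return (sign agreement rate, largest absolute gap between the two responses)."""
    pairs = [(law_a(d, gate), law_b(d, gate)) for d in drives]
    agreement = sum(sign(a) == sign(b) for a, b in pairs) / len(pairs)
    max_gap = max(abs(a - b) for a, b in pairs)
    return agreement, max_gap


# Hypothetical drive grid and gate.
agreement, max_gap = compare_laws(
    linear_law, saturating_law, drives=[-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0], gate=0.8
)
```

The two laws agree on sign at every drive yet differ by more than one response unit at the largest drive, so a sign-accuracy score would rate them identically while their saturation behavior differs.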

minor comments (2)

The abstract refers to 'four material conditions' without enumerating them; an explicit list or table would improve reproducibility and allow readers to assess whether the conditions are commensurate across biological and LLM panels.
The manuscript would benefit from a consolidated table reporting accuracies, component counts, and held-out metrics side-by-side for all panels (biological, LLM, adapter, stochastic-operator) to facilitate direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We respond point by point to the major comments and indicate where revisions will be made.

Referee: [Abstract] The inference that the identical denominator-indexed operator operates across biological and LLM panels is not supported by an explicit common operator expression, reduction, or equivalence proof showing that the functional form, indexing, saturation/leakage rules, and basin-projection mapping match between the physical response operators and the LLM-generated output laws. The reported component-sign and held-out accuracies establish only statistical predictability of outputs under four material conditions; they do not demonstrate structural identity of the operator.

Authors: We agree that the manuscript does not supply an explicit common operator expression or formal equivalence proof establishing structural identity between the biological and LLM instantiations. The identification rests on the shared functional form of the denominator-indexed mapping together with the empirical observation that this form produces statistically predictable response components under four matched material conditions. The reported accuracies therefore demonstrate consistent applicability of the operator structure rather than a reduction or identity proof. We will revise the abstract and the relevant results sections to replace the phrasing of an 'identical' operator with language that the same functional form is identified and shown to be predictive across domains.
revision: partial

Referee: [Abstract] The definition of control as necessarily requiring the receiver-gated response law is introduced axiomatically. No independent criterion or comparison to standard control-theoretic notions (e.g., feedback, observability, or dissipativity) is supplied to establish that this specific mapping is required for control rather than being a sufficient but non-necessary characterization; this renders the claim that order-inducing objects are not control dependent on acceptance of the new definition.

Authors: The definition is introduced as the minimal mapping required to support a driven-dissipative, receiver-gated account at the mesoscopic level that distinguishes control from order. It is not asserted to be the unique possible definition of control in all contexts. We will add a short paragraph in the introduction that situates the proposed mapping relative to classical notions of feedback, observability, and dissipativity, making explicit that the definition is offered as a sufficient characterization for the scope of the present work rather than a necessary condition in the classical sense.
revision: yes

Circularity Check

1 step flagged

Control defined via receiver-gated operator whose presence is then reported as identified in panels

specific steps

self-definitional [Abstract]
"Control requires a receiver-gated response law: a denominator-indexed operator mapping material state, action/drive, bath, and receiver state to response displacement, sinks, effort, and basin projection. We identify it across biological, LLM, adapter, and stochastic-operator panels."

The paper defines control as the presence of this exact operator, then claims to identify the operator in the panels on the basis of component-sign accuracies (72.8-73.7% overall) and held-out prediction accuracies. Because the operator is introduced as definitional, the reported identification is consistent with the definition by construction and does not constitute an independent test of the claimed functional form across domains.

full rationale

The paper's central derivation begins by stipulating that control requires a specific denominator-indexed operator, then states that this operator is identified in the biological and LLM panels via predictability metrics. The metrics establish statistical association under four conditions but do not independently verify the operator's functional form, indexing, or saturation rules; the identification therefore reduces to consistency with the introduced definition rather than an external criterion. No self-citations or prior uniqueness theorems are invoked in the provided text, so the circularity is confined to the self-definitional step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on an ad-hoc definition of control introduced by the paper and the assumption that the reported panels instantiate that definition. No numerical free parameters are stated. The framework is largely self-contained within the new terminology.

axioms (1)

Control is defined as requiring a receiver-gated response law with the specified mapping from states to responses. (ad hoc to paper)
This definition is introduced in the abstract as the requirement that distinguishes control from order.

invented entities (2)

denominator-indexed operator (no independent evidence)
purpose: To provide the mathematical structure for the receiver-gated response law that defines control.
Newly postulated operator without independent evidence outside the paper.

receiver-gated response law (no independent evidence)
purpose: To serve as the criterion separating control from mere order.
Core invented concept of the paper.

pith-pipeline@v0.9.1-grok · 5860 in / 1551 out tokens · 42576 ms · 2026-06-27T07:13:04.465624+00:00 · methodology

Reference graph

Works this paper leans on

83 extracted references · 32 canonical work pages

[1]

Abc align: Large language model alignment for safety & accuracy, 2024

Gareth Seneque, Lap-Hang Ho, Ariel Kuperman, Nafise Erfanian Saeedi, and Jeffrey Molendijk. Abc align: Large language model alignment for safety & accuracy, 2024. URL https://arxiv.or g/abs/2408.00307

arXiv 2024
[2]

Enigma: The geometry of reasoning and alignment in large-language models,

Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, Ariel Kupermann, and Tim Elson. Enigma: The geometry of reasoning and alignment in large-language models,
[3]

URL https://arxiv.org/abs/2510.11278

arXiv
[4]

Atlas: Constitution-conditioned latent geometry and redistribution across language models and neural perturbation data, 2026

Gareth Seneque, Lap-Hang Ho, Nafise Erfanian Saeedi, Jeffrey Molendijk, and Tim Elson. Atlas: Constitution-conditioned latent geometry and redistribution across language models and neural perturbation data, 2026. URL https://arxiv.org/abs/2604.17663

Pith/arXiv arXiv 2026
[5]

Constitutional ai: Harmlessness from ai feedback, 2022

Yuntao Bai et al. Constitutional ai: Harmlessness from ai feedback, 2022. URL https: //arxiv.org/abs/2212.08073

Pith/arXiv arXiv 2022
[6]

Collective constitutional ai: Aligning a language model with public input

Saffron Huang et al. Collective constitutional ai: Aligning a language model with public input. InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency,

2024
[7]

Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli

doi: 10.1145/3630106.3658979. URL https://arxiv.org/abs/2406.07814

work page doi:10.1145/3630106.3658979
[8]

Self-supervised alignment with mutual information: Learning to follow principles without preference labels, 2024

Jan-Philipp Franken et al. Self-supervised alignment with mutual information: Learning to follow principles without preference labels, 2024. URL https://arxiv.org/abs/2404.14313

arXiv 2024
[9]

Let’s verify step by step, 2023

Hunter Lightman et al. Let’s verify step by step, 2023. URL https://arxiv.org/abs/2305.20050

Pith/arXiv arXiv 2023
[10]

Math-Shepherd: Verify and Reinforce

Peiyi Wang, Lei Li, Zhihong Shao, Runxin Xu, Damai Dai, Yifei Li, Deli Chen, Yu Wu, and Zhifang Sui. Math-shepherd: Verify and reinforce llms step-by-step without human annotations. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, pages 9426–9439, 2024. doi: 10.18653/v1/2024.acl-long.510. URL https: //aclanthology...

work page doi:10.18653/v1/2024.acl-long.510 2024
[11]

Alignment faking in large language models, 2024

Ryan Greenblatt et al. Alignment faking in large language models, 2024. URL https://arxiv.or g/abs/2412.14093

Pith/arXiv arXiv 2024
[12]

Why do some language models fake alignment while others don’t?, 2025

Abhay Sheshadri et al. Why do some language models fake alignment while others don’t?, 2025. URL https://arxiv.org/abs/2506.18032

arXiv 2025
[13]

A mathematical framework for transformer circuits, 2021

Nelson Elhage et al. A mathematical framework for transformer circuits, 2021. URL https: //transformer-circuits.pub/2021/framework/index.html

2021
[14]

Circuit tracing: Revealing computational graphs in language models, 2025

Anthropic. Circuit tracing: Revealing computational graphs in language models, 2025. URL https://transformer-circuits.pub/2025/attribution-graphs/methods.html

2025
[15]

Representation engineering: A top-down approach to ai transparency, 2023

Andy Zou et al. Representation engineering: A top-down approach to ai transparency, 2023. URL https://arxiv.org/abs/2310.01405

Pith/arXiv arXiv 2023
[16]

Activation addition: Steering language models without optimiza- tion, 2023

Alexander Matt Turner et al. Activation addition: Steering language models without optimiza- tion, 2023. URL https://arxiv.org/abs/2308.10248

Pith/arXiv arXiv 2023
[17]

Steering

Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, and Alexander Turner. Steering Llama 2 via contrastive activation addition. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, pages 15504–15522, 2024. doi: 10.18653/v1/2024.acl-long.828. URL https://aclanthology.org/2024.acl-long.828/. 22

work page doi:10.18653/v1/2024.acl-long.828 2024
[18]

Refusal in language models is mediated by a single direction, 2024

Andy Arditi et al. Refusal in language models is mediated by a single direction, 2024. URL https://arxiv.org/abs/2406.11717

Pith/arXiv arXiv 2024
[19]

Lee et al

Bruce W. Lee et al. Programming refusal with conditional activation steering, 2024. URL https://arxiv.org/abs/2409.05907

arXiv 2024
[20]

Steering off course: Reliability challenges in steering language models,

Patricia Da Silva et al. Steering off course: Reliability challenges in steering language models,
[21]

URL https://arxiv.org/abs/2504.04635

arXiv
[22]

Position: The platonic representation hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola. Position: The platonic representation hypothesis. InProceedings of the 41st International Conference on Machine Learning, 2024. URL https://proceedings.mlr.press/v235/huh24a.html

2024
[23]

Rishi Jha, Collin Zhang, Vitaly Shmatikov, and John X. Morris. Harnessing the universal geometry of embeddings, 2025. URL https://arxiv.org/abs/2505.12540

arXiv 2025
[24]

Revisiting the platonic representation hypothesis: An aristotelian view, 2026

Fabian Groeger, Shuo Wen, and Maria Brbic. Revisiting the platonic representation hypothesis: An aristotelian view, 2026. URL https://arxiv.org/abs/2602.14486

Pith/arXiv arXiv 2026
[25]

Sensory experience steers representational drift in mouse visual cortex.Nature Communications, 15, 2024

Joel Bauer et al. Sensory experience steers representational drift in mouse visual cortex.Nature Communications, 15, 2024. doi: 10.1038/s41467-024-53326-x. URL https://doi.org/10.1038/s4 1467-024-53326-x

work page doi:10.1038/s41467-024-53326-x 2024
[26]

Climer, Heydar Davoudi, Jun Young Oh, and Daniel A

Jason R. Climer, Heydar Davoudi, Jun Young Oh, and Daniel A. Dombeck. Hippocampal representations drift in stable multisensory environments.Nature, 645:457–465, 2025. doi: 10.1038/s41586-025-09245-y. URL https://doi.org/10.1038/s41586-025-09245-y

work page doi:10.1038/s41586-025-09245-y 2025
[27]

A brain-wide map of neural activity during complex behaviour , url =

The International Brain Laboratory. A brain-wide map of neural activity during complex behaviour.Nature, 645:177–191, 2025. doi: 10.1038/s41586-025-09235-0. URL https: //doi.org/10.1038/s41586-025-09235-0

work page doi:10.1038/s41586-025-09235-0 2025
[28]

Brain-wide representations of prior information in mouse decision-making

Charles Findling et al. Brain-wide representations of prior information in mouse decision-making. Nature, 2025. doi: 10.1038/s41586-025-09226-1. URL https://doi.org/10.1038/s41586-025- 09226-1

work page doi:10.1038/s41586-025-09226-1 2025
[29]

doi: 10.1561/2200000073

Gabriel Peyre and Marco Cuturi. Computational optimal transport: With applications to data science.Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019. doi: 10.1561/2200000073. URL https://doi.org/10.1561/2200000073

work page doi:10.1561/2200000073 2019
[30]

Unbalanced optimal transport: Dynamic and kantorovich formulations.Journal of Functional Analysis, 274 (11):3090–3123, 2018

Lenaic Chizat, Gabriel Peyre, Bernhard Schmitzer, and Francois-Xavier Vialard. Unbalanced optimal transport: Dynamic and kantorovich formulations.Journal of Functional Analysis, 274 (11):3090–3123, 2018. doi: 10.1016/j.jfa.2018.03.008. URL https://doi.org/10.1016/j.jfa.2018.0 3.008

work page doi:10.1016/j.jfa.2018.03.008 2018
[31]

Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport, 2025

Zhenyi Zhang, Tiejun Li, and Peijie Zhou. Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport, 2025. URL https://arxiv.org/abs/2410.00844

arXiv 2025
[32]

Phase-field Approaches to Structural Topology Optimization

Shun-ichi Amari.Information Geometry and Its Applications. Springer, 2016. doi: 10.1007/978- 4-431-55978-8. URL https://doi.org/10.1007/978-4-431-55978-8

work page doi:10.1007/978- 2016
[33]

R. E. Kalman. On the general theory of control systems. InProceedings of the First International Congress on Automatic Control, 1960. doi: 10.1016/S1474-6670(17)70094-8. URL https: //doi.org/10.1016/S1474-6670(17)70094-8. 23

work page doi:10.1016/s1474-6670(17)70094-8 1960
[34]

Jan C. Willems. Dissipative dynamical systems part i: General theory.Archive for Rational Mechanics and Analysis, 45:321–351, 1972. doi: 10.1007/BF01665402. URL https://doi.org/10 .1007/BF01665402

work page doi:10.1007/bf01665402 1972
[35]

Time, structure, and fluctuations.Science, 201(4358):777–785, 1978

Ilya Prigogine. Time, structure, and fluctuations.Science, 201(4358):777–785, 1978. doi: 10.1126/science.201.4358.777. URL https://doi.org/10.1126/science.201.4358.777

work page doi:10.1126/science.201.4358.777 1978
[36]

Data-driven optimal control of unknown nonlinear dynamical systems using the koopman operator

Zhexuan Zeng, Ruikun Zhou, Yiming Meng, and Jun Liu. Data-driven optimal control of unknown nonlinear dynamical systems using the koopman operator. InProceedings of the 7th Annual Learning for Dynamics & Control Conference, pages 1127–1139, 2025. URL https://proceedings.mlr.press/v283/zeng25a.html

2025
[37]

Transition-path theory and path-finding algorithms for the study of rare events.Annual Review of Physical Chemistry, 61:391–420, 2010

Weinan E and Eric Vanden-Eijnden. Transition-path theory and path-finding algorithms for the study of rare events.Annual Review of Physical Chemistry, 61:391–420, 2010. doi: 10.1146/annurev.physchem.040808.090412. URL https://doi.org/10.1146/annurev.physchem.0 40808.090412

work page doi:10.1146/annurev.physchem.040808.090412 2010
[38]

Toolformer: Language models can teach themselves to use tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. InAdvances in Neural Information Processing Systems, 2023. URL https://arxiv.org/abs/2302.04761

Pith/arXiv arXiv 2023
[39]

Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents

Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. InInternational Conference on Learning Representations,
[40]

URL https://arxiv.org/abs/2410.02644

Pith/arXiv arXiv
[41]

Wiley-Interscience, New York, 1977

Gregoire Nicolis and Ilya Prigogine.Self-Organization in Nonequilibrium Systems: From Dissipative Structures to Order Through Fluctuations. Wiley-Interscience, New York, 1977. ISBN 0471024015

1977
[42]

M. C. Cross and P. C. Hohenberg. Pattern formation outside of equilibrium.Reviews of Modern Physics, 65(3):851–1112, 1993. doi: 10.1103/RevModPhys.65.851. URL https: //doi.org/10.1103/RevModPhys.65.851

work page doi:10.1103/revmodphys.65.851 1993
[43]

Reports on Progress in Physics75(12), 126001 (2012) https://doi.org/ 10.1088/0034-4885/75/12/126001

Udo Seifert. Stochastic thermodynamics, fluctuation theorems and molecular machines.Reports on Progress in Physics, 75(12):126001, 2012. doi: 10.1088/0034-4885/75/12/126001. URL https://doi.org/10.1088/0034-4885/75/12/126001

work page doi:10.1088/0034-4885/75/12/126001 2012
[44]

Ryogo Kubo. Statistical-mechanical theory of irreversible processes. I. General theory and simple applications to magnetic and conduction problems. Journal of the Physical Society of Japan, 12(6):570–586, 1957. doi: 10.1143/JPSJ.12.570. URL https://doi.org/10.1143/JPSJ.12.570

[45]

Hermann Haken. Synergetics: An Introduction. Springer, 3rd edition, 1983. doi: 10.1007/978-3-642-88338-5. URL https://doi.org/10.1007/978-3-642-88338-5

[46]

R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960. doi: 10.1115/1.3662552. URL https://doi.org/10.1115/1.3662552

[47]

Karl J. Åström and Richard M. Murray. Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press, 2nd edition, 2021. ISBN 9780691193984. URL https://press.princeton.edu/books/hardcover/9780691193984/feedback-systems

[48]

Mikail Khona and Ila R. Fiete. Attractor and integrator networks in the brain. Nature Reviews Neuroscience, 23(12):744–766, 2022. doi: 10.1038/s41583-022-00642-0. URL https://doi.org/10.1038/s41583-022-00642-0

[49]

Matthew G. Perich, Divya Narain, and Juan A. Gallego. A neural manifold view of the brain. Nature Neuroscience, 28:1582–1597, 2025. doi: 10.1038/s41593-025-02031-z. URL https://doi.org/10.1038/s41593-025-02031-z

[50]

Mark M. Churchland, John P. Cunningham, Matthew T. Kaufman, Justin D. Foster, Paul Nuyujukian, Stephen I. Ryu, and Krishna V. Shenoy. Neural population dynamics during reaching. Nature, 487(7405):51–56, 2012. doi: 10.1038/nature11129. URL https://doi.org/10.1038/nature11129

[51]

Fred Zhang and Neel Nanda. Towards best practices of activation patching in language models: Metrics and methods. In International Conference on Learning Representations, 2024. URL https://arxiv.org/abs/2309.16042

[52]

Lars Onsager. Reciprocal relations in irreversible processes. I. Physical Review, 37(4):405–426. doi: 10.1103/PhysRev.37.405. URL https://doi.org/10.1103/PhysRev.37.405

[54]

Alex J. Yuffa and John A. Scales. Linear response laws and causality in electrodynamics. European Journal of Physics, 33(6):1635–1650, 2012. doi: 10.1088/0143-0807/33/6/1635. URL https://doi.org/10.1088/0143-0807/33/6/1635

[55]

David Ruelle. A review of linear response theory for general differentiable dynamical systems. Nonlinearity, 22(4):855–870, 2009. doi: 10.1088/0951-7715/22/4/009. URL https://doi.org/10.1088/0951-7715/22/4/009

[56]

Bernard D. Coleman and Walter Noll. Foundations of linear viscoelasticity. Reviews of Modern Physics, 33(2):239–249, 1961. doi: 10.1103/RevModPhys.33.239. URL https://doi.org/10.1103/RevModPhys.33.239

[57]

Gemma Team. Gemma 3 technical report, 2025. URL https://arxiv.org/abs/2503.19786

[58]

Marah Abdin et al. Phi-4-mini technical report: Compact yet powerful multimodal language models via mixture-of-LoRAs, 2025. URL https://arxiv.org/abs/2503.01743

[59]

Aaron Grattafiori et al. The Llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783

[60]

An Yang et al. Qwen3 technical report, 2025. URL https://arxiv.org/abs/2505.09388

[61]

National Institute of Standards and Technology. Artificial intelligence risk management framework (AI RMF 1.0). NIST AI 100-1, National Institute of Standards and Technology. URL https://doi.org/10.6028/NIST.AI.100-1

[63]

Nuo Li, Kayvon Daie, Karel Svoboda, and Shaul Druckmann. Robust neuronal dynamics in premotor cortex during motor planning. Nature, 532:459–464, 2016. doi: 10.1038/nature17643. URL https://doi.org/10.1038/nature17643

[64]

Karel Svoboda and Hidehiko Inagaki. Discrete attractor dynamics underlies persistent activity in the frontal cortex, 2019. URL https://janelia.figshare.com/articles/dataset/Discrete_attractor_dynamics_underlies_persistent_activity_in_the_frontal_cortex/7489253. Janelia Research Campus dataset

[65]

Nuo Li and Guang Chen. Dataset (matlab format) from chen kang et al. (2021) modularity and robustness of frontal cortex networks, 2022. URL https://doi.org/10.5281/zenodo.6713616. Zenodo dataset

[66]

Nuo Li and Weiguo Yang. Dataset (matlab format) from yang et al. (2022) thalamus-driven functional populations in frontal cortex support decision-making, 2022. URL https://doi.org/10.5281/zenodo.6846161. Zenodo dataset

[67]

Anne Churchland, Xiaonan Sun, and Simon Musall. Data supporting “pyramidal cell types drive functionally distinct cortical activity patterns during decision-making”, 2023. URL https://plus.figshare.com/articles/dataset/Data_supporting_Pyramidal_cell_types_drive_functionally_distinct_cortical_activity_patterns_during_decision-making_/21538458. Figshare+ dataset

[68]

The International Brain Laboratory. Standardized and reproducible measurement of decision-making in mice. eLife, 10:e63711, 2021. doi: 10.7554/elife.63711. URL https://elifesciences.org/articles/63711

[69]

The International Brain Laboratory. Reproducibility of in vivo electrophysiological measurements in mice. eLife, 13:RP100840, 2025. doi: 10.7554/elife.100840.3. URL https://elifesciences.org/articles/100840

[70]

Francesco Randi, Anuj Sharma, Sophie Dvali, and Andrew M. Leifer. Neural signal propagation atlas of Caenorhabditis elegans, 2024. URL https://dandiarchive.org/dandiset/001075/0.240930.1859. DANDI archive dataset

[71]

Martin Haesemeyer and Kaarthik Balakrishnan. Thermoregulatory responses forebrain, 2023. URL https://dandiarchive.org/dandiset/000235/0.230316.1600. DANDI archive dataset

[72]

Martin Haesemeyer and Kaarthik Balakrishnan. Thermoregulatory responses midbrain, 2023. URL https://dandiarchive.org/dandiset/000236/0.230316.2031. DANDI archive dataset

[73]

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models, 2021. URL https://arxiv.org/abs/2106.09685

[74]

Alan Chan, Ranay Padarath, Joe Kwon, Hilary Greaves, and Markus Anderljung. Measuring AI R&D automation, 2026. URL https://arxiv.org/abs/2603.03992

[75]

Ruta Binkyte, Sharif Abuaddba, Chamikara Mahawaga, Ming Ding, Natasha Fernandes, and Mario Fritz. Inspectable AI for science: A research object approach to generative AI governance. URL https://arxiv.org/abs/2604.11261

[77]

Zhehao Zhang, Weijie Xu, Fanyou Wu, and Chandan K. Reddy. Falsereject: A resource for improving contextual safety and mitigating over-refusals in LLMs via structured reasoning. URL https://arxiv.org/abs/2505.08054

[79]

Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, and Xingxing Wei. DeceptionBench: A comprehensive benchmark for AI deception behaviors in real-world scenarios. URL https://arxiv.org/abs/2510.15501
