From monoliths to modules: Decomposing transducers for efficient world modelling

Alexander Boyd; David Hyland; Fernando E. Rosas; Franz Nowak; Manuel Baltieri

arxiv: 2512.02193 · v2 · pith:WZDD6CJCnew · submitted 2025-12-01 · 💻 cs.AI

Pith reviewed 2026-05-22 11:49 UTC · model grok-4.3

classification 💻 cs.AI

keywords: transducer decomposition · world models · modular modeling · POMDPs · distributed inference · AI safety · efficient world modeling

The pith

Decomposing transducers into sub-transducers on distinct input-output subspaces yields parallel and interpretable world models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a framework for breaking down complex transducer-based world models into smaller sub-transducers that each handle separate input and output subspaces. This inverts the standard way transducers are combined, producing modular pieces instead of one large model. A reader would care because world models demand heavy computation for training AI agents, and modular versions could run in parallel while remaining easier to inspect. The work connects efficiency needs for real-world use with the transparency required for AI safety.

Core claim

Our results clarify how to invert this process by deriving sub-transducers operating on distinct input-output subspaces, enabling parallelizable and interpretable alternatives to monolithic world modelling that can support distributed inference.

What carries the argument

A decomposition framework for transducers (a class of models that generalizes POMDPs) that inverts composition to produce sub-transducers on distinct input-output subspaces.

If this is right

World model computation becomes parallelizable across subspaces.
Interpretability of the model increases for inspection and safety analysis.
Distributed inference across separate components becomes feasible.
Efficiency gains support scaling to realistic, high-demand scenarios.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This approach may allow larger world models to run on distributed hardware without full centralization.
It could connect to modular designs in other sequential prediction tasks beyond world modeling.
Empirical tests on agent training loops would show whether speedups appear in practice.

Load-bearing premise

Real-world scenarios tend to involve subcomponents that interact in a modular manner.

What would settle it

A counter-example transducer built from a non-modular real-world process where the derived sub-transducers fail to match the original input-output behavior on their subspaces.

Figures

Figures reproduced from arXiv: 2512.02193 by Alexander Boyd, David Hyland, Fernando E. Rosas, Franz Nowak, Manuel Baltieri.

**Figure 1.** In this work, we first present a method for composing stochastic environments into larger ones. We then use this framework to identify procedures that reverse the process, decomposing complex environments into simpler, modular subcomponents.

**Figure 2.** A general interface and two illustrations of a causal interface: An unravelled general interface (a) takes a semi-infinite sequence of inputs X0, X1, X2, … and stochastically transforms it to a semi-infinite sequence of outputs Y0, Y1, Y2, …, without any constraints on dependencies between inputs and outputs. An unravelled causal interface (b) shows individual inputs Xt and outputs Yt unravelled int…

**Figure 3.** A transducer is a general model that transforms an input process X to an output process Y using a latent process R as memory, which can be used to generate interfaces. The network representation draws an arrow from the input process X, with unknown source, to the output process Y with latent variable R. The circuit element representation takes two inputs (Xt and Rt) to two outputs (Yt and Rt+1)…

**Figure 4.** The network in the top left shows the most general way of composing two transducers, T with latent states R, inputs X, and outputs Y, and U with latent states S, inputs XY, and outputs Z. A circuit element that implements this composite transducer is shown on the top right, with time proceeding from left to right. The composite transducer V, when applied in sequence at the bottom, produces the interface…

**Figure 5.** Lattice of sub-classes of transducer composition, ordered according to the number of restrictions they consider. Pruning edges from left to right corresponds to limiting dependencies on the inputs; these limitations are enumerated in conditions 1 through 5. The arrows with dotted lines are labelled with the number of the condition that is necessary to prune each edge. In this lattice, we highlight three n…

**Figure 6.** A network of N + 1 transducers can be expressed as a single transducer (left), or as a compositional network (middle). In this fully connected composition, we see that the nth transducer, with random variables X(n) and R(n), depends on all prior outputs X(0:n) = {X(0), …, X(n − 1)}. This composition allows us to examine sparse networks (right), where dependencies are pruned between elements. As shown in …

**Figure 7.** We can shift between two equivalent viewpoints: one which is conditional on inputs X, and one which addresses the joint process. … by piping the output of the first to the input of the second, while the output of the second becomes the output of the resulting composed transducer (Mohri et al., 2002). Note that this is an extension of the classic algorithm for intersecting finite state automata (FSTs without…

**Figure 8.** Two illustrations of a transducer: An unravelled transducer (a) shows individual inputs Xt, outputs Yt, and latent states Rt unravelled into a semi-infinite sequence. The same object can be condensed (b) into a mapping from input pasts ←Xt, input futures →Xt, and latent state pasts ←Rt to output pasts ←Yt, output futures →Yt, and latent state futures →Rt via the present latent state Rt. conditional …

**Figure 9.** By finding the smallest sets of observables O and latents O′ such that the remaining observables J − O function as the input to a transducer with output X(O) and latents R(O′), we can decompose the transducer network. This provides a factorization of the overall transducer X(J) R(J′). Algorithm 1: Decomposition via Intransducibility. P := Pr(X(J), R(J′)); Robs := J; Rlat := J′; M := ∅; while (…

**Figure 10.** The total transducer network can be simplified from the bottom by conditioning on upstream observables, and from the top by marginalizing downstream observables. 5.1.1 Simplifying from the top: If our objective is to describe the system only up to X(b−1), then all nodes X(b:N)/R(b:N) lie strictly downstream of the region of interest. Their influence on earlier nodes occurs solely through the observable…

**Figure 11.** A perception-action loop (top) operates through the exchange of actions…
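The Figure 7 caption describes series composition as piping the output of the first transducer into the input of the second. Below is a minimal NumPy sketch of that operation under stated assumptions, not the paper's implementation. It assumes finite alphabets, a kernel stored as an array K[x, y, r, r′] = Pr(Y=y, R′=r′ | X=x, R=r), and independent initial latent distributions for the two transducers. The helper names `random_kernel`, `interface_prob` and `serial_compose` are hypothetical. The composite latent space is R × S, and the intermediate symbol y is summed out. The final check confirms that the composite interface equals the sum over intermediate sequences of P_U(z|y) P_T(y|x).

```python
import itertools

import numpy as np


def random_kernel(rng, n_in, n_out, n_lat):
    """Hypothetical helper: random kernel K[x, y, r, r2] = Pr(Y=y, R'=r2 | X=x, R=r)."""
    k = rng.random((n_in, n_out, n_lat, n_lat))
    return k / k.sum(axis=(1, 3), keepdims=True)  # normalise over (y, r2) for each (x, r)


def interface_prob(kernel, init, xs, ys):
    """Pr(Y_{0:t} = ys | X_{0:t} = xs), summing over all latent trajectories."""
    v = init.copy()
    for x, y in zip(xs, ys):
        v = v @ kernel[x, y]  # v'[r2] = sum_r v[r] * K[x, y, r, r2]
    return v.sum()


def serial_compose(T, U):
    """V[x, z, (r, s), (r2, s2)] = sum_y T[x, y, r, r2] * U[y, z, s, s2]."""
    n_x, _, n_r, _ = T.shape
    _, n_z, n_s, _ = U.shape
    V = np.einsum("xyab,yzcd->xzacbd", T, U)  # the intermediate symbol y is summed out
    return V.reshape(n_x, n_z, n_r * n_s, n_r * n_s)


rng = np.random.default_rng(0)
n_x, n_y, n_z, n_r, n_s = 2, 2, 3, 2, 3
T = random_kernel(rng, n_x, n_y, n_r)  # first transducer: x -> y
U = random_kernel(rng, n_y, n_z, n_s)  # second transducer: y -> z
pi_T, pi_U = np.full(n_r, 1 / n_r), np.full(n_s, 1 / n_s)
V, pi_V = serial_compose(T, U), np.kron(pi_T, pi_U)  # latent space R x S

xs, zs = (0, 1, 1), (2, 0, 1)
lhs = interface_prob(V, pi_V, xs, zs)
rhs = sum(
    interface_prob(T, pi_T, xs, ys) * interface_prob(U, pi_U, ys, zs)
    for ys in itertools.product(range(n_y), repeat=len(xs))
)
print(np.isclose(lhs, rhs))  # True
```

The `np.kron` ordering of the initial distribution matches the `reshape` ordering of the composite latent index (r * n_s + s). That match is what lets the two pieces line up.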

Original abstract

World models have been recently proposed as sandbox environments in which AI agents can be trained and evaluated before deployment. While realistic world models often have high computational demands, this can often be alleviated by exploiting the fact that real-world scenarios tend to involve subcomponents that interact in a modular manner. In this paper, we explore this idea by developing a framework for decomposing complex world models represented by transducers, a class of models generalising POMDPs. Whereas the composition of transducers is well understood, our results clarify how to invert this process by deriving sub-transducers operating on distinct input-output subspaces, enabling parallelizable and interpretable alternatives to monolithic world modelling that can support distributed inference. Overall, these results lay groundwork for bridging the computational efficiency required for real-world inference and the structural transparency demanded by AI safety.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sketches an inversion of transducer composition to get modular sub-models on separate subspaces, but the conditions needed for the split to preserve original dynamics are not laid out.

The main takeaway is that this work develops a method for decomposing transducers used as world models into smaller sub-transducers that run on distinct input and output subspaces. The aim is more efficient, parallel, and interpretable modeling than handling everything in one large model.

On the positive side, the paper builds on known ways of composing transducers and turns attention to the reverse direction. By deriving these sub-transducers, it offers a path toward modular world models that could support distributed computation. This aligns well with the needs of AI agent training, where high computational costs are a barrier. It also adds a layer of structural transparency that could help with safety considerations. The abstract connects the technical step to these practical benefits without overclaiming.

The weaker area is the guarantees. The decomposition relies on the original transducer's dynamics factoring over the chosen subspaces. If there are cross-dependencies in how states update or outputs are produced, the sub-transducers might not fully capture the behavior when recombined. The provided description does not detail the conditions on the transition functions or state space under which the inversion is exact. That makes it difficult to assess how broadly the approach applies to realistic, entangled scenarios. Concrete examples or a formal statement of the requirements would strengthen the case.

Overall, this paper targets readers working at the intersection of formal methods, reinforcement learning, and AI safety. Those with a background in automata theory or POMDPs will find the framework accessible and potentially useful for designing better world models. It engages clearly with the problem of scaling world models and offers a specific technical direction. It therefore merits a serious look from referees who can check the derivations and test the limits. I would recommend putting it through peer review to get feedback on the algebraic foundations and any empirical validation.

Referee Report

2 major / 2 minor

Summary. The paper develops a framework for decomposing transducers (generalizing POMDPs) that represent world models. It inverts the well-understood composition operation to derive sub-transducers operating on distinct input-output subspaces, with the goal of producing parallelizable and interpretable alternatives to monolithic models that support distributed inference and address both computational efficiency and AI safety requirements.

Significance. If the decomposition is shown to preserve dynamics under stated conditions, the work would provide a principled algebraic route to modular world models. This could reduce inference costs via parallelism while improving structural transparency, directly supporting scalable agent training and safety analysis in realistic environments.

major comments (2)

§4 (Inversion of Composition): The central derivation of sub-transducers from the composition inverse is presented, but the manuscript does not explicitly state the algebraic conditions on the transition relation or state space that guarantee the chosen input-output subspaces admit a factorization. Without these conditions, the projection step risks discarding cross-subspace dependencies, so that recomposition of the sub-transducers fails to recover the original dynamics (precisely the concern raised by the stress-test note).
§5 (Validation): The reported experiments and examples do not include a controlled case in which the monolithic transducer contains non-factorizable cross-subspace interactions. Such a test is required to establish the scope of the method and to confirm that the derived sub-transducers remain equivalent when the factorization assumption holds.

minor comments (2)

Notation for the input-output subspaces and the projection operators could be introduced earlier and used consistently to improve readability of the derivation.
§2: A short related-work paragraph situating the transducer decomposition relative to existing factored POMDP or modular RL literature would help readers assess novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed report. We address each major comment below. The concerns are valid, and we have revised the manuscript to incorporate explicit conditions and an additional validation case.

Referee: §4 (Inversion of Composition): The central derivation of sub-transducers from the composition inverse is presented, but the manuscript does not explicitly state the algebraic conditions on the transition relation or state space that guarantee the chosen input-output subspaces admit a factorization. Without these conditions, the projection step risks discarding cross-subspace dependencies, so that recomposition of the sub-transducers fails to recover the original dynamics (precisely the concern raised by the stress-test note).

Authors: We agree that the algebraic conditions must be stated explicitly. The derivation in §4 assumes a product state space S = S1 × S2 and a separable transition relation δ((s1, s2), (i1, i2)) = (δ1(s1, i1), δ2(s2, i2)) with no cross terms. We have added a formal statement of this precondition together with a short proof that, under separability, projection yields sub-transducers whose composition recovers the original transducer exactly. This makes clear that cross-subspace dependencies are excluded by the assumption rather than discarded after the fact. revision: yes
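The separability precondition stated in this response can be checked mechanically on small finite examples. Below is a minimal Python sketch under stated assumptions. It assumes a deterministic transition map on finite sets, matching the δ written above. The paper's transducers are stochastic, so this only illustrates the algebraic condition. The component sets and the helpers `tabulate`, `project` and `recompose` are hypothetical.

Projection probes each component while holding the other at a fixed reference value. Under separability, recomposing the projections recovers the joint map exactly. In the coupled rule, the first component also reads s2, so recomposition fails.

```python
from itertools import product

# Hypothetical finite components: S1 = Z_2 driven by I1, S2 = Z_3 driven by I2.
S1, S2, I1, I2 = [0, 1], [0, 1, 2], [0, 1], [0, 1]


def tabulate(f):
    """Joint transition table delta[(s1, s2), (i1, i2)] -> (s1', s2')."""
    return {((s1, s2), (i1, i2)): f(s1, s2, i1, i2)
            for s1, s2, i1, i2 in product(S1, S2, I1, I2)}


def project(delta):
    """Read candidate delta1, delta2 off the joint table, fixing the other component."""
    d1 = {(s1, i1): delta[(s1, S2[0]), (i1, I2[0])][0] for s1 in S1 for i1 in I1}
    d2 = {(s2, i2): delta[(S1[0], s2), (I1[0], i2)][1] for s2 in S2 for i2 in I2}
    return d1, d2


def recompose(d1, d2):
    """Product transition (delta1(s1, i1), delta2(s2, i2)) on S1 x S2."""
    return {((s1, s2), (i1, i2)): (d1[s1, i1], d2[s2, i2])
            for (s1, i1) in d1 for (s2, i2) in d2}


separable = tabulate(lambda s1, s2, i1, i2: ((s1 + i1) % 2, (s2 + i2) % 3))
coupled = tabulate(lambda s1, s2, i1, i2: ((s1 + i1 + s2) % 2, (s2 + i2) % 3))

for name, delta in [("separable", separable), ("coupled", coupled)]:
    print(name, recompose(*project(delta)) == delta)
# separable True
# coupled False
```

The projection itself never fails. Only the recomposition test reveals coupling, which is why a controlled non-separable case is informative.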
Referee: §5 (Validation): The reported experiments and examples do not include a controlled case in which the monolithic transducer contains non-factorizable cross-subspace interactions. Such a test is required to establish the scope of the method and to confirm that the derived sub-transducers remain equivalent when the factorization assumption holds.

Authors: The referee correctly identifies a gap in the validation. The existing examples assume modular structure by construction. We have added a controlled counter-example in the revised §5: a transducer whose transition couples the two subspaces (a simple joint-update rule). As predicted, the projected sub-transducers fail to recompose to the original dynamics. We report both the successful modular cases and this negative result to delineate the precise scope of the method. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on algebraic inversion of transducer composition

The paper develops a mathematical framework for inverting transducer composition to obtain sub-transducers on distinct I/O subspaces. This is presented as a direct derivation from the well-understood composition operation, without evidence of self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claim to its own inputs. The modular interaction assumption is explicitly motivational rather than part of the derivation chain. No equations or steps in the abstract reduce by construction to prior results from the same authors; the work appears self-contained against external algebraic benchmarks for transducers and POMDPs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the domain assumption that real-world scenarios are modular; no free parameters, invented entities, or additional axioms are mentioned in the abstract.

axioms (1)

Domain assumption: real-world scenarios tend to involve subcomponents that interact in a modular manner.
Invoked in the abstract to motivate the decomposition approach and its efficiency benefits.

pith-pipeline@v0.9.0 · 5670 in / 1168 out tokens · 24638 ms · 2026-05-22T11:49:45.425150+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

We explore this idea by developing a framework for decomposing complex world models represented by transducers... deriving sub-transducers operating on distinct input-output subspaces
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

transducers... stochastic kernel $T^{(y|x)}_{r \to r'}$ ... linear operator $\hat{T}^{(y|x)} = \sum_{r, r'} T^{(y|x)}_{r \to r'} \, |r'\rangle\langle r|$
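The quoted passage turns the transducer's stochastic kernel into a linear operator on latent states. Below is a minimal NumPy sketch under stated assumptions. It assumes finite alphabets and the kernel stored as K[x, y, r, r′] = T^{(y|x)}_{r→r′}, so each operator matrix has entry [r′, r]. It also assumes sequence probabilities are obtained by applying the operators to an initial latent distribution |π⟩ and contracting with the all-ones covector ⟨1|. That is the standard operator-product convention, assumed here rather than confirmed by this page. The helpers `operators` and `sequence_prob` are hypothetical.

```python
import itertools

import numpy as np


def operators(kernel):
    """Matrices of T_hat^(y|x) with entry [r2, r] = T^(y|x)_{r -> r2}, i.e. sum T |r2><r|."""
    return np.swapaxes(kernel, 2, 3)  # kernel[x, y, r, r2] -> ops[x, y, r2, r]


def sequence_prob(ops, init, xs, ys):
    """<1| T_hat^(y_{t-1}|x_{t-1}) ... T_hat^(y_0|x_0) |pi>."""
    state = init.copy()  # |pi>: distribution over latent states
    for x, y in zip(xs, ys):
        state = ops[x, y] @ state  # apply T_hat^(y|x)
    return state.sum()  # contract with the all-ones covector <1|


rng = np.random.default_rng(1)
n_x, n_y, n_r = 2, 3, 4
K = rng.random((n_x, n_y, n_r, n_r))
K /= K.sum(axis=(1, 3), keepdims=True)  # normalise over (y, r2) for each (x, r)
ops = operators(K)
pi = np.full(n_r, 1 / n_r)

xs = (0, 1, 0)
total = sum(sequence_prob(ops, pi, xs, ys)
            for ys in itertools.product(range(n_y), repeat=len(xs)))
print(np.isclose(total, 1.0))  # True: output sequences are exhaustive for fixed inputs
```

Normalising the kernel over (y, r′) makes ⟨1| Σ_y T̂^{(y|x)} = ⟨1| for every x. That identity is why the probabilities of all output sequences sum to one.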

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 3 internal anchors

[1] Junyeob Baek, Yi-Fu Wu, Gautam Singh, and Sungjin Ahn. Dreamweaver: Learning compositional world representations from pixels. arXiv preprint arXiv:2501.14174.

[2] Yoshua Bengio, Sören Mindermann, Daniel Privitera, Tamay Besiroglu, Rishi Bommasani, Stephen Casper, Yejin Choi, Danielle Goldfarb, Hoda Heidari, Leila Khalatbari, et al. International scientific report on the safety of advanced AI (interim report). arXiv preprint arXiv:2412.05282.

[3] Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.

[4] Alexander B Boyd, James P Crutchfield, Mile Gu, and Felix C Binder. Thermodynamic overfitting and generalization: Energetic limits on predictive complexity. arXiv preprint arXiv:2402.16995.

[5] Gunnar Carlsson and Jun Yu. A prime decomposition of probabilistic automata. arXiv preprint arXiv:1503.01502.

[6] David Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, et al. Towards guaranteed safe AI: A framework for ensuring robust and reliable AI systems. arXiv preprint arXiv:2405.06624.

[7] Lukas J Fiderer, Paul C Barth, Isaac D Smith, and Hans J Briegel. The work capacity of channels with memory: Maximum extractable work in percept-action loops. arXiv preprint arXiv:2504.06209.

[8] Abel Jansma. Decomposing interventional causality into synergistic, redundant, and unique components. arXiv preprint arXiv:2501.11447.

[9] Mateusz Piotrowski, Paul M Riechers, Daniel Filan, and Adam S Shai. Constrained belief updates explain geometric structures in transformer representations. arXiv preprint arXiv:2502.01954.

[10] Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine. Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. arXiv preprint arXiv:1709.10087.

[11] Mike Reape and Henry Thompson. Parallel intersection and serial composition of finite state transducers. In Coling Budapest 1988 Volume 2: International Conference on Computational Linguistics.

[12] Fernando E Rosas, Bernhard C Geiger, Andrea I Luppi, Anil K Seth, Daniel Polani, Michael Gastpar, and Pedro AM Mediano. Software in the natural world: A computational approach to hierarchical emergence. arXiv preprint arXiv:2402.09090.

[13] Marcel P Schützenberger. A remark on finite transducers. Mathematical Sciences Directorate, Air Force Office of Scientific Research, 1961a. M. P. Schützenberger. A remark on finite transducers. Information and Control, 4(2):185–196, 1961b. ISSN 0019-9958. DOI: https://doi.org/10.1016/S0019-9958(61)80006-5. Adam Shai, Lucas Teixeira, Alexander Oldenziel, Sara...

[14] DOI: 10.1038/s41586-025-09805-2. Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, et al. Prioritizing safeguarding over autonomy: Risks of LLM agents for science. In ICLR 2024 Workshop on Large Language Model (LLM) Agents.

[15] Supplementary Materials. The following content was not necessarily subject to peer review. A Feedback interfaces. In the context of a perception-action loop linking an agent and an environment, the environment can be thought of as a system that stochastically turns action sequences $x_{0:t} = x_0 \cdots x_{t-1}$ into observation sequences $y_{0:t} = y_0 \cdots y_{t-1}$ for ...

[16] $I_E(y_{0:t} \mid x_{0:t}) = \Pr(Y_{0:t} = y_{0:t} \mid X_{0:t} = x_{0:t})$, $I_A(x_{0:t} \mid y_{0:t}) = \Pr(X_{0:t} = x_{0:t} \mid Y_{0:t} = y_{0:t})$. (23) Here, the capital variables represent random variables while the lowercase represent specific realizations. Together, they produce the joint probability of an action-observation sequence in the perception-action loop (Fiderer et al., 2025): Pr(X_{0:t} = x_{0:t}, Y_{0:...

[17] … $\prod_{i=0}^{t-1} e(y_i, r_{i+1} \mid x_i, r_i)$ (29) $= \Pr(X_{0:t} = x_{0:t} \mid Y_{0:t} = y_{0:t}) \, \Pr(Y_{0:t} = y_{0:t} \mid X_{0:t} = x_{0:t})$ (30) $= I_A(x_{0:t} \mid y_{0:t}) \, I_E(y_{0:t} \mid x_{0:t})$. (31) The interface characterizes the behavior of the agent or environment, independent of the details of their internal models or other latents. Figure 11 shows how the perception-action loop can be decomposed into distinct inte...

[18] There are two main types of composition of weighted finite state transducers (WFSTs): in parallel and in series (Mohri, 1997; Mohri et al., 2002). Definition 8 (Parallel composition). Let $T = (X, Y, R, T^{(y|x)}_{r \to r'})$ and $U = (Z, W, S, U^{(w|z)}_{s \to s'})$ be transducers. The parallel composition of $T$ and $U$ is a new transducer $V = (X \times Z, Y \times W, R \times S, V^{(yw|xz)}_{rs \to r's'})$ with input...
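Entry [18] carries a truncated fragment of the paper's Definition 8 (parallel composition). Below is a minimal NumPy sketch under stated assumptions, not the paper's definition. Because the definition is cut off here, it assumes the composite kernel is the product T^{(y|x)}_{r→r′} U^{(w|z)}_{s→s′}. It also assumes both latents start from independent distributions and that kernels are stored as arrays K[x, y, r, r′] = Pr(Y=y, R′=r′ | X=x, R=r). The helper names are hypothetical. The sketch checks two things. The composite interface factorizes into the two component interfaces, and marginalising U's outputs recovers T's interface. That is the sense in which a parallel composite can be split back onto its input-output subspaces.

```python
import itertools

import numpy as np


def random_kernel(rng, n_in, n_out, n_lat):
    """Hypothetical helper: random kernel K[x, y, r, r2] = Pr(Y=y, R'=r2 | X=x, R=r)."""
    k = rng.random((n_in, n_out, n_lat, n_lat))
    return k / k.sum(axis=(1, 3), keepdims=True)


def interface_prob(kernel, init, xs, ys):
    """Pr(Y_{0:t} = ys | X_{0:t} = xs), summing over latent trajectories."""
    v = init.copy()
    for x, y in zip(xs, ys):
        v = v @ kernel[x, y]
    return v.sum()


def parallel_compose(T, U):
    """V[(x,z), (y,w), (r,s), (r2,s2)] = T[x, y, r, r2] * U[z, w, s, s2] (assumed form)."""
    n_x, n_y, n_r, _ = T.shape
    n_z, n_w, n_s, _ = U.shape
    V = np.einsum("xyab,zwcd->xzywacbd", T, U)
    return V.reshape(n_x * n_z, n_y * n_w, n_r * n_s, n_r * n_s)


def pair(a_seq, b_seq, n_b):
    """Flatten paired symbols (a, b) to the index a * n_b + b used by reshape."""
    return [a * n_b + b for a, b in zip(a_seq, b_seq)]


rng = np.random.default_rng(2)
n_z, n_w = 3, 2
T, U = random_kernel(rng, 2, 2, 2), random_kernel(rng, n_z, n_w, 3)
pi_T, pi_U = np.full(2, 1 / 2), np.full(3, 1 / 3)
V, pi_V = parallel_compose(T, U), np.kron(pi_T, pi_U)

xs, ys, zs, ws = (0, 1, 1), (1, 0, 1), (2, 0, 1), (0, 0, 1)
joint = interface_prob(V, pi_V, pair(xs, zs, n_z), pair(ys, ws, n_w))
factored = interface_prob(T, pi_T, xs, ys) * interface_prob(U, pi_U, zs, ws)
print(np.isclose(joint, factored))  # True

# Marginalising U's outputs recovers T's interface on its own subspace.
marginal = sum(interface_prob(V, pi_V, pair(xs, zs, n_z), pair(ys, w2, n_w))
               for w2 in itertools.product(range(n_w), repeat=len(ys)))
print(np.isclose(marginal, interface_prob(T, pi_T, xs, ys)))  # True
```

The factorization holds because a product initial distribution stays a product under a product kernel. Any coupling term in the joint kernel would break both checks.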
