Preisach Attention: A Hysteretic Model of Sequential Memory

Piotr Frydrych

arxiv: 2605.23603 · v1 · pith:QDXC2BDCnew · submitted 2026-05-22 · 💻 cs.LG · cond-mat.dis-nn· cs.AI· cs.NE

Preisach Attention: A Hysteretic Model of Sequential Memory

Piotr Frydrych This is my paper

Pith reviewed 2026-05-25 04:40 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.dis-nncs.AIcs.NE

keywords preisach hysteresisattention layerturing completenessrate independenceextrema stacksequence modelingpushdown automatontransformer architecture

0 comments

The pith

A hysteretic attention layer using binary relays and an extrema stack achieves Turing completeness in constant depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the Preisach Attention Layer as a replacement for softmax attention, where each unit applies a binary relay operator with learned activation and deactivation thresholds. This operator tracks only the sequence of local extrema in the input history rather than all tokens or their positions. The resulting architecture allows a single-layer model to simulate a two-stack pushdown automaton, establishing Turing completeness under arbitrary-precision arithmetic. Standard transformers require logarithmic depth for the same capability. The model also separates from transformers by computing certain historical range statistics in constant layers while lacking efficient random-access retrieval.

Core claim

The Preisach Attention Layer replaces softmax with binary relay operators that maintain a stack of local extrema as internal state. This construction yields a single-layer PAL-Transformer that is Turing-complete by direct simulation of a two-stack pushdown automaton. The same layer computes rate-independent functionals, such as historical range statistics, in O(1) depth where transformers require O(log n) depth, while the function classes remain incomparable because transformers support random-access operations that PAL cannot perform without auxiliary state. The extremum stack is shown to be a minimal sufficient statistic for all rate-independent functionals.

What carries the argument

The Preisach Attention Layer, which applies binary relay operators with learned thresholds to maintain and update a stack of local extrema as the sole internal state.

If this is right

A single PAL-Transformer layer suffices for Turing completeness via two-stack pushdown simulation.
Historical range statistics become computable in O(1) layers by PAL while requiring O(log n) layers in transformers.
Transformers retain the ability to perform random-access retrieval that PAL cannot achieve without added state.
PAL inference runs in O(n log n) total cost for sequences where standard attention costs O(n squared).

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Models built on rate-independent memory may handle long episodic sequences more efficiently when token timing carries little information.
Hybrid architectures could combine PAL layers for extrema tracking with standard attention for positional retrieval.
The minimal-statistic property suggests that PAL could serve as a drop-in memory module for any task whose target depends only on the ordered sequence of peaks and troughs.

Load-bearing premise

The rate-independence property together with the specific update rules of the binary relay operators are enough to simulate arbitrary computation.

What would settle it

An explicit two-stack pushdown automaton computation that cannot be reproduced by any assignment of thresholds and initial stack state in the binary relay operators under arbitrary precision arithmetic.

read the original abstract

We introduce the Preisach Attention Layer (PAL), a novel sequence modelling architecture grounded in the classical Preisach hysteresis operator from mathematical physics. PAL replaces the softmax attention mechanism with a binary relay operator parameterised by learned activation and deactivation thresholds, maintaining a stack of local extrema as its internal state. A single-layer PAL-Transformer with O(1) depth is Turing-complete under arbitrary precision arithmetic, achievable through simulation of a two-stack pushdown automaton -- in contrast to the O(log n) depth required by standard hard-attention transformers. Second, we prove that the function classes computable by PAL and by the transformer are incomparable: PAL computes historical range statistics in O(1) layers that require O(log n) layers for transformers, while transformers support random-access retrieval that PAL cannot perform without auxiliary state. The separating property is rate-independence -- PAL responds only to the sequence of local extrema, not to absolute token positions or temporal spacing. Third, we show that the extremum stack constitutes a minimal sufficient statistic of the input history for all rate-independent functionals, providing a formal analogue of the wiping property in classical hysteresis theory. PAL is thus an efficient architecture for tasks with long episodic memory and weak positional dependence, with O(n log n) total inference cost versus O(n^2) for standard attention.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The Preisach attention layer maps hysteresis to sequences with a constant-depth Turing completeness claim and a rate-independence separation from transformers, but the two-stack simulation detail is the part that needs checking.

read the letter

The paper replaces softmax with a Preisach-style binary relay operator that keeps a stack of local extrema as state. A single layer is claimed to simulate a two-stack pushdown automaton in constant depth under arbitrary precision, while transformers need logarithmic depth for the same. It also proves the two function classes are incomparable because PAL only responds to the ordered sequence of extrema values and ignores positions or timing.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Preisach Attention Layer (PAL) replacing softmax attention with binary relay operators from the classical Preisach hysteresis model, maintaining a stack of local extrema. It claims a single-layer PAL-Transformer achieves Turing completeness in O(1) depth by simulating a two-stack pushdown automaton (contrasting O(log n) depth for hard-attention transformers), proves the function classes are incomparable due to rate-independence (PAL computes historical range statistics efficiently while transformers support random-access retrieval that PAL cannot), and shows the extremum stack is a minimal sufficient statistic for rate-independent functionals, yielding O(n log n) inference cost for suitable tasks.

Significance. If the results hold, the work establishes a new attention mechanism with formal expressivity separations and efficiency gains for episodic memory tasks with weak positional dependence, plus a direct link to hysteresis theory via the wiping property. Explicit proofs of Turing completeness and the minimal sufficient statistic property would constitute a notable theoretical contribution in the field.

major comments (2)

[Abstract] Abstract: the Turing-completeness claim via two-stack PDA simulation is load-bearing for the central contribution but provides no derivation showing how a single extremum stack plus binary relays (updated only at turning points) can encode and manipulate two independent stacks while strictly obeying rate-independence, which discards absolute positions, step counts, and non-extremal tokens.
[Abstract] Abstract: the claim that the extremum stack is a minimal sufficient statistic for all rate-independent functionals (the formal analogue of the wiping property) is stated without the supporting derivation or error analysis, preventing verification of whether the parameterization preserves the required separation from transformer function classes.

minor comments (1)

Clarify the precise parameterization of the learned activation/deactivation thresholds for the relay operators and how they interact with the stack maintenance rules.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and for highlighting the need for greater clarity on the central theoretical claims. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the Turing-completeness claim via two-stack PDA simulation is load-bearing for the central contribution but provides no derivation showing how a single extremum stack plus binary relays (updated only at turning points) can encode and manipulate two independent stacks while strictly obeying rate-independence, which discards absolute positions, step counts, and non-extremal tokens.

Authors: We agree that the abstract statement is too brief and does not contain the required derivation. The construction is developed in Section 3, where the single extrema stack encodes one PDA stack via the ordered sequence of turning-point values while the second stack is maintained in the pattern of active/inactive relays; all updates occur exclusively at local extrema, thereby preserving rate-independence. To make the argument self-contained at the abstract level, we will insert a concise sketch of this encoding in the revised abstract. revision: yes
Referee: [Abstract] Abstract: the claim that the extremum stack is a minimal sufficient statistic for all rate-independent functionals (the formal analogue of the wiping property) is stated without the supporting derivation or error analysis, preventing verification of whether the parameterization preserves the required separation from transformer function classes.

Authors: We accept that the abstract does not supply the supporting derivation. The proof that the extrema stack is a minimal sufficient statistic, together with the accompanying error analysis establishing exact representation of any rate-independent functional, appears in Theorem 2 and its proof in Section 4. We will add a one-sentence outline of this result to the abstract in the revision so that the separation from transformer function classes can be verified directly from the abstract. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external mathematical properties of hysteresis and automata

full rationale

The derivation chain consists of proofs that a single-layer PAL simulates a two-stack PDA (hence Turing-complete) and that the extremum stack is a minimal sufficient statistic for rate-independent functionals. These rest on classical Preisach operator properties (rate-independence, wiping property) and standard automata theory rather than any self-definition, fitted-parameter renaming, or self-citation load-bearing step. No equation or claim in the abstract reduces a result to an input defined by the model itself; the separating property and incomparability results are stated as independent consequences of the operator's definition. The paper is therefore self-contained against external benchmarks with no reduction by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claims rest on the classical mathematical definition of the Preisach operator and the assumption that its rate-independent properties transfer directly to the neural layer without additional constraints.

axioms (2)

standard math The Preisach hysteresis operator is defined by a collection of binary relays with activation and deactivation thresholds and maintains a stack of local extrema as its state.
Invoked as the grounding for the PAL construction in the abstract.
domain assumption Rate-independence is the separating property between PAL and standard transformers.
Used to establish the incomparability of the two function classes.

invented entities (1)

Preisach Attention Layer (PAL) no independent evidence
purpose: Sequence modeling architecture that replaces softmax attention with hysteretic relays and an extrema stack.
Newly introduced in the paper; no independent evidence outside this work is provided.

pith-pipeline@v0.9.0 · 5766 in / 1493 out tokens · 25554 ms · 2026-05-25T04:40:09.001126+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 6 internal anchors

[1]

Titans: Learning to Memorize at Test Time

doi: 10.1016/j.est.2015.06.001. 18 Preprint Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. InarXiv preprint arXiv:2501.00663,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1016/j.est.2015.06.001 2015
[2]

Titans: Learning to Memorize at Test Time

URLhttps://arxiv.org/abs/2501.00663. Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer. InarXiv preprint arXiv:2004.05150,

work page internal anchor Pith review Pith/arXiv arXiv 2004
[3]

Longformer: The Long-Document Transformer

URLhttps://arxiv.org/abs/2004.05150. Martin Brokate and Jürgen Sprekels.Hysteresis and Phase Transitions, volume 121 ofApplied Mathematical Sciences. Springer, New York,

work page internal anchor Pith review Pith/arXiv arXiv 2004
[4]

George Cybenko

doi: 10.1007/978-1-4612-4048-8. George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2(4):303–314,

work page doi:10.1007/978-1-4612-4048-8
[5]

Karin Dahmen and James P

doi: 10.1007/BF02551274. Karin Dahmen and James P. Sethna. Hysteresis loop critical exponents in6−εdimensions.Physical Review Letters, 71(20):3222–3225,

work page doi:10.1007/bf02551274
[6]

Piotr Frydrych.Modelowanie charakterystyk magnesowania amorficznych rdzeni dwuosiowych sensorów transduktorowych

doi: 10.1103/PhysRevLett.71.3222. Piotr Frydrych.Modelowanie charakterystyk magnesowania amorficznych rdzeni dwuosiowych sensorów transduktorowych. PhD thesis, Politechnika Warszawska, Wydział Mechatroniki, Warsaw, Poland,

work page doi:10.1103/physrevlett.71.3222
[7]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

doi: 10.1007/BF01744431. Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. InarXiv preprint arXiv:2312.00752,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/bf01744431
[8]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

URLhttps://arxiv.org/abs/2312.00752. Michael Hahn. Theoretical limitations of self-attention in neural sequence models.Transactions of the Association for Computational Linguistics, 8:156–171,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Johan Håstad.Computational Limitations of Small-Depth Circuits

doi: 10.1162/tacl_a_00306. Johan Håstad.Computational Limitations of Small-Depth Circuits. MIT Press, Cambridge, MA,

work page doi:10.1162/tacl_a_00306
[10]

Christopher G

doi: 10.1073/pnas.79.8.2554. Christopher G. Langton. Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena, 42(1–3):12–37,

work page doi:10.1073/pnas.79.8.2554
[11]

Andrew Lucas

doi: 10.1016/0167-2789(90)90064-V. Andrew Lucas. Ising formulations of many NP problems.Frontiers in Physics, 2:5,

work page doi:10.1016/0167-2789(90)90064-v
[12]

2014.00005

doi: 10.3389/fphy. 2014.00005. Isaak D. Mayergoyz.Mathematical Models of Hysteresis. Springer, New York,

work page doi:10.3389/fphy 2014
[13]

Marc Mézard and Andrea Montanari.Information, Physics, and Computation

doi: 10.1162/tacl_a_ 00562. Marc Mézard and Andrea Montanari.Information, Physics, and Computation. Oxford University Press, Oxford,

work page doi:10.1162/tacl_a_
[14]

Oxford Univer- sity Press (2018).https://doi.org/10.1093/oso/9780198814788.001.0001

doi: 10.1093/acprof:oso/9780198570837.001.0001. Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, et al. RWKV: Reinventing RNNs for the transformer era. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 14048–14077,

work page doi:10.1093/acprof:oso/9780198570837.001.0001 2023
[15]

Jorge Pérez, Javier Marinkovi ´c, and Pablo Barceló

doi: 10.18653/v1/2023.findings-emnlp.936. Jorge Pérez, Javier Marinkovi ´c, and Pablo Barceló. Attention is Turing complete.Journal of Machine Learning Research, 22(75):1–35,

work page doi:10.18653/v1/2023.findings-emnlp.936 2023
[16]

doi: 10.1007/BF01349418. James P. Sethna, Karin A. Dahmen, and Olga Perkovic. Random-field ising models of hysteresis.The Science of Hysteresis, 2:107–179,

work page doi:10.1007/bf01349418
[17]

Random-Field Ising Models of Hysteresis

URLhttps://arxiv.org/abs/cond-mat/0406320. Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. InInterna- tional Conference on Learning Representations,

work page internal anchor Pith review Pith/arXiv arXiv
[18]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin

doi: 10.1006/jcss.1995.1013. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems, volume

work page doi:10.1006/jcss.1995.1013 1995
[19]

Lemma A.5(Boolean combinations via MLP).Letϕ 1, ϕ2 be two EFO formulae computable by PAL heads producing outputsr 1, r2 ∈ {0,1}

In the limitε→0, this detectsu i ≥cat each extremal step. Lemma A.5(Boolean combinations via MLP).Letϕ 1, ϕ2 be two EFO formulae computable by PAL heads producing outputsr 1, r2 ∈ {0,1}. Thenϕ 1 ∧ϕ 2,ϕ 1 ∨ϕ 2, and¬ϕ 1 are computable by a two-layer MLP applied to(r 1, r2). Proof.Boolean functions on{0,1} 2 are representable by two-layer networks with ReLU ...

work page 1989
[20]

These are exact (not approximate) representations on{0,1}. Lemma A.6(Extremum aggregation).The formulaExtAgg i[f(x i), ϕ(i)]is computable by a single MPAL head with measure: µ(α,β) =f(Dec(α, β))·1[ϕholds at(α, β)],(28) whereDec(α, β)recovers the valuex i encoded at thresholds(α, β). Proof.The MPAL output for a single head with measureµis: PALµ(u0:n) = X i...

work page 2020
[21]

Sinceu n+1 < M top, the wiping conditionu n+1 > M top is not satisfied

+ε < M top (sinceM top = code ∗(aj, d) +εfor somejand theεgap is smaller than the inter-level gap∆>0, providedε <∆/2). Sinceu n+1 < M top, the wiping conditionu n+1 > M top is not satisfied. A new pair is added without removing any existing pairs. B.2 MLP Width for Transition Function Lemma B.3(MLP width for finite transition functions).For a 2-PDA with|Q...

work page 1989

[1] [1]

Titans: Learning to Memorize at Test Time

doi: 10.1016/j.est.2015.06.001. 18 Preprint Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. InarXiv preprint arXiv:2501.00663,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1016/j.est.2015.06.001 2015

[2] [2]

Titans: Learning to Memorize at Test Time

URLhttps://arxiv.org/abs/2501.00663. Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer: The long-document transformer. InarXiv preprint arXiv:2004.05150,

work page internal anchor Pith review Pith/arXiv arXiv 2004

[3] [3]

Longformer: The Long-Document Transformer

URLhttps://arxiv.org/abs/2004.05150. Martin Brokate and Jürgen Sprekels.Hysteresis and Phase Transitions, volume 121 ofApplied Mathematical Sciences. Springer, New York,

work page internal anchor Pith review Pith/arXiv arXiv 2004

[4] [4]

George Cybenko

doi: 10.1007/978-1-4612-4048-8. George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2(4):303–314,

work page doi:10.1007/978-1-4612-4048-8

[5] [5]

Karin Dahmen and James P

doi: 10.1007/BF02551274. Karin Dahmen and James P. Sethna. Hysteresis loop critical exponents in6−εdimensions.Physical Review Letters, 71(20):3222–3225,

work page doi:10.1007/bf02551274

[6] [6]

Piotr Frydrych.Modelowanie charakterystyk magnesowania amorficznych rdzeni dwuosiowych sensorów transduktorowych

doi: 10.1103/PhysRevLett.71.3222. Piotr Frydrych.Modelowanie charakterystyk magnesowania amorficznych rdzeni dwuosiowych sensorów transduktorowych. PhD thesis, Politechnika Warszawska, Wydział Mechatroniki, Warsaw, Poland,

work page doi:10.1103/physrevlett.71.3222

[7] [7]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

doi: 10.1007/BF01744431. Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. InarXiv preprint arXiv:2312.00752,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/bf01744431

[8] [8]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

URLhttps://arxiv.org/abs/2312.00752. Michael Hahn. Theoretical limitations of self-attention in neural sequence models.Transactions of the Association for Computational Linguistics, 8:156–171,

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Johan Håstad.Computational Limitations of Small-Depth Circuits

doi: 10.1162/tacl_a_00306. Johan Håstad.Computational Limitations of Small-Depth Circuits. MIT Press, Cambridge, MA,

work page doi:10.1162/tacl_a_00306

[10] [10]

Christopher G

doi: 10.1073/pnas.79.8.2554. Christopher G. Langton. Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena, 42(1–3):12–37,

work page doi:10.1073/pnas.79.8.2554

[11] [11]

Andrew Lucas

doi: 10.1016/0167-2789(90)90064-V. Andrew Lucas. Ising formulations of many NP problems.Frontiers in Physics, 2:5,

work page doi:10.1016/0167-2789(90)90064-v

[12] [12]

2014.00005

doi: 10.3389/fphy. 2014.00005. Isaak D. Mayergoyz.Mathematical Models of Hysteresis. Springer, New York,

work page doi:10.3389/fphy 2014

[13] [13]

Marc Mézard and Andrea Montanari.Information, Physics, and Computation

doi: 10.1162/tacl_a_ 00562. Marc Mézard and Andrea Montanari.Information, Physics, and Computation. Oxford University Press, Oxford,

work page doi:10.1162/tacl_a_

[14] [14]

Oxford Univer- sity Press (2018).https://doi.org/10.1093/oso/9780198814788.001.0001

doi: 10.1093/acprof:oso/9780198570837.001.0001. Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, et al. RWKV: Reinventing RNNs for the transformer era. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 14048–14077,

work page doi:10.1093/acprof:oso/9780198570837.001.0001 2023

[15] [15]

Jorge Pérez, Javier Marinkovi ´c, and Pablo Barceló

doi: 10.18653/v1/2023.findings-emnlp.936. Jorge Pérez, Javier Marinkovi ´c, and Pablo Barceló. Attention is Turing complete.Journal of Machine Learning Research, 22(75):1–35,

work page doi:10.18653/v1/2023.findings-emnlp.936 2023

[16] [16]

doi: 10.1007/BF01349418. James P. Sethna, Karin A. Dahmen, and Olga Perkovic. Random-field ising models of hysteresis.The Science of Hysteresis, 2:107–179,

work page doi:10.1007/bf01349418

[17] [17]

Random-Field Ising Models of Hysteresis

URLhttps://arxiv.org/abs/cond-mat/0406320. Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. InInterna- tional Conference on Learning Representations,

work page internal anchor Pith review Pith/arXiv arXiv

[18] [18]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin

doi: 10.1006/jcss.1995.1013. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InAdvances in Neural Information Processing Systems, volume

work page doi:10.1006/jcss.1995.1013 1995

[19] [19]

Lemma A.5(Boolean combinations via MLP).Letϕ 1, ϕ2 be two EFO formulae computable by PAL heads producing outputsr 1, r2 ∈ {0,1}

In the limitε→0, this detectsu i ≥cat each extremal step. Lemma A.5(Boolean combinations via MLP).Letϕ 1, ϕ2 be two EFO formulae computable by PAL heads producing outputsr 1, r2 ∈ {0,1}. Thenϕ 1 ∧ϕ 2,ϕ 1 ∨ϕ 2, and¬ϕ 1 are computable by a two-layer MLP applied to(r 1, r2). Proof.Boolean functions on{0,1} 2 are representable by two-layer networks with ReLU ...

work page 1989

[20] [20]

These are exact (not approximate) representations on{0,1}. Lemma A.6(Extremum aggregation).The formulaExtAgg i[f(x i), ϕ(i)]is computable by a single MPAL head with measure: µ(α,β) =f(Dec(α, β))·1[ϕholds at(α, β)],(28) whereDec(α, β)recovers the valuex i encoded at thresholds(α, β). Proof.The MPAL output for a single head with measureµis: PALµ(u0:n) = X i...

work page 2020

[21] [21]

Sinceu n+1 < M top, the wiping conditionu n+1 > M top is not satisfied

+ε < M top (sinceM top = code ∗(aj, d) +εfor somejand theεgap is smaller than the inter-level gap∆>0, providedε <∆/2). Sinceu n+1 < M top, the wiping conditionu n+1 > M top is not satisfied. A new pair is added without removing any existing pairs. B.2 MLP Width for Transition Function Lemma B.3(MLP width for finite transition functions).For a 2-PDA with|Q...

work page 1989