Dendritic In-Context Learning in a Single-Layer Spiking Neural Network

Changwen Chen; Juwei Shen; Yujie Wu

arxiv: 2607.02283 · v1 · pith:IR3NKMLKnew · submitted 2026-07-02 · 💻 cs.NE · cs.LG

Dendritic In-Context Learning in a Single-Layer Spiking Neural Network

Juwei Shen , Yujie Wu , Changwen Chen This is my paper

Pith reviewed 2026-07-03 02:48 UTC · model grok-4.3

classification 💻 cs.NE cs.LG

keywords in-context learningspiking neural networksdendritic computationonline learningWidrow-Hoff LMSsingle-layer architecturecompartmental models

0 comments

The pith

The subthreshold dynamics of a single dendritic compartment embed a complete online learning algorithm, enabling in-context learning in a single-layer spiking neural network.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that by treating the dendritic compartment as an active computational unit with specific subthreshold recurrence, rather than a passive conduit, a single-layer SNN can perform general-purpose in-context learning. This is because the apical membrane dynamics match the leaky online Widrow-Hoff LMS update rule exactly. If true, this means ICL does not require attention mechanisms, multiple layers, or separate plasticity rules at inference time. A linear probe on the apical potentials recovers the reference LMS trajectory at high accuracy. The model also shows stability at task dimensions where transformers fail.

Core claim

By modeling the dendritic compartment's apical recurrence as structurally identical to leaky online Widrow-Hoff LMS, the architecture implements the learning algorithm directly in the dynamics. This allows the single-layer compartmental spiking network to succeed on the Garg-2022 ICL benchmark at super-dimensional tasks, where it is seed-stable unlike dense transformers, and the algorithm is shown to be embedded rather than learned.

What carries the argument

The apical recurrence in the single dendritic compartment, which is structurally identical to the leaky online Widrow-Hoff LMS update.

If this is right

ICL can be achieved without multi-layer architectures or attention.
No inference-time synaptic plasticity is needed for adaptation.
The model remains stable at high task dimensions where other models exhibit instability.
The learning trajectory can be directly recovered from the membrane potentials.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Neuromorphic hardware could implement ICL with minimal resources by exploiting dendritic dynamics.
Biological neurons might perform similar computations in their dendrites for rapid adaptation.
Extensions could test if other learning rules can be embedded in similar compartment models.

Load-bearing premise

The apical recurrence in the model is exactly the same mathematical update as leaky online Widrow-Hoff LMS, embedding the algorithm by construction.

What would settle it

Observing that a linear probe on apical membrane potentials does not recover the online LMS weight trajectory at R^2 = 0.93, or the network failing to solve the Garg-2022 benchmark at high dimensions.

Figures

Figures reproduced from arXiv: 2607.02283 by Changwen Chen, Juwei Shen, Yujie Wu.

**Figure 1.** Figure 1: DendriCL overview. (A) Biological layer-5 cortical pyramidal neuron: apical tuft receives top-down feedback, basal dendrites receive bottom-up input, and the soma integrates both. (B) DendriCL computational model — a single compartmental layer implementing the apical-basal-soma architecture with a learned online-LMS update in the apical membrane potential uA. Synaptic weights WA, WB, Wout are frozen at inf… view at source ↗

**Figure 2.** Figure 2: DendriCL architecture and contrast with prior compartmental models. (A) Singlelayer data flow: input tokens [xi ; yi ; flagi ] feed both the basal projection uB = WBxt and the apical online-LMS recurrence. The LIF soma integrates vsoma = gBuB +gAWoutuA and spikes at threshold; a readout is active only at the query position. (B) Contrast with four prior compartmental-neuron interpretations: Urbanczik-Senn … view at source ↗

**Figure 3.** Figure 3: Mechanistic verification: trained DendriCL apical membrane runs online LMS. (a) Linear-probe R2 of the apical trajectory into the reference online-LMS estimate (solid, burgundy) and into the true task parameter (dashed, teal), across d. The apical ≡ LMS correspondence R2 grows from 0.84 at d = 5 to 0.97 at d = 30; decoding of the true parameter tracks task-level eval R2 (≈ 0.84), i.e., the ICL capacity of … view at source ↗

**Figure 4.** Figure 4: Compute-matched 50k-step super-d comparison (3 seeds, DendriCL d=30 with n=6). Visualization of [PITH_FULL_IMAGE:figures/full_fig_p024_4.png] view at source ↗

**Figure 5.** Figure 5: Task-dimensionality sweep at both compute budgets. (a) 10k-step baseline: Transformer and Pure LIF fail at d ≥ 25, Spikformer degrades smoothly, DendriCL alone retains R2 ≥ 0.5 throughout. (b) 50k-step compute-matched: Transformer now trains cleanly at d = 25 (0.99) and exhibits bimodal/grokking behaviour at d = 30, but fails uniformly for d ≥ 40. DendriCL’s curve is nearly flat from d = 20 to d = 40 and d… view at source ↗

**Figure 6.** Figure 6: Dense attention exhibits grokking at super-d ICL; compartmental architecture does not. Per-seed eval R2 trajectories at d = 30 under compute-matched 50k-step training. Left: Transformer — seeds 0, 1 remain at the chance floor for all 50k steps; seed 2 undergoes a phase transition at step ∼6000 and converges to R2 = 0.993; seed 3 breaks through late (step ∼34000) and is still rising. This is the canonical g… view at source ↗

**Figure 7.** Figure 7: DendriCL context-length scaling at d = 10 follows classical LMS convergence. Empirical points (burgundy, error bars are ±1σ over successful seeds); dashed line is a 2-parameter fit R2 (k) = a−b/k with a = 0.97, b = 3.28. The a−b/k form is the predicted finite-sample excess error of leaky online LMS under iid Gaussian inputs (Proposition 1, Appendix B). The empirical curve matches this form without any per-… view at source ↗

**Figure 8.** Figure 8: Contrasting ICL mechanisms. Top: Transformer ICL per von Oswald et al. 2023 — multiple attention layers collectively approximate gradient-descent steps on the attention-weight surface; the learning algorithm is implicit. Bottom: DendriCL ICL — a single compartmental unit runs an explicit online LMS recurrence in the apical membrane potential; the learning algorithm is structurally embedded in the architect… view at source ↗

read the original abstract

In-context learning (ICL) operates via implicit gradient descent embedded in the forward pass of modern AI architectures -- Transformers, Mamba, state-space models, and MLPs. Capturing this capability in biologically plausible Spiking Neural Networks (SNNs) has remained an open challenge: existing SNNs fail the Garg-2022 benchmark at non-trivial task dimensions. We trace this failure to a structural assumption: prior SNN designs route adaptation through inference-time synaptic plasticity, viewing the dendritic compartment as a passive conduit for error or teacher signals. We challenge this assumption. The subthreshold dynamics of a single dendritic compartment already implement a complete online learning algorithm. By treating the compartment as the computational substrate rather than a passive conduit, we propose DendriCL -- a single-layer compartmental spiking architecture whose apical recurrence is structurally identical to leaky online Widrow-Hoff LMS. This dynamics-only update collapses the architectural depth required for general-purpose ICL to a single layer. DendriCL is uniquely seed-stable at super-dimensional Garg-2022 ICL -- where dense Transformers exhibit grokking-style instability and fail past moderate task dimension -- and a linear probe recovers the reference online-LMS trajectory directly from the apical membrane at R^2 = 0.93, showing the algorithm is structurally embedded in the dynamics rather than implicitly discovered during training. Taken together, ICL requires neither attention, depth, nor inference-time plasticity: a single compartment with online-LMS dynamics is sufficient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The core claim is that a single dendritic compartment's subthreshold dynamics embed leaky LMS exactly, enabling stable single-layer ICL on hard Garg-2022 tasks, but this rests on an asserted structural identity that the abstract supports only with R²=0.93 correlation.

read the letter

The paper's main result is that a single-layer compartmental SNN called DendriCL achieves seed-stable in-context learning on super-dimensional Garg-2022 tasks by embedding an online LMS rule directly in apical membrane recurrence. This is new relative to prior SNN ICL work that used inference-time plasticity or deeper routing. The architecture is simple by design: one layer, no attention, no separate plasticity mechanism. The linear probe recovering the LMS trajectory at R²=0.93 and the reported stability where dense transformers fail are concrete points in its favor.

The soft spot is the central assertion that the apical recurrence is structurally identical to leaky Widrow-Hoff LMS. The abstract states the identity and gives the probe correlation, but does not show term-by-term equation matching or derive the update from the compartment ODE. If the recurrence was chosen or scaled to produce LMS-like behavior, the "embedded by construction" and "dynamics-only" claims become weaker; the high R² is then expected rather than diagnostic. Dataset details, error bars, and exclusion criteria for the probe are also missing from the abstract.

The work is aimed at researchers building efficient or biologically motivated ICL models. A reader already working on SNNs or compartmental dynamics would get the most out of the equations and stability numbers once they are fully shown. The paper deserves peer review because the result, if the identity holds exactly, would shift the design space for shallow SNNs; the current evidence is suggestive but not yet conclusive on the load-bearing step.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes DendriCL, a single-layer compartmental spiking neural network. It claims that the subthreshold dynamics of a single dendritic compartment implement a complete online learning algorithm whose apical recurrence is structurally identical to leaky online Widrow-Hoff LMS. This dynamics-only update enables general-purpose in-context learning on the Garg-2022 benchmark at super-dimensional tasks where dense Transformers exhibit instability, with reported seed-stability and a linear probe recovering the reference LMS trajectory from the apical membrane at R²=0.93, indicating the algorithm is structurally embedded rather than discovered during training.

Significance. If the apical recurrence is shown to be algebraically identical to the leaky LMS rule by construction, the result would demonstrate that ICL can be realized in a single-layer biologically plausible SNN without attention, depth, or inference-time plasticity. The seed-stability at high task dimensions and the probe-based recovery would then constitute a meaningful reduction in architectural requirements for ICL. However, the current evidence rests on correlation rather than explicit identity, limiting the strength of the central claim.

major comments (3)

[Abstract] Abstract: The assertion that the apical recurrence 'is structurally identical to leaky online Widrow-Hoff LMS' lacks any term-by-term equation comparison (e.g., matching leak factor, error-driven update term, and scaling). The sole supporting evidence is a linear probe with R²=0.93, which shows approximate trajectory recovery but does not establish exact algebraic identity or embedding by construction.
[Abstract] Abstract: If the apical recurrence is constructed to be identical to the LMS update, the probe recovery should be exact (R²=1.0 within numerical precision) rather than 0.93; the reported value indicates residual differences, additional compartmental coupling terms, or implicit fitting that contradict the 'structurally identical' and 'dynamics-only' claims.
[Abstract] Abstract: No derivation steps, error bars, dataset details, exclusion criteria, or probe methodology are provided for the R²=0.93 result. This omission prevents verification that the apical membrane directly implements the LMS rule without training-induced adjustments.

minor comments (1)

[Abstract] The Garg-2022 benchmark should be briefly described or cited on first use to aid readers outside the ICL literature.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify that the manuscript's central claim of structural identity would be strengthened by explicit algebraic derivation and additional methodological details. We address each point below.

read point-by-point responses

Referee: [Abstract] Abstract: The assertion that the apical recurrence 'is structurally identical to leaky online Widrow-Hoff LMS' lacks any term-by-term equation comparison (e.g., matching leak factor, error-driven update term, and scaling). The sole supporting evidence is a linear probe with R²=0.93, which shows approximate trajectory recovery but does not establish exact algebraic identity or embedding by construction.

Authors: We agree that the manuscript would benefit from an explicit term-by-term mapping. The apical recurrence is obtained directly from the subthreshold dynamics of the compartmental model and matches the leaky LMS form (leak factor, error-driven term, and scaling) by algebraic construction. We will add the derivation and side-by-side equation comparison in the revised manuscript. revision: yes
Referee: [Abstract] Abstract: If the apical recurrence is constructed to be identical to the LMS update, the probe recovery should be exact (R²=1.0 within numerical precision) rather than 0.93; the reported value indicates residual differences, additional compartmental coupling terms, or implicit fitting that contradict the 'structurally identical' and 'dynamics-only' claims.

Authors: The R²=0.93 is measured on the apical membrane voltage of the full spiking network, where discrete spike events and weak inter-compartment coupling produce small deviations from the ideal continuous subthreshold trajectory. The recurrence itself remains identical by construction in the subthreshold equations; the probe result quantifies how faithfully this identity is preserved under spiking. We will clarify this distinction and report the exact match obtained when the model is restricted to the linear subthreshold regime. revision: partial
Referee: [Abstract] Abstract: No derivation steps, error bars, dataset details, exclusion criteria, or probe methodology are provided for the R²=0.93 result. This omission prevents verification that the apical membrane directly implements the LMS rule without training-induced adjustments.

Authors: We acknowledge these omissions. The revision will include: (i) the full step-by-step derivation from compartmental current-balance equations to the LMS recurrence, (ii) the linear-probe procedure (ordinary least-squares regression of apical voltage onto the reference LMS state vector), (iii) Garg-2022 task dimensions and seed counts, (iv) any exclusion criteria, and (v) error bars or confidence intervals on R² across runs. These additions will confirm that the match is present by construction rather than learned. revision: yes

Circularity Check

1 steps flagged

Apical recurrence asserted as identical to LMS makes trajectory recovery tautological by construction

specific steps

self definitional [abstract]
"whose apical recurrence is structurally identical to leaky online Widrow-Hoff LMS. This dynamics-only update collapses the architectural depth required for general-purpose ICL to a single layer. ... a linear probe recovers the reference online-LMS trajectory directly from the apical membrane at R^2 = 0.93, showing the algorithm is structurally embedded in the dynamics rather than implicitly discovered during training."

The paper defines the apical recurrence to be identical to the LMS update rule, then treats the subsequent recovery of the LMS trajectory as evidence that the algorithm is embedded by the dynamics. Because the match is imposed by construction in the model definition, the R^2 correlation and the claim of 'structurally embedded' are not independent discoveries but direct consequences of the chosen recurrence.

full rationale

The central claim rests on defining the model's apical recurrence to be structurally identical to leaky online Widrow-Hoff LMS, then reporting that a probe recovers the LMS trajectory at R^2=0.93 as evidence the algorithm is 'structurally embedded in the dynamics rather than implicitly discovered during training.' This reduces the 'dynamics-only' and 'embedded by construction' assertions to a definitional choice rather than an independent derivation. No external uniqueness theorem or prior self-citation is invoked; the reduction is internal to the model specification itself. The R^2 result is therefore expected once the recurrence is set to match the target rule.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The model rests on standard SNN membrane equations plus the modeling choice that apical recurrence exactly implements LMS; no new physical entities are postulated, but the architectural mapping itself is the novel element.

free parameters (1)

apical time constant or learning-rate scaling
Parameters that set the exact match between compartment dynamics and LMS update must be chosen or fitted to achieve the reported R²=0.93.

axioms (1)

domain assumption Subthreshold dendritic voltage follows a linear leaky integrator whose recurrence can be set identical to the LMS update rule
Invoked when the paper states the apical recurrence 'is structurally identical to leaky online Widrow-Hoff LMS'.

pith-pipeline@v0.9.1-grok · 5801 in / 1408 out tokens · 30183 ms · 2026-07-03T02:48:27.000739+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

13 extracted references · 3 canonical work pages · 1 internal anchor

[1]

Grazzi, J

10 R. Grazzi, J. Siems, S. Schrodi, T. Brox, and F. Hutter. Is Mamba capable of in-context learning?arXiv preprint arXiv:2402.03170,

work page arXiv
[2]

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

A. Power, Y . Burda, H. Edwards, I. Babuschkin, and V . Misra. Grokking: generalization beyond overfitting on small algorithmic datasets.arXiv preprint arXiv:2201.02177,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Tier 1 — Anatomically established.The apical-basal-soma three-compartment architecture corre- sponds to cortical layer-5 pyramidal neurons [Larkum, 2013, Major et al., 2013]

A Biological Grounding and Distinction from Prior Compartmental Models A.1 Three Tiers of Evidence We are explicit about which aspects of DendriCL are biologically supported versus computationally hypothesized. Tier 1 — Anatomically established.The apical-basal-soma three-compartment architecture corre- sponds to cortical layer-5 pyramidal neurons [Larkum...

2013
[4]

show single human L2/3 pyrami- dal neurons can compute XOR-level nonlinear functions, establishing single pyramidal neurons as computationally substantial units. Tier 2 — Qualitatively plausible.Our functional assignment — apical receives prediction-error- like signals — is consistent with predictive coding [Rao and Ballard, 1999, Bastos et al., 2012, Kel...

1999
[5]

=O(σ 2), and after k steps the expected squared error is O(d/k) up to the noise floor. This matches the classical LMS rate [Widrow and Hoff, 1960, Sayed, 2003].■ Remark on the LIF nonlinearity.The above analysis applies to the apical compartment alone, which is purely linear by construction. The somatic LIF (threshold-reset nonlinearity) affects the reado...

1960
[6]

•Spikformer[Zhou et al., 2023]: our re-implementation of the SSA+MLP+LIF block structure described in Zhou et al

No pretraining; trained from scratch on each task family. •Spikformer[Zhou et al., 2023]: our re-implementation of the SSA+MLP+LIF block structure described in Zhou et al. 2023 ICLR. We use the published QKV→ LIF attention formulation with TLIF = 4 virtual steps, omitting dataset-specific touches (batch-norm placement, dropout schedule). Not a direct port...

2023
[7]

efficient

throughout training; the training loss exhibits sporadic large spikes consistent with LMS’s finite-sample blowup when 16 Table 7: Width ablation reveals asharp upper cliff: dapical ≤384 all work; dapical ≥512 catastrophi- cally fails. The failure mode is training divergence (loss never decreases), consistent with the learned LMS step sizeγbecoming unstabl...

1920
[8]

With 6 seeds one can estimate the success probability (∼33% cleanly); with 100 seeds one could estimate the median breakthrough time

— the Transformer must discover, via gradient descent, a specific attention-weight configuration that implements the online-learning algorithm; this discovery is a discrete phase transition with an unpredictable onset time. With 6 seeds one can estimate the success probability (∼33% cleanly); with 100 seeds one could estimate the median breakthrough time....

work page arXiv 2019
[9]

192h=44τ=4,θ=1∼1.0M Pure LIF standard LIF [Gerstner and Kistler, 2002] 320 — 4τ=4,θ=10.64M LSNN Bellec et al

2002
[10]

Dendritic Universal Approximation

192 head 64,h=44 —∼1.0M N Extended Tasks: Discussion of Non-Linear Regression Results and diagnosis.On the 2-layer ReLU NN regression task ( d=20, k=40), DendriCL (L=2) reaches R2 = 0.568±0.015 , comparable to Spikformer (0.582) and above Pure LIF (0.322). On binary classification ( d=10, k=20), DendriCL ( L=2) reaches 0.805±0.008 , closely matching Spikf...

2008
[11]

• Initialization

throughout training.g A, gB ∈Runconstrained. • Initialization. uA(0) =0 , WA, WB ∼ N(0,1/ √ d) Kaiming-style, WA,out ∼ N(0,1/ p dapical).˜α= 2.2(soα≈0.9at start),˜γ= 0(soγ≈0.1at start). • Post-LIF block. FF1 and FF2 are linear layers of width dmodel; the intermediate LIF uses the sameθ. This block adds capacity for nonlinear feature mixing on top of the a...

2022
[12]

Empirical points (burgundy, error bars are±1σover successful seeds); dashed line is a 2-parameter fit R2(k) =a−b/k with a= 0.97 , b= 3.28

Figure 7:DendriCL context-length scaling at d= 10 follows classical LMS convergence. Empirical points (burgundy, error bars are±1σover successful seeds); dashed line is a 2-parameter fit R2(k) =a−b/k with a= 0.97 , b= 3.28 . The a−b/k form is the predicted finite-sample excess error of leaky online LMS under iid Gaussian inputs (Proposition 1, Appendix B)...

2023
[13]

and biological compartmental neuroscience (Larkum 2013). 26

2013

[1] [1]

Grazzi, J

10 R. Grazzi, J. Siems, S. Schrodi, T. Brox, and F. Hutter. Is Mamba capable of in-context learning?arXiv preprint arXiv:2402.03170,

work page arXiv

[2] [2]

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

A. Power, Y . Burda, H. Edwards, I. Babuschkin, and V . Misra. Grokking: generalization beyond overfitting on small algorithmic datasets.arXiv preprint arXiv:2201.02177,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Tier 1 — Anatomically established.The apical-basal-soma three-compartment architecture corre- sponds to cortical layer-5 pyramidal neurons [Larkum, 2013, Major et al., 2013]

A Biological Grounding and Distinction from Prior Compartmental Models A.1 Three Tiers of Evidence We are explicit about which aspects of DendriCL are biologically supported versus computationally hypothesized. Tier 1 — Anatomically established.The apical-basal-soma three-compartment architecture corre- sponds to cortical layer-5 pyramidal neurons [Larkum...

2013

[4] [4]

show single human L2/3 pyrami- dal neurons can compute XOR-level nonlinear functions, establishing single pyramidal neurons as computationally substantial units. Tier 2 — Qualitatively plausible.Our functional assignment — apical receives prediction-error- like signals — is consistent with predictive coding [Rao and Ballard, 1999, Bastos et al., 2012, Kel...

1999

[5] [5]

=O(σ 2), and after k steps the expected squared error is O(d/k) up to the noise floor. This matches the classical LMS rate [Widrow and Hoff, 1960, Sayed, 2003].■ Remark on the LIF nonlinearity.The above analysis applies to the apical compartment alone, which is purely linear by construction. The somatic LIF (threshold-reset nonlinearity) affects the reado...

1960

[6] [6]

•Spikformer[Zhou et al., 2023]: our re-implementation of the SSA+MLP+LIF block structure described in Zhou et al

No pretraining; trained from scratch on each task family. •Spikformer[Zhou et al., 2023]: our re-implementation of the SSA+MLP+LIF block structure described in Zhou et al. 2023 ICLR. We use the published QKV→ LIF attention formulation with TLIF = 4 virtual steps, omitting dataset-specific touches (batch-norm placement, dropout schedule). Not a direct port...

2023

[7] [7]

efficient

throughout training; the training loss exhibits sporadic large spikes consistent with LMS’s finite-sample blowup when 16 Table 7: Width ablation reveals asharp upper cliff: dapical ≤384 all work; dapical ≥512 catastrophi- cally fails. The failure mode is training divergence (loss never decreases), consistent with the learned LMS step sizeγbecoming unstabl...

1920

[8] [8]

With 6 seeds one can estimate the success probability (∼33% cleanly); with 100 seeds one could estimate the median breakthrough time

— the Transformer must discover, via gradient descent, a specific attention-weight configuration that implements the online-learning algorithm; this discovery is a discrete phase transition with an unpredictable onset time. With 6 seeds one can estimate the success probability (∼33% cleanly); with 100 seeds one could estimate the median breakthrough time....

work page arXiv 2019

[9] [9]

192h=44τ=4,θ=1∼1.0M Pure LIF standard LIF [Gerstner and Kistler, 2002] 320 — 4τ=4,θ=10.64M LSNN Bellec et al

2002

[10] [10]

Dendritic Universal Approximation

192 head 64,h=44 —∼1.0M N Extended Tasks: Discussion of Non-Linear Regression Results and diagnosis.On the 2-layer ReLU NN regression task ( d=20, k=40), DendriCL (L=2) reaches R2 = 0.568±0.015 , comparable to Spikformer (0.582) and above Pure LIF (0.322). On binary classification ( d=10, k=20), DendriCL ( L=2) reaches 0.805±0.008 , closely matching Spikf...

2008

[11] [11]

• Initialization

throughout training.g A, gB ∈Runconstrained. • Initialization. uA(0) =0 , WA, WB ∼ N(0,1/ √ d) Kaiming-style, WA,out ∼ N(0,1/ p dapical).˜α= 2.2(soα≈0.9at start),˜γ= 0(soγ≈0.1at start). • Post-LIF block. FF1 and FF2 are linear layers of width dmodel; the intermediate LIF uses the sameθ. This block adds capacity for nonlinear feature mixing on top of the a...

2022

[12] [12]

Empirical points (burgundy, error bars are±1σover successful seeds); dashed line is a 2-parameter fit R2(k) =a−b/k with a= 0.97 , b= 3.28

Figure 7:DendriCL context-length scaling at d= 10 follows classical LMS convergence. Empirical points (burgundy, error bars are±1σover successful seeds); dashed line is a 2-parameter fit R2(k) =a−b/k with a= 0.97 , b= 3.28 . The a−b/k form is the predicted finite-sample excess error of leaky online LMS under iid Gaussian inputs (Proposition 1, Appendix B)...

2023

[13] [13]

and biological compartmental neuroscience (Larkum 2013). 26

2013