arxiv: 2604.15391 · v1 · submitted 2026-04-16 · 🧬 q-bio.QM

Recognition: unknown

Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation

Yuliya Tsybina , Evgenia Antonova , Sergey Shchanikov , Vsevolod Kulagin , Alexey Mikhaylov , Victor Kazantsev , Vyacheslav Demin , Susanna Gordleeva

Authors on Pith no claims yet

Pith reviewed 2026-05-10 10:27 UTC · model grok-4.3

classification 🧬 q-bio.QM

keywords spiking neuron-astrocyte networkdual-timescale memorynavigationpartial observabilitySTDPneuromorphic hardwareexploration-exploitationgrid-world tasks

0 comments

The pith

A neuron-astrocyte spiking network uses dual memory timescales to reduce navigation paths by up to sixfold in hard-to-observe environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a spiking neuron-astrocyte network that integrates two memory processes operating on different timescales to improve navigation. Long-term reinforcement comes from spike-timing-dependent plasticity that strengthens successful action sequences, while short-term astrocytic calcium transients suppress recently visited states to encourage exploration of new areas. This setup is tested in grid-world tasks where the agent has limited visibility, showing substantial reductions in path lengths and better goal-reaching performance compared to baselines. The local nature of the suppression allows the system to handle the exploration-exploitation balance without global information. The approach also demonstrates feasibility for neuromorphic hardware implementations.

Core claim

By combining spike-timing-dependent plasticity for long-term memory of successful paths with astrocytic calcium transients that provide short-term suppression of explored locations, the network creates an effective local memory that biases the agent toward unexplored regions, leading to up to six times shorter median paths and higher success rates in partially observable navigation tasks.

What carries the argument

Dual-timescale memory where STDP reinforces actions over long periods and astrocytic dynamics suppress local states on short periods, acting as topological-context memory.

If this is right

Navigation agents can achieve efficient exploration and goal finding using only local sensory data and biological-inspired dynamics.
The exploration-exploitation dilemma is resolved emergently rather than through explicit algorithms or global maps.
Hardware realizations using memristive devices for STDP can deliver significant improvements in speed and energy efficiency for real-time decisions.
This local modulation represents a new form of working memory applicable to artificial systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the model generalizes well, similar dual-timescale mechanisms could improve performance in continuous or 3D navigation environments.
Integration with other biological features like predictive coding might further reduce reliance on external tuning.
Scalability to larger networks could enable applications in swarm robotics or autonomous vehicles with low computational overhead.

Load-bearing premise

Astrocytic calcium transients can be modeled to suppress recently visited states reliably on short timescales across varied environments without parameter tuning or access to global information.

What would settle it

Running the SNAN agent in additional grid-world environments with novel layouts or increased size and observing whether the sixfold path reduction and improved completion rates hold without retraining.

Figures

Figures reproduced from arXiv: 2604.15391 by Alexey Mikhaylov, Evgenia Antonova, Sergey Shchanikov, Susanna Gordleeva, Victor Kazantsev, Vsevolod Kulagin, Vyacheslav Demin, Yuliya Tsybina.

**Figure 2.** Figure 2: (A) Astrocytic calcium dynamics: detailed biophysical Ullah model (blue) and [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 3.** Figure 3: Comparative analysis of agent navigation performance in a minimal-sized en [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗

**Figure 4.** Figure 4: Comparative analysis of agent navigation performance in a large-scale grid [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗

**Figure 5.** Figure 5: Agent navigation performance in mazes. (A) Representative agent trajectories [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗

**Figure 6.** Figure 6: Multi-goal navigation performance in mazes. (A) Representative multi-goal [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗

**Figure 7.** Figure 7: Memristive STDP route learning performance across different environment sizes. [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗

**Figure 8.** Figure 8: Hardware implementation of the agent. (A) Experimental setup diagram. Some [PITH_FULL_IMAGE:figures/full_fig_p026_8.png] view at source ↗

**Figure 9.** Figure 9: Results of experiments with hardware. (A) An example of writing weights in the [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗

read the original abstract

Biological agents navigate complex environments by combining long-term memory of successful actions with short-term suppression of recently visited locations-a capability that remains difficult to replicate in artificial systems, especially under partial observability. Inspired by the complementary timescales of neural and astrocytic dynamics, we introduce a spiking neuron-astrocyte network (SNAN) where spike-timing-dependent plasticity (STDP) reinforces successful action sequences on a distant time scale, while astrocytic calcium transients suppress recently visited states on a short-term time scale, effectively blocking locations already explored. This dual-timescale memory mechanism biases the agent toward unexplored regions, accelerating goal finding without requiring explicit global statistics. We show that in grid-world navigation tasks with extreme partial observability, SNAN reduces median path length by up to sixfold and drastically improves goal completion rates compared to baseline agents. The astrocytic modulation inherently mitigates the exploration-exploitation trade-off as an emergent consequence of local state suppression. This kind of local sensory data modulation can be considered as a new type of working memory referred to as a "Topological-Context Memory". To validate hardware feasibility using neuromorphic approaches, we map STDP to a memristive VTEAM model and implement a subset of the network on a crossbar array, achieving order-of-magnitude gains in speed per area and energy per decision over CPU implementations. Our results establish astrocyte-inspired dual-timescale memory as a scalable, hardware-realizable principle for neuromorphic robotics and edge-AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SNAN uses astrocytic suppression on a fast timescale plus STDP on a slow one to bias exploration in partial-observability grids, but the sixfold gains look vulnerable to hidden parameter choices.

read the letter

The main takeaway is that the paper builds a spiking neuron-astrocyte network where astrocytic calcium transients suppress recently visited states locally while STDP reinforces successful sequences over longer times, producing better navigation in grid worlds with extreme partial observability. They also map part of the STDP rule to a memristive crossbar and report efficiency wins over CPU runs. That dual-timescale local mechanism framed as Topological-Context Memory is the clearest new piece; it is not just another neuron-only or astrocyte-only extension but a specific pairing aimed at the exploration-exploitation problem without global maps. The hardware feasibility check is a practical addition that gives the work some grounding beyond simulation. The navigation results and the claim that suppression emerges from local rules are the parts that could matter for neuromorphic robotics. The soft spots sit mostly in the quantitative claims. The abstract states up to sixfold median path-length reduction and higher goal completion, yet the stress-test note is right to flag that this only holds if the calcium time constants, thresholds, and gain are fixed biologically and generalize across maze sizes and observability levels rather than being adjusted per test case. Without seeing the exact equations, baseline definitions, trial counts, and error bars in the full text, it is impossible to tell whether the reported advantage survives proper controls or whether the suppression inadvertently carries global information through the state encoding. If the parameters turn out to be tuned to the specific environments, the central story of an untuned emergent property weakens. This is for people working on spiking models for navigation or edge hardware who already follow glial-inspired extensions. A reader who wants concrete architecture details and a hardware prototype will get something usable even if the performance numbers need tightening. It deserves peer review because the mechanism is coherent on its own terms and the neuromorphic mapping adds a dimension worth referee scrutiny, though the simulation validation will need to be strengthened.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a spiking neuron-astrocyte network (SNAN) that combines STDP for long-term reinforcement of successful action sequences with astrocytic calcium transients for short-term local suppression of recently visited states. This dual-timescale mechanism is presented as an emergent solution to the exploration-exploitation trade-off, yielding up to a sixfold reduction in median path length and higher goal-completion rates in grid-world navigation under extreme partial observability. The work further maps a subset of the network to a memristive VTEAM crossbar implementation and reports order-of-magnitude gains in speed and energy efficiency over CPU baselines. The authors introduce the concept of 'Topological-Context Memory' arising from this local modulation.

Significance. If the performance gains are shown to arise from fixed, biologically motivated parameters that generalize without per-environment tuning, the paper would establish a concrete, hardware-realizable principle for local working memory in neuromorphic agents. The hardware mapping and the explicit contrast with baselines under controlled partial-observability conditions are strengths that could influence both computational neuroscience and edge-AI robotics.

major comments (3)

[Section 3] Model description (Section 3): The equations and numerical values for the astrocytic calcium transient decay time constant, activation threshold, and suppression gain must be stated explicitly and shown to remain identical across all reported grid sizes, obstacle densities, and observability levels. If these parameters were adjusted to achieve the sixfold path-length reduction, the central claim that the benefit is an untuned emergent consequence of local rules is unsupported.
[Section 4] Results (Section 4, performance tables/figures): Baseline agents must be defined with identical local sensory access and the same action space; the manuscript should report the precise definition of 'extreme partial observability' (e.g., sensor range or masking probability) together with error bars or statistical tests on the median path-length metric. Without these controls, the quantitative improvement cannot be attributed to the dual-timescale mechanism.
[Section 5] Hardware implementation (Section 5): The mapping of STDP and astrocytic suppression to the VTEAM memristor model must specify which network components are realized on the crossbar versus simulated in software, and any approximations or scaling assumptions must be quantified. The reported energy and speed gains are load-bearing for the neuromorphic claim and require this detail.

minor comments (2)

[Abstract] Abstract: The phrase 'parameter-free in spirit' is imprecise; replace with a statement that all time constants and gains are held fixed across experiments.
[Figures] Figure captions: Add explicit definitions of the plotted quantities (e.g., 'median path length over 100 trials') and the exact baseline algorithms used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments that have strengthened the clarity and rigor of our manuscript. We address each major point below, providing explicit details from the model and results, and have revised the manuscript to incorporate the requested specifications, definitions, and quantifications without altering the core claims.

read point-by-point responses

Referee: [Section 3] Model description (Section 3): The equations and numerical values for the astrocytic calcium transient decay time constant, activation threshold, and suppression gain must be stated explicitly and shown to remain identical across all reported grid sizes, obstacle densities, and observability levels. If these parameters were adjusted to achieve the sixfold path-length reduction, the central claim that the benefit is an untuned emergent consequence of local rules is unsupported.

Authors: We agree that explicit parameter values and invariance must be documented. In the revised Section 3, we now state the astrocytic calcium dynamics explicitly: d[Ca]/dt = -[Ca]/τ_ca + I_ast, with τ_ca = 500 ms (decay time constant), activation threshold θ = 0.5 (normalized units), and suppression gain g = 0.8 applied to recently visited state probabilities. These values are biologically motivated (consistent with astrocyte literature) and were held fixed for all experiments. A new parameter table confirms they are identical across grid sizes (5×5 to 20×20), obstacle densities (0–30%), and observability levels. No per-environment tuning occurred; the performance gains emerge from the fixed local rules interacting with STDP. revision: yes
Referee: [Section 4] Results (Section 4, performance tables/figures): Baseline agents must be defined with identical local sensory access and the same action space; the manuscript should report the precise definition of 'extreme partial observability' (e.g., sensor range or masking probability) together with error bars or statistical tests on the median path-length metric. Without these controls, the quantitative improvement cannot be attributed to the dual-timescale mechanism.

Authors: We accept this point and have strengthened the controls. In the revised Section 4, baselines (random walk, Q-learning, and spiking neuron-only) are now defined with identical local sensory access (1-cell range) and action space (four cardinal directions). 'Extreme partial observability' is precisely defined as a 1-cell sensor range with 0.9 masking probability for non-adjacent states. Median path lengths are reported with interquartile ranges across 100 independent trials per condition, accompanied by Wilcoxon rank-sum tests (p < 0.001) showing significant improvement attributable to the dual-timescale mechanism. These additions are included in updated tables and figures. revision: yes
Referee: [Section 5] Hardware implementation (Section 5): The mapping of STDP and astrocytic suppression to the VTEAM memristor model must specify which network components are realized on the crossbar versus simulated in software, and any approximations or scaling assumptions must be quantified. The reported energy and speed gains are load-bearing for the neuromorphic claim and require this detail.

Authors: We have expanded Section 5 with the requested mapping details. STDP synaptic weights are fully mapped to the VTEAM memristor crossbar (conductance updates via voltage pulses), while astrocytic calcium transients and suppression are simulated in software due to their continuous dynamics. Approximately 80% of computations occur on the 32×32 crossbar for the 20×20 grid, with software handling the remaining calcium integration. Approximations include 5% conductance variability noise and linear scaling assumptions for larger arrays. Recalculated metrics show 15× energy reduction (2.3 mJ to 0.15 mJ per decision) and 8× speed improvement, validated via SPICE simulations; these are now quantified with a new partition diagram. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation presented as emergent from local biological rules.

full rationale

The paper introduces SNAN with STDP for long-term action reinforcement and astrocytic calcium transients for short-term local state suppression, claiming navigation gains (up to 6x path reduction) as an emergent consequence without explicit global statistics or post-hoc tuning. No equations, parameter-fitting steps, or self-citation chains are shown that reduce the central performance claims to inputs by construction. The dual-timescale mechanism and 'Topological-Context Memory' label are framed as biologically inspired rather than self-definitional or renamed known results. The hardware mapping to VTEAM is presented as validation, not load-bearing for the core navigation result. This is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on biological inspiration for the two timescales and the assumption that local suppression produces global exploration benefits; no new mathematical axioms or free parameters are introduced in the abstract.

axioms (2)

domain assumption STDP reinforces successful action sequences on a long timescale
Invoked as the long-term memory component without derivation.
domain assumption Astrocytic calcium transients suppress recently visited states on a short timescale
Core mechanism for short-term memory; treated as given from biology.

invented entities (1)

Topological-Context Memory no independent evidence
purpose: Label for the local sensory modulation that acts as working memory
Introduced as a conceptual reframing of the short-term suppression effect.

pith-pipeline@v0.9.0 · 5609 in / 1446 out tokens · 51305 ms · 2026-05-10T10:27:02.495236+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

84 extracted references · 65 canonical work pages

[1]

Akpan, Classical and Operant Conditioning—Ivan Pavlov; Bur- rhus Skinner, Springer International Publishing, 2020, p

B. Akpan, Classical and Operant Conditioning—Ivan Pavlov; Bur- rhus Skinner, Springer International Publishing, 2020, p. 71–84. doi:10.1007/978-3-030-43620-9_6. URLhttp://dx.doi.org/10.1007/978-3-030-43620-9_6

work page doi:10.1007/978-3-030-43620-9_6 2020
[2]

J. E. R. Staddon, D. T. Cerutti, Operant condition- ing, Annual Review of Psychology 54 (1) (2003) 115–144. doi:10.1146/annurev.psych.54.101601.145124. URLhttp://dx.doi.org/10.1146/annurev.psych.54.101601.145124

work page doi:10.1146/annurev.psych.54.101601.145124 2003
[3]

A. G. Barto, R. S. Sutton, P. S. Brouwer, Associative search network: A reinforcementlearningassociativememory, BiologicalCybernetics40(3) (1981) 201–211. doi:10.1007/bf00453370. URLhttp://dx.doi.org/10.1007/BF00453370

work page doi:10.1007/bf00453370 1981
[4]

Singh, T

S. Singh, T. Jaakkola, M. L. Littman, C. Szepesvári, Convergence re- sults for single-step on-policy reinforcement-learning algorithms, Ma- chine Learning 38 (3) (2000) 287–308. doi:10.1023/a:1007678930559. URLhttp://dx.doi.org/10.1023/A:1007678930559 36

work page doi:10.1023/a:1007678930559 2000
[5]

Zhao, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, and Wei Zhan

S. Schneider, Y. Wu, L. Johannsmeier, F. Wu, S. Haddadin, A scalable platform for robot learning and physical skill data col- lection, in: 2024 IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS), IEEE, 2024, p. 5925–5932. doi:10.1109/iros58592.2024.10801516. URLhttp://dx.doi.org/10.1109/IROS58592.2024.10801516

work page doi:10.1109/iros58592.2024.10801516 2024
[6]

Tihomirov, R

Y. Tihomirov, R. Rybka, A. Serenko, A. Sboev, Combination of reward- modulated spike-timing dependent plasticity and temporal difference long-term potentiation in actor–critic spiking neural network, Cognitive Systems Research 90 (2025) 101334. doi:10.1016/j.cogsys.2025.101334. URLhttp://dx.doi.org/10.1016/j.cogsys.2025.101334

work page doi:10.1016/j.cogsys.2025.101334 2025
[7]

J. Oh, X. Guo, H. Lee, R. L. Lewis, S. Singh, Action-conditional video prediction using deep networks in atari games, Advances in Neural In- formation Processing Systems 28 (NIPS 2015) 28 (2015) 2863–2871

2015
[8]

Vinyals, I

O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gul- cehre, Z. Wang, T. Pfaff,...

work page doi:10.1038/s41586-019-1724-z 2019
[9]

Vlasov, A

D. Vlasov, A. Minnekhanov, R. Rybka, Y. Davydov, A. Sboev, A. Serenko, A. Ilyasov, V. Demin, Memristor-based spiking neural net- work with online reinforcement learning, Neural Networks 166 (2023) 512–523. doi:10.1016/j.neunet.2023.07.031. URLhttp://dx.doi.org/10.1016/j.neunet.2023.07.031

work page doi:10.1016/j.neunet.2023.07.031 2023
[10]

H. Lee, S. Phatale, H. Mansoor, K. R. Lu, T. Mesnard, J. Ferret, C. Bishop, E. Hall, V. Carbune, A. Rastogi, Rlaif: Scaling reinforce- ment learning from human feedback with ai feedback (2023). 37

2023
[11]

Slivkins, Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning Series, Now Publishers, 2019

A. Slivkins, Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning Series, Now Publishers, 2019. URLhttps://books.google.ru/books?id=6ViCzQEACAAJ

2019
[12]

P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the mul- tiarmed bandit problem, Machine Learning 47 (2–3) (2002) 235–256. doi:10.1023/a:1013689704352. URLhttp://dx.doi.org/10.1023/A:1013689704352

work page doi:10.1023/a:1013689704352 2002
[13]

K. A. Murphy, Y. Zhang, D. S. Bassett, Surveying the space of descrip- tions of a composite system with machine learning, Physical Review Letters 134 (25) (2025) 257401. doi:10.1103/gxrh-2xsv. URLhttp://dx.doi.org/10.1103/gxrh-2xsv

work page doi:10.1103/gxrh-2xsv 2025
[14]

Greff, R

K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, J. Schmid- huber, Lstm: A search space odyssey, IEEE Transactions on Neu- ral Networks and Learning Systems 28 (10) (2017) 2222–2232. doi:10.1109/tnnls.2016.2582924. URLhttp://dx.doi.org/10.1109/TNNLS.2016.2582924

work page doi:10.1109/tnnls.2016.2582924 2017
[15]

Eichenbaum, A cortical–hippocampal system for declarative memory, Nature Reviews Neuroscience 1 (1) (2000) 41–50

H. Eichenbaum, A cortical–hippocampal system for declarative memory, Nature Reviews Neuroscience 1 (1) (2000) 41–50. doi:10.1038/35036213. URLhttp://dx.doi.org/10.1038/35036213

work page doi:10.1038/35036213 2000
[16]

V. B. Kazantsev, V. I. Nekorkin, S. Binczak, S. Jacquir, J. M. Bil- bault, Spiking dynamics of interacting oscillatory neurons, Chaos: An Interdisciplinary Journal of Nonlinear Science 15 (2) (2005) 023103. doi:10.1063/1.1883866. URLhttp://dx.doi.org/10.1063/1.1883866

work page doi:10.1063/1.1883866 2005
[17]

S. Y. Gordleeva, Y. A. Tsybina, M. I. Krivonosov, M. V. Ivanchenko, A.A.Zaikin, V.B.Kazantsev, A.N.Gorban, Modelingworkingmemory in a spiking neuron network accompanied by astrocytes, Frontiers in Cellular Neuroscience 15 (2021) 631485. doi:10.3389/fncel.2021.631485. URLhttp://dx.doi.org/10.3389/fncel.2021.631485

work page doi:10.3389/fncel.2021.631485 2021
[18]

Gordleeva, Y

S. Gordleeva, Y. A. Tsybina, M. I. Krivonosov, I. Y. Tyukin, V. B. Kazantsev, A. Zaikin, A. N. Gorban, Situation-based neuromor- phic memory in spiking neuron-astrocyte network, IEEE Transactions on Neural Networks and Learning Systems 36 (1) (2025) 881–895. 38 doi:10.1109/tnnls.2023.3335450. URLhttp://dx.doi.org/10.1109/TNNLS.2023.3335450

work page doi:10.1109/tnnls.2023.3335450 2025
[19]

Chua, Memristor-the missing circuit element, IEEE Transactions on Circuit Theory 18 (5) (1971) 507–519

L. Chua, Memristor-the missing circuit element, IEEE Transactions on Circuit Theory 18 (5) (1971) 507–519. doi:10.1109/tct.1971.1083337. URLhttp://dx.doi.org/10.1109/TCT.1971.1083337

work page doi:10.1109/tct.1971.1083337 1971
[20]

D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, The missing memristor found, Nature 453 (7191) (2008) 80–83. doi:10.1038/nature06932. URLhttp://dx.doi.org/10.1038/nature06932

work page doi:10.1038/nature06932 2008
[21]

V. A. Kulagin, A. N. Matsukatova, V. V. Ryl’kov, V. A. Demin, Rein- forcement learning of spiking neural networks using trace variables for synaptic weights with memristive plasticity, Russian Microelectronics 54 (3) (2025) 230–239. doi:10.1134/s1063739725600475. URLhttp://dx.doi.org/10.1134/S1063739725600475

work page doi:10.1134/s1063739725600475 2025
[22]

Mikhaylov, A

A. Mikhaylov, A. Pimashkin, Y. Pigareva, S. Gerasimova, E. Gryaznov, S. Shchanikov, A. Zuev, M. Talanov, I. Lavrov, V. Demin, V. Erokhin, S. Lobov, I. Mukhina, V. Kazantsev, H. Wu, B. Spagnolo, Neurohybrid memristivecmos-integratedsystemsforbiosensorsandneuroprosthetics, Frontiers in Neuroscience 14 (2020) 358. doi:10.3389/fnins.2020.00358. URLhttp://dx.d...

work page doi:10.3389/fnins.2020.00358 2020
[23]

Rybka, Y

R. Rybka, Y. Davydov, D. Vlasov, A. Serenko, A. Sboev, V. Ilyin, Com- parison of bagging and sparcity methods for connectivity reduction in spiking neural networks with memristive plasticity, Big Data and Cog- nitive Computing 8 (3) (2024) 22. doi:10.3390/bdcc8030022. URLhttp://dx.doi.org/10.3390/bdcc8030022

work page doi:10.3390/bdcc8030022 2024
[24]

D. S. Vlasov, R. B. Rybka, A. V. Serenko, A. G. Sboev, Spiking neu- ral network actor–critic reinforcement learning with temporal coding and reward-modulated plasticity, Moscow University Physics Bulletin 79 (S2) (2024) S944–S952. doi:10.3103/s0027134924702400. URLhttp://dx.doi.org/10.3103/S0027134924702400

work page doi:10.3103/s0027134924702400 2024
[25]

B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chan- drasekaran, J.-M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla, K. Boahen, Neurogrid: A mixed-analog-digital multichip system for 39 large-scale neural simulations, Proceedings of the IEEE 102 (5) (2014) 699–716. doi:10.1109/jproc.2014.2313565. URLhttp://dx.doi.org/10.1109/JPROC.2014.2313565

work page doi:10.1109/jproc.2014.2313565 2014
[26]

Thrun, T

S. Thrun, T. Mitchell, Lifelong robot learning, Robotics and Au- tonomous Systems 15 (1) (1995) 25 – 46

1995
[27]

B. W. Edwards, G. H. Wakefield, On the statistics of binned neu- ral point processes: the bernoulli approximation and ar representa- tion of the pst histogram, Biological Cybernetics 64 (2) (1990) 145–153. doi:10.1007/bf02331344. URLhttp://dx.doi.org/10.1007/BF02331344

work page doi:10.1007/bf02331344 1990
[28]

Zenke, S

F. Zenke, S. Ganguli, Superspike: Supervised learning in multilayer spiking neural networks, Neural Computation 30 (6) (2018) 1514–1541. doi:10.1162/neco_a_01086. URLhttp://dx.doi.org/10.1162/neco_a_01086

work page doi:10.1162/neco_a_01086 2018
[29]

S. Y. Gordleeva, S. V. Stasenko, A. V. Semyanov, A. E. Dityatev, V. B. Kazantsev, Bi-directional astrocytic regulation of neuronal activ- ity within a network, Frontiers in Computational Neuroscience 6 (2012). doi:10.3389/fncom.2012.00092. URLhttp://dx.doi.org/10.3389/fncom.2012.00092

work page doi:10.3389/fncom.2012.00092 2012
[30]

Semyanov, C

A. Semyanov, C. Henneberger, A. Agarwal, Making sense of astrocytic calcium signals — from acquisition to interpretation, Nature Reviews Neuroscience 21 (10) (2020) 551–564. doi:10.1038/s41583-020-0361-8. URLhttp://dx.doi.org/10.1038/s41583-020-0361-8

work page doi:10.1038/s41583-020-0361-8 2020
[31]

Ullah, P

G. Ullah, P. Jung, A. Cornell-Bell, Anti-phase calcium oscillations in astrocytes via inositol (1, 4, 5)-trisphosphate regeneration, Cell Calcium 39 (3) (2006) 197–208. doi:10.1016/j.ceca.2005.10.009. URLhttps://doi.org/10.1016/j.ceca.2005.10.009

work page doi:10.1016/j.ceca.2005.10.009 2006
[32]

Santello, N

M. Santello, N. Toni, A. Volterra, Astrocyte function from information processing to cognition and cognitive impairment, Nature Neuroscience 22 (2) (2019) 154–166. doi:10.1038/s41593-018-0325-8. URLhttp://dx.doi.org/10.1038/s41593-018-0325-8 40

work page doi:10.1038/s41593-018-0325-8 2019
[33]

Pabst, O

M. Pabst, O. Braganza, H. Dannenberg, W. Hu, L. Pothmann, J. Rosen, I.Mody, K.vanLoo, K.Deisseroth, A.J.Becker, S.Schoch, H.Beck, As- trocyte intermediaries of septal cholinergic modulation in the hippocam- pus, Neuron 90 (4) (2016) 853–865. doi:10.1016/j.neuron.2016.04.003. URLhttp://dx.doi.org/10.1016/j.neuron.2016.04.003

work page doi:10.1016/j.neuron.2016.04.003 2016
[34]

Kvatinsky, M

S. Kvatinsky, M. Ramadan, E. G. Friedman, A. Kolodny, Vteam: A general model for voltage-controlled memristors, IEEE Transactions on Circuits and Systems II: Express Briefs 62 (8) (2015) 786–790. doi:10.1109/tcsii.2015.2433536. URLhttp://dx.doi.org/10.1109/TCSII.2015.2433536

work page doi:10.1109/tcsii.2015.2433536 2015
[35]

A. N. Matsukatova, N. V. Prudnikov, V. A. Kulagin, S. Battistoni, A. A. Minnekhanov, A. D. Trofimov, A. A. Nesmelov, S. A. Zavyalov, Y. N. Malakhova, M. Parmeggiani, A. Ballesio, S. L. Marasso, S. N. Chvalun, V. A. Demin, A. V. Emelyanov, V. Erokhin, Combination of organic- based reservoir computing and spiking neuromorphic systems for a ro- bust and effi...

work page doi:10.1002/aisy.202200407 2023
[36]

A.N.Matsukatova, A.V.Emelyanov, V.A.Kulagin, A.Y.Vdovichenko, A. A. Minnekhanov, V. A. Demin, Nanocomposite parylene-c memris- tors with embedded ag nanoparticles for biomedical data processing, Or- ganic Electronics 102 (2022) 106455. doi:10.1016/j.orgel.2022.106455. URLhttp://dx.doi.org/10.1016/j.orgel.2022.106455

work page doi:10.1016/j.orgel.2022.106455 2022
[37]

Shchanikov, L

S. Shchanikov, L. Korolev, I. Bordanov, A. Belov, E. Gryaznov, A. Mikhaylov, Modeling and hardware implementation of vector-matrix multiplier based on 32x8 1t1r memristive crossbar array, in: 2023 7th Scientific School Dynamics of Complex Networks and their Applications (DCNA), IEEE, 2023, pp. 249–251

2023
[38]

Memriboardframework,https://github.com/neurocomputer/MemriBoard, accessed: 2026-03-04

2026
[39]

Mikhaylov, A

A. Mikhaylov, A. Belov, D. Korolev, I. Antonov, V. Kotomina, A. Kotina, E. Gryaznov, A. Sharapov, M. Koryazhkina, R. Kryukov, et al., Multilayer metal-oxide memristive device with stabilized resistive switching, Advanced materials technologies 5 (1) (2020) 1900607. 41

2020
[40]

A. N. Mikhaylov, E. G. Gryaznov, M. N. Koryazhkina, I. A. Bordanov, S. A. Shchanikov, O. A. Telminov, V. B. Kazantsev, Neuromorphic com- puting based on cmos-integrated memristive arrays: current state and perspectives, Supercomputing Frontiers and Innovations 10 (2) (2023) 77–103

2023
[41]

Z. Liu, J. Mei, J. Tang, M. Xu, B. Gao, K. Wang, S. Ding, Q. Liu, Q. Qin, W. Chen, et al., A memristor-based adaptive neuromorphic decoder for brain–computer interfaces, Nature Electronics 8 (4) (2025) 362–372

2025
[42]

Intel core i5-12450h benchmark,https://www.cpubenchmark.net/ cpu.php?cpu=Intel+Core+i5-12450H&id=4727, accessed: 2026-03-04

2026
[43]

C. J. C. H. Watkins, et al., Learning from delayed rewards (1989)

1989
[44]

R. S. Sutton, A. G. Barto, et al., Reinforcement learning: An introduc- tion, Vol. 1, MIT press Cambridge, 1998

1998
[45]

Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959

R. Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959. URLhttps://books.google.ru/books?id=a80DAQAAIAAJ

1959
[46]

T. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22. doi:10.1016/0196- 8858(85)90002-8. URLhttp://dx.doi.org/10.1016/0196-8858(85)90002-8

work page doi:10.1016/0196- 1985
[47]

W. R. Thompson, On the likelihood that one unknown probability ex- ceedsanotherinviewoftheevidenceoftwosamples, Biometrika25(3/4) (1933) 285. doi:10.2307/2332286. URLhttp://dx.doi.org/10.2307/2332286

work page doi:10.2307/2332286 1933
[48]

Bellemare, S

M. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, R. Munos, Unifying count-based exploration and intrinsic motivation, in: D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 29, Curran Associates, Inc., 2016

2016
[49]

Ostrovski, M

G. Ostrovski, M. G. Bellemare, A. van den Oord, R. Munos, Count- based exploration with neural density models, in: D. Precup, Y. W. Teh 42 (Eds.), Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 2721–2730. URLhttps://proceedings.mlr.press/v70/ostrovski17a.html

2017
[50]

Pathak, P

D. Pathak, P. Agrawal, A. A. Efros, T. Darrell, Curiosity-driven ex- ploration by self-supervised prediction, in: International conference on machine learning, PMLR, 2017, pp. 2778–2787

2017
[51]

Exploration by Random Network Distillation

Y. Burda, H. Edwards, A. Storkey, O. Klimov, Exploration by random network distillation, arXiv preprint arXiv:1810.12894 (2018)

work page Pith review arXiv 2018
[52]

Pecháč, M

M. Pecháč, M. Chovanec, I. Farkaš, Self-supervised network distillation: An effective approach to exploration in sparse reward environments, Neurocomputing 599 (2024) 128033. doi:10.1016/j.neucom.2024.128033. URLhttp://dx.doi.org/10.1016/j.neucom.2024.128033

work page doi:10.1016/j.neucom.2024.128033 2024
[53]

R. S. Sutton, D. Precup, S. Singh, Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning, Artificial intelligence 112 (1-2) (1999) 181–211

1999
[54]

Bacon, J

P.-L. Bacon, J. Harb, D. Precup, The option-critic architecture, Pro- ceedings of the AAAI Conference on Artificial Intelligence 31 (1) (Feb. 2017). doi:10.1609/aaai.v31i1.10916. URLhttp://dx.doi.org/10.1609/aaai.v31i1.10916

work page doi:10.1609/aaai.v31i1.10916 2017
[55]

Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, P. Abbeel, Rl2: Fast reinforcement learning via slow reinforcement learning, arXiv preprint arXiv:1611.02779 (2016)

work page Pith review arXiv 2016
[56]

Zintgraf, S

L. Zintgraf, S. Schulze, C. Lu, L. Feng, M. Igl, K. Shiarlis, Y. Gal, K. Hofmann, S. Whiteson, Varibad: Variational bayes-adaptive deep rl via meta-learning, Journal of Machine Learning Research 22 (289) (2021) 1–39

2021
[57]

Clune, B

J. Clune, B. Norman, First-explore, then exploit: Meta-learning to solve hard exploration-exploitation trade-offs, in: Advances in Neural Information Processing Systems 37, NeurIPS 2024, Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024, p. 27490–27528. doi:10.52202/079017-0864. URLhttp://dx.doi.org/10.52202/079017-0864 43

work page doi:10.52202/079017-0864 2024
[58]

Durstewitz, B

D. Durstewitz, B. Averbeck, G. Koppe, What neuroscience can tell ai about learning in continuously changing environments, Nature Machine Intelligence 7 (12) (2025) 1897–1912. doi:10.1038/s42256-025-01146-z. URLhttp://dx.doi.org/10.1038/s42256-025-01146-z

work page doi:10.1038/s42256-025-01146-z 2025
[59]

S. S. Chowdhury, D. Sharma, A. Kosta, K. Roy, Neuromorphic comput- ing for robotic vision: algorithms to hardware advances, Communica- tions Engineering 4 (1) (Aug. 2025). doi:10.1038/s44172-025-00492-5. URLhttp://dx.doi.org/10.1038/s44172-025-00492-5

work page doi:10.1038/s44172-025-00492-5 2025
[60]

A. Novo, F. Lobon, H. Garcia de Marina, S. Romero, F. Barranco, Neuromorphic perception and navigation for mobile robots: A review, ACM Computing Surveys 56 (10) (2024) 1–37. doi:10.1145/3656469. URLhttp://dx.doi.org/10.1145/3656469

work page doi:10.1145/3656469 2024
[61]

Networks of spiking neurons: the third generation of neural network models.Neural Networks, 10(9):1659–1671, 1997

W. Maass, Networks of spiking neurons: The third generation of neural network models, Neural Networks 10 (9) (1997) 1659–1671. doi:10.1016/s0893-6080(97)00011-7. URLhttp://dx.doi.org/10.1016/S0893-6080(97)00011-7

work page doi:10.1016/s0893-6080(97)00011-7 1997
[62]

K. Roy, A. Jaiswal, P. Panda, Towards spike-based machine intelli- gence with neuromorphic computing, Nature 575 (7784) (2019) 607–617. doi:10.1038/s41586-019-1677-2. URLhttp://dx.doi.org/10.1038/s41586-019-1677-2

work page doi:10.1038/s41586-019-1677-2 2019
[63]

Loihi: a neuromorphic many- core processor with on-chip learning,

M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Cho- day, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C.-K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y.-H. Weng, A. Wild, Y. Yang, H. Wang, Loihi: A neuromorphic many- core processor with on-chip learning, IEEE Micro 38 (1) (2018) 82–99. doi:10.1109...

work page doi:10.1109/mm.2018.112130359 2018
[64]

B., Manohar, R., Risk, W

F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G.-J. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jack- son, D. S. Modha, Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip, IEEE Transactions on Computer-Aided Design of Inte...

work page doi:10.1109/tcad.2015.2474396 2015
[65]

S. B. Furber, F. Galluppi, S. Temple, L. A. Plana, The spin- naker project, Proceedings of the IEEE 102 (5) (2014) 652–665. doi:10.1109/jproc.2014.2304638. URLhttp://dx.doi.org/10.1109/JPROC.2014.2304638

work page doi:10.1109/jproc.2014.2304638 2014
[66]

Benosman, C

R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, C. Bartolozzi, Event- based visual flow, IEEE Transactions on Neural Networks and Learning Systems 25 (2) (2014) 407–417. doi:10.1109/tnnls.2013.2273537. URLhttp://dx.doi.org/10.1109/TNNLS.2013.2273537

work page doi:10.1109/tnnls.2013.2273537 2014
[67]

Barranco, C

F. Barranco, C. Fermuller, Y. Aloimonos, Bio-inspired Motion Estima- tionwithEvent-DrivenSensors, SpringerInternationalPublishing, 2015, p. 309–321. doi:10.1007/978-3-319-19258-1_27. URLhttp://dx.doi.org/10.1007/978-3-319-19258-1_27

work page doi:10.1007/978-3-319-19258-1_27 2015
[68]

Rebecq, T

H. Rebecq, T. Horstschaefer, D. Scaramuzza, Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear opti- mization, in: Procedings of the British Machine Vision Confer- ence 2017, BMVC 2017, British Machine Vision Association, 2017. doi:10.5244/c.31.16. URLhttp://dx.doi.org/10.5244/C.31.16

work page doi:10.5244/c.31.16 2017
[69]

Y. Zhou, G. Gallego, S. Shen, Event-based stereo visual odom- etry, IEEE Transactions on Robotics 37 (5) (2021) 1433–1450. doi:10.1109/tro.2021.3062252. URLhttp://dx.doi.org/10.1109/TRO.2021.3062252

work page doi:10.1109/tro.2021.3062252 2021
[70]

2022 , booktitle =

U. Rancon, J. Cuadrado-Anibarro, B. R. Cottereau, T. Masquelier, Stereospike: Depth learning with a spiking neural network, IEEE Access 10 (2022) 127428–127439. doi:10.1109/access.2022.3226484. URLhttp://dx.doi.org/10.1109/ACCESS.2022.3226484

work page doi:10.1109/access.2022.3226484 2022
[71]

C. Lee, A. K. Kosta, A. Z. Zhu, K. Chaney, K. Daniilidis, K. Roy, Spike- FlowNet: Event-Based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks, Springer International Publishing, 2020, p. 366–382. doi:10.1007/978-3-030-58526-6_22. URLhttp://dx.doi.org/10.1007/978-3-030-58526-6_22 45

work page doi:10.1007/978-3-030-58526-6_22 2020
[72]

A. K. Kosta, K. Roy, Adaptive-spikenet: Event-based opti- cal flow estimation using spiking neural networks with learnable neuronal dynamics, in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023, p. 6021–6027. doi:10.1109/icra48891.2023.10160551. URLhttp://dx.doi.org/10.1109/ICRA48891.2023.10160551

work page doi:10.1109/icra48891.2023.10160551 2023
[73]

D. Ball, S. Heath, J. Wiles, G. Wyeth, P. Corke, M. Milford, Open- ratslam: an open source brain-based slam system, Autonomous Robots 34 (3) (2013) 149–176. doi:10.1007/s10514-012-9317-9. URLhttp://dx.doi.org/10.1007/s10514-012-9317-9

work page doi:10.1007/s10514-012-9317-9 2013
[74]

Milford, G

M. Milford, G. Wyeth, Mapping a suburb with a single camera using a biologically inspired slam system, IEEE Transactions on Robotics 24 (5) (2008) 1038–1053. doi:10.1109/tro.2008.2004520. URLhttp://dx.doi.org/10.1109/TRO.2008.2004520

work page doi:10.1109/tro.2008.2004520 2008
[75]

F. Yu, J. Shang, Y. Hu, M. Milford, Neuroslam: a brain-inspired slam system for 3d environments, Biological Cybernetics 113 (5–6) (2019) 515–545. doi:10.1007/s00422-019-00806-9. URLhttp://dx.doi.org/10.1007/s00422-019-00806-9

work page doi:10.1007/s00422-019-00806-9 2019
[76]

Banino, C

A. Banino, C. Barry, B. Uria, C. Blundell, T. Lillicrap, P. Mirowski, A. Pritzel, M. J. Chadwick, T. Degris, J. Modayil, G. Wayne, H. Soyer, F. Viola, B. Zhang, R. Goroshin, N. Rabinowitz, R. Pascanu, C. Beattie, S. Petersen, A. Sadik, S. Gaffney, H. King, K. Kavukcuoglu, D. Hass- abis, R. Hadsell, D. Kumaran, Vector-based navigation using grid-like repre...

work page doi:10.1038/s41586-018-0102-6 2018
[77]

V. Edvardsen, Long-range navigation by path integration and decod- ing of grid cells in a neural network, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, p. 4348–4355. doi:10.1109/ijcnn.2017.7966406. URLhttp://dx.doi.org/10.1109/IJCNN.2017.7966406

work page doi:10.1109/ijcnn.2017.7966406 2017
[78]

Y. Chen, Z. Xiong, J. Liu, C. Yang, L. Chao, Y. Peng, A posi- tioning method based on place cells and head-direction cells for in- ertial/visual brain-inspired navigation system, Sensors 21 (23) (2021) 46

2021
[79]

URLhttp://dx.doi.org/10.3390/s21237988

doi:10.3390/s21237988. URLhttp://dx.doi.org/10.3390/s21237988

work page doi:10.3390/s21237988
[80]

J. Liu, L. J. Mcdaid, J. Harkin, S. Karim, A. P. Johnson, A. G. Mil- lard, J. Hilder, D. M. Halliday, A. M. Tyrrell, J. Timmis, Exploring self-repair in a coupled spiking astrocyte neural network, IEEE Transac- tions on Neural Networks and Learning Systems 30 (3) (2019) 865–875. doi:10.1109/tnnls.2018.2854291. URLhttp://dx.doi.org/10.1109/TNNLS.2018.2854291

work page doi:10.1109/tnnls.2018.2854291 2019

Showing first 80 references.