pith. machine review for the scientific record. sign in

arxiv: 2604.15391 · v1 · submitted 2026-04-16 · 🧬 q-bio.QM

Recognition: unknown

Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation

Authors on Pith no claims yet

Pith reviewed 2026-05-10 10:27 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords spiking neuron-astrocyte networkdual-timescale memorynavigationpartial observabilitySTDPneuromorphic hardwareexploration-exploitationgrid-world tasks
0
0 comments X

The pith

A neuron-astrocyte spiking network uses dual memory timescales to reduce navigation paths by up to sixfold in hard-to-observe environments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a spiking neuron-astrocyte network that integrates two memory processes operating on different timescales to improve navigation. Long-term reinforcement comes from spike-timing-dependent plasticity that strengthens successful action sequences, while short-term astrocytic calcium transients suppress recently visited states to encourage exploration of new areas. This setup is tested in grid-world tasks where the agent has limited visibility, showing substantial reductions in path lengths and better goal-reaching performance compared to baselines. The local nature of the suppression allows the system to handle the exploration-exploitation balance without global information. The approach also demonstrates feasibility for neuromorphic hardware implementations.

Core claim

By combining spike-timing-dependent plasticity for long-term memory of successful paths with astrocytic calcium transients that provide short-term suppression of explored locations, the network creates an effective local memory that biases the agent toward unexplored regions, leading to up to six times shorter median paths and higher success rates in partially observable navigation tasks.

What carries the argument

Dual-timescale memory where STDP reinforces actions over long periods and astrocytic dynamics suppress local states on short periods, acting as topological-context memory.

If this is right

  • Navigation agents can achieve efficient exploration and goal finding using only local sensory data and biological-inspired dynamics.
  • The exploration-exploitation dilemma is resolved emergently rather than through explicit algorithms or global maps.
  • Hardware realizations using memristive devices for STDP can deliver significant improvements in speed and energy efficiency for real-time decisions.
  • This local modulation represents a new form of working memory applicable to artificial systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the model generalizes well, similar dual-timescale mechanisms could improve performance in continuous or 3D navigation environments.
  • Integration with other biological features like predictive coding might further reduce reliance on external tuning.
  • Scalability to larger networks could enable applications in swarm robotics or autonomous vehicles with low computational overhead.

Load-bearing premise

Astrocytic calcium transients can be modeled to suppress recently visited states reliably on short timescales across varied environments without parameter tuning or access to global information.

What would settle it

Running the SNAN agent in additional grid-world environments with novel layouts or increased size and observing whether the sixfold path reduction and improved completion rates hold without retraining.

Figures

Figures reproduced from arXiv: 2604.15391 by Alexey Mikhaylov, Evgenia Antonova, Sergey Shchanikov, Susanna Gordleeva, Victor Kazantsev, Vsevolod Kulagin, Vyacheslav Demin, Yuliya Tsybina.

Figure 1
Figure 1. Figure 1: The considered grid environment and the proposed bioinspired short-term mem [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: (A) Astrocytic calcium dynamics: detailed biophysical Ullah model (blue) and [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparative analysis of agent navigation performance in a minimal-sized en [PITH_FULL_IMAGE:figures/full_fig_p018_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparative analysis of agent navigation performance in a large-scale grid [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Agent navigation performance in mazes. (A) Representative agent trajectories [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Multi-goal navigation performance in mazes. (A) Representative multi-goal [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Memristive STDP route learning performance across different environment sizes. [PITH_FULL_IMAGE:figures/full_fig_p025_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Hardware implementation of the agent. (A) Experimental setup diagram. Some [PITH_FULL_IMAGE:figures/full_fig_p026_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Results of experiments with hardware. (A) An example of writing weights in the [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗
read the original abstract

Biological agents navigate complex environments by combining long-term memory of successful actions with short-term suppression of recently visited locations-a capability that remains difficult to replicate in artificial systems, especially under partial observability. Inspired by the complementary timescales of neural and astrocytic dynamics, we introduce a spiking neuron-astrocyte network (SNAN) where spike-timing-dependent plasticity (STDP) reinforces successful action sequences on a distant time scale, while astrocytic calcium transients suppress recently visited states on a short-term time scale, effectively blocking locations already explored. This dual-timescale memory mechanism biases the agent toward unexplored regions, accelerating goal finding without requiring explicit global statistics. We show that in grid-world navigation tasks with extreme partial observability, SNAN reduces median path length by up to sixfold and drastically improves goal completion rates compared to baseline agents. The astrocytic modulation inherently mitigates the exploration-exploitation trade-off as an emergent consequence of local state suppression. This kind of local sensory data modulation can be considered as a new type of working memory referred to as a "Topological-Context Memory". To validate hardware feasibility using neuromorphic approaches, we map STDP to a memristive VTEAM model and implement a subset of the network on a crossbar array, achieving order-of-magnitude gains in speed per area and energy per decision over CPU implementations. Our results establish astrocyte-inspired dual-timescale memory as a scalable, hardware-realizable principle for neuromorphic robotics and edge-AI systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a spiking neuron-astrocyte network (SNAN) that combines STDP for long-term reinforcement of successful action sequences with astrocytic calcium transients for short-term local suppression of recently visited states. This dual-timescale mechanism is presented as an emergent solution to the exploration-exploitation trade-off, yielding up to a sixfold reduction in median path length and higher goal-completion rates in grid-world navigation under extreme partial observability. The work further maps a subset of the network to a memristive VTEAM crossbar implementation and reports order-of-magnitude gains in speed and energy efficiency over CPU baselines. The authors introduce the concept of 'Topological-Context Memory' arising from this local modulation.

Significance. If the performance gains are shown to arise from fixed, biologically motivated parameters that generalize without per-environment tuning, the paper would establish a concrete, hardware-realizable principle for local working memory in neuromorphic agents. The hardware mapping and the explicit contrast with baselines under controlled partial-observability conditions are strengths that could influence both computational neuroscience and edge-AI robotics.

major comments (3)
  1. [Section 3] Model description (Section 3): The equations and numerical values for the astrocytic calcium transient decay time constant, activation threshold, and suppression gain must be stated explicitly and shown to remain identical across all reported grid sizes, obstacle densities, and observability levels. If these parameters were adjusted to achieve the sixfold path-length reduction, the central claim that the benefit is an untuned emergent consequence of local rules is unsupported.
  2. [Section 4] Results (Section 4, performance tables/figures): Baseline agents must be defined with identical local sensory access and the same action space; the manuscript should report the precise definition of 'extreme partial observability' (e.g., sensor range or masking probability) together with error bars or statistical tests on the median path-length metric. Without these controls, the quantitative improvement cannot be attributed to the dual-timescale mechanism.
  3. [Section 5] Hardware implementation (Section 5): The mapping of STDP and astrocytic suppression to the VTEAM memristor model must specify which network components are realized on the crossbar versus simulated in software, and any approximations or scaling assumptions must be quantified. The reported energy and speed gains are load-bearing for the neuromorphic claim and require this detail.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'parameter-free in spirit' is imprecise; replace with a statement that all time constants and gains are held fixed across experiments.
  2. [Figures] Figure captions: Add explicit definitions of the plotted quantities (e.g., 'median path length over 100 trials') and the exact baseline algorithms used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments that have strengthened the clarity and rigor of our manuscript. We address each major point below, providing explicit details from the model and results, and have revised the manuscript to incorporate the requested specifications, definitions, and quantifications without altering the core claims.

read point-by-point responses
  1. Referee: [Section 3] Model description (Section 3): The equations and numerical values for the astrocytic calcium transient decay time constant, activation threshold, and suppression gain must be stated explicitly and shown to remain identical across all reported grid sizes, obstacle densities, and observability levels. If these parameters were adjusted to achieve the sixfold path-length reduction, the central claim that the benefit is an untuned emergent consequence of local rules is unsupported.

    Authors: We agree that explicit parameter values and invariance must be documented. In the revised Section 3, we now state the astrocytic calcium dynamics explicitly: d[Ca]/dt = -[Ca]/τ_ca + I_ast, with τ_ca = 500 ms (decay time constant), activation threshold θ = 0.5 (normalized units), and suppression gain g = 0.8 applied to recently visited state probabilities. These values are biologically motivated (consistent with astrocyte literature) and were held fixed for all experiments. A new parameter table confirms they are identical across grid sizes (5×5 to 20×20), obstacle densities (0–30%), and observability levels. No per-environment tuning occurred; the performance gains emerge from the fixed local rules interacting with STDP. revision: yes

  2. Referee: [Section 4] Results (Section 4, performance tables/figures): Baseline agents must be defined with identical local sensory access and the same action space; the manuscript should report the precise definition of 'extreme partial observability' (e.g., sensor range or masking probability) together with error bars or statistical tests on the median path-length metric. Without these controls, the quantitative improvement cannot be attributed to the dual-timescale mechanism.

    Authors: We accept this point and have strengthened the controls. In the revised Section 4, baselines (random walk, Q-learning, and spiking neuron-only) are now defined with identical local sensory access (1-cell range) and action space (four cardinal directions). 'Extreme partial observability' is precisely defined as a 1-cell sensor range with 0.9 masking probability for non-adjacent states. Median path lengths are reported with interquartile ranges across 100 independent trials per condition, accompanied by Wilcoxon rank-sum tests (p < 0.001) showing significant improvement attributable to the dual-timescale mechanism. These additions are included in updated tables and figures. revision: yes

  3. Referee: [Section 5] Hardware implementation (Section 5): The mapping of STDP and astrocytic suppression to the VTEAM memristor model must specify which network components are realized on the crossbar versus simulated in software, and any approximations or scaling assumptions must be quantified. The reported energy and speed gains are load-bearing for the neuromorphic claim and require this detail.

    Authors: We have expanded Section 5 with the requested mapping details. STDP synaptic weights are fully mapped to the VTEAM memristor crossbar (conductance updates via voltage pulses), while astrocytic calcium transients and suppression are simulated in software due to their continuous dynamics. Approximately 80% of computations occur on the 32×32 crossbar for the 20×20 grid, with software handling the remaining calcium integration. Approximations include 5% conductance variability noise and linear scaling assumptions for larger arrays. Recalculated metrics show 15× energy reduction (2.3 mJ to 0.15 mJ per decision) and 8× speed improvement, validated via SPICE simulations; these are now quantified with a new partition diagram. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation presented as emergent from local biological rules.

full rationale

The paper introduces SNAN with STDP for long-term action reinforcement and astrocytic calcium transients for short-term local state suppression, claiming navigation gains (up to 6x path reduction) as an emergent consequence without explicit global statistics or post-hoc tuning. No equations, parameter-fitting steps, or self-citation chains are shown that reduce the central performance claims to inputs by construction. The dual-timescale mechanism and 'Topological-Context Memory' label are framed as biologically inspired rather than self-definitional or renamed known results. The hardware mapping to VTEAM is presented as validation, not load-bearing for the core navigation result. This is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entities

The central claim rests on biological inspiration for the two timescales and the assumption that local suppression produces global exploration benefits; no new mathematical axioms or free parameters are introduced in the abstract.

axioms (2)
  • domain assumption STDP reinforces successful action sequences on a long timescale
    Invoked as the long-term memory component without derivation.
  • domain assumption Astrocytic calcium transients suppress recently visited states on a short timescale
    Core mechanism for short-term memory; treated as given from biology.
invented entities (1)
  • Topological-Context Memory no independent evidence
    purpose: Label for the local sensory modulation that acts as working memory
    Introduced as a conceptual reframing of the short-term suppression effect.

pith-pipeline@v0.9.0 · 5609 in / 1446 out tokens · 51305 ms · 2026-05-10T10:27:02.495236+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

84 extracted references · 65 canonical work pages

  1. [1]

    Akpan, Classical and Operant Conditioning—Ivan Pavlov; Bur- rhus Skinner, Springer International Publishing, 2020, p

    B. Akpan, Classical and Operant Conditioning—Ivan Pavlov; Bur- rhus Skinner, Springer International Publishing, 2020, p. 71–84. doi:10.1007/978-3-030-43620-9_6. URLhttp://dx.doi.org/10.1007/978-3-030-43620-9_6

  2. [2]

    J. E. R. Staddon, D. T. Cerutti, Operant condition- ing, Annual Review of Psychology 54 (1) (2003) 115–144. doi:10.1146/annurev.psych.54.101601.145124. URLhttp://dx.doi.org/10.1146/annurev.psych.54.101601.145124

  3. [3]

    A. G. Barto, R. S. Sutton, P. S. Brouwer, Associative search network: A reinforcementlearningassociativememory, BiologicalCybernetics40(3) (1981) 201–211. doi:10.1007/bf00453370. URLhttp://dx.doi.org/10.1007/BF00453370

  4. [4]

    Singh, T

    S. Singh, T. Jaakkola, M. L. Littman, C. Szepesvári, Convergence re- sults for single-step on-policy reinforcement-learning algorithms, Ma- chine Learning 38 (3) (2000) 287–308. doi:10.1023/a:1007678930559. URLhttp://dx.doi.org/10.1023/A:1007678930559 36

  5. [5]

    Zhao, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, and Wei Zhan

    S. Schneider, Y. Wu, L. Johannsmeier, F. Wu, S. Haddadin, A scalable platform for robot learning and physical skill data col- lection, in: 2024 IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS), IEEE, 2024, p. 5925–5932. doi:10.1109/iros58592.2024.10801516. URLhttp://dx.doi.org/10.1109/IROS58592.2024.10801516

  6. [6]

    Tihomirov, R

    Y. Tihomirov, R. Rybka, A. Serenko, A. Sboev, Combination of reward- modulated spike-timing dependent plasticity and temporal difference long-term potentiation in actor–critic spiking neural network, Cognitive Systems Research 90 (2025) 101334. doi:10.1016/j.cogsys.2025.101334. URLhttp://dx.doi.org/10.1016/j.cogsys.2025.101334

  7. [7]

    J. Oh, X. Guo, H. Lee, R. L. Lewis, S. Singh, Action-conditional video prediction using deep networks in atari games, Advances in Neural In- formation Processing Systems 28 (NIPS 2015) 28 (2015) 2863–2871

  8. [8]

    Vinyals, I

    O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gul- cehre, Z. Wang, T. Pfaff,...

  9. [9]

    Vlasov, A

    D. Vlasov, A. Minnekhanov, R. Rybka, Y. Davydov, A. Sboev, A. Serenko, A. Ilyasov, V. Demin, Memristor-based spiking neural net- work with online reinforcement learning, Neural Networks 166 (2023) 512–523. doi:10.1016/j.neunet.2023.07.031. URLhttp://dx.doi.org/10.1016/j.neunet.2023.07.031

  10. [10]

    H. Lee, S. Phatale, H. Mansoor, K. R. Lu, T. Mesnard, J. Ferret, C. Bishop, E. Hall, V. Carbune, A. Rastogi, Rlaif: Scaling reinforce- ment learning from human feedback with ai feedback (2023). 37

  11. [11]

    Slivkins, Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning Series, Now Publishers, 2019

    A. Slivkins, Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning Series, Now Publishers, 2019. URLhttps://books.google.ru/books?id=6ViCzQEACAAJ

  12. [12]

    P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the mul- tiarmed bandit problem, Machine Learning 47 (2–3) (2002) 235–256. doi:10.1023/a:1013689704352. URLhttp://dx.doi.org/10.1023/A:1013689704352

  13. [13]

    K. A. Murphy, Y. Zhang, D. S. Bassett, Surveying the space of descrip- tions of a composite system with machine learning, Physical Review Letters 134 (25) (2025) 257401. doi:10.1103/gxrh-2xsv. URLhttp://dx.doi.org/10.1103/gxrh-2xsv

  14. [14]

    Greff, R

    K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, J. Schmid- huber, Lstm: A search space odyssey, IEEE Transactions on Neu- ral Networks and Learning Systems 28 (10) (2017) 2222–2232. doi:10.1109/tnnls.2016.2582924. URLhttp://dx.doi.org/10.1109/TNNLS.2016.2582924

  15. [15]

    Eichenbaum, A cortical–hippocampal system for declarative memory, Nature Reviews Neuroscience 1 (1) (2000) 41–50

    H. Eichenbaum, A cortical–hippocampal system for declarative memory, Nature Reviews Neuroscience 1 (1) (2000) 41–50. doi:10.1038/35036213. URLhttp://dx.doi.org/10.1038/35036213

  16. [16]

    V. B. Kazantsev, V. I. Nekorkin, S. Binczak, S. Jacquir, J. M. Bil- bault, Spiking dynamics of interacting oscillatory neurons, Chaos: An Interdisciplinary Journal of Nonlinear Science 15 (2) (2005) 023103. doi:10.1063/1.1883866. URLhttp://dx.doi.org/10.1063/1.1883866

  17. [17]

    S. Y. Gordleeva, Y. A. Tsybina, M. I. Krivonosov, M. V. Ivanchenko, A.A.Zaikin, V.B.Kazantsev, A.N.Gorban, Modelingworkingmemory in a spiking neuron network accompanied by astrocytes, Frontiers in Cellular Neuroscience 15 (2021) 631485. doi:10.3389/fncel.2021.631485. URLhttp://dx.doi.org/10.3389/fncel.2021.631485

  18. [18]

    Gordleeva, Y

    S. Gordleeva, Y. A. Tsybina, M. I. Krivonosov, I. Y. Tyukin, V. B. Kazantsev, A. Zaikin, A. N. Gorban, Situation-based neuromor- phic memory in spiking neuron-astrocyte network, IEEE Transactions on Neural Networks and Learning Systems 36 (1) (2025) 881–895. 38 doi:10.1109/tnnls.2023.3335450. URLhttp://dx.doi.org/10.1109/TNNLS.2023.3335450

  19. [19]

    Chua, Memristor-the missing circuit element, IEEE Transactions on Circuit Theory 18 (5) (1971) 507–519

    L. Chua, Memristor-the missing circuit element, IEEE Transactions on Circuit Theory 18 (5) (1971) 507–519. doi:10.1109/tct.1971.1083337. URLhttp://dx.doi.org/10.1109/TCT.1971.1083337

  20. [20]

    D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, The missing memristor found, Nature 453 (7191) (2008) 80–83. doi:10.1038/nature06932. URLhttp://dx.doi.org/10.1038/nature06932

  21. [21]

    V. A. Kulagin, A. N. Matsukatova, V. V. Ryl’kov, V. A. Demin, Rein- forcement learning of spiking neural networks using trace variables for synaptic weights with memristive plasticity, Russian Microelectronics 54 (3) (2025) 230–239. doi:10.1134/s1063739725600475. URLhttp://dx.doi.org/10.1134/S1063739725600475

  22. [22]

    Mikhaylov, A

    A. Mikhaylov, A. Pimashkin, Y. Pigareva, S. Gerasimova, E. Gryaznov, S. Shchanikov, A. Zuev, M. Talanov, I. Lavrov, V. Demin, V. Erokhin, S. Lobov, I. Mukhina, V. Kazantsev, H. Wu, B. Spagnolo, Neurohybrid memristivecmos-integratedsystemsforbiosensorsandneuroprosthetics, Frontiers in Neuroscience 14 (2020) 358. doi:10.3389/fnins.2020.00358. URLhttp://dx.d...

  23. [23]

    Rybka, Y

    R. Rybka, Y. Davydov, D. Vlasov, A. Serenko, A. Sboev, V. Ilyin, Com- parison of bagging and sparcity methods for connectivity reduction in spiking neural networks with memristive plasticity, Big Data and Cog- nitive Computing 8 (3) (2024) 22. doi:10.3390/bdcc8030022. URLhttp://dx.doi.org/10.3390/bdcc8030022

  24. [24]

    D. S. Vlasov, R. B. Rybka, A. V. Serenko, A. G. Sboev, Spiking neu- ral network actor–critic reinforcement learning with temporal coding and reward-modulated plasticity, Moscow University Physics Bulletin 79 (S2) (2024) S944–S952. doi:10.3103/s0027134924702400. URLhttp://dx.doi.org/10.3103/S0027134924702400

  25. [25]

    B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chan- drasekaran, J.-M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla, K. Boahen, Neurogrid: A mixed-analog-digital multichip system for 39 large-scale neural simulations, Proceedings of the IEEE 102 (5) (2014) 699–716. doi:10.1109/jproc.2014.2313565. URLhttp://dx.doi.org/10.1109/JPROC.2014.2313565

  26. [26]

    Thrun, T

    S. Thrun, T. Mitchell, Lifelong robot learning, Robotics and Au- tonomous Systems 15 (1) (1995) 25 – 46

  27. [27]

    B. W. Edwards, G. H. Wakefield, On the statistics of binned neu- ral point processes: the bernoulli approximation and ar representa- tion of the pst histogram, Biological Cybernetics 64 (2) (1990) 145–153. doi:10.1007/bf02331344. URLhttp://dx.doi.org/10.1007/BF02331344

  28. [28]

    Zenke, S

    F. Zenke, S. Ganguli, Superspike: Supervised learning in multilayer spiking neural networks, Neural Computation 30 (6) (2018) 1514–1541. doi:10.1162/neco_a_01086. URLhttp://dx.doi.org/10.1162/neco_a_01086

  29. [29]

    S. Y. Gordleeva, S. V. Stasenko, A. V. Semyanov, A. E. Dityatev, V. B. Kazantsev, Bi-directional astrocytic regulation of neuronal activ- ity within a network, Frontiers in Computational Neuroscience 6 (2012). doi:10.3389/fncom.2012.00092. URLhttp://dx.doi.org/10.3389/fncom.2012.00092

  30. [30]

    Semyanov, C

    A. Semyanov, C. Henneberger, A. Agarwal, Making sense of astrocytic calcium signals — from acquisition to interpretation, Nature Reviews Neuroscience 21 (10) (2020) 551–564. doi:10.1038/s41583-020-0361-8. URLhttp://dx.doi.org/10.1038/s41583-020-0361-8

  31. [31]

    Ullah, P

    G. Ullah, P. Jung, A. Cornell-Bell, Anti-phase calcium oscillations in astrocytes via inositol (1, 4, 5)-trisphosphate regeneration, Cell Calcium 39 (3) (2006) 197–208. doi:10.1016/j.ceca.2005.10.009. URLhttps://doi.org/10.1016/j.ceca.2005.10.009

  32. [32]

    Santello, N

    M. Santello, N. Toni, A. Volterra, Astrocyte function from information processing to cognition and cognitive impairment, Nature Neuroscience 22 (2) (2019) 154–166. doi:10.1038/s41593-018-0325-8. URLhttp://dx.doi.org/10.1038/s41593-018-0325-8 40

  33. [33]

    Pabst, O

    M. Pabst, O. Braganza, H. Dannenberg, W. Hu, L. Pothmann, J. Rosen, I.Mody, K.vanLoo, K.Deisseroth, A.J.Becker, S.Schoch, H.Beck, As- trocyte intermediaries of septal cholinergic modulation in the hippocam- pus, Neuron 90 (4) (2016) 853–865. doi:10.1016/j.neuron.2016.04.003. URLhttp://dx.doi.org/10.1016/j.neuron.2016.04.003

  34. [34]

    Kvatinsky, M

    S. Kvatinsky, M. Ramadan, E. G. Friedman, A. Kolodny, Vteam: A general model for voltage-controlled memristors, IEEE Transactions on Circuits and Systems II: Express Briefs 62 (8) (2015) 786–790. doi:10.1109/tcsii.2015.2433536. URLhttp://dx.doi.org/10.1109/TCSII.2015.2433536

  35. [35]

    A. N. Matsukatova, N. V. Prudnikov, V. A. Kulagin, S. Battistoni, A. A. Minnekhanov, A. D. Trofimov, A. A. Nesmelov, S. A. Zavyalov, Y. N. Malakhova, M. Parmeggiani, A. Ballesio, S. L. Marasso, S. N. Chvalun, V. A. Demin, A. V. Emelyanov, V. Erokhin, Combination of organic- based reservoir computing and spiking neuromorphic systems for a ro- bust and effi...

  36. [36]

    A.N.Matsukatova, A.V.Emelyanov, V.A.Kulagin, A.Y.Vdovichenko, A. A. Minnekhanov, V. A. Demin, Nanocomposite parylene-c memris- tors with embedded ag nanoparticles for biomedical data processing, Or- ganic Electronics 102 (2022) 106455. doi:10.1016/j.orgel.2022.106455. URLhttp://dx.doi.org/10.1016/j.orgel.2022.106455

  37. [37]

    Shchanikov, L

    S. Shchanikov, L. Korolev, I. Bordanov, A. Belov, E. Gryaznov, A. Mikhaylov, Modeling and hardware implementation of vector-matrix multiplier based on 32x8 1t1r memristive crossbar array, in: 2023 7th Scientific School Dynamics of Complex Networks and their Applications (DCNA), IEEE, 2023, pp. 249–251

  38. [38]

    Memriboardframework,https://github.com/neurocomputer/MemriBoard, accessed: 2026-03-04

  39. [39]

    Mikhaylov, A

    A. Mikhaylov, A. Belov, D. Korolev, I. Antonov, V. Kotomina, A. Kotina, E. Gryaznov, A. Sharapov, M. Koryazhkina, R. Kryukov, et al., Multilayer metal-oxide memristive device with stabilized resistive switching, Advanced materials technologies 5 (1) (2020) 1900607. 41

  40. [40]

    A. N. Mikhaylov, E. G. Gryaznov, M. N. Koryazhkina, I. A. Bordanov, S. A. Shchanikov, O. A. Telminov, V. B. Kazantsev, Neuromorphic com- puting based on cmos-integrated memristive arrays: current state and perspectives, Supercomputing Frontiers and Innovations 10 (2) (2023) 77–103

  41. [41]

    Z. Liu, J. Mei, J. Tang, M. Xu, B. Gao, K. Wang, S. Ding, Q. Liu, Q. Qin, W. Chen, et al., A memristor-based adaptive neuromorphic decoder for brain–computer interfaces, Nature Electronics 8 (4) (2025) 362–372

  42. [42]

    Intel core i5-12450h benchmark,https://www.cpubenchmark.net/ cpu.php?cpu=Intel+Core+i5-12450H&id=4727, accessed: 2026-03-04

  43. [43]

    C. J. C. H. Watkins, et al., Learning from delayed rewards (1989)

  44. [44]

    R. S. Sutton, A. G. Barto, et al., Reinforcement learning: An introduc- tion, Vol. 1, MIT press Cambridge, 1998

  45. [45]

    Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959

    R. Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959. URLhttps://books.google.ru/books?id=a80DAQAAIAAJ

  46. [46]

    T. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22. doi:10.1016/0196- 8858(85)90002-8. URLhttp://dx.doi.org/10.1016/0196-8858(85)90002-8

  47. [47]

    W. R. Thompson, On the likelihood that one unknown probability ex- ceedsanotherinviewoftheevidenceoftwosamples, Biometrika25(3/4) (1933) 285. doi:10.2307/2332286. URLhttp://dx.doi.org/10.2307/2332286

  48. [48]

    Bellemare, S

    M. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, R. Munos, Unifying count-based exploration and intrinsic motivation, in: D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 29, Curran Associates, Inc., 2016

  49. [49]

    Ostrovski, M

    G. Ostrovski, M. G. Bellemare, A. van den Oord, R. Munos, Count- based exploration with neural density models, in: D. Precup, Y. W. Teh 42 (Eds.), Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 2721–2730. URLhttps://proceedings.mlr.press/v70/ostrovski17a.html

  50. [50]

    Pathak, P

    D. Pathak, P. Agrawal, A. A. Efros, T. Darrell, Curiosity-driven ex- ploration by self-supervised prediction, in: International conference on machine learning, PMLR, 2017, pp. 2778–2787

  51. [51]

    Exploration by Random Network Distillation

    Y. Burda, H. Edwards, A. Storkey, O. Klimov, Exploration by random network distillation, arXiv preprint arXiv:1810.12894 (2018)

  52. [52]

    Pecháč, M

    M. Pecháč, M. Chovanec, I. Farkaš, Self-supervised network distillation: An effective approach to exploration in sparse reward environments, Neurocomputing 599 (2024) 128033. doi:10.1016/j.neucom.2024.128033. URLhttp://dx.doi.org/10.1016/j.neucom.2024.128033

  53. [53]

    R. S. Sutton, D. Precup, S. Singh, Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning, Artificial intelligence 112 (1-2) (1999) 181–211

  54. [54]

    Bacon, J

    P.-L. Bacon, J. Harb, D. Precup, The option-critic architecture, Pro- ceedings of the AAAI Conference on Artificial Intelligence 31 (1) (Feb. 2017). doi:10.1609/aaai.v31i1.10916. URLhttp://dx.doi.org/10.1609/aaai.v31i1.10916

  55. [55]

    Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, P. Abbeel, Rl2: Fast reinforcement learning via slow reinforcement learning, arXiv preprint arXiv:1611.02779 (2016)

  56. [56]

    Zintgraf, S

    L. Zintgraf, S. Schulze, C. Lu, L. Feng, M. Igl, K. Shiarlis, Y. Gal, K. Hofmann, S. Whiteson, Varibad: Variational bayes-adaptive deep rl via meta-learning, Journal of Machine Learning Research 22 (289) (2021) 1–39

  57. [57]

    Clune, B

    J. Clune, B. Norman, First-explore, then exploit: Meta-learning to solve hard exploration-exploitation trade-offs, in: Advances in Neural Information Processing Systems 37, NeurIPS 2024, Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024, p. 27490–27528. doi:10.52202/079017-0864. URLhttp://dx.doi.org/10.52202/079017-0864 43

  58. [58]

    Durstewitz, B

    D. Durstewitz, B. Averbeck, G. Koppe, What neuroscience can tell ai about learning in continuously changing environments, Nature Machine Intelligence 7 (12) (2025) 1897–1912. doi:10.1038/s42256-025-01146-z. URLhttp://dx.doi.org/10.1038/s42256-025-01146-z

  59. [59]

    S. S. Chowdhury, D. Sharma, A. Kosta, K. Roy, Neuromorphic comput- ing for robotic vision: algorithms to hardware advances, Communica- tions Engineering 4 (1) (Aug. 2025). doi:10.1038/s44172-025-00492-5. URLhttp://dx.doi.org/10.1038/s44172-025-00492-5

  60. [60]

    A. Novo, F. Lobon, H. Garcia de Marina, S. Romero, F. Barranco, Neuromorphic perception and navigation for mobile robots: A review, ACM Computing Surveys 56 (10) (2024) 1–37. doi:10.1145/3656469. URLhttp://dx.doi.org/10.1145/3656469

  61. [61]

    Networks of spiking neurons: the third generation of neural network models.Neural Networks, 10(9):1659–1671, 1997

    W. Maass, Networks of spiking neurons: The third generation of neural network models, Neural Networks 10 (9) (1997) 1659–1671. doi:10.1016/s0893-6080(97)00011-7. URLhttp://dx.doi.org/10.1016/S0893-6080(97)00011-7

  62. [62]

    K. Roy, A. Jaiswal, P. Panda, Towards spike-based machine intelli- gence with neuromorphic computing, Nature 575 (7784) (2019) 607–617. doi:10.1038/s41586-019-1677-2. URLhttp://dx.doi.org/10.1038/s41586-019-1677-2

  63. [63]

    Loihi: a neuromorphic many- core processor with on-chip learning,

    M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Cho- day, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C.-K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y.-H. Weng, A. Wild, Y. Yang, H. Wang, Loihi: A neuromorphic many- core processor with on-chip learning, IEEE Micro 38 (1) (2018) 82–99. doi:10.1109...

  64. [64]

    B., Manohar, R., Risk, W

    F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G.-J. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jack- son, D. S. Modha, Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip, IEEE Transactions on Computer-Aided Design of Inte...

  65. [65]

    S. B. Furber, F. Galluppi, S. Temple, L. A. Plana, The spin- naker project, Proceedings of the IEEE 102 (5) (2014) 652–665. doi:10.1109/jproc.2014.2304638. URLhttp://dx.doi.org/10.1109/JPROC.2014.2304638

  66. [66]

    Benosman, C

    R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, C. Bartolozzi, Event- based visual flow, IEEE Transactions on Neural Networks and Learning Systems 25 (2) (2014) 407–417. doi:10.1109/tnnls.2013.2273537. URLhttp://dx.doi.org/10.1109/TNNLS.2013.2273537

  67. [67]

    Barranco, C

    F. Barranco, C. Fermuller, Y. Aloimonos, Bio-inspired Motion Estima- tionwithEvent-DrivenSensors, SpringerInternationalPublishing, 2015, p. 309–321. doi:10.1007/978-3-319-19258-1_27. URLhttp://dx.doi.org/10.1007/978-3-319-19258-1_27

  68. [68]

    Rebecq, T

    H. Rebecq, T. Horstschaefer, D. Scaramuzza, Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear opti- mization, in: Procedings of the British Machine Vision Confer- ence 2017, BMVC 2017, British Machine Vision Association, 2017. doi:10.5244/c.31.16. URLhttp://dx.doi.org/10.5244/C.31.16

  69. [69]

    Y. Zhou, G. Gallego, S. Shen, Event-based stereo visual odom- etry, IEEE Transactions on Robotics 37 (5) (2021) 1433–1450. doi:10.1109/tro.2021.3062252. URLhttp://dx.doi.org/10.1109/TRO.2021.3062252

  70. [70]

    2022 , booktitle =

    U. Rancon, J. Cuadrado-Anibarro, B. R. Cottereau, T. Masquelier, Stereospike: Depth learning with a spiking neural network, IEEE Access 10 (2022) 127428–127439. doi:10.1109/access.2022.3226484. URLhttp://dx.doi.org/10.1109/ACCESS.2022.3226484

  71. [71]

    C. Lee, A. K. Kosta, A. Z. Zhu, K. Chaney, K. Daniilidis, K. Roy, Spike- FlowNet: Event-Based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks, Springer International Publishing, 2020, p. 366–382. doi:10.1007/978-3-030-58526-6_22. URLhttp://dx.doi.org/10.1007/978-3-030-58526-6_22 45

  72. [72]

    A. K. Kosta, K. Roy, Adaptive-spikenet: Event-based opti- cal flow estimation using spiking neural networks with learnable neuronal dynamics, in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023, p. 6021–6027. doi:10.1109/icra48891.2023.10160551. URLhttp://dx.doi.org/10.1109/ICRA48891.2023.10160551

  73. [73]

    D. Ball, S. Heath, J. Wiles, G. Wyeth, P. Corke, M. Milford, Open- ratslam: an open source brain-based slam system, Autonomous Robots 34 (3) (2013) 149–176. doi:10.1007/s10514-012-9317-9. URLhttp://dx.doi.org/10.1007/s10514-012-9317-9

  74. [74]

    Milford, G

    M. Milford, G. Wyeth, Mapping a suburb with a single camera using a biologically inspired slam system, IEEE Transactions on Robotics 24 (5) (2008) 1038–1053. doi:10.1109/tro.2008.2004520. URLhttp://dx.doi.org/10.1109/TRO.2008.2004520

  75. [75]

    F. Yu, J. Shang, Y. Hu, M. Milford, Neuroslam: a brain-inspired slam system for 3d environments, Biological Cybernetics 113 (5–6) (2019) 515–545. doi:10.1007/s00422-019-00806-9. URLhttp://dx.doi.org/10.1007/s00422-019-00806-9

  76. [76]

    Banino, C

    A. Banino, C. Barry, B. Uria, C. Blundell, T. Lillicrap, P. Mirowski, A. Pritzel, M. J. Chadwick, T. Degris, J. Modayil, G. Wayne, H. Soyer, F. Viola, B. Zhang, R. Goroshin, N. Rabinowitz, R. Pascanu, C. Beattie, S. Petersen, A. Sadik, S. Gaffney, H. King, K. Kavukcuoglu, D. Hass- abis, R. Hadsell, D. Kumaran, Vector-based navigation using grid-like repre...

  77. [77]

    V. Edvardsen, Long-range navigation by path integration and decod- ing of grid cells in a neural network, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, p. 4348–4355. doi:10.1109/ijcnn.2017.7966406. URLhttp://dx.doi.org/10.1109/IJCNN.2017.7966406

  78. [78]

    Y. Chen, Z. Xiong, J. Liu, C. Yang, L. Chao, Y. Peng, A posi- tioning method based on place cells and head-direction cells for in- ertial/visual brain-inspired navigation system, Sensors 21 (23) (2021) 46

  79. [79]

    URLhttp://dx.doi.org/10.3390/s21237988

    doi:10.3390/s21237988. URLhttp://dx.doi.org/10.3390/s21237988

  80. [80]

    J. Liu, L. J. Mcdaid, J. Harkin, S. Karim, A. P. Johnson, A. G. Mil- lard, J. Hilder, D. M. Halliday, A. M. Tyrrell, J. Timmis, Exploring self-repair in a coupled spiking astrocyte neural network, IEEE Transac- tions on Neural Networks and Learning Systems 30 (3) (2019) 865–875. doi:10.1109/tnnls.2018.2854291. URLhttp://dx.doi.org/10.1109/TNNLS.2018.2854291

Showing first 80 references.