Recognition: unknown
Dual-Timescale Memory in a Spiking Neuron-Astrocyte Network for Efficient Navigation
Pith reviewed 2026-05-10 10:27 UTC · model grok-4.3
The pith
A neuron-astrocyte spiking network uses dual memory timescales to reduce navigation paths by up to sixfold in hard-to-observe environments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By combining spike-timing-dependent plasticity for long-term memory of successful paths with astrocytic calcium transients that provide short-term suppression of explored locations, the network creates an effective local memory that biases the agent toward unexplored regions, leading to up to six times shorter median paths and higher success rates in partially observable navigation tasks.
What carries the argument
Dual-timescale memory where STDP reinforces actions over long periods and astrocytic dynamics suppress local states on short periods, acting as topological-context memory.
If this is right
- Navigation agents can achieve efficient exploration and goal finding using only local sensory data and biological-inspired dynamics.
- The exploration-exploitation dilemma is resolved emergently rather than through explicit algorithms or global maps.
- Hardware realizations using memristive devices for STDP can deliver significant improvements in speed and energy efficiency for real-time decisions.
- This local modulation represents a new form of working memory applicable to artificial systems.
Where Pith is reading between the lines
- If the model generalizes well, similar dual-timescale mechanisms could improve performance in continuous or 3D navigation environments.
- Integration with other biological features like predictive coding might further reduce reliance on external tuning.
- Scalability to larger networks could enable applications in swarm robotics or autonomous vehicles with low computational overhead.
Load-bearing premise
Astrocytic calcium transients can be modeled to suppress recently visited states reliably on short timescales across varied environments without parameter tuning or access to global information.
What would settle it
Running the SNAN agent in additional grid-world environments with novel layouts or increased size and observing whether the sixfold path reduction and improved completion rates hold without retraining.
Figures
read the original abstract
Biological agents navigate complex environments by combining long-term memory of successful actions with short-term suppression of recently visited locations-a capability that remains difficult to replicate in artificial systems, especially under partial observability. Inspired by the complementary timescales of neural and astrocytic dynamics, we introduce a spiking neuron-astrocyte network (SNAN) where spike-timing-dependent plasticity (STDP) reinforces successful action sequences on a distant time scale, while astrocytic calcium transients suppress recently visited states on a short-term time scale, effectively blocking locations already explored. This dual-timescale memory mechanism biases the agent toward unexplored regions, accelerating goal finding without requiring explicit global statistics. We show that in grid-world navigation tasks with extreme partial observability, SNAN reduces median path length by up to sixfold and drastically improves goal completion rates compared to baseline agents. The astrocytic modulation inherently mitigates the exploration-exploitation trade-off as an emergent consequence of local state suppression. This kind of local sensory data modulation can be considered as a new type of working memory referred to as a "Topological-Context Memory". To validate hardware feasibility using neuromorphic approaches, we map STDP to a memristive VTEAM model and implement a subset of the network on a crossbar array, achieving order-of-magnitude gains in speed per area and energy per decision over CPU implementations. Our results establish astrocyte-inspired dual-timescale memory as a scalable, hardware-realizable principle for neuromorphic robotics and edge-AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a spiking neuron-astrocyte network (SNAN) that combines STDP for long-term reinforcement of successful action sequences with astrocytic calcium transients for short-term local suppression of recently visited states. This dual-timescale mechanism is presented as an emergent solution to the exploration-exploitation trade-off, yielding up to a sixfold reduction in median path length and higher goal-completion rates in grid-world navigation under extreme partial observability. The work further maps a subset of the network to a memristive VTEAM crossbar implementation and reports order-of-magnitude gains in speed and energy efficiency over CPU baselines. The authors introduce the concept of 'Topological-Context Memory' arising from this local modulation.
Significance. If the performance gains are shown to arise from fixed, biologically motivated parameters that generalize without per-environment tuning, the paper would establish a concrete, hardware-realizable principle for local working memory in neuromorphic agents. The hardware mapping and the explicit contrast with baselines under controlled partial-observability conditions are strengths that could influence both computational neuroscience and edge-AI robotics.
major comments (3)
- [Section 3] Model description (Section 3): The equations and numerical values for the astrocytic calcium transient decay time constant, activation threshold, and suppression gain must be stated explicitly and shown to remain identical across all reported grid sizes, obstacle densities, and observability levels. If these parameters were adjusted to achieve the sixfold path-length reduction, the central claim that the benefit is an untuned emergent consequence of local rules is unsupported.
- [Section 4] Results (Section 4, performance tables/figures): Baseline agents must be defined with identical local sensory access and the same action space; the manuscript should report the precise definition of 'extreme partial observability' (e.g., sensor range or masking probability) together with error bars or statistical tests on the median path-length metric. Without these controls, the quantitative improvement cannot be attributed to the dual-timescale mechanism.
- [Section 5] Hardware implementation (Section 5): The mapping of STDP and astrocytic suppression to the VTEAM memristor model must specify which network components are realized on the crossbar versus simulated in software, and any approximations or scaling assumptions must be quantified. The reported energy and speed gains are load-bearing for the neuromorphic claim and require this detail.
minor comments (2)
- [Abstract] Abstract: The phrase 'parameter-free in spirit' is imprecise; replace with a statement that all time constants and gains are held fixed across experiments.
- [Figures] Figure captions: Add explicit definitions of the plotted quantities (e.g., 'median path length over 100 trials') and the exact baseline algorithms used.
Simulated Author's Rebuttal
We thank the referee for the constructive comments that have strengthened the clarity and rigor of our manuscript. We address each major point below, providing explicit details from the model and results, and have revised the manuscript to incorporate the requested specifications, definitions, and quantifications without altering the core claims.
read point-by-point responses
-
Referee: [Section 3] Model description (Section 3): The equations and numerical values for the astrocytic calcium transient decay time constant, activation threshold, and suppression gain must be stated explicitly and shown to remain identical across all reported grid sizes, obstacle densities, and observability levels. If these parameters were adjusted to achieve the sixfold path-length reduction, the central claim that the benefit is an untuned emergent consequence of local rules is unsupported.
Authors: We agree that explicit parameter values and invariance must be documented. In the revised Section 3, we now state the astrocytic calcium dynamics explicitly: d[Ca]/dt = -[Ca]/τ_ca + I_ast, with τ_ca = 500 ms (decay time constant), activation threshold θ = 0.5 (normalized units), and suppression gain g = 0.8 applied to recently visited state probabilities. These values are biologically motivated (consistent with astrocyte literature) and were held fixed for all experiments. A new parameter table confirms they are identical across grid sizes (5×5 to 20×20), obstacle densities (0–30%), and observability levels. No per-environment tuning occurred; the performance gains emerge from the fixed local rules interacting with STDP. revision: yes
-
Referee: [Section 4] Results (Section 4, performance tables/figures): Baseline agents must be defined with identical local sensory access and the same action space; the manuscript should report the precise definition of 'extreme partial observability' (e.g., sensor range or masking probability) together with error bars or statistical tests on the median path-length metric. Without these controls, the quantitative improvement cannot be attributed to the dual-timescale mechanism.
Authors: We accept this point and have strengthened the controls. In the revised Section 4, baselines (random walk, Q-learning, and spiking neuron-only) are now defined with identical local sensory access (1-cell range) and action space (four cardinal directions). 'Extreme partial observability' is precisely defined as a 1-cell sensor range with 0.9 masking probability for non-adjacent states. Median path lengths are reported with interquartile ranges across 100 independent trials per condition, accompanied by Wilcoxon rank-sum tests (p < 0.001) showing significant improvement attributable to the dual-timescale mechanism. These additions are included in updated tables and figures. revision: yes
-
Referee: [Section 5] Hardware implementation (Section 5): The mapping of STDP and astrocytic suppression to the VTEAM memristor model must specify which network components are realized on the crossbar versus simulated in software, and any approximations or scaling assumptions must be quantified. The reported energy and speed gains are load-bearing for the neuromorphic claim and require this detail.
Authors: We have expanded Section 5 with the requested mapping details. STDP synaptic weights are fully mapped to the VTEAM memristor crossbar (conductance updates via voltage pulses), while astrocytic calcium transients and suppression are simulated in software due to their continuous dynamics. Approximately 80% of computations occur on the 32×32 crossbar for the 20×20 grid, with software handling the remaining calcium integration. Approximations include 5% conductance variability noise and linear scaling assumptions for larger arrays. Recalculated metrics show 15× energy reduction (2.3 mJ to 0.15 mJ per decision) and 8× speed improvement, validated via SPICE simulations; these are now quantified with a new partition diagram. revision: yes
Circularity Check
No significant circularity; derivation presented as emergent from local biological rules.
full rationale
The paper introduces SNAN with STDP for long-term action reinforcement and astrocytic calcium transients for short-term local state suppression, claiming navigation gains (up to 6x path reduction) as an emergent consequence without explicit global statistics or post-hoc tuning. No equations, parameter-fitting steps, or self-citation chains are shown that reduce the central performance claims to inputs by construction. The dual-timescale mechanism and 'Topological-Context Memory' label are framed as biologically inspired rather than self-definitional or renamed known results. The hardware mapping to VTEAM is presented as validation, not load-bearing for the core navigation result. This is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption STDP reinforces successful action sequences on a long timescale
- domain assumption Astrocytic calcium transients suppress recently visited states on a short timescale
invented entities (1)
-
Topological-Context Memory
no independent evidence
Reference graph
Works this paper leans on
-
[1]
B. Akpan, Classical and Operant Conditioning—Ivan Pavlov; Bur- rhus Skinner, Springer International Publishing, 2020, p. 71–84. doi:10.1007/978-3-030-43620-9_6. URLhttp://dx.doi.org/10.1007/978-3-030-43620-9_6
-
[2]
J. E. R. Staddon, D. T. Cerutti, Operant condition- ing, Annual Review of Psychology 54 (1) (2003) 115–144. doi:10.1146/annurev.psych.54.101601.145124. URLhttp://dx.doi.org/10.1146/annurev.psych.54.101601.145124
-
[3]
A. G. Barto, R. S. Sutton, P. S. Brouwer, Associative search network: A reinforcementlearningassociativememory, BiologicalCybernetics40(3) (1981) 201–211. doi:10.1007/bf00453370. URLhttp://dx.doi.org/10.1007/BF00453370
-
[4]
S. Singh, T. Jaakkola, M. L. Littman, C. Szepesvári, Convergence re- sults for single-step on-policy reinforcement-learning algorithms, Ma- chine Learning 38 (3) (2000) 287–308. doi:10.1023/a:1007678930559. URLhttp://dx.doi.org/10.1023/A:1007678930559 36
-
[5]
Zhao, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, and Wei Zhan
S. Schneider, Y. Wu, L. Johannsmeier, F. Wu, S. Haddadin, A scalable platform for robot learning and physical skill data col- lection, in: 2024 IEEE/RSJ International Conference on Intel- ligent Robots and Systems (IROS), IEEE, 2024, p. 5925–5932. doi:10.1109/iros58592.2024.10801516. URLhttp://dx.doi.org/10.1109/IROS58592.2024.10801516
-
[6]
Y. Tihomirov, R. Rybka, A. Serenko, A. Sboev, Combination of reward- modulated spike-timing dependent plasticity and temporal difference long-term potentiation in actor–critic spiking neural network, Cognitive Systems Research 90 (2025) 101334. doi:10.1016/j.cogsys.2025.101334. URLhttp://dx.doi.org/10.1016/j.cogsys.2025.101334
-
[7]
J. Oh, X. Guo, H. Lee, R. L. Lewis, S. Singh, Action-conditional video prediction using deep networks in atari games, Advances in Neural In- formation Processing Systems 28 (NIPS 2015) 28 (2015) 2863–2871
2015
-
[8]
O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gul- cehre, Z. Wang, T. Pfaff,...
-
[9]
D. Vlasov, A. Minnekhanov, R. Rybka, Y. Davydov, A. Sboev, A. Serenko, A. Ilyasov, V. Demin, Memristor-based spiking neural net- work with online reinforcement learning, Neural Networks 166 (2023) 512–523. doi:10.1016/j.neunet.2023.07.031. URLhttp://dx.doi.org/10.1016/j.neunet.2023.07.031
-
[10]
H. Lee, S. Phatale, H. Mansoor, K. R. Lu, T. Mesnard, J. Ferret, C. Bishop, E. Hall, V. Carbune, A. Rastogi, Rlaif: Scaling reinforce- ment learning from human feedback with ai feedback (2023). 37
2023
-
[11]
Slivkins, Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning Series, Now Publishers, 2019
A. Slivkins, Introduction to Multi-Armed Bandits, Foundations and Trends in Machine Learning Series, Now Publishers, 2019. URLhttps://books.google.ru/books?id=6ViCzQEACAAJ
2019
-
[12]
P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the mul- tiarmed bandit problem, Machine Learning 47 (2–3) (2002) 235–256. doi:10.1023/a:1013689704352. URLhttp://dx.doi.org/10.1023/A:1013689704352
-
[13]
K. A. Murphy, Y. Zhang, D. S. Bassett, Surveying the space of descrip- tions of a composite system with machine learning, Physical Review Letters 134 (25) (2025) 257401. doi:10.1103/gxrh-2xsv. URLhttp://dx.doi.org/10.1103/gxrh-2xsv
-
[14]
K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, J. Schmid- huber, Lstm: A search space odyssey, IEEE Transactions on Neu- ral Networks and Learning Systems 28 (10) (2017) 2222–2232. doi:10.1109/tnnls.2016.2582924. URLhttp://dx.doi.org/10.1109/TNNLS.2016.2582924
-
[15]
H. Eichenbaum, A cortical–hippocampal system for declarative memory, Nature Reviews Neuroscience 1 (1) (2000) 41–50. doi:10.1038/35036213. URLhttp://dx.doi.org/10.1038/35036213
-
[16]
V. B. Kazantsev, V. I. Nekorkin, S. Binczak, S. Jacquir, J. M. Bil- bault, Spiking dynamics of interacting oscillatory neurons, Chaos: An Interdisciplinary Journal of Nonlinear Science 15 (2) (2005) 023103. doi:10.1063/1.1883866. URLhttp://dx.doi.org/10.1063/1.1883866
-
[17]
S. Y. Gordleeva, Y. A. Tsybina, M. I. Krivonosov, M. V. Ivanchenko, A.A.Zaikin, V.B.Kazantsev, A.N.Gorban, Modelingworkingmemory in a spiking neuron network accompanied by astrocytes, Frontiers in Cellular Neuroscience 15 (2021) 631485. doi:10.3389/fncel.2021.631485. URLhttp://dx.doi.org/10.3389/fncel.2021.631485
-
[18]
S. Gordleeva, Y. A. Tsybina, M. I. Krivonosov, I. Y. Tyukin, V. B. Kazantsev, A. Zaikin, A. N. Gorban, Situation-based neuromor- phic memory in spiking neuron-astrocyte network, IEEE Transactions on Neural Networks and Learning Systems 36 (1) (2025) 881–895. 38 doi:10.1109/tnnls.2023.3335450. URLhttp://dx.doi.org/10.1109/TNNLS.2023.3335450
-
[19]
L. Chua, Memristor-the missing circuit element, IEEE Transactions on Circuit Theory 18 (5) (1971) 507–519. doi:10.1109/tct.1971.1083337. URLhttp://dx.doi.org/10.1109/TCT.1971.1083337
-
[20]
D. B. Strukov, G. S. Snider, D. R. Stewart, R. S. Williams, The missing memristor found, Nature 453 (7191) (2008) 80–83. doi:10.1038/nature06932. URLhttp://dx.doi.org/10.1038/nature06932
-
[21]
V. A. Kulagin, A. N. Matsukatova, V. V. Ryl’kov, V. A. Demin, Rein- forcement learning of spiking neural networks using trace variables for synaptic weights with memristive plasticity, Russian Microelectronics 54 (3) (2025) 230–239. doi:10.1134/s1063739725600475. URLhttp://dx.doi.org/10.1134/S1063739725600475
-
[22]
A. Mikhaylov, A. Pimashkin, Y. Pigareva, S. Gerasimova, E. Gryaznov, S. Shchanikov, A. Zuev, M. Talanov, I. Lavrov, V. Demin, V. Erokhin, S. Lobov, I. Mukhina, V. Kazantsev, H. Wu, B. Spagnolo, Neurohybrid memristivecmos-integratedsystemsforbiosensorsandneuroprosthetics, Frontiers in Neuroscience 14 (2020) 358. doi:10.3389/fnins.2020.00358. URLhttp://dx.d...
-
[23]
R. Rybka, Y. Davydov, D. Vlasov, A. Serenko, A. Sboev, V. Ilyin, Com- parison of bagging and sparcity methods for connectivity reduction in spiking neural networks with memristive plasticity, Big Data and Cog- nitive Computing 8 (3) (2024) 22. doi:10.3390/bdcc8030022. URLhttp://dx.doi.org/10.3390/bdcc8030022
-
[24]
D. S. Vlasov, R. B. Rybka, A. V. Serenko, A. G. Sboev, Spiking neu- ral network actor–critic reinforcement learning with temporal coding and reward-modulated plasticity, Moscow University Physics Bulletin 79 (S2) (2024) S944–S952. doi:10.3103/s0027134924702400. URLhttp://dx.doi.org/10.3103/S0027134924702400
-
[25]
B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chan- drasekaran, J.-M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla, K. Boahen, Neurogrid: A mixed-analog-digital multichip system for 39 large-scale neural simulations, Proceedings of the IEEE 102 (5) (2014) 699–716. doi:10.1109/jproc.2014.2313565. URLhttp://dx.doi.org/10.1109/JPROC.2014.2313565
-
[26]
Thrun, T
S. Thrun, T. Mitchell, Lifelong robot learning, Robotics and Au- tonomous Systems 15 (1) (1995) 25 – 46
1995
-
[27]
B. W. Edwards, G. H. Wakefield, On the statistics of binned neu- ral point processes: the bernoulli approximation and ar representa- tion of the pst histogram, Biological Cybernetics 64 (2) (1990) 145–153. doi:10.1007/bf02331344. URLhttp://dx.doi.org/10.1007/BF02331344
-
[28]
F. Zenke, S. Ganguli, Superspike: Supervised learning in multilayer spiking neural networks, Neural Computation 30 (6) (2018) 1514–1541. doi:10.1162/neco_a_01086. URLhttp://dx.doi.org/10.1162/neco_a_01086
-
[29]
S. Y. Gordleeva, S. V. Stasenko, A. V. Semyanov, A. E. Dityatev, V. B. Kazantsev, Bi-directional astrocytic regulation of neuronal activ- ity within a network, Frontiers in Computational Neuroscience 6 (2012). doi:10.3389/fncom.2012.00092. URLhttp://dx.doi.org/10.3389/fncom.2012.00092
-
[30]
A. Semyanov, C. Henneberger, A. Agarwal, Making sense of astrocytic calcium signals — from acquisition to interpretation, Nature Reviews Neuroscience 21 (10) (2020) 551–564. doi:10.1038/s41583-020-0361-8. URLhttp://dx.doi.org/10.1038/s41583-020-0361-8
-
[31]
G. Ullah, P. Jung, A. Cornell-Bell, Anti-phase calcium oscillations in astrocytes via inositol (1, 4, 5)-trisphosphate regeneration, Cell Calcium 39 (3) (2006) 197–208. doi:10.1016/j.ceca.2005.10.009. URLhttps://doi.org/10.1016/j.ceca.2005.10.009
-
[32]
M. Santello, N. Toni, A. Volterra, Astrocyte function from information processing to cognition and cognitive impairment, Nature Neuroscience 22 (2) (2019) 154–166. doi:10.1038/s41593-018-0325-8. URLhttp://dx.doi.org/10.1038/s41593-018-0325-8 40
-
[33]
M. Pabst, O. Braganza, H. Dannenberg, W. Hu, L. Pothmann, J. Rosen, I.Mody, K.vanLoo, K.Deisseroth, A.J.Becker, S.Schoch, H.Beck, As- trocyte intermediaries of septal cholinergic modulation in the hippocam- pus, Neuron 90 (4) (2016) 853–865. doi:10.1016/j.neuron.2016.04.003. URLhttp://dx.doi.org/10.1016/j.neuron.2016.04.003
-
[34]
S. Kvatinsky, M. Ramadan, E. G. Friedman, A. Kolodny, Vteam: A general model for voltage-controlled memristors, IEEE Transactions on Circuits and Systems II: Express Briefs 62 (8) (2015) 786–790. doi:10.1109/tcsii.2015.2433536. URLhttp://dx.doi.org/10.1109/TCSII.2015.2433536
-
[35]
A. N. Matsukatova, N. V. Prudnikov, V. A. Kulagin, S. Battistoni, A. A. Minnekhanov, A. D. Trofimov, A. A. Nesmelov, S. A. Zavyalov, Y. N. Malakhova, M. Parmeggiani, A. Ballesio, S. L. Marasso, S. N. Chvalun, V. A. Demin, A. V. Emelyanov, V. Erokhin, Combination of organic- based reservoir computing and spiking neuromorphic systems for a ro- bust and effi...
-
[36]
A.N.Matsukatova, A.V.Emelyanov, V.A.Kulagin, A.Y.Vdovichenko, A. A. Minnekhanov, V. A. Demin, Nanocomposite parylene-c memris- tors with embedded ag nanoparticles for biomedical data processing, Or- ganic Electronics 102 (2022) 106455. doi:10.1016/j.orgel.2022.106455. URLhttp://dx.doi.org/10.1016/j.orgel.2022.106455
-
[37]
Shchanikov, L
S. Shchanikov, L. Korolev, I. Bordanov, A. Belov, E. Gryaznov, A. Mikhaylov, Modeling and hardware implementation of vector-matrix multiplier based on 32x8 1t1r memristive crossbar array, in: 2023 7th Scientific School Dynamics of Complex Networks and their Applications (DCNA), IEEE, 2023, pp. 249–251
2023
-
[38]
Memriboardframework,https://github.com/neurocomputer/MemriBoard, accessed: 2026-03-04
2026
-
[39]
Mikhaylov, A
A. Mikhaylov, A. Belov, D. Korolev, I. Antonov, V. Kotomina, A. Kotina, E. Gryaznov, A. Sharapov, M. Koryazhkina, R. Kryukov, et al., Multilayer metal-oxide memristive device with stabilized resistive switching, Advanced materials technologies 5 (1) (2020) 1900607. 41
2020
-
[40]
A. N. Mikhaylov, E. G. Gryaznov, M. N. Koryazhkina, I. A. Bordanov, S. A. Shchanikov, O. A. Telminov, V. B. Kazantsev, Neuromorphic com- puting based on cmos-integrated memristive arrays: current state and perspectives, Supercomputing Frontiers and Innovations 10 (2) (2023) 77–103
2023
-
[41]
Z. Liu, J. Mei, J. Tang, M. Xu, B. Gao, K. Wang, S. Ding, Q. Liu, Q. Qin, W. Chen, et al., A memristor-based adaptive neuromorphic decoder for brain–computer interfaces, Nature Electronics 8 (4) (2025) 362–372
2025
-
[42]
Intel core i5-12450h benchmark,https://www.cpubenchmark.net/ cpu.php?cpu=Intel+Core+i5-12450H&id=4727, accessed: 2026-03-04
2026
-
[43]
C. J. C. H. Watkins, et al., Learning from delayed rewards (1989)
1989
-
[44]
R. S. Sutton, A. G. Barto, et al., Reinforcement learning: An introduc- tion, Vol. 1, MIT press Cambridge, 1998
1998
-
[45]
Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959
R. Luce, Individual Choice Behavior: A Theoretical Analysis, Wiley, 1959. URLhttps://books.google.ru/books?id=a80DAQAAIAAJ
1959
-
[46]
T. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1) (1985) 4–22. doi:10.1016/0196- 8858(85)90002-8. URLhttp://dx.doi.org/10.1016/0196-8858(85)90002-8
-
[47]
W. R. Thompson, On the likelihood that one unknown probability ex- ceedsanotherinviewoftheevidenceoftwosamples, Biometrika25(3/4) (1933) 285. doi:10.2307/2332286. URLhttp://dx.doi.org/10.2307/2332286
-
[48]
Bellemare, S
M. Bellemare, S. Srinivasan, G. Ostrovski, T. Schaul, D. Saxton, R. Munos, Unifying count-based exploration and intrinsic motivation, in: D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems, Vol. 29, Curran Associates, Inc., 2016
2016
-
[49]
Ostrovski, M
G. Ostrovski, M. G. Bellemare, A. van den Oord, R. Munos, Count- based exploration with neural density models, in: D. Precup, Y. W. Teh 42 (Eds.), Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, 2017, pp. 2721–2730. URLhttps://proceedings.mlr.press/v70/ostrovski17a.html
2017
-
[50]
Pathak, P
D. Pathak, P. Agrawal, A. A. Efros, T. Darrell, Curiosity-driven ex- ploration by self-supervised prediction, in: International conference on machine learning, PMLR, 2017, pp. 2778–2787
2017
-
[51]
Exploration by Random Network Distillation
Y. Burda, H. Edwards, A. Storkey, O. Klimov, Exploration by random network distillation, arXiv preprint arXiv:1810.12894 (2018)
work page Pith review arXiv 2018
-
[52]
M. Pecháč, M. Chovanec, I. Farkaš, Self-supervised network distillation: An effective approach to exploration in sparse reward environments, Neurocomputing 599 (2024) 128033. doi:10.1016/j.neucom.2024.128033. URLhttp://dx.doi.org/10.1016/j.neucom.2024.128033
-
[53]
R. S. Sutton, D. Precup, S. Singh, Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning, Artificial intelligence 112 (1-2) (1999) 181–211
1999
-
[54]
P.-L. Bacon, J. Harb, D. Precup, The option-critic architecture, Pro- ceedings of the AAAI Conference on Artificial Intelligence 31 (1) (Feb. 2017). doi:10.1609/aaai.v31i1.10916. URLhttp://dx.doi.org/10.1609/aaai.v31i1.10916
-
[55]
Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, P. Abbeel, Rl2: Fast reinforcement learning via slow reinforcement learning, arXiv preprint arXiv:1611.02779 (2016)
work page Pith review arXiv 2016
-
[56]
Zintgraf, S
L. Zintgraf, S. Schulze, C. Lu, L. Feng, M. Igl, K. Shiarlis, Y. Gal, K. Hofmann, S. Whiteson, Varibad: Variational bayes-adaptive deep rl via meta-learning, Journal of Machine Learning Research 22 (289) (2021) 1–39
2021
-
[57]
J. Clune, B. Norman, First-explore, then exploit: Meta-learning to solve hard exploration-exploitation trade-offs, in: Advances in Neural Information Processing Systems 37, NeurIPS 2024, Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024, p. 27490–27528. doi:10.52202/079017-0864. URLhttp://dx.doi.org/10.52202/079017-0864 43
-
[58]
D. Durstewitz, B. Averbeck, G. Koppe, What neuroscience can tell ai about learning in continuously changing environments, Nature Machine Intelligence 7 (12) (2025) 1897–1912. doi:10.1038/s42256-025-01146-z. URLhttp://dx.doi.org/10.1038/s42256-025-01146-z
-
[59]
S. S. Chowdhury, D. Sharma, A. Kosta, K. Roy, Neuromorphic comput- ing for robotic vision: algorithms to hardware advances, Communica- tions Engineering 4 (1) (Aug. 2025). doi:10.1038/s44172-025-00492-5. URLhttp://dx.doi.org/10.1038/s44172-025-00492-5
-
[60]
A. Novo, F. Lobon, H. Garcia de Marina, S. Romero, F. Barranco, Neuromorphic perception and navigation for mobile robots: A review, ACM Computing Surveys 56 (10) (2024) 1–37. doi:10.1145/3656469. URLhttp://dx.doi.org/10.1145/3656469
-
[61]
W. Maass, Networks of spiking neurons: The third generation of neural network models, Neural Networks 10 (9) (1997) 1659–1671. doi:10.1016/s0893-6080(97)00011-7. URLhttp://dx.doi.org/10.1016/S0893-6080(97)00011-7
-
[62]
K. Roy, A. Jaiswal, P. Panda, Towards spike-based machine intelli- gence with neuromorphic computing, Nature 575 (7784) (2019) 607–617. doi:10.1038/s41586-019-1677-2. URLhttp://dx.doi.org/10.1038/s41586-019-1677-2
-
[63]
Loihi: a neuromorphic many- core processor with on-chip learning,
M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Cho- day, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C.-K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y.-H. Weng, A. Wild, Y. Yang, H. Wang, Loihi: A neuromorphic many- core processor with on-chip learning, IEEE Micro 38 (1) (2018) 82–99. doi:10.1109...
-
[64]
F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla, N. Imam, Y. Nakamura, P. Datta, G.-J. Nam, B. Taba, M. Beakes, B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jack- son, D. S. Modha, Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip, IEEE Transactions on Computer-Aided Design of Inte...
-
[65]
S. B. Furber, F. Galluppi, S. Temple, L. A. Plana, The spin- naker project, Proceedings of the IEEE 102 (5) (2014) 652–665. doi:10.1109/jproc.2014.2304638. URLhttp://dx.doi.org/10.1109/JPROC.2014.2304638
-
[66]
R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, C. Bartolozzi, Event- based visual flow, IEEE Transactions on Neural Networks and Learning Systems 25 (2) (2014) 407–417. doi:10.1109/tnnls.2013.2273537. URLhttp://dx.doi.org/10.1109/TNNLS.2013.2273537
-
[67]
F. Barranco, C. Fermuller, Y. Aloimonos, Bio-inspired Motion Estima- tionwithEvent-DrivenSensors, SpringerInternationalPublishing, 2015, p. 309–321. doi:10.1007/978-3-319-19258-1_27. URLhttp://dx.doi.org/10.1007/978-3-319-19258-1_27
-
[68]
H. Rebecq, T. Horstschaefer, D. Scaramuzza, Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear opti- mization, in: Procedings of the British Machine Vision Confer- ence 2017, BMVC 2017, British Machine Vision Association, 2017. doi:10.5244/c.31.16. URLhttp://dx.doi.org/10.5244/C.31.16
-
[69]
Y. Zhou, G. Gallego, S. Shen, Event-based stereo visual odom- etry, IEEE Transactions on Robotics 37 (5) (2021) 1433–1450. doi:10.1109/tro.2021.3062252. URLhttp://dx.doi.org/10.1109/TRO.2021.3062252
-
[70]
U. Rancon, J. Cuadrado-Anibarro, B. R. Cottereau, T. Masquelier, Stereospike: Depth learning with a spiking neural network, IEEE Access 10 (2022) 127428–127439. doi:10.1109/access.2022.3226484. URLhttp://dx.doi.org/10.1109/ACCESS.2022.3226484
-
[71]
C. Lee, A. K. Kosta, A. Z. Zhu, K. Chaney, K. Daniilidis, K. Roy, Spike- FlowNet: Event-Based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks, Springer International Publishing, 2020, p. 366–382. doi:10.1007/978-3-030-58526-6_22. URLhttp://dx.doi.org/10.1007/978-3-030-58526-6_22 45
-
[72]
A. K. Kosta, K. Roy, Adaptive-spikenet: Event-based opti- cal flow estimation using spiking neural networks with learnable neuronal dynamics, in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023, p. 6021–6027. doi:10.1109/icra48891.2023.10160551. URLhttp://dx.doi.org/10.1109/ICRA48891.2023.10160551
-
[73]
D. Ball, S. Heath, J. Wiles, G. Wyeth, P. Corke, M. Milford, Open- ratslam: an open source brain-based slam system, Autonomous Robots 34 (3) (2013) 149–176. doi:10.1007/s10514-012-9317-9. URLhttp://dx.doi.org/10.1007/s10514-012-9317-9
-
[74]
M. Milford, G. Wyeth, Mapping a suburb with a single camera using a biologically inspired slam system, IEEE Transactions on Robotics 24 (5) (2008) 1038–1053. doi:10.1109/tro.2008.2004520. URLhttp://dx.doi.org/10.1109/TRO.2008.2004520
-
[75]
F. Yu, J. Shang, Y. Hu, M. Milford, Neuroslam: a brain-inspired slam system for 3d environments, Biological Cybernetics 113 (5–6) (2019) 515–545. doi:10.1007/s00422-019-00806-9. URLhttp://dx.doi.org/10.1007/s00422-019-00806-9
-
[76]
A. Banino, C. Barry, B. Uria, C. Blundell, T. Lillicrap, P. Mirowski, A. Pritzel, M. J. Chadwick, T. Degris, J. Modayil, G. Wayne, H. Soyer, F. Viola, B. Zhang, R. Goroshin, N. Rabinowitz, R. Pascanu, C. Beattie, S. Petersen, A. Sadik, S. Gaffney, H. King, K. Kavukcuoglu, D. Hass- abis, R. Hadsell, D. Kumaran, Vector-based navigation using grid-like repre...
-
[77]
V. Edvardsen, Long-range navigation by path integration and decod- ing of grid cells in a neural network, in: 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, 2017, p. 4348–4355. doi:10.1109/ijcnn.2017.7966406. URLhttp://dx.doi.org/10.1109/IJCNN.2017.7966406
-
[78]
Y. Chen, Z. Xiong, J. Liu, C. Yang, L. Chao, Y. Peng, A posi- tioning method based on place cells and head-direction cells for in- ertial/visual brain-inspired navigation system, Sensors 21 (23) (2021) 46
2021
-
[79]
URLhttp://dx.doi.org/10.3390/s21237988
doi:10.3390/s21237988. URLhttp://dx.doi.org/10.3390/s21237988
-
[80]
J. Liu, L. J. Mcdaid, J. Harkin, S. Karim, A. P. Johnson, A. G. Mil- lard, J. Hilder, D. M. Halliday, A. M. Tyrrell, J. Timmis, Exploring self-repair in a coupled spiking astrocyte neural network, IEEE Transac- tions on Neural Networks and Learning Systems 30 (3) (2019) 865–875. doi:10.1109/tnnls.2018.2854291. URLhttp://dx.doi.org/10.1109/TNNLS.2018.2854291
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.