Smart Walkers in Discrete Space

arxiv: 2601.22235 · v4 · submitted 2026-01-29 · ❄️ cond-mat.stat-mech

Smart Walkers in Discrete Space

Gianluca Peri , Lorenzo Buffoni , Giacomo Chiti , Duccio Fanelli , Raffaele Marino , Andrea Nocentini , Pier Paolo Panti This is my paper

Pith reviewed 2026-05-16 09:17 UTC · model grok-4.3

classification ❄️ cond-mat.stat-mech

keywords configuration entropyreinforcement learningdiscrete spaceagent performance proxychaser-target dynamicsstatistical mechanics of walkerstask complexity measure

0 comments p. Extension

The pith

Configuration entropy from position statistics alone measures how well smart agents have learned to perform their tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets up a model of agents moving on a discrete grid and first derives exact encounter statistics for two purely random walkers acting as chaser and target. It then equips the walkers with reinforcement learning so they can adjust their steps in response to an external reward signal, thereby reshaping the same encounter statistics to accumulate more reward. The central result is that the configuration entropy of the agents' joint position distribution, computed from location data alone, rises or falls in step with the agents' actual task performance even when reward values, policies, and internal states remain hidden. This entropy-based gauge is further checked by pitting a trained chess engine against a random opponent, where the same pattern holds.

Core claim

Configuration entropy, derived solely from the statistical distribution of agent positions in discrete space, functions as a quantitative proxy for the agents' acquired competence at the assigned task. Random walkers produce analytically predictable encounter rates; learning walkers systematically alter those rates to increase cumulative reward. Because the entropy calculation requires no access to reward signals or policy parameters, it supplies an external, observable signature of how effectively the agents have adapted to the task complexity, a signature confirmed in both the walker simulations and the chess-engine test.

What carries the argument

Configuration entropy computed from the joint position histogram of the agents, which quantifies the spatial uncertainty of their locations and serves as the observable correlate of learned task skill.

If this is right

Encounter statistics between chaser and target become analytically tractable for random walkers and shift systematically once learning is introduced.
Reinforcement learning allows agents to reshape their position distributions in order to maximize cumulative reward.
The entropy proxy operates without knowledge of the reward function or the agent's internal policy.
The same entropy measure applied to a chess engine against a random opponent reproduces the correlation between entropy reduction and performance gain.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Entropy values could be monitored in real time to detect when an agent has stopped improving without inspecting its reward history.
The approach may generalize to continuous spaces if position histograms are replaced by kernel density estimates.
Tasks could be ranked by the minimum entropy an agent must achieve to reach a given performance threshold.

Load-bearing premise

That the entropy of observed positions will rise and fall with task performance across arbitrary reward structures and environments without any task-specific adjustment to the entropy formula.

What would settle it

Train the walkers under a reward schedule that awards high scores only for narrow, low-entropy paths; if configuration entropy nevertheless fails to increase with measured reward, the proposed proxy is refuted.

Figures

Figures reproduced from arXiv: 2601.22235 by Andrea Nocentini, Duccio Fanelli, Giacomo Chiti, Gianluca Peri, Lorenzo Buffoni, Pier Paolo Panti, Raffaele Marino.

**Figure 2.** Figure 2: Transition matrices for the two completely random agents. Reflecting boundary conditions [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: First encounter probability distribution for two random walkers under reflecting boundary [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization of the vector t representing meeting times for two random walkers on a discrete space with N = 11. The left plot depicts directly the values of the elements of t. The right plot instead shows a factorized view of t across the individual walker subspaces using Eq. (3.9), resulting in a bar plot of mean meeting times (in game moves) as a function of the walkers’ starting positions. The central … view at source ↗

**Figure 5.** Figure 5: Generalized game environment for smart walkers. Each agent is equipped with a Q-table, [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: First encounter probability distributions, derived both numerically ( [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Mean first encounter times, derived analytically with (3.9). [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: (a) Cumulative rewards obtained by Alice during training. The strongest performance belongs [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 9.** Figure 9: In this figure the non-normalized configuration entropy values are reported as a function [PITH_FULL_IMAGE:figures/full_fig_p013_9.png] view at source ↗

**Figure 10.** Figure 10: First encounter probability distributions, derived both numerically ( [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗

**Figure 11.** Figure 11: Different entropies are depicted as measured during the training of Alice (predator) and Bob [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗

read the original abstract

We study the statistical properties of trainable agents moving in discrete space. After introducing the mathematical framework, we first analyze the dynamics of two completely random walkers, mutually competing in a chaser-target interaction scheme. The statistics of the encounters is analytically obtained and the predictions tested versus numerical simulations. We then move forward to extend the baseline case to agents capable of learning and adapting to an external reward signal, using reinforcement learning. Smart walkers morph the statistics of the encounter, to maximize their cumulated reward, as confirmed by combined numerical and analytical insights. More interestingly, configuration entropy proves a reliable proxy to gauge the acquired ability of the agents to cope with the assigned task when no other information about them (i.e. reward signal, policy, etc) is present. We further test the proposed measure of learned skills by operating the Stockfish chess engine against a quasi-random untrained opponent. The obtained conclusions corroborate our claim. Summing up, our primary contribution is to propose and test a quantitative measure of agents' awareness that naturally correlates with the inherent complexity of the task being performed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Configuration entropy from position stats tracks learned performance in the chaser and chess cases they show, but the generality claim needs more varied tests.

read the letter

Configuration entropy computed from where the agents spend time looks like a workable external signal for how well they've picked up the task in these discrete setups. The paper shows it dropping as the RL agents improve their reward in the chaser game and staying low for the strong chess engine against a weak opponent. That is the core new angle: a measure that does not require seeing the policy or the reward function itself. The random-walker analysis is the cleanest part. They derive the encounter rate exactly for two independent walkers and confirm it with simulations, which gives the baseline something solid to stand on before moving to learning agents. The RL extension and the chess check then illustrate that the entropy tracks performance in those specific cases. The soft spot is the reach of the claim. The entropy-performance link is shown for an encounter-based reward in the grid and for chess against random play, but there are no ablations that change the reward structure while keeping the space fixed, nor tests on a qualitatively different discrete task. Without those, it is still possible the correlation is tied to the particular reward definitions rather than emerging for any learned policy. The chess example is also narrow, pitting a top engine against an untrained opponent rather than comparing agents of graded strength under controlled conditions. This work is mainly for people who already combine statistical mechanics tools with grid-based RL or multi-agent systems. A reader looking for new evaluation metrics that work from trajectory data alone could find the idea useful and might try it on their own setups. The analytical derivation and the concrete numerical checks are enough to justify sending it to peer review, though any referee will likely ask for broader controls on the entropy proxy before the generality is accepted.

Referee Report

2 major / 1 minor

Summary. The paper studies statistical properties of agents moving in discrete space. It derives exact encounter statistics for two random walkers in a chaser-target interaction, validates them against simulations, then extends the setup to reinforcement-learning agents that adapt policies to maximize cumulative reward and thereby alter encounter statistics. The central claim is that configuration entropy, computed solely from position histograms, serves as a reliable, task-agnostic proxy for the agents' acquired ability to solve the assigned task, with supporting evidence from the RL walker experiments and a Stockfish-versus-random chess demonstration.

Significance. If the entropy-performance correlation proves robust, the work supplies a quantitative, externally observable metric of learned skill that does not require access to reward signals or internal policies. The exact analytical treatment of the random-walker baseline adds a useful reference point for discrete-space dynamics.

major comments (2)

[§4] §4 (RL extension): the assertion that configuration entropy is a general proxy for task competence rests on numerical observations within the single chaser-target reward structure; no ablation that varies the reward function while holding the environment fixed is reported, leaving open the possibility that the observed correlation is specific to encounter-based rewards rather than a universal property of learned policies.
[§5] §5 (chess demonstration): the entropy measure is applied to board configurations, yet the manuscript provides no quantitative controls for opponent strength, game length, or position sampling density; without these, it is unclear whether the reported entropy drop reflects genuine skill acquisition or simply the difference between deterministic and random move selection.

minor comments (1)

Notation for the configuration entropy (Eq. (X)) should be defined explicitly in terms of the histogram bins and normalization before its use in the RL and chess sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our manuscript. We provide point-by-point responses to the major comments below, outlining the revisions we intend to implement to address the concerns raised.

read point-by-point responses

Referee: [§4] §4 (RL extension): the assertion that configuration entropy is a general proxy for task competence rests on numerical observations within the single chaser-target reward structure; no ablation that varies the reward function while holding the environment fixed is reported, leaving open the possibility that the observed correlation is specific to encounter-based rewards rather than a universal property of learned policies.

Authors: We agree that the primary numerical evidence for the entropy-performance correlation in the RL setting comes from the chaser-target interaction. However, the chess demonstration applies the entropy measure to a fundamentally different task with its own implicit reward structure based on game outcomes, providing supporting evidence beyond a single reward type. To further address this, we will revise Section 4 to explicitly discuss the scope of the claim and include a brief analysis or discussion on how the entropy proxy might depend on the reward structure, along with suggestions for future ablations. revision: partial
Referee: [§5] §5 (chess demonstration): the entropy measure is applied to board configurations, yet the manuscript provides no quantitative controls for opponent strength, game length, or position sampling density; without these, it is unclear whether the reported entropy drop reflects genuine skill acquisition or simply the difference between deterministic and random move selection.

Authors: We acknowledge the lack of detailed controls in the current manuscript. In the revised version, we will add quantitative information: specify the Stockfish engine version and the exact skill level used, report statistics on game lengths and number of positions sampled, and describe the position sampling method (e.g., from complete games played under controlled conditions). These additions will help demonstrate that the entropy reduction correlates with skill level rather than mere determinism. revision: yes

Circularity Check

0 steps flagged

No circularity: configuration entropy computed independently from position statistics and correlated with external performance metrics

full rationale

The paper first derives encounter statistics for random walkers analytically, then extends to RL-trained smart walkers that maximize reward. Configuration entropy is defined directly from the resulting position histograms (configuration statistics alone) without reference to the reward signal, policy, or task performance in its formula. The claim that this entropy serves as a proxy is supported by post-hoc numerical correlations with independent quantities (cumulated reward, chess win rates) rather than by construction or fitting. No self-citations, uniqueness theorems, or ansatzes are invoked to force the result; the derivation chain remains self-contained against external benchmarks. Limited test cases raise questions of generality but do not constitute circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard assumptions from statistical mechanics for discrete random walks and from reinforcement learning theory; no new free parameters, axioms, or invented entities are introduced in the abstract.

axioms (2)

standard math Standard statistical mechanics framework for encounter statistics of random walkers on a discrete lattice
Invoked for the analytical derivation of encounter probabilities in the baseline case.
domain assumption Reinforcement learning agents can adapt policies to maximize cumulative reward in discrete state spaces
Used when extending the model to trainable smart walkers.

pith-pipeline@v0.9.0 · 5498 in / 1299 out tokens · 34962 ms · 2026-05-16T09:17:14.078443+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

configuration entropy proves a reliable proxy to gauge the acquired ability of the agents... computed from position statistics alone
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat recovery unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

global transition matrix A = A_A ⊗ A_B ... modified to absorbing states at meetings

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

[1]

Extreme events in dynamical systems and random walkers: A review

Sayantan Nag Chowdhury et al. “Extreme events in dynamical systems and random walkers: A review”. In:Physics Reports966 (2022). Extreme events in dynamical systems and random walkers: A review, pp. 1–52.issn: 0370-1573.doi:https://doi.org/10.1016/j.physrep.2022.04.001. url:https://www.sciencedirect.com/science/article/pii/S0370157322001144

work page doi:10.1016/j.physrep.2022.04.001 2022
[2]

Two-particle problem in comblike structures

Elena Agliari et al. “Two-particle problem in comblike structures”. In:Phys. Rev. E93 (5 May 2016), p. 052111.doi:10.1103/PhysRevE.93.052111.url:https://link.aps.org/doi/10. 1103/PhysRevE.93.052111

work page doi:10.1103/physreve.93.052111.url:https://link.aps.org/doi/10 2016
[3]

Slow encounters of particle pairs in branched structures

Elena Agliari, Alexander Blumen, and Davide Cassi. “Slow encounters of particle pairs in branched structures”. In:Phys. Rev. E89 (5 May 2014), p. 052147.doi:10.1103/PhysRevE.89.052147. url:https://link.aps.org/doi/10.1103/PhysRevE.89.052147

work page doi:10.1103/physreve.89.052147 2014
[4]

First encounters on combs

Junhao Peng and Elena Agliari. “First encounters on combs”. In:Phys. Rev. E100 (6 Dec. 2019), p. 062310.doi:10.1103/PhysRevE.100.062310.url:https://link.aps.org/doi/10.1103/ PhysRevE.100.062310

work page doi:10.1103/physreve.100.062310.url:https://link.aps.org/doi/10.1103/ 2019
[5]

20 years of network community detection

Santo Fortunato and Mark E. J. Newman. “20 years of network community detection”. In:Nature Physics18.8 (2022), pp. 848–850.doi:10.1038/s41567-022-01716-7.url:https://doi.org/ 10.1038/s41567-022-01716-7

work page doi:10.1038/s41567-022-01716-7.url:https://doi.org/ 2022
[6]

Reactive random walkers on complex networks

Giulia Cencetti et al. “Reactive random walkers on complex networks”. In:Phys. Rev. E98 (5 Nov. 2018), p. 052302.doi:10.1103/PhysRevE.98.052302.url:https://link.aps.org/doi/ 10.1103/PhysRevE.98.052302

work page doi:10.1103/physreve.98.052302.url:https://link.aps.org/doi/ 2018
[7]

Diego Febbe, Duccio Fanelli, and Timoteo Carletti.Random Walks Across Dimensions: Exploring Simplicial Complexes. 2026. arXiv:2601.16086 [cond-mat.stat-mech].url:https://arxiv. org/abs/2601.16086

work page arXiv 2026
[8]

Mean encounter times for multiple random walkers on networks

Alejandro P. Riascos and David P. Sanders. “Mean encounter times for multiple random walkers on networks”. In:Phys. Rev. E103 (4 Apr. 2021), p. 042312.doi:10.1103/PhysRevE.103.042312. url:https://link.aps.org/doi/10.1103/PhysRevE.103.042312

work page doi:10.1103/physreve.103.042312 2021
[9]

Multitarget Pursuit-Evasion Based on Distributed and Competitive Mechanisms

Ning Tan et al. “Multitarget Pursuit-Evasion Based on Distributed and Competitive Mechanisms”. In:IEEE Transactions on Systems, Man, and Cybernetics: Systems54.10 (2024), pp. 5989–6000. doi:10.1109/TSMC.2024.3410693.url:https://doi.org/10.1109/TSMC.2024.3410693

work page doi:10.1109/tsmc.2024.3410693.url:https://doi.org/10.1109/tsmc.2024.3410693 2024
[10]

Waymire.Random Walk, Brownian Motion, and Martingales

Rabi Bhattacharya and Edward C. Waymire.Random Walk, Brownian Motion, and Martingales. Springer Cham, 2021.isbn: 978-3-030-78937-4.doi:10.1007/978-3-030-78939-8.url:https: //doi.org/10.1007/978-3-030-78939-8

work page doi:10.1007/978-3-030-78939-8.url:https: 2021
[11]

Money distribution in agent-based models with position-exchange dynamics: the Pareto paradigm revisited

Ekrem Aydiner, Andrey G. Cherstvy, and Ralf Metzler. “Money distribution in agent-based models with position-exchange dynamics: the Pareto paradigm revisited”. In:European Physical Journal B92.5, 104 (May 2019), p. 104.doi:10 . 1140 / epjb / e2019 - 90674 - 0.url:https : / / link . springer.com/article/10.1140/epjb/e2019-90674-0

work page doi:10.1140/epjb/e2019-90674-0 2019
[12]

Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation

R. Vidal et al. “Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation”. In:IEEE Transactions on Robotics and Automation18.5 (2002), pp. 662–669.doi: 10.1109/TRA.2002.804040.url:https://ieeexplore.ieee.org/document/1067989

work page doi:10.1109/tra.2002.804040.url:https://ieeexplore.ieee.org/document/1067989 2002
[13]

Multiple Random Walks in Random Regular Graphs

Colin Cooper, Alan Frieze, and Tomasz Radzik. “Multiple Random Walks in Random Regular Graphs”. In:SIAM Journal on Discrete Mathematics23.4 (2010), pp. 1738–1761.doi:10.1137/ 080729542.url:https://doi.org/10.1137/080729542

work page doi:10.1137/080729542 2010
[14]

The capture time of a graph

A. Bonato et al. “The capture time of a graph”. In:Discrete Mathematics309.18 (2009), pp. 5588– 5595.issn: 0012-365X.doi:https://doi.org/10.1016/j.disc.2008.04.004.url:https: //www.sciencedirect.com/science/article/pii/S0012365X08002173. 21

work page doi:10.1016/j.disc.2008.04.004.url:https: 2009
[15]

The “Princess and Monster

Steve Alpern et al. “The “Princess and Monster” Game on an Interval”. In:SIAM Journal on Control and Optimization47.3 (2008), pp. 1178–1190.doi:10 . 1137 / 060672054.url:https : //doi.org/10.1137/060672054

work page doi:10.1137/060672054 2008
[16]

Underwater glider persistent coverage using deep reinforcement learning for ocean observation

Jiayi Liu et al. “Underwater glider persistent coverage using deep reinforcement learning for ocean observation”. In:Ocean Engineering336 (2025), p. 121804.issn: 0029-8018.doi:https://doi. org/10.1016/j.oceaneng.2025.121804.url:https://www.sciencedirect.com/science/ article/pii/S0029801825015100

work page doi:10.1016/j.oceaneng.2025.121804.url:https://www.sciencedirect.com/science/ 2025
[17]

Access to Emergency Services: A New York City Case Study

Sukhwan Chung et al. “Access to Emergency Services: A New York City Case Study”. In:Trans- portation Research Interdisciplinary Perspectives25(2024),p.101111.issn:2590-1982.doi:https: //doi.org/10.1016/j.trip.2024.101111.url:https://www.sciencedirect.com/science/ article/pii/S2590198224000976

work page doi:10.1016/j.trip.2024.101111.url:https://www.sciencedirect.com/science/ 2024
[18]

Enhancing Reliability of Time-Triggered Traffic in Joint Scheduling and Routing Optimization Within Time-Sensitive Networks

Bilal Omar Akram et al. “Enhancing Reliability of Time-Triggered Traffic in Joint Scheduling and Routing Optimization Within Time-Sensitive Networks”. In:IEEE Access12 (2024), pp. 78379– 78396.doi:10.1109/ACCESS.2024.3408923.url:https://ieeexplore.ieee.org/document/ 10546916

work page doi:10.1109/access.2024.3408923.url:https://ieeexplore.ieee.org/document/ 2024
[19]

Monitoring Environmental Boundaries With a Robotic Sensor Network

Sara Susca, Francesco Bullo, and Sonia Martinez. “Monitoring Environmental Boundaries With a Robotic Sensor Network”. In:IEEE Transactions on Control Systems Technology16.2 (2008), pp. 288–296.doi:10.1109/TCST.2007.903395.url:https://ieeexplore.ieee.org/document/ 4431884

work page doi:10.1109/tcst.2007.903395.url:https://ieeexplore.ieee.org/document/ 2008
[20]

Decentralized multi-task stochastic optimization with compressed communi- cations

Navjot Singh et al. “Decentralized multi-task stochastic optimization with compressed communi- cations”. In:Automatica159 (2024), p. 111363.issn: 0005-1098.doi:https://doi.org/10.1016/ j.automatica.2023.111363.url:https://www.sciencedirect.com/science/article/pii/ S0005109823005290

work page arXiv 2024
[21]

Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops

Fabian Zaiser, Andrzej S. Murawski, and C.-H. Luke Ong. “Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops”. In:Proc. ACM Program. Lang. 9.POPL (Jan. 2025).doi:10.1145/3704874.url:https://doi.org/10.1145/3704874

work page doi:10.1145/3704874.url:https://doi.org/10.1145/3704874 2025
[22]

SimRank: a measure of structural-context similarity

Glen Jeh and Jennifer Widom. “SimRank: a measure of structural-context similarity”. In:Proceed- ings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Min- ing. KDD ’02. Edmonton, Alberta, Canada: Association for Computing Machinery, 2002, pp. 538– 543.isbn: 158113567X.doi:10.1145/775047.775126.url:https://doi.org/10.1145/7...

work page doi:10.1145/775047.775126.url:https://doi.org/10.1145/775047 2002
[23]

On a random walk problem arising in self-stabilizing token management

Prasad Tetali and Peter Winkler. “On a random walk problem arising in self-stabilizing token management”. In:Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing. PODC ’91. Montreal, Quebec, Canada: Association for Computing Machinery, 1991, pp. 273–280.isbn: 0897914392.doi:10.1145/112600.112623.url:https://doi.org/10.1145/...

work page doi:10.1145/112600.112623.url:https://doi.org/10.1145/ 1991
[24]

Token management schemes and random walks yield self-stabilizing mutual exclusion

Amos Israeli and Marc Jalfon. “Token management schemes and random walks yield self-stabilizing mutual exclusion”. In:Proceedings of the Ninth Annual ACM Symposium on Principles of Dis- tributed Computing. PODC ’90. Quebec City, Quebec, Canada: Association for Computing Ma- chinery, 1990, pp. 119–131.isbn: 089791404X.doi:10.1145/93385.93409.url:https://do...

work page doi:10.1145/93385.93409.url:https://doi 1990
[25]

Collisions Among Random Walks on a Graph

Don Coppersmith, Prasad Tetali, and Peter Winkler. “Collisions Among Random Walks on a Graph”. In:SIAM Journal on Discrete Mathematics6.3 (1993), pp. 363–374.doi:10 . 1137 / 0406029.url:https://doi.org/10.1137/0406029. 22

work page doi:10.1137/0406029 1993
[26]

Meeting times of random walks on graphs

Nader H. Bshouty, Lisa Higham, and Jolanta Warpechowska-Gruca. “Meeting times of random walks on graphs”. In:Information Processing Letters69.5 (1999), pp. 259–265.issn: 0020-0190. doi:https://doi.org/10.1016/S0020-0190(99)00017-4.url:https://www.sciencedirect. com/science/article/pii/S0020019099000174

work page doi:10.1016/s0020-0190(99)00017-4.url:https://www.sciencedirect 1999
[27]

Mishel George, Rushabh Patel, and Francesco Bullo.The Meeting Time of Multiple Random Walks

work page
[28]

arXiv:1806.08843 [math.PR].url:https://arxiv.org/abs/1806.08843

work page internal anchor Pith review Pith/arXiv arXiv
[29]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.isbn: 0262039249

work page 2018
[30]

AMax-MinEntropyFrameworkforReinforcementLearning

SeungyulHanandYoungchulSung.“AMax-MinEntropyFrameworkforReinforcementLearning”. In:Neural Information Processing Systems. 2021.url:https : / / api . semanticscholar . org / CorpusID:235490266

work page 2021
[31]

Math Stack Exchange

Damian Pavlyshyn.The random walkers’ first meeting. Math Stack Exchange. 2025.url:https: //math.stackexchange.com/questions/5033858/the-random-walkers-first-meeting

work page arXiv 2025
[32]

A Survey of Exploration Methods in Reinforcement Learning

Susan Amin et al. “A Survey of Exploration Methods in Reinforcement Learning”. In:CoRR abs/2109.00157 (2021). arXiv:2109.00157.url:https://arxiv.org/abs/2109.00157. 23

work page arXiv 2021

[1] [1]

Extreme events in dynamical systems and random walkers: A review

Sayantan Nag Chowdhury et al. “Extreme events in dynamical systems and random walkers: A review”. In:Physics Reports966 (2022). Extreme events in dynamical systems and random walkers: A review, pp. 1–52.issn: 0370-1573.doi:https://doi.org/10.1016/j.physrep.2022.04.001. url:https://www.sciencedirect.com/science/article/pii/S0370157322001144

work page doi:10.1016/j.physrep.2022.04.001 2022

[2] [2]

Two-particle problem in comblike structures

Elena Agliari et al. “Two-particle problem in comblike structures”. In:Phys. Rev. E93 (5 May 2016), p. 052111.doi:10.1103/PhysRevE.93.052111.url:https://link.aps.org/doi/10. 1103/PhysRevE.93.052111

work page doi:10.1103/physreve.93.052111.url:https://link.aps.org/doi/10 2016

[3] [3]

Slow encounters of particle pairs in branched structures

Elena Agliari, Alexander Blumen, and Davide Cassi. “Slow encounters of particle pairs in branched structures”. In:Phys. Rev. E89 (5 May 2014), p. 052147.doi:10.1103/PhysRevE.89.052147. url:https://link.aps.org/doi/10.1103/PhysRevE.89.052147

work page doi:10.1103/physreve.89.052147 2014

[4] [4]

First encounters on combs

Junhao Peng and Elena Agliari. “First encounters on combs”. In:Phys. Rev. E100 (6 Dec. 2019), p. 062310.doi:10.1103/PhysRevE.100.062310.url:https://link.aps.org/doi/10.1103/ PhysRevE.100.062310

work page doi:10.1103/physreve.100.062310.url:https://link.aps.org/doi/10.1103/ 2019

[5] [5]

20 years of network community detection

Santo Fortunato and Mark E. J. Newman. “20 years of network community detection”. In:Nature Physics18.8 (2022), pp. 848–850.doi:10.1038/s41567-022-01716-7.url:https://doi.org/ 10.1038/s41567-022-01716-7

work page doi:10.1038/s41567-022-01716-7.url:https://doi.org/ 2022

[6] [6]

Reactive random walkers on complex networks

Giulia Cencetti et al. “Reactive random walkers on complex networks”. In:Phys. Rev. E98 (5 Nov. 2018), p. 052302.doi:10.1103/PhysRevE.98.052302.url:https://link.aps.org/doi/ 10.1103/PhysRevE.98.052302

work page doi:10.1103/physreve.98.052302.url:https://link.aps.org/doi/ 2018

[7] [7]

Diego Febbe, Duccio Fanelli, and Timoteo Carletti.Random Walks Across Dimensions: Exploring Simplicial Complexes. 2026. arXiv:2601.16086 [cond-mat.stat-mech].url:https://arxiv. org/abs/2601.16086

work page arXiv 2026

[8] [8]

Mean encounter times for multiple random walkers on networks

Alejandro P. Riascos and David P. Sanders. “Mean encounter times for multiple random walkers on networks”. In:Phys. Rev. E103 (4 Apr. 2021), p. 042312.doi:10.1103/PhysRevE.103.042312. url:https://link.aps.org/doi/10.1103/PhysRevE.103.042312

work page doi:10.1103/physreve.103.042312 2021

[9] [9]

Multitarget Pursuit-Evasion Based on Distributed and Competitive Mechanisms

Ning Tan et al. “Multitarget Pursuit-Evasion Based on Distributed and Competitive Mechanisms”. In:IEEE Transactions on Systems, Man, and Cybernetics: Systems54.10 (2024), pp. 5989–6000. doi:10.1109/TSMC.2024.3410693.url:https://doi.org/10.1109/TSMC.2024.3410693

work page doi:10.1109/tsmc.2024.3410693.url:https://doi.org/10.1109/tsmc.2024.3410693 2024

[10] [10]

Waymire.Random Walk, Brownian Motion, and Martingales

Rabi Bhattacharya and Edward C. Waymire.Random Walk, Brownian Motion, and Martingales. Springer Cham, 2021.isbn: 978-3-030-78937-4.doi:10.1007/978-3-030-78939-8.url:https: //doi.org/10.1007/978-3-030-78939-8

work page doi:10.1007/978-3-030-78939-8.url:https: 2021

[11] [11]

Money distribution in agent-based models with position-exchange dynamics: the Pareto paradigm revisited

Ekrem Aydiner, Andrey G. Cherstvy, and Ralf Metzler. “Money distribution in agent-based models with position-exchange dynamics: the Pareto paradigm revisited”. In:European Physical Journal B92.5, 104 (May 2019), p. 104.doi:10 . 1140 / epjb / e2019 - 90674 - 0.url:https : / / link . springer.com/article/10.1140/epjb/e2019-90674-0

work page doi:10.1140/epjb/e2019-90674-0 2019

[12] [12]

Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation

R. Vidal et al. “Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation”. In:IEEE Transactions on Robotics and Automation18.5 (2002), pp. 662–669.doi: 10.1109/TRA.2002.804040.url:https://ieeexplore.ieee.org/document/1067989

work page doi:10.1109/tra.2002.804040.url:https://ieeexplore.ieee.org/document/1067989 2002

[13] [13]

Multiple Random Walks in Random Regular Graphs

Colin Cooper, Alan Frieze, and Tomasz Radzik. “Multiple Random Walks in Random Regular Graphs”. In:SIAM Journal on Discrete Mathematics23.4 (2010), pp. 1738–1761.doi:10.1137/ 080729542.url:https://doi.org/10.1137/080729542

work page doi:10.1137/080729542 2010

[14] [14]

The capture time of a graph

A. Bonato et al. “The capture time of a graph”. In:Discrete Mathematics309.18 (2009), pp. 5588– 5595.issn: 0012-365X.doi:https://doi.org/10.1016/j.disc.2008.04.004.url:https: //www.sciencedirect.com/science/article/pii/S0012365X08002173. 21

work page doi:10.1016/j.disc.2008.04.004.url:https: 2009

[15] [15]

The “Princess and Monster

Steve Alpern et al. “The “Princess and Monster” Game on an Interval”. In:SIAM Journal on Control and Optimization47.3 (2008), pp. 1178–1190.doi:10 . 1137 / 060672054.url:https : //doi.org/10.1137/060672054

work page doi:10.1137/060672054 2008

[16] [16]

Underwater glider persistent coverage using deep reinforcement learning for ocean observation

Jiayi Liu et al. “Underwater glider persistent coverage using deep reinforcement learning for ocean observation”. In:Ocean Engineering336 (2025), p. 121804.issn: 0029-8018.doi:https://doi. org/10.1016/j.oceaneng.2025.121804.url:https://www.sciencedirect.com/science/ article/pii/S0029801825015100

work page doi:10.1016/j.oceaneng.2025.121804.url:https://www.sciencedirect.com/science/ 2025

[17] [17]

Access to Emergency Services: A New York City Case Study

Sukhwan Chung et al. “Access to Emergency Services: A New York City Case Study”. In:Trans- portation Research Interdisciplinary Perspectives25(2024),p.101111.issn:2590-1982.doi:https: //doi.org/10.1016/j.trip.2024.101111.url:https://www.sciencedirect.com/science/ article/pii/S2590198224000976

work page doi:10.1016/j.trip.2024.101111.url:https://www.sciencedirect.com/science/ 2024

[18] [18]

Enhancing Reliability of Time-Triggered Traffic in Joint Scheduling and Routing Optimization Within Time-Sensitive Networks

Bilal Omar Akram et al. “Enhancing Reliability of Time-Triggered Traffic in Joint Scheduling and Routing Optimization Within Time-Sensitive Networks”. In:IEEE Access12 (2024), pp. 78379– 78396.doi:10.1109/ACCESS.2024.3408923.url:https://ieeexplore.ieee.org/document/ 10546916

work page doi:10.1109/access.2024.3408923.url:https://ieeexplore.ieee.org/document/ 2024

[19] [19]

Monitoring Environmental Boundaries With a Robotic Sensor Network

Sara Susca, Francesco Bullo, and Sonia Martinez. “Monitoring Environmental Boundaries With a Robotic Sensor Network”. In:IEEE Transactions on Control Systems Technology16.2 (2008), pp. 288–296.doi:10.1109/TCST.2007.903395.url:https://ieeexplore.ieee.org/document/ 4431884

work page doi:10.1109/tcst.2007.903395.url:https://ieeexplore.ieee.org/document/ 2008

[20] [20]

Decentralized multi-task stochastic optimization with compressed communi- cations

Navjot Singh et al. “Decentralized multi-task stochastic optimization with compressed communi- cations”. In:Automatica159 (2024), p. 111363.issn: 0005-1098.doi:https://doi.org/10.1016/ j.automatica.2023.111363.url:https://www.sciencedirect.com/science/article/pii/ S0005109823005290

work page arXiv 2024

[21] [21]

Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops

Fabian Zaiser, Andrzej S. Murawski, and C.-H. Luke Ong. “Guaranteed Bounds on Posterior Distributions of Discrete Probabilistic Programs with Loops”. In:Proc. ACM Program. Lang. 9.POPL (Jan. 2025).doi:10.1145/3704874.url:https://doi.org/10.1145/3704874

work page doi:10.1145/3704874.url:https://doi.org/10.1145/3704874 2025

[22] [22]

SimRank: a measure of structural-context similarity

Glen Jeh and Jennifer Widom. “SimRank: a measure of structural-context similarity”. In:Proceed- ings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Min- ing. KDD ’02. Edmonton, Alberta, Canada: Association for Computing Machinery, 2002, pp. 538– 543.isbn: 158113567X.doi:10.1145/775047.775126.url:https://doi.org/10.1145/7...

work page doi:10.1145/775047.775126.url:https://doi.org/10.1145/775047 2002

[23] [23]

On a random walk problem arising in self-stabilizing token management

Prasad Tetali and Peter Winkler. “On a random walk problem arising in self-stabilizing token management”. In:Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing. PODC ’91. Montreal, Quebec, Canada: Association for Computing Machinery, 1991, pp. 273–280.isbn: 0897914392.doi:10.1145/112600.112623.url:https://doi.org/10.1145/...

work page doi:10.1145/112600.112623.url:https://doi.org/10.1145/ 1991

[24] [24]

Token management schemes and random walks yield self-stabilizing mutual exclusion

Amos Israeli and Marc Jalfon. “Token management schemes and random walks yield self-stabilizing mutual exclusion”. In:Proceedings of the Ninth Annual ACM Symposium on Principles of Dis- tributed Computing. PODC ’90. Quebec City, Quebec, Canada: Association for Computing Ma- chinery, 1990, pp. 119–131.isbn: 089791404X.doi:10.1145/93385.93409.url:https://do...

work page doi:10.1145/93385.93409.url:https://doi 1990

[25] [25]

Collisions Among Random Walks on a Graph

Don Coppersmith, Prasad Tetali, and Peter Winkler. “Collisions Among Random Walks on a Graph”. In:SIAM Journal on Discrete Mathematics6.3 (1993), pp. 363–374.doi:10 . 1137 / 0406029.url:https://doi.org/10.1137/0406029. 22

work page doi:10.1137/0406029 1993

[26] [26]

Meeting times of random walks on graphs

Nader H. Bshouty, Lisa Higham, and Jolanta Warpechowska-Gruca. “Meeting times of random walks on graphs”. In:Information Processing Letters69.5 (1999), pp. 259–265.issn: 0020-0190. doi:https://doi.org/10.1016/S0020-0190(99)00017-4.url:https://www.sciencedirect. com/science/article/pii/S0020019099000174

work page doi:10.1016/s0020-0190(99)00017-4.url:https://www.sciencedirect 1999

[27] [27]

Mishel George, Rushabh Patel, and Francesco Bullo.The Meeting Time of Multiple Random Walks

work page

[28] [28]

arXiv:1806.08843 [math.PR].url:https://arxiv.org/abs/1806.08843

work page internal anchor Pith review Pith/arXiv arXiv

[29] [29]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.isbn: 0262039249

work page 2018

[30] [30]

AMax-MinEntropyFrameworkforReinforcementLearning

SeungyulHanandYoungchulSung.“AMax-MinEntropyFrameworkforReinforcementLearning”. In:Neural Information Processing Systems. 2021.url:https : / / api . semanticscholar . org / CorpusID:235490266

work page 2021

[31] [31]

Math Stack Exchange

Damian Pavlyshyn.The random walkers’ first meeting. Math Stack Exchange. 2025.url:https: //math.stackexchange.com/questions/5033858/the-random-walkers-first-meeting

work page arXiv 2025

[32] [32]

A Survey of Exploration Methods in Reinforcement Learning

Susan Amin et al. “A Survey of Exploration Methods in Reinforcement Learning”. In:CoRR abs/2109.00157 (2021). arXiv:2109.00157.url:https://arxiv.org/abs/2109.00157. 23

work page arXiv 2021