arXiv: 2604.14811 · v1 · submitted 2026-04-16 · 💻 cs.LG · cs.MA · cs.NI

Recognition: unknown

Learning Ad Hoc Network Dynamics via Graph-Structured World Models

Can Karacelebi, Yusuf Talha Sahin, Elif Surer, Ertan Onur


Pith reviewed 2026-05-10 11:58 UTC · model grok-4.3

classification 💻 cs.LG cs.MA cs.NI

keywords: ad hoc networks, graph neural networks, recurrent state space models, reinforcement learning, world models, clustering, MANET, VANET


The pith

A graph-structured recurrent state space model learns ad hoc network dynamics from offline trajectories to train size-generalizable clustering policies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that the coupled effects of node mobility, energy depletion, and topology shifts in wireless ad hoc networks can be captured in a single structured latent model rather than through continuous real-world trial and error. It keeps a separate latent state for each node and lets multi-head attention handle their interactions so that a downstream policy for choosing cluster heads can be optimized entirely inside imagined rollouts. If the approach holds, decision policies become trainable without sustained online interaction and remain effective across network types and node counts far from the training size.

Core claim

The central claim is that G-RSSM, a graph-structured recurrent state space model, maintains per-node latent states and employs cross-node multi-head attention to learn joint multi-physics dynamics directly from offline trajectories. A cluster-head selection policy trained solely via imagined rollouts in this model sustains high connectivity across 27 evaluation scenarios that cover MANET, VANET, FANET, WSN, and tactical networks with node counts ranging from 30 to 1000, even though training occurred only at N=50.

What carries the argument

G-RSSM, a recurrent state space model that assigns each node its own latent state vector inside a graph and uses multi-head attention to propagate interaction effects across nodes while learning from trajectory data.
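The cross-node attention step can be illustrated with a minimal pure-Python sketch. The 32-dimensional node states and four heads follow the Figure 2 caption; everything else is an assumption, including the randomly initialized shared projections (`init_weights` is hypothetical), the absence of an output projection or residual connection, and dense attention over all nodes rather than only radio neighbours. What it shows is structural: the weights depend only on the state width, so the same parameters process any node count, and permuting the nodes simply permutes the outputs.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (lists of rows)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def init_weights(d_model, seed=0):
    """Hypothetical shared query/key/value projections, each d_model x d_model."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_model)
    return [[[rng.gauss(0.0, scale) for _ in range(d_model)] for _ in range(d_model)]
            for _ in range(3)]


def multi_head_self_attention(h, weights, num_heads=4):
    """h: N x d_model per-node states; returns N x d_model attended states.

    No parameter depends on N, so the same weights work for any node count.
    """
    n, d_model = len(h), len(h[0])
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    w_q, w_k, w_v = weights
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    out = [[0.0] * d_model for _ in range(n)]
    for head in range(num_heads):
        lo, hi = head * d_head, (head + 1) * d_head  # this head's slice of features
        for i in range(n):
            scores = [sum(q[i][c] * k[j][c] for c in range(lo, hi)) / math.sqrt(d_head)
                      for j in range(n)]
            attn = softmax(scores)  # node i's weights over all N nodes
            for c in range(lo, hi):
                out[i][c] = sum(attn[j] * v[j][c] for j in range(n))
    return out


# The same weights applied to three network sizes.
weights = init_weights(d_model=32)
rng = random.Random(1)
for n in (30, 50, 100):
    h = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(n)]
    mixed = multi_head_self_attention(h, weights)
    print(n, len(mixed), len(mixed[0]))  # prints n, n, 32
```

Restricting each node's softmax to its current neighbours would be a natural variant for ad hoc topologies; whether G-RSSM does this is not stated here.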

If this is right

Cluster-head policies trained entirely inside the learned world model transfer to real networks without major loss of performance.
The same policy remains effective for node counts both smaller and larger than the single training size of 50.
One joint model captures mobility, energy depletion, and topology change across multiple network categories.
Offline trajectory collection replaces the need for continuous online interaction during policy training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Historical traces from one ad hoc network could pre-train a model usable in another type without new data collection.
The per-node latent structure may extend to other combinatorial tasks such as routing or channel allocation.
Dynamic node addition or removal during operation would provide a direct test of the size-agnostic property.

Load-bearing premise

The latent dynamics learned from offline data remain close enough to real network evolution that policies optimized inside the model continue to work when transferred to actual operation.

What would settle it

Deploying the learned cluster-head policy in a fresh set of network simulations and finding that connectivity falls below the levels achieved by a model-free reinforcement learner trained directly on the target environment.

Figures

Figures reproduced from arXiv: 2604.14811 by Can Karacelebi, Elif Surer, Ertan Onur, Yusuf Talha Sahin.

**Figure 1.** (a) Standard RSSM compresses N nodes into a single state vector. (b) G-RSSM maintains per-node states with cross-node attention. …death), the rollout truncates, discounting future returns and teaching the policy to avoid actions that accelerate network collapse. Each node's previous action decision feeds back into its dynamics step, enabling the world model to capture how individual clustering decisions prop…
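The truncation behaviour described in the Figure 1 excerpt can be sketched as a discounted return over one imagined rollout. This is a hedged illustration, not the authors' code: `cont[t]` is an assumed per-step continuation value (1.0 while the network is alive, 0.0 once the world model predicts collapse), and soft values in between mirror the predicted continuation probabilities used in DreamerV3-style agents.

```python
def imagined_return(rewards, cont, gamma=0.99):
    """Discounted return of one imagined rollout.

    rewards[t]: predicted reward at imagined step t.
    cont[t]:    assumed probability that the rollout continues after step t;
                0.0 (e.g. predicted node death or network collapse) zeroes
                every later term, which truncates the rollout.
    """
    ret, weight = 0.0, 1.0
    for r, c in zip(rewards, cont):
        ret += weight * r
        weight *= gamma * c
    return ret


print(imagined_return([1, 1, 1, 1], [1, 1, 1, 1], gamma=0.5))  # 1.875
print(imagined_return([1, 1, 1, 1], [1, 1, 0, 1], gamma=0.5))  # 1.75
```

With positive per-step rewards, a rollout cut short collects less return, so actions that hasten collapse look worse to the policy.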

**Figure 2.** Overall architecture. …contains $\tilde{H}_t \in \mathbb{R}^{N \times 32}$, which stacks all nodes' GRU outputs. A four-head self-attention layer captures the inter-node effects. Each node's stochastic state $z_i$ is a 64-dimensional categorical latent, factored as eight independent one-hot vectors of dimension eight, following the architecture proposed in DreamerV3 [2]. Prior and posterior from the canonical RSSM are now transfo…
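The 64-dimensional categorical latent in the Figure 2 caption (eight one-hot vectors of size eight) can be sampled per node as below. This is a minimal forward-only sketch: the logits would come from the prior or posterior network, which is not shown, and training additionally needs a gradient estimator (DreamerV2 and DreamerV3 use straight-through gradients) that this sampler omits. The function name is hypothetical.

```python
import math
import random

NUM_VARS, NUM_CLASSES = 8, 8  # 8 one-hot vectors of size 8 -> 64 dims


def sample_categorical_latent(logits, rng):
    """logits: NUM_VARS lists of NUM_CLASSES floats for one node.

    Returns a flat 64-dim vector of NUM_VARS concatenated one-hot blocks.
    """
    z = []
    for var_logits in logits:
        m = max(var_logits)  # stable softmax over this variable's classes
        probs = [math.exp(l - m) for l in var_logits]
        total = sum(probs)
        probs = [p / total for p in probs]
        k = rng.choices(range(NUM_CLASSES), weights=probs)[0]
        z.extend(1.0 if c == k else 0.0 for c in range(NUM_CLASSES))
    return z


rng = random.Random(0)
uniform_logits = [[0.0] * NUM_CLASSES for _ in range(NUM_VARS)]
z = sample_categorical_latent(uniform_logits, rng)
print(len(z), sum(z))  # 64 8.0
```

For N nodes the sampler is simply called once per node, so the latent tensor has shape N x 64 regardless of network size.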

**Figure 3.** Training convergence. (a) World model reconstruction losses over 100 …

**Figure 4.** Cross-scenario evaluation averaged by category (26 scenarios, …

Original abstract

Ad hoc wireless networks exhibit complex, innate, and coupled dynamics (node mobility, energy depletion, and topology change) that are difficult to model analytically. Model-free deep reinforcement learning requires sustained online interaction, whereas existing model-based approaches use flat state representations that lose per-node structure. We therefore propose G-RSSM, a graph-structured recurrent state space model that maintains per-node latent states with cross-node multi-head attention to learn the dynamics jointly from offline trajectories. We apply the proposed method to the downstream task of clustering, where a cluster-head selection policy is trained entirely through imagined rollouts in the learned world model. Across 27 evaluation scenarios spanning MANET, VANET, FANET, WSN, and tactical networks with N=30 to 1000 nodes, the learned policy maintains high connectivity despite being trained only at N=50. Herein, we present the first multi-physics graph-structured world model applied to combinatorial per-node decision making in size-agnostic wireless ad hoc networks.
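Per-node decision making with one shared policy can be sketched as an independent Bernoulli choice per node. This is an assumption-laden illustration: the paper's actual action space, policy network, and inputs are not described here, so `per_node_head_policy`, its linear scoring head, and the random features in the demo are all hypothetical. It shows how a single parameter vector scores any number of nodes, which is the property the train-at-N=50, evaluate-up-to-N=1000 setting relies on.

```python
import math
import random


def per_node_head_policy(features, weights, bias, rng):
    """Sample a cluster-head decision for every node with one shared linear head.

    features: N feature vectors (hypothetical inputs, e.g. per-node latent states).
    Returns (decisions, log_prob): decisions[i] == 1 means node i becomes a head;
    log_prob is the joint log-probability, usable in a policy-gradient update.
    """
    decisions, log_prob = [], 0.0
    for f in features:
        logit = sum(w * x for w, x in zip(weights, f)) + bias
        logit = max(-30.0, min(30.0, logit))  # keep p strictly inside (0, 1)
        p = 1.0 / (1.0 + math.exp(-logit))
        a = 1 if rng.random() < p else 0
        decisions.append(a)
        log_prob += math.log(p if a else 1.0 - p)
    return decisions, log_prob


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.1) for _ in range(32)]
for n in (30, 50, 1000):
    feats = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(n)]
    heads, logp = per_node_head_policy(feats, weights, bias=-1.0, rng=rng)
    print(n, sum(heads), round(logp, 2))
```

Independent per-node choices keep the action space linear in N; a real clustering policy may add constraints (for example, a minimum head spacing) that this sketch ignores.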

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces per-node latent states with cross-node attention inside an RSSM for ad hoc network dynamics and claims size-agnostic clustering policies, but offers no world-model accuracy numbers to support the transfer results.


The main thing to know is that this work builds a graph-structured recurrent state space model that keeps separate latent states per node and uses multi-head attention to capture joint effects across nodes. The authors learn it from offline trajectories and then train a cluster-head selection policy entirely inside imagined rollouts, reporting that the policy trained at N=50 still keeps high connectivity when tested on networks up to N=1000 across MANET, VANET, FANET, WSN, and tactical scenarios. That combination of per-node structure plus size-agnostic training via the world model is the concrete step beyond flat-state model-based RL in this domain. It makes sense for combinatorial per-node decisions where topology and energy matter together. The framing of why analytical models are hard and why structure should be preserved is straightforward and on target.

The soft spot is the missing evidence on whether the world model itself is any good. The abstract and available description give no next-step or multi-step prediction errors, no rollout divergence metrics on connectivity or energy, no ablations on the attention layers, and no check on how the graph handles node counts that differ from training. Without those, the headline transfer result could come from an accurate model or from a policy that happens to be robust to moderate mismatch. The size-agnostic claim also assumes the attention mechanism stays stable without explicit conditioning, which is plausible but unshown.

This is for researchers who work on model-based RL for wireless or graph-structured control problems. A reader looking for ideas on keeping node-level structure in dynamics models would find something usable here, but anyone needing verified prediction quality or strong baselines would have to wait for the full experiments.

I would bring it to a reading group to talk through the architecture, would not cite it in its current form, and would send it for peer review because the modeling choice is reasonable and the application is practical even if the validation needs substantial work.

Referee Report

2 major / 1 minor

Summary. The paper proposes G-RSSM, a graph-structured recurrent state space model that maintains per-node latent states and employs cross-node multi-head attention to jointly learn the coupled dynamics of node mobility, energy depletion, and topology changes in wireless ad hoc networks from offline trajectories. A cluster-head selection policy is then trained entirely via imagined rollouts in this learned world model. The central claim is that this policy, trained only on networks with N=50 nodes, generalizes to maintain high connectivity across 27 evaluation scenarios spanning MANET, VANET, FANET, WSN, and tactical networks with N ranging from 30 to 1000.

Significance. If the results hold with proper validation, the work would represent a meaningful advance in model-based reinforcement learning for combinatorial per-node decisions in size-agnostic ad hoc networks. It offers a structured alternative to model-free methods that require sustained online interaction and to flat state representations that discard per-node structure, while introducing what the authors present as the first multi-physics graph world model for this domain. The size-agnostic transfer property, if substantiated, could have practical value for scaling policies across varying network sizes without retraining.

major comments (2)

[Abstract] The headline generalization result (high-connectivity policy trained at N=50 transfers to N=30-1000 across 27 scenarios) is presented without any reported world-model validation metrics, such as next-step or multi-step prediction error on held-out trajectories, rollout divergence statistics for connectivity or energy, or ablations on attention scaling with N. This absence prevents assessment of whether observed policy performance arises from accurate capture of coupled mobility-energy-topology effects or from the policy succeeding despite moderate model mismatch.
[Abstract (and implied method description)] The size-agnostic property of the G-RSSM (per-node latents and multi-head attention without explicit size conditioning) is central to the transfer claim, yet the manuscript provides no analysis of how latent propagation or attention stability behaves as N increases; failure here would induce precisely the distribution shift that undermines generalization at the largest evaluated N values.
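The next-step and multi-step prediction errors requested above can be computed with an open-loop evaluation: start from an observed state, feed the model's own predictions back in for k steps, and compare with the held-out trajectory. In this sketch `step_fn` is a hypothetical stand-in for a world model already mapped back to observation space (a latent model would encode, roll forward, and decode), and the biased toy model in the demo is invented purely to show how error grows with k.

```python
def open_loop_errors(step_fn, trajectory, actions, horizon):
    """Mean squared error of k-step open-loop predictions for k = 1..horizon.

    step_fn(state, action) -> predicted next state (hypothetical model API).
    trajectory[t]: observed state vector at time t; actions[t]: action taken at t.
    """
    assert 1 <= horizon < len(trajectory)
    errors = []
    for k in range(1, horizon + 1):
        total, count = 0.0, 0
        for start in range(len(trajectory) - k):
            pred = trajectory[start]
            for t in range(start, start + k):
                pred = step_fn(pred, actions[t])  # no correction from observations
            target = trajectory[start + k]
            total += sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)
            count += 1
        errors.append(total / count)
    return errors


def biased_model(state, action):
    """Toy model: true dynamics are x' = x + a, but the model adds a 0.1 bias."""
    return [state[0] + action + 0.1]


traj = [[float(t)] for t in range(6)]
acts = [1.0] * 5
print([round(e, 4) for e in open_loop_errors(biased_model, traj, acts, horizon=3)])
# prints [0.01, 0.04, 0.09]
```

Plotting these errors against k for connectivity and energy features, and repeating the evaluation at several N, would address both major comments directly.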

minor comments (1)

[Abstract] The phrase 'maintains high connectivity' is used without defining the quantitative threshold or metric (e.g., fraction of connected components, average cluster size) or providing baseline comparisons to existing methods.
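As one illustration of how "connectivity" could be pinned down, the sketch below computes a component count with the largest-component fraction (one reading of "fraction of connected components") and the average cluster size, on a unit-disk graph. The unit-disk link model, the `radio_range` parameter, and the nearest-head assignment rule are assumptions for illustration; the metric the paper actually uses is not specified in this summary.

```python
import math


def unit_disk_components(positions, radio_range):
    """Sizes of connected components (largest first) of the unit-disk graph."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radio_range:
                parent[find(i)] = find(j)  # union the two components
    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def connectivity_report(positions, radio_range):
    sizes = unit_disk_components(positions, radio_range)
    return {"num_components": len(sizes),
            "largest_component_fraction": sizes[0] / len(positions)}


def average_cluster_size(positions, heads, radio_range):
    """Mean nodes per cluster; each non-head joins its nearest in-range head.

    Nodes with no head in range stay unclustered and are not counted.
    """
    if not heads:
        return 0.0
    counts = {h: 1 for h in heads}  # each head counts itself
    for i, p in enumerate(positions):
        if i in counts:
            continue
        in_range = [h for h in heads if math.dist(p, positions[h]) <= radio_range]
        if in_range:
            nearest = min(in_range, key=lambda h: math.dist(p, positions[h]))
            counts[nearest] += 1
    return sum(counts.values()) / len(counts)


positions = [(0, 0), (1, 0), (5, 0), (5.5, 0)]
print(connectivity_report(positions, 1.0))
# prints {'num_components': 2, 'largest_component_fraction': 0.5}
print(average_cluster_size(positions, [0, 2], 1.0))  # prints 2.0
```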

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for acknowledging the potential significance of G-RSSM. We address each major comment below and indicate the corresponding revisions to the manuscript.


Referee: [Abstract] The headline generalization result (high-connectivity policy trained at N=50 transfers to N=30-1000 across 27 scenarios) is presented without any reported world-model validation metrics, such as next-step or multi-step prediction error on held-out trajectories, rollout divergence statistics for connectivity or energy, or ablations on attention scaling with N. This absence prevents assessment of whether observed policy performance arises from accurate capture of coupled mobility-energy-topology effects or from the policy succeeding despite moderate model mismatch.

Authors: We agree that world-model validation metrics are necessary to substantiate the claims. In the revised manuscript we have added a dedicated subsection reporting next-step and multi-step prediction errors on held-out trajectories, rollout divergence statistics for connectivity and energy, and ablations on attention scaling with N. These metrics confirm that the G-RSSM accurately captures the coupled dynamics, indicating that policy performance derives from faithful modeling rather than mismatch. revision: yes
Referee: [Abstract (and implied method description)] The size-agnostic property of the G-RSSM (per-node latents and multi-head attention without explicit size conditioning) is central to the transfer claim, yet the manuscript provides no analysis of how latent propagation or attention stability behaves as N increases; failure here would induce precisely the distribution shift that undermines generalization at the largest evaluated N values.

Authors: The referee correctly notes the lack of explicit scaling analysis. We have added an ablation study and discussion in the revised paper that examines attention weight stability and latent propagation consistency as N scales from 50 to 1000. The results show no significant degradation or distribution shift, supporting the size-agnostic transfer property. revision: yes

Circularity Check

0 steps flagged

No circularity: model learning and policy training remain independent of target performance claims


The paper learns G-RSSM parameters from offline trajectories, then optimizes the cluster-head policy exclusively inside imagined rollouts of that model before evaluating the resulting policy on real networks. No equation, definition, or self-citation is shown that makes the reported connectivity or size-agnostic generalization equivalent to the training data or fitted parameters by construction. The size-agnostic property is presented as an empirical outcome of the attention-based architecture rather than a definitional or fitted-input result. Because the central claim rests on external evaluation rather than on any reduction to the model's own inputs, the derivation chain contains no load-bearing circular step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that offline trajectories contain sufficient coverage of mobility-energy-topology couplings and that graph attention can represent those couplings in latent space without explicit physics equations.

axioms (1)

Domain assumption: network dynamics can be captured by per-node latent states updated via recurrent transitions and cross-node attention.
Invoked in the model design to replace analytic modeling of mobility, energy, and topology.

pith-pipeline@v0.9.0 · 5471 in / 1292 out tokens · 36737 ms · 2026-05-10T11:58:43.241590+00:00


Reference graph

Works this paper leans on

22 extracted references · 13 canonical work pages · 4 internal anchors

[2] “World models.” [Online]. Available: http://arxiv.org/abs/1803.10122
[3] D. Hafner, J. Pasukonis, J. Ba, and T. Lillicrap, “Mastering diverse domains through world models,” 2024. [Online]. Available: https://arxiv.org/abs/2301.04104
[4] L. Wang, R. Shelim, W. Saad, and N. Ramakrishnan, “World model-based learning for long-term age of information minimization in vehicular networks,” 2025. [Online]. Available: https://arxiv.org/abs/2505.01712
[5] C. Zhao, R. Zhang, J. Wang, G. Zhao, D. Niyato, G. Sun, S. Mao, and D. I. Kim, “World models for cognitive agents: Transforming edge intelligence in future networks,” 2025. [Online]. Available: https://arxiv.org/abs/2506.00417
[6] L. Wang, R. Shelim, W. Saad, and N. Ramakrishnan, “Dual-mind world models: A general framework for learning in dynamic wireless networks,” 2025. [Online]. Available: https://arxiv.org/abs/2510.24546
[7] E. Hajiramezanali, A. Hasanzadeh, K. Narayanan, N. Duffield, M. Zhou, and X. Qian, “Variational graph recurrent neural networks,” in Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32. Curran Associates, Inc., 2019.
[8] A. Berkes, D. Vakalis, Y. Bengio, and D. Rolnick, “Graph dreamer: Temporal graph world models for sample-efficient and generalisable reinforcement learning,” in Women in Machine Learning Workshop @ NeurIPS 2025, 2026. [Online]. Available: https://openreview.net/forum?id=pHmgNUZixd
[9] T. M. Moerland, J. Broekens, A. Plaat, and C. M. Jonker, “Model-based reinforcement learning: A survey,” 2022. [Online]. Available: https://arxiv.org/abs/2006.16712
[10] D. Hafner, T. Lillicrap, I. Fischer, R. Villegas, D. Ha, H. Lee, and J. Davidson, “Learning latent dynamics for planning from pixels,”
[11] “Learning latent dynamics for planning from pixels.” [Online]. Available: https://arxiv.org/abs/1811.04551
[12] D. Hafner, T. Lillicrap, J. Ba, and M. Norouzi, “Dream to control: Learning behaviors by latent imagination,” 2020. [Online]. Available: https://arxiv.org/abs/1912.01603
[13] F. Yang, L. Chen, F. Zhou, Y. Gao, and W. Cao, “Relational state-space model for stochastic multi-object systems,” 2020. [Online]. Available: https://arxiv.org/abs/2001.04050
[14] E. Hajiramezanali, A. Hasanzadeh, N. Duffield, K. R. Narayanan, M. Zhou, and X. Qian, “Variational graph recurrent neural networks,”
[15] [Online]. Available: https://arxiv.org/abs/1908.09710
[16] S. Brody, U. Alon, and E. Yahav, “How attentive are graph attention networks?” 2022. [Online]. Available: https://arxiv.org/abs/2105.14491
[17] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” 2018. [Online]. Available: https://arxiv.org/abs/1710.10903
[18] N. Naderializadeh, M. Eisen, and A. Ribeiro, “Wireless power control via counterfactual optimization of graph neural networks,” 2020. [Online]. Available: https://arxiv.org/abs/2002.07631
[19] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “Energy-efficient communication protocol for wireless microsensor networks,” in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, 2000, pp. 10 pp. vol.2–
[20] C. Lin and M. Gerla, “Adaptive clustering for mobile wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 15, no. 7, pp. 1265–1275, 1997.
[21] M. Chatterjee, S. Das, and D. Turgut, “An on-demand weighted clustering algorithm (WCA) for ad hoc networks,” in Globecom ’00 - IEEE Global Telecommunications Conference. Conference Record (Cat. No.00CH37137), vol. 3, 2000, pp. 1697–1701.
[22] C.-H. Lin and M.-J. Tsai, “A comment on ‘HEED: A hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks’,” IEEE Transactions on Mobile Computing, vol. 5, no. 10, pp. 1471–1472, 2006.
[23] S. Basagni, “Distributed clustering for ad hoc networks,” in Proceedings of the 1999 International Symposium on Parallel Architectures, Algorithms and Networks, ser. ISPAN ’99. USA: IEEE Computer Society, 1999, p. 310.