arxiv: 2604.12920 · v3 · submitted 2026-04-14 · 💻 cs.NI

Graph-based Hierarchical Deep Reinforcement Learning for Deliverable Block Propagation with Optimal Hybrid Cost in Web 3.0

Shi Chen, Jinbo Wen, Jiawen Kang, Tenghui Huang, Maomao Zhang, Tao Zhang, Dong In Kim

Pith reviewed 2026-05-10 13:54 UTC · model grok-4.3

classification 💻 cs.NI

keywords block propagation · consortium blockchain · deep reinforcement learning · graph neural networks · Web 3.0 · hybrid cost optimization · Age of Validated Block

The pith

A graph-based hierarchical reinforcement learning method optimizes block propagation in consortium blockchains to reduce a hybrid cost of timeliness and delivery.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve block delivery in consortium blockchains for Web 3.0 by addressing redundancy and latency in gossip protocols while accounting for nodes that are only intermittently available. It defines a new Age of Validated Block metric that ignores arrivals outside each node's availability window, then combines this with block arrival rate into a single hybrid cost. A Graph-based Hierarchical Deep Reinforcement Learning approach with graph isomorphism and attention networks is trained in two stages to make assignment and propagation decisions that minimize this cost. If the approach works, consortium networks could achieve more reliable synchronization at lower overhead across varying sizes. A sympathetic reader would care because enterprise Web 3.0 applications depend on timely, complete block delivery without wasting bandwidth on unavailable nodes.
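The in-window idea behind AoVB can be illustrated with a minimal sketch. This is not the paper's definition, which the abstract does not give. It assumes each node has a single availability window, that the age of a reception is the receive time minus the generation time, and that the arrival rate is the fraction of receptions landing inside the window. The names `Reception`, `actionable_age`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reception:
    node: str
    generated_at: float            # block generation time (seconds)
    received_at: float             # time the block reaches the node (seconds)
    window: Tuple[float, float]    # assumed availability window [start, end]


def actionable_age(r: Reception) -> Optional[float]:
    """Return the age of an in-window reception, or None if it is out of window."""
    start, end = r.window
    if start <= r.received_at <= end:
        return r.received_at - r.generated_at
    return None  # out-of-window receptions are excluded from the timeliness score


def summarize(receptions: List[Reception]) -> Tuple[Optional[float], float]:
    """Return (mean in-window age, fraction of receptions that were in window)."""
    ages = [a for a in map(actionable_age, receptions) if a is not None]
    arrival_rate = len(ages) / len(receptions) if receptions else 0.0
    mean_age = sum(ages) / len(ages) if ages else None
    return mean_age, arrival_rate
```

Under these assumptions, a block that reaches a node after its window has closed lowers the arrival rate but does not contribute to the mean age, which is the separation the summary describes.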

Core claim

The central claim is that the proposed GHDRL framework, built from a graph isomorphism network assignment module and a graph attention network propagation module trained jointly in two stages, achieves lower hybrid cost than all compared baselines across scales from 50 to 500 peers, by up to 19.2 percent relative to the best neural baseline. It is also claimed to generalize from 100-peer training instances to 500-peer deployments without retraining.

What carries the argument

The Graph-based Hierarchical Deep Reinforcement Learning (GHDRL) method, which uses a graph isomorphism network to assign propagation responsibilities and a graph attention network to select propagation actions, jointly minimizing the hybrid cost of Age of Validated Block (AoVB) timeliness and block arrival rate.
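For readers unfamiliar with graph isomorphism networks, the sketch below shows the standard GIN neighborhood aggregation, (1 + eps) * h_v plus the sum of neighbor features, before the learned MLP is applied. It illustrates the general technique only. How the paper's assignment module uses it is not described in the abstract, and the function name, the dictionary-based graph encoding, and the omission of the MLP are assumptions made for brevity.

```python
from typing import Dict, List, Sequence


def gin_aggregate(
    features: Dict[str, Sequence[float]],
    adjacency: Dict[str, Sequence[str]],
    eps: float = 0.0,
) -> Dict[str, List[float]]:
    """Pre-MLP GIN aggregation: (1 + eps) * h_v + sum of neighbor features."""
    out: Dict[str, List[float]] = {}
    for v, h_v in features.items():
        # Start from the node's own features, scaled by (1 + eps).
        agg = [(1.0 + eps) * x for x in h_v]
        # Add each neighbor's feature vector element-wise (sum aggregation).
        for u in adjacency.get(v, ()):
            agg = [a + b for a, b in zip(agg, features[u])]
        out[v] = agg
    return out
```

In a full GIN layer the aggregated vector would then pass through a learned MLP. Sum aggregation, rather than mean or max, is what lets GIN distinguish neighborhoods of different sizes.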

Load-bearing premise

The simulated heterogeneous availability patterns of consensus nodes and the hybrid cost objective accurately capture the real-world timeliness and delivery requirements of consortium blockchain networks.

What would settle it

Running the GHDRL policy on a live consortium blockchain testbed with recorded real-node availability traces and measuring the resulting hybrid cost against the simulated results would directly test whether the reported gains persist.

Original abstract

Web 3.0 is envisioned as a decentralized paradigm, where blockchain serves as a core technology for transparent and tamper-proof data management. Among various blockchain architectures, consortium blockchains have emerged as the preferred platform for enterprise-grade Web 3.0. For consortium blockchains, newly generated blocks are generally propagated to all consensus nodes for validation through the gossip protocol. However, gossip-based propagation may introduce substantial message redundancy and tail latency. Moreover, the consensus nodes exhibit heterogeneous availability patterns, and existing block propagation schemes often overlook such temporal constraints. Therefore, the joint optimization of propagation timeliness and delivery coverage remains an open problem. In this paper, we propose a deliverable block propagation optimization framework for consortium blockchain-enabled Web 3.0. We first propose a delivery-aware timeliness metric called Age of Validated Block (AoVB), which excludes block receptions occurring outside the availability window of each consensus node, thereby measuring only actionable synchronization latency. This metric is unified with the block arrival rate into a hybrid cost objective that balances timeliness against delivery. To solve this complex optimization problem, we propose a Graph-based Hierarchical Deep Reinforcement Learning (GHDRL) method, which comprises a graph isomorphism network-based assignment module and a graph attention network-based propagation module. The two modules are optimized jointly under a two-stage training strategy. Numerical results show that GHDRL consistently outperforms all compared schemes across network scales from 50 to 500 peers, achieving up to 19.2% lower hybrid cost than the best-performing neural baseline. Moreover, the model generalizes from 100-peer training instances to 500-peer deployments without retraining.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The abstract introduces a new AoVB timeliness metric and a two-module GHDRL architecture for block propagation under heterogeneous node availability, but the 19.2% gain and generalization claims rest on simulation details that are not described.

The paper's core move is to define Age of Validated Block (AoVB) so that only deliveries inside a node's availability window count toward the timeliness score, then fold that together with arrival rate into a single hybrid cost. It pairs this with a hierarchical setup: a graph isomorphism network for assignment decisions and a graph attention network for propagation, trained in two stages. That structure is a reasonable way to handle the joint problem of coverage and latency in a gossip setting with time-varying nodes, and the reported ability to train on 100-peer instances and deploy to 500 without retraining would be practically useful if it holds.
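As background for the propagation module, the following sketch computes standard GAT-style attention weights for one node over its neighbors: a LeakyReLU score followed by a softmax. It shows the generic mechanism, not the paper's module. It assumes the features in `h` are already linearly projected, that the attention vector is split into hypothetical `a_src` and `a_dst` halves, and that node `i` has at least one neighbor.

```python
import math
from typing import Dict, Sequence


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(
    h: Dict[int, Sequence[float]],
    neighbors: Dict[int, Sequence[int]],
    a_src: Sequence[float],
    a_dst: Sequence[float],
    i: int,
) -> Dict[int, float]:
    """Softmax-normalized attention of node i over its (non-empty) neighbor list."""
    src_term = sum(p * q for p, q in zip(a_src, h[i]))
    scores = []
    for j in neighbors[i]:
        dst_term = sum(p * q for p, q in zip(a_dst, h[j]))
        # a^T [h_i || h_j] decomposes into a_src . h_i + a_dst . h_j
        scores.append(leaky_relu(src_term + dst_term))
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors[i], exps)}
```

The weights sum to one over the neighbor list, so a propagation policy built on them can be read as a distribution over candidate next hops. Whether GHDRL uses the weights this way is not stated in the abstract.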

Referee Report

2 major / 1 minor

Summary. The paper proposes a deliverable block propagation optimization framework for consortium blockchain-enabled Web 3.0. It introduces the Age of Validated Block (AoVB) metric, which accounts for heterogeneous consensus node availability windows by excluding out-of-window receptions, and unifies AoVB with block arrival rate into a hybrid cost objective. To solve the resulting joint optimization, the authors present Graph-based Hierarchical Deep Reinforcement Learning (GHDRL), consisting of a graph isomorphism network-based assignment module and a graph attention network-based propagation module trained jointly via a two-stage strategy. Numerical results are claimed to show consistent outperformance over baselines for networks of 50–500 peers, with up to 19.2% lower hybrid cost than the best neural baseline, plus generalization from 100-peer training instances to 500-peer deployments without retraining.

Significance. If the simulation-based claims can be substantiated with full experimental details and realistic node-availability models, the work would offer a concrete advance in latency-aware block dissemination for enterprise blockchains. The AoVB metric provides a useful way to incorporate temporal availability constraints into propagation objectives, while the hierarchical GNN-RL architecture addresses scalability in joint timeliness-coverage optimization. These contributions would be relevant to the blockchain, graph neural networks, and applied reinforcement learning communities.

major comments (2)

Abstract: The central quantitative claims—up to 19.2% lower hybrid cost and generalization from 100- to 500-peer instances—are stated without any description of the experimental setup, generative model for heterogeneous availability patterns, baseline algorithms, number of simulation runs, error bars, or statistical tests. This absence renders the reported performance gains unverifiable and prevents assessment of whether the hybrid cost formulation actually supports the 19.2% figure.
Abstract: The precise mathematical definition of the hybrid cost objective (AoVB combined with arrival rate) is not supplied. Without an explicit formulation or weighting, it is impossible to determine how the objective balances timeliness against delivery coverage or to reproduce the optimization problem solved by GHDRL.

minor comments (1)

Abstract: The description of the two-stage training strategy is too brief to convey how the graph isomorphism and attention modules are jointly optimized or how the assignment and propagation stages interact.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity of our abstract. We address each major comment below and will incorporate revisions to make the key claims more verifiable while respecting abstract length constraints.

Referee: Abstract: The central quantitative claims—up to 19.2% lower hybrid cost and generalization from 100- to 500-peer instances—are stated without any description of the experimental setup, generative model for heterogeneous availability patterns, baseline algorithms, number of simulation runs, error bars, or statistical tests. This absence renders the reported performance gains unverifiable and prevents assessment of whether the hybrid cost formulation actually supports the 19.2% figure.

Authors: We agree that the abstract, being a concise summary, omits these experimental details, which are required for full verification of the claims. The full manuscript describes the generative model for heterogeneous availability patterns, the neural and non-neural baselines, the use of multiple simulation runs with error bars, and statistical tests in the Numerical Results section. To directly address the concern, we will revise the abstract to include a brief summary of the simulation setup (network scales, run count, and statistical analysis). This is a partial revision, as the complete experimental protocol remains in the body. revision: partial

Referee: Abstract: The precise mathematical definition of the hybrid cost objective (AoVB combined with arrival rate) is not supplied. Without an explicit formulation or weighting, it is impossible to determine how the objective balances timeliness against delivery coverage or to reproduce the optimization problem solved by GHDRL.

Authors: We acknowledge that the abstract does not provide the explicit mathematical formulation of the hybrid cost. The objective unifies AoVB with block arrival rate via a weighting parameter to balance timeliness and coverage, as defined in the manuscript methodology. We will revise the abstract to include a concise description of this objective function, allowing readers to understand the optimization problem addressed by GHDRL. revision: yes
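
To make the weighting idea concrete, one form such an objective could take is a convex combination of a timeliness term and a delivery term. This is an illustrative assumption only. The abstract and the response give no formula, and the symbols below (a weight \omega, a normalized mean AoVB \bar{A} in [0, 1], and an arrival rate \rho in [0, 1]) are hypothetical notation, not the paper's.

```latex
% Hypothetical form; \omega, \bar{A}, and \rho are illustrative symbols,
% not notation taken from the paper.
C(\omega) = \omega \, \bar{A} + (1 - \omega)\,(1 - \rho), \qquad \omega \in [0, 1]
```

Under this assumed form, \omega = 1 scores timeliness alone and \omega = 0 scores delivery alone, which is the kind of trade-off the referee asks the authors to state explicitly.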

Circularity Check

0 steps flagged

No circularity detected; claims rest on simulation of newly defined metrics

full rationale

The abstract introduces AoVB as a new delivery-aware timeliness metric and unifies it with block arrival rate into a hybrid cost objective, then applies a proposed GHDRL architecture (graph isomorphism and attention networks) under two-stage training. Numerical outperformance (up to 19.2% lower hybrid cost, generalization from 100- to 500-peer instances) is reported from simulations across network scales. No equations, derivations, or self-citations appear in the provided text that would reduce any prediction or result to a fitted input, self-definition, or prior author work by construction. The chain is self-contained: new metrics are defined, an RL method is applied, and empirical comparisons are performed against baselines.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

Based solely on abstract; full details unavailable. The claim rests on the new metric and RL method working in simulations.

free parameters (1)

RL hyperparameters and network sizes
Typical DRL parameters such as learning rates, layer dimensions, and training schedules are chosen or fitted during the two-stage process, though unspecified in abstract.

axioms (2)

domain assumption · Consensus nodes exhibit heterogeneous availability patterns that can be modeled to define actionable latency windows.
Directly used to construct the AoVB metric that excludes out-of-window receptions.
ad hoc to paper · The complex joint optimization of timeliness and delivery coverage is solvable by jointly training graph isomorphism and attention modules under a two-stage RL strategy.
Core premise for proposing GHDRL as the solution method.

invented entities (2)

Age of Validated Block (AoVB) · no independent evidence
purpose: Delivery-aware timeliness metric that measures only actionable synchronization latency by ignoring receptions outside node availability windows.
Newly defined to unify with arrival rate into the hybrid cost.
Graph-based Hierarchical Deep Reinforcement Learning (GHDRL) · no independent evidence
purpose: Optimization solver consisting of graph isomorphism network assignment module and graph attention network propagation module trained jointly.
Newly proposed framework to address the open propagation problem.

pith-pipeline@v0.9.0 · 5590 in / 1611 out tokens · 77241 ms · 2026-05-10T13:54:40.184570+00:00 · methodology
