pith. machine review for the scientific record.

arxiv: 2604.19404 · v1 · submitted 2026-04-21 · 💻 cs.RO · cs.AI

Recognition: unknown

M²GRPO: Mamba-based Multi-Agent Group Relative Policy Optimization for Biomimetic Underwater Robots Pursuit

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 02:16 UTC · model grok-4.3

classification 💻 cs.RO cs.AI
keywords multi-agent reinforcement learning · Mamba state space model · group relative policy optimization · cooperative pursuit · underwater robots · biomimetic systems · CTDE · partial observability

The pith

A Mamba-based multi-agent policy optimization method raises pursuit success and efficiency for biomimetic underwater robot teams.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops M²GRPO to handle long-horizon decisions, partial information, and robot coordination in underwater pursuit tasks. It pairs a selective state-space Mamba policy that processes observation history and relational features with group-relative advantages obtained by normalizing rewards across agents in each episode. This combination runs under centralized training with decentralized execution and aims to deliver stable updates while cutting training demands. If the approach holds, multi-robot systems could coordinate more reliably in real underwater settings without the heavy compute costs of prior methods.
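
As a reading aid, here is a minimal discretized selective state-space step in the spirit of Mamba [23]. Everything here is illustrative — the abstract does not specify the paper's layer widths, projections, or discretization — so treat it as a sketch of the mechanism, not the authors' architecture.

    import numpy as np

    def selective_ssm_step(h, x, W_delta, W_B, W_C, A):
        """One simplified Mamba-style update (hypothetical shapes).

        h : (d, n) hidden state, x : (d,) input features at this step.
        A : (d, n) fixed state-decay matrix (negative entries).
        The step size and the B/C projections are computed FROM the
        input, which is what makes the state space 'selective'.
        """
        delta = np.log1p(np.exp(x @ W_delta))   # softplus step size, shape (d,)
        B = x @ W_B                             # input projection, shape (n,)
        C = x @ W_C                             # output projection, shape (n,)
        A_bar = np.exp(delta[:, None] * A)      # zero-order-hold discretization
        h = A_bar * h + (delta[:, None] * B[None, :]) * x[:, None]
        y = h @ C                               # per-channel output, shape (d,)
        return h, y

Rolling this step over the observation history gives the long-horizon summary the policy conditions on; per the abstract, this feeds, together with attention-based relational features, into the action head.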

Core claim

The central claim is that integrating a Mamba policy—which uses observation history to capture temporal dependencies and attention-based features to encode inter-agent interactions—with group-relative policy optimization under the CTDE paradigm produces higher pursuit success rates and capture efficiency than MAPPO or recurrent baselines, as shown by extensive simulations and real-world pool experiments across team sizes and evader strategies.

What carries the argument

The selective state-space Mamba policy with attention-based relational encoding, paired with group-relative advantage normalization that computes each agent's advantage by normalizing its episode reward against the group's within that episode.
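
Concretely, the group-relative step can be as small as this sketch (names are illustrative; the abstract does not pin down whether a standard-deviation term or per-timestep shaping is used):

    import numpy as np

    def group_relative_advantages(episode_returns, eps=1e-8):
        """Normalize per-agent returns within one episode's group.

        episode_returns : (num_agents,) total rewards from one episode.
        The group mean serves as the baseline -- no learned critic --
        and the group std keeps the advantage scale comparable across
        episodes: the GRPO recipe [27] lifted to a multi-agent group.
        """
        r = np.asarray(episode_returns, dtype=float)
        return (r - r.mean()) / (r.std() + eps)

    # Three pursuers in one hypothetical episode:
    # group_relative_advantages([12.0, 9.5, 4.0]) ≈ [ 1.05, 0.30, -1.35]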

If this is right

  • Enables stable policy updates with lower training resource needs for multi-agent coordination.
  • Produces bounded continuous actions through normalized Gaussian sampling suitable for robot actuators (one possible realization is sketched after this list).
  • Maintains performance gains across varying team scales and different evader behaviors in both simulation and physical tests.
  • Supports decentralized execution after centralized training for practical deployment on individual robots.
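
On the second bullet: "normalized Gaussian sampling" admits several realizations. A common one — assumed here, not confirmed by the abstract — is a tanh-squashed Gaussian rescaled to actuator limits:

    import numpy as np

    def sample_bounded_action(mu, log_std, low, high, rng=np.random):
        """Sample a continuous action and squash it into [low, high].

        mu, log_std : per-dimension policy-head outputs.  The tanh
        bounds the raw Gaussian sample to (-1, 1); the affine map then
        rescales to the actuator range.  One standard construction,
        not necessarily the paper's exact scheme.
        """
        u = mu + np.exp(log_std) * rng.standard_normal(mu.shape)
        return low + 0.5 * (np.tanh(u) + 1.0) * (high - low)

    # Hypothetical tail-fin command limited to ±30 degrees:
    # sample_bounded_action(np.zeros(1), np.log(0.3) * np.ones(1), -30.0, 30.0)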

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The combination of Mamba state-space models and group normalization may apply to other partially observable multi-robot tasks such as formation control or search missions.
  • Reward normalization across agents could serve as a general technique to stabilize credit assignment in long-horizon group reinforcement learning problems.
  • Substituting recurrent networks with Mamba policies might reduce inference latency for real-time decisions on resource-limited underwater platforms.

Load-bearing premise

Normalizing rewards across agents within each episode yields stable credit assignment and scalable policy updates without introducing bias that harms long-horizon coordination under partial observability.

What would settle it

Experiments showing that M²GRPO's success rate and capture efficiency are no better than MAPPO's as team size grows or as evaders adopt more complex paths would falsify the claim of consistent outperformance.

Figures

Figures reproduced from arXiv: 2604.19404 by Junwen Gu, Junzhi Yu, Yukai Feng, Zhengxing Wu, Zhiheng Wu.

Figure 1. Overall framework of the proposed M²GRPO algorithm, which consists of three components: (a) CTDE paradigm: centralized training with decentralized execution, where agents share environment information and update in parallel during training, but rely solely on local observations and history for independent decision-making at the execution stage; (b) Mamba policy: a selective state-space architecture that mo…

Figure 3. Illustration of the pursuit–evasion task with two pursuers Pi, Pj and one evader E. Each pursuer is assigned a perception range Rc. The distance between pursuer i and the evader is denoted di,e.

Figure 4. Capture success rate of pursuers under different evader strategies: (i) evader with a learned policy; (ii) evader with a random policy.

Figure 5. Average steps to successful capture under different evader strategies: (i) evader with a learned policy; (ii) evader with a random policy.

Figure 6. The success rate of the pursuit for different numbers of pursuers.

Figure 7. Snapshots of the cooperative pursuit experiment for bionic underwater robots; panels (a) and (b) trace the X and Y positions (m) of the evader and two pursuers over episode time (s).

Figure 8. Positional relationship between the evader and the pursuers.

Figure 9. Distance variation between the evader and the pursuers.
read the original abstract

Traditional policy learning methods in cooperative pursuit face fundamental challenges in biomimetic underwater robots, where long-horizon decision making, partial observability, and inter-robot coordination require both expressiveness and stability. To address these issues, a novel framework called Mamba-based multi-agent group relative policy optimization (M$^{2}$GRPO) is proposed, which integrates a selective state-space Mamba policy with group-relative policy optimization under the centralized-training and decentralized-execution (CTDE) paradigm. Specifically, the Mamba-based policy leverages observation history to capture long-horizon temporal dependencies and exploits attention-based relational features to encode inter-agent interactions, producing bounded continuous actions through normalized Gaussian sampling. To further improve credit assignment without sacrificing stability, the group-relative advantages are obtained by normalizing rewards across agents within each episode and optimized through a multi-agent extension of GRPO, significantly reducing the demand for training resources while enabling stable and scalable policy updates. Extensive simulations and real-world pool experiments across team scales and evader strategies demonstrate that M$^{2}$GRPO consistently outperforms MAPPO and recurrent baselines in both pursuit success rate and capture efficiency. Overall, the proposed framework provides a practical and scalable solution for cooperative underwater pursuit with biomimetic robot systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes M²GRPO, a Mamba-based multi-agent group relative policy optimization framework for cooperative pursuit tasks with biomimetic underwater robots. It integrates selective state-space Mamba policies with attention-based relational features under the CTDE paradigm, computes group-relative advantages by normalizing rewards across agents within each episode, and claims superior pursuit success rates and capture efficiency over MAPPO and recurrent baselines in simulations and real-world pool experiments across varying team scales and evader strategies.

Significance. If the empirical claims hold with proper validation, the integration of Mamba for long-horizon temporal modeling and group-relative normalization for credit assignment could offer a resource-efficient approach to stable multi-agent RL in partially observable robotic settings, addressing coordination challenges in underwater pursuit without excessive training demands.

major comments (2)
  1. [Abstract] The central empirical claim of consistent outperformance in pursuit success rate and capture efficiency is asserted without quantitative metrics, ablation studies, or statistical tests, preventing evaluation of whether the reported gains are load-bearing or attributable to the proposed components.
  2. [Methods (group-relative advantages)] Normalizing rewards across agents within each episode to obtain advantages implicitly assumes comparable per-agent reward distributions despite partial observability, heterogeneous contributions, and Mamba-handled long-horizon dependencies; this risks systematic bias in credit assignment for sparse-reward pursuit and requires explicit comparison to per-agent normalization or variance-preserving alternatives to substantiate the stability and scalability claims.
minor comments (2)
  1. [Abstract] The abstract would benefit from inclusion of specific performance deltas (e.g., success rate improvements) and details on the number of trials or statistical significance to ground the outperformance statements.
  2. [Methods] Notation for the normalized Gaussian sampling of continuous actions and the multi-agent GRPO update rule should be defined with explicit equations for reproducibility.
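
For minor comment 2, one standard way to write the requested pieces, following the GRPO formulation of [27] — the paper's own notation may differ:

    % Group-relative advantage for agent i, from per-agent episode returns R_1, ..., R_N:
    A_i = \frac{R_i - \operatorname{mean}(\{R_j\}_{j=1}^{N})}{\operatorname{std}(\{R_j\}_{j=1}^{N}) + \epsilon}

    % Clipped surrogate objective with importance ratio on local histories o_{i,\le t}:
    \rho_{i,t} = \frac{\pi_\theta(a_{i,t} \mid o_{i,\le t})}{\pi_{\theta_{\text{old}}}(a_{i,t} \mid o_{i,\le t})}, \qquad
    J(\theta) = \mathbb{E}\!\left[ \min\!\left( \rho_{i,t} A_i,\ \operatorname{clip}(\rho_{i,t},\, 1-\varepsilon,\, 1+\varepsilon)\, A_i \right) \right]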

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our empirical claims and methodological choices. We address each major comment below with clarifications from the manuscript and commit to targeted revisions.

read point-by-point responses
  1. Referee: [Abstract] The central empirical claim of consistent outperformance in pursuit success rate and capture efficiency is asserted without quantitative metrics, ablation studies, or statistical tests, preventing evaluation of whether the reported gains are load-bearing or attributable to the proposed components.

    Authors: We acknowledge that the abstract states the performance improvements qualitatively. The full manuscript reports quantitative results, including specific success rates, capture efficiencies, ablation studies on the Mamba policy and group-relative components, and statistical significance across simulation and pool experiments in Sections 4 and 5. In the revision we will incorporate key quantitative metrics and references to the ablations and tests into the abstract to make the central claims immediately evaluable. revision: yes

  2. Referee: [Methods (group-relative advantages)] Normalizing rewards across agents within each episode to obtain advantages implicitly assumes comparable per-agent reward distributions despite partial observability, heterogeneous contributions, and Mamba-handled long-horizon dependencies; this risks systematic bias in credit assignment for sparse-reward pursuit and requires explicit comparison to per-agent normalization or variance-preserving alternatives to substantiate the stability and scalability claims.

    Authors: The group-relative normalization is motivated by the cooperative nature of the pursuit task, where agents share a joint objective; it is applied within each episode to reduce advantage variance while preserving relative contributions under CTDE. We recognize that partial observability and role heterogeneity could introduce bias and that direct comparisons would strengthen the stability claims. We will add an ablation study in the revised manuscript comparing group-relative normalization to per-agent normalization and variance-preserving alternatives, reporting the resulting effects on training stability and scalability. revision: yes
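
To make the promised ablation concrete, a sketch of the two schemes at issue (illustrative names; the authors' actual comparison may differ). Group-relative normalization shares one per-episode baseline across the team; the per-agent alternative standardizes each agent against its own history:

    import numpy as np

    def group_norm(returns, eps=1e-8):
        """One baseline for the whole team per episode (the paper's scheme).
        returns : (num_agents,) episode returns."""
        r = np.asarray(returns, dtype=float)
        return (r - r.mean()) / (r.std() + eps)

    def per_agent_norm(return_history, eps=1e-8):
        """Each agent standardized against its own past episodes
        (the alternative the referee asks about).
        return_history : (num_episodes, num_agents)."""
        h = np.asarray(return_history, dtype=float)
        return (h[-1] - h.mean(axis=0)) / (h.std(axis=0) + eps)

Under heterogeneous roles, group_norm pushes below-group-mean agents negative even when they improved on their own history; per_agent_norm avoids that at the cost of a shared baseline — exactly the bias-versus-stability trade the referee flags.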

Circularity Check

0 steps flagged

No significant circularity in M²GRPO framework

full rationale

The paper proposes an algorithmic framework (Mamba policy + group-relative advantages via per-episode reward normalization under CTDE) and validates it empirically via simulations and pool experiments. No first-principles derivation, uniqueness theorem, or fitted quantity is presented as a 'prediction' that reduces by construction to its own inputs. The normalization step is an explicit design choice for credit assignment, not a self-referential result. No self-citations or ansatz smuggling appear in the abstract or described chain. The method is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be extracted beyond standard RL assumptions such as Markov decision processes.

pith-pipeline@v0.9.0 · 5535 in / 1111 out tokens · 29280 ms · 2026-05-10T02:16:46.239876+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 6 canonical work pages · 4 internal anchors

  1. [1]

    Review of research and control technology of underwater bionic robots,

    Z. Cui, L. Li, Y. Wang, Z. Zhong, and J. Li, “Review of research and control technology of underwater bionic robots,” Intell. Mar. Technol. Syst., vol. 1, no. 1, p. 7, 2023

  2. [2]

    A versatile jellyfish-like robotic platform for effective underwater propulsion and manipulation,

    T. Wang, H.-J. Joo, S. Song, W. Hu, C. Keplinger, and M. Sitti, “A versatile jellyfish-like robotic platform for effective underwater propulsion and manipulation,” Sci. Adv., vol. 9, no. 15, p. eadg0292, 2023

  3. [3]

    Bioinspired soft robots for deep-sea exploration,

    G. Li, T.-W. Wong, B. Shih, C. Guo, L. Wang, J. Liu, T. Wang, X. Liu, J. Yan, B. Wu, et al., “Bioinspired soft robots for deep-sea exploration,” Nat. Commun., vol. 14, no. 1, p. 7097, 2023

  4. [4]

    Study on the hydrodynamic performance of a self-propelled robot fish swimming in pipelines environment,

    O. Xie, C. Zhang, C. Shen, Y. Li, and D. Zhou, “Study on the hydrodynamic performance of a self-propelled robot fish swimming in pipelines environment,” Ocean Eng., vol. 309, p. 118356, 2024

  5. [5]

    Agile robotic fish based on direct drive of continuum body,

    K. Iguchi, T. Shimooka, S. Uchikai, Y. Konno, H. Tanaka, Y. Ikemoto, and J. Shintake, “Agile robotic fish based on direct drive of continuum body,” npj Robot., vol. 2, no. 1, p. 7, 2024

  6. [6]

    Implicit coordination for 3D underwater collective behaviors in a fish-inspired robot swarm,

    F. Berlinger, M. Gauci, and R. Nagpal, “Implicit coordination for 3D underwater collective behaviors in a fish-inspired robot swarm,” Sci. Robot., vol. 6, no. 50, p. eabd8668, 2021

  7. [7]

    A survey of autonomous underwater vehicle formation: Performance, formation control, and communication capability,

    Y. Yang, Y. Xiao, and T. Li, “A survey of autonomous underwater vehicle formation: Performance, formation control, and communication capability,” IEEE Commun. Surv. Tutor., vol. 23, no. 2, pp. 815–841, 2021

  8. [8]

    Cooperative artificial intelligence for underwater robotic swarm,

    W. Cai, Z. Liu, M. Zhang, and C. Wang, “Cooperative artificial intelligence for underwater robotic swarm,” Robot. Auton. Syst., vol. 164, p. 104410, 2023

  9. [9]

    Approximate methods for visibility-based pursuit-evasion,

    E. Antonio, I. Becerra, and R. Murrieta-Cid, “Approximate methods for visibility-based pursuit-evasion,” IEEE Trans. Robot., early access, 2024

  10. [10]

    Zero-sum differential game guidance law for missile interception engagement via neuro-dynamic programming,

    A. Xi, Y. Cai, Y. Deng, and H. Jiang, “Zero-sum differential game guidance law for missile interception engagement via neuro-dynamic programming,” Proc. Inst. Mech. Eng., Part G: J. Aerosp. Eng., vol. 237, no. 14, pp. 3352–3366, 2023

  11. [11]

    A novel graph-based motion planner of multi-mobile robot systems with formation and obstacle constraints,

    W. Liu, J. Hu, H. Zhang, M. Y. Wang, and Z. Xiong, “A novel graph-based motion planner of multi-mobile robot systems with formation and obstacle constraints,” IEEE Trans. Robot., vol. 40, pp. 714–728, 2023

  12. [12]

    A visibility-based pursuit-evasion game between two nonholonomic robots in environments with obstacles,

    E. Lozano, I. Becerra, U. Ruiz, L. Bravo, and R. Murrieta-Cid, “A visibility-based pursuit-evasion game between two nonholonomic robots in environments with obstacles,” Auton. Robots, vol. 46, no. 2, pp. 349–371, 2022

  13. [13]

    Multiplayer pursuit-evasion differential games with malicious pursuers,

    Y. Xu, H. Yang, B. Jiang, and M. M. Polycarpou, “Multiplayer pursuit-evasion differential games with malicious pursuers,” IEEE Trans. Autom. Control, vol. 67, no. 9, pp. 4939–4946, 2022

  14. [14]

    Particle swarm optimization algorithm for the optimization of rescue task allocation with uncertain time constraints,

    N. Geng, Z. Chen, Q. A. Nguyen, and D. Gong, “Particle swarm optimization algorithm for the optimization of rescue task allocation with uncertain time constraints,” Complex Intell. Syst., vol. 7, no. 2, pp. 873–890, 2021

  15. [15]

    Comparison of two optimal guidance methods for the long-distance orbital pursuit-evasion game,

    X. Zeng, L. Yang, Y. Zhu, and F. Yang, “Comparison of two optimal guidance methods for the long-distance orbital pursuit-evasion game,” IEEE Trans. Aerosp. Electron. Syst., vol. 57, no. 1, pp. 521–539, 2020

  16. [16]

    Multi-robot cooperative pursuit via potential field-enhanced reinforcement learning,

    Z. Zhang, X. Wang, Q. Zhang, and T. Hu, “Multi-robot cooperative pursuit via potential field-enhanced reinforcement learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2022, pp. 8808–8814

  17. [17]

    Multi-target pursuit by a decentralized heterogeneous UAV swarm using deep multi-agent reinforcement learning,

    M. Kouzeghar, Y. Song, M. Meghjani, and R. Bouffanais, “Multi-target pursuit by a decentralized heterogeneous UAV swarm using deep multi-agent reinforcement learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2023, pp. 3289–3295

  18. [18]

    Decentralized multi-agent pursuit using deep reinforcement learning,

    C. De Souza, R. Newbury, A. Cosgun, P. Castillo, B. Vidolov, and D. Kulić, “Decentralized multi-agent pursuit using deep reinforcement learning,” IEEE Robot. Autom. Lett., vol. 6, no. 3, pp. 4552–4559, 2021

  19. [19]

    An improved approach towards multi-agent pursuit–evasion game decision-making using deep reinforcement learning,

    K. Wan, D. Wu, Y. Zhai, B. Li, X. Gao, and Z. Hu, “An improved approach towards multi-agent pursuit–evasion game decision-making using deep reinforcement learning,” Entropy, vol. 23, no. 11, p. 1433, 2021

  20. [20]

    Large scale pursuit-evasion under collision avoidance using deep reinforcement learning,

    H. Yang, P. Ge, J. Cao, Y. Yang, and Y. Liu, “Large scale pursuit-evasion under collision avoidance using deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2023, pp. 2232–2239

  21. [21]

    Distributed pursuit–evasion game decision-making based on multi-agent deep reinforcement learning,

    Y. Lin, H. Gao, and Y. Xia, “Distributed pursuit–evasion game decision-making based on multi-agent deep reinforcement learning,” Electronics, vol. 14, no. 11, p. 2141, 2025

  22. [22]

    Recurrent prediction model for partially observable MDPs,

    S. Xie, Z. Zhang, H. Yu, and X. Luo, “Recurrent prediction model for partially observable MDPs,” Inf. Sci., vol. 620, pp. 125–141, 2023

  23. [23]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023

  24. [24]

    MARL-MambaContour: Unleashing multi-agent deep reinforcement learning for active contour optimization in medical image segmentation,

    R. Zhang, Y. Sun, Z. Zhang, J. Li, X. Liu, A. H. Fan, H. Guo, and P. Yan, “MARL-MambaContour: Unleashing multi-agent deep reinforcement learning for active contour optimization in medical image segmentation,” arXiv preprint arXiv:2506.18679, 2025

  25. [25]

    Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

    T. Dao and A. Gu, “Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality,” arXiv preprint arXiv:2405.21060, 2024

  26. [26]

    Chemical language modeling with structured state space sequence models,

    R. Özçelik, S. de Ruiter, E. Criscuolo, and F. Grisoni, “Chemical language modeling with structured state space sequence models,” Nat. Commun., vol. 15, no. 1, p. 6176, 2024

  27. [27]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, et al., “DeepSeekMath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv:2402.03300, 2024

  28. [28]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi, et al., “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning,” arXiv preprint arXiv:2501.12948, 2025

  29. [29]

    Reinforcement learning with verifiable rewards: GRPO’s effective loss, dynamics, and success amplification,

    Y. Mroueh, “Reinforcement learning with verifiable rewards: GRPO’s effective loss, dynamics, and success amplification,” arXiv preprint arXiv:2503.06639, 2025

  30. [30]

    Real-world learning control for autonomous exploration of a biomimetic robotic shark,

    S. Yan, Z. Wu, J. Wang, Y. Huang, M. Tan, and J. Yu, “Real-world learning control for autonomous exploration of a biomimetic robotic shark,” IEEE Trans. Ind. Electron., vol. 70, no. 4, pp. 3966–3974, 2022

  31. [31]

    Decentralized multirobotic fish pursuit control with attraction-enhanced reinforcement learning,

    Y. Feng, Z. Wu, J. Wang, J. Gu, F. Yu, J. Yu, and M. Tan, “Decentralized multirobotic fish pursuit control with attraction-enhanced reinforcement learning,” IEEE Trans. Ind. Electron., vol. 72, no. 8, pp. 8290–8300, 2025

  32. [32]

    Cooperative pursuit policy for bionic underwater robot based on MARL-MHSA architecture: Data-driven modeling and distributed strategy optimization,

    Y.-K. Feng, Z.-X. Wu, and M. Tan, “Cooperative pursuit policy for bionic underwater robot based on MARL-MHSA architecture: Data-driven modeling and distributed strategy optimization,” Acta Autom. Sin., vol. 51, no. 9, pp. 1001–1014, 2025