Convergence Detection of Asynchronous Iterations based on Modified Recursive Doubling

Frederic Magoules; Qinmeng Zou

arxiv: 1907.01201 · v1 · pith:NTR57WZNnew · submitted 2019-07-02 · 💻 cs.DC

Convergence Detection of Asynchronous Iterations based on Modified Recursive Doubling

Qinmeng Zou , Frederic Magoules This is my paper

Pith reviewed 2026-05-25 10:57 UTC · model grok-4.3

classification 💻 cs.DC

keywords asynchronous iterationsconvergence detectionrecursive doublingdistributed computingreduction operationnon-power-of-two

0 comments

The pith

A modified recursive doubling algorithm allows convergence detection in asynchronous iterations even when the number of processes is not a power of two.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates how to detect when asynchronous iterations have converged in a distributed setting. It modifies the recursive doubling method to handle cases where the processor count is not a power of two. This modification enables the use of reduction operations to check for global convergence. A reader would care because many practical distributed systems do not have power-of-two numbers of nodes, making standard methods inapplicable without extra overhead.

Core claim

The central claim is that a modified recursive doubling algorithm adapts to the non-power-of-two case for distributed convergence detection in asynchronous iterations, with algorithms illustrated based on the reduction operation. A concluding discussion covers implementation and applicability.

What carries the argument

Modified recursive doubling algorithm based on the reduction operation for handling arbitrary process counts in convergence detection.

Load-bearing premise

Asynchronous iterations allow direct use of reduction operations in the modified doubling without extra synchronization that would break the detection logic.

What would settle it

A simulation or run with three processes where the algorithm either misses actual convergence or falsely detects it when iterations are still ongoing.

Figures

Figures reproduced from arXiv: 1907.01201 by Frederic Magoules, Qinmeng Zou.

**Figure 2.** Figure 2: Recursive doubling P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Forward shift function, and then design its behavior by some external interface functions; we could also create a state-based interface that should be invoked repeatedly in user applications, in which some lightweight functions act as different states in the life cycle of a collective operation. Here, we adopt the latter and give an example of state diagram depicted in [PITH_FULL_IMAGE:figures/full_fig_p0… view at source ↗

**Figure 4.** Figure 4: Statechart of a non-blocking Allreduce function possible, which favors the point-to-point operations like Send and Recv functions. On the other hand, the residual collection requires intrinsically an Allreduce operation. Therefore, we address the convergence detection problem in terms of the non-blocking reduction. We consider first an inexact residual collection strategy that involves only the Allreduce f… view at source ↗

**Figure 5.** Figure 5: Asynchronous iterations in a concentrated environment [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

This paper addresses the distributed convergence detection problem in asynchronous iterations. A modified recursive doubling algorithm is investigated in order to adapt to the non-power-of-two case. Some convergence detection algorithms are illustrated based on the reduction operation. Finally, a concluding discussion about the implementation and the applicability is presented.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript addresses the distributed convergence detection problem in asynchronous iterations. It investigates a modified recursive doubling algorithm adapted to the non-power-of-two case and illustrates some convergence detection algorithms based on the reduction operation, followed by a discussion of implementation and applicability.

Significance. If the illustrated modification correctly extends recursive doubling to arbitrary process counts while preserving the detection logic, the work provides a practical adaptation for distributed systems that leverages existing reduction primitives rather than introducing new synchronization mechanisms. This is a modest but useful contribution at the level of algorithmic illustration, consistent with the absence of formal theorems or quantitative claims. The stress-test concern regarding synchronization overheads does not appear load-bearing given the paper's focus on standard reduction operations.

minor comments (2)

The abstract and manuscript would benefit from explicit pseudocode or a diagram for the modified recursive doubling scheme to make the adaptation to non-power-of-two cases concrete.
The concluding discussion on implementation and applicability could reference specific distributed computing frameworks (e.g., MPI collectives) to ground the claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive evaluation of the manuscript. We appreciate the recommendation for minor revision and the recognition that the work offers a practical adaptation of recursive doubling for convergence detection in asynchronous iterations using existing reduction primitives.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper proposes and illustrates a modified recursive doubling scheme for distributed convergence detection in asynchronous iterations, specifically adapting the method to the non-power-of-two case via standard reduction operations. No equations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the abstract or described contribution. The work remains at the level of algorithmic presentation and discussion, which is self-contained against external benchmarks and does not reduce any claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the work relies on standard concepts of asynchronous iterations and reduction operations from the distributed computing literature.

pith-pipeline@v0.9.0 · 5557 in / 934 out tokens · 33557 ms · 2026-05-25T10:57:40.302582+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

[1]

Asynchronousiterativemethodsformultiprocessors

G.M.Baudet. Asynchronousiterativemethodsformultiprocessors. J. ACM,25(2):226– 244, 1978

work page 1978
[2]

D. P. Bertsekas and J. N. Tsitsiklis.Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1989

work page 1989
[3]

K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems , 3(1):63–75, 1985

work page 1985
[4]

Chazan and W

D. Chazan and W. Miranker. Chaotic relaxation.Linear Algebra and its Applications , 2(2):199–222, 1969

work page 1969
[5]

M. N. El Tarazi. Some convergence results for asynchronous algorithms.Numerische Mathematik, 39(3):325–340, 1982

work page 1982
[6]

Magoulès and A.-K

F. Magoulès and A.-K. Cheik Ahamed. Alinea: An advanced linear algebra library for massively parallel computations on graphics processing units. The International Journal of High Performance Computing Applications , 29(3):284–310, 2015

work page 2015
[7]

Magoulès, A.-K

F. Magoulès, A.-K. Cheik Ahamed, and R. Putanowicz. Auto-tuned Krylov methods on cluster of graphics processing unit.International Journal of Computer Mathematics , 92(6):1222–1250, 2015

work page 2015
[8]

Magoulès, A.-K

F. Magoulès, A.-K. Cheik Ahamed, and R. Putanowicz. Optimized Schwarz method withoutoverlapforthegravitational potentialequationonclusterofgraphicsprocessing unit. International Journal of Computer Mathematics , 93(6):955–980, 2015

work page 2015
[9]

Magoulès, A.-K

F. Magoulès, A.-K. Cheik Ahamed, and A. Suzuki. Green computing on graphics pro- cessing units. Concurrency and Computation: Practice and Experience , 28(16):4305– 4325, 2016

work page 2016
[10]

Magoulès and G

F. Magoulès and G. Gbikpi-Benissan. JACK: an asynchronous communication kernel library for iterative algorithms.The Journal of Supercomputing, 73(8):3468–3487, 2017

work page 2017
[11]

Distributedconvergencedetectionbasedonglobal residual error under asynchronous iterations.IEEE Transactions on Parallel and Dis- tributed Systems, 29(4):819–829, 2018

F.MagoulèsandG.Gbikpi-Benissan. Distributedconvergencedetectionbasedonglobal residual error under asynchronous iterations.IEEE Transactions on Parallel and Dis- tributed Systems, 29(4):819–829, 2018

work page 2018
[12]

Magoulès and G

F. Magoulès and G. Gbikpi-Benissan. JACK2: An MPI-based communication library withnon-blockingsynchronizationforasynchronousiterations. Advances in Engineering Software, 119:116–133, 2018

work page 2018
[13]

Magoulès, G

F. Magoulès, G. Gbikpi-Benissan, and Q. Zou. Asynchronous iterations of Parareal algorithm for option pricing models.Mathematics, 6(4):1–18, 2018

work page 2018
[14]

Magoulès, D

F. Magoulès, D. B. Szyld, and C. Venet. Asynchronous optimized Schwarz methods with and without overlap.Numerische Mathematik, 137(1):199–227, 2017

work page 2017
[15]

Magoulès and C

F. Magoulès and C. Venet. Asynchronous iterative sub-structuring methods.Mathe- matics and Computers in Simulation , 145:34–49, 2018. 7

work page 2018
[16]

MPI: A message passing interface standard.Inter- national Journal of Supercomputer Applications , 8(3/4):159–416, 1994

Message Passing Interface Forum. MPI: A message passing interface standard.Inter- national Journal of Supercomputer Applications , 8(3/4):159–416, 1994

work page 1994
[17]

J.-C. Miellou. Algorithmes de relaxation chaotique à retards.ESAIM: Mathematical Modelling and Numerical Analysis , 9(R1):55–82, 1975

work page 1975
[18]

Miellou, P

J.-C. Miellou, P. Spiteri, and D. El Baz. A new stopping criterion for linear per- turbed asynchronous iterations. Journal of Computational and Applied Mathematics , 219(2):471–483, 2008

work page 2008
[19]

S. A. Savari and D. P. Bertsekas. Finite termination of asynchronous iterative algo- rithms. Parallel Computing, 22(1):39–56, 1996

work page 1996
[20]

Thakur, R

R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of collective communication operations in MPICH. The International Journal of High Performance Computing Applications, 19(1):49–66, 2005. 8

work page 2005

[1] [1]

Asynchronousiterativemethodsformultiprocessors

G.M.Baudet. Asynchronousiterativemethodsformultiprocessors. J. ACM,25(2):226– 244, 1978

work page 1978

[2] [2]

D. P. Bertsekas and J. N. Tsitsiklis.Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1989

work page 1989

[3] [3]

K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems , 3(1):63–75, 1985

work page 1985

[4] [4]

Chazan and W

D. Chazan and W. Miranker. Chaotic relaxation.Linear Algebra and its Applications , 2(2):199–222, 1969

work page 1969

[5] [5]

M. N. El Tarazi. Some convergence results for asynchronous algorithms.Numerische Mathematik, 39(3):325–340, 1982

work page 1982

[6] [6]

Magoulès and A.-K

F. Magoulès and A.-K. Cheik Ahamed. Alinea: An advanced linear algebra library for massively parallel computations on graphics processing units. The International Journal of High Performance Computing Applications , 29(3):284–310, 2015

work page 2015

[7] [7]

Magoulès, A.-K

F. Magoulès, A.-K. Cheik Ahamed, and R. Putanowicz. Auto-tuned Krylov methods on cluster of graphics processing unit.International Journal of Computer Mathematics , 92(6):1222–1250, 2015

work page 2015

[8] [8]

Magoulès, A.-K

F. Magoulès, A.-K. Cheik Ahamed, and R. Putanowicz. Optimized Schwarz method withoutoverlapforthegravitational potentialequationonclusterofgraphicsprocessing unit. International Journal of Computer Mathematics , 93(6):955–980, 2015

work page 2015

[9] [9]

Magoulès, A.-K

F. Magoulès, A.-K. Cheik Ahamed, and A. Suzuki. Green computing on graphics pro- cessing units. Concurrency and Computation: Practice and Experience , 28(16):4305– 4325, 2016

work page 2016

[10] [10]

Magoulès and G

F. Magoulès and G. Gbikpi-Benissan. JACK: an asynchronous communication kernel library for iterative algorithms.The Journal of Supercomputing, 73(8):3468–3487, 2017

work page 2017

[11] [11]

Distributedconvergencedetectionbasedonglobal residual error under asynchronous iterations.IEEE Transactions on Parallel and Dis- tributed Systems, 29(4):819–829, 2018

F.MagoulèsandG.Gbikpi-Benissan. Distributedconvergencedetectionbasedonglobal residual error under asynchronous iterations.IEEE Transactions on Parallel and Dis- tributed Systems, 29(4):819–829, 2018

work page 2018

[12] [12]

Magoulès and G

F. Magoulès and G. Gbikpi-Benissan. JACK2: An MPI-based communication library withnon-blockingsynchronizationforasynchronousiterations. Advances in Engineering Software, 119:116–133, 2018

work page 2018

[13] [13]

Magoulès, G

F. Magoulès, G. Gbikpi-Benissan, and Q. Zou. Asynchronous iterations of Parareal algorithm for option pricing models.Mathematics, 6(4):1–18, 2018

work page 2018

[14] [14]

Magoulès, D

F. Magoulès, D. B. Szyld, and C. Venet. Asynchronous optimized Schwarz methods with and without overlap.Numerische Mathematik, 137(1):199–227, 2017

work page 2017

[15] [15]

Magoulès and C

F. Magoulès and C. Venet. Asynchronous iterative sub-structuring methods.Mathe- matics and Computers in Simulation , 145:34–49, 2018. 7

work page 2018

[16] [16]

MPI: A message passing interface standard.Inter- national Journal of Supercomputer Applications , 8(3/4):159–416, 1994

Message Passing Interface Forum. MPI: A message passing interface standard.Inter- national Journal of Supercomputer Applications , 8(3/4):159–416, 1994

work page 1994

[17] [17]

J.-C. Miellou. Algorithmes de relaxation chaotique à retards.ESAIM: Mathematical Modelling and Numerical Analysis , 9(R1):55–82, 1975

work page 1975

[18] [18]

Miellou, P

J.-C. Miellou, P. Spiteri, and D. El Baz. A new stopping criterion for linear per- turbed asynchronous iterations. Journal of Computational and Applied Mathematics , 219(2):471–483, 2008

work page 2008

[19] [19]

S. A. Savari and D. P. Bertsekas. Finite termination of asynchronous iterative algo- rithms. Parallel Computing, 22(1):39–56, 1996

work page 1996

[20] [20]

Thakur, R

R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of collective communication operations in MPICH. The International Journal of High Performance Computing Applications, 19(1):49–66, 2005. 8

work page 2005