Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework

Ruixiang Huang; Weifan Liu

arxiv: 2604.12083 · v1 · submitted 2026-04-13 · 💻 cs.DC

Pith reviewed 2026-05-10 14:47 UTC · model grok-4.3

keywords: microswimmer simulations, parallel-in-time methods, GPU computing, Parareal algorithm, viscous fluid dynamics, heterogeneous computing, filamentous microswimmers, high-performance computing

The pith

A heterogeneous CPU-GPU framework with pipelined Parareal achieves order-of-magnitude speedups for filamentous microswimmer simulations in viscous fluid.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that coupling high-intensity GPU kernels for spatial interactions with a distributed MPI-GPU pipelined Parareal scheme for temporal parallelism can overcome the separate handling of space and time complexity in conventional methods. A sympathetic reader would care because long-time simulations of many microswimmers are needed to study collective behaviors in biology and physics but remain computationally prohibitive. The framework maps an asynchronous pipeline across multiple GPUs to overlap coarse and fine propagators, while also optimizing the matrix square root step on the GPU. Theoretical efficiency analysis and experiments back the resulting speedups.

Core claim

The central claim is that a two-level parallelization strategy—GPU kernels resolving quadratic spatial interactions via the Method of Regularized Stokeslets together with a distributed MPI-GPU pipelined Parareal architecture that overlaps coarse and fine propagators—delivers order-of-magnitude speedups over CPU-only methods, with a GPU-optimized numerical routine for the matrix square root arising in the filamentous microswimmer scheme.

What carries the argument

The distributed MPI-GPU pipelined Parareal architecture, which enables temporal concurrency by overlapping coarse and fine propagators across devices while pairing with GPU kernels for spatial computations.
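For readers unfamiliar with Parareal, the following is a minimal serial sketch of the basic iteration on a scalar ODE. It is not the paper's implementation: the names `euler` and `parareal`, the forward Euler propagators, and the step counts are all illustrative assumptions. The per-slice fine and coarse evaluations on the previous iterate are the independent work that a distributed scheme would spread across devices; here they simply run in a loop.

```python
def euler(f, y, t0, t1, steps):
    """Forward Euler from t0 to t1 (hypothetical stand-in for a propagator)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t1, n_slices, n_iters, coarse_steps=1, fine_steps=100):
    """Basic Parareal: U_{n+1} <- G(U_n_new) + F(U_n_old) - G(U_n_old)."""
    ts = [t0 + (t1 - t0) * n / n_slices for n in range(n_slices + 1)]

    def G(y, n):  # cheap coarse propagator over slice n
        return euler(f, y, ts[n], ts[n + 1], coarse_steps)

    def F(y, n):  # expensive fine propagator over slice n
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one serial coarse sweep over all slices.
    U = [y0]
    for n in range(n_slices):
        U.append(G(U[n], n))

    for _ in range(n_iters):
        # Independent across slices: this is the part that can run in parallel.
        F_old = [F(U[n], n) for n in range(n_slices)]
        G_old = [G(U[n], n) for n in range(n_slices)]
        # Serial correction sweep.
        U_new = [y0]
        for n in range(n_slices):
            U_new.append(G(U_new[n], n) + F_old[n] - G_old[n])
        U = U_new
    return U


if __name__ == "__main__":
    decay = lambda t, y: -y
    for k in (0, 1, 2, 8):
        print(k, parareal(decay, 1.0, 0.0, 2.0, n_slices=8, n_iters=k)[-1])
```

With as many iterations as time slices, the iteration reproduces the serial fine solution up to rounding, so any practical gain depends on converging in far fewer iterations than slices.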

If this is right

The approach supplies a scalable route to simulating complex emergent behaviors in large-scale biology and physics systems.
It removes the serial bottlenecks that limit traditional Parareal implementations by overlapping computations across GPUs.
GPU acceleration of the matrix square root step improves the efficiency of the underlying numerical scheme for the microswimmers.
Theoretical analysis of the pipelined Parareal efficiency provides a basis for predicting performance on larger problem sizes.
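This summary does not say which matrix the square root is taken of. One plausible reading, suggested by the Figure 6 remark about instability near θ ≈ 0 and θ ≈ π, is the principal square root of a 3×3 rotation matrix, which has a closed form through the half angle. The sketch below assumes that reading; the function names are hypothetical, and it runs on the CPU with NumPy rather than on a GPU.

```python
import numpy as np


def rotation_from_axis_angle(axis, theta):
    """Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2."""
    w = np.asarray(axis, dtype=float)
    w = w / np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def rotation_sqrt(R, tol=1e-10):
    """Principal square root of a rotation: same axis, half the angle.

    Assumption: R is a proper rotation with angle theta in [0, pi).
    The axis is recovered by dividing by sin(theta), which is the
    ill-conditioned step when theta is close to 0 or pi.
    """
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_t)
    sin_t = np.sin(theta)
    if abs(sin_t) < tol:
        if theta < 1.0:          # theta ~ 0: R is (nearly) the identity
            return np.eye(3)
        raise ValueError("rotation angle too close to pi: axis is ill-defined")
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * sin_t)
    return rotation_from_axis_angle(axis, theta / 2.0)


if __name__ == "__main__":
    R = rotation_from_axis_angle([1.0, 2.0, 2.0], 1.2)
    S = rotation_sqrt(R)
    print(np.max(np.abs(S @ S - R)))  # residual of S^2 = R
```

A closed form of this kind needs only a handful of arithmetic operations per matrix, which is why it would batch well across many cross-sections, but that benefit is an inference here and not a measurement from the paper.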

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-level strategy could be tested on other nonlinear time-dependent fluid problems that involve many-body interactions.
The reported speedups open the possibility of exploring larger ensembles or extended time horizons that were previously out of reach.
Adaptive selection of coarse propagator tolerance might further reduce communication costs in distributed GPU deployments.

Load-bearing premise

The pipelined Parareal scheme must maintain accuracy and convergence rate for the nonlinear microswimmer dynamics without prohibitive communication overhead when spread across multiple GPU devices.

What would settle it

Running a controlled benchmark on a standard filamentous microswimmer test case that directly compares wall-clock time and solution accuracy between the proposed multi-GPU framework and a reference CPU implementation would show whether the claimed speedups materialize without loss of fidelity.

Figures

Figures reproduced from arXiv: 2604.12083 by Ruixiang Huang, Weifan Liu.

**Figure 2.** Schematic comparison of the computation flow of a …

**Figure 3.** CPU–GPU coupled workflow for the nth-Order Runge– …

**Figure 4.** Thread–block mapping and data flow in the GPU …

**Figure 5.** (a) Relative increment ηe k (b) True relative error η k. Accompanying text (Section C, GPU Spatial Parallel Performance): First, we examine the performance gain of spatial parallelization on a single GPU. We perform experiments for the rod count 1, 4, 12, 25. In each case, we measure the time cost of the three main components that make up each time step's calculation …

**Figure 6.** Caption not recovered. Accompanying text: … (b), and the associated statistical error and time metrics in Table II. As summarized in Table II, the proposed method achieves comparable or higher accuracy than SciPy's sqrtm. Numerical instability may appear when θ ≈ 0, π, which results in several outliers in the error distribution. This phenomenon … Table II: Statistical summary of performance and numerical accuracy (columns: Metric, Mean, Median, Std / Max, Speedup, …).

**Figure 7.** Runtime gap T_reg − T_pipe versus 1/r for (a) 1 rod (b) 4 rods (c) 12 rods (d) 25 rods. In the regular Parareal scheduling, the additional idle time mainly arises from the (m − 1) GPUs waiting during the sequential coarse propagation. Hence, ∆T has a dominant term that scales with (m − 1), with an additional term proportional to −m/2, which accounts for the finite cost of establishing parallelism in the pipe…

**Figure 8.** Scalability results of the proposed solver: (a) Weak …
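The runtime gap between regular and pipelined scheduling in Figure 7 can be illustrated with the standard idealized Parareal cost model, which ignores communication. This is a sketch, not the paper's analysis: `K` (iteration count), `Tg` and `Tf` (coarse and fine cost per time slice) are assumed symbols, and the model does not reproduce the additional −m/2 term mentioned in the caption.

```python
def regular_time(m, K, Tg, Tf):
    """Idealized runtime with regular scheduling on m devices.

    One serial coarse sweep for the initial guess plus one per iteration
    (m * Tg each), and one parallel fine solve per iteration (Tf each).
    """
    return (K + 1) * m * Tg + K * Tf


def pipelined_time(m, K, Tg, Tf):
    """Idealized runtime with pipelined scheduling on m devices.

    Only the initial coarse sweep is fully serial; later coarse updates
    overlap with fine solves on other devices.
    """
    return m * Tg + K * (Tg + Tf)


def speedup(m, Tf, runtime):
    """Speedup over running the fine propagator serially on all m slices."""
    return m * Tf / runtime


if __name__ == "__main__":
    m, K, Tg, Tf = 8, 3, 1.0, 20.0   # illustrative numbers only
    t_reg = regular_time(m, K, Tg, Tf)
    t_pipe = pipelined_time(m, K, Tg, Tf)
    print("gap:", t_reg - t_pipe, "expected K*(m-1)*Tg:", K * (m - 1) * Tg)
    print("speedup regular:", speedup(m, Tf, t_reg))
    print("speedup pipelined:", speedup(m, Tf, t_pipe))
```

In this model the gap is exactly K(m − 1)Tg, consistent with a dominant term that scales with (m − 1). It also shows why the iteration count matters: the achievable speedup is bounded by m/K regardless of how cheap the coarse propagator is.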

Original abstract

Simulating large-scale microswimmer dynamics in viscous fluid poses significant challenges due to the coupled high spatial and temporal complexity. Conventional high-performance computing (HPC) methods often address these two dimensions in isolation, leaving a critical gap for synergistic acceleration. This paper introduces a heterogeneous CPU–GPU computing framework specifically optimized for the long-time simulation of filamentous microswimmers in viscous fluid. We propose a two-level parallelization strategy: (1) high-intensity GPU kernels to resolve the quadratic spatial interactions given by the Method of Regularized Stokeslets (MRS), and (2) a distributed MPI-GPU pipelined Parareal architecture to exploit temporal concurrency. By mapping the asynchronous pipeline onto multiple GPU devices, our framework effectively overlaps coarse and fine propagators, overcoming the serial bottlenecks of the traditional Parareal method. Furthermore, we employ a GPU-optimized numerical routine for computing the matrix square root arising in the numerical scheme of the filamentous microswimmer simulations. Theoretical analysis of the efficiency improvement of the pipelined Parareal is presented. Numerical experiments demonstrate that the proposed framework achieves order-of-magnitude speedups over CPU-only methods, providing a scalable pathway for simulating complex emergent behaviors in large-scale biology and physics systems.
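To make the "quadratic spatial interactions" concrete, here is a minimal NumPy sketch of a dense regularized Stokeslet velocity evaluation. It assumes the widely used 3D kernel associated with the blob φ_ε(r) = 15ε⁴ / (8π(r² + ε²)^{7/2}); the abstract does not state which blob or kernel the paper uses, and the function name and argument layout are hypothetical. Every target point interacts with every source point, which is the all-pairs cost a GPU kernel would parallelize.

```python
import numpy as np


def regularized_stokeslet_velocity(targets, sources, forces, eps, mu=1.0):
    """Velocity at `targets` induced by regularized point forces at `sources`.

    targets: (M, 3), sources: (N, 3), forces: (N, 3).
    Assumed kernel (one common choice, not confirmed by the source):
        u(x) = 1/(8 pi mu) * sum_k [ f_k (r^2 + 2 eps^2) + (f_k . d) d ]
                                   / (r^2 + eps^2)^(3/2),   d = x - x_k
    """
    d = targets[:, None, :] - sources[None, :, :]      # (M, N, 3) separations
    r2 = np.sum(d * d, axis=-1)                        # (M, N) squared distances
    denom = (r2 + eps ** 2) ** 1.5
    h1 = (r2 + 2.0 * eps ** 2) / denom                 # isotropic part
    f_dot_d = np.einsum("mnj,nj->mn", d, forces)       # f_k . d for each pair
    u = np.einsum("mn,nj->mj", h1, forces)
    u += np.einsum("mn,mnj->mj", f_dot_d / denom, d)   # dyadic part
    return u / (8.0 * np.pi * mu)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.standard_normal((200, 3))
    frc = rng.standard_normal((200, 3))
    vel = regularized_stokeslet_velocity(pts, pts, frc, eps=0.05)
    print(vel.shape)
```

The intermediate arrays here are O(MN) in memory, which is acceptable for a sketch; a GPU kernel would typically accumulate the sum per target point without materializing the full pairwise arrays.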

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper layers a pipelined Parareal scheme over GPU kernels for Method of Regularized Stokeslets microswimmer simulations and reports order-of-magnitude speedups, but the experimental details needed to verify convergence and net gains are thin.

This paper combines GPU kernels for the quadratic spatial interactions in the Method of Regularized Stokeslets with a distributed MPI-GPU pipelined Parareal method to run coarse and fine propagators in an overlapping way across devices. The authors also add a GPU routine for the matrix square root that appears in the filament scheme and give a theoretical efficiency analysis for the pipeline. The main practical move is mapping the asynchronous Parareal steps onto multiple GPUs so that communication and computation can hide some of the serial time-parallel overhead for long-time runs of filamentous microswimmers. That specific heterogeneous setup for this application is the clearest new element; the individual pieces (Parareal, GPU Stokes solvers) are known, but the combination and the overlap strategy are presented as a working extension.

The work is useful for anyone already running large-scale fluid-structure simulations who needs to push temporal scale without simply buying more nodes. It directly targets the dual bottleneck of dense spatial interactions and serial time stepping that shows up in collective microswimmer problems.

The soft spot is the performance evidence. The abstract states that numerical experiments show the speedups, yet supplies no numbers on iteration counts, problem sizes, error tolerances, or wall-clock breakdowns that separate kernel time from MPI-GPU transfers. The stress-test point about Parareal convergence for the nonlinear nonlocal filament dynamics is reasonable to raise: if the coarse propagator misses bending or steric effects, the number of iterations could rise enough that the net gain falls well below an order of magnitude once communication is counted. Without those concrete measurements it is hard to judge whether the claimed acceleration survives in practice.

This is the sort of paper that belongs in a computational methods venue rather than a methods journal. A reader working on HPC for biological or soft-matter flows would get concrete implementation ideas even if they end up tuning the coarse model themselves. I would send it to referees because the claims are testable, the approach is reproducible in principle, and the gap it tries to close is real for people who already use these simulations.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a heterogeneous CPU-GPU framework for long-time simulations of filamentous microswimmers in viscous fluids. It combines GPU kernels for the quadratic interactions of the Method of Regularized Stokeslets (MRS) with a distributed MPI-GPU pipelined Parareal scheme for temporal parallelism, includes a GPU-optimized matrix-square-root routine, provides a theoretical efficiency analysis of the pipelined Parareal, and reports order-of-magnitude speedups over CPU-only baselines in numerical experiments.

Significance. If the reported speedups hold while preserving accuracy for the nonlinear nonlocal dynamics, the work would provide a practical route to larger-scale microswimmer simulations, enabling studies of emergent collective behaviors that remain out of reach with conventional serial or purely spatial-parallel methods. The explicit theoretical efficiency analysis and the heterogeneous pipelining strategy are positive features that distinguish the contribution.

major comments (3)

[Numerical experiments / abstract] Numerical experiments (abstract and results section): the central order-of-magnitude speedup claim is supported only by unspecified experiments. No problem sizes (filament count, spatial discretization), Parareal iteration counts, convergence tolerances, error metrics (e.g., relative error versus serial reference), or baseline descriptions (CPU code, single-GPU timings) are provided, rendering the performance numbers unverifiable and the weakest-assumption concern about iteration count unaddressed.
[Theoretical analysis of pipelined Parareal] Pipelined Parareal efficiency analysis (theoretical section): the analysis does not bound the contraction factor or iteration count for the specific quadratic, nonlocal MRS filament dynamics. Because the coarse propagator necessarily omits fine-scale bending and steric effects, convergence may require more than the 2–4 iterations needed to preserve an order-of-magnitude net gain once GPU-to-GPU MPI latency is included.
[Multi-GPU pipelined architecture] Multi-GPU mapping (implementation section): no quantitative comparison of inter-device communication time versus kernel execution time is given for the asynchronous pipeline. If matrix-square-root transfers or correction broadcasts dominate, the claimed overlap benefit and overall speedup cannot be realized.

minor comments (2)

[GPU-optimized numerical routine] Notation for the matrix-square-root GPU kernel is introduced without an explicit equation reference or stability discussion under Parareal corrections.
[Figures] Figure captions for timing and speedup plots should include the exact problem parameters, number of Parareal iterations, and hardware configuration used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and describe the revisions planned for the manuscript.

Referee: Numerical experiments (abstract and results section): the central order-of-magnitude speedup claim is supported only by unspecified experiments. No problem sizes (filament count, spatial discretization), Parareal iteration counts, convergence tolerances, error metrics (e.g., relative error versus serial reference), or baseline descriptions (CPU code, single-GPU timings) are provided, rendering the performance numbers unverifiable and the weakest-assumption concern about iteration count unaddressed.

Authors: We agree that the current presentation of the numerical experiments lacks the necessary detail for independent verification. In the revised manuscript we will expand the results section (and update the abstract) to report explicit problem sizes (filament counts and spatial discretization), Parareal iteration counts, convergence tolerances, quantitative error metrics (relative L2 error against a serial reference), and baseline timings (CPU-only and single-GPU). These additions will directly address verifiability and demonstrate that iteration counts remain sufficiently low to retain the reported net speedup.
revision: yes

Referee: Pipelined Parareal efficiency analysis (theoretical section): the analysis does not bound the contraction factor or iteration count for the specific quadratic, nonlocal MRS filament dynamics. Because the coarse propagator necessarily omits fine-scale bending and steric effects, convergence may require more than the 2–4 iterations needed to preserve an order-of-magnitude net gain once GPU-to-GPU MPI latency is included.

Authors: The existing theoretical analysis supplies a general efficiency bound for pipelined Parareal under standard contraction-factor assumptions. We acknowledge that a problem-specific bound for the quadratic, nonlocal MRS dynamics is not derived. In the revision we will add a dedicated subsection discussing the expected contraction behavior for this application, explaining why the simplified coarse propagator still yields rapid convergence in practice, and quantifying the effect of GPU-to-GPU MPI latency on the net speedup. The discussion will be supported by the additional numerical data mentioned above.
revision: partial

Referee: Multi-GPU mapping (implementation section): no quantitative comparison of inter-device communication time versus kernel execution time is given for the asynchronous pipeline. If matrix-square-root transfers or correction broadcasts dominate, the claimed overlap benefit and overall speedup cannot be realized.

Authors: We agree that a quantitative breakdown of communication versus computation is required to substantiate the overlap claims. The revised implementation section will include profiling measurements that compare inter-device MPI communication times (matrix-square-root transfers and correction broadcasts) against GPU kernel execution times. These data will confirm that the asynchronous pipeline achieves effective overlap and that communication does not dominate the overall runtime.
revision: yes

Circularity Check

0 steps flagged

No circularity: performance claims rest on independent numerical experiments and theoretical analysis

The paper presents a two-level parallelization strategy (GPU kernels for MRS quadratic interactions plus distributed MPI-GPU pipelined Parareal) whose efficiency improvement is supported by a separate theoretical analysis and whose order-of-magnitude speedups are reported as outcomes of numerical experiments. No load-bearing step reduces by construction to its own inputs: there are no self-definitional equations, no fitted parameters relabeled as predictions, and no uniqueness theorems or ansatzes imported solely via self-citation. The derivation chain for the asynchronous pipeline, matrix-square-root GPU routine, and convergence behavior remains independent of the final speedup figures.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard assumptions from parallel computing and numerical methods for Stokes flow; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

Domain assumption: Pipelined Parareal can be mapped asynchronously onto multiple GPU devices while preserving stability for filamentous microswimmer dynamics.
Invoked to justify overlapping coarse and fine propagators and overcoming serial bottlenecks.
