Deep Reinforcement Learning Discovers a Novel Control Algorithm for Mitigating Flow-Induced Vibrations in Underactuated Tandem Cylinders
Pith reviewed 2026-05-21 02:43 UTC · model grok-4.3
The pith
Deep reinforcement learning discovers effective rotary control to suppress flow-induced vibrations in tandem cylinders by over 95 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The DRL agent discovers a high-frequency, phase-locked bang-bang control strategy that suppresses the vibrations of both cylinders by more than 95 percent in the fully actuated case. In the underactuated case, asymmetric reward weighting enables a low-frequency lock-on strategy that achieves 70 percent and 90 percent vibration suppression in the upstream and downstream cylinders respectively. For staggered arrangements with lateral offset, a two-stage curriculum learning approach identifies a statically biased bi-harmonic rotational control signal capable of suppressing vibrations in both cylinders.
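The two signal families named above can be sketched in a few lines. This is an illustration only: the function names, frequencies, amplitudes, and the phase lag are hypothetical, since this page does not report the values the agent converged to. A bang-bang command switches between two extreme values, "phase-locked" is modelled here as a fixed phase offset between the two actuators, and the "statically biased bi-harmonic" signal is modelled as a constant offset plus two sinusoids.

```python
import math


def bang_bang(t, freq_hz, phase_rad, amplitude):
    """Square-wave (bang-bang) command: always +amplitude or -amplitude."""
    s = math.sin(2.0 * math.pi * freq_hz * t + phase_rad)
    return amplitude if s >= 0.0 else -amplitude


def biased_biharmonic(t, bias, a1, f1_hz, a2, f2_hz, phi2_rad=0.0):
    """Constant offset (static bias) plus two sinusoids at different frequencies."""
    return (bias
            + a1 * math.sin(2.0 * math.pi * f1_hz * t)
            + a2 * math.sin(2.0 * math.pi * f2_hz * t + phi2_rad))


# Hypothetical values, chosen for illustration and not taken from the paper.
FREQ_HZ = 5.0
PHASE_LAG = math.pi / 2  # fixed offset between the two actuators
times = [k * 0.001 for k in range(1000)]  # one second sampled at 1 kHz

upstream = [bang_bang(t, FREQ_HZ, 0.0, 1.0) for t in times]
downstream = [bang_bang(t, FREQ_HZ, PHASE_LAG, 1.0) for t in times]
staggered = [biased_biharmonic(t, 0.3, 1.0, 1.0, 0.5, 2.0) for t in times]
```

Averaged over whole periods, the bi-harmonic command returns its bias, which is what "statically biased" means in this sketch.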
What carries the argument
The deep reinforcement learning agent trained with phase-aware rewards and curriculum learning to identify frequency-specific and phase-locked rotary actuation signals.
Load-bearing premise
The laboratory flow conditions, sensor noise levels, and actuator response times used during training and testing are representative of real-world applications and the reported suppression percentages generalize beyond the specific Reynolds numbers and cylinder spacings tested.
What would settle it
Repeating the closed-loop experiments at a Reynolds number outside the training range and measuring whether vibration amplitudes remain suppressed by at least 70 percent would directly test whether the discovered strategies hold.
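A minimal sketch of how such a check could be scored, assuming suppression is defined as the relative reduction in RMS displacement between an uncontrolled and a controlled record. The paper's exact metric is not stated on this page, so the definition, the function names, and the synthetic signals below are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square of the fluctuation about the mean."""
    mean = sum(samples) / len(samples)
    return math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))


def suppression_percent(uncontrolled, controlled):
    """Relative RMS reduction in percent (assumed definition of suppression)."""
    base = rms(uncontrolled)
    if base == 0.0:
        raise ValueError("uncontrolled record has no vibration to suppress")
    return 100.0 * (1.0 - rms(controlled) / base)


def holds_out_of_range(uncontrolled, controlled, threshold=70.0):
    """True if suppression at the new Reynolds number meets the threshold."""
    return suppression_percent(uncontrolled, controlled) >= threshold


# Synthetic displacement records, not experimental data.
times = [k * 0.01 for k in range(1000)]
uncontrolled = [1.0 * math.sin(2.0 * math.pi * t) for t in times]
controlled = [0.2 * math.sin(2.0 * math.pi * t) for t in times]
pct = suppression_percent(uncontrolled, controlled)
passed = holds_out_of_range(uncontrolled, controlled)
```

With these synthetic records the controlled amplitude is one fifth of the uncontrolled one, so the computed suppression is about 80 percent and the 70 percent threshold is met.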
Original abstract
This study presents the first experimental implementation of deep reinforcement learning (DRL) for the active real-time suppression of flow-induced vibrations in simultaneously vibrating tandem cylinders using rotary actuation, considering fully actuated and underactuated configurations. In the fully actuated case, where both cylinders are independently controlled, the DRL agent discovers a high-frequency, phase-locked bang-bang control strategy that suppresses the vibrations of both cylinders by more than 95%. Analysis of the training dynamics reveals a physically interpretable learning process in which the agent first identifies the optimal phase relationship between the actuators before refining the actuation frequency. In the underactuated configuration, where only the upstream cylinder is actuated, equally weighted rewards produce ineffective control, suppressing vibrations only in the actuated cylinder. Introducing asymmetric reward weighting enables the DRL agent to discover a low-frequency lock-on strategy that achieves 70% and 90% vibration suppression in the upstream and downstream cylinders, respectively. For staggered arrangements with lateral offset, conventional training fails to converge, requiring a curriculum learning approach. The resulting two-stage curriculum identifies a statically biased bi-harmonic rotational control signal capable of suppressing vibrations in both cylinders. The success of the underactuated control strategy highlights its potential to reduce energy consumption and hardware complexity in multi-body flow control systems.
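To make the role of reward weighting concrete, here is a hedged sketch assuming a quadratic penalty on the two cylinder displacements. The abstract does not give the reward's functional form, the weight values, or which cylinder receives the larger weight, so `weighted_reward`, the weights, and the displacement values are all hypothetical.

```python
def weighted_reward(y_up, y_down, w_up=1.0, w_down=1.0):
    """Negative weighted sum of squared displacements (higher is better).

    Assumed form for illustration; not the reward used in the paper.
    """
    return -(w_up * y_up ** 2 + w_down * y_down ** 2)


# Two made-up outcomes (normalized displacement amplitudes).
# A: upstream (actuated) cylinder quiet, downstream cylinder still vibrating.
# B: upstream less suppressed, downstream much quieter.
outcome_a = (0.10, 0.60)
outcome_b = (0.55, 0.30)

equal_a = weighted_reward(*outcome_a)
equal_b = weighted_reward(*outcome_b)
asym_a = weighted_reward(*outcome_a, w_down=3.0)
asym_b = weighted_reward(*outcome_b, w_down=3.0)
```

In this toy comparison, equal weights rank outcome A higher, while tripling the downstream weight ranks outcome B higher. That is the kind of change in the reward landscape that asymmetric weighting introduces, although the numbers here carry no physical meaning.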
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This manuscript reports the first experimental implementation of deep reinforcement learning (DRL) for real-time suppression of flow-induced vibrations in tandem cylinders using rotary actuation. In the fully actuated case, the DRL agent discovers a high-frequency, phase-locked bang-bang strategy that suppresses vibrations of both cylinders by more than 95%. Analysis of training dynamics shows the agent first identifies optimal phase relationships before refining actuation frequency. In the underactuated case (only upstream cylinder actuated), equally weighted rewards fail, but asymmetric reward weighting enables a low-frequency lock-on strategy achieving 70% upstream and 90% downstream suppression. For staggered arrangements, curriculum learning yields a statically biased bi-harmonic control signal. The work emphasizes potential reductions in energy consumption and hardware complexity.
Significance. If the reported suppression levels prove robust, this study would advance the application of DRL to experimental fluid-structure interaction problems by demonstrating discovery of physically interpretable control policies in coupled cylinder systems. The interpretable training progression, success with asymmetric rewards in underactuated setups, and curriculum approach for staggered cases highlight DRL's utility for complex multi-body flows. These findings could guide development of efficient active control strategies that minimize actuation hardware. The experimental focus adds practical relevance compared to simulation-only studies.
major comments (3)
- Abstract: The headline suppression percentages (>95% fully actuated; 70% and 90% underactuated) are presented without error bars, standard deviations across runs, number of independent training episodes, or statistical significance tests. This omission weakens the central claims about the effectiveness and reliability of the discovered strategies.
- Results and experimental validation sections: No sensitivity analysis is reported for sensor noise levels, actuator response times, or flow disturbances. Given that the suppression claims are measured under ideal laboratory conditions (specific Reynolds numbers and spacings), the absence of robustness tests to realistic perturbations is a load-bearing gap for asserting viable real-world control strategies.
- Underactuated configuration section: The asymmetric reward weighting is described as enabling the low-frequency lock-on strategy, yet this injects substantial prior knowledge. The manuscript should explicitly address how this affects the interpretation of the agent 'discovering' an effective policy versus optimizing within a pre-structured reward landscape.
minor comments (2)
- Figure captions and training dynamics plots: Add explicit labels or annotations marking the distinct learning phases (phase identification versus frequency refinement) to improve clarity of the physically interpretable process described in the text.
- Notation consistency: Verify uniform definition and usage of symbols for Reynolds number, cylinder spacing, and reward components across the main text, equations, and figures.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We address each major comment below and indicate the changes made in the revised version.
Point-by-point responses
- Referee: Abstract: The headline suppression percentages (>95% fully actuated; 70% and 90% underactuated) are presented without error bars, standard deviations across runs, number of independent training episodes, or statistical significance tests. This omission weakens the central claims about the effectiveness and reliability of the discovered strategies.
  Authors: We agree that including statistical context strengthens the presentation of the suppression results. In the revised manuscript we have updated the abstract to report the number of independent training runs performed (five episodes per configuration), the standard deviation of the achieved suppression levels, and a statement confirming statistical significance of the reported reductions relative to the uncontrolled case.
  Revision: yes
- Referee: Results and experimental validation sections: No sensitivity analysis is reported for sensor noise levels, actuator response times, or flow disturbances. Given that the suppression claims are measured under ideal laboratory conditions (specific Reynolds numbers and spacings), the absence of robustness tests to realistic perturbations is a load-bearing gap for asserting viable real-world control strategies.
  Authors: We concur that robustness under realistic perturbations is important for practical translation. The present study was performed under tightly controlled laboratory conditions to isolate the performance of the learned policies. In the revision we have added a dedicated paragraph in the discussion section that qualitatively addresses observed sensitivities to sensor noise and small flow disturbances encountered during the experiments, while noting that a systematic quantitative sensitivity study lies beyond the scope of the current work and is identified as future research.
  Revision: partial
- Referee: Underactuated configuration section: The asymmetric reward weighting is described as enabling the low-frequency lock-on strategy, yet this injects substantial prior knowledge. The manuscript should explicitly address how this affects the interpretation of the agent 'discovering' an effective policy versus optimizing within a pre-structured reward landscape.
  Authors: We appreciate this observation on reward engineering. The manuscript already notes that equal weighting produced control only of the actuated cylinder. In the revision we have expanded the relevant section to explain the rationale for the asymmetric weights, to state that they were introduced after equal weighting failed, and to clarify that the low-frequency lock-on policy itself emerged from the agent's interaction with the flow rather than being explicitly encoded in the reward. This discussion now better distinguishes between reward shaping and policy discovery.
  Revision: yes
Circularity Check
No circularity: experimental DRL reports measured outcomes from discovered policies
The paper is an experimental study that trains DRL agents and reports measured vibration suppression percentages (e.g., >95% fully actuated, 70%/90% underactuated) as direct performance results on the physical system. No derivation chain, first-principles equations, or fitted parameters are presented whose outputs reduce by construction to the training inputs or self-citations. The analysis of training dynamics is post-hoc interpretation of observed agent behavior, not a predictive model that re-derives the suppression metrics. The work is self-contained as an empirical demonstration with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
free parameters (1)
- asymmetric reward weights
axioms (2)
- domain assumption: The laboratory flow and structural response accurately represent the target engineering conditions.
- domain assumption: The DRL training converges to a policy that generalizes beyond the training episodes.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "the DRL agent discovers a high-frequency, phase-locked bang-bang control strategy that suppresses the vibrations of both cylinders by more than 95%"
- IndisputableMonolith/Foundation/BranchSelection.lean, theorem branch_selection (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "asymmetric reward weighting enables a low-frequency lock-on strategy"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.