Adaptation of AI-accelerated CFD Simulations to the IPU platform
Pith reviewed 2026-05-09 19:26 UTC · model grok-4.3
The pith
Adapting AI-accelerated CFD simulations to IPU hardware enables scalable training, with throughput rising from 560.8 to 2805.8 samples per second as the hardware scales from two to sixteen IPUs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By porting the training program to the IPU-POD16 platform the authors show that the popdist library removes the host-side data feeding bottleneck to deliver up to 34 percent speedup. Although moving from one to two IPUs brings no gain due to communication overheads, scaling from two to sixteen IPUs raises throughput from 560.8 to 2805.8 samples per second while the model still produces accurate predictions of fluid simulation states.
What carries the argument
The popdist library for overcoming the single-host data feeding limitation during distributed training on multiple IPUs.
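The review does not include popdist code, but the mechanism it describes, several host processes each feeding a disjoint shard of the training data so that aggregate loading bandwidth grows with the number of instances, can be sketched in plain Python. The function and variable names below are illustrative, not the actual popdist API.

```python
# Conceptual sketch of per-instance data sharding, the idea behind
# popdist-style distributed data loading (NOT the actual popdist API).
# Each loader instance feeds only every num_instances-th sample, so
# together the instances cover the dataset exactly once.

def shard(dataset, instance_index, num_instances):
    """Yield the subset of samples owned by one loader instance."""
    for i, sample in enumerate(dataset):
        if i % num_instances == instance_index:
            yield sample

samples = list(range(10))  # stand-in for preprocessed CFD training samples

# Four loader instances jointly cover the dataset with no overlap.
shards = [list(shard(samples, k, 4)) for k in range(4)]
```

Because each instance reads and preprocesses only its own shard, the host-side feeding cost is divided across processes, which is the bottleneck the 34% speedup figure refers to.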
If this is right
- Using popdist to distribute data loading yields up to 34% training speedup.
- Data parallelism shows no benefit from one to two IPUs due to overhead but supports good scaling beyond that.
- The adapted model maintains accurate predictions for simulation states on the new hardware.
- Throughput scales substantially with more IPUs once initial communication costs are covered.
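The last point can be checked against the abstract's own numbers: going from 2 to 16 IPUs is an 8x increase in hardware, so the reported throughput implies a concrete scaling efficiency.

```python
# Scaling efficiency implied by the reported throughput figures:
# 560.8 samples/s on 2 IPUs vs 2805.8 samples/s on 16 IPUs.
t2, t16 = 560.8, 2805.8

speedup = t16 / t2            # measured speedup going from 2 to 16 IPUs
ideal = 16 / 2                # ideal linear speedup (8x)
efficiency = speedup / ideal  # fraction of ideal scaling achieved

print(f"speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
# prints: speedup 5.00x, efficiency 62.5%
```

About 62.5% of ideal linear scaling, consistent with the claim that scaling is good but not free once inter-IPU communication enters the picture.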
Where Pith is reading between the lines
- This porting strategy may extend to training AI models for other types of physics simulations if their data pipelines can be similarly distributed.
- IPU clusters could become a practical option for speeding up the development of hybrid AI-numerical simulation tools.
- Repeating the experiments on larger models or different CFD problems would show how widely the scaling behavior applies.
Load-bearing premise
Model prediction accuracy stays the same after porting to IPU hardware and the throughput numbers generalize to other datasets and model architectures.
What would settle it
Measuring the test set prediction error of the model trained on IPU hardware versus the original version, or running the throughput test with a new OpenFOAM dataset or different neural network design.
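The metrics such a comparison would rest on are standard. A minimal, self-contained sketch in plain Python; the sample values are illustrative placeholders, not data from the paper.

```python
import math

def mse(pred, ref):
    """Mean squared error between predicted and reference field values."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)

def relative_l2(pred, ref):
    """Relative L2 error: ||pred - ref||_2 / ||ref||_2."""
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den

# Illustrative flattened simulation-state fields from the IPU-trained
# model and the reference (e.g. baseline-trained) model.
reference = [1.0, 2.0, 3.0, 4.0]
ipu_pred  = [1.1, 1.9, 3.0, 4.2]
```

Reporting either metric for the IPU-trained model next to the original baseline would settle the accuracy question the review raises.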
Original abstract
Intelligence Processing Units (IPU) have proven useful for many AI applications. In this paper, we evaluate them within the emerging field of AI for simulation, where traditional numerical simulations are supported by artificial intelligence approaches. We focus specifically on a program for training machine learning models supporting a computational fluid dynamics application. We use custom TensorFlow provided by the Poplar SDK to adapt the program for the IPU-POD16 platform and investigate its ease of use and performance scalability. Training a model on data from OpenFOAM simulations allows us to get accurate simulation state predictions in test time. We show how to utilize the popdist library to overcome a performance bottleneck in feeding training data to the IPU on the host side, achieving up to 34% speedup. Due to communication overheads, using data parallelism to utilize two IPUs instead of one does not improve the throughput. However, once the intra-IPU costs have been paid, the hardware capabilities for inter-IPU communication allow for good scalability. Increasing the number of IPUs from 2 to 16 improves the throughput from 560.8 to 2805.8 samples/s.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates the adaptation of a TensorFlow-based machine learning model, trained on data from OpenFOAM computational fluid dynamics simulations, to the IPU-POD16 platform using the Poplar SDK. It investigates ease of use and performance scalability, particularly using the popdist library to address host-side data feeding bottlenecks (up to 34% speedup). The work reports throughput scaling from 560.8 samples/s with 2 IPUs to 2805.8 samples/s with 16 IPUs and asserts that the adapted model delivers accurate simulation state predictions at test time.
Significance. If the accuracy of the predictions is preserved after the IPU port, the paper supplies concrete empirical data on IPU suitability for AI-accelerated CFD workloads, including practical use of popdist for data parallelism and observed scaling behavior once intra-IPU costs are amortized. The specific numeric throughput figures constitute a reproducible benchmark that could guide hardware selection in scientific HPC. The absence of any accuracy quantification, however, substantially reduces the result's utility for CFD applications.
Major comments (2)
- Abstract: The claim that the model 'allows us to get accurate simulation state predictions in test time' is presented without any supporting quantitative metrics (MSE, relative L2 error, validation loss, or direct comparison to the non-IPU baseline). This is load-bearing for the central contribution because the reported throughput numbers (560.8 to 2805.8 samples/s) and the 34% popdist speedup lose practical meaning for AI-accelerated CFD if prediction quality has degraded due to reduced precision, data-parallel artifacts, or hardware-specific numerics.
- Results discussion (scaling paragraph): The observation that data parallelism with two IPUs yields no throughput gain due to communication overheads, while scaling improves from 2 to 16 IPUs, is stated without accompanying details on batch size, model architecture, or verification that accuracy remains constant across parallelism levels. This makes it difficult to assess whether the scalability claim generalizes or is specific to the chosen OpenFOAM dataset and TensorFlow model.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and agree that additional quantitative support for accuracy claims and expanded details on the scaling experiments will strengthen the paper. We address each point below and indicate the changes made in the revised version.
Point-by-point responses
Referee: Abstract: The claim that the model 'allows us to get accurate simulation state predictions in test time' is presented without any supporting quantitative metrics (MSE, relative L2 error, validation loss, or direct comparison to the non-IPU baseline). This is load-bearing for the central contribution because the reported throughput numbers (560.8 to 2805.8 samples/s) and the 34% popdist speedup lose practical meaning for AI-accelerated CFD if prediction quality has degraded due to reduced precision, data-parallel artifacts, or hardware-specific numerics.
Authors: We agree that the abstract's accuracy claim would benefit from explicit quantitative backing to fully support the performance results. The IPU adaptation preserves the original model's architecture, training procedure, and floating-point precision, so no degradation is expected. To make this explicit, we have added a dedicated paragraph in the Results section reporting MSE and relative L2 error on the held-out test set, together with a side-by-side comparison against the non-IPU TensorFlow baseline. These metrics confirm equivalent accuracy. The abstract has also been revised to reference the new quantitative findings. Revision: yes.
Referee: Results discussion (scaling paragraph): The observation that data parallelism with two IPUs yields no throughput gain due to communication overheads, while scaling improves from 2 to 16 IPUs, is stated without accompanying details on batch size, model architecture, or verification that accuracy remains constant across parallelism levels. This makes it difficult to assess whether the scalability claim generalizes or is specific to the chosen OpenFOAM dataset and TensorFlow model.
Authors: We accept that the scaling paragraph would be clearer with these supporting details. In the revised manuscript we have expanded the paragraph to state the batch size employed, briefly recap the model architecture, and report that validation loss (and therefore test-time accuracy) remains unchanged across all tested IPU counts. This invariance follows directly from the data-parallel training strategy, which replicates the identical model and aggregates gradients identically regardless of the number of IPUs. The added information allows readers to judge the applicability of the observed scaling to other CFD workloads. Revision: yes.
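The invariance argument in this response can be illustrated with a toy example: for a mean loss and equal-sized shards, averaging per-replica gradients reproduces the single-device full-batch gradient exactly. The scalar linear model and squared loss below are purely illustrative, not the paper's model.

```python
# Toy check of the rebuttal's invariance argument: for a mean loss,
# averaging per-replica gradients equals the full-batch gradient, so
# the weight update does not depend on how many replicas split the batch
# (assuming equal-sized shards and exact arithmetic).

def grad(w, batch):
    """Gradient of the mean squared loss 0.5*(w*x - y)^2 w.r.t. scalar w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 7.0)]
w = 0.5

full = grad(w, data)                           # single-device gradient
replica_shards = [data[:2], data[2:]]          # two data-parallel replicas
averaged = sum(grad(w, s) for s in replica_shards) / len(replica_shards)
```

In finite precision and with unequal shard sizes or batch-norm-style statistics the equality is only approximate, which is why reporting the measured validation loss across IPU counts, as the authors propose, is still worthwhile.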
Circularity Check
No circularity: purely empirical benchmarking with direct measurements
full rationale
The paper contains no derivations, equations, fitted parameters presented as predictions, or load-bearing self-citations. All central claims (throughput scaling from 560.8 to 2805.8 samples/s with 2-to-16 IPUs, up to 34% popdist speedup) are direct empirical measurements of runtime on IPU-POD16 hardware after porting via Poplar SDK. The statement that training on OpenFOAM data yields accurate test-time predictions is an assertion without supporting math or reduction to inputs. No step reduces by construction to its own inputs or prior self-citation; results are externally falsifiable via hardware benchmarks. This is the expected non-finding for a performance-porting study.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Freund, K., Moorhead, P.: The Graphcore Second-Generation IPU. https://moorinsightsstrategy.com/research-paper-the-graphcore-second-generation-ipu/ (2020)
- [2] Gepner, P.: Machine Learning and High-Performance Computing Hybrid Systems, a New Way of Performance Acceleration in Engineering and Scientific Applications. In: 2021 16th Conference on Computer Science and Intelligence Systems (FedCSIS), pp. 27–36 (2021). https://doi.org/10.15439/2021F004
- [3] Iserte, S., Carratala, P., Arnau, R., Barreda, P., Basiero, L., Martínez-Cuenca, R., Climent, J., Chiva, S.: Modeling of Wastewater Treatment Processes with HydroSludge. Water Environment Research, pp. 1–38 (2021)
- [4] Iserte, S., Macías, A., Martínez-Cuenca, R., Chiva, S., Paredes, R., Quintana-Ortí, E.S.: Accelerating Urban Scale Simulations Leveraging Local Spatial 3D Structure. Journal of Computational Science 62, 101741 (2022). https://doi.org/10.1016/j.jocs.2022.101741
- [5] Kim, B., Azevedo, V.C., Thuerey, N., Kim, T., Gross, M., Solenthaler, B.: Deep Fluids: A Generative Network for Parameterized Fluid Simulations. Computer Graphics Forum 38(2), 59–70 (2019). https://doi.org/10.1111/cgf.13619
- [6] Kochkov, D., Smith, J.A., Alieva, A., Wang, Q., Brenner, M.P., Hoyer, S.: Machine Learning-accelerated Computational Fluid Dynamics. Proceedings of the National Academy of Sciences 118(21), e2101784118 (2021). https://doi.org/10.1073/pnas.2101784118
- [7]
- [8] Li, Z., Wang, Y., Zhi, T., Chen, T.: A Survey of Neural Network Accelerators. Frontiers of Computer Science 11(5), 746–761 (2017). https://doi.org/10.1007/s11704-016-6159-1
- [9] Maulik, R., San, O., Rasheed, A., Vedula, P.: Subgrid Modelling for Two-dimensional Turbulence Using Neural Networks. Journal of Fluid Mechanics 858, 122–144 (2019). https://doi.org/10.1017/jfm.2018.770
- [10]
- [11] Rojek, K., Wyrzykowski, R.: Performance and Scalability Analysis of AI-accelerated CFD Simulations Across Various Computing Platforms. In: HeteroPar 2022. Springer International Publishing (in press, 2022)
- [12] Rojek, K., Wyrzykowski, R., Gepner, P.: AI-Accelerated CFD Simulation Based on OpenFOAM and CPU/GPU Computing. In: Computational Science – ICCS 2021, pp. 373–385. Springer International Publishing, Cham (2021)
- [13]
- [14] Rościszewski, P., Iwański, M., Czarnul, P.: The Impact of the AC922 Architecture on Performance of Deep Neural Network Training. In: 2019 International Conference on High Performance Computing & Simulation (HPCS), pp. 666–673 (2019). https://doi.org/10.1109/HPCS48598.2019.9188164
- [15] Sergeev, A., Del Balso, M.: Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv:1802.05799 [cs, stat] (2018). http://arxiv.org/abs/1802.05799
- [16] Sze, V., Chen, Y.H., Emer, J., Suleiman, A., Zhang, Z.: Hardware for Machine Learning: Challenges and Opportunities. In: 2017 IEEE Custom Integrated Circuits Conference (CICC), pp. 1–8 (2018). https://doi.org/10.1109/CICC.2018.8357072
- [17] Thuerey, N., Weißenow, K., Prantl, L., Hu, X.: Deep Learning Methods for Reynolds-Averaged Navier–Stokes Simulations of Airfoil Flows. AIAA Journal 58, 1–12 (2019). https://doi.org/10.2514/1.J058291
- [18] Um, K., Brand, R., Fei, Y.R., Holl, P., Thuerey, N.: Solver-in-the-Loop: Learning from Differentiable Physics to Interact with Iterative PDE-Solvers. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20). Curran Associates Inc., Red Hook, NY, USA (2020)
- [19] Wiewel, S., Becher, M., Thuerey, N.: Latent Space Physics: Towards Learning the Temporal Evolution of Fluid Flow. Computer Graphics Forum 38(2), 71–82 (2019). https://doi.org/10.1111/cgf.13620
- [20]