NektarIR: A Domain-Specific Compiler for High-Order Finite Element Operations on Heterogeneous Hardware
Pith reviewed 2026-06-26 15:05 UTC · model grok-4.3
The pith
A domain-specific MLIR compiler lowers high-level finite element abstractions to optimized kernels for CPUs and GPUs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through the NektarIR MLIR dialect and its lowering pipeline, common finite element operators are represented at a domain-specific level and compiled just-in-time to efficient code for both CPU and GPU targets, as demonstrated by comparisons with the existing Nektar++ framework.
What carries the argument
The NektarIR dialect, a custom MLIR intermediate representation for finite element operators, together with a bespoke lowering pipeline that applies domain-aware optimizations during progressive lowering.
If this is right
- Common finite element operators can be composed into kernels for discrete differential operators.
- These kernels can be just-in-time compiled for CPU and GPU architectures.
- Performance is achieved without manual bespoke optimization for each hardware vendor.
- Applications in computational fluid dynamics using spectral/hp element methods benefit from this automation.
Where Pith is reading between the lines
- Similar domain-specific compilers could be developed for other scientific computing domains beyond finite elements.
- The approach may extend to additional hardware types like accelerators beyond CPU and GPU.
- Integration with existing frameworks like Nektar++ could be further automated for broader adoption.
Load-bearing premise
The performance of the solvers depends on a small set of common finite element operators that can be fully represented and optimized in the MLIR dialect without losing efficiency or correctness.
What would settle it
Running the generated kernels on target hardware and finding that they underperform hand-tuned implementations on the same architecture by a significant margin would disprove the claim.
Figures
read the original abstract
Modern high performance computing (HPC) applications must target heterogeneous hardware. This requires significant work to ensure domain specific implementations translate to highly performant kernels across a range hardware types and vendors, each requiring bespoke optimization to make use of the specific target architecture. Through the development of a domain specific compiler built with the multi-level intermediate representations (MLIR) project, one can express a high-level, close to the specific domain, abstraction that is progressively lowered to a low, close to metal, abstraction. At each intermediate representation (IR), appropriate optimizations can be applied without costly analysis due to the knowledge embedded in the domain specific IRs. We apply this method to the construction of discrete differential operators for use in spectral/hp element method solvers for computational fluid dynamics (CFD). Here, the performance is driven by a small set of common finite element operators that are composed to create kernels for the discrete differential operators used to solve weak partial differential equations. We create our own MLIR dialect to represent these operators and implement a bespoke lowering pipeline to facilitate the just-in-time compilation of these kernels for both CPU and GPU architecture and illustrate performance comparisons with the Nektar++ spectral/hp element framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NektarIR, a custom MLIR dialect for representing high-order finite element operators (mass, stiffness, etc.) in spectral/hp element methods for CFD. It describes a bespoke lowering pipeline that progressively lowers domain-specific IRs to CPU/GPU targets via MLIR, claiming that embedded domain knowledge enables optimizations without costly analysis and that the approach yields performance comparable to the hand-written kernels in the Nektar++ framework.
Significance. If the central preservation claim holds, the work would demonstrate a practical route to portable, high-performance FE kernels on heterogeneous hardware by leveraging MLIR's multi-level IR structure, potentially reducing the engineering effort required for architecture-specific tuning in spectral/hp solvers.
major comments (2)
- [Abstract] Abstract: the assertion that 'performance comparisons with the Nektar++ spectral/hp element framework' are illustrated is unsupported by any quantitative data, error analysis, benchmark tables, or figures; without these, the claim that the NektarIR lowering retains efficiency and correctness relative to reference operators cannot be evaluated.
- [Abstract] Abstract: the description of the NektarIR dialect and 'bespoke lowering pipeline' supplies neither the dialect operation definitions, the lowering rules between IR levels, nor any verification (e.g., mathematical equivalence checks or floating-point reproducibility tests) that the generated kernels for discrete differential operators remain equivalent to those in Nektar++; this directly undermines the weakest assumption that common FE operators can be captured and lowered without loss of efficiency or correctness.
minor comments (1)
- [Abstract] The abstract refers to 'a small set of common finite element operators' but does not enumerate them or indicate which subset is implemented in the dialect.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comments correctly identify areas where the abstract overstates the manuscript's content without sufficient supporting material. We will revise the manuscript to address these issues directly.
read point-by-point responses
-
Referee: [Abstract] Abstract: the assertion that 'performance comparisons with the Nektar++ spectral/hp element framework' are illustrated is unsupported by any quantitative data, error analysis, benchmark tables, or figures; without these, the claim that the NektarIR lowering retains efficiency and correctness relative to reference operators cannot be evaluated.
Authors: We agree that the abstract should not assert that performance comparisons are illustrated without quantitative support being evident. The current manuscript text provided does not include benchmark tables, figures, or error analysis to back this claim. We will revise the abstract to remove or qualify the unsupported assertion and add a concise summary of performance results (including key metrics and references to tables/figures) along with basic error analysis in the revised version. revision: yes
-
Referee: [Abstract] Abstract: the description of the NektarIR dialect and 'bespoke lowering pipeline' supplies neither the dialect operation definitions, the lowering rules between IR levels, nor any verification (e.g., mathematical equivalence checks or floating-point reproducibility tests) that the generated kernels for discrete differential operators remain equivalent to those in Nektar++; this directly undermines the weakest assumption that common FE operators can be captured and lowered without loss of efficiency or correctness.
Authors: The referee is correct that the abstract provides no operation definitions, lowering rules, or verification evidence. While the full manuscript describes the dialect and pipeline at a high level, it does not supply the requested specifics or equivalence tests. We will revise by adding an appendix or expanded section with sample NektarIR operation definitions, key lowering rules, and verification results (e.g., mathematical equivalence and reproducibility tests) to substantiate the claims. revision: yes
Circularity Check
No circularity; implementation description relies on external MLIR framework
full rationale
The paper presents an engineering effort to build a new MLIR dialect (NektarIR) and lowering pipeline for representing and compiling finite-element operators. No mathematical derivation chain, fitted parameters, or predictions are claimed. The central assertions concern the feasibility of capturing common operators (mass, stiffness, etc.) in the dialect and lowering them without loss of correctness or efficiency; these are supported by reference to the external MLIR project rather than any self-referential construction or self-citation load-bearing step. The work is therefore self-contained as an implementation report.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption MLIR provides a suitable multi-level IR framework for embedding domain knowledge and applying staged optimizations without costly analysis.
invented entities (1)
-
NektarIR MLIR dialect
no independent evidence
Reference graph
Works this paper leans on
-
[1]
High-performance finite elements with MFEM.International Journal of High Performance Computing Applications38, 5 (2024), 447–467. doi:10.1177/10943420241261981 Daniel Arndt, Wolfgang Bangerth, Maximilian Bergbauer, Bruno Blais, Marc Fehling, Rene Gassmöller, Timo Heister, Luca Heltai, Martin Kronbichler, Matthias Maier, Peter Munch, Sam Scheuerman, Bruno ...
-
[2]
Journal of Numerical Mathematics 33, 403–415
The deal.II library, Version 9.7.Journal of Numerical Mathematics33, 4 (2025), 403–415. doi:10.1515/jnma-2025-0115 Igor A Baratta, Joseph P Dean, Jørgen S Dokken, Jack S Hale, Chris N Richardson, Marie E Rognes, Matthew W Scroggs, Nathan Sime, and Garth N Wells
-
[3]
DOLFINx: The next generation FEniCS problem solving environment.10.5281/zen- odo.10447665.(12 2025). doi:10.5281/zenodo.18101307 Peter Bastian, Markus Blatt, Andreas Dedner, Nils Arne Dreier, Christian Engwer, René Fritze, Carsten Gräser, Christoph Grüninger, Dominic Kempf, Robert Klöfkorn, Mario Ohlberger, and Oliver Sander
-
[4]
The DUNE framework: Basic concepts and recent developments.Computers and Mathematics with Applications81 (2021), 75–112. doi:10.1016/j.camwa. 2020.06.007 Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah
-
[5]
Julia: A fresh approach to numerical computing. SIAM Rev.59, 1 (2017), 65–98. doi:10.1137/141000671 Aart Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad
-
[6]
Compiler Support for Sparse Tensor Computations in MLIR.ACM Transactions on Architecture and Code Optimization19, 4 (9 2022), 1–25. doi:10.1145/3544559 Amy. Brown and Greg. Wilson. 2011.The architecture of open source applications : elegance, evolution, and a few fearless hacks. [CreativeCommons], CA, USA. 415 pages. https://aosabook.org/en/v1/llvm.html C...
-
[7]
doi:10.1016/j.cpc.2015.02.008 Clang [n
Nektar++: An open-source spectral/hp element framework.Computer Physics Communications192 (7 2015), 205–219. doi:10.1016/j.cpc.2015.02.008 Clang [n. d.].Clang: a C language family frontend for LLVM. Retrieved 1 Jun 2026 from https://clang.llvm.org/ Philippe Clauss, Ervin Altintas, and Matthieu Kuhn
-
[8]
InProceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS
Automatic Collapsing of Non-Rectangular Loops. InProceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS
2017
-
[9]
doi:10.1109/IPDPS.2017.34 OpenXLA Community. 2023.StableHLO Specification. Accessed: 1 Jun
-
[10]
ACM Trans
Efficient vectorised kernels for unstructured high-order finite element fluid solvers on GPU architectures in two dimensions.Computer Physics Communications284 (3 2023), 108624. ACM Trans. Math. Softw., Vol. 1, No. 1, Article . Publication date: June
2023
-
[11]
NektarIR: A Domain-Specific Compiler for High-Order Finite Element Operations on Heterogeneous Hardware 23 doi:10.1016/j.cpc.2022.108624 Marc Fehling and Wolfgang Bangerth
-
[12]
Algorithms for Parallel Generic hp-Adaptive Finite Element Software.ACM Trans. Math. Softw.49, 3, Article 25 (Sept. 2023), 26 pages. doi:10.1145/3603372 Paul Fischer, Stefan Kerkemeier, Misun Min, Yu-Hsiang Lan, Malachi Phillips, Thilina Rathnayake, Elia Merzari, Ananias Tomboulides, Ali Karakus, Noel Chalmers, and Tim Warburton
-
[13]
NekRS, a GPU-accelerated spectral element Navier–Stokes solver.Parallel Comput.114 (2022), 102982. doi:10.1016/j.parco.2022.102982 Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, and Tobias Grosser
-
[14]
Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-Accelerated Climate Simulation.ACM Transactions on Architecture and Code Optimization18, 4 (12 2021), 1–23. doi:10.1145/3469030 Niclas Jansson, Martin Karp, Artur Podobas, Stefano Markidis, and Philipp Schlatter
-
[15]
doi:10.1016/j.compfluid.2024.106243 George Karniadakis and Spencer Sherwin
Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics.Computers and Fluids275 (2024), 106243. doi:10.1016/j.compfluid.2024.106243 George Karniadakis and Spencer Sherwin. 2005.Spectral/hp Element Methods for Computational Fluid Dynamics(2nd ed.). Oxford University Press, Oxford, United Kingdom. doi:10.1093/acprof:o...
-
[16]
High-order splitting methods for the incompressible Navier-Stokes equations.J. Comput. Phys.97, 2 (1991), 414–443. doi:10.1016/0021-9991(91)90007-8 Kaloyan S. Kirilov, Jingtian Zhou, Joaquim Peiró, and David Moxey
-
[17]
doi:10.1016/j.cad.2025.103962 S
High-order curvilinear mesh generation from third-party meshes.Computer-Aided Design191 (2026), 103962. doi:10.1016/j.cad.2025.103962 S. Klabnik, C. Nichols, and C. Krycho. 2026.The Rust Programming Language, 3rd Edition. No Starch Press. https: //books.google.co.uk/books?id=Nm9REQAAQBAJ Tzanio Kolev, Paul Fischer, Misun Min, Jack Dongarra, Jed Brown, Ves...
-
[18]
Efficient exascale discretizations: High-order finite element methods,
Efficient exascale discretizations: High-order finite element methods.International Journal of High Performance Computing Applications35, 6 (11 2021), 527–552. doi:10.1177/10943420211020803 Chris Lattner and Vikram Adve
-
[19]
CoRRabs/2002.11054 (2020), 1–21
MLIR: A Compiler Infrastructure for the End of Moore’s Law. CoRRabs/2002.11054 (2020), 1–21. https://arxiv.org/abs/2002.11054 Hsin I.Cindy Liu, Marius Brehler, Mahesh Ravishankar, Nicolas Vasilache, Ben Vanik, and Stella Laurenzo
arXiv 2002
-
[20]
doi:10.1109/MM.2022.3178068 LLVM
TinyIREE: An ML Execution Environment for Embedded Systems from Compilation to Deployment.IEEE Micro42, 5 (2022), 9–16. doi:10.1109/MM.2022.3178068 LLVM. [n. d.].Torch-MLIR. Accessed: 1 Jun
-
[21]
Industry-Relevant Implicit Large-Eddy Simulation of a High-Performance Road Car via Spectral/hp Element Methods.SIAM Rev.63, 4 (2021), 723–755. arXiv:https://doi.org/10.1137/20M1345359 doi:10.1137/20M1345359 Pascal Mossier, Daniel Appel, Andrea D. Beck, and Claus-Dieter Munz
-
[22]
An Efficient hp-Adaptive Strategy for a Level-Set Ghost-Fluid Method.J. Sci. Comput.97, 2 (Oct. 2023), 41 pages. doi:10.1007/s10915-023-02363-7 David Moxey, Roman Amici, and Mike Kirby. 2020a. Efficient matrix-free high-order finite element evaluation for simplicial elements.SIAM Journal on Scientific Computing42, 3 (2020), C97–C123. doi:10.1137/19M124652...
-
[23]
Spectral methods for problems in complex geometries.J. Comput. Phys.37, 1 (1980), 70–92. doi:10.1016/0021-9991(80)90005-4 Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T.T. McRae, Gheorghe Teodor Bercea, Graham R. Markall, and Paul H.J. Kelly
-
[24]
Firedrake: Automating the finite element method by composing abstractions.ACM Trans. Math. Software43, 3 (2016), 1–27. doi:10.1145/2998441 Samuel Williams, Andrew Waterman, and David Patterson
-
[25]
Roofline: an insightful visual performance model for multicore architectures.Commun. ACM52, 4 (April 2009), 65–76. doi:10.1145/1498765.1498785 Jacques Y. Xing, Boyang Xia, Diego Renner, Chris D. Cantwell, David Moxey, Robert M. Kirby, and Spencer J. Sher- win
-
[26]
arXiv:2604.04644 [math.NA] https://arxiv.org/abs/2604.04644 ACM Trans
Architecture-aware ℎ-to-𝑝 optimisation: spectral/ ℎ𝑝 element operators for mixed-element meshes. arXiv:2604.04644 [math.NA] https://arxiv.org/abs/2604.04644 ACM Trans. Math. Softw., Vol. 1, No. 1, Article . Publication date: June 2026
Pith/arXiv arXiv 2026
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.