Investigating the OPS intermediate representation to target GPUs in the Devito DSL
Pith reviewed 2026-05-25 15:21 UTC · model grok-4.3
The pith
Devito adds an OPS backend to target GPUs and deliver considerable speedups over its core backend for finite-difference PDE solvers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By implementing an OPS backend in Devito, the authors report considerable speedups over the core Devito backend for structured-mesh applications targeting various platforms, including GPUs.
What carries the argument
The OPS backend implementation in Devito, which maps Devito's finite-difference code generation to the OPS API for platform-specific optimized output.
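To make the idea of mapping a finite-difference update onto a "parallel loop over a structured grid" more concrete, here is a minimal pure-Python sketch. It uses neither Devito nor OPS: the names `laplacian_kernel`, `par_loop`, and `diffusion_step` are hypothetical and chosen only for illustration, and the 1D diffusion equation is an assumed example problem, not one taken from the paper.

```python
# Illustrative only: this is NOT the Devito or OPS API.
# All names here are hypothetical stand-ins.

def laplacian_kernel(u, i, dx):
    """Second-order central difference of u at interior point i."""
    return (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (dx * dx)


def par_loop(kernel, u, start, stop, *args):
    """Apply `kernel` at every index in [start, stop).

    Each point is independent of the others, which is the property a
    structured-mesh backend can exploit to run the loop in parallel.
    Here it simply runs sequentially.
    """
    return [kernel(u, i, *args) for i in range(start, stop)]


def diffusion_step(u, dx, dt, alpha):
    """One explicit Euler step of u_t = alpha * u_xx, boundaries held fixed."""
    lap = par_loop(laplacian_kernel, u, 1, len(u) - 1, dx)
    new = list(u)
    for offset, value in enumerate(lap):
        new[offset + 1] = u[offset + 1] + alpha * dt * value
    return new


if __name__ == "__main__":
    print(diffusion_step([0.0, 0.0, 1.0, 0.0, 0.0], dx=1.0, dt=0.1, alpha=1.0))
```

The split between a per-point kernel and a loop driver is the design point: the problem specification lives in the kernel, while the driver is free to choose how and where the iteration runs.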
If this is right
- Devito-generated code for seismic problems can now target GPUs through the OPS layer.
- The same high-level Devito models produce optimized output for multiple hardware platforms via OPS.
- Performance gains appear for finite-difference methods on structured meshes without altering the original problem specification.
- The integration demonstrates a pathway for other code-generation DSLs to reach GPUs by adopting an intermediate API such as OPS.
Where Pith is reading between the lines
- Similar backend additions could allow Devito to target additional accelerators if corresponding OPS-like APIs exist for those devices.
- The observed speedups may vary with problem size, mesh resolution, or specific finite-difference stencil, suggesting targeted benchmarking would be needed for new applications.
- If the OPS integration adds no extra user-facing complexity, it could encourage adoption of Devito in production GPU workflows where hand-written kernels are currently used.
Load-bearing premise
An OPS backend can be added to Devito while preserving correctness and delivering net performance gains on target GPU hardware without prohibitive compilation or runtime overheads.
What would settle it
A side-by-side run of a representative seismic inversion problem on GPU hardware showing either incorrect results or no runtime improvement with the OPS backend versus the core Devito backend would falsify the claim.
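The falsification test described here can be sketched as a small decision function. This is an assumption-laden illustration: `verdict` is a hypothetical helper, the tolerances are arbitrary defaults, and the inputs stand in for flattened result arrays and wall-clock times that would come from real runs of the two backends.

```python
import math


def verdict(reference, candidate, t_reference, t_candidate,
            rel_tol=1e-6, abs_tol=1e-9):
    """Decide whether a candidate backend survives the side-by-side test.

    reference, candidate: sequences of floats produced by each backend
    t_reference, t_candidate: measured runtimes in seconds
    The claim survives only if the results agree within tolerance AND
    the candidate is strictly faster than the reference.
    """
    matches = len(reference) == len(candidate) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(reference, candidate)
    )
    speedup = t_reference / t_candidate if t_candidate > 0 else float("inf")
    return {
        "matches": matches,
        "speedup": speedup,
        "claim_survives": matches and speedup > 1.0,
    }


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(verdict([1.0, 2.0], [1.0, 2.0000001], 10.0, 2.5))
```

Floating-point results from different backends rarely match bit for bit, so the comparison uses a tolerance rather than exact equality; the appropriate tolerance depends on the problem and precision used.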
Original abstract
The Devito DSL is a code generation tool for the solution of partial differential equations using the finite difference method, specifically aimed at seismic inversion problems. In this work we investigate the integration of OPS, an API to generate highly optimized code for applications running on structured meshes targeting various platforms, within Devito as a means of bringing it to the GPU realm. By providing an implementation of an OPS backend in Devito, we obtain considerable speedups compared to the core Devito backend.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the integration of the OPS intermediate representation into the Devito DSL for finite-difference PDE solvers aimed at seismic inversion. It presents an implementation of an OPS backend within Devito and reports obtaining considerable speedups compared to the core Devito backend on GPU hardware.
Significance. If the performance claims hold with proper validation, the work would show a practical route for extending Devito to GPUs via an existing structured-mesh code-generation API, which could benefit performance-critical geophysics applications without requiring a full rewrite of the symbolic layer.
major comments (1)
- [Abstract] The claim that 'considerable speed ups' were obtained supplies no measurement protocol, problem sizes, hardware details, or error bars, so the central performance claim cannot be evaluated from the given text.
Simulated Author's Rebuttal
We thank the referee for their review. The single major comment concerns the level of detail in the abstract regarding performance claims. We address it below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] The claim that 'considerable speed ups' were obtained supplies no measurement protocol, problem sizes, hardware details, or error bars, so the central performance claim cannot be evaluated from the given text.
  Authors: We agree that the abstract does not supply enough context on its own for a reader to evaluate the performance claims. The body of the manuscript contains the full experimental protocol, problem sizes, hardware specifications, and results (including variability measures), but the abstract should be improved to include representative details of these elements. We will revise the abstract to incorporate key information on the measurement protocol, problem sizes, hardware, and error bars/variability.
  Revision: yes
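As a rough illustration of what a measurement protocol with variability measures could look like, the sketch below times repeated runs and summarizes them. It is not the authors' protocol: `time_runs` and `summarize` are hypothetical helpers, and the warm-up and repeat counts are assumed defaults.

```python
import statistics
import time


def time_runs(fn, repeats=5, warmup=1):
    """Call `fn` `warmup` times untimed, then return `repeats` timings in seconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Report central tendency and spread for a list of timings."""
    return {
        "n": len(samples),
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Stand-in workload; a real study would time the generated solver.
    print(summarize(time_runs(lambda: sum(range(100_000)))))
```

Reporting the number of runs alongside a median and a spread lets a reader judge whether a quoted speedup exceeds run-to-run noise.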
Circularity Check
No significant circularity; implementation report with empirical results
The manuscript is an engineering report describing the addition of an OPS backend to Devito for GPU targeting, with the central claim being the observed speedups from that integration. No equations, derivations, fitted parameters, or predictions appear in the provided text. The work contains no self-citation chains, ansatzes, or uniqueness theorems that could reduce to inputs by construction. The result is therefore self-contained as a description of implementation outcomes rather than a mathematical argument.