Loo.py: transformation-based code generation for GPUs and CPUs
Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form.
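To make the idea of "different but mathematically equivalent" variants concrete, here is a minimal sketch of loop tiling, one of the transformations the abstract names. It is written in plain Python and does not use Loo.py's API; the function names `scale_untiled` and `scale_tiled`, the `tile_size` parameter, and the scaling computation itself are assumptions chosen purely for illustration.

```python
# Illustrative only: plain Python, not Loo.py's API.
# All names below are hypothetical and chosen for this sketch.

def scale_untiled(a, factor):
    """Reference variant: one flat loop over all elements."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile_size=4):
    """Tiled variant: the index i is split into (i_outer, i_inner)."""
    n = len(a)
    out = [0.0] * n
    n_tiles = (n + tile_size - 1) // tile_size  # ceiling division
    for i_outer in range(n_tiles):
        for i_inner in range(tile_size):
            i = i_outer * tile_size + i_inner
            if i < n:  # guard the final, possibly partial tile
                out[i] = factor * a[i]
    return out


data = [float(x) for x in range(10)]
# Both variants compute the same result; only the loop structure differs.
assert scale_tiled(data, 2.0, tile_size=4) == scale_untiled(data, 2.0)
```

The two functions differ only in how the iteration space is traversed, which is the kind of variation a transformation library can generate from a single description instead of requiring each variant to be written by hand.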
Forward citations
Cited by 1 Pith paper
- Investigating the OPS intermediate representation to target GPUs in the Devito DSL
Integrating the OPS intermediate representation as a GPU backend in the Devito DSL yields speedups over the core backend for structured-mesh finite-difference applications.