
arxiv: 1405.7470 · v1 · pith:ZGA5RRVNnew · submitted 2014-05-29 · 💻 cs.PL · cs.MS · math.NA

Loo.py: transformation-based code generation for GPUs and CPUs

classification 💻 cs.PL cs.MS math.NA
keywords code computations computing convenient data many model provides

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype to high-performance implementation can occur in a gradual, machine-assisted form.
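
To make the idea of "different but mathematically equivalent" variants concrete, here is a minimal sketch of loop tiling, one of the transformations the abstract lists. It is written in plain Python and does not use Loo.py's API; the function names `scale_reference` and `scale_tiled` and the `tile` parameter are hypothetical, chosen only for illustration.

```python
# Illustrative only: plain Python, NOT Loo.py's API.
# All names here (scale_reference, scale_tiled, tile) are hypothetical.


def scale_reference(a, factor):
    """Untransformed variant: one flat loop over all elements."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile=4):
    """Tiled variant: the same iteration space, split into an outer loop
    over tiles and an inner loop over the elements of each tile."""
    n = len(a)
    out = [0.0] * n
    for i_outer in range(0, n, tile):
        # min() guards the last tile, which may be only partially filled.
        for i in range(i_outer, min(i_outer + tile, n)):
            out[i] = factor * a[i]
    return out


data = [float(i) for i in range(10)]
reference = scale_reference(data, 2.0)
tiled = scale_tiled(data, 2.0, tile=4)

# Both variants compute the same result; only the loop structure differs.
assert reference == tiled
```

The two functions differ only in loop structure, and the tile size is the kind of parameter that would vary per target machine. Keeping such variants in sync by hand is the burden the abstract describes, which a transformation library aims to remove by deriving them from a single description.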



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Investigating the OPS intermediate representation to target GPUs in the Devito DSL

    cs.MS 2019-06 unverdicted novelty 3.0

    Integration of OPS intermediate representation as a GPU backend in Devito DSL yields speedups over the core backend for structured-mesh finite-difference applications.