arxiv: 2605.14780 · v1 · submitted 2026-05-14 · 💻 cs.PL · cs.DC


Mat2Boundary: Treating User-Defined Boundary Condition as SpMV for Distributed PDE Solvers on Block-Structured Grids

Yanzheng Cai, Mingzhe Zhang, Shengqi Chen, Haoyuan Song, Wenguang Chen (Department of Computer Science, Technology & BRNist, Tsinghua University, Beijing, China)


Pith reviewed 2026-05-15 14:29 UTC · model grok-4.3


keywords: boundary conditions, PDE solvers, domain-specific language, compiler, block-structured grids, sparse linear operators, distributed computing, halo exchange


The pith

Mat2Boundary models user-defined boundary conditions as affine sparse linear operators to unify and accelerate PDE handling on block-structured grids.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Mat2Boundary as a DSL and compiler that represents boundary conditions as affine sparse linear operators. This single abstraction covers halo copying, circular mappings, zero padding, block-edge synchronization, and custom interpolations through a composable sub-matrix interface. Multi-stage programming combined with polyhedral analysis then produces matrix-free kernels, removes redundant boundary work, and builds reusable communication schedules for distributed execution. The result is less code to write for boundary logic and faster runtime performance when the solver runs across many cores on block-structured grids.

Core claim

Mat2Boundary models a broad class of boundary conditions as affine sparse linear operators. This abstraction unifies halo copying, circular and symmetric mappings, zero padding, block-edge synchronization, and user-defined interpolation, while exposing a modular basic sub-matrix interface for declarative composition. To make this representation efficient, Mat2Boundary combines multi-stage programming and polyhedral analysis to generate matrix-free kernels for structured cases, support user-defined sparse matrices for irregular cases, eliminate redundant boundary work, and synthesize reusable communication schedules for distributed execution.
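To make the phrase "affine sparse linear operator" concrete, the following minimal sketch writes three familiar one-dimensional boundary fills as g = B u + c, where u holds the interior cells, g is the halo-extended array, B is sparse, and c is a constant offset. It is an illustration under stated assumptions, not Mat2Boundary's interface: the function names, the one-cell halo, and the dict-per-row storage are all hypothetical choices made here for readability.

```python
# Illustrative sketch only: every name below is hypothetical, not Mat2Boundary's API.

def apply_affine(rows, offset, u):
    """Return g = B u + c, with B stored as one {column: weight} dict per row."""
    return [sum(w * u[j] for j, w in row.items()) + c
            for row, c in zip(rows, offset)]

def interior_rows(n):
    """Identity block: interior cell i of g copies u[i]."""
    return [{i: 1.0} for i in range(n)]

def periodic(n):
    """Circular mapping: the left halo reads the last cell, the right halo the first."""
    rows = [{n - 1: 1.0}] + interior_rows(n) + [{0: 1.0}]
    return rows, [0.0] * (n + 2)

def zero_padding(n):
    """Zero padding: halo rows are empty, so they evaluate to 0."""
    rows = [{}] + interior_rows(n) + [{}]
    return rows, [0.0] * (n + 2)

def dirichlet_ghost(n, left, right):
    """Ghost-cell Dirichlet fill: halo = 2 * wall_value - nearest interior cell.
    The constant 2 * wall_value is what makes the operator affine, not just linear."""
    rows = [{0: -1.0}] + interior_rows(n) + [{n - 1: -1.0}]
    offset = [2.0 * left] + [0.0] * n + [2.0 * right]
    return rows, offset

u = [1.0, 2.0, 3.0, 4.0]
periodic_out = apply_affine(*periodic(len(u)), u)
zero_out = apply_affine(*zero_padding(len(u)), u)
dirichlet_out = apply_affine(*dirichlet_ghost(len(u), 0.0, 10.0), u)
```

Each boundary type differs only in which sparse rows and offsets it supplies, which is the sense in which one abstraction can cover several boundary algorithms. Composing per-face or per-edge blocks would amount to stacking such row groups.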

What carries the argument

Modeling boundary conditions as affine sparse linear operators equipped with a modular basic sub-matrix interface, then applying multi-stage programming and polyhedral analysis to emit matrix-free kernels and communication schedules.
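As a rough illustration of what "matrix-free" means here, the sketch below fills a periodic halo of width h twice: once by applying an explicitly stored sparse matrix, and once with direct index arithmetic that never materializes the matrix. The function names and the 1D periodic setting are assumptions made for this example; the paper's generated kernels are not shown in this page and may look quite different.

```python
# Illustrative sketch only: hypothetical names, 1D periodic halo of width h (h <= len(u)).

def periodic_halo_explicit(u, h):
    """Apply the stored operator: row r of B has a single 1 at column (r - h) mod n."""
    n = len(u)
    rows = [{(r - h) % n: 1.0} for r in range(n + 2 * h)]
    return [sum(w * u[j] for j, w in row.items()) for row in rows]

def periodic_halo_matrix_free(u, h):
    """Same result without storing B: copy the wrapped slices directly."""
    n = len(u)
    return u[n - h:] + u + u[:h]

u = [1.0, 2.0, 3.0, 4.0, 5.0]
explicit = periodic_halo_explicit(u, 2)
matrix_free = periodic_halo_matrix_free(u, 2)
```

Both calls produce the same extended array; the second avoids storing and traversing per-row index data. For irregular, user-supplied interpolation weights no such closed-form indexing exists in general, which is presumably why the paper keeps explicit sparse matrices as the path for irregular cases.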

If this is right

Boundary condition logic becomes declarative and composable instead of scattered imperative code.
Redundant boundary computations are removed automatically by the polyhedral analysis.
Communication schedules become reusable across different boundary setups on distributed grids.
Boundary kernel execution speeds up by as much as 7.6 times on the evaluated shallow-water and HPCG cases.
Boundary-related source code shrinks by more than 70 percent, and the evaluated solvers scale to 1,344 CPU cores at 72-88 percent parallel efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same operator abstraction could simplify boundary handling in adaptive-mesh or unstructured-grid codes that currently require separate implementations.
Extending the compiler to emit GPU kernels would test whether the matrix-free generation retains its advantage on accelerators.
Treating other solver stages, such as time integrators or preconditioners, as composable sparse operators might produce similar reductions in code and redundant work.
The modular sub-matrix interface invites libraries of reusable boundary templates that users can assemble without touching the generated kernels.

Load-bearing premise

Representing arbitrary user-defined boundary conditions as affine sparse linear operators keeps both full generality and high efficiency for distributed block-structured execution.

What would settle it

A concrete boundary condition from a production PDE code that cannot be expressed as an affine sparse linear operator while preserving the original numerical result, or a case where the generated kernel runs slower than the equivalent hand-written version.

Figures

Figures reproduced from arXiv: 2605.14780 by Yanzheng Cai, Mingzhe Zhang, Shengqi Chen, Haoyuan Song, Wenguang Chen (Department of Computer Science, Technology & BRNist, Tsinghua University, Beijing, China).

**Figure 1.** Geometry of the cubed-sphere grid. Block-structured grids involve additional boundary condition (BC) computations. Simple and naive implementations of the boundary algorithm can introduce degradation on the order of accuracy and grid imprinting [9]. Therefore, researchers have proposed several high-order interpolation schemes for the boundary calculation of cubed-sphere grids [10], [11], [12], [13]. The co…

**Figure 2.** Figure 2 demonstrates common boundary algorithms implemented in real applications, ranging from convolutions on images to PDE solvers on structured or block-structured grids. The algorithms shown in Figures 2a, 2b, and 2d are widely adopted by popular frameworks and applications even beyond the domain of PDE solvers. Meanwhile, the algorithms depicted in Figures 2c, 2e, 2f, and 2g are also commonly utilized in diff…

**Figure 3.** Example block-structured grid. A. Modular Boundary Matrix Construction. 1) Region Representation: Before demonstrating our interface for configuring the matrix, we first illustrate our representation for the iteration space and structural grid cells. Mat2Boundary will first suppose the existence of a multi-dimensional Cartesian grid which covers the range of …

**Figure 4.** Example on simplification of the iteration space of …

**Figure 5.** BC computation time on a single core. Lower is better.

**Figure 7.** Strong scaling performance and parallel efficiency.

Original abstract

Boundary-condition (BC) handling is a major source of complexity in PDE solvers on structured and block-structured grids, especially for high-order methods and distributed-memory execution. We present Mat2Boundary, a DSL and compiler for boundary computations that models a broad class of boundary-conditions as affine sparse linear operators. This abstraction unifies halo copying, circular and symmetric mappings, zero padding, block-edge synchronization, and user-defined interpolation, while exposing a modular basic sub-matrix interface for declarative composition. To make this representation efficient, Mat2Boundary combines multi-stage programming and polyhedral analysis to generate matrix-free kernels for structured cases, support user-defined sparse matrices for irregular cases, eliminate redundant boundary work, and synthesize reusable communication schedules for distributed execution. Evaluated on two shallow-water equation solvers on cubed-sphere grids and HPCG, Mat2Boundary achieves up to 7.6$\times$ BC-kernel speedup, reduces BC code by over 70%, and scales to 1,344 CPU cores with 72%-88% efficiency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Mat2Boundary gives a practical DSL for modeling boundary conditions as affine sparse operators in distributed PDE solvers, with reported kernel speedups and code reduction that look worth checking in detail.


The main point is that this paper introduces Mat2Boundary, a DSL and compiler that represents boundary conditions as affine sparse linear operators. It unifies halo copying, circular and symmetric mappings, zero padding, block-edge synchronization, and user-defined interpolation through a modular sub-matrix interface for composition. The compiler then applies multi-stage programming and polyhedral analysis to emit matrix-free kernels for structured cases, support sparse matrices for irregular ones, cut redundant work, and generate reusable communication schedules for distributed block-structured grids.

On two shallow-water equation solvers for cubed-sphere grids plus the HPCG benchmark, the authors report up to 7.6 times faster BC kernels, over 70 percent less BC code, and scaling to 1,344 cores at 72-88 percent efficiency. That combination of abstraction and code generation is the concrete new piece, and the performance numbers on real applications are the strongest evidence they provide. The approach seems to deliver on productivity and speed for the cases they tested.

The soft spot is whether the polyhedral guarantees and matrix-free benefits survive when users compose arbitrary interpolations or hit non-local edge mappings. The abstract does not show explicit measurements of communication volume or schedule quality after composition, so it is not yet clear how often the fallback to sparse matrices erodes the gains or whether hidden dependencies appear. Broader baselines and verification steps would also help confirm the claims.

This paper is for HPC developers who maintain distributed PDE codes on structured grids and want to reduce boilerplate around boundaries. A reader working on climate or fluid solvers would pick up usable ideas on DSL design and schedule synthesis. It deserves peer review because the abstraction is fresh, the engineering is grounded in existing techniques, and the reported results are substantial enough to warrant referee scrutiny even if the methods section needs expansion.

Referee Report

2 major / 2 minor

Summary. The paper presents Mat2Boundary, a DSL and compiler for boundary conditions in PDE solvers on block-structured grids. It models BCs (halo copying, circular/symmetric mappings, zero padding, block-edge sync, and user-defined interpolation) as affine sparse linear operators with a modular sub-matrix interface for declarative composition. Multi-stage programming and polyhedral analysis are used to emit matrix-free kernels for structured cases, support sparse matrices for irregular cases, eliminate redundant work, and synthesize reusable distributed communication schedules. Evaluation on two shallow-water solvers on cubed-sphere grids and HPCG reports up to 7.6× BC-kernel speedup, >70% BC code reduction, and 72-88% scaling efficiency to 1,344 cores.

Significance. If the central claims hold, the work could meaningfully reduce implementation complexity for BC handling in distributed high-order PDE codes on complex grids while preserving performance through automated generation of optimized kernels and schedules. The unification under a linear-algebra abstraction and the reported code-size and performance gains would be valuable for HPC practitioners working with block-structured discretizations.

major comments (2)

§3 (Mat2Boundary DSL and compiler): The claim that modeling user-defined interpolation as affine sparse operators preserves both generality and matrix-free efficiency under composition is load-bearing for the central contribution, yet no analysis is provided showing that the polyhedral pass continues to fire or that communication volume remains minimal once a user-supplied interpolation matrix is inserted at block edges.
§5 (Experimental Evaluation): The 7.6× speedup and 72-88% scaling figures are reported without explicit baselines for the BC kernels, without distinguishing structured vs. irregular (user-defined matrix) cases, and without communication-volume measurements under composition; this leaves open whether the polyhedral guarantees survive on cubed-sphere edge mappings.

minor comments (2)

Abstract: The title uses 'SpMV' without expansion; the first sentence of the abstract should spell out 'sparse matrix-vector multiplication' for readers outside the linear-algebra community.
§5: The manuscript would benefit from a small table contrasting generated vs. hand-written BC code size and kernel performance for at least one user-defined interpolation example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to incorporate additional analysis and experimental details.


Referee: §3 (Mat2Boundary DSL and compiler): The claim that modeling user-defined interpolation as affine sparse operators preserves both generality and matrix-free efficiency under composition is load-bearing for the central contribution, yet no analysis is provided showing that the polyhedral pass continues to fire or that communication volume remains minimal once a user-supplied interpolation matrix is inserted at block edges.

Authors: We agree that the manuscript would be strengthened by explicit analysis of polyhedral behavior and communication volume under composition with user-defined interpolation matrices. In the revised version we will add a dedicated subsection to §3 that (1) shows how the affine representation of user-supplied interpolation matrices is still recognized by the polyhedral analyzer, (2) provides a small worked example of the composed operator on a block edge, and (3) reports measured communication volumes before and after composition on the cubed-sphere grids used in the evaluation.

Revision: yes

Referee: §5 (Experimental Evaluation): The 7.6× speedup and 72-88% scaling figures are reported without explicit baselines for the BC kernels, without distinguishing structured vs. irregular (user-defined matrix) cases, and without communication-volume measurements under composition; this leaves open whether the polyhedral guarantees survive on cubed-sphere edge mappings.

Authors: We accept that clearer baselines and case distinctions are needed. The revised §5 will (1) explicitly state the hand-written baseline implementations used for the BC kernels, (2) present separate speedup and scaling results for purely structured cases versus cases that include user-defined sparse interpolation matrices, and (3) add communication-volume measurements for the composed operators on the cubed-sphere edge mappings. These additions will be cross-referenced to the new analysis in §3.

Revision: yes

Circularity Check

0 steps flagged

No circularity: engineering artifact with independent evaluation


The paper presents Mat2Boundary as a new DSL/compiler that models boundary conditions as affine sparse linear operators and applies multi-stage programming plus polyhedral analysis to emit kernels and schedules. No equations, derivations, or first-principles results are claimed; the central contribution is an engineering abstraction evaluated on external benchmarks (shallow-water solvers on cubed-sphere grids and HPCG). No self-citations are load-bearing for any mathematical claim, no parameters are fitted then renamed as predictions, and no uniqueness theorems or ansatzes are smuggled in. The system is therefore self-contained against external benchmarks with no reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard linear-algebra operations and compiler technology rather than new fitted constants or invented physical entities.

axioms (1)

Standard math: Sparse matrix-vector multiplication and polyhedral analysis are standard, well-defined operations in linear algebra and compiler theory.
Invoked as the foundation for modeling and optimizing boundary computations.

invented entities (1)

Mat2Boundary DSL and compiler (no independent evidence)
Purpose: To provide a unified interface for expressing and compiling boundary conditions as affine sparse operators.
New software artifact introduced by the authors; no independent evidence outside the paper is supplied.



Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

[1] G. McMechan, “Migration by extrapolation of time-dependent boundary values,” Geophysical Prospecting, vol. 31, pp. 413–420, 04 2006. [Online]. Available: https://doi.org/10.1111/j.1365-2478.1983.tb01060.x
[2] W. C. Skamarock, J. B. Klemp, J. Dudhia, D. O. Gill, Z. Liu, J. Berner, W. Wang, J. G. Powers, M. G. Duda, D. M. Barker et al., “A description of the advanced research WRF version 4,” NCAR tech. note NCAR/TN-556+STR, vol. 145, no. 10.5065, 2019. [Online]. Available: https://doi.org/10.5065/1DFH-6P97
[3] H. Cao, S. Tang, Q. Zhu, B. Yu, and W. Chen, “Mat2stencil: A modular matrix-based DSL for explicit and implicit matrix-free PDE solvers on structured grid,” Proceedings of the ACM on Programming Languages, vol. 7, no. OOPSLA2, pp. 686–715, 2023. [Online]. Available: https://doi.org/10.1145/3622822
[4] E. G. Paredes, L. Groner, S. Ubbiali, H. Vogt, A. Madonna, K. Mariotti, F. Cruz, L. Benedicic, M. Bianco, J. VandeVondele et al., “GT4Py: High performance stencils for weather and climate applications using Python,” arXiv preprint arXiv:2311.08322, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2311.08322
[5] M. Louboutin, M. Lange, F. Luporini, N. Kukreja, P. A. Witte, F. J. Herrmann, P. Velesko, and G. J. Gorman, “Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration,” Geoscientific Model Development, vol. 12, no. 3, pp. 1165–1187, 2019. [Online]. Available: https://doi.org/10.5194/gmd-12-1165-2019
[6] C. Lengauer, S. Apel, M. Bolten, S. Chiba, U. Rüde, J. Teich, A. Größlinger, F. Hannig, H. Köstler, L. Claus et al., “ExaStencils: advanced multigrid solver generation,” in Software for Exascale Computing-SPPEXA 2016-2019. Springer International Publishing Cham, 2020, pp. 405–452. [Online]. Available: https://doi.org/10.1007/978-3-030-47956-5_14
[7] R. D. Falgout and U. M. Yang, “hypre: A library of high performance preconditioners,” in International Conference on Computational Science. Springer, 2002, pp. 632–641. [Online]. Available: https://doi.org/10.1007/3-540-47789-6_66
[8] M. Ji and F. Toepfer, “Dynamical core evaluation test report for NOAA’s next generation global prediction system (NGGPS),” 2016. [Online]. Available: https://doi.org/10.25923/ztzy-qn82
[9] W. M. Putman and S.-J. Lin, “Finite-volume transport on various cubed-sphere grids,” Journal of Computational Physics, vol. 227, no. 1, pp. 55–78, 2007. [Online]. Available: https://doi.org/10.1016/j.jcp.2007.07.022
[10] C. Chen and F. Xiao, “Shallow water model on cubed-sphere by multi-moment finite volume method,” Journal of Computational Physics, vol. 227, no. 10, pp. 5019–5044, 2008. [Online]. Available: https://doi.org/10.1016/j.jcp.2008.01.033
[11], [12] P. A. Ullrich, C. Jablonowski, and B. Van Leer, “High-order finite-volume methods for the shallow-water equations on the sphere,” Journal of Computational Physics, vol. 229, no. 17, pp. 6104–6134, 2010. [Online]. Available: https://doi.org/10.1016/j.jcp.2010.04.044
[13] L. Zhou, W. Xue, and X. Shen, “HOPE: an arbitrary-order non-oscillatory finite-volume shallow water dynamical core with automatic differentiation,” Geoscientific Model Development, vol. 18, no. 21, pp. 8175–8201, 2025, https://doi.org/10.5194/gmd-18-8175-2025. [Online]. Available: https://gmd.copernicus.org/articles/18/8175/2025/
[14] A. Staniforth, T. Melvin, and N. Wood, “GungHo! A new dynamical core for the Unified Model,” in Proceeding of the ECMWF workshop on recent developments in numerical methods for atmosphere and ocean modelling, 2013, pp. 15–29.
[15] T. Ben-Nun, L. Groner, F. Deconinck, T. Wicky, E. Davis, J. Dahm, O. D. Elbert, R. George, J. McGibbon, L. Trümper et al., “Productive performance engineering for weather and climate modeling with Python,” in SC22: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 2022, pp. 1–14. [Online]. Available: https://d... (doi:10.5555/3571885.3571982)
[16], [17] H. Cao, “Artifact of Mat2stencil: A modular matrix-based DSL for explicit and implicit matrix-free PDE solvers on structured grid,” Jul. [Online]. Available: https://doi.org/10.5281/zenodo.8149701
[18] G. Bisbas, R. Nelson, M. Louboutin, F. Luporini, P. H. Kelly, and G. Gorman, “Automated MPI-X code generation for scalable finite-difference solvers,” in 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2025, pp. 689–701. [Online]. Available: https://doi.org/10.1109/IPDPS64566.2025.00067
[19] L. M. de Moura and N. S. Bjørner, “Z3: an efficient SMT solver,” in Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings, ser. Lecture Notes in Comput... (doi:10.1007/978-3-540-78800-3)
[20] S. Tang, J. Zhai, H. Wang, L. Jiang, L. Zheng, Z. Yuan, and C. Zhang, “FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs,” in Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2022, pp. 872–887. [Online]. Available: https://doi.org/10.1145/3519939.3523448
[21] J. Dongarra, M. A. Heroux, and P. Luszczek, “High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems,” The International Journal of High Performance Computing Applications, vol. 30, no. 1, pp. 3–10, 2016. [Online]. Available: https://doi.org/10.1177/1094342015593158
[22] L. Zhou, “High order prediction environment,” Jul. 2025. [Online]. Available: https://doi.org/10.5281/zenodo.16635583
[23] Q. Zhu, H. Luo, C. Yang, M. Ding, W. Yin, and X. Yuan, “Enabling and scaling the HPCG benchmark on the newest generation Sunway supercomputer with 42 million heterogeneous cores,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2021, St. Louis, Missouri, USA, November 14-19, 2021, B. R. de Supinski, M. W. Ha... (doi:10.1145/3458817.3476158)
[24] L. Zhou. (2022) MCV_SW repository. Accessed: April 9, 2026. [Online]. Available: https://gitee.com/DwyaneChou/MCV_SW
[25] B. Satish, A. Shrirang, A. Mark, B. Jed, B. Peter, B. Kris, D. Lisandro, D. Alp, and E. Victor, “Gropp w., et al,” PETSc Users Manual, 2019.
[26] C. Lengauer, S. Apel, M. Bolten, A. Größlinger, F. Hannig, H. Köstler, U. Rüde, J. Teich, A. Grebhahn, S. Kronawitter et al., “ExaStencils: Advanced stencil-code engineering,” in European Conference on Parallel Processing. Springer, 2014, pp. 553–564. [Online]. Available: https://doi.org/10.1007/978-3-319-14313-2_47
[27], [28] T. Gysi, C. Osuna, O. Fuhrer, M. Bianco, and T. C. Schulthess, “STELLA: a domain-specific tool for structured grid methods in weather and climate models,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ser. SC ’15. New York, NY, USA: Association for Computing Machinery. [Online]. Available: https://doi.org/10.1145/2807591.2807627
[29] C. Osuna, T. Wicky, F. Thuering, T. Hoefler, and O. Fuhrer, “Dawn: a high-level domain-specific language compiler toolchain for weather and climate applications,” Supercomputing Frontiers and Innovations, vol. 7, no. 2, pp. 79–97, 2020. [Online]. Available: https://doi.org/10.14529/jsfi200205
[30] Science and Technology Facilities Council (STFC). (2021) PSycloneBench: small benchmarks used to inform the development of the PSyclone Domain-Specific Compiler. Accessed: Apr. 24, 2023. [Online]. Available: https://github.com/stfc/PSycloneBench
[31] S. V. Adams, R. W. Ford, M. Hambley, J. Hobson, I. Kavčič, C. M. Maynard, T. Melvin, E. H. Müller, S. Mullerworth, A. R. Porter et al., “LFRic: Meeting the challenges of scalability and performance portability in weather and climate models,” Journal of Parallel and Distributed Computing, vol. 132, pp. 383–396, 2019. [Online]. Available: https://doi.or... (doi:10.1016/j.jpdc.2019.02.007)
[32] T. Gysi, C. Müller, O. Zinenko, S. Herhut, E. Davis, T. Wicky, O. Fuhrer, T. Hoefler, and T. Grosser, “Domain-specific multi-level IR rewriting for GPU: The Open Earth Compiler for GPU-accelerated climate simulation,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 18, no. 4, pp. 1–23, 2021. [Online]. Available: https://doi.org/10.1145/3469030
[33] C. Lattner, M. Amini, U. Bondhugula, A. Cohen, A. Davis, J. Pienaar, R. Riddle, T. Shpeisman, N. Vasilache, and O. Zinenko, “MLIR: Scaling compiler infrastructure for domain specific computation,” in 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2021, pp. 2–14. [Online]. Available: https://doi.org/10.1109/CGO51591.... (doi:10.1109/cgo51591.2021.9370308)
[34] G. Bisbas, A. Lydike, E. Bauer, N. Brown, M. Fehr, L. Mitchell, G. Rodriguez-Canal, M. Jamieson, P. H. Kelly, M. Steuwer et al., “A shared compilation stack for distributed-memory parallelism in stencil DSLs,” in Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 2024, pp.... (doi:10.1145/3620666.3651344)
[35] T. Rompf and M. Odersky, “Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs,” in Proceedings of the ninth international conference on Generative programming and component engineering, 2010, pp. 127–136. [Online]. Available: https://doi.org/10.1145/1868294.1868314
[36] D. A. Padua and M. J. Wolfe, “Advanced compiler optimizations for supercomputers,” Communications of the ACM, vol. 29, no. 12, pp. 1184–1201, 1986. [Online]. Available: https://doi.org/10.1145/7902.7904
[37] K. Kennedy and J. R. Allen, Optimizing compilers for modern architectures: a dependence-based approach. Morgan Kaufmann Publishers Inc., 2001.
[38] R. Bagnara, E. Ricci, E. Zaffanella, and P. M. Hill, “Possibly not closed convex polyhedra and the Parma Polyhedra Library,” in International Static Analysis Symposium. Springer, 2002, pp. 213–229. [Online]. Available: https://doi.org/10.1007/3-540-45789-5_17
[39] S. Verdoolaege, “isl: An integer set library for the polyhedral model,” in International Congress on Mathematical Software. Springer, 2010, pp. 299–302. [Online]. Available: https://doi.org/10.1007/978-3-642-15582-6_49
[40] U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, “Pluto: A practical and fully automatic polyhedral program optimization system,” in Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), Tucson, AZ (June 2008). Citeseer, vol. 146, 2008.
[41] S. Verdoolaege and G. Janssens, “Scheduling for PPCG,” Report CW, vol. 706, 2017.
[42] M. M. Strout, M. Hall, and C. Olschanowsky, “The sparse polyhedral framework: Composing compiler-generated inspector-executor code,” Proceedings of the IEEE, vol. 106, no. 11, pp. 1921–1934, 2018. [Online]. Available: https://doi.org/10.1109/JPROC.2018.2857721
[43] N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. DeVito, W. S. Moses, S. Verdoolaege, A. Adams, and A. Cohen, “Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions,” arXiv preprint arXiv:1802.04730, 2018. [Online]. Available: https://doi.org/10.48550/arXiv.1802.04730
[44] R. Baghdadi, J. Ray, M. B. Romdhane, E. Del Sozzo, A. Akkas, Y. Zhang, P. Suriana, S. Kamil, and S. Amarasinghe, “Tiramisu: A polyhedral compiler for expressing fast and portable code,” in 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2019, pp. 193–205. [Online]. Available: https://doi.org/10.1109/CGO.2019.8661197
[45], [46] G. Team, R. Anil, S. Borgeaud, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican et al., “Gemini: a family of highly capable multimodal models,” arXiv preprint arXiv:2312.11805. [Online]. Available: https://doi.org/10.48550/arXiv.2312.11805