Program Generation for Linear Algebra Using Multiple Layers of DSLs

arxiv: 1906.08613 · v1 · pith:6RWQPB62new · submitted 2019-06-20 · 💻 cs.MS

Daniele G. Spampinato (1), Diego Fabregat-Traver (2), Markus Püschel (1), Paolo Bientinesi (2) ((1) ETH Zurich, (2) RWTH Aachen University)

Pith reviewed 2026-05-25 19:07 UTC · model grok-4.3

classification 💻 cs.MS

keywords program generation · domain specific languages · linear algebra · BLAS · LAPACK · code generation · numerical libraries

The pith

Domain-specific generator with multiple DSL layers creates tailored linear algebra routines

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper argues that a program generator built from multiple layers of domain-specific languages can produce linear algebra library routines customized to an application's exact sizes, interfaces, and target architecture. Standard libraries such as BLAS and LAPACK provide portable performance but offer limited flexibility for application-specific needs. With layered DSLs, the generator produces routines that fit the application, rather than forcing the application to adapt to the library. The approach matters to a reader if it delivers higher performance and easier development in computational applications. It targets the limitations of fixed libraries by generating optimized code on demand.

Core claim

We advocate a domain-specific program generator capable of producing library routines tailored to the specific needs of the application in terms of sizes, interface, and target architecture.

What carries the argument

A program generator employing multiple layers of domain-specific languages to synthesize linear algebra routines.
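The abstract does not say what the individual layers are, so the following is only an illustrative sketch of the general idea, not the authors' generator. It assumes three hypothetical layers for a fixed-size matrix-vector product: a math-level description (`MatVec`), a scalar-statement layer (`lower_to_scalar`), and a C emitter (`emit_c`). All three names and the row-major layout of `A` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatVec:
    """Layer 1 (math level): y = A x, with sizes fixed at generation time."""
    m: int  # rows of A
    n: int  # columns of A


def lower_to_scalar(op):
    """Layer 2: expand the fixed-size operation into fully unrolled scalar statements."""
    stmts = []
    for i in range(op.m):
        # A is assumed to be stored row-major in a flat array.
        terms = [f"A[{i * op.n + j}] * x[{j}]" for j in range(op.n)]
        stmts.append(f"y[{i}] = " + " + ".join(terms) + ";")
    return stmts


def emit_c(op, name):
    """Layer 3: wrap the statements in a C function whose interface has no size arguments."""
    body = "\n".join("    " + s for s in lower_to_scalar(op))
    return (
        f"void {name}(const double *A, const double *x, double *y) {{\n"
        f"{body}\n"
        "}\n"
    )


print(emit_c(MatVec(2, 3), "matvec_2x3"))
```

Because the sizes are known when the code is generated, the emitted function needs no loops and no dimension parameters, which is one simple way a routine can be tailored in both size and interface. A real generator would also add architecture-specific steps such as vectorization, which this sketch omits.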

Load-bearing premise

Limitations in the flexibility of existing libraries such as BLAS and LAPACK can be effectively overcome by a domain-specific program generator using multiple layers of DSLs.

What would settle it

Demonstration that for common application scenarios the generated code does not achieve the required customization or performance levels compared to standard approaches.

Figures

Figures reproduced from arXiv: 1906.08613 by Daniele G. Spampinato (1), Diego Fabregat-Traver (2), Markus Püschel (1), Paolo Bientinesi (2) ((1) ETH Zurich, (2) RWTH Aachen University).

**Figure 2.** Performance results for: (a) $X_u^T X_u = A$, (b) $L X_s + X_s L^T = S$, and (c) $L X + X U = C$. All matrices are in $\mathbb{R}^{n \times n}$; $A$, $L$, $S$, $U$, and $C$ are inputs, and the $X$ matrices ($X_u$, $X_s$, $X$) are outputs; $L$ is lower triangular, $U$ and $X_u$ are upper triangular, $S$ and $X_s$ are symmetric, and $A$ is symmetric positive definite. In (a) $f \approx n^3/3$ flops, while in (b)–(c) $f \approx 2n^3$ flops. Tests compiled with icc v.16 and run on an Intel Sandy Bridge (AVX, 32 kB L1-D cache, 256 kB …
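The flop counts quoted in the Figure 2 caption for cases (a) and (c) can be checked with a small sketch. The code below uses textbook unblocked algorithms written for this note, not the generated code that the figure benchmarks; the function names and the convention of counting each addition, subtraction, multiplication, division, and square root as one flop are assumptions of the example.

```python
import math


def cholesky_upper(A):
    """Solve X^T X = A for upper-triangular X (A symmetric positive definite).

    Returns (X, flops).
    """
    n = len(A)
    X = [[0.0] * n for _ in range(n)]
    flops = 0
    for j in range(n):          # column by column
        for i in range(j + 1):  # rows on or above the diagonal
            s = A[i][j]
            for k in range(i):
                s -= X[k][i] * X[k][j]
                flops += 2      # one multiply, one subtract
            X[i][j] = math.sqrt(s) if i == j else s / X[i][i]
            flops += 1          # one sqrt or one divide
    return X, flops


def triangular_sylvester(L, U, C):
    """Solve L X + X U = C by substitution (L lower, U upper triangular).

    Assumes L[i][i] + U[j][j] != 0 for all i, j. Returns (X, flops).
    """
    n = len(C)
    X = [[0.0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(n):
            s = C[i][j]
            for k in range(i):  # entries of column j solved in earlier rows
                s -= L[i][k] * X[k][j]
                flops += 2
            for k in range(j):  # entries of row i solved in earlier columns
                s -= X[i][k] * U[k][j]
                flops += 2
            X[i][j] = s / (L[i][i] + U[j][j])
            flops += 2          # one add, one divide
    return X, flops


n = 8
# Diagonal n, off-diagonal 1: a simple symmetric positive definite test matrix.
A = [[float(n) if i == j else 1.0 for j in range(n)] for i in range(n)]
_, f_chol = cholesky_upper(A)
print(f_chol, n**3 / 3)
```

Under this counting convention the first routine performs exactly n(n+1)(2n+1)/6 flops, whose leading term is n^3/3, and the second performs exactly 2n^3 flops, which agrees with the caption's estimates for (a) and (c).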

Original abstract

Numerical software in computational science and engineering often relies on highly-optimized building blocks from libraries such as BLAS and LAPACK, and while such libraries provide portable performance for a wide range of computing architectures, they still present limitations in terms of flexibility. We advocate a domain-specific program generator capable of producing library routines tailored to the specific needs of the application in terms of sizes, interface, and target architecture.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a position paper advocating multi-layer DSLs for custom linear algebra code generation, but the abstract supplies no evidence, details, or results.

The main takeaway is that the paper pushes a domain-specific program generator built from multiple layers of DSLs to produce linear algebra routines tailored to exact sizes, interfaces, and architectures, instead of using fixed libraries like BLAS and LAPACK. It frames this as a way to overcome flexibility limits in existing tools. That is the entire argument presented.

The piece does a clear job naming the practical shortcomings of standard libraries for specialized needs. Extending DSL ideas to multiple layers is a logical next step within an established line of work on code generation for numerical software. The authors come from groups with relevant experience, so the motivation is grounded in real usage patterns.

The central weakness is that nothing beyond advocacy appears. There is no sketch of how the layers would be structured, no example of generated code, no performance numbers, and no comparison to existing generators. The abstract states the recommendation without any derivation or data, so the claim stands unsupported. No equations, experiments, or reproducible artifacts are mentioned. This leaves the reader with an idea but no way to evaluate whether it holds up.

The paper targets researchers in high-performance mathematical software who already follow DSL and code-generation work. Someone looking for new methods, benchmarks, or formal results will find little to use. It could serve as a short discussion prompt but does not contain enough substance for a full technical paper. I would not bring it to a reading group, would not cite it, and would not send it for peer review without major additions of concrete content.

Referee Report

0 major / 1 minor

Summary. The paper claims that libraries such as BLAS and LAPACK provide portable performance across architectures but suffer from limitations in flexibility with respect to problem sizes, interfaces, and target architectures. It advocates the development of a domain-specific program generator that employs multiple layers of DSLs to automatically produce customized linear algebra library routines tailored to specific application needs.

Significance. If successfully realized, the advocated multi-layer DSL generator could meaningfully improve the adaptability and performance of numerical software in computational science and engineering by enabling architecture- and application-specific code generation beyond what static libraries currently allow. The position aligns with established trends in program generation for high-performance computing.

minor comments (1)

The manuscript consists solely of a high-level advocacy statement with no concrete examples, pseudocode, or discussion of specific DSL layers, making it difficult to assess the practicality of the proposed approach.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our position paper and for the recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

The paper is an advocacy/position document whose central claim is a recommendation to use multi-layer DSL program generators for producing tailored linear algebra routines. No equations, derivations, fitted parameters, or formal uniqueness theorems appear in the text. The argument remains at the conceptual level of motivation and does not reduce any prediction or result to its own inputs by construction, self-citation chains, or ansatz smuggling. The work is therefore self-contained against external benchmarks with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no free parameters, axioms, or invented entities are specified in the available text.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages

[3] J. J. Dongarra et al. A set of level 3 basic linear algebra subprograms. ACM Trans. on Mathematical Software (TOMS), 16(1):1-17, 1990.

[4] E. Anderson et al. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, third edition, 1999.

[5] P. Bientinesi, J. A. Gunnels, M. E. Myers, E. S. Quintana-Ortí, and R. A. van de Geijn. The science of deriving dense linear algebra algorithms. ACM Trans. on Mathematical Software (TOMS), 31(1):1-26, 2005.

[6] D. Fabregat-Traver and P. Bientinesi. Automatic Generation of Loop-Invariants for Matrix Operations. In Computational Science and Its Applications (ICCSA), pp. 82-92, 2011.

[7] D. Fabregat-Traver and P. Bientinesi. Knowledge-Based Automatic Generation of Partitioned Matrix Expressions. In Computer Algebra in Scientific Computing (CASC), vol. 6885 of Lecture Notes in Computer Science (LNCS), pp. 144-157. Springer, 2011.

[8] M. Püschel, F. Franchetti, and Y. Voronenko. Encyclopedia of Parallel Computing, chap. Spiral. Springer, 2011.

[9] D. G. Spampinato and M. Püschel. A basic linear algebra compiler. In Code Generation and Optimization (CGO), pp. 23-32, 2014.

[10] D. G. Spampinato and M. Püschel. A basic linear algebra compiler for structured matrices. In Code Generation and Optimization (CGO), pp. 117-127, 2016.

[11] C. Bastoul. Code generation in the polyhedral model is easier than you think. In Parallel Architectures and Compilation Techniques (PACT), pp. 7-16, 2004.
