Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs

Avinash Mahesh Nirmala; Dominik Walter; Frank Hannig; J\"urgen Teich

arxiv: 2604.07287 · v1 · submitted 2026-04-08 · 💻 cs.AR

Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs

Avinash Mahesh Nirmala , Dominik Walter , Frank Hannig , J\"urgen Teich This is my paper

Pith reviewed 2026-05-10 17:50 UTC · model grok-4.3

classification 💻 cs.AR

keywords energy estimationsymbolic analysispolyhedral modelnested loopsprocessor arraysmapping and schedulingdesign space explorationaccelerator architectures

0 comments

The pith

A symbolic polyhedral method estimates energy consumption for nested loop programs on processor arrays independently of problem size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a symbolic method to estimate energy use when nested loop programs are mapped and scheduled onto arrays of parallel processors. It derives expressions based on the polyhedral structure of the loops that capture how mapping and scheduling choices affect total energy. This replaces simulation, which grows impractical as the number of iterations increases. A reader would care because the estimates stay valid for any problem size, letting designers quickly test many mapping options and array configurations. The authors show the method matches simulation results on smaller cases across selected benchmarks.

Core claim

What carries the argument

Symbolic polyhedral-based energy analysis that derives closed-form expressions for energy from loop mapping and scheduling decisions.

If this is right

The methodology supports design space exploration of mapping and scheduling decisions without repeated simulations.
It enables direct study of how processor array size variations change energy consumption.
It permits consistent comparisons between alternative loop nest accelerator architectures.
The estimates remain usable for arbitrarily large iteration spaces in the nested loops.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The symbolic expressions could be embedded in design tools to automatically rank mapping choices by predicted energy.
The same polyhedral setup might be reused to derive symbolic models for other quantities such as latency or memory traffic.
Direct validation against fabricated silicon would test whether the cost model holds beyond simulation.

Load-bearing premise

The polyhedral representation together with the chosen energy cost model accounts for all dominant energy-consuming behaviors in the processor arrays.

What would settle it

Detailed cycle-accurate simulation or direct hardware power measurement on large iteration spaces that produces energy values differing substantially from the symbolic predictions.

Figures

Figures reproduced from arXiv: 2604.07287 by Avinash Mahesh Nirmala, Dominik Walter, Frank Hannig, J\"urgen Teich.

**Figure 2.** Figure 2: Visualization of the tiled iteration space of the GESUMMV benchmark [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: RDG of a computational statement Sq as given in Eq. (5) 2) Energy Memory-Related Statements: For a statement S ∗ q ∈ M, we estimate the energy by statement similarly: E M q = E(L(xq,r)) + E(L(x ∗ q,r)) (10) Explanation: The first term calculates the energy needed to read the right-hand-side variable xq,r of statement S ∗ q , and the second term denotes the energy needed to copy its value to location L(x ∗ … view at source ↗

**Figure 4.** Figure 4: Comparison of symbolic and simulation-based analysis times for the [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Energy Etot and latency L vs. matrix size for GEMM on an 8×8 PE grid TCPA, showing a shift from DRAM-dominated energy to increasing on-chip communication (FD/RD) at larger scales due to growing tile sizes. rapidly [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

This work presents a symbolic approach for estimating the energy consumption for nested loop programs when mapped and scheduled on parallel processor array accelerator architectures. Instead of simulation-based evaluation, we derive a methodology for symbolic energy analysis that captures the impact of mapping and scheduling decisions of loop nests on processor arrays. We compare our approach against simulation-based results for selected benchmarks and varying sizes of the iteration spaces. Whereas the latter are not scalable, our symbolic analysis is shown to be independent of the problem size. The presented evaluation methodology can be beneficially used during the design space exploration of mapping and scheduling decisions, for studying the influence of array size variations, and for comparisons with other loop nest accelerator architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a polyhedral symbolic method for energy estimates on nested loops mapped to processor arrays that stays independent of iteration-space size.

read the letter

The core contribution is a way to derive size-independent symbolic expressions for energy use when loop nests are mapped and scheduled onto parallel processor arrays. Instead of running simulations that scale poorly with problem size, the approach folds mapping and scheduling decisions into the polyhedral representation and produces closed-form energy estimates. They validate this on selected benchmarks by showing the symbolic results track simulation outputs across different iteration-space sizes, which is the practical payoff for design-space exploration. That independence is the part worth noting; it directly addresses a common bottleneck when you want to compare array sizes or mapping variants without re-running full traces each time. The method sits squarely in the polyhedral compilation and accelerator-design literature, extending existing loop analysis tools with an energy cost model rather than inventing new abstractions from scratch. The benchmark comparisons provide concrete evidence that the symbolic results remain usable as sizes change, which is the kind of check that matters for adoption in early-stage hardware exploration. One soft spot is that the energy model itself is only as good as the underlying cost assumptions for the target arrays; if interconnect or memory effects are approximated coarsely, the estimates could drift on real silicon even if they match the simulator. The paper appears to treat the model as derived rather than fitted, which avoids circularity but still leaves the accuracy question to the specific hardware details. This work is aimed at researchers doing polyhedral-based mapping for loop accelerators or energy-aware design flows. A reader already familiar with polyhedral models and processor-array architectures will see the incremental value quickly and can judge whether the cost model fits their platform. It is worth sending to peer review because the central claim is testable, the validation is present, and the limitation to this program class is stated clearly rather than overstated.

Referee Report

1 major / 2 minor

Summary. The manuscript presents a symbolic polyhedral-based methodology for estimating the energy consumption of nested loop programs when mapped and scheduled on parallel processor array accelerators. It derives symbolic expressions that capture the impact of mapping and scheduling decisions, demonstrates that these expressions are independent of iteration-space size, and validates the approach by comparing the symbolic estimates against simulation results on selected benchmarks for varying problem sizes. The work positions the method as a scalable alternative to simulation for design-space exploration of accelerator mappings.

Significance. If the central claims hold, the work is significant for energy-aware design of loop-nest accelerators: it supplies a size-independent, symbolic alternative to simulation that directly incorporates mapping and scheduling effects via polyhedral representations. This enables efficient exploration of array-size variations and architecture comparisons without the scalability limits of simulation. The reported agreement with simulation on varying sizes constitutes positive evidence for practical utility and reproducibility.

major comments (1)

Evaluation section: The central claim of agreement with simulation and size independence is supported only by the reported comparisons. The manuscript must include quantitative error metrics (e.g., mean relative error or R² values) for the symbolic versus simulation results across all tested problem sizes and benchmarks; qualitative statements alone leave the accuracy claim under-supported.

minor comments (2)

Abstract: The claim that the symbolic analysis is independent of problem size is central but would be clearer if the abstract briefly indicated the key simplification steps that remove size-dependent terms.
Introduction or related-work section: The motivation for choosing a polyhedral energy model over other symbolic or analytical approaches should be strengthened with explicit references to prior energy-modeling work in the polyhedral compiler literature.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment of our work and the recommendation for minor revision. The suggestion to strengthen the evaluation with quantitative metrics is constructive, and we will incorporate it to better support our claims.

read point-by-point responses

Referee: Evaluation section: The central claim of agreement with simulation and size independence is supported only by the reported comparisons. The manuscript must include quantitative error metrics (e.g., mean relative error or R² values) for the symbolic versus simulation results across all tested problem sizes and benchmarks; qualitative statements alone leave the accuracy claim under-supported.

Authors: We agree that quantitative error metrics would provide stronger, more rigorous support for the accuracy and size-independence claims. In the revised manuscript, we will add mean relative error and R² values (or equivalent) computed across all benchmarks and problem sizes in the evaluation section, alongside the existing comparisons. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives a symbolic polyhedral methodology for energy estimation of mapped/scheduled loop nests on processor arrays, claiming size-independent expressions validated against simulation on benchmarks. No equations, parameter fits, self-citations, or ansatzes are described that would reduce the central claim (symbolic independence from iteration-space size) to a tautology or input by construction. The approach is presented as derived from polyhedral representations rather than fitted or renamed from prior results, rendering the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the polyhedral model and energy cost functions are referenced only at the level of methodology.

pith-pipeline@v0.9.0 · 5414 in / 1152 out tokens · 37988 ms · 2026-05-10T17:50:41.567287+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

[1]

A highly parameterizable parallel processor array architecture

D. Kissler, F. Hannig, A. Kupriyanov, and J. Teich. “A highly parameterizable parallel processor array architecture”. In:IEEE Int. Conf. on Field Programmable Technology (FPT), Bangkok, Thailand. IEEE, 2006, pp. 105–112.DOI: 10.1109/FPT.2006.270293

work page doi:10.1109/fpt.2006.270293 2006
[2]

Inva- sive Tightly-Coupled Processor Arrays: A Domain-Specific Archi- tecture/Compiler Co-Design Approach

F. Hannig, V . Lari, S. Boppu, A. Tanase, and O. Reiche. “Inva- sive Tightly-Coupled Processor Arrays: A Domain-Specific Archi- tecture/Compiler Co-Design Approach”. In:ACM Trans. Embedded Comput. Syst.13.4s (2014), 133:1–133:29.DOI: 10.1145/2584660

work page doi:10.1145/2584660 2014
[3]

Earfda: A lightweight and energy-efficient fall detection accelerator for ear-worn devices

D. Walter, M. Brand, C. Heidorn, M. Witterauf, F. Hannig, and J. Teich. “ALPACA: An Accelerator Chip for Nested Loop Programs”. In:Int. Symposium on Circuits and Systems (ISCAS). IEEE, 2024, pp. 1–5.DOI: 10.1109/ISCAS58744.2024.10558549

work page doi:10.1109/iscas58744.2024.10558549 2024
[4]

Morrigan: A Composite Instruction TLB Prefetcher,

V . Kandiah, S. Peverelle, M. Khairy, J. Pan, A. Manjunath, T. G. Rogers, T. M. Aamodt, and N. Hardavellas. “AccelWattch: A Power Modeling Framework for Modern GPUs”. In:Int. Symposium on Microarchitecture (MICRO). ACM, 2021, pp. 738–753.DOI: 10.1145/ 3466752.3480063

work page arXiv 2021
[5]

CGRA-EAM - Rapid Energy and Area Estimation for Coarse-grained Reconfigurable Ar- chitectures

M. Wijtvliet, H. Corporaal, and A. Kumar. “CGRA-EAM - Rapid Energy and Area Estimation for Coarse-grained Reconfigurable Ar- chitectures”. In:ACM Trans. Reconfigurable Technol. Syst.14.4 (2021), 19:1–19:28.DOI: 10.1145/3468874

work page doi:10.1145/3468874 2021
[6]

In: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Y . N. Wu, J. S. Emer, and V . Sze. “Accelergy: An Architecture- Level Energy Estimation Methodology for Accelerator Designs”. In: Proceedings of the Int. Conf. on Computer-Aided Design, ICCAD 2019, Westminster, CO, USA, November 4-7, 2019. Ed. by D. Z. Pan. ACM, 2019, pp. 1–8.DOI: 10.1109/ICCAD45719.2019.8942149

work page doi:10.1109/iccad45719.2019.8942149 2019
[7]

Eyeriss: An Energy- Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Y . Chen, T. Krishna, J. S. Emer, and V . Sze. “Eyeriss: An Energy- Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks”. In:IEEE J. Solid State Circuits52.1 (2017), pp. 127–138. DOI: 10.1109/JSSC.2016.2616357

work page doi:10.1109/jssc.2016.2616357 2017
[8]

Parashar et al

A. Parashar et al. “Timeloop: A Systematic Approach to DNN Accelerator Evaluation”. In:IEEE Int. Symposium on Performance Analysis of Systems and Software, ISPASS, Madison, USA. IEEE, 2019, pp. 304–315.DOI: 10.1109/ISPASS.2019.00042

work page doi:10.1109/ispass.2019.00042 2019
[9]

Analytical computation of Ehrhart polynomials: en- abling more compiler analyses and optimizations

S. Verdoolaege, R. Seghir, K. Beyls, V . Loechner, and M. Bruynooghe. “Analytical computation of Ehrhart polynomials: en- abling more compiler analyses and optimizations”. In:Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES). ACM, 2004, pp. 248–258.ISBN: 1581138903.DOI: 10 . 1145/1023833.1023868

work page arXiv 2004
[10]

A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed

A. I. Barvinok. “A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed”. In:Mathematics of Operations Research19.4 (1994), pp. 769–779

work page 1994
[11]

Analytical computation of Ehrhart polynomials and its application in compile- time generated cache hints

R. Seghir, S. Verdoolaege, K. Beyls, and V . Loechner. “Analytical computation of Ehrhart polynomials and its application in compile- time generated cache hints”. In:Int. Conf. on Compilers, Architecture and Synthesis for Embedded Systems (CASES). 2004

work page 2004
[12]

Coarse grained reconfig- urable architectures in the past 25 years: Overview and classification

M. Wijtvliet, L. Waeijen, and H. Corporaal. “Coarse grained reconfig- urable architectures in the past 25 years: Overview and classification”. In:SAMOS. 2016, pp. 235–244.DOI: 10 . 1109 / SAMOS . 2016 . 7818353

work page 2016
[13]

A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications

L. Liu, J. Zhu, Z. Li, Y . Lu, Y . Deng, J. Han, S. Yin, and S. Wei. “A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications”. In:Comput. Surv.52.6 (2019), 118:1–118:39.DOI: 10.1145/3357375

work page doi:10.1145/3357375 2019
[14]

Modeling and Mapping of Regular Nested Loops on Processor Arrays: CGRAs vs. TCPAs

D. Walter, M. Halm, D. Seidel, I. Ghosh, C. Heidorn, F. Hannig, and J. Teich. “Modeling and Mapping of Regular Nested Loops on Processor Arrays: CGRAs vs. TCPAs”. In:29. Workshop zu Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen(W ¨urzburg). 2026

work page 2026
[15]

Symbolic Mapping of Loop Programs onto Processor Arrays

J. Teich, A. Tanase, and F. Hannig. “Symbolic Mapping of Loop Programs onto Processor Arrays”. In:J. Signal Process. Syst.77.1-2 (2014), pp. 31–59.DOI: 10.1007/S11265-014-0905-0

work page doi:10.1007/s11265-014-0905-0 2014
[16]

Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays

D. Walter, M. Witterauf, and J. Teich. “Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays”. In:18th ACM/IEEE Int. Conf. on Formal Methods and Models for System Design, MEMOCODE 2020, Jaipur, India, December 2-4, 2020. IEEE, 2020, pp. 1–11.DOI: 10 . 1109 / MEMOCODE51338 . 2020 . 9315179

work page 2020
[17]

Symbolic Loop Compilation for Tightly Coupled Processor Arrays

M. Witterauf, D. Walter, F. Hannig, and J. Teich. “Symbolic Loop Compilation for Tightly Coupled Processor Arrays”. In:ACM Trans. Embedded Comput. Syst. (TECS)20.5 (2021), pp. 1–31

work page 2021
[18]

Partitioning of processor arrays: a piecewise regular approach

J. Teich and L. Thiele. “Partitioning of processor arrays: a piecewise regular approach”. In:Integr.14.3 (1993), pp. 297–332.DOI: 10.1016/ 0167-9260(93)90013-3

work page 1993
[19]

A compiler for application specific processor arrays

J. Teich. “A compiler for application specific processor arrays”. PhD thesis. Saarland University, Germany, 1993.ISBN: 978-3-86111-701- 8

work page 1993
[20]

Pouchet.PolyBench: The Polyhedral Benchmark Suite

L.-N. Pouchet.PolyBench: The Polyhedral Benchmark Suite. Ac- cessed: Oct. 10, 2025

work page 2025
[21]

Partitioning Processor Arrays under Resource Constraints

J. Teich, L. Thiele, and L. Z. Zhang. “Partitioning Processor Arrays under Resource Constraints”. In:J. VLSI Signal Process.17.1 (1997), pp. 5–20.DOI: 10.1023/A:1007935215591

work page doi:10.1023/a:1007935215591 1997
[22]

Modulo scheduling of symbolically tiled loops for Tightly Coupled Processor Arrays

M. Witterauf, A. Tanase, F. Hannig, and J. Teich. “Modulo scheduling of symbolically tiled loops for Tightly Coupled Processor Arrays”. In:Int. Conf. on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 2016, pp. 58–66.DOI: 10 . 1109 / ASAP. 2016.7760773

work page arXiv 2016
[23]

Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era

A. Pedram, S. Richardson, M. Horowitz, S. Galal, and S. Kvatinsky. “Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era”. In:IEEE Des. Test34.2 (2017), pp. 39–50.DOI: 10.1109/MDAT.2016.2573586

work page doi:10.1109/mdat.2016.2573586 2017
[24]

A. I. Barvinok.A course in convexity. V ol. 54. Graduate studies in mathematics. American Mathematical Society, 2002.ISBN: 978-0- 8218-2968-4

work page 2002
[25]

In: Proc

S. Verdoolaege. “isl: An Integer Set Library for the Polyhedral Model”. In:Third Int. Congress on Mathematical Software (ICMS). Ed. by K. Fukuda, J. van der Hoeven, M. Joswig, and N. Takayama. V ol. 6327. Springer, 2010, pp. 299–302.DOI: 10.1007/978- 3- 642- 15582-6 49

work page doi:10.1007/978- 2010

[1] [1]

A highly parameterizable parallel processor array architecture

D. Kissler, F. Hannig, A. Kupriyanov, and J. Teich. “A highly parameterizable parallel processor array architecture”. In:IEEE Int. Conf. on Field Programmable Technology (FPT), Bangkok, Thailand. IEEE, 2006, pp. 105–112.DOI: 10.1109/FPT.2006.270293

work page doi:10.1109/fpt.2006.270293 2006

[2] [2]

Inva- sive Tightly-Coupled Processor Arrays: A Domain-Specific Archi- tecture/Compiler Co-Design Approach

F. Hannig, V . Lari, S. Boppu, A. Tanase, and O. Reiche. “Inva- sive Tightly-Coupled Processor Arrays: A Domain-Specific Archi- tecture/Compiler Co-Design Approach”. In:ACM Trans. Embedded Comput. Syst.13.4s (2014), 133:1–133:29.DOI: 10.1145/2584660

work page doi:10.1145/2584660 2014

[3] [3]

Earfda: A lightweight and energy-efficient fall detection accelerator for ear-worn devices

D. Walter, M. Brand, C. Heidorn, M. Witterauf, F. Hannig, and J. Teich. “ALPACA: An Accelerator Chip for Nested Loop Programs”. In:Int. Symposium on Circuits and Systems (ISCAS). IEEE, 2024, pp. 1–5.DOI: 10.1109/ISCAS58744.2024.10558549

work page doi:10.1109/iscas58744.2024.10558549 2024

[4] [4]

Morrigan: A Composite Instruction TLB Prefetcher,

V . Kandiah, S. Peverelle, M. Khairy, J. Pan, A. Manjunath, T. G. Rogers, T. M. Aamodt, and N. Hardavellas. “AccelWattch: A Power Modeling Framework for Modern GPUs”. In:Int. Symposium on Microarchitecture (MICRO). ACM, 2021, pp. 738–753.DOI: 10.1145/ 3466752.3480063

work page arXiv 2021

[5] [5]

CGRA-EAM - Rapid Energy and Area Estimation for Coarse-grained Reconfigurable Ar- chitectures

M. Wijtvliet, H. Corporaal, and A. Kumar. “CGRA-EAM - Rapid Energy and Area Estimation for Coarse-grained Reconfigurable Ar- chitectures”. In:ACM Trans. Reconfigurable Technol. Syst.14.4 (2021), 19:1–19:28.DOI: 10.1145/3468874

work page doi:10.1145/3468874 2021

[6] [6]

In: 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Y . N. Wu, J. S. Emer, and V . Sze. “Accelergy: An Architecture- Level Energy Estimation Methodology for Accelerator Designs”. In: Proceedings of the Int. Conf. on Computer-Aided Design, ICCAD 2019, Westminster, CO, USA, November 4-7, 2019. Ed. by D. Z. Pan. ACM, 2019, pp. 1–8.DOI: 10.1109/ICCAD45719.2019.8942149

work page doi:10.1109/iccad45719.2019.8942149 2019

[7] [7]

Eyeriss: An Energy- Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

Y . Chen, T. Krishna, J. S. Emer, and V . Sze. “Eyeriss: An Energy- Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks”. In:IEEE J. Solid State Circuits52.1 (2017), pp. 127–138. DOI: 10.1109/JSSC.2016.2616357

work page doi:10.1109/jssc.2016.2616357 2017

[8] [8]

Parashar et al

A. Parashar et al. “Timeloop: A Systematic Approach to DNN Accelerator Evaluation”. In:IEEE Int. Symposium on Performance Analysis of Systems and Software, ISPASS, Madison, USA. IEEE, 2019, pp. 304–315.DOI: 10.1109/ISPASS.2019.00042

work page doi:10.1109/ispass.2019.00042 2019

[9] [9]

Analytical computation of Ehrhart polynomials: en- abling more compiler analyses and optimizations

S. Verdoolaege, R. Seghir, K. Beyls, V . Loechner, and M. Bruynooghe. “Analytical computation of Ehrhart polynomials: en- abling more compiler analyses and optimizations”. In:Int. Conf. on Compilers, Architecture, and Synthesis for Embedded Systems (CASES). ACM, 2004, pp. 248–258.ISBN: 1581138903.DOI: 10 . 1145/1023833.1023868

work page arXiv 2004

[10] [10]

A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed

A. I. Barvinok. “A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed”. In:Mathematics of Operations Research19.4 (1994), pp. 769–779

work page 1994

[11] [11]

Analytical computation of Ehrhart polynomials and its application in compile- time generated cache hints

R. Seghir, S. Verdoolaege, K. Beyls, and V . Loechner. “Analytical computation of Ehrhart polynomials and its application in compile- time generated cache hints”. In:Int. Conf. on Compilers, Architecture and Synthesis for Embedded Systems (CASES). 2004

work page 2004

[12] [12]

Coarse grained reconfig- urable architectures in the past 25 years: Overview and classification

M. Wijtvliet, L. Waeijen, and H. Corporaal. “Coarse grained reconfig- urable architectures in the past 25 years: Overview and classification”. In:SAMOS. 2016, pp. 235–244.DOI: 10 . 1109 / SAMOS . 2016 . 7818353

work page 2016

[13] [13]

A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications

L. Liu, J. Zhu, Z. Li, Y . Lu, Y . Deng, J. Han, S. Yin, and S. Wei. “A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications”. In:Comput. Surv.52.6 (2019), 118:1–118:39.DOI: 10.1145/3357375

work page doi:10.1145/3357375 2019

[14] [14]

Modeling and Mapping of Regular Nested Loops on Processor Arrays: CGRAs vs. TCPAs

D. Walter, M. Halm, D. Seidel, I. Ghosh, C. Heidorn, F. Hannig, and J. Teich. “Modeling and Mapping of Regular Nested Loops on Processor Arrays: CGRAs vs. TCPAs”. In:29. Workshop zu Methoden und Beschreibungssprachen zur Modellierung und Verifikation von Schaltungen und Systemen(W ¨urzburg). 2026

work page 2026

[15] [15]

Symbolic Mapping of Loop Programs onto Processor Arrays

J. Teich, A. Tanase, and F. Hannig. “Symbolic Mapping of Loop Programs onto Processor Arrays”. In:J. Signal Process. Syst.77.1-2 (2014), pp. 31–59.DOI: 10.1007/S11265-014-0905-0

work page doi:10.1007/s11265-014-0905-0 2014

[16] [16]

Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays

D. Walter, M. Witterauf, and J. Teich. “Real-time Scheduling of I/O Transfers for Massively Parallel Processor Arrays”. In:18th ACM/IEEE Int. Conf. on Formal Methods and Models for System Design, MEMOCODE 2020, Jaipur, India, December 2-4, 2020. IEEE, 2020, pp. 1–11.DOI: 10 . 1109 / MEMOCODE51338 . 2020 . 9315179

work page 2020

[17] [17]

Symbolic Loop Compilation for Tightly Coupled Processor Arrays

M. Witterauf, D. Walter, F. Hannig, and J. Teich. “Symbolic Loop Compilation for Tightly Coupled Processor Arrays”. In:ACM Trans. Embedded Comput. Syst. (TECS)20.5 (2021), pp. 1–31

work page 2021

[18] [18]

Partitioning of processor arrays: a piecewise regular approach

J. Teich and L. Thiele. “Partitioning of processor arrays: a piecewise regular approach”. In:Integr.14.3 (1993), pp. 297–332.DOI: 10.1016/ 0167-9260(93)90013-3

work page 1993

[19] [19]

A compiler for application specific processor arrays

J. Teich. “A compiler for application specific processor arrays”. PhD thesis. Saarland University, Germany, 1993.ISBN: 978-3-86111-701- 8

work page 1993

[20] [20]

Pouchet.PolyBench: The Polyhedral Benchmark Suite

L.-N. Pouchet.PolyBench: The Polyhedral Benchmark Suite. Ac- cessed: Oct. 10, 2025

work page 2025

[21] [21]

Partitioning Processor Arrays under Resource Constraints

J. Teich, L. Thiele, and L. Z. Zhang. “Partitioning Processor Arrays under Resource Constraints”. In:J. VLSI Signal Process.17.1 (1997), pp. 5–20.DOI: 10.1023/A:1007935215591

work page doi:10.1023/a:1007935215591 1997

[22] [22]

Modulo scheduling of symbolically tiled loops for Tightly Coupled Processor Arrays

M. Witterauf, A. Tanase, F. Hannig, and J. Teich. “Modulo scheduling of symbolically tiled loops for Tightly Coupled Processor Arrays”. In:Int. Conf. on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 2016, pp. 58–66.DOI: 10 . 1109 / ASAP. 2016.7760773

work page arXiv 2016

[23] [23]

Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era

A. Pedram, S. Richardson, M. Horowitz, S. Galal, and S. Kvatinsky. “Dark Memory and Accelerator-Rich System Optimization in the Dark Silicon Era”. In:IEEE Des. Test34.2 (2017), pp. 39–50.DOI: 10.1109/MDAT.2016.2573586

work page doi:10.1109/mdat.2016.2573586 2017

[24] [24]

A. I. Barvinok.A course in convexity. V ol. 54. Graduate studies in mathematics. American Mathematical Society, 2002.ISBN: 978-0- 8218-2968-4

work page 2002

[25] [25]

In: Proc

S. Verdoolaege. “isl: An Integer Set Library for the Polyhedral Model”. In:Third Int. Congress on Mathematical Software (ICMS). Ed. by K. Fukuda, J. van der Hoeven, M. Joswig, and N. Takayama. V ol. 6327. Springer, 2010, pp. 299–302.DOI: 10.1007/978- 3- 642- 15582-6 49

work page doi:10.1007/978- 2010