Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
4 Pith papers cite this work.
Fields: cs.AR (4)
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Citing papers
- The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design
  TCM finds provably optimal DNN accelerator mappings by pruning the search space by up to 32 orders of magnitude with a new data-placement concept, delivering 1.2x to 6.5x better energy-delay product in 17 seconds instead of hours.
- Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design
  FFM finds optimal fused mappings for tensor accelerators over 10,000 times faster than prior mappers while cutting energy-delay product by up to 1.8x versus hand-tuned designs.
- SEADA: An efficient methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures
  SEADA introduces an analytical framework combining cost models, mapping tools, and entropy-based precision selection to optimize mixed-precision DNNs on multi-precision spatial architectures.
- Defeat the Heap: Zero-Copy Data Movement in AXI4MLIR
  Extending the accel dialect in AXI4MLIR with direct DMA-mapped allocation eliminates a staging copy and reduces main-memory data movement by up to 2x on matrix multiplication accelerators.
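The TCM and FFM summaries report gains in energy-delay product (EDP), which is the energy a workload consumes multiplied by its execution time, so lower is better and an "Nx better EDP" claim is a ratio of two such products. A minimal sketch of that arithmetic follows; the function names and the energy/latency figures are hypothetical and are not taken from either paper.

```python
def energy_delay_product(energy: float, delay: float) -> float:
    """EDP = energy x delay. Units follow the inputs (here mJ x ms)."""
    return energy * delay


def edp_improvement(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """How many times lower the candidate's EDP is than the baseline's.

    Each argument is a hypothetical (energy, delay) pair for one mapping.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical figures for illustration only (not from TCM or FFM):
baseline_mapping = (3.0, 4.0)   # 3 mJ, 4 ms  -> EDP = 12 mJ*ms
optimized_mapping = (2.0, 3.0)  # 2 mJ, 3 ms  -> EDP = 6 mJ*ms

ratio = edp_improvement(baseline_mapping, optimized_mapping)
print(f"{ratio:.1f}x better EDP")  # prints "2.0x better EDP"
```

Because EDP multiplies the two terms, a mapping that is only modestly better in both energy and latency can show a larger combined improvement, which is why mappers often optimize EDP rather than either metric alone.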