Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Amir Yazdanbakhsh; Byung Hoon Ahn; Hadi Esmaeilzadeh; Prannoy Pilligundla

arxiv: 2001.08743 · v1 · pith:T7SRB7XUnew · submitted 2020-01-23 · 💻 cs.LG · stat.ML

Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

keywords: time, chameleon, compilation, hardware, networks, neural, optimization, solution

Achieving faster execution with shorter compilation time can foster further diversity and innovation in neural networks. However, the current paradigm of executing neural networks relies on hand-optimized libraries, traditional compilation heuristics, or, very recently, genetic algorithms and other stochastic methods. These methods suffer from frequent, costly hardware measurements, rendering them not only too time-consuming but also suboptimal. As such, we devise a solution that can learn to quickly adapt to a previously unseen design space for code optimization, both accelerating the search and improving the output performance. This solution, dubbed Chameleon, leverages reinforcement learning, whose solution takes fewer steps to converge, and develops an adaptive sampling algorithm that not only focuses the costly samples (real hardware measurements) on representative points but also uses domain-knowledge-inspired logic to improve the samples themselves. Experimentation with real hardware shows that Chameleon provides a 4.45x speedup in optimization time over AutoTVM, while also improving the inference time of modern deep networks by 5.6%.
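
The idea of spending costly hardware measurements only on representative points can be illustrated with a small sketch. The abstract does not say how representatives are chosen, so the k-means clustering used here is an assumption made for illustration and not a description of Chameleon's actual algorithm. The `Config` encoding, the function names, and the toy candidates are likewise hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: one candidate configuration as a vector of knob values.
Config = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two configurations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def representative_samples(candidates: List[Config], k: int,
                           iters: int = 20, seed: int = 0) -> List[Config]:
    """Pick at most k candidates, one near each k-means centroid.

    Assumption: clustering stands in for "representative points";
    the source abstract does not name a specific method.
    """
    if k <= 0 or not candidates:
        return []
    unique = sorted(set(candidates))
    k = min(k, len(unique))
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(unique, k)]
    for _ in range(iters):
        clusters: List[List[Config]] = [[] for _ in range(k)]
        for c in candidates:
            nearest = min(range(k), key=lambda i: sq_dist(c, centroids[i]))
            clusters[nearest].append(c)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    # Only real candidates can be measured, so snap each centroid
    # to the closest actual candidate and drop duplicates.
    chosen: List[Config] = []
    for centroid in centroids:
        best = min(candidates, key=lambda c: sq_dist(c, centroid))
        if best not in chosen:
            chosen.append(best)
    return chosen


# Toy design space with two well-separated groups of configurations.
candidates = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
              (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
to_measure = representative_samples(candidates, k=2)
```

In this toy setting only two of the six candidates would be sent to hardware, one from each group. A real tuner would then feed those measurements back into its search, which this sketch leaves out.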

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
cs.LG · 2026-04 · conditional novelty 6.0

TCL delivers 16.8x faster tuning on CPU and 12.48x on GPU with modestly lower inference latency by combining RDU active sampling, a lightweight Mamba cost model, and cross-platform continual knowledge distillation.