TCL delivers 16.8x faster tuning on CPU and 12.48x on GPU with modestly lower inference latency by combining RDU active sampling, a lightweight Mamba cost model, and cross-platform continual knowledge distillation.
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Representative citing papers
AILFM uses active imitation learning to learn thermal- and kernel-aware scheduling policies for LFM inference on 3D S-NUCA many-cores, outperforming baselines while maintaining thermal safety.
Citing papers
- TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
  TCL delivers 16.8x faster tuning on CPU and 12.48x on GPU with modestly lower inference latency by combining RDU active sampling, a lightweight Mamba cost model, and cross-platform continual knowledge distillation.
- Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores
  AILFM uses active imitation learning to learn thermal- and kernel-aware scheduling policies for LFM inference on 3D S-NUCA many-cores, outperforming baselines while maintaining thermal safety.