Performance Benchmarking of Tensor Trains for accelerated Quantum-Inspired Homogenization on TPU, GPU and CPU architectures
Recent advances in high-resolution CT-imaging technology are creating a new class of ultra-high-resolution microstructural datasets that challenge the limits of traditional homogenization approaches. While state-of-the-art FFT-based homogenization techniques remain effective for moderately sized datasets, their memory footprint and computational cost grow rapidly with increasing resolution, making them progressively inefficient for industrial-scale problems. To address these challenges, the recently developed Superfast-Fourier Transform (SFFT)-based homogenization algorithm leverages the memory-efficient low-rank representations of Tensor Trains (TTs), which reduce the storage and computational requirements of large-scale homogenization problems. Developed for CPU usage, SFFT-based homogenization efficiently handles high-resolution datasets, provided the underlying data is well-behaved. In this work, we investigate the performance of fundamental TT operations on modern hardware accelerators using the JAX framework. A benchmarking study across CPUs, GPUs, and TPUs evaluates execution times and computational efficiency, highlighting the strengths and limitations of TT operations on different architectures and motivating future hybrid approaches. Building on these insights, we adapt the SFFT-based homogenization algorithm for accelerator execution, enabling homogenization at resolutions ranging from 300 million to 70 billion grid points, which are infeasible for the best available GPU-based FFT reference implementation. While the observed scaling behavior is geometry-dependent, the results demonstrate the potential of accelerator-based quantum-inspired homogenization for high-performance multiscale simulations.
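The storage argument behind the TT representation can be illustrated with a short count of stored entries. The following is a minimal sketch in plain Python, not the SFFT algorithm itself: the helper names (`full_storage`, `tt_storage`, `capped_ranks`), the quantized grid of 36 binary modes (2^36, roughly 6.9e10 points) and the rank cap of 50 are illustrative assumptions and are not taken from the paper. A TT with mode sizes n_k and ranks r_k stores the sum of r_{k-1} * n_k * r_k entries, whereas the dense tensor stores the product of all n_k.

```python
from math import prod


def full_storage(mode_sizes):
    """Number of entries in the dense tensor: the product of all mode sizes."""
    return prod(mode_sizes)


def tt_storage(mode_sizes, ranks):
    """Number of entries summed over all TT cores.

    Core k has shape (ranks[k], mode_sizes[k], ranks[k + 1]);
    the boundary ranks must both be 1.
    """
    assert len(ranks) == len(mode_sizes) + 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))


def capped_ranks(mode_sizes, max_rank):
    """Hypothetical rank profile: a uniform cap, limited at each bond by the
    largest rank an exact decomposition could ever need there."""
    d = len(mode_sizes)
    ranks = [1]
    for k in range(1, d):
        left = prod(mode_sizes[:k])    # size of the unfolding's row index
        right = prod(mode_sizes[k:])   # size of the unfolding's column index
        ranks.append(min(max_rank, left, right))
    ranks.append(1)
    return ranks


# Illustrative assumption: a grid quantized into 36 binary modes, rank cap 50.
modes = [2] * 36
ranks = capped_ranks(modes, max_rank=50)
dense = full_storage(modes)
compressed = tt_storage(modes, ranks)
print(f"dense entries: {dense}")
print(f"TT entries:    {compressed}")
print(f"ratio:         {dense / compressed:.3e}")
```

The saving depends entirely on the ranks staying small. For data that is not well-behaved, the ranks needed for a given accuracy grow and the advantage shrinks, which is consistent with the geometry-dependent scaling noted in the abstract.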