Soft-to-hard vector quantization for end-to-end learned compression of images and neural networks
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks have typically been approached with different methods, our soft-to-hard quantization approach gives results competitive with the state-of-the-art for both.
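The soft relaxation and annealing mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a scalar latent, a fixed set of quantization centers, and a softmax over negative squared distances whose hardness is set by a parameter `sigma`. The function names and the use of averaged soft assignments as an entropy estimate are illustrative assumptions.

```python
import math

def soft_assign(z, centers, sigma):
    """Softmax over negative squared distances (assumed relaxation).

    sigma = 0 gives uniform weights; large sigma approaches a one-hot
    nearest-center assignment.
    """
    logits = [-sigma * (z - c) ** 2 for c in centers]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_quantize(z, centers, sigma):
    """Differentiable surrogate: weighted average of the centers."""
    weights = soft_assign(z, centers, sigma)
    return sum(w * c for w, c in zip(weights, centers))

def hard_quantize(z, centers):
    """Discrete counterpart: snap z to the nearest center."""
    return min(centers, key=lambda c: (z - c) ** 2)

def soft_entropy(values, centers, sigma):
    """Entropy (in bits) of the averaged soft assignments.

    Hypothetical stand-in for a relaxed entropy term; it tends to the
    entropy of the hard-assignment histogram as sigma grows.
    """
    n = len(values)
    p = [0.0] * len(centers)
    for z in values:
        for j, w in enumerate(soft_assign(z, centers, sigma)):
            p[j] += w / n
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

if __name__ == "__main__":
    centers = [0.0, 0.5, 1.0]
    # Annealing schedule (illustrative values): soft output moves
    # toward the hard output as sigma increases.
    for sigma in (0.0, 1.0, 10.0, 100.0, 1e4):
        print(sigma, soft_quantize(0.1, centers, sigma), hard_quantize(0.1, centers))
```

Raising `sigma` over the course of training is one way to move from the continuous surrogate to the discrete quantizer; the schedule shown here is arbitrary.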
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3
roles
background 1
polarities
unclear 1
representative citing papers
Learned image and video compression via autoencoders with spatial-temporal energy compaction penalties outperforms standards on MS-SSIM and visual quality.
A learned image compression system using deep residual learning and sub-pixel convolution reaches 0.972 MS-SSIM at 0.15 bits per pixel in the CLIC validation phase.
citing papers explorer
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  Textual Inversion learns a single embedding vector from a few images to represent personal concepts inside the text embedding space of a frozen text-to-image model, enabling their composition in natural language prompts.
- Learning Image and Video Compression through Spatial-Temporal Energy Compaction
  Learned image and video compression via autoencoders with spatial-temporal energy compaction penalties outperforms standards on MS-SSIM and visual quality.
- Deep Residual Learning for Image Compression
  A learned image compression system using deep residual learning and sub-pixel convolution reaches 0.972 MS-SSIM at 0.15 bits per pixel in the CLIC validation phase.