LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design
Pith reviewed 2026-05-21 07:21 UTC · model grok-4.3
The pith
LatentBox stores AI-generated images as compact latents to cut persistent storage by 78.7% while matching or beating traditional image storage latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LatentBox treats compressed latent tensors as the durable primary storage format for AI-generated images and performs GPU reconstruction only on the read path, when a request arrives. It maintains a hybrid cache that holds frequently requested images in decoded pixel form and keeps less-active objects as latents. It continuously adjusts the split between the two caches to optimize user-perceived latency, and the design itself is motivated by analysis of a production trace. When the same trace is replayed, the design reduces persistent storage by 78.7 percent, with mean and tail latencies that are competitive with or better than those of a conventional image-only store.
What carries the argument
The hybrid latent-image cache that stores hot objects decoded and cold objects as compressed latents, with dynamic allocation tuned from observed access frequencies.
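The hybrid cache can be pictured as two byte-budgeted LRU tiers that share one capacity. The sketch below is a hypothetical illustration, not LatentBox's implementation. The class name `HybridCache`, the promote-on-second-access rule, LRU eviction, and the `decode`/`fetch_latent` callbacks (standing in for GPU reconstruction and the persistent latent store) are all assumptions.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class HybridCache:
    """Hypothetical two-tier cache: decoded images (hot) plus compressed latents (warm)."""

    def __init__(self, capacity_bytes: int, image_fraction: float,
                 decode: Callable[[bytes], bytes]) -> None:
        self.capacity = capacity_bytes
        self.image_fraction = image_fraction
        self.decode = decode  # stands in for GPU reconstruction (assumption)
        self.images: "OrderedDict[str, bytes]" = OrderedDict()   # key -> pixels, LRU order
        self.latents: "OrderedDict[str, bytes]" = OrderedDict()  # key -> latent, LRU order
        self.image_bytes = 0
        self.latent_bytes = 0

    def budgets(self) -> Tuple[int, int]:
        """Split the total byte capacity between the image and latent tiers."""
        image_budget = int(self.capacity * self.image_fraction)
        return image_budget, self.capacity - image_budget

    def set_image_fraction(self, fraction: float) -> None:
        """Move the split (clamped to [0, 1]) and evict until both tiers fit."""
        self.image_fraction = min(max(fraction, 0.0), 1.0)
        self._enforce_budgets()

    def _enforce_budgets(self) -> None:
        image_budget, latent_budget = self.budgets()
        # Evicted images are simply dropped; their latents remain in persistent storage.
        while self.images and self.image_bytes > image_budget:
            _, pixels = self.images.popitem(last=False)  # least recently used first
            self.image_bytes -= len(pixels)
        while self.latents and self.latent_bytes > latent_budget:
            _, latent = self.latents.popitem(last=False)
            self.latent_bytes -= len(latent)

    def get(self, key: str, fetch_latent: Callable[[str], bytes]) -> Tuple[str, bytes]:
        """Return (source, pixels), where source is image_hit, latent_hit, or miss."""
        if key in self.images:  # hot: serve decoded pixels directly
            self.images.move_to_end(key)
            return "image_hit", self.images[key]
        if key in self.latents:  # warm: decode, then promote to the image tier
            latent = self.latents.pop(key)
            self.latent_bytes -= len(latent)
            pixels = self.decode(latent)
            self.images[key] = pixels
            self.image_bytes += len(pixels)
            self._enforce_budgets()
            return "latent_hit", pixels
        latent = fetch_latent(key)  # cold: read the latent from persistent storage
        self.latents[key] = latent
        self.latent_bytes += len(latent)
        self._enforce_budgets()
        return "miss", self.decode(latent)
```

Lowering `image_fraction` shifts bytes toward the latent tier. That tier holds more objects per byte but pays a decode on every hit, which is the trade-off a dynamic split is meant to balance.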
If this is right
- Platforms could host roughly 4.7 times as many images on the same storage hardware without expanding capacity, since a 78.7% reduction leaves about 21.3% of the original footprint.
- Storage bandwidth drops because latents are much smaller than full pixel blobs.
- GPU compute is spent only when a requested object is not already held in decoded form, that is, on latent-cache hits and on fetches from persistent storage.
- User-visible latency stays low by keeping popular images ready in decoded form.
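The capacity arithmetic behind the first point can be checked directly. This sketch assumes the reduction is computed as one minus total latent bytes over total pixel bytes across all objects. The function names and the (pixel_bytes, latent_bytes) input format are hypothetical.

```python
from typing import Iterable, Tuple


def storage_reduction(sizes: Iterable[Tuple[int, int]]) -> float:
    """Fraction of persistent bytes saved by storing each object as a latent.

    `sizes` yields (pixel_bytes, latent_bytes) per object (hypothetical format).
    """
    pixel_total = latent_total = 0
    for pixel_bytes, latent_bytes in sizes:
        pixel_total += pixel_bytes
        latent_total += latent_bytes
    return 1.0 - latent_total / pixel_total


def capacity_multiplier(reduction: float) -> float:
    """How many times more objects fit in the same capacity after the reduction."""
    return 1.0 / (1.0 - reduction)


print(round(capacity_multiplier(0.787), 2))  # → 4.69
```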
Where Pith is reading between the lines
- Similar latent-first designs could extend to other generative outputs such as audio clips or short videos that also have compact internal representations.
- Object stores might eventually add native support for model-specific latent formats so applications do not need custom reconstruction logic.
- If reconstruction demand grows very large, batching or specialized hardware accelerators could become necessary to keep tail latencies low.
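One common form of the batching mentioned above is deadline-bounded micro-batching. Decode requests are grouped until either a batch-size cap or a wait deadline is reached. The sketch below is an illustrative assumption, not something the paper is described as doing. `form_batches`, `max_batch`, and `max_wait` are hypothetical names, and times may use any consistent unit.

```python
from typing import List, Tuple


def form_batches(arrivals: List[float], max_batch: int,
                 max_wait: float) -> List[Tuple[float, List[int]]]:
    """Group sorted request arrival times into (dispatch_time, request_indices) batches.

    A batch is dispatched when it reaches max_batch requests, or at
    max_wait after its first request, whichever comes first.
    """
    if any(b < a for a, b in zip(arrivals, arrivals[1:])):
        raise ValueError("arrivals must be sorted")
    batches: List[Tuple[float, List[int]]] = []
    current: List[int] = []
    deadline = 0.0
    for i, t in enumerate(arrivals):
        if current and t > deadline:
            # The open batch's wait window expired before this request arrived.
            batches.append((deadline, current))
            current = []
        if not current:
            deadline = t + max_wait  # a new batch opens with this request
        current.append(i)
        if len(current) == max_batch:
            batches.append((t, current))  # full batch dispatches immediately
            current = []
    if current:
        batches.append((deadline, current))
    return batches
```

Larger `max_batch` improves GPU utilization, while `max_wait` caps how much latency batching can add to any single request.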
Load-bearing premise
The assumption that the 35-month, two-billion-request trace from one platform represents typical future access patterns, and that GPU reconstruction latency will remain acceptable to users under varying load.
What would settle it
Running a live deployment of LatentBox against real user traffic for several months and directly comparing measured storage consumption and end-to-end request latencies against a pure image-based baseline.
Original abstract
The explosive growth of AI-generated images has created a sustainability challenge for storage infrastructure. Platforms like Midjourney and Adobe Firefly already host billions of generative images, yet conventional object stores persist them as blobs with full-resolution pixels, consuming huge amounts of storage capacity and bandwidth. Unlike natural photos, however, AI-generated images can be deterministically reconstructed from compact, model-native latent tensors, making persistent image storage fundamentally redundant. This paper presents LatentBox, a latent-first storage system for AI-generated images. LatentBox treats compressed latents as durable storage objects and uses on-demand GPU reconstruction on the read path to trade inexpensive compute for large persistent storage savings. Our design is guided by the first large-scale analysis of AI-generated image access we are aware of, based on a 35-month, 2-billion-request production trace from a major generative-content platform. Motivated by the trace analysis, LatentBox keeps frequently accessed images in decoded pixel format for fast hits, stores less-active objects as compressed latents to expand effective cache capacity, and continuously adjusts the split between the image and latent caches to optimize user-perceived access latency. We build a LatentBox prototype and evaluate it with the production trace. LatentBox reduces persistent storage by 78.7% with mean and tail latency competitive with, or even lower than, pure image-based storage.
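The abstract does not specify how the image/latent split is adjusted. One hedged possibility is a hill-climbing tuner that nudges the image-tier fraction each epoch, then reverses direction with a halved step whenever measured mean latency worsens. `SplitTuner` and its parameters are hypothetical. Hill-climbing is a substituted technique, not LatentBox's documented policy.

```python
from typing import Optional


class SplitTuner:
    """Hypothetical hill-climbing tuner for the image-tier share of cache bytes."""

    def __init__(self, fraction: float = 0.5, step: float = 0.1,
                 min_step: float = 0.005) -> None:
        self.fraction = fraction   # share of cache bytes given to decoded images
        self.step = step           # current move size
        self.min_step = min_step   # floor so the tuner keeps tracking drift
        self.direction = 1.0
        self.last_latency: Optional[float] = None

    def update(self, mean_latency: float) -> float:
        """Feed the mean latency observed at the current fraction; return the next fraction."""
        if self.last_latency is not None and mean_latency > self.last_latency:
            # The last move made things worse: turn around and take smaller steps.
            self.direction = -self.direction
            self.step = max(self.step / 2, self.min_step)
        self.last_latency = mean_latency
        self.fraction = min(1.0, max(0.0, self.fraction + self.direction * self.step))
        return self.fraction
```

In use, a controller would call `update` once per epoch with that epoch's measured mean (or tail) latency and then apply the returned fraction to the cache.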
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents LatentBox, a latent-first storage system for AI-generated images that persists compact compressed latents as durable objects and performs on-demand GPU reconstruction on the read path. Motivated by analysis of a 35-month, 2-billion-request production trace, the design keeps hot objects in decoded pixel format while storing colder objects as latents, dynamically adjusting the image/latent cache split to optimize user-perceived latency. The prototype evaluation reports a 78.7% reduction in persistent storage with competitive or lower mean and tail latency versus a pure image-based baseline.
Significance. If the latency results hold under realistic load, the work offers a practical approach to the storage sustainability problem for generative-AI platforms by trading inexpensive compute for large capacity savings. The grounding in a real production trace and the concrete prototype measurements are strengths that increase relevance for systems research on AI infrastructure.
major comments (2)
- [Evaluation] Evaluation section: the headline claim that mean and tail latency remain competitive (or better) with pure pixel storage depends on on-demand reconstruction plus the dynamic cache split, yet the manuscript provides no measured per-request GPU reconstruction time, explicit queuing model for concurrent cold-object reconstructions, or sensitivity results when the trace exhibits bursts that exceed available GPUs. Without these, it is impossible to verify that tail latency does not exceed the baseline under the reported access patterns.
- [Evaluation] Trace-driven evaluation: the 78.7% storage-reduction figure and latency competitiveness rest on the assumption that the 35-month trace accurately represents access patterns and that reconstruction remains acceptable under varying load, but no additional experiments (e.g., synthetic burst workloads or different GPU counts) are reported to test this assumption.
minor comments (2)
- [Abstract] Abstract and §4: the comparison baseline (pure image storage) and exact latency metrics (mean, p99, etc.) should be stated more explicitly so readers can reproduce the competitiveness claim.
- [Design] Notation for cache-split thresholds and reconstruction cost model could be formalized with a short equation or pseudocode to improve clarity and reproducibility.
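The minor comment asks for a formal cost model. One plausible formulation, offered as an assumption rather than the paper's notation, expresses expected read latency as a function of the image-tier fraction $f$. It uses hit ratios $h_I(f)$ and $h_L(f)$ for the image and latent tiers, image-hit service time $t_I$, latent-cache read time $t_L$, persistent-store fetch time $t_S$, and GPU decode time $t_{\mathrm{dec}}$ (including any queueing):

```latex
\mathbb{E}[T](f) = h_I(f)\, t_I
  + h_L(f)\,\bigl(t_L + t_{\mathrm{dec}}\bigr)
  + \bigl(1 - h_I(f) - h_L(f)\bigr)\bigl(t_S + t_{\mathrm{dec}}\bigr),
\qquad
f^{\star} = \arg\min_{f \in [0,1]} \mathbb{E}[T](f)
```

Raising $f$ typically raises $h_I$ but lowers the combined hit ratio $h_I + h_L$, because decoded images occupy more bytes than latents. The optimum balances these two effects.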
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments on the evaluation below and commit to revisions that directly strengthen the presentation of our latency and trace-driven results.
Referee: [Evaluation] Evaluation section: the headline claim that mean and tail latency remain competitive (or better) with pure pixel storage depends on on-demand reconstruction plus the dynamic cache split, yet the manuscript provides no measured per-request GPU reconstruction time, explicit queuing model for concurrent cold-object reconstructions, or sensitivity results when the trace exhibits bursts that exceed available GPUs. Without these, it is impossible to verify that tail latency does not exceed the baseline under the reported access patterns.
Authors: We agree that the current manuscript would benefit from greater transparency on these points. The reported mean and tail latencies are end-to-end measurements obtained by replaying the production trace on the prototype, so reconstruction costs are already embedded in the results. To make this explicit, the revised version will add (1) a table and CDF of per-request GPU reconstruction times measured on our test hardware, (2) a simple M/M/k-style queuing analysis parameterized by the observed request rates and GPU count from the prototype runs, and (3) sensitivity curves for synthetic burst workloads that temporarily exceed the provisioned GPUs. These additions will allow readers to verify tail-latency behavior directly. revision: yes
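The M/M/k-style analysis promised in this response could start from the standard Erlang C formula. This sketch assumes Poisson arrivals of decode requests at rate `arrival_rate`, exponentially distributed decode times with per-GPU rate `service_rate`, and `servers` identical GPUs. Real decode times are likely less variable than exponential, so the result is only a rough estimate. The function names are hypothetical.

```python
import math


def erlang_c(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Probability that an arriving request must wait for a GPU in an M/M/k queue."""
    offered = arrival_rate / service_rate  # offered load a = lambda / mu
    rho = offered / servers                # per-GPU utilization
    if rho >= 1.0:
        return 1.0                         # unstable: every request eventually waits
    below_k = sum(offered ** n / math.factorial(n) for n in range(servers))
    at_k = offered ** servers / math.factorial(servers) / (1.0 - rho)
    return at_k / (below_k + at_k)


def mean_queue_wait(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Mean time spent waiting for a free GPU: W_q = C / (k * mu - lambda)."""
    if arrival_rate >= servers * service_rate:
        return math.inf                    # queue grows without bound
    wait_prob = erlang_c(servers, arrival_rate, service_rate)
    return wait_prob / (servers * service_rate - arrival_rate)
```

Evaluating `mean_queue_wait(k, lam, mu)` for `k in range(1, 9)` mirrors the proposed 1-to-8 GPU sweep, and doubling `lam` gives a first-order view of a burst.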
Referee: [Evaluation] Trace-driven evaluation: the 78.7% storage-reduction figure and latency competitiveness rest on the assumption that the 35-month trace accurately represents access patterns and that reconstruction remains acceptable under varying load, but no additional experiments (e.g., synthetic burst workloads or different GPU counts) are reported to test this assumption.
Authors: The 78.7% storage reduction is obtained by comparing latent versus pixel sizes for every object referenced in the trace and is therefore independent of load assumptions. The latency comparison is likewise a direct trace replay. We acknowledge, however, that the manuscript does not vary GPU count or inject synthetic bursts. In the revision we will add two new experiments: (a) replaying the trace while varying the number of available GPUs from 1 to 8, and (b) synthetic burst workloads that double the peak request rate for short intervals. These results will be reported alongside the original trace-driven numbers. revision: yes
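The synthetic burst workloads in (b) could be generated as a non-homogeneous Poisson process using thinning (the Lewis-Shedler method). Candidate arrivals are drawn at the peak rate and kept with probability equal to the instantaneous rate divided by that peak. The function names, the (start, end) burst-window format, and setting `base_rate` to the trace's observed peak rate are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def burst_rate(t: float, base_rate: float,
               bursts: Sequence[Tuple[float, float]], factor: float = 2.0) -> float:
    """Request rate at time t: base_rate, multiplied by factor inside any burst window."""
    for start, end in bursts:
        if start <= t < end:
            return base_rate * factor
    return base_rate


def generate_arrivals(horizon: float, base_rate: float,
                      bursts: Sequence[Tuple[float, float]],
                      factor: float = 2.0, seed: int = 0) -> List[float]:
    """Arrival times in [0, horizon) drawn by thinning a homogeneous Poisson process."""
    rng = random.Random(seed)
    peak = base_rate * max(factor, 1.0)   # thinning needs an upper bound on the rate
    t = 0.0
    arrivals: List[float] = []
    while True:
        t += rng.expovariate(peak)        # next candidate at the peak rate
        if t >= horizon:
            return arrivals
        if rng.random() < burst_rate(t, base_rate, bursts, factor) / peak:
            arrivals.append(t)            # keep with probability rate(t) / peak
```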
Circularity Check
No significant circularity; claims rest on external trace and prototype measurements
full rationale
The paper's core results, the 78.7% persistent storage reduction and competitive mean/tail latency, are presented as direct empirical outcomes from evaluating a built prototype against the 35-month, 2-billion-request external production trace. Design choices (dynamic image/latent cache splits) are motivated by trace analysis but do not tautologically define the reported savings or latency figures; those are measured quantities. No self-citations, uniqueness theorems, fitted parameters renamed as predictions, or ansatzes appear in the derivation chain. The evaluation is self-contained, resting on an external trace and prototype measurements rather than on quantities the design itself defines.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AI-generated images can be deterministically reconstructed from compact, model-native latent tensors.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Passage: "stores AI-generated images as compressed latents and reconstructs them on demand using GPU decoding"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.