LatentBox: Storing AI-Generated Images at Scale via a Latent-First Design

Haoran Ni; Juncheng Yang; Tingfeng Lan; Yue Cheng; Yunjia Zheng; Zhaoyuan Su; Zirui Wang

arxiv: 2605.19385 · v2 · pith:B2STPUNSnew · submitted 2026-05-19 · 💻 cs.DC · cs.DB

Zirui Wang, Yunjia Zheng, Tingfeng Lan, Zhaoyuan Su, Haoran Ni, Juncheng Yang, Yue Cheng

Pith reviewed 2026-05-21 07:21 UTC · model grok-4.3

classification 💻 cs.DC cs.DB

keywords: AI-generated images, latent storage, generative AI, storage optimization, hybrid cache, on-demand reconstruction, production trace, image caching

The pith

LatentBox stores AI-generated images as compact latents to cut persistent storage by 78.7% while matching or beating traditional image storage latency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LatentBox, a storage system built for the billions of images created by generative AI models. It shows that full-resolution pixel files are redundant because the same images can be rebuilt on demand from the small latent tensors the model itself uses. By studying real access patterns from a long production trace, the system keeps popular images decoded for speed and stores the rest as latents to save space, then continuously tunes how much of each format to hold in cache. This approach delivers large capacity gains without increasing the time users wait for images.

Core claim

LatentBox treats compressed latent tensors as the durable primary storage format for AI-generated images and performs GPU reconstruction only on the read path when a request arrives. It maintains a hybrid cache that holds frequently requested images in decoded pixel form while keeping less-active objects as latents, using the production trace to drive ongoing adjustments to the split between the two caches. When evaluated against the same trace, the design reduces persistent storage by 78.7 percent while producing mean and tail latencies that are competitive with or better than a conventional image-only store.
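The read path described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class name `HybridCache`, the `load_latent` and `decode` callables, LRU eviction, promotion on the first latent hit, demotion of evicted images to the latent tier, and capacities counted in objects rather than bytes are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class HybridCache:
    """Two LRU tiers: decoded images (hot) and compressed latents (colder)."""

    def __init__(self, image_slots: int, latent_slots: int,
                 load_latent: Callable[[str], bytes],
                 decode: Callable[[bytes], bytes]) -> None:
        self.image_slots = image_slots
        self.latent_slots = latent_slots
        self.load_latent = load_latent  # hypothetical: read latent from durable store
        self.decode = decode            # hypothetical: GPU decode, latent -> pixels
        self.images: "OrderedDict[str, Tuple[bytes, bytes]]" = OrderedDict()
        self.latents: "OrderedDict[str, bytes]" = OrderedDict()

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return (pixels, outcome), where outcome names the path taken."""
        if key in self.images:                     # image hit: no decode needed
            self.images.move_to_end(key)
            return self.images[key][0], "image_hit"
        if key in self.latents:                    # latent hit: decode only
            latent = self.latents.pop(key)
            pixels = self.decode(latent)
            self._put_image(key, pixels, latent)   # assumed policy: promote
            return pixels, "latent_hit"
        latent = self.load_latent(key)             # full miss: fetch, then decode
        pixels = self.decode(latent)
        self._put_latent(key, latent)              # assumed policy: admit as latent
        return pixels, "full_miss"

    def _put_image(self, key: str, pixels: bytes, latent: bytes) -> None:
        self.images[key] = (pixels, latent)
        while len(self.images) > self.image_slots:
            old_key, (_, old_latent) = self.images.popitem(last=False)
            self._put_latent(old_key, old_latent)  # demote instead of dropping

    def _put_latent(self, key: str, latent: bytes) -> None:
        self.latents[key] = latent
        self.latents.move_to_end(key)
        while len(self.latents) > self.latent_slots:
            self.latents.popitem(last=False)       # still durable in the store
```

With `image_slots=1` and `latent_slots=1`, three consecutive reads of the same key go through `full_miss`, `latent_hit`, and `image_hit` in that order.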

What carries the argument

A hybrid latent-image cache stores hot objects in decoded form and cold objects as compressed latents, and the allocation between the two is tuned dynamically from observed access frequencies.

If this is right

Platforms can host several times more images on the same storage hardware without expanding capacity.
Storage bandwidth drops because latents are much smaller than full pixel blobs.
GPU compute is spent only when a request misses the decoded-image cache and the object has to be reconstructed from its latent.
User-visible latency stays low by keeping popular images ready in decoded form.
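The first item can be made concrete with one line of arithmetic. The sketch below assumes the reported 78.7% reduction applies uniformly to every object and ignores replication and metadata overhead; the function name is illustrative.

```python
def capacity_multiplier(storage_reduction: float) -> float:
    """Objects that fit in a fixed capacity, relative to a pixel-only store."""
    if not 0.0 <= storage_reduction < 1.0:
        raise ValueError("storage_reduction must be in [0, 1)")
    return 1.0 / (1.0 - storage_reduction)


print(round(capacity_multiplier(0.787), 2))  # 4.69
```

Under those assumptions, the same hardware would hold roughly 4.7 times as many images.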

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar latent-first designs could extend to other generative outputs such as audio clips or short videos that also have compact internal representations.
Object stores might eventually add native support for model-specific latent formats so applications do not need custom reconstruction logic.
If reconstruction demand grows very large, batching or specialized hardware accelerators could become necessary to keep tail latencies low.

Load-bearing premise

That the 35-month trace of two billion requests from one platform represents typical future access patterns, and that GPU reconstruction latency will stay acceptable to users even under varying load.

What would settle it

Running a live deployment of LatentBox against real user traffic for several months and directly comparing measured storage consumption and end-to-end request latencies against a pure image-based baseline.

Figures

Figures reproduced from arXiv: 2605.19385 by Haoran Ni, Juncheng Yang, Tingfeng Lan, Yue Cheng, Yunjia Zheng, Zhaoyuan Su, Zirui Wang.

**Figure 1.** Illustration of latent-first storage. (a) Conventional object stores persist AI-generated images as opaque blobs, whereas LatentBox (LB) stores compact model-native latents (intermediate state) and reconstructs images on demand. (b) Cost–latency tradeoff of five storage strategies. LatentBox achieves low cost and latency.
Existing large-scale image storage systems, including Facebook’s Haystack [10] and …

**Figure 2.** Adobe Firefly sees explosive gen-image growth [3].
[Pipeline diagram labels: text prompt, CLIP/T5, diffusion, latent z, VAE decoder, image x; Stage 1: denoising (seconds); Stage 2: decode (tens of ms)]

**Figure 4.** CompanyX trace characterization. (a) Image popularity CDF. (b) Mean access rate vs. age, stratified by lifetime-view quartile. (c) Miss ratio vs. cache size for three policies. (d) CDF of intervals between consecutive accesses.
3 Motivation. This section motivates the design of LatentBox by analyzing a production trace (§3.1) and characterizing the cost of on-demand pixel reconstruction (§3.2). 3.1 Producti…

**Figure 5.** LatentBox architecture and request flow.
… to map each identifier to its owner node and tracks per-GPU queue depths for load-aware dispatch. Every GPU node is functionally identical: it hosts a dual-format cache that holds each object either as a decoded image or as a compressed latent (§4.2), an adaptive resizer that balances the two tiers online (§4.3), and one decode pipeline per GPU streaming CPU decompr…

**Figure 6.** Dual-format cache design.
… over the full request stream: a request counts as an image miss if the requested object is not found in the image cache, regardless of whether it is later found in the latent cache. Let MRlat(α) denote the latent-cache miss ratio at this allocation, measured over the image-miss stream, i.e., only those requests that already missed the image cache. A latent miss is therefore a full…
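Because MRlat is measured only over requests that already missed the image cache, the two miss ratios in the Figure 6 excerpt multiply into three outcome probabilities, and an expected read latency follows from them. The sketch below is a hedged illustration: the function names, the per-outcome latencies, and the candidate curves are placeholders chosen for the example, not measurements or the paper's resizer.

```python
from typing import Dict, Tuple


def expected_read_latency(mr_img: float, mr_lat: float, t_image_hit: float,
                          t_latent_hit: float, t_full_miss: float) -> float:
    """Expected latency when mr_lat is conditional on an image-cache miss."""
    p_image_hit = 1.0 - mr_img
    p_latent_hit = mr_img * (1.0 - mr_lat)  # missed image tier, hit latent tier
    p_full_miss = mr_img * mr_lat           # missed both tiers
    return (p_image_hit * t_image_hit
            + p_latent_hit * t_latent_hit
            + p_full_miss * t_full_miss)


def best_allocation(curves: Dict[float, Tuple[float, float]], t_image_hit: float,
                    t_latent_hit: float, t_full_miss: float) -> float:
    """Pick the allocation whose (mr_img, mr_lat) pair minimizes expected latency."""
    return min(curves, key=lambda a: expected_read_latency(
        *curves[a], t_image_hit, t_latent_hit, t_full_miss))


# Placeholder curves: allocation -> (image miss ratio, conditional latent miss ratio).
curves = {0.2: (0.50, 0.20), 0.5: (0.30, 0.40), 0.8: (0.20, 0.90)}
print(best_allocation(curves, t_image_hit=5.0, t_latent_hit=40.0, t_full_miss=300.0))
```

With these placeholder numbers the middle allocation wins: giving more space to decoded images lowers the image miss ratio but shrinks the latent tier, so more requests fall through to the expensive full-miss path.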

**Figure 7.** End-to-end read performance. (a) CDF of read latency for five store-and-read configurations. (b) Stacked cache hit distribution: image hit, latent hit, and full miss fractions. (c) Mean latency breakdown for cache-hit requests (image hit + latent hit); numbers above bars show total mean (ms). (d) Mean latency breakdown for cache-miss requests.

**Figure 8.** Cumulative cost at four time horizons (2026, 2030, 2040, 2050), normalized so ImgStore at the trace end (March 2026) equals 1. (Top) constant prices. (Bottom) with annual price decay (GPU −20%/yr, storage −10%/yr from 2026 [17, 18, 38, 49]).
… 5090 saves 64%, because cheaper GPUs amplify the decode-based architecture’s advantage while storage-price reductions benefit both strategies proportionally. 6.5 Ablat…
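The effect of the annual decay rates in the Figure 8 caption can be illustrated with a geometric sum. This is a simplified sketch, not the paper's cost model: it assumes constant yearly demand, a price that falls by a fixed fraction each year, and a spend normalized to 1 in the first year; the function name is hypothetical.

```python
def cumulative_cost(first_year_cost: float, annual_decay: float, years: int) -> float:
    """Total spend over `years` when the unit price falls by `annual_decay` per year."""
    if not 0.0 <= annual_decay < 1.0:
        raise ValueError("annual_decay must be in [0, 1)")
    return sum(first_year_cost * (1.0 - annual_decay) ** t for t in range(years))


horizon = 2050 - 2026  # years covered by the longest horizon in the caption
gpu = cumulative_cost(1.0, 0.20, horizon)      # GPU prices: -20% per year
storage = cumulative_cost(1.0, 0.10, horizon)  # storage prices: -10% per year
print(f"GPU: {gpu:.2f}x first-year spend, storage: {storage:.2f}x first-year spend")
```

Under these assumptions the faster-decaying component accumulates less total spend over the same horizon, which is why a design that shifts cost from storage to GPU decode gains from the caption's decay rates.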

**Figure 11.** Sensitivity analysis. LatentBox is robust to step size, window size, and tail fraction; the promotion threshold h has the largest impact.
6.5.3 Spillover Dispatch. To validate the effectiveness of the spillover path, we replay the same 48-hour trace on a 6-node GPU cluster at 1000× speed with an overflow threshold θ=4. The without-spillover baseline sets θ to infinity, so every request is dispatched to its …
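The spillover rule in the excerpt above can be sketched as a small dispatch function. The details are assumptions for illustration: the excerpt does not say whether overflow triggers at a queue depth above θ or at θ itself, nor how the alternative node is chosen; the sketch uses "strictly above θ" and "least-loaded node", and the node names are hypothetical.

```python
import math
from typing import Dict


def dispatch(owner: str, queue_depth: Dict[str, int], theta: float = 4) -> str:
    """Return the GPU node that should serve a request owned by `owner`."""
    if queue_depth[owner] <= theta:
        return owner  # owner is not overloaded: keep cache locality
    # Assumed spillover policy: fall back to the least-loaded node
    # (ties broken by node name so the choice is deterministic).
    return min(queue_depth, key=lambda node: (queue_depth[node], node))


depths = {"gpu0": 6, "gpu1": 1, "gpu2": 3}
print(dispatch("gpu0", depths))            # owner overloaded, request spills over
print(dispatch("gpu0", depths, math.inf))  # theta = infinity disables spillover
```

Setting `theta` to infinity reproduces the without-spillover baseline, since the owner's queue depth can never exceed it.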

**Figure 9.** Per-window latency improvement and α trajectory.
[Bar chart residue: read latency (ms) for E2E mean, E2E P99, GPU wait mean, and GPU wait P99, with and without spillover; extracted bar labels: 79, 94, 360, 472, 5, 8, 78, 153]

**Figure 12.** Reconstruction fidelity over 10K SD 3.5 images (1024×1024). (a) Signed per-channel pixel difference aggregated across 2,000 sampled images; 47% of pixel-channel values are unchanged. (b)-(c) White dot: median; thick bar: interquartile range. “q” in “q95” denotes quality factor; higher PSNR (dB) and SSIM closer to 1 indicate better fidelity.
… results indicate that practitioners can deploy LatentBox without …
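Two of the fidelity measures named in the Figure 12 caption are simple to compute. The sketch below uses the standard PSNR definition for 8-bit channels and a plain count of unchanged values; it operates on flattened channel values, the function names are illustrative, and it is not the paper's evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], reconstructed: Sequence[int],
         max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def unchanged_fraction(reference: Sequence[int], reconstructed: Sequence[int]) -> float:
    """Share of pixel-channel values that are bit-identical after reconstruction."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    same = sum(1 for a, b in zip(reference, reconstructed) if a == b)
    return same / len(reference)


original = [10, 20, 30, 40]
decoded = [11, 19, 30, 40]
print(psnr(original, decoded), unchanged_fraction(original, decoded))
```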

read the original abstract

The explosive growth of AI-generated images has created a sustainability challenge for storage infrastructure. Platforms like Midjourney and Adobe Firefly already host billions of generative images, yet conventional object stores persist them as blobs with full-resolution pixels, consuming huge amounts of storage capacity and bandwidth. Unlike natural photos, however, AI-generated images can be deterministically reconstructed from compact, model-native latent tensors, making persistent image storage fundamentally redundant. This paper presents LatentBox, a latent-first storage system for AI-generated images. LatentBox treats compressed latents as durable storage objects and uses on-demand GPU reconstruction on the read path to trade inexpensive compute for large persistent storage savings. Our design is guided by the first large-scale analysis of AI-generated image access we are aware of, based on a 35-month, 2-billion-request production trace from a major generative-content platform. Motivated by the trace analysis, LatentBox keeps frequently accessed images in decoded pixel format for fast hits, stores less-active objects as compressed latents to expand effective cache capacity, and continuously adjusts the splits between the image and latent cache to optimize user-perceived access latency. We build a LatentBox prototype and evaluate it with the production trace. LatentBox reduces persistent storage by 78.7% with competitive or even lower mean and tail latency over a pure image-based storage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LatentBox shows a practical 78.7% storage cut for AI images by storing latents and reconstructing on demand, with trace-driven caching as the main new piece, though latency under load is only moderately supported.

The main point is that LatentBox shows how storing AI-generated images primarily as latents rather than full-resolution pixels can cut storage by 78.7% in a prototype, guided by real access patterns from a large production trace, while trying to keep latencies competitive through dynamic caching and on-demand GPU reconstruction.

The paper does a few things right. First, the trace analysis stands out. Looking at 35 months and 2 billion requests gives concrete motivation for the design choices, like splitting the cache between hot pixel images and latent storage for less frequent ones, with continuous adjustments. That's more grounded than many storage papers that rely on assumptions. Second, the prototype evaluation uses that trace to demonstrate the storage reduction. Treating latents as the durable format and reconstructing only on reads is a logical fit for these images, since they are model-native and deterministic. This approach directly addresses the sustainability issue for platforms hosting billions of such images.

On the downside, the latency results feel less solid. The claim of competitive or better mean and tail latency compared to pure image storage depends on the reconstruction not becoming a bottleneck. But the design lacks a detailed queuing model or reported sensitivity tests for bursts of cold accesses. If reconstruction takes significant time and multiple requests arrive at once with limited GPUs, tail latencies could suffer even if hit rates are high on average. The abstract reports the numbers without error bars or full setup details, so the support is moderate at best. The weakest assumption is that the trace represents future patterns and that reconstruction stays acceptable under load.

This paper targets systems researchers interested in AI-specific storage optimizations. It has enough novelty in the latent-first design with trace-driven management to merit serious peer review, though it would benefit from expanded evaluation on the performance side.
Recommendation: Yes, send it to referees.

Referee Report

2 major / 2 minor

Summary. The paper presents LatentBox, a latent-first storage system for AI-generated images that persists compact compressed latents as durable objects and performs on-demand GPU reconstruction on the read path. Motivated by analysis of a 35-month, 2-billion-request production trace, the design keeps hot objects in decoded pixel format while storing colder objects as latents, dynamically adjusting the image/latent cache split to optimize user-perceived latency. The prototype evaluation reports a 78.7% reduction in persistent storage with competitive or lower mean and tail latency versus a pure image-based baseline.

Significance. If the latency results hold under realistic load, the work offers a practical approach to the storage sustainability problem for generative-AI platforms by trading inexpensive compute for large capacity savings. The grounding in a real production trace and the concrete prototype measurements are strengths that increase relevance for systems research on AI infrastructure.

major comments (2)

[Evaluation] Evaluation section: the headline claim that mean and tail latency remain competitive (or better) with pure pixel storage depends on on-demand reconstruction plus the dynamic cache split, yet the manuscript provides no measured per-request GPU reconstruction time, explicit queuing model for concurrent cold-object reconstructions, or sensitivity results when the trace exhibits bursts that exceed available GPUs. Without these, it is impossible to verify that tail latency does not exceed the baseline under the reported access patterns.
[Evaluation] Trace-driven evaluation: the 78.7% storage-reduction figure and latency competitiveness rest on the assumption that the 35-month trace accurately represents access patterns and that reconstruction remains acceptable under varying load, but no additional experiments (e.g., synthetic burst workloads or different GPU counts) are reported to test this assumption.

minor comments (2)

[Abstract] Abstract and §4: the comparison baseline (pure image storage) and exact latency metrics (mean, p99, etc.) should be stated more explicitly so readers can reproduce the competitiveness claim.
[Design] Notation for cache-split thresholds and reconstruction cost model could be formalized with a short equation or pseudocode to improve clarity and reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the evaluation below and commit to revisions that directly strengthen the presentation of our latency and trace-driven results.

Referee: [Evaluation] Evaluation section: the headline claim that mean and tail latency remain competitive (or better) with pure pixel storage depends on on-demand reconstruction plus the dynamic cache split, yet the manuscript provides no measured per-request GPU reconstruction time, explicit queuing model for concurrent cold-object reconstructions, or sensitivity results when the trace exhibits bursts that exceed available GPUs. Without these, it is impossible to verify that tail latency does not exceed the baseline under the reported access patterns.

Authors: We agree that the current manuscript would benefit from greater transparency on these points. The reported mean and tail latencies are end-to-end measurements obtained by replaying the production trace on the prototype, so reconstruction costs are already embedded in the results. To make this explicit, the revised version will add (1) a table and CDF of per-request GPU reconstruction times measured on our test hardware, (2) a simple M/M/k-style queuing analysis parameterized by the observed request rates and GPU count from the prototype runs, and (3) sensitivity curves for synthetic burst workloads that temporarily exceed the provisioned GPUs. These additions will allow readers to verify tail-latency behavior directly.
Revision: yes
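The M/M/k-style analysis promised in this response could start from the textbook Erlang C formula. The sketch below is that standard formula and nothing more: it assumes Poisson arrivals and exponentially distributed decode times, which real reconstruction workloads may not follow, and the rates in the example are placeholders rather than values from the paper.

```python
import math


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arriving request must wait in an M/M/k queue."""
    offered_load = arrival_rate / service_rate  # a = lambda / mu
    utilization = offered_load / servers        # rho = a / k
    if utilization >= 1.0:
        raise ValueError("unstable queue: arrival rate exceeds total service rate")
    tail = offered_load ** servers / math.factorial(servers) / (1.0 - utilization)
    head = sum(offered_load ** n / math.factorial(n) for n in range(servers))
    return tail / (head + tail)


def mean_queue_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time spent waiting for a free server, excluding the service itself."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)


# Placeholder rates: 90 cold requests/s, 20 decodes/s per GPU, 6 GPUs.
print(erlang_c(90.0, 20.0, 6), mean_queue_wait(90.0, 20.0, 6))
```

A model like this captures only average behavior at a fixed rate, so it would complement rather than replace the burst experiments listed in the same response.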
Referee: [Evaluation] Trace-driven evaluation: the 78.7% storage-reduction figure and latency competitiveness rest on the assumption that the 35-month trace accurately represents access patterns and that reconstruction remains acceptable under varying load, but no additional experiments (e.g., synthetic burst workloads or different GPU counts) are reported to test this assumption.

Authors: The 78.7% storage reduction is obtained by comparing latent versus pixel sizes for every object referenced in the trace and is therefore independent of load assumptions. The latency comparison is likewise a direct trace replay. We acknowledge, however, that the manuscript does not vary GPU count or inject synthetic bursts. In the revision we will add two new experiments: (a) replaying the trace while varying the number of available GPUs from 1 to 8, and (b) synthetic burst workloads that double the peak request rate for short intervals. These results will be reported alongside the original trace-driven numbers.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external trace and prototype measurements

The paper's core results—the 78.7% persistent storage reduction and competitive mean/tail latency—are presented as direct empirical outcomes from evaluating a built prototype against the 35-month, 2-billion-request external production trace. Design choices (dynamic image/latent cache splits) are motivated by trace analysis but do not tautologically define the reported savings or latency figures; those are measured quantities. No self-citations, uniqueness theorems, fitted parameters renamed as predictions, or ansatzes appear in the derivation chain. The evaluation is self-contained against independent external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The design rests on the domain assumption that latents allow deterministic high-quality reconstruction and that the observed trace reflects future access behavior; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: AI-generated images can be deterministically reconstructed from compact, model-native latent tensors.
This premise is stated directly in the abstract as the foundation for treating latents as durable storage objects.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: stores AI-generated images as compressed latents and reconstructs them on demand using GPU decoding

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 4 internal anchors

[1] ONNX: Open neural network exchange. https://onnx.ai, 2019.
[2] Midjourney. https://www.midjourney.com, 2026. Accessed: April 2026.
[3] Adobe. Introducing firefly foundry. https://business.adobe.com/blog/introducing-firefly-foundry, 2023. Accessed: 2026-04.
[4] Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav Gulavani, Alexey Tumanov, and Ramachandran Ramjee. Taming Throughput-Latency tradeoff in LLM inference with Sarathi-Serve. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 117–134, Santa Clara, CA, July 2024. USENIX Association.
[5] Amazon Web Services. Amazon S3 pricing. Online, 2024.
[6] Amazon Web Services. Amazon S3 Glacier instant retrieval storage class, 2026. Accessed: 2026-04-23.
[7] Amazon Web Services. Amazon S3 Glacier Instant Retrieval storage class. https://aws.amazon.com/s3/storage-classes/glacier/instant-retrieval/, 2026. Accessed: 2026-04.
[8] Amazon Web Services. Best practices design patterns: Optimizing amazon s3 performance. https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html, 2026. Accessed: 2026-04-23.
[9] Shobana Balakrishnan, Richard Black, Austin Donnelly, Paul England, Adam Glass, Dave Harper, Sergey Legtchenko, Aaron Ogus, Eric Peterson, and Antony Rowstron. Pelican: A building block for exascale cold data storage. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 351–365, 2014.
[10] Doug Beaver, Sanjeev Kumar, Harry C Li, Jason Sobel, and Peter Vajgel. Finding a needle in haystack: Facebook’s photo storage. In 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 10), 2010.
[11] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78–101, 1966.
[12] Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, et al. Windows azure storage: a highly available cloud storage service with strong consistency. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pages 143–157, 2011.
[13] Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis. In International Conference on Learning Representations (ICLR), 2024.
[14] Junyu Chen, Han Cai, Junsong Chen, Enze Xie, Shang Yang, Haotian Tang, Muyang Li, Yao Lu, and Song Han. Deep compression autoencoder for efficient high-resolution diffusion models. arXiv preprint arXiv:2410.10733, 2024.
[15] Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2818–2829, 2023.
[16] Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh, and Sachin Katti. Cliffhanger: Scaling performance cliffs in web memory caches. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), pages 379–392, 2016.
[17] DiskPrices.com. Disk prices — current hard drive cost per gigabyte. https://diskprices.com/, 2025. Accessed 2026-04.
[18] Epoch AI. Data on machine learning hardware. https://epoch.ai/data/machine-learning-hardware. Dataset of >170 AI accelerators (GPUs, TPUs) with performance, price, and efficiency metrics. CC-BY 4.0. Accessed 2026-04.
[19] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first international conference on machine learning, 2024.
[20] Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12873–12883, 2021.
[21] Everypixel Journal. Ai image statistics. https://journal.everypixel.com/ai-image-statistics. Accessed: 2026-04.
[22] Facebook. Zstandard - Fast real-time compression algorithm. https://github.com/facebook/zstd. Accessed: 2026-04.
[23] GetDeploying. NVIDIA RTX 5090 gpu guide and pricing. https://getdeploying.com/gpus/nvidia-rtx-5090, 2026. Accessed: 2026-04.
[24] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Pearson, 4th edition, 2018.
[25] Alain Hore and Djemel Ziou. Image quality metrics: Psnr vs. ssim. In 2010 20th international conference on pattern recognition, pages 2366–2369. IEEE, 2010.
[26] JarvisLabs. NVIDIA H100 price guide 2026: GPU costs, cloud pricing & buy vs rent. https://jarvislabs.ai/blog/h100-price, 2026. Accessed: 2026-04.
[27] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[28] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[29] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image generation and editing in latent space, 2025.
[30] Andres Lagar-Cavilla, Junwhan Ahn, Suleiman Souhlal, Neha Agarwal, Radoslaw Burny, Shakeel Butt, Jichuan Chang, Ashwin Chaugule, Nan Deng, Junaid Shahid, Greg Thorat, Adrian Yurtsever, Daniel Zolnowski, Kim Hazelwood, Martin Maas, Thomas Mccauley, and Rohit Sen. Software-defined far memory in warehouse-scale computers. In Proceedings of the 24th Internatio..., 2019.
[31] Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds. In International Conference on Learning Representations (ICLR), 2022.
[32] Martin Loncaric, Niels Jeppesen, and Ben Zinberg. Pcodec: Better compression for numerical sequences, 2025.
[33] Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023.
[34] lz4. LZ4 - Extremely fast compression. https://github.com/lz4/lz4. Accessed: 2026-04.
[35] Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, and Saining Xie. Inference-time scaling for diffusion models beyond scaling denoising steps. arXiv preprint arXiv:2501.09732, 2025.
[36] Xinyin Ma, Gongfan Fang, and Xinchao Wang. DeepCache: Accelerating diffusion models for free. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15762–15772, 2024.
[37] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Syst. J., 9(2):78–117, June 1970.
[38] John C. McCallum. Disk drive prices (1955–2023). https://jcmit.net/diskprice.htm, 2023. Monthly survey of consumer HDD prices from NewEgg.com, 1955–2023. Archived at https://web.archive.org/web/2024/https://jcmit.net/diskprice.htm. Accessed 2026-04.
[39] Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 561–577, 2018.
[40] Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, et al. f4: Facebook’s warm BLOB storage system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 383–398, 2014.
[41] Shadi A Noghabi, Sriram Subramanian, Priyesh Narayanan, Sivabalan Narayanan, Gopalakrishna Holla, Mammad Zadeh, Tianwei Li, Indranil Gupta, and Roy H Campbell. Ambry: Linkedin’s scalable geo-distributed object store. In Proceedings of the 2016 International Conference on Management of Data, pages 253–265, 2016.
[42] NVIDIA. CUDA graphs. https://developer.nvidia.com/blog/cuda-graphs/, 2019.
[43] NVIDIA. TensorRT: High-performance deep learning inference sdk. https://developer.nvidia.com/tensorrt, 2025. Accessed: 2026-04.
[44] PCPartPicker. Price trends: GeForce RTX 4090. https://pcpartpicker.com/trends/price/video-card/#gpu.chipset.geforce-rtx-4090, 2026. Accessed: 2026-04.
[45] PCPartPicker. Price trends: GeForce RTX 5090. https://pcpartpicker.com/trends/price/video-card/#gpu.chipset.geforce-rtx-5090, 2026. Accessed: 2026-04.
[46] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023.
[47] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
[48] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of machine learning research, 21(140):1–67, 2020.
[49] Robi Rahman. Performance per dollar improves around 30% each year. https://epoch.ai/data-insights/price-performance-hardware, 2024. Epoch AI data insight. Accessed 2026-04.
[50] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
[51] Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International conference on machine learning, pages 1278–
[52] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.

work page 2022
[55]

De- noising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. De- noising diffusion implicit models. InInternational Con- ference on Learning Representations (ICLR), 2021

work page 2021
[56]

Tuduce and Thomas Gross

Irina C. Tuduce and Thomas Gross. Adaptive main memory compression. InUSENIX Annual Technical Conference (ATC), pages 237–250, 2005

work page 2005
[57]

Neural dis- crete representation learning.Advances in neural infor- mation processing systems, 30, 2017

Aaron Van Den Oord, Oriol Vinyals, et al. Neural dis- crete representation learning.Advances in neural infor- mation processing systems, 30, 2017

work page 2017
[58]

Waldspurger, Nohhyun Park, Alexander Garth- waite, and Irfan Ahmad

Carl A. Waldspurger, Nohhyun Park, Alexander Garth- waite, and Irfan Ahmad. Efficient MRC construction with SHARDS. In13th USENIX Conference on File and Storage Technologies (FAST 15), pages 95–110, Santa Clara, CA, February 2015. USENIX Association

work page 2015
[59]

TMO: Transparent memory offload- ing in datacenters

Johannes Weiner, Niket Agarwal, Dan Schatzberg, Leon Yang, Hao Wang, Blaise Sanouillet, Bikash Sharma, Tejun Heo, Mayank Jain, Chunqiang Tang, and Dim- itrios Skarlatos. TMO: Transparent memory offload- ing in datacenters. InProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASP- ...

work page 2022
[60]

Wilson, Scott F

Paul R. Wilson, Scott F. Kaplan, and Yannis Smarag- dakis. The case for compressed caching in virtual mem- ory systems. InUSENIX Annual Technical Conference (ATC), pages 101–116, 1999

work page 1999
[61]

Jake Wires, Stephen Ingram, Zachary Drudi, Nicholas J. A. Harvey, and Andrew Warfield. Characterizing stor- age workloads with counter stacks. In11th USENIX Symposium on Operating Systems Design and Imple- mentation (OSDI 14), pages 335–349, Broomfield, CO, October 2014. USENIX Association

work page 2014
[62]

zexpander: a key-value cache with both high performance and fewer misses

Xingbo Wu, Li Zhang, Yandong Wang, Yufei Ren, Michel Hack, and Song Jiang. zexpander: a key-value cache with both high performance and fewer misses. In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys ’16, New York, NY , USA,

work page
[63]

Association for Computing Machinery

work page
[64]

Fifo queues are all you need for cache eviction

Juncheng Yang, Yazhuo Zhang, Ziyue Qiu, Yao Yue, and Rashmi Vinayak. Fifo queues are all you need for cache eviction. InProceedings of the 29th Symposium on Operating Systems Principles, SOSP ’23, page 130–149, New York, NY , USA, 2023. Association for Computing Machinery

work page 2023
[65]

One-step diffusion with distribution match- ing distillation

Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Tae- sung Park. One-step diffusion with distribution match- ing distillation. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 6613–6623, 2024

work page 2024
[66]

Bidm: Pushing the limit of quantization for diffusion models.Advances in Neural Information Pro- cessing Systems, 37:39009–39035, 2024

Xingyu Zheng, Xianglong Liu, Yichen Bian, Xudong Ma, Yulun Zhang, Jiakai Wang, Jinyang Guo, and Hao- tong Qin. Bidm: Pushing the limit of quantization for diffusion models.Advances in Neural Information Pro- cessing Systems, 37:39009–39035, 2024. 15

work page 2024
[67]

Demystifying cache policies for photo stores at scale: A tencent case study

Ke Zhou, Si Sun, Hua Wang, Ping Huang, Xubin He, Rui Lan, Wenyan Li, Wenjie Liu, and Tianming Yang. Demystifying cache policies for photo stores at scale: A tencent case study. InProceedings of the 2018 Interna- tional Conference on Supercomputing, pages 284–294, 2018. 16

work page 2018
