Fleet: Few Shots Lead Effective AI-generated Image Detection

Jiaan Wang; Juan Cao; Kaiyuan Yang; Sheng Tang; Sirui Liu; Yu Li

arxiv: 2606.31082 · v1 · pith:Q3BXF4WKnew · submitted 2026-06-30 · 💻 cs.CV

Jiaan Wang, Sirui Liu, Yu Li, Kaiyuan Yang, Juan Cao, Sheng Tang

Pith reviewed 2026-07-01 06:30 UTC · model grok-4.3

classification 💻 cs.CV

keywords AI-generated image detection · few-shot adaptation · dynamic adaptation · routing correction · decoupled subspaces · AIGI detection · adversarial defense

The pith

Constrained routing correction in decoupled subspaces lets 10-shot adaptation restore AI image detection performance against new generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that static feature spaces learned from past data lose effectiveness as new image generators appear, causing detection accuracy to collapse on models like SD3. It proposes shifting to dynamic few-shot adaptation instead, where models evolve continuously by correcting decision routes rather than retraining freely. This matters because open-world use requires handling rapidly changing generators without waiting for large new datasets. Fleet implements the shift by identifying decoupled subspaces and applying avoidance routing to steer novel AI samples away from real-image routes. Experiments on a 64-model benchmark show the method lifting performance from 20.4 percent to 73.1 percent with only 10 examples from one commercial generator.

Core claim

The paper claims that replacing unconstrained feature updates with constrained routing correction enables effective few-shot alignment to new generative threats. Avoidance routing redirects novel AI samples away from Non-AI-dominated routes inside decoupled subspaces, and the paper supports the claim with large gains on the Treasure benchmark, which spans 64 models and 360k images.

What carries the argument

Constrained routing correction that redirects novel AI samples away from Non-AI-dominated routes within decoupled subspaces.
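
One way to picture the avoidance idea is as a penalty on the routing mass that a novel AI sample places on routes already marked as Non-AI-dominated. The sketch below is a minimal illustration under stated assumptions, not the paper's objective: the softmax routing, the function name `avoidance_penalty`, and the choice of which routes count as Non-AI-dominated are all hypothetical, since this page does not give the actual formulation.

```python
import math


def softmax(logits):
    """Turn raw routing logits into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def avoidance_penalty(routing_logits, non_ai_routes):
    """Routing mass a novel AI sample places on Non-AI-dominated routes.

    Hypothetical stand-in for an avoidance objective: minimizing this
    value pushes the sample's routing weights toward the other routes.
    """
    weights = softmax(routing_logits)
    return sum(weights[k] for k in non_ai_routes)


# Four subspaces; routes 2 and 3 are ASSUMED to be Non-AI-dominated.
non_ai_routes = {2, 3}
before = avoidance_penalty([0.1, 0.2, 2.0, 1.5], non_ai_routes)
after = avoidance_penalty([2.0, 1.5, 0.1, 0.2], non_ai_routes)
```

With these toy logits, `before` is the larger value because most routing mass sits on the two Non-AI routes until the logits shift. A constrained update would aim to lower this penalty by changing the routing rather than by freely rewriting features.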

If this is right

Static detection methods that rely on invariant artifacts suffer catastrophic drops against generators released after training data collection.
10-shot adaptation using routing correction raises accuracy from 20.4 percent to 73.1 percent on Doubao Seedream 4.0.
The Treasure benchmark of 64 models and 360k images exposes the gap between laboratory saturation and open-world performance.
Dynamic adaptation supports continuous evolution against commercial closed-source engines without full retraining.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Detection pipelines may need built-in mechanisms to discover and stabilize new subspaces on the fly as generators appear.
The same routing-correction idea could apply to other domains where adversaries evolve faster than labeled data can be collected.
Long-term monitoring of subspace stability across successive 10-shot updates would test whether the method scales beyond single-adaptation episodes.

Load-bearing premise

That the decoupled subspaces and the avoidance routing mechanism can be identified and stabilized from only 10 new samples without creating failures on the original distribution or on future generators.

What would settle it

Measuring whether accuracy on a new generator outside the 10-shot set falls below 40 percent after adaptation, or whether accuracy on the original held-out test set drops after the routing correction is applied.
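
The two checks can be written down as a small predicate. This is a sketch only: the function and argument names are hypothetical, and the 40 percent floor is the threshold quoted above rather than a value taken from the paper.

```python
def settles_against(original_before, original_after,
                    unseen_generator_after, floor=0.40):
    """Return the failure modes that the measurements trigger.

    All accuracies are fractions in [0, 1]. `floor` mirrors the
    40 percent threshold quoted in the text; names are illustrative.
    """
    failures = []
    if unseen_generator_after < floor:
        failures.append("unseen generator below floor")
    if original_after < original_before:
        failures.append("original distribution regressed")
    return failures
```

An empty list means neither failure mode was observed for that adaptation episode; any non-empty result counts against the load-bearing premise.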

Figures

Figures reproduced from arXiv: 2606.31082 by Jiaan Wang, Juan Cao, Kaiyuan Yang, Sheng Tang, Sirui Liu, Yu Li.

**Figure 1.** Analysis of Benchmarks. (a) Performance of SOTA methods on common datasets. (AIGIBench-13 refers to AIGIBench’s subset of 13 full-graph generation models.) (b) Detection accuracy of SOTA methods on generated images across different release years. The models in the image (from left to right) are: SDv2.1, Midjourney v6, Playground v2.5, SD3, Midjourney v7, Doubao Seedream 3.0, Imagen 4, Nano Banana Pro. …

**Figure 2.** Overview of Treasure Dataset. (a) We have meticulously collected a large number of images generated by current mainstream generative models, assigning authenticity labels and artistic style tags to each image. (b, c) Illustration of image and prompt source distribution. Our dataset maintains excellent diversity in data sources to ensure its ability to simulate real-world inputs. (d, e, f) Comparison with o…

**Figure 3.** Overview of Fleet. The model employs a dual-branch architecture, utilizing the high-frequency signal r_freq to generate subspace routing weights. During pre-training, L_orth and L_cov decouple high-dimensional features into mutually exclusive AI (Red) and Non-AI (Green) subspaces. In the few-shot adaptation phase, L_avoid forces the redirection of feature flows from novel generated images toward the most releva…

**Figure 4.** Sensitivity Analysis. (a) Analysis on replay buffer size. (b) Ablation on number of subspaces.

**Figure 6.** Visualization of Subspace Routing Weights Before and After Few-shot Training. (a) Query set mean routing weights before and after few-shot training. (b) Mean routing weights on AI classes in the AIGIBench validation set. (c) Mean routing weights on Non-AI classes in the AIGIBench validation set.

**Figure 7.** Long-horizon continual few-shot adaptation over 20 sequential generator stages. The upper panel tracks the query accuracy of the first five adapted generators during later stages. The lower panel shows zero-shot, before-training, and after-training accuracy for each newly introduced generator, along with the pretraining validation accuracy and the decreasing pretraining-data ratio in the replay buffer.

Original abstract

AI-generated image (AIGI) detection is undergoing a critical transition from laboratory benchmarks to open-world adversarial defense. The prevalent paradigm focuses on finding static feature spaces, assuming that some invariant artifacts learned from historical data can achieve universal zero-shot generalization. While achieving saturation on several AIGI benchmarks, this static hypothesis suffers a severe performance drop against rapidly evolving generators (e.g., SD3, Nano Banana Pro). To address these limitations, we propose that the field should expand beyond "static generalization" to a new paradigm of "dynamic adaptation". We introduce Fleet, a framework that pioneers a dynamic paradigm of continuous few-shot evolution, enabling rapid alignment with emerging generative threats. Fleet improves few-shot adaptation by replacing unconstrained feature updates with constrained routing correction, where avoidance routing redirects novel AI samples away from Non-AI-dominated routes within decoupled subspaces. To validate this, we present Treasure, a benchmark spanning 64 models and 360k images, featuring diverse architectures and 20 closed-source commercial engines. Experiments reveal that while static SOTA methods fail catastrophically on modern generators, Fleet restores performance from 20.4% to 73.1% with only 10-shot adaptation on "Doubao Seedream 4.0". Code and data are available at https://github.com/ICTMCG/Fleet .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fleet claims a big practical fix for AIGI detectors on new generators via constrained routing in subspaces, but the mechanism for discovering those subspaces from 10 shots stays unspecified.

The paper's main point is that static AIGI detectors drop hard on fresh generators like SD3 or Doubao Seedream 4.0, and Fleet tries to recover performance through few-shot dynamic adaptation. It replaces plain feature updates with constrained routing correction that pushes new AI samples away from Non-AI routes inside decoupled subspaces, reporting a lift from 20.4% to 73.1% accuracy with 10 shots on that one model. They also release Treasure, a benchmark with 64 models and 360k images that includes closed-source engines.

The shift from static to dynamic adaptation is the clearest new angle, and the benchmark itself looks like a useful addition for anyone testing against real commercial generators. If the routing correction holds up, it directly targets a failure mode that matters for platforms and forensics.

The soft spot is exactly the one the stress-test flags: the abstract and available text give no account of how the subspaces get identified, how Non-AI-dominated routes are located, or how the avoidance routing is computed and kept stable from only 10 samples. Without those steps, the reported gain cannot be checked for reproducibility or side effects on the original distribution. No error bars, dataset splits, or ablation numbers appear either, so the central improvement stays hard to evaluate.

This is for people working on AIGI detection who already know the static paradigm is brittle and want to see concrete adaptation attempts. The benchmark alone could be worth a look for testing pipelines.

It should go to peer review because the problem is real, the benchmark is new and broad, and the empirical claim is large enough to need proper scrutiny even if the method description needs expansion.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes Fleet, a framework for dynamic few-shot adaptation in AI-generated image detection. It argues that static feature spaces fail against new generators and introduces constrained routing correction in decoupled subspaces, where avoidance routing redirects novel AI samples away from Non-AI-dominated routes. The central empirical claim is restoration of accuracy from 20.4% to 73.1% with 10-shot adaptation on Doubao Seedream 4.0, validated on the new Treasure benchmark spanning 64 models and 360k images.

Significance. If the result holds, the work would be significant for advocating a shift from static generalization to continuous dynamic adaptation in open-world AIGI detection, a timely concern given evolving generators. The Treasure benchmark, covering diverse architectures and commercial engines, represents a useful community resource. The reported gains highlight the potential of constrained adaptation, though their broader applicability depends on the stability of the proposed mechanism.

major comments (2)

[Abstract] The avoidance routing mechanism within decoupled subspaces is described only at a high level, with no details on how the subspaces are discovered, how Non-AI-dominated routes are identified, or how the routing correction is computed and stabilized from exactly 10 samples. This mechanism is load-bearing for the central claim of the 20.4% to 73.1% performance restoration.
[Experiments] The reported accuracies on Doubao Seedream 4.0 and across the Treasure benchmark are given without error bars, details on 10-shot selection or splits, or ablation studies isolating the effect of the routing correction versus unconstrained updates. This undermines evaluation of the cross-generator claims.

minor comments (1)

[Abstract] The abstract provides a GitHub link for code and data; ensure the repository includes full experimental protocols, benchmark splits, and implementation of the subspace discovery procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major point below and indicate where revisions will be made.

Referee: [Abstract] The avoidance routing mechanism within decoupled subspaces is described only at a high level, with no details on how the subspaces are discovered, how Non-AI-dominated routes are identified, or how the routing correction is computed and stabilized from exactly 10 samples. This mechanism is load-bearing for the central claim of the 20.4% to 73.1% performance restoration.

Authors: The abstract is intentionally concise, but Section 3.2 of the manuscript details subspace discovery via principal component analysis on the feature extractor, identification of Non-AI-dominated routes by majority vote on routing statistics from the base training set, and the exact formulation of the constrained routing correction (including the avoidance term and its stabilization via a small regularization coefficient fitted on the 10-shot support set). We will expand the abstract with a one-sentence pointer to these elements for improved readability.

revision: partial

Referee: [Experiments] The reported accuracies on Doubao Seedream 4.0 and across the Treasure benchmark are given without error bars, details on 10-shot selection or splits, or ablation studies isolating the effect of the routing correction versus unconstrained updates. This undermines evaluation of the cross-generator claims.

Authors: We agree that error bars, an explicit 10-shot sampling protocol (random stratified selection with fixed seeds), train/validation splits on the support set, and ablations contrasting constrained routing against unconstrained fine-tuning are currently missing. These will be added in the revision, including standard deviations over five independent 10-shot draws and a dedicated ablation table.

revision: yes
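
The protocol the simulated rebuttal promises (repeatable draws with fixed seeds, then a spread over five draws) could look like the following. It is a sketch under assumptions: the rebuttal does not say what the draws are stratified by, so the strata here are hypothetical labels, and `draw_support` and `summarize` are illustrative names rather than anything from the Fleet repository.

```python
import random
import statistics


def draw_support(pool, shots, seed):
    """Draw `shots` items, cycling over strata so each is represented.

    `pool` maps a stratum name (hypothetical, e.g. a style tag) to its
    candidate image ids. A fixed seed makes the draw repeatable.
    """
    rng = random.Random(seed)
    # Shuffle a copy of each stratum; the caller's lists are not modified.
    shuffled = {name: rng.sample(items, len(items))
                for name, items in sorted(pool.items())}
    support = []
    while len(support) < shots:
        progressed = False
        for name in shuffled:
            if shuffled[name] and len(support) < shots:
                support.append(shuffled[name].pop())
                progressed = True
        if not progressed:
            raise ValueError("pool holds fewer items than requested shots")
    return support


def summarize(accuracies):
    """Mean and sample standard deviation over independent draws."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Toy pool with two made-up strata of 50 image ids each.
pool = {"photo": list(range(0, 50)), "anime": list(range(50, 100))}
draws = [draw_support(pool, shots=10, seed=s) for s in range(5)]
```

Each seed yields one 10-shot support set. Adapting once per set and passing the five resulting accuracies to `summarize` would give the mean and standard deviation the referee asks for.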

Circularity Check

0 steps flagged

No circularity: empirical adaptation method with independent benchmark validation

The paper presents Fleet as an empirical framework for few-shot adaptation in AIGI detection, replacing unconstrained updates with constrained routing correction in decoupled subspaces. No equations, fitted parameters, or derivations are described that reduce the reported accuracy gains (e.g., 20.4% to 73.1%) to the same inputs by construction. The Treasure benchmark (64 models, 360k images) and 10-shot results on Doubao Seedream 4.0 are presented as external validation rather than self-referential fits. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The derivation chain is self-contained as a proposed method tested on new data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unverified existence and separability of Non-AI-dominated routes and the assumption that 10-shot samples suffice to identify and correct routing without side effects; no free parameters or invented physical entities are named, but the routing constructs are introduced without prior independent evidence.

axioms (1)

domain assumption: Feature space contains identifiable Non-AI-dominated routes that can be redirected via constrained updates
Invoked to justify why routing correction works with few shots instead of unconstrained updates.
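
The simulated rebuttal describes locating such routes by majority vote on routing statistics from the base training set. A minimal sketch of that reading follows; the input format, the label strings, and the function name are assumptions for illustration, and the paper's actual procedure may differ.

```python
from collections import Counter, defaultdict


def non_ai_dominated_routes(assignments):
    """Routes where Non-AI samples outnumber AI samples.

    `assignments` is an iterable of (route_index, label) pairs with
    label "ai" or "non_ai"; both the format and labels are illustrative.
    """
    per_route = defaultdict(Counter)
    for route, label in assignments:
        per_route[route][label] += 1
    return {route for route, counts in per_route.items()
            if counts["non_ai"] > counts["ai"]}


# Toy routing statistics: (route the sample was sent to, its label).
toy = [(0, "ai"), (0, "ai"), (0, "non_ai"),
       (1, "non_ai"), (1, "non_ai"), (1, "ai"),
       (2, "non_ai")]
routes = non_ai_dominated_routes(toy)  # {1, 2}
```

Ties are treated as not dominated here, which is a design choice of this sketch. Whether such a vote is stable when the routing statistics shift is exactly what the axiom leaves unverified.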

invented entities (1)

avoidance routing in decoupled subspaces (no independent evidence)
purpose: To enable stable few-shot correction without overfitting
New mechanism introduced to solve the static generalization failure; no independent evidence outside the paper is provided in the abstract.
