Evaluating Concept Filtering Defenses against Child Sexual Abuse Material Generation by Text-to-Image Models

Amro Abdalla; Ana-Maria Cretu; Carmela Troncoso; Elissa M. Redmiles; Klim Kireev; Raphael Meier; Sarah Adel Bargal; Wisdom Obinna

arxiv: 2512.05707 · v2 · submitted 2025-12-05 · 💻 cs.CR

Pith reviewed 2026-05-17 01:11 UTC · model grok-4.3

keywords: CSAM generation · text-to-image models · concept filtering · machine learning security · fine-tuning · generative models · dataset filtering · child images

The pith

Current child filtering methods offer limited protection to closed-weight text-to-image models and none to open-weight models against CSAM generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether removing images of children from training datasets can prevent text-to-image models from being misused to create child sexual abuse material. It frames the defense problem with a game-based security definition that accounts for attacker prompting strategies and query budgets. Experiments using an ethical proxy task of generating images of children wearing glasses show that even small amounts of residual child data allow generation with only modestly more queries than on unfiltered models. Fine-tuning further reduces this overhead and can re-introduce the child concept even after perfect filtering. The results indicate that filtering reduces the model's ability to handle child-related concepts in general while providing only partial or no real defense.

Core claim

The authors establish that detection methods cannot remove all child images from datasets, so residual examples remain in the training data. With the child-wearing-glasses proxy, they demonstrate that prompting strategies succeed in generating the target concept using only a few more queries than on a model trained on unfiltered data, and that fine-tuning on child images further reduces that added cost. Even perfect filtering can be bypassed by subsequent fine-tuning that re-introduces the concept. These outcomes translate to limited protection for closed-weight models and no protection for open-weight models, accompanied by reduced model generality, because filtering hinders the generation of child-related concepts or changes their representation.

What carries the argument

The game-based security definition that models defender filtering against attacker prompting and query budgets, evaluated through the ethical proxy of generating images of a child wearing glasses.
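The role of the query budget can be illustrated with a small calculation. The sketch below is not the paper's security game; it assumes, purely for illustration, that each query succeeds independently with a fixed probability, and it uses hypothetical success rates (`P_UNFILTERED`, `P_FILTERED`) that are not taken from the paper. Under those assumptions it computes the chance of at least one success within a budget and the extra queries a filtered model would cost an attacker.

```python
import math


def success_within_budget(p: float, budget: int) -> float:
    """Probability of at least one success in `budget` independent queries."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return 1.0 - (1.0 - p) ** budget


def queries_needed(p: float, target: float) -> int:
    """Smallest budget whose success probability reaches `target`."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical per-query success rates, chosen only to show the arithmetic.
P_UNFILTERED = 0.5
P_FILTERED = 0.25
TARGET = 0.9

# Extra queries the filtered model costs under the independence assumption.
overhead = queries_needed(P_FILTERED, TARGET) - queries_needed(P_UNFILTERED, TARGET)
```

The point of the sketch is that a defense which only lowers the per-query success rate raises the required budget additively rather than making the target unreachable, which is why the query budget has to be part of the threat model.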

Load-bearing premise

That the proxy task of generating images of a child wearing glasses sufficiently captures the dynamics of generating actual CSAM and that the game-based security definition accurately reflects realistic attacker capabilities and query budgets.

What would settle it

An experiment in which no sequence of prompts or fine-tuning on child images succeeds in producing child-related outputs on a model trained after complete filtering, or in which the additional query overhead remains orders of magnitude higher than on unfiltered data.

Figures

Figures reproduced from arXiv: 2512.05707 by Amro Abdalla, Ana-Maria Cretu, Carmela Troncoso, Elissa M. Redmiles, Klim Kireev, Raphael Meier, Sarah Adel Bargal, Wisdom Obinna.

**Figure 2.** Raters’ confidence that images in each experi...

**Figure 3.** The estimated probability of obtaining at least...

**Figure 5.** Age shift in images produced by CC3M (left) and LAION-face (right) models in response to heuristic...

**Figure 6.** Examples of images generated in the Sprigatito experiments (one row per experiment). Images in the...

**Figure 7.** Convergence curves for unfiltered CC3M- and LAION-Face trained models, shown as CMMD scores...

**Figure 8.** Examples of images produced by CC3M (left) and LAION-Face (right) models in response to prompts...

**Figure 9.** Examples of images produced by CC3M (left) and LAION-Face (right) models in response to prompts...

**Figure 10.** Examples of images produced by CC3M (left) and LAION-Face (right) models in response to prompts...

Original abstract

We evaluate the effectiveness of filtering child images from training datasets of text-to-image models to prevent model misuse to create child sexual abuse material (CSAM). First, we capture the complexity of preventing CSAM generation using a game-based security definition. Second, we show that current detection methods cannot remove all children from a dataset. Third, using an ethical proxy for CSAM (a child wearing glasses), we show that even when only a small percentage of child images are left in the training dataset after filtering, there exist prompting strategies that generate a child wearing glasses using only a few more queries than when the model is trained on the unfiltered data. Fine-tuning the filtered model on child images further reduces the additional query overhead. We also show that re-introducing a concept is possible via fine-tuning even if filtering is perfect. Our results show that current child filtering methods offer limited protection to closed-weight models and no protection to open-weight models, while reducing the generality of the model by hindering the generation of child-related concepts or changing their representation. We conclude by outlining challenges in conducting evaluations that establish robust evidence on the impact of concept filtering defenses for CSAM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Filtering leaves T2I models open to child-image generation via prompting or fine-tuning. The main open question is how far the results obtained with the glasses proxy extend to real CSAM.

The core finding is that child-concept filtering in text-to-image training data gives only limited protection on closed models and none on open ones. Prompting strategies recover the target with modest extra queries, and fine-tuning cuts that overhead further. Even perfect filtering can be undone by later fine-tuning on child images. The paper also notes the side effect of reduced model generality on child-related outputs overall. This matches what the experiments show on both attack success and the generality cost.

Referee Report

2 major / 2 minor

Summary. The paper evaluates the effectiveness of filtering child images from training datasets of text-to-image models to prevent CSAM generation. It introduces a game-based security definition, shows that current detection methods leave residual child images in datasets, and uses an ethical proxy task (generating images of a child wearing glasses) to demonstrate that prompting strategies can produce the proxy concept with only modestly more queries than on unfiltered models. Fine-tuning on child images further reduces query overhead, and the work concludes that filtering offers limited protection to closed-weight models and none to open-weight models while also reducing model generality for child-related concepts.

Significance. If the proxy results generalize, the findings would highlight important practical limitations of concept filtering as a defense against misuse of T2I models. The game-based security definition provides a structured threat model, and the empirical demonstration of fine-tuning recovery even under perfect filtering is a useful observation for the AI safety community.

major comments (2)

[Section describing the ethical proxy and experimental results] The central claim that filtering provides only limited or no protection against CSAM rests on experiments with the proxy of generating images of a child wearing glasses. The manuscript provides no direct comparison, ablation, or analysis showing that this non-sexual child concept exhibits the same filtering resistance, prompting sensitivity, or fine-tuning recovery dynamics as explicit CSAM concepts (which involve sexual content that may engage different internal representations or safety alignments). Without such validation, the measured query overheads and protection levels do not necessarily generalize to actual CSAM.
[Experimental evaluation and results sections] The reported experimental outcomes lack sufficient detail on exact models, training dataset sizes, number of trials or queries per condition, statistical significance testing, or error bars. This directly affects the ability to assess the reliability of the claims about 'a few more queries' and the differential protection levels between closed- and open-weight models.

minor comments (2)

Clarify the precise attacker query budget and capabilities assumed in the game-based security definition, including any concrete examples of prompting strategies tested.
Add discussion of potential limitations or failure modes of the proxy approach in the conclusion or dedicated limitations section.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments, which have helped clarify the scope and presentation of our results. We respond point-by-point to the major comments below, indicating revisions made to the manuscript.

Referee: [Section describing the ethical proxy and experimental results] The central claim that filtering provides only limited or no protection against CSAM rests on experiments with the proxy of generating images of a child wearing glasses. The manuscript provides no direct comparison, ablation, or analysis showing that this non-sexual child concept exhibits the same filtering resistance, prompting sensitivity, or fine-tuning recovery dynamics as explicit CSAM concepts (which involve sexual content that may engage different internal representations or safety alignments). Without such validation, the measured query overheads and protection levels do not necessarily generalize to actual CSAM.

Authors: We agree that direct validation against explicit CSAM would strengthen the work but is not feasible. The proxy was chosen to isolate the child-generation capability that underlies CSAM prompts while remaining within ethical bounds. In the revised manuscript we have added a new subsection in the Discussion that explains this rationale, references prior studies on hierarchical concept learning in diffusion models, and explicitly states that results pertain to child-concept filtering rather than claiming identical dynamics for all sexualized variants. Claims have been tempered accordingly.
Revision: partial
Referee: [Experimental evaluation and results sections] The reported experimental outcomes lack sufficient detail on exact models, training dataset sizes, number of trials or queries per condition, statistical significance testing, or error bars. This directly affects the ability to assess the reliability of the claims about 'a few more queries' and the differential protection levels between closed- and open-weight models.

Authors: We accept this criticism. The revised Experimental Setup and Results sections now specify the exact models (Stable Diffusion v1.5 and v2.1), pre- and post-filter dataset sizes, number of trials (30 independent runs per condition), query counts per strategy, standard-error bars, and statistical comparisons (two-sided t-tests with reported p-values). These additions directly address concerns about reliability and reproducibility.
Revision: yes

standing simulated objections not resolved

Direct empirical comparison or ablation using explicit CSAM prompts or training data, which is prohibited by ethical review boards and applicable laws.

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with independent experimental measurements

The paper conducts an empirical study of filtering effectiveness using a game-based security definition, proxy tasks (child wearing glasses), and measurements of query overheads and fine-tuning recovery. No derivations, equations, or fitted parameters are presented as predictions that reduce to the inputs by construction. Claims rest on replicable experimental results rather than self-referential definitions or self-citation chains that bear the central load. The proxy choice and security model are stated assumptions open to external validation, not tautologies.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the representativeness of the chosen proxy and the adequacy of the security game to model real attackers; no free parameters or invented entities are introduced.

axioms (2)

domain assumption: The proxy concept (child wearing glasses) exhibits filtering and generation behavior sufficiently similar to actual CSAM concepts for the purpose of evaluating defenses.
    Invoked to enable ethical experimentation while claiming the results generalize to CSAM prevention.
domain assumption: The game-based security definition captures the relevant attacker model including query budget and access level.
    Used to frame the evaluation of filtering effectiveness.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We evaluate the effectiveness of filtering child images from training datasets of text-to-image models... using an ethical proxy for CSAM (a child wearing glasses)

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: AIG-CSAM security game G(A, M, L, l-bar)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

How to Stop Playing Whack-a-Mole: Mapping the Ecosystem of Technologies Facilitating AI-Generated Non-Consensual Intimate Images
cs.CY · 2026-02 · unverdicted · novelty 7.0
The paper introduces the first comprehensive taxonomy and visualization of 11 categories of technologies facilitating AI-generated non-consensual intimate images, derived from synthesis of primary sources and demonstr...

"Unlimited Realm of Exploration and Experimentation": Methods and Motivations of AI-Generated Sexual Content Creators
cs.CY · 2026-01 · conditional · novelty 7.0
Interviews with 28 AIG-SC creators show motivations spanning sexual exploration, creative expression, technical experimentation, and occasional production of non-consensual intimate imagery.

Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
cs.LG · 2026-04 · unverdicted · novelty 6.0
Gaussian probing infers harmful model specialization from parameter perturbations and internal representation responses to Gaussian latent ensembles rather than from generated outputs.

The Algorithmic Gaze of Image Quality Assessment: An Audit and Trace Ethnography of the LAION-Aesthetics Predictor
cs.HC · 2026-01 · conditional · novelty 6.0
LAION-Aesthetics Predictor reinforces Western and male biases by preferentially selecting images associated with women and realistic Western/Japanese art while excluding men, LGBTQ+ references, and other styles.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 4 Pith papers · 3 internal anchors

[1]

United States Code, Title 18, Crimes and Criminal Procedure, Chapter 71

18 u.s.c.§1466a obscene visual representations of the sexual abuse of children, 2003. United States Code, Title 18, Crimes and Criminal Procedure, Chapter 71

work page 2003
[2]

Neglected risks: The disturbing reality of children’s images in datasets and the urgent call for accountability

Carlos Caetano, Gabriel O dos Santos, Caio Petrucci, Artur Barros, Camila Laranjeira, Leo Sampaio Ferraz Ribeiro, J´ ulia Fernandes de Mendon¸ ca, Jefersson A dos Santos, and Sandra Avila. Neglected risks: The disturbing reality of children’s images in datasets and the urgent call for accountability. InACM FACCT, 2025

work page 2025
[3]

Psychological perspectives of virtual child sexual abuse material.Sexuality & Culture, 2021

Larissa S Christensen, Dominique Moritz, and Ashley Pearson. Psychological perspectives of virtual child sexual abuse material.Sexuality & Culture, 2021

work page 2021
[4]

Stable diffusion v1-4 model card.https://huggingface.co/CompVis/stable-diffusion-v1-4,

CompVis. Stable diffusion v1-4 model card.https://huggingface.co/CompVis/stable-diffusion-v1-4,

work page
[5]

Accessed: 2025-11-03

work page 2025
[6]

Feder Cooper, Christopher A

A Feder Cooper, Christopher A Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, et al. Machine Unlearning Doesn’t Do What You Think: Lessons for Generative AI Policy, Research, and Practice.arXiv preprint arXiv:2412.06966, 2024

work page arXiv 2024
[7]

The General-Purpose AI Code of Practice, 2025

European Commission. The General-Purpose AI Code of Practice, 2025

work page 2025
[8]

Child Sexual Abuse Material Created by Generative AI and Similar Online Tools is Illegal, 2024

FBI. Child Sexual Abuse Material Created by Generative AI and Similar Online Tools is Illegal, 2024

work page 2024
[9]

Unified concept editing in diffusion models

Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzy´ nska, and David Bau. Unified concept editing in diffusion models. InWACV, 2024

work page 2024
[10]

Measuring the uncanny valley effect: Refinements to indices for perceived humanness, attractiveness, and eeriness.International Journal of Social Robotics, 2017

Chin-Chang Ho and Karl F MacDorman. Measuring the uncanny valley effect: Refinements to indices for perceived humanness, attractiveness, and eeriness.International Journal of Social Robotics, 2017

work page 2017
[11]

Lora: Low-rank adaptation of large language models.ICLR, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models.ICLR, 2022

work page 2022
[12]

Global CSAM Legislative Overview: An overview of national CSAM legislations in INHOPE Member Countries and the Lanzarote Convention State Parties

International Association of Internet Hotlines. Global CSAM Legislative Overview: An overview of national CSAM legislations in INHOPE Member Countries and the Lanzarote Convention State Parties. Technical report, 2024. Second edition. 18

work page 2024
[13]

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking FID: Towards a Better Evaluation Metric for Image Generation. InCVPR, 2024

work page 2024
[14]

Fairface: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation

Kimmo Karkkainen and Jungseock Joo. Fairface: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. InWACV, 2021

work page 2021
[15]

A manually annotated image-caption dataset for detecting children in the wild.arXiv preprint arXiv:2506.10117, 2025

Klim Kireev, Ana-Maria Cret ¸u, Raphael Meier, Sarah Adel Bargal, Elissa Redmiles, and Carmela Troncoso. A manually annotated image-caption dataset for detecting children in the wild.arXiv preprint arXiv:2506.10117, 2025

work page arXiv 2025
[16]

The challenges of identifying and classifying child sexual abuse material.Sexual Abuse, 2019

Juliane A Kloess, Jessica Woodhams, Helen Whittle, Tim Grant, and Catherine E Hamilton-Giachritsis. The challenges of identifying and classifying child sexual abuse material.Sexual Abuse, 2019

work page 2019
[17]

Unveiling AI’s Threats to Child Protection: Regula- tory efforts to Criminalize AI-Generated CSAM and Emerging Children’s Rights Violations.arXiv preprint arXiv:2503.00433, 2025

Emmanouela Kokolaki and Paraskevi Fragopoulou. Unveiling AI’s Threats to Child Protection: Regula- tory efforts to Criminalize AI-Generated CSAM and Emerging Children’s Rights Violations.arXiv preprint arXiv:2503.00433, 2025

work page arXiv 2025
[18]

Mivolo: Multi-input transformer for age and gender estimation

Maksim Kuprashevich and Irina Tolstykh. Mivolo: Multi-input transformer for age and gender estimation. In AIST, 2023

work page 2023
[19]

Schr¨ odinger’s Crime: AI-generated Child Sexual Abuse Material as a Victimless Offense

Maria Lazaridou. Schr¨ odinger’s Crime: AI-generated Child Sexual Abuse Material as a Victimless Offense. Master’s thesis, Utrecht University, 2025

work page 2025
[20]

Microsoft coco: Common objects in context

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. InComputer vision–ECCV 2014: 13th European conference, zurich, Switzerland, September 6-12, 2014, proceedings, part v 13, pages 740–755. Springer, 2014

work page 2014
[21]

DeepSeek-V3 Technical Report

Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[22]

Visual instruction tuning.NeurIPS, 2023

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.NeurIPS, 2023

work page 2023
[23]

Public comment: CSAM Sentencing Enhancements 50-State Comparison, 2025

Mary-Dulany James. Public comment: CSAM Sentencing Enhancements 50-State Comparison, 2025

work page 2025
[24]

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[25]

Compositional abilities emerge multi- plicatively: Exploring diffusion models on a synthetic task.NeurIPS, 2024

Maya Okawa, Ekdeep S Lubana, Robert Dick, and Hidenori Tanaka. Compositional abilities emerge multi- plicatively: Exploring diffusion models on a synthetic task.NeurIPS, 2024

work page 2024
[26]

One shot lora.https://oneshotlora.com/gudrun/index.html, 2025

OneShotLoRA. One shot lora.https://oneshotlora.com/gudrun/index.html, 2025. Accessed: 2025-11-03

work page 2025
[27]

Introducing vision to the fine-tuning API.https://openai.com/index/introducing-vision-t o-the-fine-tuning-api/

Open AI. Introducing vision to the fine-tuning API.https://openai.com/index/introducing-vision-t o-the-fine-tuning-api/. Accessed: 10-07-2025

work page 2025
[28]

A call to reflect on evaluation practices for age estimation: comparative analysis of the state-of-the-art and a unified benchmark

Jakub Paplh´ am, Vojt Franc, et al. A call to reflect on evaluation practices for age estimation: comparative analysis of the state-of-the-art and a unified benchmark. InCVPR, 2024

work page 2024
[29]

Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms.PNAS, 2018

P Jonathon Phillips, Amy N Yates, Ying Hu, Carina A Hahn, Eilidh Noyes, Kelsey Jackson, Jacqueline G Cavazos, G´ eraldine Jeckeln, Rajeev Ranjan, Swami Sankaranarayanan, et al. Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms.PNAS, 2018

work page 2018
[30]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InICML, 2021

work page 2021
[31]

One day this could happen to me

Children’s Commisisioner’s. “One day this could happen to me” Children, nudification tools and sexually explicit deepfakes, 2025

work page 2025
[32]

State Laws Criminalizing AI-generated or Computer-Edited CSAM, 2025

Enough abuse. State Laws Criminalizing AI-generated or Computer-Edited CSAM, 2025. 19

work page 2025
[33]

How AI is being abused to create child sexual abuse imagery

Internet Watch Foundation. How AI is being abused to create child sexual abuse imagery. Technical report, 2023

work page 2023
[34]

Stablediffusion training with mosaic ml.https://github.com/mosaicml/diffusion, 2023

Mosaic ML. Stablediffusion training with mosaic ml.https://github.com/mosaicml/diffusion, 2023. Accessed: 2025-11-03

work page 2023
[35]

Reducing risks posed by synthetic content an overview of technical approaches to digital content transparency., 2024

National Institute of Standards and Technology. Reducing risks posed by synthetic content an overview of technical approaches to digital content transparency., 2024

work page 2024
[36]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj¨ orn Ommer. High-resolution image synthesis with latent diffusion models. InCVPR, 2022

work page 2022
[37]

Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. Dream- booth: Fine tuning text-to-image diffusion models for subject-driven generation. InCVPR, 2023

work page 2023
[38]

Photorealistic text-to-image diffusion models with deep language understanding.NeurIPS, 2022

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding.NeurIPS, 2022

work page 2022
[39]

Laion-5b: An open large-scale dataset for training next generation image-text models.NeurIPS, 2022

Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. Laion-5b: An open large-scale dataset for training next generation image-text models.NeurIPS, 2022

work page 2022
[40]

LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs.arXiv preprint arXiv:2111.02114, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[41]

OpenAI, Meta and Google Sign On to New Child Exploitation Safety Measures

Deepa Seetharaman. OpenAI, Meta and Google Sign On to New Child Exploitation Safety Measures. Wall Street Journal, 2024

work page 2024
[42]

Stretching each dollar: Diffusion training from scratch on a micro-budget

Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, and Lingjuan Lyu. Stretching each dollar: Diffusion training from scratch on a micro-budget. InCVPR, 2025

work page 2025
[43]

Conceptual captions: A cleaned, hyper- nymed, image alt-text dataset for automatic image captioning

Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. Conceptual captions: A cleaned, hyper- nymed, image alt-text dataset for automatic image captioning. InACL, 2018

work page 2018
[44]

Generative ML and CSAM: Implications and mitigations

David Thiel, Melissa Stroebel, and Rebecca Portnoff. Generative ML and CSAM: Implications and mitigations. InStanford digital repository. 2023

work page 2023
[45]

A Pedophile Filmed Kids At Disney World To Make AI Child Abuse Images, Cops Say

Brewster Thomas. A Pedophile Filmed Kids At Disney World To Make AI Child Abuse Images, Cops Say. Forbes, 2024

work page 2024
[46]

Thorn Safety by Design for Generative AI: Preventing Child Sexual Abuse, 2024

Thorn & ATIH. Thorn Safety by Design for Generative AI: Preventing Child Sexual Abuse, 2024

work page 2024
[47]

Child Sexual Abuse Material, 2023

United States Department of Justice. Child Sexual Abuse Material, 2023

work page 2023
[48]

Approach and avoidance tendencies toward picture stimuli of (pre-) pubescent children and adults: An investigation in pedophilic and nonpedophilic samples.Sexual Abuse, 2018

K Weidacker, C K¨ argel, C Massau, S Weiß, J Kneer, THC Krueger, and B Schiffer. Approach and avoidance tendencies toward picture stimuli of (pre-) pubescent children and adults: An investigation in pedophilic and nonpedophilic samples.Sexual Abuse, 2018

work page 2018
[49]

Image-perfect imperfections: Safety, bias, and authenticity in the shadow of text-to-image model evolution

Yixin Wu, Yun Shen, Michael Backes, and Yang Zhang. Image-perfect imperfections: Safety, bias, and authenticity in the shadow of text-to-image model evolution. InACM CCS, 2024

work page 2024
[50]

yes” or “no

Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, and Fang Wen. General facial representation learning in a visual-linguistic manner. InCVPR, 2022. 20 Appendix A Ethics considerations Child detection benchmarking.To identify the best child detector, we have adapted existing methods to the child d...

work page 2022

[1] [1]

United States Code, Title 18, Crimes and Criminal Procedure, Chapter 71

18 u.s.c.§1466a obscene visual representations of the sexual abuse of children, 2003. United States Code, Title 18, Crimes and Criminal Procedure, Chapter 71

work page 2003

[2] Carlos Caetano, Gabriel O dos Santos, Caio Petrucci, Artur Barros, Camila Laranjeira, Leo Sampaio Ferraz Ribeiro, Júlia Fernandes de Mendonça, Jefersson A dos Santos, and Sandra Avila. Neglected risks: The disturbing reality of children’s images in datasets and the urgent call for accountability. In ACM FAccT, 2025.

[3] Larissa S Christensen, Dominique Moritz, and Ashley Pearson. Psychological perspectives of virtual child sexual abuse material. Sexuality & Culture, 2021.

[4] CompVis. Stable diffusion v1-4 model card. https://huggingface.co/CompVis/stable-diffusion-v1-4. Accessed: 2025-11-03.

[6] A Feder Cooper, Christopher A Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, Ken Ziyu Liu, Alexandra Chouldechova, Jamie Hayes, Yangsibo Huang, Niloofar Mireshghallah, et al. Machine Unlearning Doesn’t Do What You Think: Lessons for Generative AI Policy, Research, and Practice. arXiv preprint arXiv:2412.06966, 2024.

[7] European Commission. The General-Purpose AI Code of Practice, 2025.

[8] FBI. Child Sexual Abuse Material Created by Generative AI and Similar Online Tools is Illegal, 2024.

[9] Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, and David Bau. Unified concept editing in diffusion models. In WACV, 2024.

[10] Chin-Chang Ho and Karl F MacDorman. Measuring the uncanny valley effect: Refinements to indices for perceived humanness, attractiveness, and eeriness. International Journal of Social Robotics, 2017.

[11] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. ICLR, 2022.

[12] International Association of Internet Hotlines. Global CSAM Legislative Overview: An overview of national CSAM legislations in INHOPE Member Countries and the Lanzarote Convention State Parties. Technical report, 2024. Second edition.

[13] Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking FID: Towards a Better Evaluation Metric for Image Generation. In CVPR, 2024.

[14] Kimmo Karkkainen and Jungseock Joo. FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. In WACV, 2021.

[15] Klim Kireev, Ana-Maria Crețu, Raphael Meier, Sarah Adel Bargal, Elissa Redmiles, and Carmela Troncoso. A manually annotated image-caption dataset for detecting children in the wild. arXiv preprint arXiv:2506.10117, 2025.

[16] Juliane A Kloess, Jessica Woodhams, Helen Whittle, Tim Grant, and Catherine E Hamilton-Giachritsis. The challenges of identifying and classifying child sexual abuse material. Sexual Abuse, 2019.

[17] Emmanouela Kokolaki and Paraskevi Fragopoulou. Unveiling AI’s Threats to Child Protection: Regulatory efforts to Criminalize AI-Generated CSAM and Emerging Children’s Rights Violations. arXiv preprint arXiv:2503.00433, 2025.

[18] Maksim Kuprashevich and Irina Tolstykh. MiVOLO: Multi-input transformer for age and gender estimation. In AIST, 2023.

[19] Maria Lazaridou. Schrödinger’s Crime: AI-generated Child Sexual Abuse Material as a Victimless Offense. Master’s thesis, Utrecht University, 2025.

[20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.

[21] Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. DeepSeek-V3 Technical Report. arXiv preprint arXiv:2412.19437, 2024.

[22] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. NeurIPS, 2023.

[23] Mary-Dulany James. Public comment: CSAM Sentencing Enhancements 50-State Comparison, 2025.

[24] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arXiv preprint arXiv:2112.10741, 2021.

[25] Maya Okawa, Ekdeep S Lubana, Robert Dick, and Hidenori Tanaka. Compositional abilities emerge multiplicatively: Exploring diffusion models on a synthetic task. NeurIPS, 2024.

[26] OneShotLoRA. One shot lora. https://oneshotlora.com/gudrun/index.html, 2025. Accessed: 2025-11-03.

[27] OpenAI. Introducing vision to the fine-tuning API. https://openai.com/index/introducing-vision-to-the-fine-tuning-api/. Accessed: 10-07-2025.

[28] Jakub Paplhám, Vojtěch Franc, et al. A call to reflect on evaluation practices for age estimation: comparative analysis of the state-of-the-art and a unified benchmark. In CVPR, 2024.

[29] P Jonathon Phillips, Amy N Yates, Ying Hu, Carina A Hahn, Eilidh Noyes, Kelsey Jackson, Jacqueline G Cavazos, Géraldine Jeckeln, Rajeev Ranjan, Swami Sankaranarayanan, et al. Face recognition accuracy of forensic examiners, superrecognizers, and face recognition algorithms. PNAS, 2018.

[30] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In ICML, 2021.

[31] Children’s Commissioner’s. “One day this could happen to me”: Children, nudification tools and sexually explicit deepfakes, 2025.

[32] Enough Abuse. State Laws Criminalizing AI-generated or Computer-Edited CSAM, 2025.

[33] Internet Watch Foundation. How AI is being abused to create child sexual abuse imagery. Technical report, 2023.

[34] Mosaic ML. Stable Diffusion training with Mosaic ML. https://github.com/mosaicml/diffusion, 2023. Accessed: 2025-11-03.

[35] National Institute of Standards and Technology. Reducing Risks Posed by Synthetic Content: An Overview of Technical Approaches to Digital Content Transparency, 2024.

[36] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[37] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR, 2023.

[38] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. NeurIPS, 2022.

[39] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. NeurIPS, 2022.

[40] Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs. arXiv preprint arXiv:2111.02114, 2021.

[41] Deepa Seetharaman. OpenAI, Meta and Google Sign On to New Child Exploitation Safety Measures. Wall Street Journal, 2024.

[42] Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, and Lingjuan Lyu. Stretching each dollar: Diffusion training from scratch on a micro-budget. In CVPR, 2025.

[43] Piyush Sharma, Nan Ding, Sebastian Goodman, and Radu Soricut. Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. In ACL, 2018.

[44] David Thiel, Melissa Stroebel, and Rebecca Portnoff. Generative ML and CSAM: Implications and mitigations. Stanford Digital Repository, 2023.

[45] Thomas Brewster. A Pedophile Filmed Kids At Disney World To Make AI Child Abuse Images, Cops Say. Forbes, 2024.

[46] Thorn & ATIH. Thorn Safety by Design for Generative AI: Preventing Child Sexual Abuse, 2024.

[47] United States Department of Justice. Child Sexual Abuse Material, 2023.

[48] K Weidacker, C Kärgel, C Massau, S Weiß, J Kneer, THC Krueger, and B Schiffer. Approach and avoidance tendencies toward picture stimuli of (pre-)pubescent children and adults: An investigation in pedophilic and nonpedophilic samples. Sexual Abuse, 2018.

[49] Yixin Wu, Yun Shen, Michael Backes, and Yang Zhang. Image-perfect imperfections: Safety, bias, and authenticity in the shadow of text-to-image model evolution. In ACM CCS, 2024.

[50] Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, and Fang Wen. General facial representation learning in a visual-linguistic manner. In CVPR, 2022.

Appendix A Ethics considerations

Child detection benchmarking. To identify the best child detector, we have adapted existing methods to the child d...