
arxiv: 2505.06621 · v1 · submitted 2025-05-10 · 💻 cs.LG · cs.CV

Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models

Pith reviewed 2026-05-22 16:21 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords proxy tasks · CSAI detection · few-shot indoor scene classification · sensitive data protocols · model training under restrictions · law enforcement collaboration · transfer from non-sensitive tasks

The pith

Proxy tasks let models detect child sexual abuse imagery after training only on non-sensitive substitutes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines proxy tasks as substitute training objectives that replace direct use of child sexual abuse imagery when building detection models. It reviews prior work through this lens, then introduces a protocol that combines chosen proxies with repeated law-enforcement guidance to keep model development safe. When the protocol is applied to few-shot indoor scene classification, the resulting model shows promising accuracy on a real CSAI dataset even though none of its weights were ever trained on sensitive images.

Core claim

A protocol that relies on proxy tasks such as few-shot indoor scene classification produces CSAI detection models whose weights remain entirely free of sensitive data yet still achieve useful performance when tested on real-world CSAI collections.

What carries the argument

The proxy-task protocol, which formalizes the deliberate substitution of non-CSA tasks for training while requiring ongoing law-enforcement input to select and validate those tasks.

If this is right

  • Law-enforcement triage systems can be developed without ever placing sensitive imagery inside training pipelines.
  • Standardized protocols replace ad-hoc choices of substitute tasks across different research groups.
  • Models remain usable even under the strictest data-access rules imposed by legal and ethical constraints.
  • Actual CSAI collections are used only for external validation, never for training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same protocol could be tested on other everyday-scene or object-recognition proxies to measure how much transfer varies with the choice of substitute task.
  • If transfer holds, similar restricted-data domains such as other forms of prohibited content could adopt the approach without new data-access agreements.
  • Repeated law-enforcement feedback loops might be automated once a small set of reliable proxy tasks is identified.

Load-bearing premise

Results obtained on the chosen proxy task will transfer well enough to real CSAI detection even though the model never sees any sensitive imagery.

What would settle it

An experiment in which the proxy-trained model is evaluated on the same real-world CSAI dataset and shows clearly lower detection rates than models allowed limited direct exposure during fine-tuning.

Figures

Figures reproduced from arXiv: 2505.06621 by Jefersson A. dos Santos, João Macedo, Leo S. F. Ribeiro, Sandra Avila, Thamiris Coelho.

Figure 1: The sensitive nature of CSAI requires that most ...
Figure 2: Diagram depicting the cycle of steps on our protocol. We believe that Proxy Tasks and CSAI evaluation datasets ...
Figure 3: Confusion matrix with the results of the model ...
Figure 4: t-SNE on two dimensions for the Places8 test set. Many children’s room samples (blue dots) are closer to the bedroom ...
Figure 5: Confusion matrix with the model’s results on the CSAI samples classified into indoor. Values reported are the ...
Figure 6: Confusion matrix with the model’s results on the ...
Figure 7: Confusion matrix with the model’s results on ...

Original abstract

The distribution of child sexual abuse imagery (CSAI) is an ever-growing concern of our modern world; children who suffered from this heinous crime are revictimized, and the growing amount of illegal imagery distributed overwhelms law enforcement agents (LEAs) with the manual labor of categorization. To ease this burden researchers have explored methods for automating data triage and detection of CSAI, but the sensitive nature of the data imposes restricted access and minimal interaction between real data and learning algorithms, avoiding leaks at all costs. In observing how these restrictions have shaped the literature we formalize a definition of "Proxy Tasks", i.e., the substitute tasks used for training models for CSAI without making use of CSA data. Under this new terminology we review current literature and present a protocol for making conscious use of Proxy Tasks together with consistent input from LEAs to design better automation in this field. Finally, we apply this protocol to study -- for the first time -- the task of Few-shot Indoor Scene Classification on CSAI, showing a final model that achieves promising results on a real-world CSAI dataset whilst having no weights actually trained on sensitive data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript defines 'Proxy Tasks' as substitute tasks for training CSAI detection models without using actual child sexual abuse imagery. It reviews the literature through this lens, proposes a protocol for using such proxy tasks in conjunction with input from law enforcement agencies (LEAs), and applies the protocol to the task of few-shot indoor scene classification, claiming that the resulting model achieves promising results on a real-world CSAI dataset without any model weights having been trained on sensitive data.

Significance. If the transfer from the proxy task holds and the results are reproducible, this work could provide a valuable framework for developing detection models in highly restricted domains where direct data access is prohibited. The formalization of proxy tasks and the emphasis on LEA collaboration are positive contributions. However, the lack of quantitative details in the presented claims limits the immediate impact.

major comments (3)
  1. Abstract and application section: The assertion of 'promising results' on real CSAI data after proxy training provides no quantitative metrics, baselines, error bars, dataset sizes, or ablation details, preventing verification of the central claim that the protocol enables effective detection without sensitive-data training.
  2. Section on proxy task selection and transfer: No justification is given for choosing few-shot indoor scene classification as the proxy, nor is there analysis of feature overlap or transfer mechanism to CSAI cues (e.g., human poses, skin regions, or contextual indicators) versus scene layout; this is load-bearing for the claim that the model produces promising CSAI detection performance.
  3. Protocol definition and experimental application: Details are missing on how the model trained on the proxy is adapted and applied to CSAI images (binary detection vs. scene labels) while guaranteeing zero weights trained on sensitive data during inference.
minor comments (2)
  1. Notation: The definition of Proxy Tasks would benefit from more formal notation to distinguish it from standard transfer learning or domain adaptation concepts.
  2. References: Ensure consistent citation of prior CSAI detection and privacy-preserving ML works reviewed in the literature survey.
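
The quantitative reporting requested in major comment 1 can be illustrated with a minimal sketch. It assumes results are available as a confusion matrix (rows are true labels, columns are predictions) and that error bars come from repeating the evaluation over several runs. The class names, matrix values, and run accuracies below are toy placeholders, not figures from the paper, and the function names are hypothetical.

```python
import statistics
from typing import Dict, List


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(
    matrix: List[List[int]], classes: List[str]
) -> Dict[str, Dict[str, float]]:
    """Precision and recall per class (rows: true labels, columns: predictions)."""
    metrics: Dict[str, Dict[str, float]] = {}
    for i, name in enumerate(classes):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                     # samples whose true label is class i
        predicted = sum(row[i] for row in matrix)   # samples predicted as class i
        metrics[name] = {
            "precision": true_pos / predicted if predicted else 0.0,
            "recall": true_pos / actual if actual else 0.0,
        }
    return metrics


# Toy values for illustration only (not results from the paper).
toy_classes = ["bedroom", "bathroom"]
toy_matrix = [[8, 2], [1, 9]]
overall = accuracy(toy_matrix)
metrics = per_class_metrics(toy_matrix, toy_classes)

# Error bars: mean and sample standard deviation over hypothetical repeated runs.
run_accuracies = [0.85, 0.80, 0.90]
mean_acc = statistics.mean(run_accuracies)
std_acc = statistics.stdev(run_accuracies)
```

Reporting the per-class values alongside the overall accuracy matters here because a scene classifier can score well overall while confusing visually similar classes.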

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and insightful review. We appreciate the acknowledgment of the potential value of formalizing proxy tasks and the LEA-collaboration protocol for highly restricted domains. We address each major comment below with specific plans for revision to improve clarity, rigor, and verifiability while preserving the manuscript's core contributions.

Point-by-point responses
  1. Referee: Abstract and application section: The assertion of 'promising results' on real CSAI data after proxy training provides no quantitative metrics, baselines, error bars, dataset sizes, or ablation details, preventing verification of the central claim that the protocol enables effective detection without sensitive-data training.

    Authors: We agree that the current description of results as 'promising' without accompanying quantitative details limits verifiability. The manuscript reports success on real CSAI imagery after proxy-only training, but the presentation is primarily qualitative. In the revised manuscript we will expand the abstract and add a dedicated quantitative results subsection that includes specific metrics (e.g., accuracy, precision, recall), dataset sizes for both proxy and CSAI evaluation sets, baseline comparisons where feasible, error bars from multiple runs if applicable, and ablation details on key protocol components. These additions will directly support the central claim while maintaining the guarantee of no sensitive-data training.
    revision: yes

  2. Referee: Section on proxy task selection and transfer: No justification is given for choosing few-shot indoor scene classification as the proxy, nor is there analysis of feature overlap or transfer mechanism to CSAI cues (e.g., human poses, skin regions, or contextual indicators) versus scene layout; this is load-bearing for the claim that the model produces promising CSAI detection performance.

    Authors: We acknowledge that the manuscript presents few-shot indoor scene classification primarily as an illustrative application of the protocol without an extended justification or feature-overlap analysis. The choice was driven by the existence of public, non-sensitive datasets and the hypothesis that scene-level contextual cues could provide transferable signals for CSAI triage. In revision we will add a dedicated subsection that (1) justifies the proxy selection on grounds of data availability and protocol alignment, (2) discusses hypothesized transferable features such as indoor layout and contextual indicators while explicitly noting the absence of direct modeling of poses or skin regions, and (3) clarifies that LEA input is intended to validate or refine such choices in operational settings. This will strengthen the load-bearing aspect of the transfer claim.
    revision: yes

  3. Referee: Protocol definition and experimental application: Details are missing on how the model trained on the proxy is adapted and applied to CSAI images (binary detection vs. scene labels) while guaranteeing zero weights trained on sensitive data during inference.

    Authors: We thank the referee for highlighting this gap in procedural clarity. The protocol is designed so that all weight training occurs exclusively on the proxy task with non-sensitive data; the resulting model is then applied to CSAI images in a pure inference setting with no further weight updates. In the revised manuscript we will expand both the protocol definition and the experimental application sections to provide an explicit step-by-step description of (a) how scene-classification outputs are mapped or thresholded to produce a binary CSAI detection score, (b) the precise inference pipeline, and (c) the mechanisms that ensure zero sensitive-data interaction with model weights at any stage. This will eliminate ambiguity regarding adaptation and the zero-training guarantee.
    revision: yes
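
The inference-only setting described in response 3 can be sketched with nearest-prototype classification, one common few-shot technique. This is an assumption for illustration: the text above does not state which few-shot method the authors use. The embeddings are toy 2-D vectors standing in for the output of an encoder whose weights were trained on the proxy task and then frozen. The labels and function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_prototypes(
    support: List[Vector], labels: List[str]
) -> Dict[str, List[float]]:
    """Average the frozen-encoder embeddings of each class's support examples."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(support, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }


def classify(query: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Return the label of the nearest prototype by Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy embeddings for illustration only; a real pipeline would obtain them
# from the frozen encoder applied to non-sensitive support images.
support = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
labels = ["bedroom", "bedroom", "bathroom", "bathroom"]
prototypes = class_prototypes(support, labels)
prediction = classify([0.8, 1.0], prototypes)
```

Nothing in this sketch updates encoder weights: the prototypes are plain averages and classification is a distance lookup, which is the property the zero-training guarantee depends on. How scene labels would be mapped to a binary detection score is left open, as in the response above.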

Circularity Check

0 steps flagged

No circularity detected in protocol definition or proxy-task application

Full rationale

The paper defines 'Proxy Tasks' as substitute training tasks that avoid direct use of CSA data, reviews prior literature under this terminology, presents a usage protocol incorporating LEA input, and empirically applies the protocol to few-shot indoor scene classification as a proxy for CSAI detection. No mathematical derivations, parameter fittings, or 'predictions' are described that reduce by construction to the paper's own inputs or definitions. The claimed 'promising results' on real CSAI data are presented as an empirical outcome of the proxy training rather than a statistical or definitional tautology. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are evident in the provided text. The work is self-contained as a definitional and methodological contribution with external empirical grounding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The protocol rests on the premise that proxy tasks can be chosen and validated with LEA input to produce transferable features; no explicit free parameters or invented physical entities are described in the abstract.

axioms (1)
  • Domain assumption: Proxy tasks selected with law-enforcement guidance will produce models whose features generalize to real CSAI detection.
    This premise is required for the protocol to be useful and is invoked when the authors claim promising results on real CSAI data after proxy-only training.
invented entities (1)
  • Proxy Task (no independent evidence)
    Purpose: A formally defined substitute training problem that avoids any use of CSA imagery while still enabling model development for CSAI detection.
    The paper introduces and names this concept as the central organizing idea of the protocol.


