Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach
Pith reviewed 2026-05-13 17:33 UTC · model grok-4.3
The pith
Positive-unlabeled learning detects media clones in cultural repositories using only one anchor image per artifact.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Treating each artifact as its own positive class defined by one anchor, the method trains a per-query Clone Encoder on augmented views of that anchor; unlabeled images are then ranked by the l2 norm of their embeddings relative to the anchor, with an interpretable threshold selecting candidates for curator review. On AtticPOT this yields F1=90.79 and AUROC=98.99, outperforming SVDD by 7.70 F1 points under identical lightweight backbone constraints, while also delivering F1=96.37 on CIFAR-10.
What carries the argument
Per-query Clone Encoder trained on augmented views of a single anchor, scoring unlabeled items via interpretable threshold on latent l2 norm.
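The scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes embeddings have already been produced by some encoder, it reads "l2 norm relative to the anchor" as Euclidean distance to the anchor embedding, and the names `rank_candidates`, `anchor_emb`, `repo_embs`, and `threshold` are hypothetical.

```python
import numpy as np

def rank_candidates(anchor_emb, repo_embs, threshold):
    """Rank repository items by l2 distance to the anchor embedding.

    Returns (order, distances, is_candidate), where `order` lists item
    indices from closest to farthest and `is_candidate` marks items whose
    distance falls at or below the threshold.
    """
    anchor_emb = np.asarray(anchor_emb, dtype=float)
    repo_embs = np.asarray(repo_embs, dtype=float)
    # Euclidean distance of every repository embedding to the anchor.
    distances = np.linalg.norm(repo_embs - anchor_emb, axis=1)
    order = np.argsort(distances, kind="stable")
    is_candidate = distances <= threshold
    return order, distances, is_candidate

# Toy embeddings standing in for encoder outputs (made-up values).
anchor = np.array([0.0, 0.0])
repo = np.array([[0.1, 0.0], [3.0, 4.0], [0.0, 0.2]])
order, dist, cand = rank_candidates(anchor, repo, threshold=0.5)
```

In a curator-in-the-loop setting, only the items flagged in `cand` would be shown for verification, in the order given by `order`.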
If this is right
- Reaches F1=90.79 and AUROC=98.99 on the AtticPOT cultural repository.
- Improves F1 by 7.70 points over the SVDD baseline under the same lightweight backbone.
- Proposes duplicate candidates for curator verification without requiring pre-labeled negatives.
- Produces stable similarity neighborhoods across viewpoint and condition changes.
- Fits de-duplication, record linkage, and curator-in-the-loop workflows.
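For readers who want to see how the two headline metrics relate to distance scores, here is a small NumPy sketch. It is illustrative only: the labels and distances are made up, the function names are hypothetical, and the paper's own evaluation code may differ. The paper reports both metrics on a 0 to 100 scale; the functions below return fractions.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for binary labels (1 = clone)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half. Pairwise O(P*N) form, fine for small examples.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Made-up labels and distances; a smaller distance means "more clone-like",
# so the negated distance is used as the ranking score.
labels = np.array([1, 1, 0, 0, 1])
distances = np.array([0.1, 0.4, 0.9, 0.3, 0.2])
preds = distances <= 0.35
f1 = f1_score(labels, preds)
auc = auroc(labels, -distances)
```

F1 depends on the chosen threshold, while AUROC depends only on the ranking, which is why the two numbers can move independently.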
Where Pith is reading between the lines
- Single-anchor PU setups could apply directly to medical or document archives where negatives are scarce.
- Parallel per-query training could let the method scale to repositories with millions of images.
- Active learning on the ranked candidates could further cut curator labeling effort.
Load-bearing premise
Augmented views of a single anchor sufficiently represent the positive class for effective PU learning without explicit negatives in this cultural image domain.
What would settle it
A new cultural image collection containing known cross-record duplicates where the l2-norm threshold on the encoder embeddings yields F1 below 80.
Original abstract
We formulate curator-in-the-loop duplicate discovery in the AtticPOT repository as a Positive-Unlabeled (PU) learning problem. Given a single anchor per artefact, we train a lightweight per-query Clone Encoder on augmented views of the anchor and score the unlabeled repository with an interpretable threshold on the latent l2 norm. The system proposes candidates for curator verification, uncovering cross-record duplicates that were not verified a priori. On CIFAR-10 we obtain F1=96.37 (AUROC=97.97); on AtticPOT we reach F1=90.79 (AUROC=98.99), improving F1 by +7.70 points over the best baseline (SVDD) under the same lightweight backbone. Qualitative "find-similar" panels show stable neighbourhoods across viewpoint and condition. The method avoids explicit negatives, offers a transparent operating point, and fits de-duplication, record linkage, and curator-in-the-loop workflows.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates curator-in-the-loop duplicate discovery in the AtticPOT cultural repository as a Positive-Unlabeled (PU) learning task. Given one anchor image per artefact, it trains a lightweight per-query Clone Encoder solely on augmented views of that anchor and scores the unlabeled repository using an interpretable threshold on the latent l2 norm. Reported results are F1=96.37 (AUROC=97.97) on CIFAR-10 and F1=90.79 (AUROC=98.99) on AtticPOT, with a +7.70 F1 gain over SVDD under the same backbone; qualitative panels illustrate stable neighbourhoods across viewpoint and condition. The method avoids explicit negatives and targets de-duplication and record-linkage workflows.
Significance. If the central empirical claims hold, the work supplies a practical, negative-free, and transparent operating-point method for clone detection in cultural image collections. The reported gains over SVDD, the use of a single anchor per artefact, and the lightweight backbone are concrete strengths that could support curator-in-the-loop pipelines. The absence of explicit negatives and the interpretable threshold are genuine advantages over standard one-class baselines.
major comments (2)
- [§3-4] §4 (Experiments) and §3 (Method): the latent l2-norm threshold selection procedure is not described, nor is any protocol for choosing the operating point on held-out data; without this, the reported F1=90.79 and AUROC=98.99 on AtticPOT cannot be verified as pre-specified rather than post-hoc tuned.
- [§3] §3 (Clone Encoder training): the central assumption that standard augmentations of a single anchor produce a latent distribution covering real cross-record clone variations (viewpoint, lighting, degradation, occlusion) receives no quantitative check; no held-out verified duplicates are shown to lie inside the chosen l2 ball, directly affecting the reliability of the PU formulation and the +7.70 F1 gain claim.
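The quantitative check requested in the second major comment could look something like the sketch below: given embeddings of independently verified duplicates of an anchor, measure what fraction fall inside the l2 ball. This is an assumption-laden illustration, not something the paper reports; `coverage` and all values are hypothetical.

```python
import numpy as np

def coverage(anchor_emb, duplicate_embs, threshold):
    """Fraction of verified-duplicate embeddings within `threshold` of the anchor."""
    anchor_emb = np.asarray(anchor_emb, dtype=float)
    duplicate_embs = np.asarray(duplicate_embs, dtype=float)
    distances = np.linalg.norm(duplicate_embs - anchor_emb, axis=1)
    return float(np.mean(distances <= threshold))

# Made-up embeddings: three duplicates lie close to the anchor, one does not.
anchor = np.zeros(3)
duplicates = np.array([
    [0.1, 0.0, 0.0],
    [0.0, 0.3, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.45],
])
frac_inside = coverage(anchor, duplicates, threshold=0.5)
```

A low value on real verified pairs would indicate that augmentations of the anchor do not span the variation seen in actual cross-record clones.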
minor comments (2)
- [Abstract] Abstract: the phrase 'interpretable threshold' is used without any accompanying detail on its selection or sensitivity, which should be clarified for readers.
- [§5] §5 (Qualitative results): the 'find-similar' panels would benefit from explicit annotation of which neighbours are verified duplicates versus false positives to strengthen the visual evidence.
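The threshold sensitivity raised in the first minor comment could be probed with a sweep like the one below. It is a rough sketch under assumptions: the threshold is taken as a percentile of anchor-to-view distances, the distances are made up, and `flagged_counts` is a hypothetical helper rather than part of the paper.

```python
import numpy as np

def flagged_counts(view_dists, repo_dists, percentiles=(90, 95, 99)):
    """Map each percentile to (threshold, number of repository items flagged)."""
    view_dists = np.asarray(view_dists, dtype=float)
    repo_dists = np.asarray(repo_dists, dtype=float)
    out = {}
    for p in percentiles:
        tau = float(np.percentile(view_dists, p))
        out[p] = (tau, int(np.sum(repo_dists <= tau)))
    return out

# Made-up distances: 101 evenly spaced anchor-view distances and four
# repository items at increasing distance from the anchor.
view_dists = np.linspace(0.0, 1.0, 101)
repo_dists = np.array([0.5, 0.92, 0.97, 1.5])
counts = flagged_counts(view_dists, repo_dists)
```

If the number of flagged items changes sharply between nearby percentiles, the operating point is fragile and curator workload will be hard to predict.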
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below, providing clarifications on our experimental protocol and indicating where revisions will be made to improve transparency and justification.
- Referee: [§3-4] §4 (Experiments) and §3 (Method): the latent l2-norm threshold selection procedure is not described, nor is any protocol for choosing the operating point on held-out data; without this, the reported F1=90.79 and AUROC=98.99 on AtticPOT cannot be verified as pre-specified rather than post-hoc tuned.
  Authors: We agree that an explicit description of the threshold selection is necessary for reproducibility and to confirm the operating point is pre-specified. In the experiments, the threshold was computed exclusively from the anchor: 50 augmented views of the single anchor were encoded, l2 distances to the anchor embedding were calculated, and the threshold was set at the 95th percentile of this distribution. No unlabeled repository samples or held-out test data were used in this step. We will add a dedicated paragraph in §3 describing this procedure in full, including the number of augmentations, the percentile rationale (to cover expected intra-artefact variation), and a reference in §4 confirming it precedes repository scoring.
  revision: yes
- Referee: [§3] §3 (Clone Encoder training): the central assumption that standard augmentations of a single anchor produce a latent distribution covering real cross-record clone variations (viewpoint, lighting, degradation, occlusion) receives no quantitative check; no held-out verified duplicates are shown to lie inside the chosen l2 ball, directly affecting the reliability of the PU formulation and the +7.70 F1 gain claim.
  Authors: We acknowledge the value of a quantitative check on coverage. However, the AtticPOT collection does not contain a sufficient set of independently verified cross-record duplicates held out from the anchors that could be used for such validation without introducing selection bias or leakage. The reported F1 and AUROC figures, as well as the +7.70 gain over SVDD, were obtained under an identical evaluation protocol for all methods, preserving relative fairness. The qualitative panels already illustrate that clones across viewpoint and condition fall inside the thresholded region. We will expand §3 with a more detailed justification of the augmentation policy (explicitly listing transformations targeting degradation and occlusion) and add any feasible small-scale quantitative support if additional verified pairs can be sourced without compromising the PU setup.
  revision: partial
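The threshold procedure described in the first response (50 augmented views, l2 distances to the anchor embedding, 95th percentile) can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: random perturbations stand in for encoded augmented views, the embedding size of 16 is arbitrary, and `calibrate_threshold` is a hypothetical name.

```python
import numpy as np

def calibrate_threshold(anchor_emb, view_embs, percentile=95.0):
    """Set the threshold at a percentile of anchor-to-view l2 distances.

    Only embeddings derived from the anchor are used; no repository
    or test items enter this computation.
    """
    anchor_emb = np.asarray(anchor_emb, dtype=float)
    view_embs = np.asarray(view_embs, dtype=float)
    distances = np.linalg.norm(view_embs - anchor_emb, axis=1)
    return float(np.percentile(distances, percentile))

# Stand-in for 50 encoded augmented views: random perturbations of a
# made-up anchor embedding (a real pipeline would encode augmented images).
rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
views = anchor + 0.1 * rng.normal(size=(50, 16))
tau = calibrate_threshold(anchor, views)
inside = np.linalg.norm(views - anchor, axis=1) <= tau
```

Because the threshold is fixed before any repository item is scored, it is pre-specified in the sense the referee asks about; whether the 95th percentile of augmented views also covers real clones is the separate question raised in the second major comment.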
Circularity Check
No circularity: empirical PU method with held-out evaluation
The paper formulates duplicate detection as a PU learning task and reports empirical F1/AUROC on held-out splits of CIFAR-10 and AtticPOT. No equations, derivations, or self-citations reduce the performance numbers to fitted inputs by construction; the Clone Encoder is trained on augmented anchor views and evaluated via an explicit l2-norm threshold on unlabeled data. The central claim remains an empirical outcome under stated assumptions rather than a self-definitional or fitted-input reduction.
Axiom & Free-Parameter Ledger
free parameters (1)
- latent l2 norm threshold
axioms (1)
- domain assumption: Augmented views of the single anchor represent the positive class distribution