Layout-Aware Representation Learning for Open-Set ID Fraud Discovery

Cathy Chang; Daniel George; Hongkai Pan; Jinxing Li; Nicholas Ren

arXiv: 2605.05215 · v1 · submitted 2026-04-17 · cs.CV · cs.AI · cs.LG

Pith reviewed 2026-05-10 09:33 UTC · model grok-4.3

keywords: identity document fraud, open-set detection, layout-aware embeddings, transfer learning, metric learning, document analysis, physical forgery discovery

The pith

Layout-aware embeddings trained solely on U.S. identity documents transfer to Canadian layouts and surface hundreds of adaptive fraud cases missed by prior detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that document-specific adaptation of a vision foundation model creates embeddings sensitive to layout structure rather than just visual content. These embeddings, built exclusively from U.S. training data through context-aware reconstruction and a composite metric-learning loss, classify Canadian ID layouts at 99.83 percent accuracy with a simple downstream classifier. On a collection of more than twenty thousand Canadian IDs the same space reveals 276 instances of adaptive physical fraud, 222 of which escaped detection by existing systems. The approach also supports growing a fraud cluster from a single confirmed seed by nearest-neighbor search instead of metadata linkage. This matters because fraud campaigns evolve faster than labeled datasets, so methods that discover new patterns under distribution shift can stay ahead of forgers.

Core claim

By adapting DINOv3 to the document domain with context-aware SimMIM fine-tuning and supervised metric learning that enforces both inter-class separation and intra-class compactness, the resulting embeddings organize identity documents by layout and fraud status. Trained only on U.S. IDs, the model transfers to Canadian data sufficiently well that a lightweight MLP achieves 99.83 percent layout classification accuracy and embedding-space analysis identifies 276 adaptive physical-fraud cases among 20,448 Canadian samples, including 222 not caught by incumbent detectors. The same embeddings permit similarity-based expansion from any single verified fraud seed to additional related cases without

What carries the argument

The layout-aware document embedding produced by DINOv3 after context-aware SimMIM fine-tuning and composite metric learning, which places documents in a space where layout classes and fraud patterns form separable clusters.
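The masked-image-modeling step can be illustrated with a minimal sketch. It shows plain SimMIM-style training signal only: mask a random subset of patches and score reconstruction with an L1 loss on the masked patches. The "context-aware" variant is not described on this page, so nothing here should be read as the authors' implementation; the function names, patch sizes, and the constant stand-in prediction are all assumptions for illustration.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return 0/1 flags with round(mask_ratio * num_patches) patches masked."""
    num_masked = round(mask_ratio * num_patches)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [1 if i in masked else 0 for i in range(num_patches)]


def masked_l1_loss(target, prediction, mask):
    """Mean absolute error computed over the pixels of masked patches only."""
    total, count = 0.0, 0
    for t_patch, p_patch, is_masked in zip(target, prediction, mask):
        if is_masked:
            total += sum(abs(t - p) for t, p in zip(t_patch, p_patch))
            count += len(t_patch)
    return total / count if count else 0.0


rng = random.Random(0)
# HYPOTHETICAL toy image: 16 patches of 4 pixel values each, in [0, 1).
target = [[rng.random() for _ in range(4)] for _ in range(16)]
mask = random_patch_mask(len(target), mask_ratio=0.5, rng=rng)
# Stand-in for a model's reconstruction; a real encoder would produce this.
prediction = [[0.5] * 4 for _ in target]
loss = masked_l1_loss(target, prediction, mask)
```

Visible patches contribute nothing to the loss, so the encoder is rewarded only for inferring hidden content from its surroundings, which is one plausible reason such an objective would favor layout structure.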

If this is right

A simple classifier on top of the embedding reaches 99.83 percent accuracy on unseen national layouts.
Embedding-space analysis surfaces hundreds of adaptive fraud instances that standard detectors miss.
Confirmed fraud examples can seed discovery of additional related cases through similarity search.
The method continues to function when training and deployment countries differ in document design.
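The seed-expansion consequence listed above can be sketched as a nearest-neighbor query. This is a minimal illustration assuming cosine similarity and a fixed threshold; the paper's actual distance metric, threshold, and index structure are not stated on this page, and the document IDs and vectors below are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_from_seed(seed_id, embeddings, threshold):
    """Return (doc_id, similarity) pairs at or above threshold, most similar first."""
    seed = embeddings[seed_id]
    hits = [
        (doc_id, cosine_similarity(seed, vec))
        for doc_id, vec in embeddings.items()
        if doc_id != seed_id
    ]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# HYPOTHETICAL embeddings; real ones would come from the fine-tuned backbone.
embeddings = {
    "seed": [1.0, 0.0, 0.0],
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.8, 0.0, 0.6],
}
candidates = expand_from_seed("seed", embeddings, threshold=0.75)
```

Candidates returned this way are leads for human review rather than confirmed fraud; the threshold trades recall of related cases against reviewer workload.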

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Shared embeddings across countries could lower the cost of maintaining separate fraud models for each jurisdiction.
The same layout-sensitive space might help track how a single fabrication campaign evolves over successive batches of forged documents.
Extending the approach to passports, visas, or other variable documents would test whether the transfer property generalizes beyond driver licenses.
Independent verification of the newly surfaced cases remains necessary before operational deployment, since the paper reports discovery but not final adjudication.

Load-bearing premise

That representations learned exclusively from U.S. IDs will reliably separate genuine Canadian documents from adaptive physical fraud without substantial domain shift or overfitting to the training layouts.

What would settle it

Forensic examination of the 276 surfaced cases showing that most are genuine documents rather than fraud, or layout classification accuracy falling below 90 percent on a new held-out collection of Canadian or third-country IDs.

Figures

Figures reproduced from arXiv: 2605.05215 by Cathy Chang, Daniel George, Hongkai Pan, Jinxing Li, Nicholas Ren.

**Figure 2.** Training framework for layout-aware ID representation learning. Top: self-supervised fine-tuning adapts the DINOv3 Vision ...

Original abstract

Identity-document fraud detection is not a stationary binary classification problem. Adaptive attackers modify templates and fabrication pipelines, making historical fraud labels stale, and successful forgeries recur at scale as coherent campaigns. We therefore study layout-aware representation learning for open-set fraud discovery rather than only closed-set classification. We adapt DINOv3 to the document domain via context-aware SimMIM fine-tuning and supervised metric learning with composite loss that encourages inter-class separability and intra-class compactness. The model is trained with U.S. IDs only. With a lightweight MLP and softmax classifier, the embedding achieves 99.83% layout classification accuracy on Canadian layouts. Moreover, on a dataset of 20,448 Canadian IDs, embedding-space analysis surfaces 276 adaptive physical-fraud cases, including 222 not surfaced by incumbent detectors. The embedding supports similarity-based expansion from a single confirmed seed to additional related cases not linked by conventional metadata graphs. The layout-aware document embeddings provide a production-aligned basis for discovering novel and campaign-scale fraud under distribution shift.
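A few derived figures help put the reported counts in proportion. The snippet below uses only the numbers quoted in the abstract; reading the remaining 54 cases as "also surfaced by incumbent detectors" is an inference from the phrase "including 222 not surfaced", not something the abstract states outright.

```python
total_ids = 20_448            # Canadian IDs analysed
surfaced = 276                # adaptive physical-fraud cases surfaced
missed_by_incumbents = 222    # of those, not surfaced by incumbent detectors

# Inferred remainder: cases the incumbents presumably also surfaced.
also_flagged_by_incumbents = surfaced - missed_by_incumbents
surfaced_rate = surfaced / total_ids
novel_share = missed_by_incumbents / surfaced

print(also_flagged_by_incumbents)  # 54
print(f"{surfaced_rate:.2%}")      # 1.35%
print(f"{novel_share:.1%}")        # 80.4%
```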

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows a workable transfer of layout-aware embeddings from U.S. to Canadian IDs for flagging potential fraud, but the 276 cases rest on unverified embedding analysis.

The paper adapts DINOv3 with SimMIM fine-tuning and a composite metric-learning loss to produce layout-aware embeddings for identity documents. Trained only on U.S. data, the embedding plus a simple MLP hits 99.83% accuracy on Canadian layout classification. On 20,448 Canadian IDs it then uses distance and clustering in the embedding space to surface 276 suspected adaptive physical-fraud cases, 222 of which were missed by existing detectors. It also shows how a single confirmed seed can pull in related cases via similarity search rather than metadata graphs alone.

That cross-border transfer result and the seed-expansion idea are the concrete pieces worth noting. The setup takes standard self-supervised and metric-learning tools and applies them to a real detection problem where attackers change templates, which is a reasonable domain extension. The production framing around campaign-scale discovery is also practical.

The central weakness is that the fraud-discovery numbers lack independent ground truth or a stated validation protocol for the Canadian set. The 276 designations appear to come from embedding proximity and disagreement with incumbents, which makes the open-set claim hard to assess without circularity. No baselines for the discovery task itself are shown, and details on how the post-hoc analysis was run are thin.

This is aimed at teams building document-fraud systems in security or finance. A reader already working on representation learning for detection tasks could extract the experimental choices and the transfer numbers, even if they want more verification details.

I would send it to peer review. The transfer experiment is specific enough to deserve referee time, though the authors should expect direct questions on how the new cases were confirmed.

Referee Report

3 major / 2 minor

Summary. The paper proposes adapting DINOv3 for document images via context-aware SimMIM fine-tuning and supervised metric learning with a composite loss on U.S. identity documents only. It reports that a lightweight MLP+softmax head achieves 99.83% layout classification accuracy on held-out Canadian layouts. On a set of 20,448 Canadian IDs, embedding-space analysis (distance to seeds and clustering) surfaces 276 adaptive physical-fraud cases, of which 222 are not flagged by incumbent detectors; the embedding is also shown to support similarity-based expansion from a single confirmed seed.
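To make the "lightweight MLP+softmax head" concrete, here is a minimal forward-pass sketch of a one-hidden-layer classifier applied to a frozen embedding. The layer sizes, the hand-set weights, and the name `classify_layout` are hypothetical; the paper's head architecture and training procedure are not given on this page.

```python
import math


def linear(x, weights, bias):
    """Affine layer: one output per (weight row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_layout(embedding, params):
    """Return (predicted class index, class probabilities) for one embedding."""
    hidden = relu(linear(embedding, params["w1"], params["b1"]))
    probs = softmax(linear(hidden, params["w2"], params["b2"]))
    return max(range(len(probs)), key=probs.__getitem__), probs


# HYPOTHETICAL weights: 3-d embedding, 2 hidden units, 2 layout classes.
params = {
    "w1": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "b1": [0.0, 0.0],
    "w2": [[2.0, -1.0], [-1.0, 2.0]],
    "b2": [0.0, 0.0],
}
label, probs = classify_layout([0.9, 0.1, 0.3], params)
```

Because the backbone stays frozen, only these few parameters would need fitting for a new jurisdiction, which is what makes a high accuracy from such a small head a statement about the embedding rather than the classifier.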

Significance. If the empirical claims are substantiated, the work provides a practical route to open-set fraud discovery under domain shift, moving beyond closed-set classification. The combination of self-supervised pre-training and metric learning for layout-aware document embeddings is a clear strength and could transfer to other document-analysis settings. The reported ability to expand from seeds without relying on metadata graphs is particularly production-relevant.

major comments (3)

[Abstract] Abstract and the Canadian-dataset analysis section: the central claim that 276 adaptive physical-fraud cases (222 novel) were surfaced rests on embedding-space analysis alone, yet no verification protocol, ground-truth labels for the Canadian set, or independent adjudication of the 276 designations is described. This directly undermines the open-set discovery result.
[Abstract] Abstract and results section: no baselines, ablation studies, or error bars are reported for either the 99.83% layout accuracy or the fraud-discovery counts, nor is a comparison to alternative anomaly-detection or open-set methods provided. Without these, the improvement over incumbents cannot be quantified.
[Method] Method section on the composite loss: the weights of the inter-class separability and intra-class compactness terms are listed as free parameters but no sensitivity analysis or selection procedure is given, leaving the metric-learning objective under-specified for reproducibility.
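The request for error bars on the 99.83% figure can be illustrated with a Wilson score interval for a binomial proportion. This is a sketch of one common choice, not the authors' method; the evaluation-set sizes below are hypothetical because the page does not report how many Canadian samples the accuracy was measured on.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# HYPOTHETICAL evaluation-set sizes; the true size is not stated.
for n in (1_000, 10_000):
    correct = round(0.9983 * n)
    low, high = wilson_interval(correct, n)
    print(f"n={n}: accuracy in [{low:.4f}, {high:.4f}]")
```

The interval narrows as the evaluation set grows, so the same headline accuracy carries very different evidential weight depending on a sample size the manuscript would need to report.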

minor comments (2)

[Abstract] Abstract: the phrase 'context-aware SimMIM' is introduced without a one-sentence gloss or citation.
[Figures] Figure captions and embedding visualizations: axis labels, distance metrics, and seed-selection criteria should be stated explicitly so readers can interpret the clustering plots.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our work. We address each major point below and indicate where revisions will be made to the manuscript.

Referee: [Abstract] Abstract and the Canadian-dataset analysis section: the central claim that 276 adaptive physical-fraud cases (222 novel) were surfaced rests on embedding-space analysis alone, yet no verification protocol, ground-truth labels for the Canadian set, or independent adjudication of the 276 designations is described. This directly undermines the open-set discovery result.

Authors: The open-set discovery setting inherently lacks ground-truth labels for novel fraud patterns in the Canadian data, as these cases represent previously unseen adaptations. The 276 cases were identified via distance-to-seed analysis and clustering in the learned embedding space, with the 222 additional cases relative to incumbent detectors providing supporting evidence of utility. We will revise the Canadian-dataset analysis section to explicitly detail the verification protocol, including distance thresholds, clustering parameters, and illustrative examples of surfaced cases. revision: partial
Referee: [Abstract] Abstract and results section: no baselines, ablation studies, or error bars are reported for either the 99.83% layout accuracy or the fraud-discovery counts, nor is a comparison to alternative anomaly-detection or open-set methods provided. Without these, the improvement over incumbents cannot be quantified.

Authors: The manuscript emphasizes representation transfer for open-set discovery under domain shift rather than a full benchmark study. We agree that additional context strengthens the presentation. In the revision we will add error bars for the layout accuracy metric, ablations on the SimMIM and composite-loss components, and a comparison to representative open-set and anomaly-detection baselines applied to the same embeddings. revision: yes
Referee: [Method] Method section on the composite loss: the weights of the inter-class separability and intra-class compactness terms are listed as free parameters but no sensitivity analysis or selection procedure is given, leaving the metric-learning objective under-specified for reproducibility.

Authors: We agree that the weight selection procedure should be documented for reproducibility. In the revised method section we will add a sensitivity analysis over the loss weights together with the procedure used to choose the reported values. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results on held-out data with no derivations or self-referential reductions

The paper reports training a model on U.S. ID data only, then evaluates layout classification accuracy (99.83%) and fraud discovery (276 cases) on a separate Canadian dataset of 20,448 IDs using embedding-space analysis. No equations, derivations, or first-principles claims are present. The results are direct empirical outputs from fine-tuning and metric learning applied to held-out data, with no steps that reduce predictions to inputs by construction, no self-citations as load-bearing premises, and no renaming or ansatz smuggling. This is standard supervised transfer evaluation and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the transferability of DINOv3 representations after SimMIM fine-tuning and metric learning from U.S. to Canadian documents; no explicit free parameters or invented entities are named in the abstract, but standard ML hyperparameters are implicitly present.

free parameters (1)

composite loss term weights
Weights balancing inter-class separability and intra-class compactness are required for the metric learning step and are typically tuned on validation data.
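As a rough picture of where such weights enter, the sketch below combines a softmax cross-entropy term (pushing classes apart) with a center-distance term (pulling each embedding toward its class center). This particular pairing is an assumption chosen for illustration; the page does not name the terms of the paper's composite loss, and `w_sep`, `w_compact`, and the toy batch are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def center_distance(embedding, center):
    """Half the squared Euclidean distance to the class center."""
    return 0.5 * sum((e - c) ** 2 for e, c in zip(embedding, center))


def composite_loss(batch, centers, w_sep=1.0, w_compact=0.1):
    """Weighted sum of mean separation and mean compactness terms.

    batch holds (embedding, logits, label) triples; centers maps label -> vector.
    """
    sep = sum(softmax_cross_entropy(lg, y) for _, lg, y in batch) / len(batch)
    compact = sum(center_distance(emb, centers[y]) for emb, _, y in batch) / len(batch)
    return w_sep * sep + w_compact * compact


# HYPOTHETICAL toy batch: two samples, two layout classes.
batch = [([1.0, 0.0], [2.0, 0.0], 0), ([0.0, 1.0], [0.0, 2.0], 1)]
centers = {0: [1.0, 0.0], 1: [0.0, 1.0]}
loss = composite_loss(batch, centers, w_sep=1.0, w_compact=0.5)
```

A sensitivity analysis of the kind the referee asks for would amount to sweeping `w_compact` relative to `w_sep` and reporting how downstream accuracy and cluster quality respond.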

axioms (1)

domain assumption: DINOv3 pre-trained representations remain useful for document layout after context-aware SimMIM fine-tuning
Invoked when the authors state they adapt DINOv3 to the document domain.

[1] [1]

Abhijit Bendale and Terrance E. Boult. Towards open set deep networks. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1563–1572, 2016. 2

work page 2016

[2] [2]

Bokai Cao, Mia Mao, Siim Viidu, and Philip S. Yu. Collec- tive fraud detection capturing inter-transaction dependency. InProceedings of the KDD 2017 Workshop on Anomaly De- tection in Finance, pages 66–75. PMLR, 2018. 2

work page 2017

[3] [3]

Emerg- ing properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9650–9660, 2021. 3

work page 2021

[4] [4]

A simple framework for contrastive learning of visual representations, 2020

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Ge- offrey Hinton. A simple framework for contrastive learning of visual representations, 2020. 2

work page 2020

[5] [5]

Credit card fraud de- tection and concept-drift adaptation with delayed supervised information

Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Ce- sare Alippi, and Gianluca Bontempi. Credit card fraud de- tection and concept-drift adaptation with delayed supervised information. In2015 International Joint Conference on Neu- ral Networks (IJCNN), pages 1–8, 2015. 2

work page 2015

[6] [6]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. InProceedings of the IEEE Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 248– 255, 2009. 4

work page 2009

[7] [7]

Arcface: Additive angular margin loss for deep face recognition

Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. 3

work page 2019

[8] [8]

Dinov3-diffusion policy: Self-supervised large visual model for visuomotor diffusion policy learning, 2025

ThankGod Egbe, Peng Wang, Zhihao Guo, and Zidong Chen. Dinov3-diffusion policy: Self-supervised large visual model for visuomotor diffusion policy learning, 2025. 3

work page 2025

[9] [9]

Un- supervised representation learning by predicting image rota- tions, 2018

Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Un- supervised representation learning by predicting image rota- tions, 2018. 2

work page 2018

[10] [10]

Richemond, Elena Buchatskaya, Carl Do- ersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Moham- mad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, R´emi Munos, and Michal Valko

Jean-Bastien Grill, Florian Strub, Florent Altch ´e, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Do- ersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Moham- mad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, R´emi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning, 2020. 3

work page 2020

[11] [11]

Dimensional- ity reduction by learning an invariant mapping

Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensional- ity reduction by learning an invariant mapping. InProceed- ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006. 3

work page 2006

[12] [12]

Momentum contrast for unsupervised visual rep- resentation learning, 2020

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual rep- resentation learning, 2020. 2

work page 2020

[13] [13]

Masked autoencoders are scalable vision learners, 2021

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll´ar, and Ross Girshick. Masked autoencoders are scalable vision learners, 2021. 2

work page 2021

[14] [14]

Scaling out-of-distribution detection for real- world settings, 2022

Dan Hendrycks, Steven Basart, Mantas Mazeika, Andy Zou, Joe Kwon, Mohammadreza Mostajabi, Jacob Steinhardt, and Dawn Song. Scaling out-of-distribution detection for real- world settings, 2022. 2

work page 2022

[15] Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 895–904, 2016.

[16] Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), pages 328–339, 2018.

[17] Shihua Huang, Yongjie Hou, Longfei Liu, Xuanlong Yu, and Xi Shen. Real-time object detection meets dinov3, 2025.

[18] Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning, 2021.

[19] Pavel Korshunov, Vidit Vidit, Amir Mohammadi, Christophe Ecabert, Nevena Shamoska, Sébastien Marcel, Zeqin Yu, Ye Tian, Jiangqun Ni, Lazar Lazarevic, Renat Khizbullin, Anastasiia Evteeva, Alexey Tochin, Aleksei Grishin, Anjith George, Daniel Dealcala, Tamas Endrei, Javier Muñoz Haro, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fier... Deepid challenge of detecting synthetic manipulations in id documents, 2025.

[20] Che Liu, Yinda Chen, Haoyuan Shi, Jinpeng Lu, Bailiang Jian, Jiazhen Pan, Linghan Cai, Jiayi Wang, Jieming Yu, Ziqi Gao, Xiaoran Zhang, Long Bai, Yundi Zhang, Jun Li, Cosmin I. Bercea, Cheng Ouyang, Chen Chen, Zhiwei Xiong, Benedikt Wiestler, Christian Wachinger, James S. Duncan, Daniel Rueckert, Wenjia Bai, and Rossella Arcucci. Does dinov3 set a new m...

[21] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[22] Solène Lugon Moulin, Céline Weyermann, and Simon Baechler. An efficient method to detect series of fraudulent identity documents based on digitised forensic data. Science & Justice, 62(5):610–620, 2022.

[23] Daniele Lunghi, Alkis Simitsis, Olivier Caelen, and Gianluca Bontempi. Adversarial learning in real-world fraud detection: Challenges and perspectives. 2023.

[24] Atefeh Mahdavi and Marco Carvalho. A survey on open set recognition. In 2021 IEEE Fourth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), pages 37–44. IEEE, 2021.

[25] Guido Montone, Giuseppe Rizzo, and Maurizio Morisio. Gradual tuning: a better way of fine tuning the parameters of a deep neural network. arXiv preprint arXiv:1711.10177.

[26] Solène Lugon Moulin, Emre Ertan, Didier Martin, and Simon Baechler. Cross-border forensic profiling of fraudulent identity and travel documents: A pilot project between France and Switzerland. Science & Justice, 64(2):202–209, 2024.

[27] Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles, 2017.

[28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patr...

[29] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting, 2016.

[30] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[31] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie,...

[32] Vidit Vidit, Pavel Korshunov, Sébastien Marcel, Amir Mohammadi, and Christophe Ecabert. Fantasyid: A dataset for detecting digital manipulations of id-documents, 2025.

[33] Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[34] Hongjun Wang, Sagar Vaze, and Kai Han. Dissecting out-of-distribution detection and open-set recognition: A critical analysis of methods and benchmarks, 2024.

[35] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. A discriminative feature learning approach for deep face recognition. In Computer Vision – ECCV 2016, pages 499–515, Cham, 2016. Springer International Publishing.

[36] Chao-Yuan Wu, R. Manmatha, Alexander J. Smola, and Philipp Krahenbuhl. Sampling matters in deep embedding learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

[37] Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, and Han Hu. Simmim: A simple framework for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9653–9663, 2022.

[38] Jingkang Yang, Pengyun Wang, Dejian Zou, Zitang Zhou, Kunyuan Ding, Wenxuan Peng, Haoqi Wang, Guangyao Chen, Bo Li, Yiyou Sun, Xuefeng Du, Kaiyang Zhou, Wayne Zhang, Dan Hendrycks, Yixuan Li, and Ziwei Liu. Openood: Benchmarking generalized out-of-distribution detection, 2022.

[39] Jianke Yu, Hanchen Wang, Xiaoyang Wang, Zhao Li, Lu Qin, Wenjie Zhang, Jian Liao, and Ying Zhang. Group-based fraud detection network on e-commerce platforms. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2023.

[40] Richard Zhang, Phillip Isola, and Alexei A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction, 2017.