Image Retrieval and Pattern Spotting using Siamese Neural Network

Alceu S. Britto Jr.; Alessandro L. Koerich; Kelly L. Wiggers; Laurent Heutte; Luiz S. Oliveira

arxiv: 1906.09513 · v1 · pith:OQTKLTKFnew · submitted 2019-06-22 · 💻 cs.CV

Pith reviewed 2026-05-25 17:49 UTC · model grok-4.3

classification 💻 cs.CV

keywords image retrieval · pattern spotting · siamese neural network · document image analysis · similarity learning · tobacco800 dataset

The pith

A Siamese neural network trained only on natural image pairs can retrieve and spot patterns in document images with high accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that similarity features learned by a Siamese network from ImageNet pairs transfer effectively to document images. This would allow retrieval and pattern spotting in document collections without manual features or document-specific training. The method is evaluated on the Tobacco800 collection, where it achieves strong results against other approaches. Reducing the size of the learned feature maps is also tested for its effect on speed and accuracy.

Core claim

The central claim is that a Siamese Neural Network trained on a subset of image pairs from the ImageNet dataset learns a similarity-based representation. This representation provides feature maps that find relevant document image candidates given a query, leading to 0.94 mAP for retrieval and 0.83 mAP for pattern spotting at IoU=0.7 on the Tobacco800 dataset, outperforming state-of-the-art document image retrieval methods.
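The IoU=0.7 criterion attached to the pattern-spotting figure can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` tuples; the function names `iou` and `is_hit` and the example coordinates are hypothetical, and the paper's own matching protocol may differ in detail.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width and height clamp to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_hit(predicted, truth, threshold=0.7):
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(predicted, truth) >= threshold


# Hypothetical ground-truth region and two candidate detections.
truth = (10, 10, 110, 60)
print(is_hit((20, 10, 120, 60), truth))  # shifted by 10 px: IoU = 4500/5500
print(is_hit((40, 10, 140, 60), truth))  # shifted by 30 px: IoU = 3500/6500
```

With a threshold of 0.7, the first candidate counts as a hit and the second does not, which shows how strict the criterion is for modest localization errors.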

What carries the argument

Siamese Neural Network trained on image pairs to produce similarity-based feature maps for matching.
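As a rough illustration of the mechanism, the sketch below embeds two inputs with the same weights and scores the pair with a contrastive loss. The summary above does not state the paper's architecture or loss, so the single linear-plus-ReLU branch, the weight matrix `W`, the margin value, and the contrastive loss itself are all assumptions chosen for illustration.

```python
import math


def embed(x, weights):
    """Shared branch: one linear layer plus ReLU (toy stand-in for a CNN)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def contrastive_loss(distance, same, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs past the margin."""
    if same:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Hypothetical 2x3 weight matrix used by both branches.
W = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.5]]


def pair_distance(x1, x2, weights=W):
    # Both inputs pass through the same weights; this sharing is what
    # makes the network "Siamese".
    return euclidean(embed(x1, weights), embed(x2, weights))


a = [1.0, 0.0, 2.0]
b = [1.0, 0.1, 2.0]   # near-duplicate of a
c = [0.0, 3.0, 0.0]   # unrelated input

print(pair_distance(a, b) < pair_distance(a, c))
print(contrastive_loss(pair_distance(a, c), same=False))
```

Because the branches share weights, the distance between embeddings is symmetric in its inputs, and at query time only the embedding function is needed.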

If this is right

The learned features support both whole-image retrieval and localized pattern spotting.
Performance holds with varying feature map sizes, trading some accuracy for reduced computation.
Manual feature engineering can be replaced by this learned similarity approach in document collections.
Results suggest the method applies to public document image datasets without additional adaptation.
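The retrieval side of these points can be sketched as ranking candidates by feature similarity and scoring the ranking with average precision. This is a minimal sketch, not the paper's protocol: cosine similarity, the toy two-dimensional features, and the ids `d1` to `d4` are assumptions, and average precision is normalized here by the number of relevant items, which is one common convention.

```python
import math


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, gallery):
    """Return gallery ids sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)


def average_precision(ranked_ids, relevant):
    hits, total = 0, 0.0
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / position  # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """queries: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical feature vectors for four candidate regions.
gallery = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.5, 0.5]}
ranking = rank([1.0, 0.0], gallery)
print(ranking)
print(average_precision(ranking, {"d1", "d4"}))
```

A reported mAP is the mean of such per-query scores, so it rewards placing every relevant item near the top of the ranking rather than only the first one.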

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar transfer might work for other specialized image domains like medical scans or historical archives.
The same model could be tested on retrieval tasks outside documents to check cross-domain generality.
It raises the question of whether document-specific training data is needed at all for similarity-based matching.

Load-bearing premise

Similarity features learned from ImageNet natural-image pairs transfer directly to document images without further domain adaptation or document-specific training data.

What would settle it

Retraining the same network on document image pairs and testing it on Tobacco800 would settle it: a substantially higher mAP would show that direct transfer leaves accuracy on the table, whereas a comparable or lower mAP would support the direct transfer claim.

Original abstract

This paper presents a novel approach for image retrieval and pattern spotting in document image collections. The manual feature engineering is avoided by learning a similarity-based representation using a Siamese Neural Network trained on a previously prepared subset of image pairs from the ImageNet dataset. The learned representation is used to provide the similarity-based feature maps used to find relevant image candidates in the data collection given an image query. A robust experimental protocol based on the public Tobacco800 document image collection shows that the proposed method compares favorably against state-of-the-art document image retrieval methods, reaching 0.94 and 0.83 of mean average precision (mAP) for retrieval and pattern spotting (IoU=0.7), respectively. Besides, we have evaluated the proposed method considering feature maps of different sizes, showing the impact of reducing the number of features in the retrieval performance and time-consuming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies an ImageNet-trained Siamese network to Tobacco800 document retrieval without adaptation, so the reported mAP numbers rest on an untested domain transfer.

The paper takes a Siamese network trained on ImageNet image pairs and applies the resulting embeddings to retrieve and spot patterns in the Tobacco800 document collection. It reports mAP of 0.94 for retrieval and 0.83 for spotting at IoU=0.7, and shows how performance changes with smaller feature maps. What is new is the specific application to document images using this setup. The use of a public dataset and the evaluation on different feature dimensions are straightforward and useful for practitioners who need a quick learned baseline instead of hand-engineered features.

The main weakness is the domain shift. The network sees only natural images during training, yet the target is document images with text and layout. The abstract gives no indication of fine-tuning or document-specific pairs, so the strong numbers depend on the assumption that natural-image similarity transfers directly. Without architecture details, training protocol, or baseline comparisons, it is hard to judge whether the comparison to state-of-the-art methods holds.

This is the kind of paper that might interest someone building a document retrieval system who wants a simple learned-feature baseline. A reader who already knows Siamese networks will not learn much new. The transfer step does not appear to be verified well enough for the paper to go to peer review without major additions to the experiments.

Referee Report

2 major / 0 minor

Summary. The paper claims that a Siamese Neural Network trained solely on pairs from the ImageNet dataset can learn transferable similarity features for image retrieval and pattern spotting on document images. Using the public Tobacco800 collection, it reports mean average precision of 0.94 for retrieval and 0.83 for pattern spotting (at IoU=0.7), states that these results compare favorably to prior document-specific methods, and examines the effect of reducing feature-map dimensionality on accuracy and runtime.

Significance. If the reported mAP numbers are reproducible and the domain transfer holds, the work would show that natural-image embeddings can be applied off-the-shelf to document retrieval, removing the need for manual features or document-specific training data and thereby simplifying pipelines for large archival collections.

major comments (2)

[Abstract] Abstract: the headline mAP figures (0.94 retrieval, 0.83 spotting) and the claim of favorable comparison to state-of-the-art document methods are presented without any description of network architecture, training protocol on ImageNet pairs, baseline re-implementations, or statistical significance tests, so the data-to-claim link cannot be verified.
[Abstract] Abstract / evaluation protocol: the central assumption that similarity features learned from ImageNet natural-image pairs transfer directly to Tobacco800 documents without domain adaptation or document-specific fine-tuning is not tested by any ablation, feature-alignment analysis, or cross-domain experiment; this premise is load-bearing for the claim that the method outperforms document-tuned baselines.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point-by-point below, indicating planned revisions where appropriate. The full manuscript provides the requested methodological details in the body text; the abstract is a high-level summary.

Referee: [Abstract] Abstract: the headline mAP figures (0.94 retrieval, 0.83 spotting) and the claim of favorable comparison to state-of-the-art document methods are presented without any description of network architecture, training protocol on ImageNet pairs, baseline re-implementations, or statistical significance tests, so the data-to-claim link cannot be verified.

Authors: The abstract is intentionally concise. Network architecture (Siamese backbone), ImageNet pair preparation and training protocol, baseline re-implementations, and experimental comparisons are fully described in Sections 3–5 of the manuscript. We will revise the abstract to include a short clause referencing the Siamese architecture and ImageNet-only training to strengthen the data-to-claim linkage at the summary level.
revision: partial

Referee: [Abstract] Abstract / evaluation protocol: the central assumption that similarity features learned from ImageNet natural-image pairs transfer directly to Tobacco800 documents without domain adaptation or document-specific fine-tuning is not tested by any ablation, feature-alignment analysis, or cross-domain experiment; this premise is load-bearing for the claim that the method outperforms document-tuned baselines.

Authors: The manuscript's central experiment is exactly this direct transfer test: a model trained exclusively on ImageNet pairs is evaluated on Tobacco800 without any document fine-tuning or adaptation, and it outperforms prior document-specific methods. This constitutes the cross-domain evidence. While an explicit ablation comparing an ImageNet-trained model against a Tobacco800-trained counterpart is absent, the reported results already isolate the transfer benefit. We will add a dedicated discussion paragraph on domain transfer implications.
revision: partial

Circularity Check

0 steps flagged

No circularity; empirical transfer evaluated on external public benchmark

The paper trains a standard Siamese network on ImageNet image pairs and applies the resulting embeddings to the independent Tobacco800 document collection for retrieval and pattern spotting, reporting mAP against external SOTA baselines. No equations, fitted parameters, or self-citations are presented that reduce the headline mAP figures (0.94/0.83) to definitions or inputs of the same quantities by construction. The derivation chain consists of off-the-shelf network training followed by direct feature extraction and ranking on a held-out public dataset; the domain-transfer assumption is an empirical claim open to falsification rather than a self-referential reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on transfer of similarity features from natural images to documents plus the assumption that mAP on Tobacco800 reflects real retrieval utility.

free parameters (1)

feature-map dimensionality
Paper varies this size and reports impact on performance and speed, indicating it is chosen rather than derived.
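The speed side of this trade-off can be sketched under a simple cost model. The summary does not say how the paper shrinks its feature maps, so the average pooling in `pool_features` is a hypothetical reduction, and `comparison_cost` assumes a brute-force scan in which each query is compared against every candidate.

```python
def pool_features(vec, factor):
    """Average non-overlapping groups of `factor` values (hypothetical reduction)."""
    if len(vec) % factor:
        raise ValueError("vector length must be divisible by factor")
    return [sum(vec[i:i + factor]) / factor for i in range(0, len(vec), factor)]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def comparison_cost(num_candidates, dim):
    """Per-element operations for one brute-force query over all candidates."""
    return num_candidates * dim


full = [1.0, 3.0, 5.0, 7.0]
reduced = pool_features(full, 2)
print(reduced)                    # half as many features per candidate
print(comparison_cost(1000, 256) // comparison_cost(1000, 64))
```

Under this model the matching cost falls linearly with the number of features, while any accuracy loss depends on how much discriminative detail the reduction discards, which is what the paper measures empirically.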

axioms (1)

domain assumption: Features learned on ImageNet pairs generalize to document images for retrieval.
Invoked by training exclusively on ImageNet pairs then testing on Tobacco800 without domain adaptation.

