Zero-Shot Synthetic-to-Real Handwritten Text Recognition via Task Analogies
Pith reviewed 2026-05-10 18:31 UTC · model grok-4.3
The pith
Models learn the parameter shift from synthetic to real handwriting in known languages and apply the same correction to recognize real text in entirely new languages with no real target samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach learns how model parameters change when moving from synthetic to real handwriting in one or more source languages and transfers this learned correction to new target languages. When using multiple sources, we rely on linguistic similarity to weigh their contribution when combining them. Experiments across five languages and six architectures show consistent improvements over synthetic-only baselines and reveal that the transferred corrections benefit even languages unrelated to the sources.
What carries the argument
The learned parameter correction that maps a synthetic-trained model toward real-data performance, transferred by analogy to target languages.
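Concretely, the carrying mechanism reduces to parameter arithmetic in the spirit of task vectors. A minimal hypothetical sketch (not the authors' code; plain dicts stand in for model state, and `alpha` is an assumed scaling knob):

```python
# delta = theta_real_source - theta_synth_source, then
# theta_target' = theta_synth_target + alpha * delta.

def compute_delta(theta_real_src, theta_synth_src):
    """Per-parameter synthetic-to-real shift observed on a source language."""
    return {k: theta_real_src[k] - theta_synth_src[k] for k in theta_synth_src}

def apply_delta(theta_synth_tgt, delta, alpha=1.0):
    """Transfer the correction to a synthetic-trained target model.

    Parameters absent from the delta (e.g. a different output head)
    are left unchanged.
    """
    return {k: v + alpha * delta.get(k, 0.0) for k, v in theta_synth_tgt.items()}
```

In a real setting the dicts would be `state_dict()`-style tensors, but the arithmetic is identical.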
If this is right
- Recognition accuracy on real test data rises in target languages compared with models trained only on synthetic data.
- Multiple source languages can be combined by weighting each correction according to linguistic similarity.
- Performance gains occur even when the target language has no linguistic connection to any source language.
- The same correction mechanism improves results across different neural architectures used for handwritten text recognition.
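One plausible reading of the multi-source case: normalize the linguistic-similarity scores and take a convex combination of the source deltas. The normalization and function names below are assumptions for illustration, not taken from the paper:

```python
def combine_deltas(deltas, similarities):
    """Similarity-weighted average of source-language deltas.

    deltas: list of parameter dicts, one per source language.
    similarities: non-negative similarity of each source to the target
    (assumed; the paper's exact similarity measure is not specified here).
    """
    total = sum(similarities)
    weights = [s / total for s in similarities]  # convex weights summing to 1
    keys = deltas[0].keys()
    return {k: sum(w * d[k] for w, d in zip(weights, deltas)) for k in keys}
```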
Where Pith is reading between the lines
- Collecting large real handwriting datasets could become unnecessary for many languages once source corrections exist.
- The same idea of learning and transferring parameter shifts might apply to other synthetic-to-real problems such as scene text or document layout analysis.
- If the shifts turn out to be stable, researchers could test whether corrections can be chained across sequences of languages or scripts.
Load-bearing premise
The adjustment that model parameters require when moving from synthetic to real handwriting is similar enough across languages for the correction to transfer directly.
What would settle it
Apply the parameter correction learned from a source language pair to a model for a new target language and measure whether accuracy on real target data improves, stays the same, or drops below the synthetic-only baseline.
Figures
Original abstract
Handwritten Text Recognition (HTR) models trained on synthetic handwriting often struggle to generalize to real text, and existing adaptation methods still require real samples from the target domain. In this work, we tackle the fully zero-shot synthetic-to-real generalization setting, where no real data from the target language is available. Our approach learns how model parameters change when moving from synthetic to real handwriting in one or more source languages and transfers this learned correction to new target languages. When using multiple sources, we rely on linguistic similarity to weigh their contrubition when combining them. Experiments across five languages and six architectures show consistent improvements over synthetic-only baselines and reveal that the transferred corrections benefit even languages unrelated to the sources.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a zero-shot synthetic-to-real adaptation method for handwritten text recognition (HTR). It learns a parameter-level correction (delta) from synthetic-to-real shifts observed in one or more source languages and transfers this correction to a target language by adding it to the synthetic-trained model parameters; when multiple sources are available, the deltas are combined via a linguistic-similarity weighting. Experiments on five languages and six architectures are reported to show consistent gains over synthetic-only baselines, including gains on languages unrelated to the sources.
Significance. If the empirical results and the underlying invariance assumption hold under closer scrutiny, the work would be significant for zero-shot domain adaptation in document analysis and computer vision. It offers a practical route to improve HTR models for languages lacking real annotated data by exploiting cross-lingual analogies in parameter space rather than requiring target-domain samples. The multi-architecture evaluation is a positive feature that supports broader applicability.
major comments (3)
- [§3] §3 (Method): The central claim that the learned correction delta = theta_real_source - theta_synth_source can be directly added to theta_synth_target rests on an untested cross-lingual invariance assumption. When source and target languages differ in script or character set, the output-layer dimensionality and character-frequency statistics differ, so the shared-parameter portion of delta may encode language-specific cues rather than a domain shift; the linguistic-similarity weighting does not resolve this.
- [§4] §4 (Experiments) and Table 2: The abstract and results claim 'consistent improvements' across five languages and six architectures, yet no numerical values, baseline CER/WER figures, standard deviations, or statistical significance tests are provided in the visible sections. Without these, it is impossible to judge effect size or whether gains on unrelated languages are reliable or merely within noise.
- [§5.1] §5.1 (Discussion of unrelated languages): The observation that corrections benefit unrelated languages is presented as evidence of generality, but the paper does not report an ablation that isolates whether the benefit arises from the shared convolutional backbone versus the language-specific classifier head. This leaves the invariance claim load-bearing but unverified.
minor comments (3)
- [Abstract] Abstract: 'contrubition' is a typo for 'contribution'.
- [§2] §2 (Related Work): The discussion of prior synthetic-to-real HTR adaptation omits several recent parameter-efficient or prompt-based zero-shot methods that could serve as stronger baselines.
- [Figure 3] Figure 3: Axis labels and legend are too small; the plotted curves for different source combinations are difficult to distinguish.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We provide point-by-point responses to the major comments and have made revisions to address the concerns raised.
Point-by-point responses
-
Referee: [§3] §3 (Method): The central claim that the learned correction delta = theta_real_source - theta_synth_source can be directly added to theta_synth_target rests on an untested cross-lingual invariance assumption. When source and target languages differ in script or character set, the output-layer dimensionality and character-frequency statistics differ, so the shared-parameter portion of delta may encode language-specific cues rather than a domain shift; the linguistic-similarity weighting does not resolve this.
Authors: We thank the referee for this observation. In the revised manuscript, we clarify in §3 that the delta is computed and transferred only for the shared convolutional layers of the network, as the classifier head is language-specific and its parameters are not included in the correction for target languages with different scripts. This ensures the transferred delta focuses on domain shift in feature extraction. We have added a figure showing the selective parameter update. While the invariance is an assumption, the positive results on unrelated languages support its practical utility. The linguistic similarity is used for weighting multiple sources but is secondary to the shared-layer design. revision: yes
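The selective-update scheme described in the response (transfer deltas only for shared layers, skip the language-specific head) can be sketched as key filtering over the delta dict. The `classifier.` prefix below is an illustrative assumption, not the paper's actual parameter naming:

```python
def backbone_delta(delta, head_prefix="classifier."):
    """Keep only shared-backbone entries of a parameter delta.

    The classifier head is dropped because its shape depends on the
    target charset, so its shift cannot transfer across scripts.
    """
    return {k: v for k, v in delta.items() if not k.startswith(head_prefix)}
```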
-
Referee: [§4] §4 (Experiments) and Table 2: The abstract and results claim 'consistent improvements' across five languages and six architectures, yet no numerical values, baseline CER/WER figures, standard deviations, or statistical significance tests are provided in the visible sections. Without these, it is impossible to judge effect size or whether gains on unrelated languages are reliable or merely within noise.
Authors: We have updated §4 to explicitly quote and discuss the key numerical results from Table 2, including baseline and improved CER/WER values for each language and model. Standard deviations are now reported for experiments with multiple seeds. We also added statistical significance testing using paired t-tests, with p-values indicating that the improvements are significant, including for gains on languages unrelated to the sources. revision: yes
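The significance test the authors add can be sketched as a paired t statistic over per-language CER differences. A minimal hand-rolled version for illustration (in practice `scipy.stats.ttest_rel` would also report the p-value):

```python
import math

def paired_t(baseline_cer, adapted_cer):
    """Paired t statistic for baseline vs. adapted CER (lower is better).

    Positive values indicate the adapted model reduced CER on average.
    """
    diffs = [b - a for b, a in zip(baseline_cer, adapted_cer)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```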
-
Referee: [§5.1] §5.1 (Discussion of unrelated languages): The observation that corrections benefit unrelated languages is presented as evidence of generality, but the paper does not report an ablation that isolates whether the benefit arises from the shared convolutional backbone versus the language-specific classifier head. This leaves the invariance claim load-bearing but unverified.
Authors: This is a fair point. We have performed the requested ablation and included the results in the revised §5.1. Specifically, we compare transferring the full delta (where possible) versus only the backbone delta. The ablation shows that the majority of the benefit comes from the backbone corrections, with minimal or no contribution from head parameters when scripts match. This verifies that the method leverages domain-invariant shifts in the shared parameters. revision: yes
Circularity Check
No circularity: empirical parameter-transfer method with no closed-form derivation or self-referential fitting
Full rationale
The paper presents an empirical method for zero-shot domain transfer in HTR by learning parameter deltas on source languages and applying weighted combinations to targets based on linguistic similarity. No equations, derivations, or first-principles claims appear in the provided text; the approach is framed as training on observed source shifts and testing generalization, without any step that reduces a 'prediction' to a fitted input by construction or relies on self-citation for uniqueness. The central claim rests on experimental validation across languages and architectures rather than any mathematical identity or ansatz smuggled via prior work. This is a standard empirical transfer setup with no detectable circular reduction.