EpiSAM: Character Segmentation in Challenging Stone Inscriptions
Pith reviewed 2026-06-30 10:02 UTC · model grok-4.3
The pith
EpiSAM segments characters in eroded stone inscriptions by also predicting their immediate neighbors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EpiSAM is a prompt-guided transformer framework for character segmentation in stone inscriptions that employs a neighbor-aware strategy by explicitly predicting adjacent characters alongside the target character. These contextual predictions resolve boundary ambiguities caused by surface irregularities and low contrast, resulting in improved mask generation. The work additionally expands an existing stone inscription dataset with dense polygonal annotations to support further epigraphic research, and reports consistent gains over baselines together with strong zero-shot generalization on challenging unseen inscriptions.
What carries the argument
The neighbor-aware strategy that explicitly predicts masks for adjacent characters to supply contextual cues for resolving boundary ambiguities.
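As an illustration of what "adjacent characters" could mean in practice, the sketch below picks the characters whose centers lie closest to a target character. This is an assumption made for illustration only: the abstract does not say how EpiSAM defines or selects neighbors, and the function name `nearest_neighbors`, the center-distance criterion, and the parameter `k` are all hypothetical.

```python
from math import hypot

def nearest_neighbors(centers, target_idx, k=2):
    """Return indices of the k characters whose centers are closest to the target.

    Hypothetical adjacency rule (not taken from the paper): Euclidean distance
    between character centers, with ties broken by lower index.
    """
    tx, ty = centers[target_idx]
    others = [
        (hypot(x - tx, y - ty), i)
        for i, (x, y) in enumerate(centers)
        if i != target_idx
    ]
    return [i for _, i in sorted(others)[:k]]

# Toy layout: three characters on one line and one on the line below.
centers = [(0, 0), (10, 0), (20, 0), (10, 12)]
neighbors = nearest_neighbors(centers, target_idx=1, k=2)
print(neighbors)
```

A model following the neighbor-aware idea would then be asked for masks of the target and of these neighbors together, so that each predicted boundary has to be consistent with the characters next to it.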
If this is right
- Automated analysis of historical stone inscriptions becomes more reliable for historians and linguists.
- An expanded dataset with dense polygonal character annotations enables new research on Southeast Asian epigraphy.
- Zero-shot application works on challenging epigraphic scenarios without retraining the model.
- Better character masks support downstream tasks such as recognition or transcription of degraded inscriptions.
Where Pith is reading between the lines
- The neighbor-prediction tactic could extend to segmenting text in other low-contrast or damaged media such as faded manuscripts.
- The framework might combine with recognition models to produce end-to-end transcription pipelines for inscriptions.
- Performance on inscriptions from different geographic regions or time periods would test the limits of the reported zero-shot behavior.
Load-bearing premise
Explicitly predicting adjacent characters will reliably resolve boundary ambiguities caused by surface irregularities and low contrast.
What would settle it
Measure segmentation mask accuracy on the same set of inscriptions once with neighbor prediction enabled and once disabled; if the accuracy gain disappears, the neighbor-aware mechanism does not deliver the claimed benefit.
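A minimal sketch of that paired comparison, assuming predictions are available as boolean masks per character and that accuracy is measured as intersection over union (IoU). The helper names `mask_iou` and `paired_iou_gain` and the toy masks are hypothetical and do not come from the paper.

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over union of two boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

def paired_iou_gain(preds_with, preds_without, gts):
    """Mean per-character IoU gain (neighbor prediction on minus off)."""
    gains = [
        mask_iou(w, g) - mask_iou(wo, g)
        for w, wo, g in zip(preds_with, preds_without, gts)
    ]
    return float(np.mean(gains)), gains

# Toy data standing in for real predictions (illustrative only).
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred_with = gt.copy()                        # hypothetical output, neighbors on
pred_without = np.zeros((4, 4), dtype=bool)
pred_without[1:3, 1:4] = True                # hypothetical output, neighbors off

mean_gain, gains = paired_iou_gain([pred_with], [pred_without], [gt])
print(f"mean IoU gain from neighbor prediction: {mean_gain:.3f}")
```

Pairing the two runs on the same characters keeps inscription difficulty constant, so a mean gain near zero would point to the neighbor-aware component not delivering the claimed benefit.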
Original abstract
Stone inscriptions are invaluable sources of historical and linguistic knowledge, yet their automated analysis remains a major challenge due to surface irregularities, erosion, and low visual contrast. Conventional document and handwriting analysis techniques fail to perform well in these scenarios. In this work, we propose character detection as a core strategy for robust inscription analysis. We introduce EpiSAM, a prompt-guided transformer framework for character segmentation in stone inscriptions. Rather than treating characters in isolation, EpiSAM employs a novel neighbor-aware strategy, explicitly predicting adjacent characters alongside the target. These contextual cues resolve boundary ambiguities, improving mask generation and enabling more accurate character segmentation. Furthermore, we expand an existing stone inscription dataset by adding dense polygonal annotations for characters, thereby enabling comprehensive research on Southeast Asian epigraphy. Experimental results show that EpiSAM achieves consistent improvements over existing baselines, while also exhibiting strong zero-shot generalization in challenging epigraphic scenarios.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes EpiSAM, a prompt-guided transformer framework for character segmentation in stone inscriptions. It introduces a novel neighbor-aware strategy that explicitly predicts adjacent characters alongside the target to resolve boundary ambiguities arising from surface irregularities, erosion, and low contrast. The authors expand an existing Southeast Asian stone inscription dataset with dense polygonal annotations and claim that EpiSAM achieves consistent improvements over existing baselines while exhibiting strong zero-shot generalization in challenging epigraphic scenarios.
Significance. If the quantitative results hold after the addition of required evidence, the work would offer a practical advance for automated analysis of historical inscriptions where standard document and handwriting methods fail. The dataset expansion with dense annotations is a clear, reusable contribution that supports further epigraphic research. The zero-shot generalization claim, if substantiated with cross-dataset testing, would increase the method's applicability to varied real-world inscriptions.
major comments (3)
- [Abstract] Abstract: The claims of 'consistent improvements over existing baselines' and 'strong zero-shot generalization' are stated without any quantitative numbers, dataset sizes, ablation studies, or error analysis. This absence prevents verification of the magnitude or reliability of the reported gains and is load-bearing for the central empirical claims.
- [Methods (neighbor-aware strategy)] Description of the neighbor-aware strategy (Methods section): The paper states that explicitly predicting adjacent characters resolves boundary ambiguities and improves mask generation, but provides no mechanism details (prompt construction, loss terms, or changes to the prompt-guided transformer), no boundary-specific metrics, and no ablation isolating the strategy's effect versus the base SAM-like model. Without these, improvements cannot be attributed to the stated innovation rather than dataset expansion or other factors.
- [Experiments] Experiments section: No error analysis on boundary cases (e.g., low-contrast or eroded regions) or ablation tables comparing variants with and without the neighbor-aware component are reported, despite the strategy being presented as the key to resolving the core challenges of surface irregularities.
minor comments (1)
- [Abstract] The abstract and introduction could more explicitly reference the expanded dataset (name, size, annotation protocol) to allow readers to assess its scope.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help improve the clarity and rigor of our work. We address each major comment below and will make the necessary revisions to the manuscript.
- Referee: [Abstract] Abstract: The claims of 'consistent improvements over existing baselines' and 'strong zero-shot generalization' are stated without any quantitative numbers, dataset sizes, ablation studies, or error analysis. This absence prevents verification of the magnitude or reliability of the reported gains and is load-bearing for the central empirical claims.
  Authors: We acknowledge that the abstract currently lacks specific quantitative details. In the revised version, we will include key performance metrics (e.g., mean IoU scores and improvements over baselines), dataset sizes, and references to ablations and error analyses from the experiments section to substantiate the claims.
  Revision: yes
- Referee: [Methods (neighbor-aware strategy)] Description of the neighbor-aware strategy (Methods section): The paper states that explicitly predicting adjacent characters resolves boundary ambiguities and improves mask generation, but provides no mechanism details (prompt construction, loss terms, or changes to the prompt-guided transformer), no boundary-specific metrics, and no ablation isolating the strategy's effect versus the base SAM-like model. Without these, improvements cannot be attributed to the stated innovation rather than dataset expansion or other factors.
  Authors: The Methods section describes the neighbor-aware prediction as part of the prompt-guided transformer, but we agree that additional details on prompt construction, specific loss terms, and architectural changes would strengthen the presentation. We will expand this section accordingly and include an ablation study comparing variants with and without the neighbor-aware component, along with boundary-specific metrics.
  Revision: yes
- Referee: [Experiments] Experiments section: No error analysis on boundary cases (e.g., low-contrast or eroded regions) or ablation tables comparing variants with and without the neighbor-aware component are reported, despite the strategy being presented as the key to resolving the core challenges of surface irregularities.
  Authors: We will add a dedicated error analysis subsection focusing on challenging cases such as low-contrast and eroded regions. Additionally, we will include ablation tables to isolate the contribution of the neighbor-aware strategy. These revisions will help attribute the improvements to the proposed method.
  Revision: yes
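One way the boundary-specific metrics mentioned in these responses could be computed is an IoU restricted to a thin band along each mask's edge. The sketch below is an assumption, not the authors' protocol: the band is built with a simple 4-connected erosion, and the names `erode`, `boundary_band`, and `boundary_iou` and the `width` parameter are hypothetical.

```python
import numpy as np

def erode(mask):
    """One step of 4-connected binary erosion.

    Pixels outside the image are treated as background.
    """
    m = np.pad(np.asarray(mask, dtype=bool), 1, constant_values=False)
    return (
        m[1:-1, 1:-1]
        & m[:-2, 1:-1] & m[2:, 1:-1]   # neighbors above and below
        & m[1:-1, :-2] & m[1:-1, 2:]   # neighbors left and right
    )

def boundary_band(mask, width=1):
    """Mask pixels lying within `width` erosion steps of the mask's edge."""
    mask = np.asarray(mask, dtype=bool)
    inner = mask
    for _ in range(width):
        inner = erode(inner)
    return mask & ~inner

def boundary_iou(pred, gt, width=1):
    """IoU computed only on the boundary bands of the two masks."""
    bp = boundary_band(pred, width)
    bg = boundary_band(gt, width)
    union = (bp | bg).sum()
    if union == 0:
        return 1.0  # neither mask has any boundary pixels
    return float((bp & bg).sum() / union)

# Toy example: a square character mask and a copy shifted one pixel right.
gt = np.zeros((7, 7), dtype=bool)
gt[1:6, 1:6] = True
shifted = np.zeros((7, 7), dtype=bool)
shifted[1:6, 2:7] = True

print(boundary_iou(gt, gt), boundary_iou(shifted, gt))
```

Because the interior of the mask is excluded, a small boundary error tends to lower this score more than it lowers plain mask IoU, which is the kind of sensitivity an error analysis on eroded or low-contrast regions would need.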
Circularity Check
No circularity: empirical ML application with no derivations or self-referential reductions
The paper describes an empirical application of a prompt-guided transformer (EpiSAM) for character segmentation, introducing a neighbor-aware strategy at a conceptual level without any equations, derivations, fitted parameters presented as predictions, or load-bearing self-citations. Claims of improvement and zero-shot generalization rest on experimental results against baselines on an expanded dataset, not on any chain that reduces outputs to inputs by construction. No patterns from the enumerated circularity types apply, as there are no mathematical steps or uniqueness theorems invoked.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Prompt-guided transformer architectures (e.g., SAM-style) can be adapted to character-level segmentation on irregular surfaces