pith. machine review for the scientific record.

arxiv: 2605.11960 · v1 · submitted 2026-05-12 · 💻 cs.CV

Recognition: 2 Lean theorem links

Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 05:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords VLLM evaluation · Chinese characters · cross-temporal OCR · ancient scripts · vision language models · historical text recognition · benchmark dataset · Seven Chinese Scripts

The pith

Chronicles-OCR introduces a 2,800-image benchmark that tests VLLMs' visual perception of Chinese characters across their full evolutionary trajectory, the Seven Chinese Scripts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates the first benchmark that tracks how vision-language models handle the continuous shape changes in Chinese writing from oracle bones through later historical forms. It separates the task of recognizing visual structure from any semantic understanding to measure perception alone. This matters for applications that need reliable AI processing of ancient documents without modern assumptions about character appearance. The dataset draws on expert-curated examples from varied physical media and defines four specific tasks to quantify performance gaps.

Core claim

Chronicles-OCR is a benchmark of 2,800 strictly balanced images spanning the Seven Chinese Scripts. Using a Stage-Adaptive Annotation Paradigm, it supports four tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, the benchmark exposes VLLM limitations in cross-temporal settings.

What carries the argument

The Stage-Adaptive Annotation Paradigm, which adjusts labeling rules to accommodate large morphological and topological shifts in character forms across historical stages while maintaining evaluation consistency.

If this is right

  • VLLMs can be evaluated for robustness to script evolution without semantic shortcuts.
  • Failure modes in historical text perception become identifiable at specific evolutionary stages.
  • Digital humanities projects gain a standardized metric for AI support on ancient Chinese materials.
  • Model development can target evolution-aware perception rather than static modern forms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same isolation of visual form from meaning could be applied to other long-evolving scripts to compare model robustness across writing systems.
  • Models trained or fine-tuned on this benchmark data might generalize better to degraded or variant modern text.
  • The four tasks could serve as a template for automated analysis pipelines in museum digitization of inscribed artifacts.

Load-bearing premise

The 2,800 images are strictly balanced and representative of the complete evolutionary trajectory, and the annotation paradigm handles all variations without introducing selection bias.

What would settle it

Either outcome would settle it: current VLLMs achieving high accuracy on all four tasks when evaluated on the released dataset, or an independent audit showing that the image selection favors particular script stages or media types.

Figures

Figures reproduced from arXiv: 2605.11960 by Bang Li, Can Ma, Chengquan Zhang, Gengluo Li, Guojun Yan, Han Hu, Hao Feng, Pian Wu, Shangpin Peng, Weiping Wang, Xingyu Wan, Xin Xu, Yang Yang, Yipei Ye, Yongge Liu, Yu Zhou, Zengmao Ding, Zhan Shu, Zhe Li.

Figure 1. Chronicles-OCR. The top row showcases diverse physical artifact samples from Chronicles-OCR across seven script stages, alongside the morphological evolution of the modern Chinese character “虎” (Tiger). To comprehensively evaluate VLLMs, we introduce a stage-adaptive annotation paradigm and four progressive tasks. Evaluation results reveal substantial capability gaps in the fine-grained visual perception o…

Figure 2. Overview of the Chronicles-OCR Benchmark. The benchmark integrates three core components: (1) Data Curation and Image Sourcing across the Seven Chinese Scripts’ evolutionary timeline; (2) Stage-Adaptive Annotation Paradigm, applying character-level grounding for archaic scripts and sequence-level transcriptions for mature ones; and (3) Task Formulation, establishing four differentiated evaluation tasks to …

Figure 3. Qualitative Spotting Results on Oracle Bone Script. Compared to the ground truth, leading VLLMs (Seed2.0 Pro and Gemini 3.1 Pro) struggle with three primary failure modes (highlighted in red): missed detections of unconstrained symbols, recognition errors due to semantic gaps, and hallucinations triggered by physical noise. VLLMs fundamentally lack the robust grounding mechanisms needed to isolate unconstr…

Figure 4. Qualitative Results of Fine-grained Archaic Character Recognition. By utilizing a visual referring mechanism, this task isolates pure morphological decipherment from spatial localization. Despite explicit visual guidance, current VLLMs still struggle to bridge the profound semantic gap between ancient pictographic glyphs and modern characters. Performance on Fine-grained Archaic Character Recognition. Eval…

Figure 5. Qualitative Results of Ancient Text Parsing on Mature Scripts. Even though mature scripts possess standardized layout priors, leading VLLMs still struggle to achieve perfect parsing. As shown in the generated transcriptions (errors highlighted in red), models frequently hallucinate or misinterpret complex continuous strokes, leading to suboptimal NED scores. Performance on Ancient Text Parsing. Evaluated v…

Figure 6. Visualization of Script Classification Performance. Models reliably categorize archaic scripts by exploiting macro-level textural priors (e.g., shells, bronzes). Conversely, they exhibit severe confusion among mature scripts due to their fundamental inability to differentiate subtle stroke dynamics (e.g., cursiveness) on identical physical mediums. the global morphological style of ancient scripts or the d…
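Figure 5 scores ancient text parsing with NED. As a rough illustration of that metric, assuming NED here is the usual character-level normalized edit distance (Levenshtein distance divided by the longer string's length; the paper's exact normalization is not reproduced in this review), a minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    # classic dynamic-programming edit distance over characters
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ned(pred: str, gold: str) -> float:
    # normalized edit distance in [0, 1]; 0.0 is a perfect transcription
    if not pred and not gold:
        return 0.0
    return levenshtein(pred, gold) / max(len(pred), len(gold))
```

On Chinese transcriptions the same function applies unchanged, since Python strings iterate per character rather than per byte.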
Original abstract

Vision Large Language Models (VLLMs) have achieved remarkable success in modern text-rich visual understanding. However, their perceptual robustness in the face of the continuous morphological evolution of historical writing systems remains largely unexplored. Existing ancient text datasets typically focus on isolated historical periods, failing to capture the systematic visual distribution shifts spanning thousands of years. To bridge this gap and empower Digital Humanities, we introduce Chronicles-OCR, the first comprehensive benchmark specifically designed to evaluate the cross-temporal visual perception capabilities of VLLMs across the complete evolutionary trajectory of Chinese characters, known as the Seven Chinese Scripts. Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images encompassing highly diverse physical media, ranging from tortoise shells to paper-based calligraphy. To accommodate the drastic morphological and topological variations across different historical stages, we propose a novel Stage-Adaptive Annotation Paradigm. Based on this, Chronicles-OCR formulates four rigorous quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, Chronicles-OCR provides an authoritative platform to expose the limitations of current VLLMs, paving the way for robust, evolution-aware historical text perception. Chronicles-OCR is publicly available at https://github.com/VirtualLUOUCAS/Chronicles-OCR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Chronicles-OCR, the first benchmark for evaluating VLLMs on cross-temporal visual perception of Chinese characters across the complete evolutionary trajectory of the Seven Scripts. It presents a curated dataset of 2,800 images spanning tortoise shells to paper-based media, developed with domain experts, and proposes a Stage-Adaptive Annotation Paradigm to handle morphological variations. The benchmark defines four tasks—cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification—explicitly isolating visual perception from semantic reasoning to expose current VLLM limitations.

Significance. If the dataset curation achieves true balance without selection bias and the annotation paradigm is validated, Chronicles-OCR would provide a valuable, publicly available resource for digital humanities and VLLM robustness research. It addresses a clear gap in existing ancient-text datasets, which are limited to isolated periods, and offers quantitative tasks that could drive development of evolution-aware perception models.

major comments (2)
  1. [§3, Dataset Curation] The claim of a 'strictly balanced' collection of 2,800 images across the Seven Scripts and diverse physical media is not supported by any sampling protocol, per-script counts, media-type stratification, or quantitative uniformity metrics (e.g., distribution statistics or inter-expert agreement scores). Without these, it is impossible to verify that task difficulties are even and that visual perception is isolated from curation artifacts.
  2. [§4, Stage-Adaptive Annotation Paradigm] The paradigm is described at a high level but lacks concrete methodological details on how it accommodates drastic morphological and topological changes without introducing selection bias, such as explicit exclusion criteria, quantitative bias checks, or validation against the full evolutionary trajectory. This directly affects the reliability of the four tasks and the central claim of an 'authoritative platform'.
minor comments (2)
  1. [Data Availability] The GitHub link is provided in the abstract but should be repeated with a permanent identifier or DOI in the main text and data-availability statement for reproducibility.
  2. [Figures] Figure captions for the example images and task illustrations could be expanded to explicitly note the script period and media type for each sample to aid reader interpretation.
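The referee's first major point is, at bottom, a bookkeeping request. If per-image metadata were released, the balance and stratification checks could be a few lines; the `script` and `medium` fields and the 400-per-script target (2,800 / 7) below are hypothetical, not taken from the paper's release:

```python
from collections import Counter

def check_balance(images, n_scripts=7, total=2800):
    """Per-script counts against the strict-balance target (total / n_scripts)."""
    per_script = Counter(img["script"] for img in images)
    expected = total // n_scripts  # strict balance would mean 400 images per script
    return {s: (count, count == expected) for s, count in per_script.items()}

def media_stratification(images):
    """Counts of (script, medium) pairs, i.e. a media-type stratification table."""
    return dict(Counter((img["script"], img["medium"]) for img in images))
```

Running these over the released metadata, if it exposes such fields, would directly answer the request for per-script counts and a stratification table.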

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the thorough and constructive comments. We have carefully addressed each major point below and revised the manuscript to incorporate additional details and evidence where the original submission was insufficient.

Point-by-point responses
  1. Referee: [§3, Dataset Curation] The claim of a 'strictly balanced' collection of 2,800 images across the Seven Scripts and diverse physical media is not supported by any sampling protocol, per-script counts, media-type stratification, or quantitative uniformity metrics (e.g., distribution statistics or inter-expert agreement scores). Without these, it is impossible to verify that task difficulties are even and that visual perception is isolated from curation artifacts.

    Authors: We agree that the original manuscript did not provide sufficient quantitative support for the 'strictly balanced' claim. In the revised Section 3, we now include the full sampling protocol developed with domain experts, per-script image counts, media-type stratification tables, distribution statistics, and inter-expert agreement metrics (including Cohen's kappa). These additions enable verification that task difficulties are even across the Seven Scripts and that visual perception is isolated from curation artifacts. revision: yes

  2. Referee: [§4, Stage-Adaptive Annotation Paradigm] The paradigm is described at a high level but lacks concrete methodological details on how it accommodates drastic morphological and topological changes without introducing selection bias, such as explicit exclusion criteria, quantitative bias checks, or validation against the full evolutionary trajectory. This directly affects the reliability of the four tasks and the central claim of an 'authoritative platform'.

    Authors: We acknowledge that the original description of the Stage-Adaptive Annotation Paradigm was at too high a level. In the revised Section 4, we have added concrete methodological details, including explicit exclusion criteria for morphological variations, quantitative bias checks (pre- and post-annotation distribution comparisons), and validation steps against the complete evolutionary trajectory. These changes enhance transparency and support the reliability of the four tasks. revision: yes
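The rebuttal's revised Section 3 cites Cohen's kappa for inter-expert agreement. For two annotators over the same items, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from the annotators' marginal label distributions; a minimal sketch (the labels here are invented for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same n items."""
    n = len(labels_a)
    # observed agreement: fraction of items with identical labels
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement from each annotator's marginal label distribution
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1 (degenerate labeling)
```

Values near 1 indicate agreement well above chance; reporting kappa per script stage, rather than pooled, would keep disagreement on archaic forms from being averaged away.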

Circularity Check

0 steps flagged

No significant circularity in benchmark definition

full rationale

The paper presents a new dataset and benchmark for cross-temporal Chinese character perception, consisting of data curation (2,800 images across Seven Scripts), a Stage-Adaptive Annotation Paradigm, and four task definitions. No mathematical derivations, equations, fitted parameters, or predictive models appear in the provided text. The central claims rest on explicit construction choices (expert curation, task isolation of visual perception) rather than any reduction to self-referential inputs, self-citations, or renamed prior results. The contribution is self-contained as a definitional benchmark without internal circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on domain assumptions about the historical completeness of the seven scripts and the feasibility of strict balancing across media and periods; no free parameters are fitted and no new entities are postulated.

axioms (1)
  • Domain assumption: The seven scripts represent the complete evolutionary trajectory of Chinese characters spanning thousands of years.
    Stated in the abstract as the foundation for curating images across the full trajectory from tortoise shells to paper-based calligraphy.

pith-pipeline@v0.9.0 · 5603 in / 1260 out tokens · 106326 ms · 2026-05-13T05:43:11.035536+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

83 extracted references · 83 canonical work pages · 13 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, et al. Qwen3-VL technical report.arXiv preprint arXiv:2511.21631, 2025

  2. [2]

    Qwen2.5-VL Technical Report

    Shuai Bai, Keqin Chen, Xuejing Liu, et al. Qwen2.5-VL technical report.arXiv preprint arXiv:2502.13923, 2025

  3. [3]

    OCRBench v2: An improved benchmark for evaluating large multimodal models on visual text localization and reasoning.arXiv preprint arXiv:2501.00321, 2024

    Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, et al. OCRBench v2: An improved benchmark for evaluating large multimodal models on visual text localization and reasoning.arXiv preprint arXiv:2501.00321, 2024

  4. [4]

    OmniDocBench: Benchmarking diverse pdf document parsing with comprehensive annotations

    Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, et al. OmniDocBench: Benchmarking diverse pdf document parsing with comprehensive annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2025

  5. [5]

    OCR-Reasoning benchmark: Unveiling the true capabilities of MLLMs in complex text-rich image reasoning.arXiv preprint arXiv:2505.17163, 2025

    Mingxin Huang, Yongxin Shi, Dezhi Peng, Songxuan Lai, Zecheng Xie, and Lianwen Jin. OCR-Reasoning benchmark: Unveiling the true capabilities of MLLMs in complex text-rich image reasoning.arXiv preprint arXiv:2505.17163, 2025

  6. [6]

    PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9 B Ultra-Compact Vision-Language Model

    Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, et al. PaddleOCR-VL: Boosting multilingual document parsing via a 0.9B ultra-compact vision-language model.arXiv preprint arXiv:2510.14528, 2025

  7. [7]

    Hunyuanocr Technical Report.arXiv preprint arXiv:2511.19575, 2025

    Hunyuan Vision Team, Pengyuan Lyu, Xingyu Wan, Gengluo Li, Shangpin Peng, Weinong Wang, Liang Wu, Huawen Shen, Yu Zhou, Canhui Tang, et al. HunyuanOCR technical report.arXiv preprint arXiv:2511.19575, 2025

  8. [8]

    MCS-Bench: A comprehensive benchmark for evaluating multimodal large language models in chinese classical studies

    Yang Liu, Jiahuan Cao, Hiuyi Cheng, Yongxin Shi, Kai Ding, and Lianwen Jin. MCS-Bench: A comprehensive benchmark for evaluating multimodal large language models in chinese classical studies. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

  9. [9]

    arXiv preprint arXiv:2401.12467 (2024)

    Haisu Guan, Jinpeng Wan, Yuliang Liu, Pengjie Wang, Kaile Zhang, Zhebin Kuang, Xinyu Wang, Xiang Bai, and Lianwen Jin. An open dataset for the evolution of oracle bone characters: EVOBC.arXiv preprint arXiv:2401.12467, 2024

  10. [10]

    A graph-based evolutionary dataset for oracle bone characters from inscriptions to modern chinese scripts.npj Heritage Science, 2025

    Qingju Jiao, Jingwen Wu, Qi Liu, Han Zhang, Zhan Zhang, Bang Li, Jing Xiong, Guoying Liu, and Yongge Liu. A graph-based evolutionary dataset for oracle bone characters from inscriptions to modern chinese scripts.npj Heritage Science, 2025

  11. [11]

    Study on the evolution of chinese characters based on few-shot learning: From oracle bone inscriptions to regular script.Plos one, 2022

    Mengru Wang, Yu Cai, Li Gao, Ruichen Feng, Qingju Jiao, Xiaolin Ma, and Yu Jia. Study on the evolution of chinese characters based on few-shot learning: From oracle bone inscriptions to regular script.Plos one, 2022

  12. [12]

    OBI-Bench: Can LMMs aid in study of ancient script on oracle bones? InInternational Conference on Learning Representations, volume 2025, 2025

    Zijian Chen, Wenjun Zhang, Guangtao Zhai, et al. OBI-Bench: Can LMMs aid in study of ancient script on oracle bones? InInternational Conference on Learning Representations, volume 2025, 2025

  13. [13]

    Oracle bone inscriptions information processing: A comprehensive survey.npj Heritage Science, 2026

    Zijian Chen, Wenjie Hua, Jinhao Li, Yucheng Zhu, Xiaona Zhi, Zhiji Liu, Tingzhu Chen, Wenjun Zhang, and Guangtao Zhai. Oracle bone inscriptions information processing: A comprehensive survey.npj Heritage Science, 2026. doi: 10.1038/s40494-026-02511-w

  14. [14]

    OBC306: A large-scale oracle bone character recognition dataset

    Shuangping Huang, Haobin Wang, Yongge Liu, Xiaosong Shi, and Lianwen Jin. OBC306: A large-scale oracle bone character recognition dataset. In2019 International Conference on Document Analysis and Recognition (ICDAR), 2019

  15. [15]

    A comprehensive survey of oracle character recognition: challenges, benchmarks, and beyond.arXiv preprint arXiv:2411.11354, 2024

    Jing Li, Xueke Chi, Qiufeng Wang, Dahan Wang, Kaizhu Huang, Yongge Liu, and Cheng-lin Liu. A comprehensive survey of oracle character recognition: challenges, benchmarks, and beyond.arXiv preprint arXiv:2411.11354, 2024

  16. [16]

    Historical document processing: Historical document processing: A survey of techniques, tools, and trends.arXiv preprint arXiv:2002.06300, 2020

    James P Philips and Nasseh Tabrizi. Historical document processing: Historical document processing: A survey of techniques, tools, and trends.arXiv preprint arXiv:2002.06300, 2020

  17. [17]

    TongGu-VL: Advancing visual-language understanding in chinese classical studies through parameter sensitivity-guided instruction tuning

    Jiahuan Cao, Yang Liu, Peirong Zhang, Yongxin Shi, Kai Ding, and Lianwen Jin. TongGu-VL: Advancing visual-language understanding in chinese classical studies through parameter sensitivity-guided instruction tuning. InProceedings of the 33rd ACM International Conference on Multimedia, 2025

  18. [18]

    OracleAgent: A multimodal reasoning agent for oracle bone script research.arXiv preprint arXiv:2510.26114, 2025

    Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, Xu Peng, Taisong Jin, Yongge Liu, Shengwei Han, Jing Yang, et al. OracleAgent: A multimodal reasoning agent for oracle bone script research.arXiv preprint arXiv:2510.26114, 2025

  19. [19]

    V-oracle: Making progressive reasoning in deciphering oracle bones for you and me

    Runqi Qiao, Qiuna Tan, Guanting Dong, MinhuiWu MinhuiWu, Jiapeng Wang, Yifan Zhang, Zhuoma GongQue, Chong Sun, Yida Xu, Yadong Xue, et al. V-oracle: Making progressive reasoning in deciphering oracle bones for you and me. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

  20. [20]

    WenyanGPT: A large language model for classical chinese tasks

    Xinyu Yao, Mengdi Wang, Bo Chen, and Xiaobing Zhao. WenyanGPT: A large language model for classical chinese tasks. arXiv preprint arXiv:2504.20609, 2025

  21. [21]

    HWOBC-a handwriting oracle bone character recognition database.Journal of Physics: Conference Series, 2020

    Bang Li, Qianwen Dai, Feng Gao, Weiye Zhu, Qiang Li, and Yongge Liu. HWOBC-a handwriting oracle bone character recognition database.Journal of Physics: Conference Series, 2020. doi: 10.1088/1742-6596/1651/1/012050. URL https://doi.org/10.1088/1742-6596/1651/1/012050

  22. [22]

    OracleFusion: Assisting the decipherment of Oracle Bone Script with structurally constrained semantic typography

    Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, Seven Shu, et al. OracleFusion: Assisting the decipherment of Oracle Bone Script with structurally constrained semantic typography. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2025

  23. [23]

    arXiv preprint arXiv:2401.15365 (2024)

    Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Jinpeng Wan, Haisu Guan, Zhebin Kuang, Lianwen Jin, Xiang Bai, et al. An open dataset for oracle bone script recognition and decipherment.arXiv preprint arXiv:2401.15365, 10 2024

  24. [24]

    CASIA-AHCDB: A large-scale chinese ancient handwritten characters database

    Yue Xu, Fei Yin, Da-Han Wang, Xu-Yao Zhang, Zhaoxiang Zhang, and Cheng-Lin Liu. CASIA-AHCDB: A large-scale chinese ancient handwritten characters database. In2019 international conference on document analysis and recognition (ICDAR), 2019

  25. [25]

    Divination and power: A multiregional view of the development of oracle bone divination in early China

    Rowan K Flad. Divination and power: A multiregional view of the development of oracle bone divination in early China. Current Anthropology, 2008

  26. [26]

    Graphs, words, and meanings: Three reference works for shang oracle-bone studies, with an excursus on the religious role of the day or sun, 1997

    David N Keightley. Graphs, words, and meanings: Three reference works for shang oracle-bone studies, with an excursus on the religious role of the day or sun, 1997

  27. [27]

    A research on an intelligent recognition tool for bronze inscriptions of the shang and zhou dynasties.Journal of Chinese Writing Systems, 2020

    Rui Guo. A research on an intelligent recognition tool for bronze inscriptions of the shang and zhou dynasties.Journal of Chinese Writing Systems, 2020

  28. [28]

    BIRD: Bronze inscription restoration and dating

    Wenjie Hua, Hoang H Nguyen, and Gangyan Ge. BIRD: Bronze inscription restoration and dating. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

  29. [29]

    LadderMoE: Ladder-side mixture of experts adapters for bronze inscription recognition.arXiv preprint arXiv:2510.01651, 2025

    Rixin Zhou, Peiqiang Qiu, Qian Zhang, Chuntao Li, and Xi Yang. LadderMoE: Ladder-side mixture of experts adapters for bronze inscription recognition.arXiv preprint arXiv:2510.01651, 2025

  30. [30]

    Bridging cultural divides: Metadata and the seal collection in a western context

    Veronica Fu. Bridging cultural divides: Metadata and the seal collection in a western context. InUnderstanding and Utilizing Informal Archives. IGI Global Scientific Publishing, 2026

  31. [31]

    Qin seal script character recognition with fuzzy and incomplete information.Baghdad Science Journal, 2024

    Yun Ou, Zhen-Jie Zhou, Di-Wen Kang, Pan Zhou, and Xue-Wei Liu. Qin seal script character recognition with fuzzy and incomplete information.Baghdad Science Journal, 2024

  32. [32]

    Style-independent radical sequence learning for zero-shot recognition of small seal script.Journal of the Franklin Institute, 2023

    Wenhui Zhou, Jinyu Liu, Jiefeng Li, Jiyi Li, Lili Lin, Fumiyo Fukumoto, and Guojun Dai. Style-independent radical sequence learning for zero-shot recognition of small seal script.Journal of the Franklin Institute, 2023

  33. [33]

    stele of cao quan

    Liu Guoqing, Hao Changning, Yan Jingbo, Dong Jing, Zhao Zuolong, and Hao Lujia. Stroke extraction algorithm of clerical script in Han dynasty based on contour: Take “stele of cao quan” as an example.Mobile Information Systems, 2022

  34. [34]

    Research on efficient calligraphy image classification based on attention enhancement.Mathematics, 2025

    Yu Lei, Tianzhao Zhou, and Yuankui Ma. Research on efficient calligraphy image classification based on attention enhancement.Mathematics, 2025

  35. [35]

    Juan Wu. Han dynasty portrait image feature extraction and cloud computing-supported symbolic interpretation: A new approach to cultural heritage digitalization.Scalable Computing: Practice and Experience, 2024

  36. [36]

    Stroke systems in chinese characters: A systemic functional perspective on simplified regular script

    Xuanwei Peng. Stroke systems in chinese characters: A systemic functional perspective on simplified regular script. Semiotica, 2017

  37. [37]

    Dense and tight detection of chinese characters in historical documents: Datasets and a recognition guided detector.IEEE Access, 2018

    Hailin Yang, Lianwen Jin, Weiguo Huang, Zhaoyang Yang, Songxuan Lai, and Jifeng Sun. Dense and tight detection of chinese characters in historical documents: Datasets and a recognition guided detector.IEEE Access, 2018

  38. [38]

    The advantages and disadvantages of Regular Script in the study of calligraphy

    Wei Zhang. The advantages and disadvantages of Regular Script in the study of calligraphy. In2nd International Conference on Language, Art and Cultural Exchange (ICLACE 2021), 2021

  39. [39]

    RS-GAN: unsupervised running script font generation via disentangled representation learning and contextual transformer.Pattern Analysis and Applications, 2025

    Xuanhong Wang, Cong Li, Zengguo Sun, and Luying Hui. RS-GAN: unsupervised running script font generation via disentangled representation learning and contextual transformer.Pattern Analysis and Applications, 2025

  40. [40]

    Chinese cursive character detection method.The Journal of Engineering, 2020

    Xiao Qin, Jianhui Jiang, Wei Fan, and Changan Yuan. Chinese cursive character detection method.The Journal of Engineering, 2020

  41. [41]

    Toward automatic recognition of cursive chinese calligraphy: An open dataset for cursive chinese calligraphy text

    Jung Liang, Wen-Hung Liao, and Yi-Chieh Wu. Toward automatic recognition of cursive chinese calligraphy: An open dataset for cursive chinese calligraphy text. In2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM), 2020

  42. [42]

    A method of chinese characters changing from regular script to semi-cursive scrip described by track and point set

    Yao Wu, Jie Jiang, and Yi Li. A method of chinese characters changing from regular script to semi-cursive scrip described by track and point set. In2018 international joint conference on information, media and engineering (ICIME), 2018

  43. [43]

    Unconstrained freehand cursive script: A revolution in chinese calligraphic art.International Journal of Politics, Culture, and Society, 1995

    Jia Chen and Kwong Lum. Unconstrained freehand cursive script: A revolution in chinese calligraphic art.International Journal of Politics, Culture, and Society, 1995

  44. [44]

    Franz Steiner Verlag, 1998

    Adele Schlombs.Huai-su and the beginnings of wild cursive script in Chinese calligraphy. Franz Steiner Verlag, 1998

  45. [45]

    The aesthetic structure of Cursive Script.International Journal of Academic Research in Business and Social Sciences, 2023

    Zhu Lei Gang, Loy Chee Luen, and Lee Keok Cheong. The aesthetic structure of Cursive Script.International Journal of Academic Research in Business and Social Sciences, 2023

  46. [46]

    Study on the evolution of chinese characters based on few-shot learning: From oracle bone inscriptions to regular script.PLOS ONE, 2022

    Mengru Wang, Yu Cai, Li Gao, Ruichen Feng, Qingju Jiao, Xiaolin Ma, and Yu Jia. Study on the evolution of chinese characters based on few-shot learning: From oracle bone inscriptions to regular script.PLOS ONE, 2022. doi: 10.1371/ journal.pone.0272974

  47. [47]

    Towards real-world document parsing via realistic scene synthesis and document-aware training, 2026

    Gengluo Li, Pengyuan Lyu, Chengquan Zhang, Huawen Shen, Liang Wu, Xingyu Wan, Gangyan Zeng, Han Hu, Can Ma, and Yu Zhou. Towards real-world document parsing via realistic scene synthesis and document-aware training, 2026

  48. [48]

    MMTIT-Bench: A multilingual and multi-scenario benchmark with cognition-perception-reasoning guided text-image machine translation, 2026

    Gengluo Li, Chengquan Zhang, Yupu Liang, Huawen Shen, Yaping Zhang, Pengyuan Lyu, Weinong Wang, Xingyu Wan, Gangyan Zeng, Han Hu, Can Ma, and Yu Zhou. MMTIT-Bench: A multilingual and multi-scenario benchmark with cognition-perception-reasoning guided text-image machine translation, 2026. 11

  49. [49]

    Mitigating object hallucinations via sentence-level early intervention

    Shangpin Peng, Senqiao Yang, Li Jiang, and Zhuotao Tian. Mitigating object hallucinations via sentence-level early intervention. InProceedings of the IEEE International Conference on Computer Vision, 2025

  50. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 2020

  51. Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023

  52. Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267, 2025

  53. Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191, 2024

  54. Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023

  55. Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in Neural Information Processing Systems, 2023

  56. Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

  57. Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge, 2024

  58. Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven Hoi. InstructBLIP: Towards general-purpose vision-language models with instruction tuning, 2023

  59. Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, and Min Zhang. Uni-DPO: A unified paradigm for dynamic preference optimization of LLMs. arXiv preprint arXiv:2506.10054, 2025

  60. OpenAI. GPT-4V(ision) system card, 2023

  61. Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023

  62. Wenjin Hou, Shangpin Peng, Weinong Wang, Zheng Ruan, Yue Zhang, Zhenglin Zhou, Mingqi Gao, Yifei Chen, Kaiqi Wang, Hongming Yang, et al. Uni-OPD: Unifying on-policy distillation with a dual-perspective recipe. arXiv preprint arXiv:2605.03677, 2026

  63. Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

  64. Yongxin Shi, Chongyu Liu, Dezhi Peng, Cheng Jian, Jiarong Huang, and Lianwen Jin. M5HisDoc: A large-scale multi-style Chinese historical document analysis benchmark. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023

  65. Zhikai Hu, Yiu-ming Cheung, Yonggang Zhang, Peiying Zhang, and Pui-ling Tang. Component-level oracle bone inscription retrieval. In Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

  66. Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, et al. Oracle bone inscriptions multi-modal dataset. arXiv preprint arXiv:2407.03900, 2024

  67. Yongxin Shi, Dezhi Peng, Yuyi Zhang, Jiahuan Cao, and Lianwen Jin. A large-scale dataset for Chinese historical document recognition and analysis. Scientific Data, 2025

  68. Zijian Chen, Wenjie Hua, Jinhao Li, Lirong Deng, Fan Du, Tingzhu Chen, and Guangtao Zhai. Pictobi-20k: Unveiling large multimodal models in visual decipherment for pictographic oracle bone characters. In ICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

  69. Rui Song, Lida Shi, Ruihua Qi, Yingji Li, and Hao Xu. Enhancing multimodal large language models for ancient Chinese character evolution analysis via glyph-driven fine-tuning. arXiv preprint arXiv:2604.11299, 2026

  70. Vladimir I Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady, 1966

  71. Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. InternVL3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency. arXiv preprint arXiv:2508.18265, 2025

  72. Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295, 2024

  73. Tianyu Yu, Zefan Wang, Chongyi Wang, Fuwei Huang, Wenshuo Ma, Zhihui He, Tianchi Cai, Weize Chen, Yuxiang Huang, Yuanqian Zhao, Bokai Xu, Junbo Cui, Yingjing Xu, Liqing Ruan, Luoyuan Zhang, Hanyu Liu, Jingkun Tang, Hongyuan Liu, Qining Guo, Wenhao Hu, Bingxiang He, Jie Zhou, Jie Cai, Ji Qi, Zonghao Guo, Chi Chen, Guoyang Zeng, Yuxuan Li, Ganqu Cui, et al. MiniCPM-V 4.5: Cooking efficient MLLMs via architecture, data, and training recipe. arXiv preprint arXiv:2509.18154, 2025

  74. Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, et al. Molmo and PixMo: Open weights and open data for state-of-the-art vision-language models. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025

  75. Shiyin Lu, Yang Li, Yu Xia, Yuwei Hu, Shanshan Zhao, Yanqing Ma, Zhichao Wei, Yinglun Li, Lunhao Duan, Jianshan Zhao, Yuxuan Han, Haijun Li, Wanying Chen, Junke Tang, Chengkun Hou, Zhixing Du, Tianli Zhou, Wenjie Zhang, Huping Ding, Jiahe Li, Wen Li, Gui Hu, Yiliang Gu, Siran Yang, Jiamang Wang, Hailong Sun, Yibo Wang, Hui Sun, Jinlong Huang, Yuping He, et al. Ovis2.5 technical report, 2025

  76. V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiale Zhu, Jiali Chen, et al. GLM-4.5V and GLM-4.1V-Thinking: Towards versatile multimodal reasoning with scalable reinforcement learning

  77. Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi K2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276, 2026

  78. ByteDance Seed. Seed1.8 Model Card: Towards generalized real-world agency, 2025. URL https://github.com/ByteDance-Seed/Seed-1.8/blob/main/Seed-1.8-Modelcard.pdf

  79. ByteDance Seed Team. Seed2.0 Model Card: Towards intelligence frontier for real-world complexity, February 2026. URL https://github.com/ByteDance-Seed/Seed2.0

  80. Xiaomi Corporation. Xiaomi MiMo-V2-Omni: See, hear, act in the agentic era. https://mimo.xiaomi.com/mimo-v2-omni, 2026
