Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters
Recognition: 2 theorem links · Lean theorems
Pith reviewed 2026-05-13 05:43 UTC · model grok-4.3
The pith
Chronicles-OCR introduces a benchmark of 2,800 images that tests VLLMs' visual perception of Chinese characters across their full evolutionary trajectory, the Seven Chinese Scripts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Chronicles-OCR is a benchmark of 2,800 strictly balanced images spanning the Seven Chinese Scripts, annotated under a Stage-Adaptive Annotation Paradigm. It defines four tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, it aims to expose VLLM limitations in cross-temporal settings.
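To make the task structure concrete, here is a minimal sketch of how such a four-task evaluation harness could be laid out. The task names come from the paper; the prompt wording, the exact-match scorer, and the `evaluate` helper are illustrative assumptions, not the released protocol.

```python
# Sketch of a four-task evaluation harness for Chronicles-OCR.
# Task names come from the paper; the prompts and the exact-match
# scorer are illustrative assumptions, not the released protocol.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    prompt: str                          # instruction shown to the VLLM
    score: Callable[[str, str], float]   # (prediction, gold) -> [0, 1]

def exact_match(pred: str, gold: str) -> float:
    return float(pred.strip() == gold.strip())

TASKS = [
    Task("cross-period character spotting",
         "List every character instance visible in this image.", exact_match),
    Task("archaic character recognition (visual referring)",
         "Identify the character inside the marked region.", exact_match),
    Task("ancient text parsing",
         "Transcribe the full text in reading order.", exact_match),
    Task("script classification",
         "Which of the Seven Chinese Scripts is shown?", exact_match),
]

def evaluate(model: Callable[[str, bytes], str], samples: list) -> dict:
    """Average per-task score; `samples` holds (task_name, image, gold) triples."""
    by_name = {t.name: t for t in TASKS}
    totals: dict = {t.name: [] for t in TASKS}
    for task_name, image, gold in samples:
        task = by_name[task_name]
        totals[task_name].append(task.score(model(task.prompt, image), gold))
    return {k: sum(v) / len(v) for k, v in totals.items() if v}
```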
What carries the argument
The Stage-Adaptive Annotation Paradigm, which adjusts labeling rules to accommodate large morphological and topological shifts in character forms across historical stages while maintaining evaluation consistency.
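One way to picture the paradigm in code: labeling rules vary per historical stage, while every record shares a fixed output schema so results stay comparable across stages. This is a hedged sketch; the script names follow the Seven Scripts, but the rule fields and their assignments are hypothetical, not the paper's released schema.

```python
# Hedged sketch of stage-adaptive annotation: labeling rules differ per
# historical stage, but every record shares one output schema, so the
# four tasks remain comparable across stages. Field names and the rule
# assignments below are hypothetical, not the paper's released schema.
from dataclasses import dataclass

SEVEN_SCRIPTS = ["oracle bone", "bronze", "seal", "clerical",
                 "regular", "semi-cursive", "cursive"]

@dataclass
class StageRules:
    script: str
    allow_mirrored_forms: bool    # archaic graphs are often written flipped
    merge_variant_graphs: bool    # treat topological variants as one label

# One plausible assignment: looser identity rules for archaic stages.
RULES = {
    s: StageRules(
        script=s,
        allow_mirrored_forms=s in ("oracle bone", "bronze"),
        merge_variant_graphs=s in ("oracle bone", "bronze", "seal"),
    )
    for s in SEVEN_SCRIPTS
}

@dataclass
class Annotation:
    image_id: str
    script: str                           # one of SEVEN_SCRIPTS
    char_label: str                       # modern descendant, shared label space
    bbox: tuple = (0.0, 0.0, 1.0, 1.0)    # normalized region for visual referring
```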
If this is right
- VLLMs can be evaluated for robustness to script evolution without semantic shortcuts.
- Failure modes in historical text perception become identifiable at specific evolutionary stages.
- Digital humanities projects gain a standardized metric for AI support on ancient Chinese materials.
- Model development can target evolution-aware perception rather than static modern forms.
Where Pith is reading between the lines
- The same isolation of visual form from meaning could be applied to other long-evolving scripts to compare model robustness across writing systems.
- Models trained or fine-tuned on this benchmark data might generalize better to degraded or variant modern text.
- The four tasks could serve as a template for automated analysis pipelines in museum digitization of inscribed artifacts.
Load-bearing premise
The 2,800 images are strictly balanced and representative of the complete evolutionary trajectory, and the annotation paradigm handles all variations without introducing selection bias.
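If strict balance holds, the 2,800 images should split into 400 per script. A chi-square goodness-of-fit test is a quick way to audit that premise once per-script counts are released; the counts below are placeholders, not the paper's numbers.

```python
# Audit sketch: test whether per-script counts match a strictly balanced
# design (2,800 / 7 = 400 each). The counts below are placeholders to be
# replaced with the released per-script numbers.
from scipy.stats import chisquare

counts = [400, 400, 400, 400, 400, 400, 400]
assert sum(counts) == 2800

stat, p = chisquare(counts)  # null hypothesis: uniform across the Seven Scripts
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
if p < 0.05:
    print("per-script counts deviate significantly from strict balance")
```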
What would settle it
Either evidence that current VLLMs achieve high accuracy on all four tasks of the released dataset, or an independent audit showing that the image selection favors particular script stages or media types.
Original abstract
Vision Large Language Models (VLLMs) have achieved remarkable success in modern text-rich visual understanding. However, their perceptual robustness in the face of the continuous morphological evolution of historical writing systems remains largely unexplored. Existing ancient text datasets typically focus on isolated historical periods, failing to capture the systematic visual distribution shifts spanning thousands of years. To bridge this gap and empower Digital Humanities, we introduce Chronicles-OCR, the first comprehensive benchmark specifically designed to evaluate the cross-temporal visual perception capabilities of VLLMs across the complete evolutionary trajectory of Chinese characters, known as the Seven Chinese Scripts. Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images encompassing highly diverse physical media, ranging from tortoise shells to paper-based calligraphy. To accommodate the drastic morphological and topological variations across different historical stages, we propose a novel Stage-Adaptive Annotation Paradigm. Based on this, Chronicles-OCR formulates four rigorous quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, Chronicles-OCR provides an authoritative platform to expose the limitations of current VLLMs, paving the way for robust, evolution-aware historical text perception. Chronicles-OCR is publicly available at https://github.com/VirtualLUOUCAS/Chronicles-OCR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Chronicles-OCR, the first benchmark for evaluating VLLMs on cross-temporal visual perception of Chinese characters across the complete evolutionary trajectory of the Seven Scripts. It presents a curated dataset of 2,800 images spanning tortoise shells to paper-based media, developed with domain experts, and proposes a Stage-Adaptive Annotation Paradigm to handle morphological variations. The benchmark defines four tasks—cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification—explicitly isolating visual perception from semantic reasoning to expose current VLLM limitations.
Significance. If the dataset curation achieves true balance without selection bias and the annotation paradigm is validated, Chronicles-OCR would provide a valuable, publicly available resource for digital humanities and VLLM robustness research. It addresses a clear gap in existing ancient-text datasets, which are limited to isolated periods, and offers quantitative tasks that could drive development of evolution-aware perception models.
major comments (2)
- [§3] §3 (Dataset Curation): The claim of a 'strictly balanced' collection of 2,800 images across the Seven Scripts and diverse physical media is not supported by any sampling protocol, per-script counts, media-type stratification, or quantitative uniformity metrics (e.g., distribution statistics or inter-expert agreement scores). Without these, it is impossible to verify that task difficulties are even and that visual perception is isolated from curation artifacts.
- [§4] §4 (Stage-Adaptive Annotation Paradigm): The paradigm is described at a high level but lacks concrete methodological details on how it accommodates drastic morphological and topological changes without introducing selection bias, such as explicit exclusion criteria, quantitative bias checks, or validation against the full evolutionary trajectory. This directly affects the reliability of the four tasks and the central claim of an 'authoritative platform'.
minor comments (2)
- [Data Availability] The GitHub link is provided in the abstract but should be repeated with a permanent identifier or DOI in the main text and data-availability statement for reproducibility.
- [Figures] Figure captions for the example images and task illustrations could be expanded to explicitly note the script period and media type for each sample to aid reader interpretation.
Simulated Author's Rebuttal
We sincerely thank the referee for the thorough and constructive comments. We have carefully addressed each major point below and revised the manuscript to incorporate additional details and evidence where the original submission was insufficient.
Point-by-point responses
- Referee: [§3] §3 (Dataset Curation): The claim of a 'strictly balanced' collection of 2,800 images across the Seven Scripts and diverse physical media is not supported by any sampling protocol, per-script counts, media-type stratification, or quantitative uniformity metrics (e.g., distribution statistics or inter-expert agreement scores). Without these, it is impossible to verify that task difficulties are even and that visual perception is isolated from curation artifacts.
  Authors: We agree that the original manuscript did not provide sufficient quantitative support for the 'strictly balanced' claim. In the revised Section 3, we now include the full sampling protocol developed with domain experts, per-script image counts, media-type stratification tables, distribution statistics, and inter-expert agreement metrics (including Cohen's kappa; a minimal kappa sketch follows these responses). These additions enable verification that task difficulties are even across the Seven Scripts and that visual perception is isolated from curation artifacts. Revision: yes.
- Referee: [§4] §4 (Stage-Adaptive Annotation Paradigm): The paradigm is described at a high level but lacks concrete methodological details on how it accommodates drastic morphological and topological changes without introducing selection bias, such as explicit exclusion criteria, quantitative bias checks, or validation against the full evolutionary trajectory. This directly affects the reliability of the four tasks and the central claim of an 'authoritative platform'.
  Authors: We acknowledge that the original description of the Stage-Adaptive Annotation Paradigm was pitched at too high a level. In the revised Section 4, we have added concrete methodological details, including explicit exclusion criteria for morphological variants, quantitative bias checks (pre- and post-annotation distribution comparisons), and validation steps against the complete evolutionary trajectory. These changes improve transparency and support the reliability of the four tasks. Revision: yes.
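The §3 response above promises inter-expert agreement via Cohen's kappa. As a minimal sketch of that check, assuming a dual-annotated subset (the labels here are invented):

```python
# Minimal Cohen's kappa between two annotators' script labels.
# The labels are invented; with the released data this would run over
# whatever dual-annotated subset the revised Section 3 reports.
from sklearn.metrics import cohen_kappa_score

expert_a = ["oracle", "bronze", "seal", "seal", "clerical", "regular"]
expert_b = ["oracle", "bronze", "seal", "cursive", "clerical", "regular"]

kappa = cohen_kappa_score(expert_a, expert_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0.0 = chance level
```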
Circularity Check
No significant circularity in benchmark definition
Full rationale
The paper presents a new dataset and benchmark for cross-temporal Chinese character perception, consisting of data curation (2,800 images across Seven Scripts), a Stage-Adaptive Annotation Paradigm, and four task definitions. No mathematical derivations, equations, fitted parameters, or predictive models appear in the provided text. The central claims rest on explicit construction choices (expert curation, task isolation of visual perception) rather than any reduction to self-referential inputs, self-citations, or renamed prior results. The contribution is self-contained as a definitional benchmark without internal circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The Seven Scripts represent the complete evolutionary trajectory of Chinese characters, spanning thousands of years.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tagged: unclear)
  Unclear: relation between the paper passage and the cited Recognition theorem.
  Passage: "Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Shuai Bai, Yuxuan Cai, et al. Qwen3-VL technical report. arXiv preprint arXiv:2511.21631, 2025.
- [2] Shuai Bai, Keqin Chen, Xuejing Liu, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923, 2025.
- [3] Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, et al. OCRBench v2: An improved benchmark for evaluating large multimodal models on visual text localization and reasoning. arXiv preprint arXiv:2501.00321, 2024.
- [4] Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, et al. OmniDocBench: Benchmarking diverse PDF document parsing with comprehensive annotations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2025.
- [5] Mingxin Huang, Yongxin Shi, Dezhi Peng, Songxuan Lai, Zecheng Xie, and Lianwen Jin. OCR-Reasoning benchmark: Unveiling the true capabilities of MLLMs in complex text-rich image reasoning. arXiv preprint arXiv:2505.17163, 2025.
- [6] Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, et al. PaddleOCR-VL: Boosting multilingual document parsing via a 0.9B ultra-compact vision-language model. arXiv preprint arXiv:2510.14528, 2025.
- [7] Hunyuan Vision Team, Pengyuan Lyu, Xingyu Wan, Gengluo Li, Shangpin Peng, Weinong Wang, Liang Wu, Huawen Shen, Yu Zhou, Canhui Tang, et al. HunyuanOCR technical report. arXiv preprint arXiv:2511.19575, 2025.
- [8] Yang Liu, Jiahuan Cao, Hiuyi Cheng, Yongxin Shi, Kai Ding, and Lianwen Jin. MCS-Bench: A comprehensive benchmark for evaluating multimodal large language models in Chinese classical studies. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.
- [9] Haisu Guan, Jinpeng Wan, Yuliang Liu, Pengjie Wang, Kaile Zhang, Zhebin Kuang, Xinyu Wang, Xiang Bai, and Lianwen Jin. An open dataset for the evolution of oracle bone characters: EVOBC. arXiv preprint arXiv:2401.12467, 2024.
- [10] Qingju Jiao, Jingwen Wu, Qi Liu, Han Zhang, Zhan Zhang, Bang Li, Jing Xiong, Guoying Liu, and Yongge Liu. A graph-based evolutionary dataset for oracle bone characters from inscriptions to modern Chinese scripts. npj Heritage Science, 2025.
- [11] Mengru Wang, Yu Cai, Li Gao, Ruichen Feng, Qingju Jiao, Xiaolin Ma, and Yu Jia. Study on the evolution of Chinese characters based on few-shot learning: From oracle bone inscriptions to regular script. PLOS ONE, 2022.
- [12] Zijian Chen, Wenjun Zhang, Guangtao Zhai, et al. OBI-Bench: Can LMMs aid in study of ancient script on oracle bones? In International Conference on Learning Representations, 2025.
- [13] Zijian Chen, Wenjie Hua, Jinhao Li, Yucheng Zhu, Xiaona Zhi, Zhiji Liu, Tingzhu Chen, Wenjun Zhang, and Guangtao Zhai. Oracle bone inscriptions information processing: A comprehensive survey. npj Heritage Science, 2026. doi: 10.1038/s40494-026-02511-w.
- [14] Shuangping Huang, Haobin Wang, Yongge Liu, Xiaosong Shi, and Lianwen Jin. OBC306: A large-scale oracle bone character recognition dataset. In 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019.
- [15] Jing Li, Xueke Chi, Qiufeng Wang, Dahan Wang, Kaizhu Huang, Yongge Liu, and Cheng-Lin Liu. A comprehensive survey of oracle character recognition: Challenges, benchmarks, and beyond. arXiv preprint arXiv:2411.11354, 2024.
- [16] James P Philips and Nasseh Tabrizi. Historical document processing: A survey of techniques, tools, and trends. arXiv preprint arXiv:2002.06300, 2020.
- [17] Jiahuan Cao, Yang Liu, Peirong Zhang, Yongxin Shi, Kai Ding, and Lianwen Jin. TongGu-VL: Advancing visual-language understanding in Chinese classical studies through parameter sensitivity-guided instruction tuning. In Proceedings of the 33rd ACM International Conference on Multimedia, 2025.
- [18] Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, Xu Peng, Taisong Jin, Yongge Liu, Shengwei Han, Jing Yang, et al. OracleAgent: A multimodal reasoning agent for oracle bone script research. arXiv preprint arXiv:2510.26114, 2025.
- [19] Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Jiapeng Wang, Yifan Zhang, Zhuoma GongQue, Chong Sun, Yida Xu, Yadong Xue, et al. V-Oracle: Making progressive reasoning in deciphering oracle bones for you and me. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025.
- [20] Xinyu Yao, Mengdi Wang, Bo Chen, and Xiaobing Zhao. WenyanGPT: A large language model for classical Chinese tasks. arXiv preprint arXiv:2504.20609, 2025.
- [21] Bang Li, Qianwen Dai, Feng Gao, Weiye Zhu, Qiang Li, and Yongge Liu. HWOBC: A handwriting oracle bone character recognition database. Journal of Physics: Conference Series, 2020. doi: 10.1088/1742-6596/1651/1/012050.
- [22] Caoshuo Li, Zengmao Ding, Xiaobin Hu, Bang Li, Donghao Luo, AndyPian Wu, Chaoyang Wang, Chengjie Wang, Taisong Jin, Seven Shu, et al. OracleFusion: Assisting the decipherment of Oracle Bone Script with structurally constrained semantic typography. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025.
- [23] Pengjie Wang, Kaile Zhang, Xinyu Wang, Shengwei Han, Yongge Liu, Jinpeng Wan, Haisu Guan, Zhebin Kuang, Lianwen Jin, Xiang Bai, et al. An open dataset for oracle bone script recognition and decipherment. arXiv preprint arXiv:2401.15365, 2024.
- [24] Yue Xu, Fei Yin, Da-Han Wang, Xu-Yao Zhang, Zhaoxiang Zhang, and Cheng-Lin Liu. CASIA-AHCDB: A large-scale Chinese ancient handwritten characters database. In 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019.
- [25] Rowan K Flad. Divination and power: A multiregional view of the development of oracle bone divination in early China. Current Anthropology, 2008.
- [26] David N Keightley. Graphs, words, and meanings: Three reference works for Shang oracle-bone studies, with an excursus on the religious role of the day or sun. 1997.
- [27] Rui Guo. A research on an intelligent recognition tool for bronze inscriptions of the Shang and Zhou dynasties. Journal of Chinese Writing Systems, 2020.
- [28] Wenjie Hua, Hoang H Nguyen, and Gangyan Ge. BIRD: Bronze inscription restoration and dating. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025.
- [29] Rixin Zhou, Peiqiang Qiu, Qian Zhang, Chuntao Li, and Xi Yang. LadderMoE: Ladder-side mixture of experts adapters for bronze inscription recognition. arXiv preprint arXiv:2510.01651, 2025.
- [30] Veronica Fu. Bridging cultural divides: Metadata and the seal collection in a western context. In Understanding and Utilizing Informal Archives. IGI Global Scientific Publishing, 2026.
- [31] Yun Ou, Zhen-Jie Zhou, Di-Wen Kang, Pan Zhou, and Xue-Wei Liu. Qin seal script character recognition with fuzzy and incomplete information. Baghdad Science Journal, 2024.
- [32] Wenhui Zhou, Jinyu Liu, Jiefeng Li, Jiyi Li, Lili Lin, Fumiyo Fukumoto, and Guojun Dai. Style-independent radical sequence learning for zero-shot recognition of small seal script. Journal of the Franklin Institute, 2023.
- [33] Liu Guoqing, Hao Changning, Yan Jingbo, Dong Jing, Zhao Zuolong, and Hao Lujia. Stroke extraction algorithm of clerical script in Han dynasty based on contour: Take "Stele of Cao Quan" as an example. Mobile Information Systems, 2022.
- [34] Yu Lei, Tianzhao Zhou, and Yuankui Ma. Research on efficient calligraphy image classification based on attention enhancement. Mathematics, 2025.
- [35] Juan Wu. Han dynasty portrait image feature extraction and cloud computing-supported symbolic interpretation: A new approach to cultural heritage digitalization. Scalable Computing: Practice and Experience, 2024.
- [36] Xuanwei Peng. Stroke systems in Chinese characters: A systemic functional perspective on simplified regular script. Semiotica, 2017.
- [37] Hailin Yang, Lianwen Jin, Weiguo Huang, Zhaoyang Yang, Songxuan Lai, and Jifeng Sun. Dense and tight detection of Chinese characters in historical documents: Datasets and a recognition guided detector. IEEE Access, 2018.
- [38] Wei Zhang. The advantages and disadvantages of regular script in the study of calligraphy. In 2nd International Conference on Language, Art and Cultural Exchange (ICLACE 2021), 2021.
- [39] Xuanhong Wang, Cong Li, Zengguo Sun, and Luying Hui. RS-GAN: Unsupervised running script font generation via disentangled representation learning and contextual transformer. Pattern Analysis and Applications, 2025.
- [40] Xiao Qin, Jianhui Jiang, Wei Fan, and Changan Yuan. Chinese cursive character detection method. The Journal of Engineering, 2020.
- [41] Jung Liang, Wen-Hung Liao, and Yi-Chieh Wu. Toward automatic recognition of cursive Chinese calligraphy: An open dataset for cursive Chinese calligraphy text. In 2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM), 2020.
- [42] Yao Wu, Jie Jiang, and Yi Li. A method of Chinese characters changing from regular script to semi-cursive script described by track and point set. In 2018 International Joint Conference on Information, Media and Engineering (ICIME), 2018.
- [43] Jia Chen and Kwong Lum. Unconstrained freehand cursive script: A revolution in Chinese calligraphic art. International Journal of Politics, Culture, and Society, 1995.
- [44] Adele Schlombs. Huai-su and the Beginnings of Wild Cursive Script in Chinese Calligraphy. Franz Steiner Verlag, 1998.
- [45] Zhu Lei Gang, Loy Chee Luen, and Lee Keok Cheong. The aesthetic structure of cursive script. International Journal of Academic Research in Business and Social Sciences, 2023.
- [46] Mengru Wang, Yu Cai, Li Gao, Ruichen Feng, Qingju Jiao, Xiaolin Ma, and Yu Jia. Study on the evolution of Chinese characters based on few-shot learning: From oracle bone inscriptions to regular script. PLOS ONE, 2022. doi: 10.1371/journal.pone.0272974.
- [47] Gengluo Li, Pengyuan Lyu, Chengquan Zhang, Huawen Shen, Liang Wu, Xingyu Wan, Gangyan Zeng, Han Hu, Can Ma, and Yu Zhou. Towards real-world document parsing via realistic scene synthesis and document-aware training, 2026.
- [48] Gengluo Li, Chengquan Zhang, Yupu Liang, Huawen Shen, Yaping Zhang, Pengyuan Lyu, Weinong Wang, Xingyu Wan, Gangyan Zeng, Han Hu, Can Ma, and Yu Zhou. MMTIT-Bench: A multilingual and multi-scenario benchmark with cognition-perception-reasoning guided text-image machine translation, 2026.
- [49] Shangpin Peng, Senqiao Yang, Li Jiang, and Zhuotao Tian. Mitigating object hallucinations via sentence-level early intervention. In Proceedings of the IEEE International Conference on Computer Vision, 2025.
- [50] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 2020.
- [51] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
- [52] Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, et al. OpenAI GPT-5 system card. arXiv preprint arXiv:2601.03267, 2025.
- [53] Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, et al. Qwen2-VL: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191, 2024.
- [54] Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond, 2023.
- [55] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in Neural Information Processing Systems, 2023.
- [56] Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
- [57] Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge, 2024.
- [58] Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven Hoi. InstructBLIP: Towards general-purpose vision-language models with instruction tuning, 2023.
- [59] Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, and Min Zhang. Uni-DPO: A unified paradigm for dynamic preference optimization of LLMs. arXiv preprint arXiv:2506.10054, 2025.
- [60]
- [61] Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023.
- [62] Wenjin Hou, Shangpin Peng, Weinong Wang, Zheng Ruan, Yue Zhang, Zhenglin Zhou, Mingqi Gao, Yifei Chen, Kaiqi Wang, Hongming Yang, et al. Uni-OPD: Unifying on-policy distillation with a dual-perspective recipe. arXiv preprint arXiv:2605.03677, 2026.
- [63] Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5.
- [64] Yongxin Shi, Chongyu Liu, Dezhi Peng, Cheng Jian, Jiarong Huang, and Lianwen Jin. M5HisDoc: A large-scale multi-style Chinese historical document analysis benchmark. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.
- [65] Zhikai Hu, Yiu-ming Cheung, Yonggang Zhang, Peiying Zhang, and Pui-ling Tang. Component-level oracle bone inscription retrieval. In Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024.
- [66] Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, et al. Oracle bone inscriptions multi-modal dataset. arXiv preprint arXiv:2407.03900, 2024.
- [67] Yongxin Shi, Dezhi Peng, Yuyi Zhang, Jiahuan Cao, and Lianwen Jin. A large-scale dataset for Chinese historical document recognition and analysis. Scientific Data, 2025.
- [68] Zijian Chen, Wenjie Hua, Jinhao Li, Lirong Deng, Fan Du, Tingzhu Chen, and Guangtao Zhai. PictOBI-20k: Unveiling large multimodal models in visual decipherment for pictographic oracle bone characters. In ICASSP 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026.
- [69] Rui Song, Lida Shi, Ruihua Qi, Yingji Li, and Hao Xu. Enhancing multimodal large language models for ancient Chinese character evolution analysis via glyph-driven fine-tuning. arXiv preprint arXiv:2604.11299, 2026.
- [70] Vladimir I Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 1966.
- [71] Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. InternVL3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency. arXiv preprint arXiv:2508.18265, 2025.
- [72] Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295, 2024.
- [73] Tianyu Yu, Zefan Wang, Chongyi Wang, Fuwei Huang, Wenshuo Ma, Zhihui He, Tianchi Cai, Weize Chen, Yuxiang Huang, Yuanqian Zhao, Bokai Xu, Junbo Cui, Yingjing Xu, Liqing Ruan, Luoyuan Zhang, Hanyu Liu, Jingkun Tang, Hongyuan Liu, Qining Guo, Wenhao Hu, Bingxiang He, Jie Zhou, Jie Cai, Ji Qi, Zonghao Guo, Chi Chen, Guoyang Zeng, Yuxuan Li, Ganqu Cui, Ning D...
- [74] Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, et al. Molmo and PixMo: Open weights and open data for state-of-the-art vision-language models. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.
- [75] Shiyin Lu, Yang Li, Yu Xia, Yuwei Hu, Shanshan Zhao, Yanqing Ma, Zhichao Wei, Yinglun Li, Lunhao Duan, Jianshan Zhao, Yuxuan Han, Haijun Li, Wanying Chen, Junke Tang, Chengkun Hou, Zhixing Du, Tianli Zhou, Wenjie Zhang, Huping Ding, Jiahe Li, Wen Li, Gui Hu, Yiliang Gu, Siran Yang, Jiamang Wang, Hailong Sun, Yibo Wang, Hui Sun, Jinlong Huang, Yuping He, S... Ovis2.5 technical report, 2025.
- [76] V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Bin Chen, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiale Zhu, Jiali Chen, J...
- [77] Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi K2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276, 2026.
- [78] ByteDance Seed. Seed1.8 Model Card: Towards generalized real-world agency, 2025. URL https://github.com/ByteDance-Seed/Seed-1.8/blob/main/Seed-1.8-Modelcard.pdf.
- [79] ByteDance Seed Team. Seed2.0 Model Card: Towards intelligence frontier for real-world complexity, February 2026. URL https://github.com/ByteDance-Seed/Seed2.0.
- [80] Xiaomi Corporation. Xiaomi MiMo-V2-Omni: See, hear, act in the agentic era. https://mimo.xiaomi.com/mimo-v2-omni, 2026.