Beyond Single Character: Evaluating MLLMs for Sentence-Level Oracle Bone Inscription Understanding

Guangtao Zhai; Tingzhu Chen; Zijian Chen; Ziqi Li

arxiv: 2606.31169 · v1 · pith:XKS5LXHVnew · submitted 2026-06-30 · 💻 cs.CV

Beyond Single Character: Evaluating MLLMs for Sentence-Level Oracle Bone Inscription Understanding

Ziqi Li , Zijian Chen , Tingzhu Chen , Guangtao Zhai This is my paper

Pith reviewed 2026-07-01 06:26 UTC · model grok-4.3

classification 💻 cs.CV

keywords oracle bone inscriptionmultimodal large language modelssentence-level understandingbenchmarkglyph substitutioncontextual reasoningvisual perceptionsemantic matching

0 comments

The pith

Current MLLMs remain dependent on character-level recognition when attempting sentence-level understanding of oracle bone inscriptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the S-OBI benchmark to test whether multimodal large language models can handle full sentences in oracle bone inscriptions instead of isolated characters. It builds clean test instances by replacing characters in 95 expert-verified rubbings with standardized glyphs while keeping the original layout and meaning intact. Three tasks—semantic matching, slot extraction, and contextual reasoning—produce 695 question-answer pairs that expose how models misread visible parts of a sentence and then fail on masked characters. The results indicate that sentence-level processing in existing models still routes through single-character identification rather than true contextual coherence. The benchmark is positioned as a diagnostic tool to check progress toward structured inscription understanding.

Core claim

Experiments reveal the inferiority of contemporary MLLMs on sentence-level OBI understanding. In particular, visual perception errors in unmasked regions propagate through the reasoning chain, leading to erroneous predictions for masked characters, which indicates that sentence-level OBI understanding in current models remains strongly dependent on character-level recognition.

What carries the argument

The S-OBI benchmark, which creates sentence-level OBI instances by glyph substitution and composition on 95 original rubbings to isolate contextual understanding from low-level visual noise.

If this is right

Models that succeed on S-OBI would need to integrate long-form textual coherence and contextual dependencies rather than processing characters in isolation.
The three tasks demonstrate that current errors arise specifically from propagation of perception mistakes across the sentence.
S-OBI separates evaluation of inscription-level understanding from the effects of noisy rubbings.
The 695 QA pairs supply concrete test cases for measuring whether models can move past single-character reliance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The synthesis approach could be applied to other damaged historical scripts to create controlled tests of contextual reasoning.
If dependence on character recognition persists, real-world use on incomplete archaeological finds would remain limited.
Architectures that explicitly model inscription layout or semantic slots might reduce the observed error propagation.
Improved performance on S-OBI could indicate readiness for broader ancient-text interpretation tasks beyond oracle bones.

Load-bearing premise

Replacing characters in original rubbings with clean glyph samples while preserving overall structure and semantics creates instances that test sentence-level understanding rather than low-level visual distortions.

What would settle it

An MLLM that correctly infers masked characters in S-OBI sentences despite making clear visual errors on the surrounding unmasked characters would falsify the claim of strong dependence on character-level recognition.

Figures

Figures reproduced from arXiv: 2606.31169 by Guangtao Zhai, Tingzhu Chen, Zijian Chen, Ziqi Li.

**Figure 1.** Figure 1: Examples of OBI appearances covered in constructing the S-OBI. The red and green rectangles are paired original sentential OBI rubbings and the corresponding synthetic OBI sentences, respectively. 1 Introduction Oracle bone inscriptions (OBIs) are among the earliest mature writing systems in ancient China. They were mainly carved on turtle plastrons and animal bones in the late Shang dynasty and provide im… view at source ↗

**Figure 2.** Figure 2: The construction pipeline of S-OBI. We reconstruct the sentence-level OBI images based on the expert-corrected, punctuated, interpreted multi-character OBI images. Three task tiers are designed to evaluate MLLMs through semantic matching, slot extraction, and order-sensitive contextual understanding. LMM-based OBI decipherment systems [21,14]. Nevertheless, current LMMs remain fragile on noisy, original, … view at source ↗

**Figure 3.** Figure 3: Pearson’s correlation between different tasks. 4.3 Diagnostic Results Models capture coarse order more easily than sentence-level meaning. Tab. 2 reports an average score of 79.6% for semantic-unit order selection, but only 22.6% for meaning matching. This contrast is visible even in strong runs. For example, Qwen3.6-Plus-thinking reaches 89.2% on order selection but 32.6% on meaning matching, while DeepSe… view at source ↗

**Figure 4.** Figure 4: Case studies on semantic matching and contextual reasoning tasks across different difficulties. Correlation analysis [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

read the original abstract

Existing AI-assisted oracle bone inscription (OBI) visual recognition and understanding studies mainly focus on character-level, ignoring the long-form textual coherence and contextual dependencies embedded in complete divination charges. Recently, the powerful visual perception capabilities of multimodal large language models (MLLMs) have opened new possibilities for OBI information processing. In this work, we introduce S-OBI, a novel benchmark for evaluating MLLMs in Sentence-level OBI understanding. Instead of using noisy and incomplete rubbings as the visual input, S-OBI synthesizes clear and standardized sentence-level OBI instances through glyph substitution and composition. According to 95 original rubbings with translations that have been identified, corrected, and verified by experts, we replace characters in the original rubbings with corresponding clean glyph samples sourced from existing OBI datasets while preserving the overall inscriptional structure and semantic organization. This mitigates the influence of low-level distortions and enables a more focused evaluation of sentence-level OBI understanding. Based on this, we design semantic matching, semantic slot extraction, and contextual reasoning tasks and obtain 695 question-answer pairs. Experiments reveal the inferiority of contemporary MLLMs on sentence-level OBI understanding. In particular, visual perception errors in unmasked regions propagate through the reasoning chain, leading to erroneous predictions for masked characters, which indicates that sentence-level OBI understanding in current models remains strongly dependent on character-level recognition. Overall, S-OBI provides a diagnostic benchmark for evaluating whether MLLMs can move beyond isolated character recognition toward structured inscription-level understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds a new S-OBI benchmark via glyph substitution on verified rubbings to test sentence-level OBI understanding in MLLMs, but the abstract gives no experimental numbers or controls so the propagation claim is hard to evaluate.

read the letter

The main thing to know is that this paper creates S-OBI, a benchmark with 695 QA pairs across semantic matching, slot extraction, and contextual reasoning tasks. They start from 95 expert-verified rubbings, replace characters with clean glyphs from existing datasets while keeping the layout and meaning, and use that to argue that current MLLMs still depend on character-level recognition because perception mistakes in unmasked areas spread to wrong answers on masked ones.

The synthesis step is the clearest piece of new work. It directly targets the usual problem of noisy rubbings so the evaluation can focus on whether models handle sentence context instead of just isolated glyphs. That construction is described clearly enough in the abstract to show the intent.

The soft spot is the complete lack of reported results. No models are named, no metrics or baselines appear, and there is no description of how they measured or confirmed the error propagation. Without those details the central observation stays untested. The stress-test concern also lands: even after substitution, style or scale differences between the inserted glyphs and what the models saw in training could still cause the unmasked-region errors, which would make the finding about sentence-level dependence weaker than claimed. The abstract asserts the substitution removes low-level distortions but offers no check that the resulting unmasked characters are actually unambiguous to the tested models.

This is narrow work aimed at digital humanities researchers who work on oracle bone inscriptions or at groups that build domain-specific MLLM benchmarks. It does not claim broader impact on general multimodal reasoning. The idea of a diagnostic benchmark is reasonable, but the current write-up is too thin on evidence to stand on its own. I would send it to peer review so the authors can supply the missing experimental section and address whether the synthesis truly separates visual perception from contextual reasoning.

Referee Report

1 major / 0 minor

Summary. The paper introduces the S-OBI benchmark for evaluating multimodal large language models (MLLMs) on sentence-level oracle bone inscription (OBI) understanding. It constructs 695 QA pairs from 95 expert-verified rubbings by synthesizing clear instances via glyph substitution (replacing characters with clean samples from existing datasets while preserving structure and semantics), then evaluates on semantic matching, semantic slot extraction, and contextual reasoning tasks. The central experimental finding is that contemporary MLLMs perform poorly, with visual perception errors on unmasked regions propagating to incorrect predictions on masked characters, indicating that sentence-level OBI understanding remains strongly dependent on character-level recognition rather than contextual reasoning.

Significance. If the results hold after addressing the synthesis validation, S-OBI would serve as a useful diagnostic benchmark for probing whether MLLMs can achieve structured, inscription-level understanding beyond isolated character recognition in a specialized historical domain. The expert verification of the 95 source rubbings and the explicit design to isolate sentence-level factors (by mitigating low-level distortions) are concrete strengths. The work is purely empirical with no machine-checked proofs, reproducible code releases, or parameter-free derivations noted.

major comments (1)

[S-OBI Construction (glyph substitution procedure)] The central claim that error propagation demonstrates dependence on character-level recognition (rather than sentence-level deficits) rests on the assumption that the glyph-substitution synthesis produces visually unambiguous unmasked glyphs. The construction procedure (replacing characters in the 95 rubbings with clean samples while preserving overall structure) provides no quantitative evidence—such as human clarity ratings, style-consistency metrics, or comparison against MLLM training distributions—that the composed images are free of rendering artifacts, scale inconsistencies, or domain mismatches. This is load-bearing for the interpretation of the experimental outcomes.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the single major comment below.

read point-by-point responses

Referee: [S-OBI Construction (glyph substitution procedure)] The central claim that error propagation demonstrates dependence on character-level recognition (rather than sentence-level deficits) rests on the assumption that the glyph-substitution synthesis produces visually unambiguous unmasked glyphs. The construction procedure (replacing characters in the 95 rubbings with clean samples while preserving overall structure) provides no quantitative evidence—such as human clarity ratings, style-consistency metrics, or comparison against MLLM training distributions—that the composed images are free of rendering artifacts, scale inconsistencies, or domain mismatches. This is load-bearing for the interpretation of the experimental outcomes.

Authors: We agree that the original submission lacks quantitative validation of the synthesized images and that this is a substantive gap for the central interpretation. The construction uses clean glyphs from existing datasets and preserves structure from expert-verified rubbings, but no explicit metrics (clarity ratings, artifact checks, or distribution comparisons) were reported. In the revised manuscript we will add an appendix containing: (1) expert clarity ratings (1-5 scale) on a random sample of 50 synthesized sentences, (2) simple style-consistency statistics (stroke-width variance across substituted glyphs), and (3) a brief comparison of glyph sources against common OBI training distributions. We will also release the synthesis code. These additions directly address the load-bearing assumption. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical benchmark with no derivation chain

full rationale

The paper introduces an empirical benchmark (S-OBI) via glyph substitution on 95 expert-verified rubbings to create 695 QA pairs for three tasks. No equations, fitted parameters, predictions, or uniqueness theorems appear. The central claim rests on observed MLLM performance differences rather than any self-referential reduction or self-citation load-bearing step. The synthesis method is a one-time data construction step, not a derivation that collapses to its inputs by construction. This matches the default expectation for non-circular empirical evaluation work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Benchmark creation paper; no free parameters, mathematical axioms, or new postulated entities are introduced or required by the central claim in the abstract.

pith-pipeline@v0.9.1-grok · 5823 in / 1172 out tokens · 26338 ms · 2026-07-01T06:26:57.584486+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 10 canonical work pages · 4 internal anchors

[1]

Qwen2.5-VL Technical Report

Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., et al.: Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Chen, Z., Chen, T., Zhang, W., Zhai, G.: OBI-Bench: Can LMMs aid in the study of ancient script on oracle bones? In: Proceedings of the International Conference on Learning Representations (2025)

2025
[3]

arXiv preprint arXiv:2509.05773 (2025)

Chen, Z., Hua, W., Li, J., Deng, L., Du, F., Chen, T., Zhai, G.: PictOBI-20k: Unveiling large multimodal models in visual decipherment for pictographic oracle bone characters. arXiv preprint arXiv:2509.05773 (2025)

work page arXiv 2025
[4]

npj Heritage Science14, 220 (2026)

Chen, Z., Hua, W., Li, J., Zhu, Y., Zhi, X., Liu, Z., Chen, T., Zhang, W., Zhai, G.: Oracle bone inscriptions information processing: A comprehensive survey. npj Heritage Science14, 220 (2026). https://doi.org/10.1038/s40494-026-02511-w

work page doi:10.1038/s40494-026-02511-w 2026
[5]

arXiv preprint arXiv:2507.00490 (2025) 12 Li et al

Chen, Z., Tian, Y., Sun, Y., Sun, W., Zhang, Z., Lin, W., Zhai, G., Zhang, W.: Just noticeable difference for large multimodal models. arXiv preprint arXiv:2507.00490 (2025) 12 Li et al

work page arXiv 2025
[6]

https://huggingface.co/deepseek-ai/ DeepSeek-V4-Pro (2026), accessed 22 June 2026

DeepSeek-AI: DeepSeek-V4-Pro model card. https://huggingface.co/deepseek-ai/ DeepSeek-V4-Pro (2026), accessed 22 June 2026

2026
[7]

International Journal of Digital Humanities5(2), 65–79 (2023)

Fujikawa, Y., Li, H., Yue, X., Aravinda, C., Prabhu, G.A., Meng, L.: Recognition of oracle bone inscriptions by using two deep learning models. International Journal of Digital Humanities5(2), 65–79 (2023)

2023
[8]

https://ai.google.dev/gemini-api/docs/ models (2026), accessed 22 June 2026

Google: Gemini model documentation. https://ai.google.dev/gemini-api/docs/ models (2026), accessed 22 June 2026

2026
[9]

An open dataset for the evolution of oracle bone characters: Evobc

Guan, H., Wan, J., Liu, Y., Wang, P., Zhang, K., Kuang, Z., Wang, X., Bai, X., Jin, L.: An open dataset for the evolution of oracle bone characters: EVOBC. arXiv preprint arXiv:2401.12467 (2024)

work page arXiv 2024
[10]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Huang, S., Price, B., Fan, Y., Morse, B.: Beyond single images: A comprehensive benchmark for album-level vision-language understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 38564– 38573 (2026)

2026
[11]

In: Proceedings of the International Conference on Document Analysis and Recognition

Huang, S., Wang, H., Liu, Y., Shi, X., Jin, L.: OBC306: A large-scale oracle bone character recognition dataset. In: Proceedings of the International Conference on Document Analysis and Recognition. pp. 681–688 (2019)

2019
[12]

arXiv preprint arXiv:2411.17837 (2024)

Jiang, H., Pan, Y., Chen, J., Liu, Z., Zhou, Y., Shu, P., Li, Y., Zhao, H., Mihm, S., Howe, L.C., Liu, T.: OracleSage: Towards unified visual-linguistic understand- ing of oracle bone scripts through cross-modal knowledge fusion. arXiv preprint arXiv:2411.17837 (2024)

work page arXiv 2024
[13]

(ed.): Sources of Shang History: The Oracle-Bone Inscriptions of Bronze Age China

Keightley, D.N. (ed.): Sources of Shang History: The Oracle-Bone Inscriptions of Bronze Age China. University of California Press, Berkeley (1978)

1978
[14]

arXiv preprint arXiv:2510.26114 (2025)

Li, C., Ding, Z., Hu, X., Li, B., Luo, D., Peng, X., Jin, T., Liu, Y., Han, S., Yang, J., et al.: Oracleagent: A multimodal reasoning agent for oracle bone script research. arXiv preprint arXiv:2510.26114 (2025)

work page arXiv 2025
[15]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Li, C., Ding, Z., Hu, X., Li, B., Luo, D., Wu, A., Wang, C., Wang, C., Jin, T., Shu, S., et al.: Oraclefusion: Assisting the decipherment of oracle bone script with structurally constrained semantic typography. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19893–19902 (2025)

2025
[16]

Pattern Recognition169, 111824 (2026)

Li, J., Chi, X., Wang, Q., Huang, K., Wang, D.H., Liu, Y., Liu, C.L.: A compre- hensive survey of oracle character recognition: Challenges, datasets, methodology, and beyond. Pattern Recognition169, 111824 (2026)

2026
[17]

Displays89, 103059 (2025)

Li, J., Chen, Z., Chen, T., Liu, Z., Wang, C.: Obiformer: A fast attentive denoising framework for oracle bone inscriptions. Displays89, 103059 (2025)

2025
[18]

In: Pro- ceedings of the 33rd ACM International Conference on Multimedia

Li, J., Chen, Z., Jiang, R., Chen, T., Wang, C., Zhai, G.: Mitigating long-tail distribution in oracle bone inscriptions: Dataset, model, and benchmark. In: Pro- ceedings of the 33rd ACM International Conference on Multimedia. pp. 7729–7738 (2025)

2025
[19]

https://doi.org/10.21203/ rs.3.rs-9733608/v1

Li, Z., Chen, Z., Yang, Z., Liu, Z., Zhai, G., Chen, T.: Roots: Recognizing oracle bone inscriptions via an organized tree structure (2026). https://doi.org/10.21203/ rs.3.rs-9733608/v1

2026
[20]

Journal of image and graphics8(4), 114–119 (2020)

Liu, M., Liu, G., Liu, Y., Jiao, Q.: Oracle bone inscriptions recognition based on deep convolutional neural network. Journal of image and graphics8(4), 114–119 (2020)

2020
[21]

The Innovation (2026)

Liu, Y., Guan, H., Wang, P., Wang, X., Wan, J., Zhang, K., Zheng, H., Liu, X., Kuang, Z., Yang, H., et al.: Alphaoracle: Oracle bone script decipherment via human-workflow-inspired deep learning. The Innovation (2026)

2026
[22]

Guangxi Education Press (11 2005) Title Suppressed Due to Excessive Length 13

Liu, Z., Zhang, D., Takashima, K.i., Zang, K., et al.: A Classified Collection of Oracle Bone Inscriptions with Modern Chinese and English Translations. Guangxi Education Press (11 2005) Title Suppressed Due to Excessive Length 13

2005
[23]

GPT-4o System Card

OpenAI: GPT-4o system card. arXiv preprint arXiv:2410.21276 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[24]

https://openai.com/index/introducing-gpt-5-2/ (2026), accessed 22 June 2026

OpenAI: Introducing GPT-5.2. https://openai.com/index/introducing-gpt-5-2/ (2026), accessed 22 June 2026

2026
[25]

Qwen3 Technical Report

Qwen Team: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[26]

https://huggingface.co/Qwen/Qwen3.5-9B (2026), accessed 22 June 2026

Qwen Team: Qwen3.5-9B model card. https://huggingface.co/Qwen/Qwen3.5-9B (2026), accessed 22 June 2026

2026
[27]

https://qwen.ai/models (2026), ac- cessed 22 June 2026

Qwen Team: Qwen3.6 model documentation. https://qwen.ai/models (2026), ac- cessed 22 June 2026

2026
[28]

Scientific Data11, 87 (2024)

Wang, M., Deng, W.: A dataset of oracle characters for benchmarking machine learning algorithms. Scientific Data11, 87 (2024)

2024
[29]

Scientific Data11, 976 (2024)

Wang, P., Zhang, K., Wang, X., Han, S., Liu, Y., Wan, J., Guan, H., Kuang, Z., Jin, L., Bai, X., Liu, Y.: An open dataset for oracle bone script recognition and decipherment. Scientific Data11, 976 (2024)

2024
[30]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Wang, W., Gao, Z., Gu, L., Pu, H., Cui, L., Wei, X., Liu, Z., Jing, L., Ye, S., Shao, J., et al.: InternVL3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency. arXiv preprint arXiv:2508.18265 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[31]

ACM Journal on Computing and Cultural Heritage15(4), 1–20 (2022)

Yue, X., Li, H., Fujikawa, Y., Meng, L.: Dynamic dataset augmentation for deep learning-based oracle bone inscriptions recognition. ACM Journal on Computing and Cultural Heritage15(4), 1–20 (2022)

2022
[32]

In: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Zhang, J., Li, R., Pang, H., Xia, D., Zhu, Z., Zhang, Q., Li, C., Yang, X.: Special- izing large models for oracle bone script interpretation via component-grounded multimodal knowledge augmentation. In: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 35221–35236 (2026)

2026
[33]

In: Findings of the Association for Computational Linguistics: ACL 2026

Zhou, Z., Shi, D., Shi, L., Song, R., Qiu, P., Diao, X., Xu, H.: Acse: An ancient character semantic-aware embedding for large language models. In: Findings of the Association for Computational Linguistics: ACL 2026. pp. 9000–9012 (2026)

2026

[1] [1]

Qwen2.5-VL Technical Report

Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., et al.: Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

Chen, Z., Chen, T., Zhang, W., Zhai, G.: OBI-Bench: Can LMMs aid in the study of ancient script on oracle bones? In: Proceedings of the International Conference on Learning Representations (2025)

2025

[3] [3]

arXiv preprint arXiv:2509.05773 (2025)

Chen, Z., Hua, W., Li, J., Deng, L., Du, F., Chen, T., Zhai, G.: PictOBI-20k: Unveiling large multimodal models in visual decipherment for pictographic oracle bone characters. arXiv preprint arXiv:2509.05773 (2025)

work page arXiv 2025

[4] [4]

npj Heritage Science14, 220 (2026)

Chen, Z., Hua, W., Li, J., Zhu, Y., Zhi, X., Liu, Z., Chen, T., Zhang, W., Zhai, G.: Oracle bone inscriptions information processing: A comprehensive survey. npj Heritage Science14, 220 (2026). https://doi.org/10.1038/s40494-026-02511-w

work page doi:10.1038/s40494-026-02511-w 2026

[5] [5]

arXiv preprint arXiv:2507.00490 (2025) 12 Li et al

Chen, Z., Tian, Y., Sun, Y., Sun, W., Zhang, Z., Lin, W., Zhai, G., Zhang, W.: Just noticeable difference for large multimodal models. arXiv preprint arXiv:2507.00490 (2025) 12 Li et al

work page arXiv 2025

[6] [6]

https://huggingface.co/deepseek-ai/ DeepSeek-V4-Pro (2026), accessed 22 June 2026

DeepSeek-AI: DeepSeek-V4-Pro model card. https://huggingface.co/deepseek-ai/ DeepSeek-V4-Pro (2026), accessed 22 June 2026

2026

[7] [7]

International Journal of Digital Humanities5(2), 65–79 (2023)

Fujikawa, Y., Li, H., Yue, X., Aravinda, C., Prabhu, G.A., Meng, L.: Recognition of oracle bone inscriptions by using two deep learning models. International Journal of Digital Humanities5(2), 65–79 (2023)

2023

[8] [8]

https://ai.google.dev/gemini-api/docs/ models (2026), accessed 22 June 2026

Google: Gemini model documentation. https://ai.google.dev/gemini-api/docs/ models (2026), accessed 22 June 2026

2026

[9] [9]

An open dataset for the evolution of oracle bone characters: Evobc

Guan, H., Wan, J., Liu, Y., Wang, P., Zhang, K., Kuang, Z., Wang, X., Bai, X., Jin, L.: An open dataset for the evolution of oracle bone characters: EVOBC. arXiv preprint arXiv:2401.12467 (2024)

work page arXiv 2024

[10] [10]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Huang, S., Price, B., Fan, Y., Morse, B.: Beyond single images: A comprehensive benchmark for album-level vision-language understanding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 38564– 38573 (2026)

2026

[11] [11]

In: Proceedings of the International Conference on Document Analysis and Recognition

Huang, S., Wang, H., Liu, Y., Shi, X., Jin, L.: OBC306: A large-scale oracle bone character recognition dataset. In: Proceedings of the International Conference on Document Analysis and Recognition. pp. 681–688 (2019)

2019

[12] [12]

arXiv preprint arXiv:2411.17837 (2024)

Jiang, H., Pan, Y., Chen, J., Liu, Z., Zhou, Y., Shu, P., Li, Y., Zhao, H., Mihm, S., Howe, L.C., Liu, T.: OracleSage: Towards unified visual-linguistic understand- ing of oracle bone scripts through cross-modal knowledge fusion. arXiv preprint arXiv:2411.17837 (2024)

work page arXiv 2024

[13] [13]

(ed.): Sources of Shang History: The Oracle-Bone Inscriptions of Bronze Age China

Keightley, D.N. (ed.): Sources of Shang History: The Oracle-Bone Inscriptions of Bronze Age China. University of California Press, Berkeley (1978)

1978

[14] [14]

arXiv preprint arXiv:2510.26114 (2025)

Li, C., Ding, Z., Hu, X., Li, B., Luo, D., Peng, X., Jin, T., Liu, Y., Han, S., Yang, J., et al.: Oracleagent: A multimodal reasoning agent for oracle bone script research. arXiv preprint arXiv:2510.26114 (2025)

work page arXiv 2025

[15] [15]

In: Proceedings of the IEEE/CVF International Conference on Computer Vision

Li, C., Ding, Z., Hu, X., Li, B., Luo, D., Wu, A., Wang, C., Wang, C., Jin, T., Shu, S., et al.: Oraclefusion: Assisting the decipherment of oracle bone script with structurally constrained semantic typography. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19893–19902 (2025)

2025

[16] [16]

Pattern Recognition169, 111824 (2026)

Li, J., Chi, X., Wang, Q., Huang, K., Wang, D.H., Liu, Y., Liu, C.L.: A compre- hensive survey of oracle character recognition: Challenges, datasets, methodology, and beyond. Pattern Recognition169, 111824 (2026)

2026

[17] [17]

Displays89, 103059 (2025)

Li, J., Chen, Z., Chen, T., Liu, Z., Wang, C.: Obiformer: A fast attentive denoising framework for oracle bone inscriptions. Displays89, 103059 (2025)

2025

[18] [18]

In: Pro- ceedings of the 33rd ACM International Conference on Multimedia

Li, J., Chen, Z., Jiang, R., Chen, T., Wang, C., Zhai, G.: Mitigating long-tail distribution in oracle bone inscriptions: Dataset, model, and benchmark. In: Pro- ceedings of the 33rd ACM International Conference on Multimedia. pp. 7729–7738 (2025)

2025

[19] [19]

https://doi.org/10.21203/ rs.3.rs-9733608/v1

Li, Z., Chen, Z., Yang, Z., Liu, Z., Zhai, G., Chen, T.: Roots: Recognizing oracle bone inscriptions via an organized tree structure (2026). https://doi.org/10.21203/ rs.3.rs-9733608/v1

2026

[20] [20]

Journal of image and graphics8(4), 114–119 (2020)

Liu, M., Liu, G., Liu, Y., Jiao, Q.: Oracle bone inscriptions recognition based on deep convolutional neural network. Journal of image and graphics8(4), 114–119 (2020)

2020

[21] [21]

The Innovation (2026)

Liu, Y., Guan, H., Wang, P., Wang, X., Wan, J., Zhang, K., Zheng, H., Liu, X., Kuang, Z., Yang, H., et al.: Alphaoracle: Oracle bone script decipherment via human-workflow-inspired deep learning. The Innovation (2026)

2026

[22] [22]

Guangxi Education Press (11 2005) Title Suppressed Due to Excessive Length 13

Liu, Z., Zhang, D., Takashima, K.i., Zang, K., et al.: A Classified Collection of Oracle Bone Inscriptions with Modern Chinese and English Translations. Guangxi Education Press (11 2005) Title Suppressed Due to Excessive Length 13

2005

[23] [23]

GPT-4o System Card

OpenAI: GPT-4o system card. arXiv preprint arXiv:2410.21276 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[24] [24]

https://openai.com/index/introducing-gpt-5-2/ (2026), accessed 22 June 2026

OpenAI: Introducing GPT-5.2. https://openai.com/index/introducing-gpt-5-2/ (2026), accessed 22 June 2026

2026

[25] [25]

Qwen3 Technical Report

Qwen Team: Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[26] [26]

https://huggingface.co/Qwen/Qwen3.5-9B (2026), accessed 22 June 2026

Qwen Team: Qwen3.5-9B model card. https://huggingface.co/Qwen/Qwen3.5-9B (2026), accessed 22 June 2026

2026

[27] [27]

https://qwen.ai/models (2026), ac- cessed 22 June 2026

Qwen Team: Qwen3.6 model documentation. https://qwen.ai/models (2026), ac- cessed 22 June 2026

2026

[28] [28]

Scientific Data11, 87 (2024)

Wang, M., Deng, W.: A dataset of oracle characters for benchmarking machine learning algorithms. Scientific Data11, 87 (2024)

2024

[29] [29]

Scientific Data11, 976 (2024)

Wang, P., Zhang, K., Wang, X., Han, S., Liu, Y., Wan, J., Guan, H., Kuang, Z., Jin, L., Bai, X., Liu, Y.: An open dataset for oracle bone script recognition and decipherment. Scientific Data11, 976 (2024)

2024

[30] [30]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Wang, W., Gao, Z., Gu, L., Pu, H., Cui, L., Wei, X., Liu, Z., Jing, L., Ye, S., Shao, J., et al.: InternVL3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency. arXiv preprint arXiv:2508.18265 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[31] [31]

ACM Journal on Computing and Cultural Heritage15(4), 1–20 (2022)

Yue, X., Li, H., Fujikawa, Y., Meng, L.: Dynamic dataset augmentation for deep learning-based oracle bone inscriptions recognition. ACM Journal on Computing and Cultural Heritage15(4), 1–20 (2022)

2022

[32] [32]

In: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Zhang, J., Li, R., Pang, H., Xia, D., Zhu, Z., Zhang, Q., Li, C., Yang, X.: Special- izing large models for oracle bone script interpretation via component-grounded multimodal knowledge augmentation. In: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 35221–35236 (2026)

2026

[33] [33]

In: Findings of the Association for Computational Linguistics: ACL 2026

Zhou, Z., Shi, D., Shi, L., Song, R., Qiu, P., Diao, X., Xu, H.: Acse: An ancient character semantic-aware embedding for large language models. In: Findings of the Association for Computational Linguistics: ACL 2026. pp. 9000–9012 (2026)

2026