arxiv: 2605.17214 · v1 · pith:2XH3MDO2new · submitted 2026-05-17 · 💻 cs.AI · cs.CL · cs.CV

ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

Pith reviewed 2026-05-20 13:46 UTC · model grok-4.3

classification 💻 cs.AI · cs.CL · cs.CV
keywords chemical reaction diagrams · large language models · visual understanding · semantic alignment · molecular graph recognition · functional group detection · OCRD-Bench

The pith

ChemVA framework bridges visual and semantic gaps so LLMs can accurately read chemical reaction diagrams and reason about them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current large language models struggle to interpret chemical reaction diagrams for two reasons: their vision components cannot reliably track the exact connections between atoms in crowded molecular structures, and standard text representations such as SMILES strings do not reliably trigger the models' stored chemical knowledge. The paper proposes the ChemVA framework to address both problems at once. A Visual Anchor first locates functional groups at both coarse and fine scales; the detected visual features are then converted into familiar entity names that better activate the model's reasoning capabilities. If the claim holds, it matters because many chemistry tasks depend on reading diagrams rather than text alone, and closing the gap would let smaller open models perform closer to large proprietary systems on realistic scientific problems. The authors also introduce a benchmark, OCRD-Bench, that tests the full pipeline from diagram recognition through reaction reasoning, and they report 92.0 percent structural recognition accuracy and gains of roughly 20 percentage points across nine LLMs on it.

Core claim

The central claim has two parts. First, a Visual Anchor mechanism with hybrid-granularity detection grounds functional groups in reaction diagrams, and a subsequent semantic alignment to entity names overcomes both the visual deficit (resolving topological connectivity) and the semantic disconnect (activating chemical knowledge). Second, on the OCRD-Bench dataset this yields 92.0 percent structural recognition accuracy and a performance increase of approximately 20 percentage points across nine different LLMs.

What carries the argument

The Visual Anchor mechanism, which detects functional groups at multiple levels of detail and translates the resulting visual features into entity names for semantic alignment with the language model.

If this is right

  • Open-weight LLMs reach performance levels comparable to proprietary systems on complex chemical reasoning tasks.
  • Structural recognition accuracy on molecular diagrams reaches 92 percent when the visual and semantic components are aligned.
  • A single framework can address both poor diagram parsing and weak knowledge activation without separate fine-tuning for each.
  • The new OCRD-Bench dataset supports end-to-end evaluation from visual recognition through multi-step reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar visual-anchoring techniques could be adapted to help models read other types of scientific diagrams that contain dense connectivity, such as reaction networks in biology.
  • If the semantic alignment step generalizes, it might reduce reliance on string-based inputs like SMILES for chemistry-related queries.
  • The reported gains across nine models suggest the method could serve as a lightweight addition to existing multimodal pipelines rather than requiring full retraining.

Load-bearing premise

The hybrid-granularity detection step correctly identifies strict topological connections in dense molecular graphs without introducing errors that affect later reasoning steps.

What would settle it

Running ChemVA on a collection of dense reaction diagrams and finding either no accuracy improvement or a drop relative to baseline vision-language models on the same reasoning tasks would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.17214 by Hao Yu, Huajun Chen, Jiangzhen Fu, Kehua Feng, Keyan Ding, Mingyang Rao, Zhihui Zhu.

Figure 1. Overview of limitations in LLM-based chemical reac…
Figure 2. Overview of the ChemVA framework. Stage 1: Reaction Diagram Parsing. (a) Reaction Diagram Deconstruction: FG-VLM…
Figure 3. Effectiveness of Semantic Activation. The stacked bar…
Figure 4. Superiority of FG-VLM over RxnScribe. The chart…

Original abstract

While Large Language Models (LLMs) have revolutionized scientific text processing, they exhibit a significant capability gap when interpreting chemical reaction diagrams. We identify two fundamental bottlenecks restricting current systems: a Visual Deficit, where generic vision encoders struggle to resolve the strict topological connectivity of dense molecular graphs, and a Semantic Disconnect, where standard linear strings, such as SMILES, fail to effectively activate the model's latent chemical reasoning. To bridge these gaps, we propose the Chemical Visual Activation (ChemVA) framework, which employs a Visual Anchor mechanism to ground functional groups via hybrid-granularity detection, followed by a semantic alignment approach that translates visual features into entity names to maximize knowledge activation in LLMs. We evaluate our approach on OCRD-Bench, a newly constructed dataset featuring dense visual-semantic contexts and comprehensive reaction coverage to evaluate the full spectrum from recognition to reasoning. Extensive experiments on OCRD-Bench demonstrate that ChemVA achieves 92.0% structural recognition accuracy. By bridging visual and semantic bottlenecks, our framework delivers a consistent performance gain of approximately 20 percentage points across 9 diverse LLMs, enabling open-weight models to rival proprietary SOTA systems in complex chemical reasoning tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the ChemVA framework to improve LLMs' interpretation of chemical reaction diagrams. It identifies a Visual Deficit in generic vision encoders for resolving topological connectivity in dense molecular graphs and a Semantic Disconnect with linear representations like SMILES. The proposed solution uses a Visual Anchor with hybrid-granularity detection to ground functional groups, followed by semantic alignment to entity names for better knowledge activation. A new OCRD-Bench dataset is introduced for evaluating recognition to reasoning. Experiments report 92.0% structural recognition accuracy and consistent ~20 percentage point gains across 9 LLMs, allowing open-weight models to approach proprietary SOTA performance.

Significance. If the empirical gains hold under rigorous verification, the work could meaningfully advance multimodal chemical reasoning by providing a practical way to ground LLMs in visual molecular structures. The new OCRD-Bench dataset with dense visual-semantic contexts is a useful contribution for the community. The approach of combining hybrid detection with semantic translation has potential applicability beyond chemistry to other diagram-heavy scientific domains.

major comments (2)
  1. [§4.2] §4 Experiments and §4.2 Results: The central claim of ~20pp gains across LLMs rests on the Visual Anchor correctly resolving strict topological connectivity (bonds, rings, attachments) without systematic errors in dense graphs. However, the reported 92.0% aggregate structural recognition accuracy on OCRD-Bench supplies no stratified error analysis by graph density or complexity, no direct comparison to ground-truth molecular graphs, and no ablation isolating the hybrid-granularity detection from generic vision encoders or prompting variations. If connectivity misdetections concentrate in the complex diagrams driving the reasoning tasks, they could inflate the observed LLM improvements.
  2. [§3.2] §3.2 Visual Anchor mechanism: The description of hybrid-granularity detection does not include quantitative validation (e.g., precision/recall on bond detection or ring closure in high-density subgraphs) or error propagation analysis to downstream reasoning steps. This is load-bearing for the claim that the mechanism bridges the visual deficit without introducing undetected topological errors.
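The stratified error analysis requested in major comment 1 could be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the record layout `(num_atoms, is_correct)`, the use of atom count as a proxy for graph density, and the bin edges are all assumptions introduced here.

```python
from collections import defaultdict


def bin_label(num_atoms, bin_edges):
    """Return a label such as '10-19' for the density bin containing num_atoms."""
    lower = 0
    for edge in bin_edges:
        if num_atoms < edge:
            return f"{lower}-{edge - 1}"
        lower = edge
    return f"{lower}+"


def stratified_accuracy(records, bin_edges=(10, 20, 40)):
    """Report recognition accuracy separately for each density bin.

    records: iterable of (num_atoms, is_correct) pairs, one per diagram.
    bin_edges: hypothetical cut points; the paper does not specify any.
    """
    totals = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for num_atoms, is_correct in records:
        counts = totals[bin_label(num_atoms, bin_edges)]
        counts[0] += int(is_correct)
        counts[1] += 1
    return {label: correct / total for label, (correct, total) in totals.items()}


# Toy data only: five diagrams with made-up sizes and outcomes.
records = [(5, True), (8, True), (15, True), (15, False), (50, False)]
print(stratified_accuracy(records))
```

A breakdown of this kind would show whether an aggregate figure hides a drop in accuracy on the densest diagrams, which is the scenario the referee is worried about.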
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from explicit dataset statistics (e.g., number of diagrams, average atoms/bonds per diagram, distribution of reaction types) to contextualize the 92% accuracy figure.
  2. [§3.3] Notation for the semantic alignment step could be clarified with a short pseudocode or equation showing how visual features map to entity names.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments, which help us strengthen the presentation of our work. We address each major comment below and will revise the manuscript to incorporate additional analyses where this improves rigor without altering the core claims.

Point-by-point responses
  1. Referee: [§4.2] §4 Experiments and §4.2 Results: The central claim of ~20pp gains across LLMs rests on the Visual Anchor correctly resolving strict topological connectivity (bonds, rings, attachments) without systematic errors in dense graphs. However, the reported 92.0% aggregate structural recognition accuracy on OCRD-Bench supplies no stratified error analysis by graph density or complexity, no direct comparison to ground-truth molecular graphs, and no ablation isolating the hybrid-granularity detection from generic vision encoders or prompting variations. If connectivity misdetections concentrate in the complex diagrams driving the reasoning tasks, they could inflate the observed LLM improvements.

    Authors: We appreciate the referee's emphasis on verifying that the reported gains stem from reliable topological resolution rather than undetected errors in complex cases. The manuscript presents the 92.0% structural recognition accuracy as an aggregate metric on OCRD-Bench together with consistent end-to-end gains across nine LLMs. To directly address the concern, we will revise §4 to include a stratified breakdown of recognition accuracy by graph density and complexity, an explicit comparison of extracted structures against ground-truth molecular graphs, and an ablation isolating the hybrid-granularity detection from standard vision encoders and prompting variations. These additions will clarify the source of the improvements.

    revision: yes

  2. Referee: [§3.2] §3.2 Visual Anchor mechanism: The description of hybrid-granularity detection does not include quantitative validation (e.g., precision/recall on bond detection or ring closure in high-density subgraphs) or error propagation analysis to downstream reasoning steps. This is load-bearing for the claim that the mechanism bridges the visual deficit without introducing undetected topological errors.

    Authors: We agree that quantitative validation of the hybrid-granularity detection would strengthen the mechanistic claims. The current §3.2 describes the design rationale for combining fine- and coarse-grained anchors to resolve connectivity. In the revision we will add precision and recall figures for bond detection and ring closure evaluated on high-density subgraphs drawn from OCRD-Bench, together with a concise error-propagation analysis tracing detection errors through semantic alignment to final reasoning accuracy. These results will be placed in §3.2 and the appendix.

    revision: yes
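The bond-level precision and recall promised in the second response could be computed along these lines. This is a sketch under a strong assumption: predicted and ground-truth atoms already share the same indices, which in practice would require a graph-matching step that is not shown. The `(atom_a, atom_b, order)` tuple layout is also an assumption made for illustration.

```python
def bond_key(atom_a, atom_b, order):
    """Normalize a bond so that (a, b) and (b, a) compare equal."""
    return (min(atom_a, atom_b), max(atom_a, atom_b), order)


def bond_precision_recall(predicted, truth):
    """Compare two bond lists of (atom_a, atom_b, order) tuples.

    A predicted bond counts as correct only if both endpoints and the bond
    order match a ground-truth bond. Atom indices are assumed to be aligned.
    """
    pred = {bond_key(*bond) for bond in predicted}
    gold = {bond_key(*bond) for bond in truth}
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example: one bond has the wrong order and one bond is spurious.
truth = [(0, 1, 1), (1, 2, 2), (2, 3, 1)]
predicted = [(1, 0, 1), (2, 1, 1), (2, 3, 1), (3, 4, 1)]
precision, recall = bond_precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Restricting `truth` and `predicted` to high-density subgraphs before calling the function would give the subset figures mentioned in the response.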

Circularity Check

0 steps flagged

No circularity; empirical framework evaluated on new benchmark

Full rationale

The paper describes an empirical approach: it identifies visual and semantic bottlenecks in LLMs for chemical diagrams, proposes the ChemVA framework using a Visual Anchor for hybrid-granularity detection and semantic alignment to entity names, constructs OCRD-Bench, and reports 92% structural recognition plus ~20pp gains across LLMs. No equations, parameter fits presented as predictions, self-citations as load-bearing premises, or uniqueness theorems appear in the text. All central claims reduce to experimental measurements on the introduced dataset rather than reducing by construction to prior inputs or definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract introduces no mathematical derivations, free parameters, or new physical entities; the framework is described purely in terms of existing vision and language components.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · 3 internal anchors

  1. [1]

    2025. Benchmarking MLLMs on Topological Reasoning of Chemical Reaction Diagrams. OpenReview/arXiv Submission 1142 (2025)

  2. [2]

    2025. Evaluating the Accuracy and Educational Potential of Generative AI Models in Pharmacy Education: A Comparative Analysis of ChatGPT and Gemini Across Bloom’s Taxonomy. Pharmacy (2025)

  3. [3]

    2025. MolecularIQ: Characterizing Chemical Reasoning Capabilities Through Symbolic Verification on Molecular Graphs. Under Review at ICLR 2026 (2025)

  4. [4]

    Nawaf Alampara, I. Mandal, P. Khetarpal, H. S. Grover, et al. 2024. MaCBench: A multimodal chemistry and materials science benchmark. In NeurIPS 2024 Workshop AI for Materials

  5. [5]

    Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. 2025. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923 (2025)

  6. [6]

    Daniil A Boiko, Robert MacKnight, Gabriel Gomes, et al. 2023. Autonomous chemical research with large language models. Nature 624, 7992 (2023), 570–578

  7. [7]

    Andres M Bran et al. 2025. Chemical reasoning in LLMs unlocks steerable synthesis planning and reaction mechanism elucidation. arXiv preprint arXiv:2503.08537 (2025)

  8. [8]

    Kexin Chen, Yuyang Du, Junyou Li, Hanqun Cao, Menghao Guo, Xilin Dang, Lanqing Li, Jiezhong Qiu, Guangyong Chen, and Pheng Ann Heng. 2025. ChemMiner: A Large Language Model Agent System for Chemical Literature Data Mining. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7595–7603

  9. [9]

    Djork-Arné Clevert, Tuan Le, Robin Winter, and Floriane Montanari. 2021. Img2Mol - Accurate Molecular Structure Estimation from Images. Chemical Science 12, 42 (2021), 14174–14181

  10. [10]

    Y Diao et al. 2023. MacFrag: Segmenting large-scale molecules to obtain diverse fragments. Bioinformatics 39 (2023)

  11. [11]

    Carl Edwards et al. 2022. Translation between Molecules and Natural Language. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

  12. [12]

    Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W Coley, and Regina Barzilay. 2024. OpenChemIE: An information extraction toolkit for chemistry literature. Journal of Chemical Information and Modeling 64, 14 (2024), 5521–5534

  13. [13]

    Fernand Gobet et al. 2001. Chunking mechanisms in human learning. Trends in Cognitive Sciences 5, 6 (2001), 236–243

  14. [14]

    Yu Gu and Zhi Liang. 2025. MolRAG: Unlocking the Power of LLMs for Molecular Property Prediction. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

  15. [15]

    Taicheng Guo, Kehan Guo, Bozhao Nan, Zhenwen Liang, Zhichun Guo, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. 2023. What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 36

  16. [16]

    Stephen R Heller, Alan McNaught, Igor Pletnev, Stephen Stein, and Dmitrii Tchekhovskoi. 2015. InChI, the IUPAC international chemical identifier. Journal of cheminformatics 7, 1 (2015), 23

  17. [17]

    Steven M Kearnes, Michael R Maser, Michael Wleklinski, Anton Kast, Abigail G Doyle, Spencer D Dreher, Joel M Hawkins, Klavs F Jensen, and Connor W Coley

  18. [18]

    The open reaction database. Journal of the American Chemical Society 143, 45 (2021), 18820–18826

  19. [19]

    Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. 2023. PubChem 2023 update. Nucleic Acids Research 51, D1 (2023), D1373–D1380

  20. [20]

    Greg Landrum et al. 2013. RDKit: Open-source cheminformatics. http://www.rdkit.org. Accessed: 2025-05-20

  21. [21]

    Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, et al. 2025. Chemvlm: Exploring the power of multimodal large language models in chemistry area. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 415–423

  22. [22]

    Junxian Li, Di Zhang, Dongzhan Zhou, et al. 2024. ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. arXiv preprint arXiv:2408.07246 (2024)

  23. [23]

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In European conference on computer vision. Springer, 740–755

  24. [24]

    Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering. In The 36th Conference on Neural Information Processing Systems (NeurIPS)

  25. [25]

    Lucas Morin, Martin Danelljan, Miguel I Agea, et al. 2023. MolGrapher: Graph-based Visual Recognition of Chemical Structures. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 19552–19561

  26. [26]

    Martijn Oldenhof, Adam Arany, Yves Moreau, and Jaak Simm. 2021. Self-labeling of fully mediating representations by graph alignment. In Benelux Conference on Artificial Intelligence. Springer, 46–65

  27. [27]

    Yujie Qian, Jiang Guo, Regina Barzilay, and Connor Coley. 2023. RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing. In Journal of Chemical Information and Modeling, Vol. 63. ACS Publications, 4030–4041

  28. [28]

    Yujie Qian, Jiang Guo, Zhengkai Tu, Zhening Li, Connor W. Coley, and Regina Barzilay. 2023. MolScribe: Robust Molecular Structure Recognition with Image-to-Graph Generation. Journal of Chemical Information and Modeling 63, 18 (2023), 5833–5844

  29. [29]

    Yujie Qian, Jiang Guo, Zhengkai Tu, Zhening Li, Connor W. Coley, and Regina Barzilay. 2024. RxnScribe: A Unified Framework for Chemical Reaction Diagram Parsing. arXiv preprint arXiv:2305.11845 (2024)

  30. [30]

    K Rajan, H O Brinkhaus, M I Agea, A Zielesny, and C Steinbeck. 2023. DECIMER.ai: An open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. Nature communications 14, 1 (2023), 5045

  31. [31]

    LG Research, Sehyun Chun, Jiye Kim, Ahra Jo, Yeonsik Jo, Seungyul Oh, Seungjun Lee, Kwangrok Ryoo, Jongmin Lee, Seung Hwan Kim, et al

  32. [32]

    MolMole: Molecule Mining from Scientific Literature. arXiv preprint arXiv:2505.03777 (2025)

  33. [33]

    Nicholas T Runcie et al. 2025. Can Reasoning Power Significantly Improve the Knowledge of Large Language Models for Chemistry? Journal of Chemical Information and Modeling (2025)

  34. [34]

    Nicholas T. Runcie, Charlotte M. Deane, and Fergus Imrie. 2025. ChemIQ: A Benchmark for Chemical Reasoning and Molecular Comprehension. arXiv preprint arXiv:2505.07735 (2025)

  35. [35]

    Christof Schütt, Kohulan Rajan, Achim Zielesny, and Christoph Steinbeck. 2020. DECIMER 1.0: Deep Learning for Chemical Image Recognition Using Transformers. Journal of Chemical Information and Modeling 60 (2020), 5359–5372. Also published in J. Cheminf. as separate work, please verify specific citation

  36. [36]

    Ayush Kumar Shah, Abhisek Dey, Leo Luo, Bryan Amador, Patrick Philippy, Ming Zhong, Siru Ouyang, David Mark Friday, David Bianchi, Nick Jackson, et al

  37. [37]

    Multimodal Search in Chemical Documents and Reactions. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 4030–4034

  38. [38]

    Joshua Staker, Kyle Marshall, Robert Abel, and Carolyn M McQuaw. 2019. Molecular structure extraction from documents using deep learning. Journal of chemical information and modeling 59, 3 (2019), 1017–1029

  39. [39]

    Jingchao Wang, Haote Yang, Jiang Wu, Yifan He, Xingjian Wei, Yinfan Wang, Chengjin Liu, Lingli Ge, Lijun Wu, Bin Wang, et al. 2025. GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition. arXiv preprint arXiv:2506.07553 (2025)

  40. [40]

    Xiaoxuan Wang, Yanqiao Zhu, Zemand Liu, et al. 2024. SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. In International Conference on Machine Learning (ICML)

  41. [41]

    Damian M Wilary and Jacqueline M Cole. 2021. ReactionDataExtractor: A tool for automated extraction of information from chemical reaction schemes. Journal of chemical information and modeling 61, 10 (2021), 4962–4974

  42. [42]

    Damian M Wilary and Jacqueline M Cole. 2023. ReactionDataExtractor 2.0: A deep learning approach for data extraction from chemical reaction schemes. Journal of Chemical Information and Modeling 63, 19 (2023), 6053–6067

  43. [43]

    Ruiling Xu, Yifan Zhang, Qingyun Wang, Carl Edwards, and Heng Ji. 2025. oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning. arXiv preprint arXiv:2510.07731

  44. [44]

    Zhaoning Yu, Xiangyang Xu, and Hongyang Gao. 2024. G2t-llm: Graph-to-tree text encoding for molecule generation with fine-tuned large language models. arXiv preprint arXiv:2410.02198 (2024)

  45. [45]

    Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, et al. 2024. Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9556–9567

  46. [46]

    Di Zhang, Wei Liu, et al. 2024. ChemLLM: A Chemical Large Language Model. arXiv preprint arXiv:2402.06852 (2024)

  47. [47]

    Shuai Zhang, Wei Liu, et al. 2024. Igniting the Power of Large Language Models for Chemistry: A Systematic Survey. arXiv preprint arXiv:2401.14656 (2024)

  48. [48]

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P Xing, et al. 2023. Judging llm-as-a-judge with mt-bench and chatbot arena. arXiv preprint arXiv:2306.05685 (2023)

  49. [49]

    Jiaxi Zhuang, Kangning Li, Jue Hou, Mingjun Xu, Zhifeng Gao, and Hengxing Cai. 2025. Doc2SAR: A Synergistic Framework for High-Fidelity Extraction of Structure-Activity Relationships from Scientific Documents. arXiv preprint arXiv:2506.21625 (2025).

    Preprint, 2026, Rao et al.

    A1 Implementation Details: Prompts and Instruction Tuning

    A1.1 Reaction Diagram ...

  50. [50]

    - In disconnected layouts (multiple independent reactions in one image), treat them as separate Reaction IDs.
    ### Output Format (STRICT) ###
    Respond with a valid JSON list of objects. Do not include markdown code blocks (```json).
    [ {"reaction_id": 1, "role": "Reactant", "bbox": [x1, y1, x2, y2]}, {"reaction_id": 1, "role": "Arrow", "bbox": [x1, y1, x2, y2...

  51. [51]

    Finally, determine the connectivity (Bonds) between all identified nodes.
    User Prompt
    ### Task Description ###
    Analyze the molecular image and generate a structured JSON representation containing supernodes, atoms, and bonds.
    ### Decomposition Constraints (CRITICAL) ###
    1. Visual Priority (Top-Down):
    - Rule: Prioritize the detection of Functional Group Pa...

  52. [52]

    Return the center coordinates [x, y] of these anchor atoms (normalized 0-1000).
    ### Output Format (STRICT) ###
    Respond with a JSON object containing a single list of coordinates.
    { "anchors": [ [x1, y1], [x2, y2] ] }
    ### Input Image ###
    {cropped_molecule_image}
    Now, locate the anchors for the {target_label} at {target_bbox}:
    A2 Data Construction Details ...

  53. [53]

    Priority Hierarchy Definition. We constructed a hierarchical dictionary where functional groups are ranked by heavy atom count, topological complexity, and semantic weight. High-complexity groups (e.g., Carboxyl −COOH, Amide −CONH2) are assigned higher matching priority than their constituents (e.g., Carbonyl C=O, Hydroxyl −OH)

  54. [54]

    Recursive Matching with Exclusivity. For each SMILES string, we perform recursive substructure matching using RDKit [19]. Crucially, we enforce an Atom-wise Exclusivity Constraint: once an atom is assigned to a high-priority "super-node" (e.g., the Carbon in −COOH), it is locked and explicitly excluded from subsequent scans. This prevents the redundant la...

  55. [55]

    Residual Atom Handling. After the greedy matching process, any remaining atoms (typically satisfying the saturated alkane skeleton) are retained as atomic tokens. This hybrid-granularity approach ensures that the model captures both high-level functional semantics and low-level structural details. We then calculate the 2D bounding box and the precise anc...

  56. [56]

    Carboxylic Anhydride

  57. [57]

    Hemiacetal/Hemiketal

  58. [58]

    Sulfo (Sulfonic Acid)

  59. [59]

    Identify the substrate (Benzene) and reagent (Chloroethane) in the image and encode them into SMILES

    Halo

    A3 OCRD-Bench Framework: Design and Metrics

    To comprehensively evaluate Multimodal Large Language Models (MLLMs) on organic chemistry reasoning, we designed OCRD-Bench, a hierarchical benchmark covering 8 major reaction categories (see Figure A2). The evaluation is structured into three cognitive tiers, ranging from visual perception to deep mechan...
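The appendix excerpts above (priority hierarchy, atom-wise exclusivity, residual atom handling) describe a greedy labeling procedure that might look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the function name `assign_supernodes` is hypothetical, the priority order is passed in as a plain list instead of being derived from heavy atom count, topological complexity, and semantic weight, and the candidate matches are supplied by hand where the paper obtains them from RDKit substructure matching.

```python
def assign_supernodes(num_atoms, candidates, priority):
    """Greedy, priority-ordered functional-group assignment.

    num_atoms: number of atoms in the molecule (indices 0..num_atoms-1).
    candidates: dict mapping a group label to a list of atom-index tuples,
        i.e. substructure matches produced by some external matcher.
    priority: list of group labels, highest priority first (assumed given).

    Returns (supernodes, residual_atoms). Once an atom is claimed by a
    super-node it is locked, so lower-priority groups cannot reuse it.
    """
    locked = set()
    supernodes = []
    for label in priority:
        for match in candidates.get(label, []):
            if locked.isdisjoint(match):  # atom-wise exclusivity
                supernodes.append((label, tuple(match)))
                locked.update(match)
    # Atoms left unclaimed stay as individual atomic tokens.
    residual_atoms = [i for i in range(num_atoms) if i not in locked]
    return supernodes, residual_atoms


# Acetic acid, SMILES CC(=O)O, with hand-assigned indices:
# 0 = methyl C, 1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O.
candidates = {
    "Carboxyl": [(1, 2, 3)],
    "Carbonyl": [(1, 2)],
    "Hydroxyl": [(3,)],
}
priority = ["Carboxyl", "Carbonyl", "Hydroxyl"]
supernodes, residual_atoms = assign_supernodes(4, candidates, priority)
print(supernodes, residual_atoms)
```

In this toy case the carboxyl match claims atoms 1 to 3 first, so the carbonyl and hydroxyl matches are skipped and only the methyl carbon remains as a residual atom, which mirrors the redundancy the exclusivity constraint is meant to prevent.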