pith. machine review for the scientific record.

arxiv: 2604.12282 · v1 · submitted 2026-04-14 · 💻 cs.CL

Recognition: unknown

Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:28 UTC · model grok-4.3

classification 💻 cs.CL
keywords spreadsheet understanding · multi-agent framework · multi-format reasoning · structural sketch · verification module · large language models · real-world applications

The pith

SpreadsheetAgent lets language models handle oversized spreadsheets by building verified structural summaries from localized multi-format inspections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Spreadsheets in business and science often exceed what language models can process in one go and lose layout details when flattened into text. The paper introduces SpreadsheetAgent, a two-stage multi-agent system that first examines small regions using code results, images, and formatted tables to create a structural sketch plus row and column summaries. A verification module then checks the sketch for accuracy before the solving stage performs task reasoning on the condensed representation. This setup targets real-world reliability by avoiding full-sheet loading while trying to retain the cues that plain-text approaches discard.
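The read-then-summarize loop described above can be caricatured in a few lines. Everything below (the tile size, the inspection function, the summary fields) is an illustrative guess at the pattern, not the paper's implementation:

```python
# Toy sketch of the first stage: walk a large grid region by region and
# accumulate a compact structural summary, never holding the whole sheet
# at once. Tile size and summary fields are illustrative assumptions.

def iter_regions(n_rows, n_cols, h=2, w=2):
    """Yield (r0, c0, r1, c1) tiles covering an n_rows x n_cols grid."""
    for r0 in range(0, n_rows, h):
        for c0 in range(0, n_cols, w):
            yield r0, c0, min(r0 + h, n_rows), min(c0 + w, n_cols)

def inspect(grid, r0, c0, r1, c1):
    """Stand-in for one localized multi-format inspection."""
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return {
        "region": (r0, c0, r1, c1),
        "nonempty": sum(v is not None for v in cells),
        "numeric": sum(isinstance(v, (int, float)) for v in cells),
    }

def build_sketch(grid):
    n_rows, n_cols = len(grid), len(grid[0])
    return [inspect(grid, *tile) for tile in iter_regions(n_rows, n_cols)]

grid = [
    ["name", "q1", "q2"],
    ["alice", 10, 12],
    ["bob", 7, None],
]
sketch = build_sketch(grid)  # four 2x2-ish tiles for a 3x3 grid
```

In the real system each `inspect` call would involve code execution, a rendered image, and a formatted table rather than a cell count; the point is only that the solving stage sees `sketch`, not `grid`.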

Core claim

SpreadsheetAgent is a two-stage multi-agent framework that constructs a structural sketch and row/column summaries from incremental localized inspections across code execution, images, and LaTeX tables, then applies task-driven reasoning over this intermediate form while using a verification module to validate structures and limit error propagation.

What carries the argument

The structural sketch and row/column summaries generated from localized multi-format inspections, which act as a compact intermediate representation that enables reasoning without loading the full spreadsheet.
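One way to picture that intermediate representation is as a small serializable record; the field names below are our guess, since the paper does not publish a schema:

```python
# A guessed schema for the intermediate form: a structural sketch plus
# per-row and per-column summaries, small enough to fit in a model
# context even when the source sheet is not. Field names are
# illustrative, not taken from the paper.
from dataclasses import dataclass, field

@dataclass
class StructuralSketch:
    n_rows: int
    n_cols: int
    header_rows: list[int] = field(default_factory=list)
    row_summaries: dict[int, str] = field(default_factory=dict)
    col_summaries: dict[int, str] = field(default_factory=dict)

    def render(self) -> str:
        """Serialize the sketch for a solving-stage prompt."""
        lines = [f"sheet {self.n_rows}x{self.n_cols}, headers at rows {self.header_rows}"]
        lines += [f"row {r}: {s}" for r, s in sorted(self.row_summaries.items())]
        lines += [f"col {c}: {s}" for c, s in sorted(self.col_summaries.items())]
        return "\n".join(lines)

sketch = StructuralSketch(
    n_rows=100_000, n_cols=40, header_rows=[0],
    row_summaries={0: "column names", 1: "first data row (dates, amounts)"},
    col_summaries={0: "transaction date", 1: "amount in USD"},
)
prompt_chunk = sketch.render()  # a few lines standing in for 4M cells
```

The compression is the whole bet: a 100,000-row sheet reduces to however many summary lines the first stage chose to keep.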

If this is right

  • Enables processing of spreadsheets larger than typical language-model context windows by handling them region by region.
  • Limits error propagation in final answers through explicit verification of the extracted structural sketch.
  • Outperforms direct agent baselines on spreadsheet understanding benchmarks by preserving layout and visual information.
  • Applies to practical domains such as enterprise reporting, auditing, and scientific data management where scale and structure matter.
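The verification behind the second bullet can be imagined as targeted spot checks of sketch claims against the cells themselves. The check below is a deliberately minimal stand-in, not the paper's module:

```python
# Minimal stand-in for the verification module: re-inspect a few
# targeted cells and flag any sketch claim they contradict, so an
# extraction error is caught before the solving stage sees it.
def verify(sketch_claims, grid, probes):
    """sketch_claims: {(row, col): claimed_value}; probes: cells to re-check."""
    errors = []
    for (r, c) in probes:
        claimed = sketch_claims.get((r, c))
        actual = grid[r][c]
        if claimed is not None and claimed != actual:
            errors.append(((r, c), claimed, actual))
    return errors

grid = [["name", "q1"], ["alice", 10]]
claims = {(0, 0): "name", (1, 1): 12}   # (1, 1) is wrong on purpose
errors = verify(claims, grid, probes=[(0, 0), (1, 1)])
```

Note what this cannot do: it only re-checks cells it is told to probe, so errors in unprobed regions, and dependencies that were never inspected at all, pass through silently.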

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same localized inspection and verification pattern could apply to other document types that exceed context limits, such as long financial reports.
  • Adopting this approach might lower the compute needed for large-table tasks by skipping full-sheet loading.
  • Experiments on spreadsheets with formulas that link distant cells would test whether the summaries truly preserve all dependencies.

Load-bearing premise

The structural sketch and summaries assembled from localized multi-format inspections retain every task-relevant layout cue and formula dependency without loss that would only become visible in a complete sheet view.
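That premise is mechanically testable: parse each formula's cell references and count how many fall outside the region the formula was read in. A rough stdlib-only check (the A1-reference regex is simplified and ignores ranges like A1:B10, sheet prefixes, and $ anchors):

```python
import re

A1 = re.compile(r"\b([A-Z]{1,2})(\d+)\b")  # simplified A1-style refs only

def col_index(letters: str) -> int:
    """'A' -> 0, 'B' -> 1, ..., 'AA' -> 26."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1

def refs_outside(formula: str, region) -> list[str]:
    """Return refs in `formula` falling outside region = (r0, c0, r1, c1)."""
    r0, c0, r1, c1 = region
    out = []
    for letters, digits in A1.findall(formula):
        r, c = int(digits) - 1, col_index(letters)
        if not (r0 <= r < r1 and c0 <= c < c1):
            out.append(f"{letters}{digits}")
    return out

# A formula read inside rows 0-9, cols 0-1 that reaches row 99:
escaping = refs_outside("=A1+B100", region=(0, 0, 10, 2))
```

A high count of escaping references on a benchmark sheet would be exactly the loss mode the premise assumes away.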

What would settle it

A spreadsheet task whose solution depends on a global pattern or cross-sheet dependency visible only when the entire grid is examined at once, where the method fails while a full-context baseline succeeds.

Figures

Figures reproduced from arXiv: 2604.12282 by Haotian Hou, Hongsheng Li, Houxing Ren, Ke Wang, Mingjie Zhan, Yunqiao Yang, Zimu Lu.

Figure 1. Overview of the proposed multi-agent framework for spreadsheet understanding. An Extraction Agent …
Figure 2. Case 1: Partial view of the leave entries spreadsheet.
Figure 3. Case 2: Partial view of the sales data spreadsheet.
read the original abstract

Spreadsheets are central to real-world applications such as enterprise reporting, auditing, and scientific data management. Despite their ubiquity, existing large language model based approaches typically treat tables as plain text, overlooking critical layout cues and visual semantics. Moreover, real-world spreadsheets are often massive in scale, exceeding the input length that LLMs can efficiently process. To address these challenges, we propose SpreadsheetAgent, a two-stage multi-agent framework for spreadsheet understanding that adopts a step-by-step reading and reasoning paradigm. Instead of loading the entire spreadsheet at once, SpreadsheetAgent incrementally interprets localized regions through multiple modalities, including code execution results, images, and LaTeX tables. The method first constructs a structural sketch and row/column summaries, and then performs task-driven reasoning over this intermediate representation in the Solving Stage. To further enhance reliability, we design a verification module that validates extracted structures via targeted inspections, reducing error propagation and ensuring trustworthy inputs for downstream reasoning. Extensive experiments on two spreadsheet datasets demonstrate the effectiveness of our approach. With GPT-OSS-120B, SpreadsheetAgent achieves 38.16% on Spreadsheet Bench, outperforming the ChatGPT Agent baseline (35.27%) by 2.89 absolute points. These results highlight the potential of SpreadsheetAgent to advance robust and scalable spreadsheet understanding in real-world applications. Code is available at https://github.com/renhouxing/SpreadsheetAgent.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes SpreadsheetAgent, a two-stage multi-agent framework for spreadsheet understanding. The first stage incrementally interprets localized spreadsheet regions via multi-format inspections (code execution results, images, and LaTeX tables) to build a structural sketch plus row/column summaries. The second stage performs task-driven reasoning over this intermediate representation, supported by a verification module that validates extracted structures through targeted inspections. Experiments on two datasets report that SpreadsheetAgent with GPT-OSS-120B reaches 38.16% on Spreadsheet Bench, outperforming the ChatGPT Agent baseline (35.27%) by 2.89 absolute points.

Significance. If the reported gains prove robust, the localized multi-modal, sketch-based approach could meaningfully advance scalable spreadsheet understanding for large real-world sheets that exceed LLM context limits while incorporating layout and visual cues. The public code release supports reproducibility. However, the absence of ablations, error analysis, or evaluation of the verification module weakens the ability to assess whether the gains reflect genuine robustness improvements rather than task-specific artifacts.

major comments (2)
  1. [Experimental evaluation] Experimental evaluation (results paragraph and any associated tables): the headline claim of a 2.89-point improvement (38.16% vs. 35.27%) on Spreadsheet Bench is presented without ablation studies isolating the contributions of the structural sketch, row/column summaries, multi-format inspections, or verification module; without error bars or statistical significance tests, it is impossible to determine whether the gain is reliable or attributable to the proposed components.
  2. [§3] §3 (method description of the two-stage pipeline): the structural sketch and row/column summaries are constructed from localized inspections; for tasks requiring detection of formula chains, merged-cell semantics, or layout patterns spanning multiple inspected regions, the verification module only checks consistency of extracted pieces and does not restore information absent from the partial views. If such dependencies are present in Spreadsheet Bench, the performance gain may be limited to a solvable subset rather than demonstrating general robustness.
minor comments (2)
  1. [Abstract and §3] The abstract and method sections refer to 'GPT-OSS-120B' and 'ChatGPT Agent' without clarifying whether these are the same underlying model family or distinct implementations; consistent terminology would aid comparison.
  2. [Abstract] The paper states 'Code is available at https://github.com/renhouxing/SpreadsheetAgent.git' but provides no details on the exact experimental setup (e.g., prompt templates, inspection region sizes, or verification criteria) that would allow full reproduction from the repository alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and constructive suggestions. We address the major comments point-by-point below, agreeing to incorporate ablations and clarifications through revisions to the manuscript.

read point-by-point responses
  1. Referee: [Experimental evaluation] Experimental evaluation (results paragraph and any associated tables): the headline claim of a 2.89-point improvement (38.16% vs. 35.27%) on Spreadsheet Bench is presented without ablation studies isolating the contributions of the structural sketch, row/column summaries, multi-format inspections, or verification module; without error bars or statistical significance tests, it is impossible to determine whether the gain is reliable or attributable to the proposed components.

    Authors: We agree with the referee that ablation studies and statistical analysis are essential to validate the contributions of our proposed components. In the revised manuscript, we will add a dedicated ablation study section that systematically removes or modifies each element (structural sketch, row/column summaries, multi-format inspections, and verification module) to quantify their individual impacts. Furthermore, we will rerun experiments multiple times to report mean performance with standard deviations (error bars) and conduct statistical significance tests, such as McNemar's test or paired t-tests, to assess whether the 2.89-point improvement is statistically significant. revision: yes

  2. Referee: [§3] §3 (method description of the two-stage pipeline): the structural sketch and row/column summaries are constructed from localized inspections; for tasks requiring detection of formula chains, merged-cell semantics, or layout patterns spanning multiple inspected regions, the verification module only checks consistency of extracted pieces and does not restore information absent from the partial views. If such dependencies are present in Spreadsheet Bench, the performance gain may be limited to a solvable subset rather than demonstrating general robustness.

    Authors: We thank the referee for pointing out this important aspect of our method. The verification module is intended to detect and correct inconsistencies in the extracted structural sketch by performing additional targeted inspections on specific regions. However, as noted, it does not inherently restore information that was never inspected if the dependency spans beyond the localized views. To address this, we will revise §3 to provide a clearer description of the verification process and its limitations. Additionally, we plan to include an error analysis in the experiments section, breaking down performance on tasks involving formula chains, merged cells, and layout patterns to determine the extent to which such cases impact overall results. This will help demonstrate whether the gains are general or limited to certain subsets. revision: partial
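The McNemar's test the rebuttal promises is cheap to run on paired per-task outcomes. A stdlib-only exact version; the discordant counts below are hypothetical, not taken from the paper:

```python
# Exact two-sided McNemar test on paired benchmark outcomes.
# b = tasks only system A solves, c = tasks only system B solves;
# under H0 each discordant pair is a fair coin flip.
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical split: 61 tasks solved only by SpreadsheetAgent,
# 35 only by the baseline (numbers invented for illustration).
p = mcnemar_exact(61, 35)
```

Only the discordant pairs matter; tasks both systems solve or both fail carry no evidence about which system is better, which is why the test needs per-task outcomes rather than the two headline percentages.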

Circularity Check

0 steps flagged

No circularity: empirical multi-agent framework evaluated on external benchmarks

full rationale

The paper describes a two-stage SpreadsheetAgent framework that builds structural sketches and summaries from localized multi-format inspections (code, image, LaTeX) before task-driven reasoning and verification. All reported results consist of accuracy numbers on Spreadsheet Bench and a second dataset, measured against an external ChatGPT Agent baseline. No equations, parameter fits, self-definitional loops, or load-bearing self-citations appear in the provided text; the performance claims rest on direct benchmark comparison rather than any internal reduction or renaming of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on the domain assumption that localized multi-modal inspections plus summarization are sufficient to capture spreadsheet semantics; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption Large language models can reliably extract and summarize structure from localized spreadsheet regions presented in code, image, and LaTeX formats.
    Invoked to justify the first-stage sketch construction and the claim that incremental reading avoids context-length problems.
invented entities (2)
  • SpreadsheetAgent no independent evidence
    purpose: Two-stage multi-agent framework that performs incremental multi-format reading and verification for spreadsheet tasks.
    The central proposed system; no independent evidence outside the paper is provided.
  • structural sketch and row/column summaries no independent evidence
    purpose: Compact intermediate representation used for downstream task-driven reasoning.
    Invented as the output of the first stage; no external validation of its completeness is shown.

pith-pipeline@v0.9.0 · 5565 in / 1410 out tokens · 31133 ms · 2026-05-10T15:28:24.040897+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

32 extracted references · 24 canonical work pages · 9 internal anchors

  3. [3]

    Lang Cao. 2025. https://doi.org/10.48550/ARXIV.2501.19378 Tablemaster: A recipe to advance table understanding with language models . CoRR, abs/2501.19378

  4. [4]

    Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, and William Yang Wang. 2020. https://openreview.net/forum?id=rkeJRhNYDH Tabfact: A large-scale dataset for table-based fact verification . In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020 . OpenReview.net

  5. [5]

    Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao, Hangyu Mao, and Fuzheng Zhang. 2025. https://doi.org/10.1145/3696410.3714962 Sheetagent: Towards a generalist agent for spreadsheet reasoning and manipulation via large language models . In Proceedings of the ACM on Web Conference 2025, WWW 2025, Sydney, NSW, Australia, 28 April ...

  6. [6]

Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, and Dongmei Zhang. 2022. https://doi.org/10.18653/V1/2022.ACL-LONG.78 Hitab: A hierarchical table dataset for question answering and natural language generation . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume...

  7. [7]

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai...

  8. [8]

    Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, Da Yin, Fan Yang, Guoqing Chen, Jiazheng Xu, Jiali Chen, Jing Chen, Jinhao Chen, Jingha...

  9. [9]

Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak...

  10. [10]

    Rihui Jin, Zheyu Xin, Xing Xie, Zuoyi Li, Guilin Qi, Yongrui Chen, Xinbang Dai, Tongtong Wu, and Gholamreza Haffari. 2025. https://doi.org/10.48550/ARXIV.2506.06137 Table-r1: Self-supervised and reinforcement learning for program-based table reasoning in small language models . CoRR, abs/2506.06137

  11. [11]

    Thomas Joshi, Herman Saini, Neil Dhillon, Antoni Viros i Martin, and Kaoutar El Maghraoui. 2025. https://doi.org/10.48550/ARXIV.2506.07311 Paged attention meets flexattention: Unlocking long-context efficiency in deployed inference . CoRR, abs/2506.07311

  12. [12]

    Yannis Katsis, Saneem A. Chemmengath, Vishwajeet Kumar, Samarth Bharadwaj, Mustafa Canim, Michael R. Glass, Alfio Gliozzo, Feifei Pan, Jaydeep Sen, Karthik Sankaranarayanan, and Soumen Chakrabarti. 2022. https://doi.org/10.18653/V1/2022.NAACL-INDUSTRY.34 AIT-QA: question answering dataset over complex tables in the airline industry . In Proceedings of the...

  13. [13]

Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, and Zhaoxiang Zhang. 2023. http://papers.nips.cc/paper_files/paper/2023/hash/0ff30c4bf31db0119a6219e0d250e037-Abstract-Conference.html Sheetcopilot: Bringing software productivity to the next level through large language models . In Advances in Neural Information Processing Systems 36: Annual Conference on Ne...

  14. [14]

    Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, and Surajit Chaudhuri. 2024. https://doi.org/10.1145/3654979 Table-gpt: Table fine-tuned GPT for diverse table tasks . Proc. ACM Manag. Data , 2(3):176

  15. [15]

Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, and Jie Tang. 2024. http://papers.nips.cc/paper_files/paper/2024/hash/ac840df270ac537dd74530a15c332684-Abstract-Datasets_and_Benchmarks_Track.html Spreadsheetbench: Towards challenging real world spreadsheet manipulation . In Advances in Neural Information ...

  16. [16]

    OpenAI. 2023. https://doi.org/10.48550/ARXIV.2303.08774 GPT-4 technical report . CoRR, abs/2303.08774

  17. [17]

    Panupong Pasupat and Percy Liang. 2015. https://doi.org/10.3115/V1/P15-1142 Compositional semantic parsing on semi-structured tables . In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, A...

  18. [18]

    Aofeng Su, Aowen Wang, Chao Ye, Chen Zhou, Ga Zhang, Gang Chen, Guangcheng Zhu, Haobo Wang, Haokai Xu, Hao Chen, Haoze Li, Haoxuan Lan, Jiaming Tian, Jing Yuan, Junbo Zhao, Junlin Zhou, Kaizhe Shou, Liangyu Zha, Lin Long, Liyao Li, Pengzuo Wu, Qi Zhang, Qingyi Huang, Saisai Yang, Tao Zhang, Wentao Ye, Wufang Zhu, Xiaomeng Hu, Xijun Gu, Xinjie Sun, Xiang L...

  19. [19]

    Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, and Dongmei Zhang. 2024. https://doi.org/10.1145/3616855.3635752 Table meets LLM: can large language models understand structured table data? A benchmark and empirical study . In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, WSDM 2024, Merida, Mexico, March 4-8, 2024 , pag...

  20. [20]

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton-Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Har...

  21. [21]

Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, and Tomas Pfister. 2024. https://openreview.net/forum?id=4L0xnS4GQM Chain-of-table: Evolving tables in the reasoning chain for table understanding . In The Twelfth International Conference on Learni...

  22. [22]

Pengzuo Wu, Yuhang Yang, Guangcheng Zhu, Chao Ye, Hong Gu, Xu Lu, Ruixuan Xiao, Bowen Bao, Yijing He, Liangyu Zha, Wentao Ye, Junbo Zhao, and Haobo Wang. 2025a. https://aclanthology.org/2025.findings-acl.371/ Realhitbench: A comprehensive realistic hierarchical table benchmark for evaluating llm-based table analysis . In Findings of the Association for ...

  23. [23]

Zhenhe Wu, Jian Yang, Jiaheng Liu, Xianjie Wu, Changzai Pan, Jie Zhang, Yu Zhao, Shuangyong Song, Yongxiang Li, and Zhoujun Li. 2025b. https://doi.org/10.48550/ARXIV.2505.12415 Table-r1: Region-based reinforcement learning for table understanding . CoRR, abs/2505.12415

  24. [24]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jian Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Liangha...

  25. [25]

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu X...

  26. [26]

    Xuanliang Zhang, Dingzirui Wang, Keyan Xu, Qingfu Zhu, and Wanxiang Che. 2025. https://doi.org/10.48550/ARXIV.2505.15110 Rot: Enhancing table reasoning with iterative row-wise traversals . CoRR, abs/2505.15110

  27. [27]

    Yunjia Zhang, Jordan Henkel, Avrilia Floratou, Joyce Cahoon, Shaleen Deep, and Jignesh M. Patel. 2024. https://doi.org/10.14778/3659437.3659452 Reactable: Enhancing react for table question answering . Proc. VLDB Endow. , 17(8):1981--1994

  28. [28]

Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Binghong Wu, Lei Liao, Shu Wei, Yongjie Ye, Hao Liu, Wengang Zhou, Houqiang Li, and Can Huang. 2024a. http://papers.nips.cc/paper_files/paper/2024/hash/0d97fe65d7a1dc12a05642d9fa4cd578-Abstract-Conference.html Tabpedia: Towards comprehensive visual table understanding with concept synergy . In Advances in N...

  29. [29]

Yilun Zhao, Lyuhao Chen, Arman Cohan, and Chen Zhao. 2024b. https://doi.org/10.18653/V1/2024.ACL-LONG.692 Tapera: Enhancing faithfulness and interpretability in long-form table QA by content planning and execution-based reasoning . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 202...

  30. [30]

    Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, and Weiping Wang. 2024. https://doi.org/10.18653/V1/2024.ACL-LONG.493 Multimodal table understanding . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, August 11-16, 2024 , pages 9102-...

  31. [31]

    Bangbang Zhou, Zuan Gao, Zixiao Wang, Boqiang Zhang, Yuxin Wang, Zhineng Chen, and Hongtao Xie. 2025. https://doi.org/10.1109/CVPR52734.2025.02309 Syntab-llava: Enhancing multimodal table understanding with decoupled synthesis . In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025 , pages 24796...

  32. [32]

    Ruiyan Zhu, Xi Cheng, Ke Liu, Brian Zhu, Daniel Jin, Neeraj Parihar, Zhoutian Xu, and Oliver Gao. 2025. https://doi.org/10.48550/ARXIV.2506.12339 Sheetmind: An end-to-end llm-powered multi-agent framework for spreadsheet automation . CoRR, abs/2506.12339