
arxiv: 2606.23997 · v1 · pith:DB23CDOKnew · submitted 2026-06-22 · 💻 cs.IR

ChartWalker: Benchmarking the Cross-Chart RAG Task with Hierarchical Knowledge Graphs

Pith reviewed 2026-06-26 06:00 UTC · model grok-4.3

classification 💻 cs.IR
keywords cross-chart RAG · hierarchical knowledge graph · benchmark construction · multi-hop reasoning · retrieval-augmented generation · QA generation · chart analysis · structure-aware sampling

The pith

ChartWalker uses hierarchical knowledge graphs to build cross-chart RAG benchmarks with controlled multi-hop questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Cross-chart retrieval-augmented generation requires systems to connect information across multiple charts for complex analysis in science, business, and politics. Existing benchmarks often create questions by extracting key points, which creates easy word matches or broken logic chains. ChartWalker instead builds a hierarchical knowledge graph for charts that keeps entities and relations organized by level of detail. A structure-aware sampling step then draws multi-hop paths through these graphs to form questions with adjustable difficulty. The resulting benchmark exposes large gaps in how current RAG methods handle realistic analytical tasks.

Core claim

ChartWalker features a hierarchical knowledge graph construction method tailored to charts, which organizes entities and relations by granularity to preserve analytical structure. We then propose a structure-aware sampling algorithm that synthesizes semantically coherent, multi-hop reasoning paths, enabling explicit control over query difficulty and granularity for QA generation. Built with this framework, we release ChartWalker-Bench, a comprehensive benchmark spanning diverse domains and cross-chart query types. Extensive evaluations across major RAG paradigms reveal significant performance gaps, underscoring the benchmark's difficulty and utility. Furthermore, we provide ChartWalker-Agent, an agentic baseline to facilitate analysis and inspire future system design.

What carries the argument

The hierarchical knowledge graph tailored to charts that organizes entities and relations by granularity, together with the structure-aware sampling algorithm that creates multi-hop reasoning paths.
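To make the two pieces concrete, here is a minimal sketch of path sampling over a granularity-annotated graph. It is not the paper's algorithm: the toy entities, the `LEVEL` and `CHART` tables, and the `max_level_jump` constraint are all assumptions chosen for illustration. The only idea carried over from the text is that a walk through level-organized entities fixes the number of hops, and therefore the difficulty, before any question is written.

```python
import random
from collections import defaultdict

# Hypothetical toy graph. LEVEL is the granularity of each entity
# (0 = coarsest); CHART is the chart the entity was extracted from.
LEVEL = {"Democrats": 0, "Catholic Democrats": 1,
         "Approval 2020": 2, "Approval 2024": 2}
CHART = {"Democrats": "chart_a", "Catholic Democrats": "chart_a",
         "Approval 2020": "chart_a", "Approval 2024": "chart_b"}
EDGES = [
    ("Democrats", "has_subgroup", "Catholic Democrats"),
    ("Catholic Democrats", "measured_by", "Approval 2020"),
    ("Catholic Democrats", "measured_by", "Approval 2024"),
]

def build_adjacency(edges):
    """Undirected adjacency list: entity -> [(relation, neighbor), ...]."""
    adj = defaultdict(list)
    for head, rel, tail in edges:
        adj[head].append((rel, tail))
        adj[tail].append((rel, head))
    return adj

def sample_path(adj, start, hops, rng, max_level_jump=1):
    """Random walk with no revisits. Each step may move at most
    max_level_jump granularity levels (an assumed constraint).
    Returns a list of (head, relation, tail) hops, or None if stuck."""
    path, visited, node = [], {start}, start
    for _ in range(hops):
        options = [(rel, nxt) for rel, nxt in adj[node]
                   if nxt not in visited
                   and abs(LEVEL[nxt] - LEVEL[node]) <= max_level_jump]
        if not options:
            return None
        rel, nxt = rng.choice(options)
        path.append((node, rel, nxt))
        visited.add(nxt)
        node = nxt
    return path

def is_cross_chart(path):
    """True if the entities on the path come from more than one chart."""
    return len({CHART[n] for h, _, t in path for n in (h, t)}) > 1
```

Calling `sample_path(build_adjacency(EDGES), "Democrats", 2, random.Random(0))` yields a two-hop path that starts with the Democrats to Catholic Democrats edge; a question generator would then be conditioned on that path rather than on isolated statistics.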

Load-bearing premise

The hierarchical knowledge graph construction and structure-aware sampling produce semantically coherent, logically consistent reasoning chains without lexical overlap between queries and evidence.

What would settle it

Manual inspection of the generated questions revealing either high lexical overlap with chart text or reasoning steps that do not follow logically from the sampled paths would show the method fails to create challenging benchmarks.
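Part of that inspection could be automated. The sketch below computes a simple token-overlap ratio between a question and the chart text it should be answered from. The tokenizer, the stopword list, and the choice of "fraction of question tokens found in the evidence" as the metric are assumptions; the paper may measure overlap differently.

```python
import re

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = frozenset({"the", "of", "in", "a", "and", "to", "is"})

def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_overlap(question, evidence, stopwords=STOPWORDS):
    """Fraction of non-stopword question tokens that also occur
    in the evidence text. Returns 0.0 for an empty question."""
    q = tokens(question) - stopwords
    if not q:
        return 0.0
    e = tokens(evidence) - stopwords
    return len(q & e) / len(q)
```

A high average ratio over the released questions would suggest that retrieval can succeed by word matching alone, which is the failure mode the benchmark claims to avoid.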

Figures

Figures reproduced from arXiv: 2606.23997 by Chenghan Xie, Hanyang Yuan, Hua Zhou, Jiarong Xu, Ning Tang, Qian Kou, Renhong Huang, Xiaofeng Shi, Yi Li.

Figure 1: Compared to (a) concatenating isolated key statistics and prompting a VLM to synthesize cross-chart questions, our hierarchical KG (b) explicitly represents entities with their structural relations. Conditioning question generation on these structural paths makes entity dependencies clear and reduces the hallucination of incompatible subjects.
Figure 2: Illustration of the ChartWalker framework. Given a large chart corpus, a VLM first extracts entities and relations from each chart to build per-chart hierarchical knowledge graphs, where entities are organized by granularity levels. Then, identical entities are merged layer-wise to form a global hierarchical KG over the entire corpus. On top of this hierarchy, we perform structure-aware path sampling to co…
Figure 3: Unified Prompt Template for Multi-Chart QA Generation. The system shares a common context and output format, but branches into four distinct modules (A-D) with specific logic, constraints, and paraphrasing requirements depending on the desired query type.
Figure 4: The ChartWalker Agent Prompt Structure. The prompt guides the agent through three distinct phases: (1) Selecting a start entity, (2) An iterative navigation loop involving edge searching and entity traversal, and (3) A termination phase to output the final answer.
Figure 5: Manipulation showcase. Due to the relation of Democrats and Catholic Democrats being highlighted in the sampled path, our problem does not contain the logical inconsistency shown in (Lu et al., 2025) (…
Figure 6: Wrong case.
Figure 7: Complex Reasoning showcase.
Figure 8: Factcheck showcase.
Figure 9: Analysis showcase.
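The Figure 2 caption describes merging identical entities layer-wise into a global hierarchical KG. A minimal sketch of that step follows, under the assumption that "identical" means the same normalized name at the same granularity level; the input layout and the `normalize` helper are hypothetical and the paper's matching rule may be richer.

```python
def normalize(name):
    """Case-fold and collapse whitespace (assumed identity rule)."""
    return " ".join(name.lower().split())

def merge_layerwise(per_chart_kgs):
    """Merge per-chart graphs into one global graph.

    per_chart_kgs: {chart_id: {"entities": {name: level},
                               "edges": [(head, relation, tail)]}}
    Returns (nodes, edges) where nodes maps (level, normalized name)
    to the set of charts it appears in, and edges is a set of
    (node_key, relation, node_key) triples.
    """
    nodes, edges = {}, set()
    for chart_id, kg in per_chart_kgs.items():
        # Entities merge only when both level and normalized name match.
        key = {name: (level, normalize(name))
               for name, level in kg["entities"].items()}
        for k in key.values():
            nodes.setdefault(k, set()).add(chart_id)
        for head, rel, tail in kg["edges"]:
            edges.add((key[head], rel, key[tail]))
    return nodes, edges
```

Nodes whose chart set has more than one member are the bridges that make cross-chart paths possible; keying on the level as well as the name keeps a coarse entity from being fused with a fine-grained one that happens to share its label.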
Original abstract

Cross-Chart Retrieval-Augmented Generation (RAG) is critical for complex multi-modal analytical tasks in scientific, business, and political domains. However, existing benchmarks either focus on tables, which are well-structured and textualized, or generate cross-chart questions by simply extracting key points, which often induces lexical overlap between queries and evidence and yields logically inconsistent reasoning chains. To address this, we introduce ChartWalker, a novel framework for constructing challenging cross-chart RAG tasks. ChartWalker features a hierarchical knowledge graph construction method tailored to charts, which organizes entities and relations by granularity to preserve analytical structure. We then propose a structure-aware sampling algorithm that synthesizes semantically coherent, multi-hop reasoning paths, enabling explicit control over query difficulty and granularity for QA generation. Built with this framework, we release ChartWalker-Bench, a comprehensive benchmark spanning diverse domains and cross-chart query types. Extensive evaluations across major RAG paradigms reveal significant performance gaps, underscoring the benchmark's difficulty and utility. Furthermore, we provide ChartWalker-Agent, an agentic baseline to facilitate analysis and inspire future system design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces ChartWalker, a framework for constructing cross-chart RAG benchmarks. It proposes a hierarchical knowledge graph construction method that organizes chart entities and relations by granularity, paired with a structure-aware sampling algorithm to synthesize multi-hop reasoning paths for QA generation. This yields ChartWalker-Bench, spanning domains and query types, which is used to evaluate major RAG paradigms (revealing performance gaps) and to provide an agentic baseline (ChartWalker-Agent).

Significance. If the hierarchical KG construction and sampling reliably deliver semantically coherent, logically consistent multi-hop chains without lexical overlap between queries and evidence, the benchmark would address a clear gap in existing chart/table RAG evaluations and support more rigorous testing of cross-chart retrieval and reasoning systems.

major comments (2)
  1. [Abstract] The central claim that the structure-aware sampling algorithm produces 'semantically coherent, multi-hop reasoning paths' that are 'logically consistent' and free of 'lexical overlap between queries and evidence' is load-bearing for the benchmark's claimed utility, yet the abstract (and the reader's summary of the full text) provides no mechanism, predicate, filter, or post-generation empirical check to verify these properties hold for the released dataset.
  2. [Abstract] The statement that 'extensive evaluations across major RAG paradigms reveal significant performance gaps' is presented without any reported metrics, error bars, dataset statistics, or verification that the sampling procedure yields consistent chains; this undermines assessment of the benchmark's difficulty.
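For illustration, one form such a predicate could take is a structural check on each sampled path before it is turned into a question. This is a hypothetical filter, not one taken from the manuscript: it only verifies that consecutive hops connect and that no entity is revisited, which is necessary for a consistent chain but does not establish semantic coherence.

```python
def is_consistent_chain(path):
    """Structural check on a list of (head, relation, tail) hops.

    True only if the path is non-empty, every hop starts at the
    entity where the previous hop ended, and no entity repeats.
    """
    if not path:
        return False
    seen = {path[0][0]}
    for i, (head, _rel, tail) in enumerate(path):
        if i > 0 and head != path[i - 1][2]:
            return False  # broken link between consecutive hops
        if tail in seen:
            return False  # cycle: the chain returns to a seen entity
        seen.add(tail)
    return True
```

Reporting the fraction of released items that pass a check of this kind, alongside manual validation, would give readers a concrete handle on the consistency claim.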

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract point by point below. Both comments correctly identify that the abstract is too high-level; we will revise it accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the structure-aware sampling algorithm produces 'semantically coherent, multi-hop reasoning paths' that are 'logically consistent' and free of 'lexical overlap between queries and evidence' is load-bearing for the benchmark's claimed utility, yet the abstract (and the reader's summary of the full text) provides no mechanism, predicate, filter, or post-generation empirical check to verify these properties hold for the released dataset.

    Authors: We agree the abstract does not reference the verification steps. Section 3 details the hierarchical KG construction that organizes entities and relations by granularity. Section 4 describes the structure-aware sampling algorithm, which incorporates explicit predicates and filters for semantic coherence, logical consistency, and absence of lexical overlap. Section 5 reports post-generation empirical checks (including manual validation and automated metrics) confirming these properties on the released dataset. We will revise the abstract to briefly mention these mechanisms and checks. revision: yes

  2. Referee: [Abstract] The statement that 'extensive evaluations across major RAG paradigms reveal significant performance gaps' is presented without any reported metrics, error bars, dataset statistics, or verification that the sampling procedure yields consistent chains; this undermines assessment of the benchmark's difficulty.

    Authors: The abstract summarizes results at a high level without numbers. The full manuscript reports concrete metrics, error bars, dataset statistics (e.g., number of queries, domains, hop distributions), and sampling verification in the evaluation section. We will revise the abstract to include key quantitative results (such as average performance gaps across RAG paradigms) and a brief note on sampling consistency to strengthen the claim. revision: yes

Circularity Check

0 steps flagged

No circularity: benchmark construction is self-contained

Full rationale

The paper describes a constructive framework for building ChartWalker-Bench via hierarchical knowledge graph organization by granularity and a structure-aware sampling algorithm to generate multi-hop QA pairs. No equations, fitted parameters, predictions of derived quantities, or self-citations are invoked as load-bearing steps in any derivation chain. The central claims concern the properties of the released benchmark rather than any result that reduces to its own inputs by construction. This is the expected non-finding for a benchmark-release paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the domain assumption that charts can be effectively represented via hierarchical KGs preserving analytical structure and that the sampling algorithm yields coherent multi-hop paths; these are introduced as part of the new method without external benchmarks or independent evidence in the abstract. No free parameters or invented entities with falsifiable handles are explicitly detailed.

axioms (2)
  • domain assumption: Charts can be represented as hierarchical knowledge graphs that organize entities and relations by granularity to preserve analytical structure.
    Invoked as the basis for the construction method in the abstract.
  • domain assumption: Structure-aware sampling can synthesize semantically coherent multi-hop reasoning paths with explicit control over difficulty.
    Central to the QA generation process described.
invented entities (2)
  • hierarchical knowledge graph tailored to charts (no independent evidence)
    purpose: To organize chart entities and relations by granularity for preserving analytical structure in RAG tasks.
    Introduced as the core of the ChartWalker framework.
  • structure-aware sampling algorithm (no independent evidence)
    purpose: To generate semantically coherent multi-hop reasoning paths for cross-chart questions.
    Proposed to enable control over query difficulty and granularity.

pith-pipeline@v0.9.1-grok · 5743 in / 1469 out tokens · 36064 ms · 2026-06-26T06:00:14.434393+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 8 canonical work pages

  1. [1]

    Qwen3-vl technical report, 2025a

    Bai, S., Cai, Y., Chen, R., Chen, K., Chen, X., Cheng, Z., Deng, L., Ding, W., Gao, C., Ge, C., Ge, W., Guo, Z., Huang, Q., Huang, J., Huang, F., Hui, B., Jiang, S., Li, Z., Li, M., Li, M., Li, K., Lin, Z., Lin, J., Liu, X., Liu, J., Liu, C., Liu, Y., Liu, D., Liu, S., Lu, D., Luo, R., Lv, C., Men, R., Meng, L., Ren, X., Ren, X., Song, S., Sun, Y., Tang, ...

  2. [2]

    Qwen2.5-vl technical report, 2025b

    Bai, S., Chen, K., Liu, X., Wang, J., Ge, W., Song, S., Dang, K., Wang, P., Wang, S., Tang, J., Zhong, H., Zhu, Y., Yang, M., Li, Z., Wan, J., Wang, P., Ding, W., Fu, Z., Xu, Y., Ye, J., Zhang, X., Xie, T., Cheng, Z., Zhang, H., Yang, Z., Xu, H., and Lin, J. Qwen2.5-vl technical report, 2025b. URL https://arxiv.org/abs/2502.13923

  3. [4]

    Multi-hop question answering under temporal knowledge editing, 2024

    Cheng, K., Lin, G., Fei, H., zhai, Y., Yu, L., Ali, M. A., Hu, L., and Wang, D. Multi-hop question answering under temporal knowledge editing, 2024. URL https://arxiv.org/abs/2404.00492

  4. [6]

    Webwatcher: Breaking new frontier of vision-language deep research agent, 2025

    Geng, X., Xia, P., Zhang, Z., Wang, X., Wang, Q., Ding, R., Wang, C., Wu, J., Zhao, Y., Li, K., Jiang, Y., Xie, P., Huang, F., and Zhou, J. Webwatcher: Breaking new frontier of vision-language deep research agent, 2025. URL https://arxiv.org/abs/2508.05748

  5. [7]

    Rag-anything: All-in-one rag framework, 2025

    Guo, Z., Ren, X., Xu, L., Zhang, J., and Huang, C. Rag-anything: All-in-one rag framework, 2025. URL https://arxiv.org/abs/2510.12323

  6. [8]

    From rag to memory: Non-parametric continual learning for large language models, 2025

    Gutiérrez, B. J., Shu, Y., Qi, W., Zhou, S., and Su, Y. From rag to memory: Non-parametric continual learning for large language models, 2025. URL https://arxiv.org/abs/2502.14802

  7. [9]

    Retrieval-augmented generation with graphs (graphrag), 2025

    Han, H., Wang, Y., Shomer, H., Guo, K., Ding, J., Lei, Y., Halappanavar, M., Rossi, R. A., Mukherjee, S., Tang, X., He, Q., Hua, Z., Long, B., Zhao, T., Shah, N., Javari, A., Xia, Y., and Tang, J. Retrieval-augmented generation with graphs (graphrag), 2025. URL https://arxiv.org/abs/2501.00309

  8. [10]

    Herzig, J., Müller, T., Krichene, S., and Eisenschlos, J. M. Open domain question answering over tables via dense retrieval, 2021. URL https://arxiv.org/abs/2103.12011

  9. [14]

    Open-wikitable: Dataset for open domain question answering with complex reasoning over table, 2023

    Kweon, S., Kwon, Y., Cho, S., Jo, Y., and Choi, E. Open-wikitable: Dataset for open domain question answering with complex reasoning over table, 2023. URL https://arxiv.org/abs/2305.07288

  10. [16]

    Mimotable: A multi-scale spreadsheet benchmark with meta operations for table reasoning, 2024

    Li, Z., Du, Y., Zheng, M., and Song, M. Mimotable: A multi-scale spreadsheet benchmark with meta operations for table reasoning, 2024. URL https://arxiv.org/abs/2412.11711

  11. [17]

    Graphsearchnet: Enhancing gnns via capturing global dependencies for semantic code search, 2023

    Liu, S., Xie, X., Siow, J., Ma, L., Meng, G., and Liu, Y. Graphsearchnet: Enhancing gnns via capturing global dependencies for semantic code search, 2023. URL https://arxiv.org/abs/2111.02671

  12. [18]

    Deepdive: Advancing deep search agents with knowledge graphs and multi-turn rl, 2025

    Lu, R., Hou, Z., Wang, Z., Zhang, H., Liu, X., Li, Y., Feng, S., Tang, J., and Dong, Y. Deepdive: Advancing deep search agents with knowledge graphs and multi-turn rl, 2025. URL https://arxiv.org/abs/2509.10446

  13. [20]

    Multi-hop question answering, 2024

    Mavi, V., Jangra, A., and Jatowt, A. Multi-hop question answering, 2024. URL https://arxiv.org/abs/2204.09140

  14. [21]

    Market movement prediction using chart patterns and attention mechanism, 2024

    Norasaed, W. and Siriborvornratanakul, T. Market movement prediction using chart patterns and attention mechanism. Discover Analytics, 2(1), 2024. doi:10.1007/s44257-023-00007-6. URL https://doi.org/10.1007/s44257-023-00007-6

  15. [22]

    Hello GPT-4o

    OpenAI. Hello GPT-4o. https://openai.com/index/hello-gpt-4o/, 2024

  16. [24]

    Robertson, S. E. and Walker, S. Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval. In Proceedings of the 17th International ACM SIGIR Conference on Research and Development in Information Retrieval (Special Issue of the SIGIR Forum), pp. 232–241. Springer-Verlag, 1994. ISBN 3-540-19889-X

  17. [25]

    Proximal policy optimization algorithms, 2017

    Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms, 2017. URL https://arxiv.org/abs/1707.06347

  18. [26]

    Reasoning in trees: Improving retrieval-augmented generation for multi-hop question answering, 2026

    Shi, Y., Sun, M., Liu, Z., Yang, M., Fang, Y., Sun, T., and Gu, X. Reasoning in trees: Improving retrieval-augmented generation for multi-hop question answering, 2026. URL https://arxiv.org/abs/2601.11255

  19. [27]

    Mtabvqa: Evaluating multi-tabular reasoning of language models in visual space, 2025a

    Singh, A., Biemann, C., and Strich, J. Mtabvqa: Evaluating multi-tabular reasoning of language models in visual space, 2025a. URL https://arxiv.org/abs/2506.11684

  20. [28]

    Singh, A., Ehtesham, A., Kumar, S., and Khoei, T. T. Agentic retrieval-augmented generation: A survey on agentic rag, 2025b. URL https://arxiv.org/abs/2501.09136

  21. [29]

    Vagen: reinforcing world model reasoning for multi-turn vlm agents, 2025

    Wang*, K., Zhang*, P., Wang*, Z., Gao*, Y., Li*, L., Wang, Q., Chen, H., Wan, C., Lu, Y., Yang, Z., Wang, L., Krishna, R., Wu, J., Fei-Fei, L., Choi, Y., and Li, M. Vagen: reinforcing world model reasoning for multi-turn vlm agents, 2025. URL https://vagen-ai.github.io/

  22. [31]

    Infochartqa: A benchmark for multimodal question answering on infographic charts, 2025

    Xie, T., Lin, M., Liu, M., Ye, Y., Chen, C., and Liu, S. Infochartqa: A benchmark for multimodal question answering on infographic charts, 2025. URL https://arxiv.org/abs/2505.19028

  23. [33]

    Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018

    Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., and Manning, C. D. Hotpotqa: A dataset for diverse, explainable multi-hop question answering, 2018. URL https://arxiv.org/abs/1809.09600

  24. [34]

    Tablerag: A retrieval augmented generation framework for heterogeneous document reasoning, 2025

    Yu, X., Jian, P., and Chen, C. Tablerag: A retrieval augmented generation framework for heterogeneous document reasoning, 2025. URL https://arxiv.org/abs/2506.10380

  25. [35]

    A graph representation of semi-structured data for web question answering, 2020

    Zhang, X., Shou, L., Pei, J., Gong, M., Wen, L., and Jiang, D. A graph representation of semi-structured data for web question answering, 2020. URL https://arxiv.org/abs/2010.06801

  26. [37]

    Judging llm-as-a-judge with mt-bench and chatbot arena, 2023

    Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., and Stoica, I. Judging llm-as-a-judge with mt-bench and chatbot arena, 2023. URL https://arxiv.org/abs/2306.05685

  27. [38]

    Seq2sql: Generating structured queries from natural language using reinforcement learning, 2017

    Zhong, V., Xiong, C., and Socher, R. Seq2sql: Generating structured queries from natural language using reinforcement learning, 2017. URL https://arxiv.org/abs/1709.00103

  28. [39]

    Rag over tables: Hierarchical memory index, multi-stage retrieval, and benchmarking, 2025

    Zou, J., Fu, D., Chen, S., He, X., Li, Z., Zhu, Y., Han, J., and He, J. Rag over tables: Hierarchical memory index, multi-stage retrieval, and benchmarking, 2025. URL https://arxiv.org/abs/2504.01346

  29. [40]

    P. Langley. Proceedings of the 17th International Conference on Machine Learning (ICML 2000). 2000.

  30. [41]

    T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

  31. [42]

    M. J. Kearns.

  32. [43]

    Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

  33. [44]

    R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. 2000

  34. [45]

    Suppressed for Anonymity.

  35. [46]

    A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

  36. [47]

    A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

  37. [48]

    Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework. arXiv preprint arXiv:2502.14864.

  38. [49]

    ChartQAPro: A more diverse and challenging benchmark for chart question answering. arXiv preprint arXiv:2504.05506.

  39. [50]

    MiMoTable: A Multi-scale Spreadsheet Benchmark with Meta Operations for Table Reasoning. 2024.

  40. [51]

    Tongyi DeepResearch Technical Report. arXiv preprint arXiv:2510.24701.

  41. [52]

    MTabVQA: Evaluating Multi-Tabular Reasoning of Language Models in Visual Space. 2025.

  42. [53]

    DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL. 2025.

  43. [54]

    TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning. 2025.

  44. [55]

    RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking. 2025.

  45. [56]

    Compositional Semantic Parsing on Semi-Structured Tables

    Pasupat, Panupong and Liang, Percy. Compositional Semantic Parsing on Semi-Structured Tables. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2015. doi:10.3115/v1/P15-1142

  46. [57]

    Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table. 2023.

  47. [58]

    VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents. 2025.

  48. [59]

    Retrieval-Augmented Generation with Graphs (GraphRAG). 2025.

  49. [60]

    RAG-Anything: All-in-One RAG Framework. 2025.

  50. [61]

    Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. 1998. doi:10.1016/S0169-7552(98)00110-X

  51. [62]

    Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. 2017.

  52. [63]

    InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts. 2025.

  53. [64]

    Open Domain Question Answering over Tables via Dense Retrieval. 2021.

  54. [65]

    Qwen3-VL Technical Report. 2025.

  55. [66]

    A Graph Representation of Semi-structured Data for Web Question Answering. 2020.

  56. [67]

    From RAG to Memory: Non-Parametric Continual Learning for Large Language Models. 2025.

  57. [68]

    Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking. arXiv preprint arXiv:2601.04720.

  58. [69]

    Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176.

  59. [70]

    Stephen E. Robertson and Steve Walker. Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval. Proceedings of the 17th International ACM SIGIR Conference on Research and Development in Information Retrieval (Special Issue of the SIGIR Forum).

  60. [71]

    Hello. 2024.

  61. [72]

    Kim, Dae Hyun and Hoque, Enamul and Agrawala, Maneesh. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020. doi:10.1145/3313831.3376467

  62. [73]

    Qwen2.5-VL Technical Report. 2025.

  63. [74]

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. 2023.

  64. [75]

    Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG. 2025.

  65. [76]

    WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent. 2025.

  66. [77]

    Proximal Policy Optimization Algorithms. 2017.

  67. [78]

    Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps. 2020.

  68. [79]

    HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. 2018.

  69. [80]

    Resul Das and Mucahit Soylu. A key review on graph data science: The power of graphs in scientific studies. 2023. doi:10.1016/j.chemolab.2023.104896

  70. [81]

    Norasaed, Witawat and Siriborvornratanakul, Thitirat. Discover Analytics. 2024.

  71. [82]

    Using Graphs Instead of Tables in Political Science. Perspectives on Politics. 2007. doi:10.1017/S1537592707072209

  72. [83]

    Multi-hop Question Answering. 2024.

  73. [84]

    Kumar, Vishwajeet and Hua, Yuncheng and Ramakrishnan, Ganesh and Qi, Guilin and Gao, Lianli and Li, Yuan-Fang. 2019. doi:10.1007/978-3-030-30793-6_22

  74. [85]

    GraphSearchNet: Enhancing GNNs via Capturing Global Dependencies for Semantic Code Search. 2023.

  75. [86]

    Wang, Yangfan and Liu, Jie and Tang, Chen and Yan, Lian and Jiang, Jingchi. KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025. doi:10.18653/v1/2025.emnlp-main.1181

  76. [87]

    Reasoning in Trees: Improving Retrieval-Augmented Generation for Multi-Hop Question Answering. 2026.

  77. [88]

    Multi-hop Question Answering under Temporal Knowledge Editing. 2024.