Source-Grounded Data Generation for Text-to-JSON Learning

Guijin Son; Sunghee Ahn; Youngjae Yu

arxiv: 2606.20072 · v1 · pith:YHPCQ5QQnew · submitted 2026-06-18 · 💻 cs.CL

Pith reviewed 2026-06-26 17:30 UTC · model grok-4.3

classification 💻 cs.CL

keywords text-to-JSON · data generation · spreadsheet grounding · LLM synthesis · structured extraction · training data · benchmark · JSON schema

The pith

A spreadsheet-grounded pipeline produces stronger training data for text-to-JSON extraction than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets the difficulty of building reliable training data for turning long unstructured documents into structured JSON. It introduces STAGE, which uses language models to synthesize reports and schemas at scale but verifies every generated value against an underlying spreadsheet. On the new STAGE-Eval benchmark of 851 examples, models trained with this data reach markedly higher exact-match and value-accuracy scores. The approach directly addresses the data bottleneck that limits automated extraction from financial filings and clinical records. If the gains hold, downstream systems could convert legacy documents into machine-readable form with less manual curation.

Core claim

STAGE constructs reports and JSON schema by using LLMs for scalable synthesis while validating ground-truth values against the underlying spreadsheet, and evaluations on STAGE-Eval show that this produces stronger training data than existing approaches, lifting Qwen3-4B exact match from 31.37% to 74.27% and value accuracy from 45.46% to 90.69%.
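The lifts quoted above are absolute percentage-point gains. As a quick arithmetic check that uses only the four reported figures (the function and variable names are illustrative, not from the paper):

```python
def summarize(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, ratio after/before), rounded to 2 places."""
    return round(after - before, 2), round(after / before, 2)

# Figures as reported for Qwen3-4B on STAGE-Eval.
reported = {
    "exact match": (31.37, 74.27),
    "value accuracy": (45.46, 90.69),
}

for metric, (before, after) in reported.items():
    gain, ratio = summarize(before, after)
    print(f"{metric}: +{gain:.2f} points ({ratio:.2f}x)")
# exact match: +42.90 points (2.37x)
# value accuracy: +45.23 points (1.99x)
```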

What carries the argument

The STAGE pipeline, which pairs LLM synthesis of text and JSON with direct validation of values against a source spreadsheet.

If this is right

Models trained on STAGE data achieve substantially higher exact-match rates on source-grounded text-to-JSON tasks.
Value-level accuracy in JSON extraction rises sharply compared with models trained on existing synthetic datasets.
The validation step allows scalable generation while preserving fidelity to the original source data.
The method targets extraction needs in domains that store information in both documents and tabular records.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same grounding principle could be applied to other output formats such as tables or XML when a comparable source of truth exists.
Industries that already maintain both documents and spreadsheets could generate custom training sets without relying solely on public data.
If the validation catches only value mismatches and not structural or semantic biases, additional checks may still be needed for full reliability.

Load-bearing premise

Validating the generated JSON values against the spreadsheet is enough to ensure the training pairs are high-quality and free of systematic errors introduced by the LLM step.

What would settle it

Train models on STAGE data, then measure performance on a held-out collection of real documents paired with independently verified JSON extractions. If accuracy shows no improvement over baselines trained on unvalidated synthetic data, the claim would be falsified.

Figures

Figures reproduced from arXiv: 2606.20072 by Guijin Son, Sunghee Ahn, Youngjae Yu.

**Figure 1.** Desired properties and trade-offs in text-to-JSON data construction. A good instance needs a grounded report, a verifiable JSON, and a scalable pipeline, yet LLM and human annotation each satisfy only part of these, motivating a source-grounded approach.

**Figure 2.** Overview of STAGE. A spreadsheet is filtered, serialized into a Markdown table, and used as the shared source for both report and JSON/schema generation. LLMs add surrounding context to the report and propose JSON structures. Every JSON value is verified against the source spreadsheet, ensuring that ground-truth knowledge stays anchored to the source.
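The flow in Figure 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the spreadsheet has already been filtered and loaded as a list of row dictionaries, it leaves out the LLM calls entirely, and the helper names (`to_markdown_table`, `unverified_values`) and the string-normalization rule are hypothetical.

```python
from typing import Any, Iterator

Row = dict[str, Any]

def to_markdown_table(rows: list[Row]) -> str:
    """Serialize rows (dicts sharing the same keys) into a Markdown table."""
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)

def leaf_values(node: Any) -> Iterator[Any]:
    """Yield every scalar leaf of a nested JSON-like object."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for value in node:
            yield from leaf_values(value)
    else:
        yield node

def normalize(value: Any) -> str:
    # Assumed comparison rule: compare trimmed, lowercased string forms.
    return str(value).strip().lower()

def unverified_values(candidate: Any, rows: list[Row]) -> list[Any]:
    """Return JSON leaf values that do not appear in any spreadsheet cell."""
    cells = {normalize(cell) for row in rows for cell in row.values()}
    return [v for v in leaf_values(candidate) if normalize(v) not in cells]

rows = [
    {"region": "North", "quarter": "Q1", "revenue": 1200},
    {"region": "South", "quarter": "Q1", "revenue": 950},
]
# A JSON structure as an LLM might propose it; 990 is a deliberate error.
candidate = {"regions": [
    {"name": "North", "revenue": 1200},
    {"name": "South", "revenue": 990},
]}

print(to_markdown_table(rows))
print(unverified_values(candidate, rows))  # [990]
```

A candidate would be kept only when the returned list is empty. Note that this check looks at values alone, so a correct value placed under the wrong key would still pass.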

Original abstract

From financial filings to clinical records, legacy industries rely heavily on long, unstructured documents to store high-value information. Reliably extracting this information into structured, machine-readable representations is a key prerequisite to making the contents accessible to automated systems. JSON is a natural target for such structured extraction, yet constructing reliable and scalable text-to-JSON training data remains challenging. To address this gap, we propose STAGE (Spreadsheet-grounded Text-to-JSON Artifact GEneration), a source-grounded data generation pipeline that constructs reports and JSON schema by using LLMs for scalable synthesis while validating ground-truth values against the underlying spreadsheet. Evaluations on STAGE-Eval, our source-grounded benchmark with an 851-example test set, show that STAGE produces stronger training data than existing approaches. This improves Qwen3-4B exact match from 31.37% to 74.27% and value accuracy from 45.46% to 90.69%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

STAGE gives a workable LLM-plus-spreadsheet pipeline for text-to-JSON data and shows clear numeric lifts on its own benchmark, but the value-only validation leaves room for text-JSON mismatches that could explain part of the gains.

The paper's core contribution is the STAGE pipeline: LLMs generate text reports and JSON from spreadsheet sources, then the JSON values are checked directly against the spreadsheet. They release STAGE-Eval, an 851-example test set, and report that training on STAGE data raises Qwen3-4B exact match from 31.37% to 74.27% and value accuracy from 45.46% to 90.69%.

What stands out is the concrete, source-grounded approach. Using the spreadsheet as an external validator is a straightforward way to scale data creation without manual annotation, and the reported improvements are large enough to matter for practical extraction tasks.

The main limitation is the validation step itself. It confirms that the JSON numbers and strings match the source, but it does not check whether every field is explicitly stated in the generated text, whether the text contains extra or conflicting details, or whether the LLM has introduced stylistic regularities that the downstream model could exploit. If STAGE-Eval shares similar generation artifacts, the lift could partly reflect overfitting rather than better extraction skill.
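One way to probe the first of these gaps is a crude coverage check that asks whether each JSON value is literally stated in the generated report. The sketch below is an assumption-laden illustration, not part of STAGE: the function names are hypothetical, and plain substring matching both misses paraphrases and can over-match (for example, "95" inside "950").

```python
import re
from typing import Any, Iterator

def leaf_items(node: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (path, value) for every scalar leaf of a nested JSON-like object."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_items(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_items(value, f"{path}[{index}]")
    else:
        yield path, node

def squash(text: str) -> str:
    """Lowercase, drop thousands separators, and collapse whitespace."""
    text = re.sub(r"(?<=\d),(?=\d{3})", "", text.lower())
    return re.sub(r"\s+", " ", text)

def uncovered_fields(report: str, target: Any) -> list[str]:
    """Return paths of JSON fields whose value never appears in the report text."""
    haystack = squash(report)
    return [path for path, value in leaf_items(target)
            if squash(str(value)) not in haystack]

report = "North region revenue reached 1,200 in Q1, ahead of the South."
target = {"regions": [
    {"name": "North", "revenue": 1200},
    {"name": "South", "revenue": 950},
]}
print(uncovered_fields(report, target))  # ['regions[1].revenue']
```

Here the South revenue is spreadsheet-correct but never mentioned in the report, so a model trained on this pair would be asked to output a value it cannot extract from the text.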

The methods section would need to show the exact prompting, exclusion rules, and error analysis to judge how much of the gain survives those checks. The citation pattern looks standard for synthetic data work.

This is useful for groups already building text-to-structured models or synthetic data pipelines. It is worth sending to peer review because the problem is real, the method is reproducible in principle, and the numbers invite closer inspection rather than dismissal.

Referee Report

2 major / 2 minor

Summary. The paper proposes STAGE (Spreadsheet-grounded Text-to-JSON Artifact GEneration), a pipeline that uses LLMs to synthesize reports and JSON schemas from spreadsheets while validating extracted values against the source spreadsheet. It introduces the STAGE-Eval benchmark (851-example test set) and reports that training on STAGE-generated data improves Qwen3-4B exact match from 31.37% to 74.27% and value accuracy from 45.46% to 90.69% over existing approaches.

Significance. If the gains prove robust to the validation gap, the method offers a scalable route to source-grounded training data for text-to-JSON extraction, a practical need in finance, clinical records, and similar domains. The explicit grounding step is a methodological strength relative to purely synthetic generation.

major comments (2)

[Pipeline description and evaluation] The validation step (described in the pipeline section) checks only that JSON values match the underlying spreadsheet; it does not verify that every JSON field is explicitly entailed by the generated report text, nor that the text avoids systematic omissions or stylistic artifacts introduced by the LLM. This leaves open the possibility that downstream gains reflect learning of generation-specific patterns rather than general extraction capability.
[Evaluation and STAGE-Eval] The abstract and results section report large lifts on STAGE-Eval, but the exact construction of the held-out test set, data exclusion rules, and whether any STAGE-generated examples overlap with the test distribution are not visible; without these details the comparison baselines cannot be assessed for fairness.

minor comments (2)

[Metrics] Define 'exact match' and 'value accuracy' precisely, including how partial matches or schema variations are handled.
[Results] Add a small error analysis or qualitative examples showing cases where the generated text and JSON are misaligned despite value validation.
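To make the request for metric definitions concrete, here is one plausible reading of the two metrics. The reviewed text does not give the paper's definitions, so everything below is an assumption: exact match is taken as equality of the parsed prediction with the gold JSON, and value accuracy as the fraction of gold leaf paths whose predicted value is equal.

```python
import json
from typing import Any

def flatten(node: Any, path: str = "") -> dict[str, Any]:
    """Map each scalar leaf of a nested JSON-like object to its path."""
    if isinstance(node, dict):
        out: dict[str, Any] = {}
        for key, value in node.items():
            out.update(flatten(value, f"{path}.{key}" if path else key))
        return out
    if isinstance(node, list):
        out = {}
        for index, value in enumerate(node):
            out.update(flatten(value, f"{path}[{index}]"))
        return out
    return {path: node}

def score(pred_text: str, gold: Any) -> tuple[bool, float]:
    """Return (exact_match, value_accuracy) for one example."""
    try:
        pred = json.loads(pred_text)
    except json.JSONDecodeError:
        return False, 0.0  # unparseable output scores zero on both metrics
    exact = pred == gold
    gold_leaves = flatten(gold)
    if not gold_leaves:
        return exact, float(exact)
    pred_leaves = flatten(pred)
    # Python's == treats 1200 and 1200.0 as equal; a stricter rule may be wanted.
    hits = sum(1 for path, value in gold_leaves.items()
               if path in pred_leaves and pred_leaves[path] == value)
    return exact, hits / len(gold_leaves)

gold = {"name": "North", "revenue": 1200, "quarter": "Q1"}
print(score('{"name": "North", "revenue": 1250, "quarter": "Q1"}', gold))
print(score('{"quarter": "Q1", "name": "North", "revenue": 1200}', gold))
print(score('not json', gold))
```

Under this reading, key order does not affect exact match, while a renamed key or a reordered list counts as a miss. Those are exactly the schema-variation cases the comment asks the authors to spell out.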

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below. Both points identify areas where additional clarity or detail is warranted, and we will revise the manuscript to incorporate the requested information.

Referee: [Pipeline description and evaluation] The validation step (described in the pipeline section) checks only that JSON values match the underlying spreadsheet; it does not verify that every JSON field is explicitly entailed by the generated report text, nor that the text avoids systematic omissions or stylistic artifacts introduced by the LLM. This leaves open the possibility that downstream gains reflect learning of generation-specific patterns rather than general extraction capability.

Authors: We agree that the current validation enforces value grounding against the spreadsheet but does not perform explicit textual entailment checks between the generated report and each JSON field, nor does it systematically audit for LLM-induced stylistic artifacts or omissions. This is a deliberate design decision to prioritize scalable value accuracy from the source spreadsheet. However, the referee is correct that this leaves open the possibility that models trained on STAGE data may exploit generation-specific patterns. In the revision we will (a) explicitly state this limitation in the pipeline section, (b) add a short analysis of report-text coverage on a sample of generated examples, and (c) include an ablation that trains on STAGE data but evaluates on an external, non-STAGE test set to help separate pattern learning from general extraction gains.

revision: yes

Referee: [Evaluation and STAGE-Eval] The abstract and results section report large lifts on STAGE-Eval, but the exact construction of the held-out test set, data exclusion rules, and whether any STAGE-generated examples overlap with the test distribution are not visible; without these details the comparison baselines cannot be assessed for fairness.

Authors: The referee correctly notes that the current manuscript does not provide sufficient detail on STAGE-Eval construction. We will add a dedicated subsection (likely in Section 4 or an appendix) that describes: (1) the source spreadsheets used, (2) the exact train/test split procedure and any exclusion rules applied to avoid leakage, (3) how the 851-example test set was sampled and annotated, and (4) explicit confirmation that no STAGE-generated training examples appear in the test distribution. These additions will allow readers to evaluate baseline fairness.

revision: yes
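A common way to rule out the leakage discussed here is to split by source spreadsheet rather than by example, so that no spreadsheet contributes to both sides. The sketch below illustrates that idea only. The reviewed text does not describe the paper's actual split procedure, and the `spreadsheet_id` field, the function names, and the 20% test fraction are assumptions.

```python
import hashlib
from collections import defaultdict

def assign_split(spreadsheet_id: str, test_fraction: float = 0.1) -> str:
    """Deterministically map a source spreadsheet to 'train' or 'test'."""
    digest = hashlib.sha256(spreadsheet_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 16**8  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"

def split_examples(examples: list[dict], test_fraction: float = 0.1):
    """Group examples so every spreadsheet lands wholly in one split."""
    splits = defaultdict(list)
    for example in examples:
        split = assign_split(example["spreadsheet_id"], test_fraction)
        splits[split].append(example)
    return splits["train"], splits["test"]

def shared_sources(train: list[dict], test: list[dict]) -> set[str]:
    """Spreadsheet IDs that appear on both sides (should be empty)."""
    return ({ex["spreadsheet_id"] for ex in train}
            & {ex["spreadsheet_id"] for ex in test})

# Toy data: 200 generated examples drawn from 50 hypothetical spreadsheets.
examples = [{"spreadsheet_id": f"sheet-{i % 50:03d}", "example_id": i}
            for i in range(200)]
train, test = split_examples(examples, test_fraction=0.2)
print(len(train) + len(test), shared_sources(train, test))  # 200 set()
```

Hashing the ID keeps the assignment stable when new spreadsheets are added later. It does not catch near-duplicate spreadsheets with different IDs, which would need a separate deduplication pass.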

Circularity Check

0 steps flagged

No circularity; empirical pipeline evaluated on held-out benchmark

The paper describes a source-grounded data generation pipeline (STAGE) that synthesizes reports and JSON via LLMs, validates extracted values against spreadsheets, and measures downstream model improvements on the separate STAGE-Eval benchmark (851-example test set). No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The claims rest on empirical comparisons (e.g., the Qwen3-4B exact-match lift) rather than reducing to their inputs by construction. This is a standard, self-contained empirical result, although STAGE-Eval is the authors' own held-out benchmark rather than an external one.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, the method relies on standard LLM synthesis capabilities and the existence of spreadsheet ground truth. No free parameters, new axioms, or invented entities are introduced or fitted in the description.

pith-pipeline@v0.9.1-grok · 5690 in / 1178 out tokens · 34639 ms · 2026-06-26T17:30:00.078923+00:00 · methodology

Reference graph

Works this paper leans on

35 extracted references · 13 canonical work pages · 6 internal anchors

[1] Slot: Structuring the output of large language models. arXiv preprint arXiv:2505.04016, 2025.
[2] Qwen3 Technical Report. arXiv preprint arXiv:2505.09388.
[3] Qwen2.5 Technical Report. 2025.
[4] The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783.
[5] Gemini 3 Flash: frontier intelligence built for speed. December 2025.
[6] GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
[7] Sheetpedia: A 300K-Spreadsheet Corpus for Spreadsheet Intelligence and LLM Fine-Tuning. Advances in Neural Information Processing Systems.
[8] LlamaFactory: Unified efficient fine-tuning of 100+ language models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations).
[9] LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs. 2025.
[10] AI-Driven Document Automation: A Gemini API Integrated System for Data Extraction. 2026 IEEE International Conference on AI Engineering and Innovations (AIEI), 2026.
[11] LayoutLM: Pre-training of text and layout for document image understanding. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
[12] LMDX: Language model-based document information extraction and localization. Findings of the Association for Computational Linguistics: ACL 2024, 2024.
[13] Unified structure generation for universal information extraction. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[14] JSONSchemaBench: A rigorous benchmark of structured outputs for language models. arXiv preprint arXiv:2501.10868.
[15] A survey of large language models. Frontiers of Computer Science, 2026.
[16] DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models. arXiv preprint arXiv:2509.25922.
[17] ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction. arXiv preprint arXiv:2602.12247, 2026.
[18] The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models. arXiv preprint arXiv:2604.25359.
[19] LLMStructBench: Benchmarking Large Language Model Structured Data Extraction. arXiv preprint arXiv:2602.14743.
[20] Efficient Guided Generation for Large Language Models. arXiv preprint arXiv:2307.09702.
[21] Learning to generate structured output with schema reinforcement learning. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[22] PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021.
[23] XGrammar: Flexible and efficient structured generation engine for large language models. Proceedings of Machine Learning and Systems.
[24] Gorilla: Large language model connected with massive APIs. Advances in Neural Information Processing Systems.
[25] ToolLLM: Facilitating large language models to master 16000+ real-world APIs. International Conference on Learning Representations.
[26] Spreadsheet practices and challenges in a large multinational conglomerate. 2017 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2017.
[27] Data organization in spreadsheets. The American Statistician, 2018.
[28] DSQG-syn: Synthesizing high-quality data for text-to-SQL parsing by domain specific question generation. Findings of the Association for Computational Linguistics: NAACL 2025, 2025.
[29] SING-SQL: A synthetic data generation framework for in-domain text-to-SQL translation. arXiv preprint arXiv:2509.25672.
[30] Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs. arXiv preprint arXiv:2511.04473.
[31] Spider4SPARQL: a complex benchmark for evaluating knowledge graph question answering systems. 2023 IEEE International Conference on Big Data (BigData), 2023.
[32] Glaive Function Calling v2. 2024.
[33] ScrapeGraphAI-100k: A Large-Scale Dataset for LLM-Based Web Information Extraction. arXiv preprint arXiv:2602.15189.
[34] Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
[35] LC-QuAD 2.0: A large dataset for complex question answering over Wikidata and DBpedia. International Semantic Web Conference, 2019.
