CLARITY: A Framework and Benchmark for Conversational Language Ambiguity and Unanswerability in Interactive NL2SQL Systems

Avirup Sil; Cong Duy Vu Hoang; Fahimeh Saleh; Farhad Moghimifar; Katrin Kirchhoff; Poorya Zaremoodi; Shawn Chang Xu; Tabinda Sarwar; Xiaoxiao Ma

arxiv: 2604.22313 · v1 · submitted 2026-04-24 · 💻 cs.CL


Pith reviewed 2026-05-08 11:41 UTC · model grok-4.3

classification 💻 cs.CL

keywords NL2SQL · ambiguity detection · benchmark generation · conversational interfaces · unanswerable queries · schema ambiguity · interactive systems


The pith

Leading NL2SQL systems suffer significant performance drops when queries contain multiple sources of ambiguity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework to automatically generate benchmarks for NL2SQL systems that include multi-faceted ambiguities and conversational continuations in both single-turn and multi-turn settings. It uses a pipeline to turn clear SQL queries into ambiguous natural language versions while adding metadata about the schema. When tested on standard datasets, top systems including those using large language models show big declines in performance under these conditions. These systems can often spot that a query is ambiguous but have trouble figuring out exactly which parts of the database schema are causing the issues and how to fix them. This points to a gap in making NL2SQL tools reliable for real interactive use where users may not provide complete information.

Core claim

Clarity is a framework that automatically generates an NL2SQL benchmark containing multi-faceted ambiguities and diverse user behaviors across single- and multi-turn settings. Through a constraint-driven pipeline, it transforms executable SQL into ambiguous queries augmented with grounded conversational continuations and schema-level metadata. Empirical evaluation on existing datasets shows that leading NL2SQL systems suffer significant performance degradation under multi-faceted ambiguity, detecting it but struggling to localize and resolve the underlying schema-level sources.

What carries the argument

The Clarity framework, which uses a constraint-driven pipeline to transform executable SQL queries into ambiguous natural language versions with added conversational continuations and schema metadata.

Load-bearing premise

The automatically generated multi-faceted ambiguities and conversational continuations represent realistic failure modes that occur in real interactive industry NL2SQL deployments.

What would settle it

Comparing the ambiguity patterns and system failure modes in the generated benchmark against actual user interaction logs from deployed NL2SQL applications in industry settings.

Figures

Figures reproduced from arXiv: 2604.22313 by Avirup Sil, Cong Duy Vu Hoang, Fahimeh Saleh, Farhad Moghimifar, Katrin Kirchhoff, Poorya Zaremoodi, Shawn Chang Xu, Tabinda Sarwar, Xiaoxiao Ma.

**Figure 1.** Overview of the CLARITY framework. Two symbols in the figure denote rule-based and LLM-based components, respectively. In practice, a single query often contains multiple interacting sources of ambiguity, covering both schema columns and values, yet most existing benchmarks and evaluation protocols assume a single ambiguity per query and cooperative users who provide clean clarifications …

Original abstract

NL2SQL systems deployed in industry settings often encounter ambiguous or unanswerable queries, particularly in interactive scenarios with incomplete user clarification. Existing benchmarks typically assume a single source of ambiguity and rely on user interaction for resolution, overlooking realistic failure modes. We introduce Clarity, a framework for automatically generating an NL2SQL benchmark with multi-faceted ambiguities and diverse user behaviors across both single- and multi-turn settings. Using a constraint-driven pipeline, Clarity transforms executable SQL into ambiguous queries, augmented with grounded conversational continuations and schema-level metadata. Empirical evaluation on Spider and BIRD shows that leading NL2SQL systems, including those based on strong LLMs, suffer significant performance degradation under multi-faceted ambiguity. While these systems often detect ambiguity, they struggle to accurately localize and resolve the underlying schema-level sources. Our results highlight the need for more robust ambiguity detection and resolution in industry-grade NL2SQL systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Clarity adds a concrete pipeline for multi-faceted ambiguity benchmarks in NL2SQL but the realism of its synthetic cases remains unproven.


The key takeaway is that this paper gives us a new way to generate benchmarks covering multi-faceted ambiguities in conversational NL2SQL, along with evidence that leading systems struggle more than expected on those cases. It does well by extending beyond single-ambiguity assumptions and including diverse user behaviors in single- and multi-turn settings. The constraint-driven pipeline that starts from executable SQL and adds grounded continuations is a practical addition that could help test robustness in deployed systems. Where it falls short is in validating that the synthetic ambiguities match real interactive failure modes. Without details on how the authors checked for naturalness or compared against industry logs, it is hard to know whether the reported drops on Spider and BIRD are representative or just artifacts of the generation process. This paper is aimed at NL2SQL researchers and practitioners dealing with ambiguity in real-world querying. Someone working on evaluation or LLM robustness would get concrete ideas from the benchmark construction and the localization failures it uncovers. It deserves peer review to get feedback on strengthening the validation side, as the core idea of multi-faceted ambiguity testing is worth pursuing.

Referee Report

2 major / 1 minor

Summary. The manuscript presents CLARITY, a framework for generating benchmarks for conversational NL2SQL with multi-faceted ambiguities and unanswerability. It uses a constraint-driven pipeline to create ambiguous natural language queries from executable SQL, along with grounded conversational continuations and schema metadata. Evaluation on Spider and BIRD datasets demonstrates that state-of-the-art NL2SQL systems, including LLM-based ones, experience significant performance drops when facing these ambiguities, often detecting them but failing to localize schema-level sources accurately.

Significance. If the generated benchmark is shown to be representative of real interactive scenarios, this work would be significant for the NL2SQL field by exposing limitations in current systems' ambiguity handling and motivating more robust designs for industry deployments. The automatic, scalable generation approach is a strength that could support future benchmark expansion.

major comments (2)

§3 (Framework and Benchmark Generation): The constraint-driven pipeline is described as transforming executable SQL into ambiguous queries with multi-faceted ambiguities and grounded continuations, but no validation is reported (e.g., human evaluation, inter-annotator agreement, or distributional comparison against real industry NL2SQL logs) to confirm that the synthetic cases avoid artifacts and match realistic failure modes. This is load-bearing for the central claim of performance degradation and poor localization.

§5 (Empirical Evaluation): The results claim significant degradation on Spider and BIRD under multi-faceted ambiguity, yet the manuscript provides no details on specific error metrics, controls for artificiality of generated queries, breakdown by ambiguity type, or statistical tests. This makes it difficult to assess whether the localization/resolution failures are general or benchmark-specific.

minor comments (1)

The abstract could more precisely quantify the scale of the generated benchmark (number of examples, ambiguity types) to help readers assess the evaluation's scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your thoughtful and constructive review of our manuscript on the CLARITY framework. We appreciate the feedback highlighting the need for stronger validation of the benchmark generation process and more granular reporting in the empirical evaluation. We address each major comment below and commit to revisions that will enhance the rigor and clarity of the work without altering its core contributions.


Referee: §3 (Framework and Benchmark Generation): The constraint-driven pipeline is described as transforming executable SQL into ambiguous queries with multi-faceted ambiguities and grounded continuations, but no validation is reported (e.g., human evaluation, inter-annotator agreement, or distributional comparison against real industry NL2SQL logs) to confirm that the synthetic cases avoid artifacts and match realistic failure modes. This is load-bearing for the central claim of performance degradation and poor localization.

Authors: We agree that the absence of explicit validation for the generated benchmark is a limitation that weakens support for our central claims. The constraint-driven pipeline was designed to systematically derive multi-faceted ambiguities from executable SQL and schema metadata in a way that mirrors observed failure modes in interactive NL2SQL, but the manuscript does not report human evaluation, inter-annotator agreement, or comparisons to real industry logs. In the revised manuscript, we will add a dedicated validation subsection in §3, including a human study on a representative sample of generated queries (with inter-annotator agreement metrics) and qualitative analysis of realism. Where feasible, we will also incorporate distributional comparisons using publicly available NL2SQL interaction datasets. These additions will directly address the load-bearing concern.

revision: yes

Referee: §5 (Empirical Evaluation): The results claim significant degradation on Spider and BIRD under multi-faceted ambiguity, yet the manuscript provides no details on specific error metrics, controls for artificiality of generated queries, breakdown by ambiguity type, or statistical tests. This makes it difficult to assess whether the localization/resolution failures are general or benchmark-specific.

Authors: We concur that the empirical evaluation section would be strengthened by additional details and controls. The current manuscript reports aggregate performance degradation and localization issues across Spider and BIRD, but lacks breakdowns, specific metrics beyond high-level claims, explicit controls for artificiality, and statistical tests. In the revision, we will expand §5 to include: (1) breakdowns by ambiguity type (e.g., schema-level, value-level, and multi-faceted combinations); (2) specific metrics such as exact match, execution accuracy, and ambiguity detection F1; (3) controls comparing results on generated ambiguous queries versus their original unambiguous Spider/BIRD counterparts; and (4) statistical significance tests (e.g., paired t-tests or McNemar's test) for the observed degradations. This will clarify the generality of the localization and resolution failures.

revision: yes
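Two of the analyses named in the rebuttal, ambiguity detection F1 and McNemar's test on paired outcomes, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names `detection_f1` and `mcnemar_exact` are hypothetical, inputs are assumed to be per-query booleans, and the exact (binomial) form of McNemar's test is used so that only the standard library is needed.

```python
from math import comb


def detection_f1(gold, pred):
    """F1 for binary ambiguity detection; True means 'query is ambiguous'."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcnemar_exact(correct_on_original, correct_on_ambiguous):
    """Two-sided exact McNemar p-value for paired per-query correctness.

    Each position holds the same query in two conditions: the original
    unambiguous form and its generated ambiguous variant (an assumption
    about how the control in point (3) would be paired).
    """
    pairs = list(zip(correct_on_original, correct_on_ambiguous))
    only_original = sum(o and not a for o, a in pairs)
    only_ambiguous = sum(a and not o for o, a in pairs)
    n = only_original + only_ambiguous  # discordant pairs
    if n == 0:
        return 1.0
    k = min(only_original, only_ambiguous)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print(detection_f1([True, True, False, False], [True, False, True, False]))
    print(mcnemar_exact([True] * 6, [False] * 5 + [True]))
```

Only discordant pairs (queries solved in one condition but not the other) enter the test, which is why it suits a before/after comparison on the same underlying queries.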

Circularity Check

0 steps flagged

No circularity: benchmark generation and evaluation are independent


The paper describes a constraint-driven pipeline that transforms executable SQL into ambiguous queries with grounded continuations, then reports empirical results on Spider and BIRD. No equations, fitted parameters, or central claims reduce by construction to the paper's own inputs, self-citations, or prior author ansatzes. The performance degradation findings are direct observations on the generated test cases rather than tautological predictions, and the framework is presented as a standalone contribution without load-bearing self-referential derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that generated ambiguities reflect real-world cases and that performance drops on Spider/BIRD generalize; no free parameters or invented physical entities are involved.

axioms (1)

domain assumption: Existing benchmarks assume a single source of ambiguity and rely on user interaction for resolution.
Explicitly stated in the abstract as the limitation of prior work that Clarity addresses.

invented entities (1)

Clarity framework (no independent evidence)
purpose: Automatically generate NL2SQL benchmark with multi-faceted ambiguities and conversational continuations
Newly introduced in this paper with no independent evidence provided beyond the abstract description.

pith-pipeline@v0.9.0 · 5500 in / 1266 out tokens · 55262 ms · 2026-05-08T11:41:23.500747+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

[1]

nvbench 2.0: A benchmark for natural language to visualization under ambiguity. arXiv preprint arXiv:2503.12880, 2025

nvbench 2.0: Resolving ambiguity in text-to-visualization through stepwise reasoning. arXiv preprint arXiv:2503.12880. Mihai Nadăș, Laura Dioșan, and Andreea Tomescu

work page arXiv
[2]

Simone Papicchio, Luca Cagliero, and Paolo Papotti

Synthetic data generation using large language models: Advances in text and code. IEEE Access. Simone Papicchio, Luca Cagliero, and Paolo Papotti

work page
[3]

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun, Xingchen Wan, Hailong Li, Azalia Mirhoseini, Amin Saberi, Sercan Arik, and 1 others

Squab: Evaluating llm robustness to ambiguous and unanswerable questions in semantic parsing. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 17937–17957. Mohammadreza Pourreza and Davood Rafiei. 2023. Din-sql: Decomposed in-context learning of text-to-sql with self-correction. In Advances in Neural Info...

work page arXiv 2025
[4]

In Findings of the Association for Computational Linguistics: ACL 2023, pages 5701–5714

Know what i don’t know: Handling ambiguous and unknown questions for text-to-sql. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5701–5714. Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A large-scale hu...

work page 2023
[5]

how many tables did we have?

Sphinteract: Resolving ambiguities in nl2sql through user interaction. Proceedings of the VLDB Endowment, 18(4):1145–1158. A The CLARITY Framework - Appendix This appendix presents a taxonomy of A/U in CLARITY. We define core A/U concepts (Table 8), characterize user clarification behaviors in multi-turn interactions (Table 9), provide representative conver...

work page
[6]

- The column of interest and selected columns must share common tokens, words or segments

**Group Selection**: Identify groups of columns that are lexically similar to the given column of interest - Identify the columns that have similar names and writing style or structure to the column of interest and group them. - The column of interest and selected columns must share common tokens, words or segments. - If there are many column names which ...

work page
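The Group Selection rule quoted in entry [6] (columns must share common tokens with the column of interest) can be approximated without an LLM by a token-overlap filter. The sketch below is an illustrative stand-in, not the paper's prompt-based component: `tokens` and `lexical_group` are hypothetical helper names, and exact token overlap is a simplifying assumption about what "lexically similar" means.

```python
import re


def tokens(name):
    """Lowercased word tokens of a column name.

    Splits on spaces, underscores, other punctuation, and camelCase boundaries.
    """
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def lexical_group(column_of_interest, columns):
    """Columns (other than the target) sharing at least one token with it."""
    target = tokens(column_of_interest)
    return [
        c for c in columns
        if c != column_of_interest and tokens(c) & target
    ]


if __name__ == "__main__":
    candidates = ["Total Orders", "Yearly Orders", "Order Dispatch Date", "Order Invoice ID"]
    print(lexical_group("Monthly Orders", candidates))
```

Because the filter compares whole tokens, "Order" and "Orders" do not match; on the "Monthly Orders" inputs quoted in entry [9] this keeps only "Total Orders" and "Yearly Orders". A stemming step would widen the group and is left out deliberately.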
[7]

employee name

**Term Generation**: Generate an ambiguous term for the column of interest that can likely be asked by a human database manager and cause ambiguity with the selected columns group using the following rules: - The ambiguous term should be lexically similar to column of interest in terms of wording, length, and style. - The ambiguous term should NOT be an e...

work page
[8]

- For the ambiguous term, DO NOT use tokens, words or segments from columns of the selected group that do not appear in the column of interest

**Constraints**: - The ambiguous term and selected group must be specific to the column of interest, i.e., the ambiguous term is generated for the column of interest which causes ambiguity with the selected columns group. - For the ambiguous term, DO NOT use tokens, words or segments from columns of the selected group that do not appear in the column of i...

work page
[9]

Monthly Orders

**Format**: Strictly generate the response using the given output format. DO NOT generate any other content or text. ### **Examples** Example 1: Column of Interest: "Monthly Orders" Given List of Similar Columns: ["Total Orders", "Yearly Orders", "Order Dispatch Date" , "Order Invoice ID"] History: None Selected Group: ["Total Orders", "Yearly Orders"] Am...

work page
[10]

* Focus on the conditions of the ambiguity criterion to identify valid ambiguous term

**Analyze**: * Compare the ‘Given Term‘ with each column of the ‘Given Column List‘ using the given ‘Ambiguity‘ definition. * Focus on the conditions of the ambiguity criterion to identify valid ambiguous term

work page
[11]

valid"‘ if the term is ambiguous. Otherwise, its ‘

**Decide**: Outcome is ‘"valid"‘ if the term is ambiguous. Otherwise, its ‘"invalid"‘

work page
[12]

**Explain**: Provide a brief reason for your decision

work page
[13]

* The reason should be specific to the given term and list of columns

**Constraints**: * Base your reason strictly on the provided definitions. * The reason should be specific to the given term and list of columns. * Do not use any external knowledge or make assumptions

work page
[14]

Your task is to generate a natural language query from the given SQL query

**Format**: Structure your entire response according to the ‘Output Format‘ instructions ### **Inputs** Given Term: {term} Given List of Columns: {dataset_columns} ### **Outputs** **Your response must follow these instructions**: {format_instructions} NL Query Generator Prompt You are a SQL expert specializing in natural language (NL)-to-SQL translation a...

work page
[15]

Do not make any assumptions - All the column names mentioned in the SQL query must be explicitly mentioned in the natural language query

**Natural Language Query Generation**: - Using the given SQL query, write a natural language query that would generate the given SQL query - Strictly use the column names mentioned in the SQL query and its context to generate the natural language query. Do not make any assumptions - All the column names mentioned in the SQL query must be explicitly mentio...

work page
[16]

Use these to generate an ambiguous query where the given column is replaced with the ambiguous term

**Ambiguous Query Generation**: - You are given a list of column names and corresponding ambiguous terms. Use these to generate an ambiguous query where the given column is replaced with the ambiguous term. - strictly use the given column name to replace it with the corresponding ambiguous term for the ambiguous query. - Use the generated natural language...

work page
[17]

customer_id

**Constraints**: - The generated natural language query and its ambiguous variant should correspond to the same underlying SQL query. The aim of the queries should not change. - The generated natural language query should be written in a human-like style and should not use the exact column names format found in the SQL query. Use the provided examples as ...

work page
[18]

Monthly Orders

**Format**: - Strictly generate response requested in the output instructions. - DO NOT GENERATE any other supplementary explanation or description. Strictly generate output as mentioned in the output format ### **Examples** Example 1: Given SQL Query: 'SELECT count("Monthly Orders"), "Employee ID" FROM my_data GROUP BY "Employee ID" ORDER BY Employee Nam...

work page 2010
[19]

- The ambiguous term(s) and corresponding true column(s) is given that resolves the ambiguity in the given query

**Consistency Assessment**: The ambiguous query should be compared with the original SQL query for assessment. - The ambiguous term(s) and corresponding true column(s) is given that resolves the ambiguity in the given query. - When the ambiguous term(s) is replaced by the true column(s), the resulting query should be consistent with the underlying SQL que...

work page
[20]

Otherwise, it is'invalid'

**Response Generation**: After replacing the ambiguous term(s) with true column(s) in the given ambiguous query, if the resulting query corresponds to underlying SQL query then the outcome is 'valid'. Otherwise, it is'invalid'. - The valid assessment must address the two consistency condition. - Provide a reason for the assessment as well

work page
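The substitution step described in entries [19] and [20] (replace each ambiguous term with its true column, then judge the result against the SQL) has a mechanical part that can be sketched directly. The code below is a hedged illustration only: `resolve` and `mentions_all` are hypothetical names, the final validity judgment in the paper is made by an LLM prompt rather than by string checks, and the example query is invented for demonstration.

```python
import re


def resolve(query, term_to_column):
    """Replace each ambiguous term (whole word, case-insensitive) with its true column.

    A single regex pass is used so that text inserted for one term is never
    rewritten again by another term.
    """
    if not term_to_column:
        return query
    lookup = {term.lower(): column for term, column in term_to_column.items()}
    # Longest terms first so a short term cannot pre-empt a longer one.
    alternation = "|".join(
        re.escape(t) for t in sorted(lookup, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], query)


def mentions_all(resolved_query, sql_columns):
    """Cheap pre-check: every column used by the SQL appears in the resolved query."""
    lowered = resolved_query.lower()
    return all(col.lower() in lowered for col in sql_columns)


if __name__ == "__main__":
    mapping = {"orders": "Monthly Orders", "staff": "Employee ID"}
    resolved = resolve("show total orders by staff", mapping)
    print(resolved)
    print(mentions_all(resolved, ["Monthly Orders", "Employee ID"]))
```

A pre-check like this can only reject obviously inconsistent cases; whether the resolved query means the same thing as the SQL still needs a semantic judgment.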
[21]

Response should be specific to given queries, ambiguous term and true column information

**Constraints**: - Base your reason strictly on the provided instructions. Response should be specific to given queries, ambiguous term and true column information. - Do not use any external knowledge or make assumptions. - Refrain from providing a SQL language-based response and instead, provide a natural language response

work page
[22]

Monthly Orders

**Response Format**: Strictly generate the response using the given output format. DO NOT generate any other content or text. ### **Examples** Example 1: Ambiguous Natural Language Query: 'show count of orders by employee sorted on employee name' Ambiguous terms: ['orders','employee'] True Columns: ['Monthly Orders','Employee ID'] Underlying SQL Query: SE...

work page 2010
[23]

orders" term is lexically similar to

**Lexical Column Ambiguity**: The query includes tokens or terms that refer to a column, but there is no exact match with any column in the given schema. However, the term or token is lexically similar to two or more columns of the schema, making it unclear which column the query is referring to. - Lexical similarity means that the term shares similar nam...

work page
[24]

issue" is semantically similar to

**Semantic Column Ambiguity**: The query includes tokens or terms that refer to a column, but there is no exact match with any column in the schema. However, the term or token is semantically similar to two or more columns of the schema, making it is unclear which column the query is referring to. - Semantic Similarity means that the term represents highe...

work page
[25]

The term or token is neither lexically nor semantically similar to any columns in the schema

**Column Confusion**: The query contains tokens or terms that cannot be mapped to any columns in the schema. The term or token is neither lexically nor semantically similar to any columns in the schema. The column does not exists in the schema

work page
[26]

**Unambiguous**: The query contains tokens or terms referring to a column, with each term being mapped to a single column in the schema. Unambiguous queries can be translated to SQL query without any human intervention ### **Instructions**: You are given a natural language query that may belong to one of the given categories. Your task is to identify term...

work page
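The category definitions in entries [23] to [26] imply a decision order: check for an exact column match first, then for lexical similarity to two or more columns, and only then fall back to semantic judgment. A minimal sketch of the lexical part follows. The names `classify_term` and the label `"needs_semantic_check"` are assumptions made for illustration; separating Semantic Column Ambiguity from Column Confusion requires semantic similarity, which this sketch does not attempt.

```python
import re


def _words(text):
    """Lowercased alphanumeric words, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_term(term, schema_columns):
    """Return (label, candidate_columns) for one query term.

    Labels: 'unambiguous', 'lexical_column_ambiguity', or
    'needs_semantic_check' (hypothetical placeholder for cases that a
    token-overlap test cannot settle).
    """
    term_words = _words(term)
    exact = [c for c in schema_columns if _words(c) == term_words]
    if exact:
        return "unambiguous", exact
    shared = [c for c in schema_columns if set(_words(c)) & set(term_words)]
    if len(shared) >= 2:
        return "lexical_column_ambiguity", shared
    return "needs_semantic_check", shared


if __name__ == "__main__":
    schema = ["Monthly Orders", "Yearly Orders", "Employee ID"]
    print(classify_term("orders", schema))
    print(classify_term("employee id", schema))
    print(classify_term("issue", schema))
```

Returning the candidate columns alongside the label matters for the localization problem the paper highlights: detecting that a term is ambiguous is easier than naming which schema columns compete for it.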
[27]

For example: -'Customer Count', 'customer count', 'CUSTOMER COUNT', 'customer_count' and 'customer counts'represent the same entity

Analyse the natural language query to identify the terms that refers to column in the schema taking into account the given ***Type of Categories** - Schema can contain multiple tables, compare the terms against each table - Ignore the case (uppercase or lowercase), singular plural and minor variations during the comparison. For example: -'Customer Count',...

work page
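The matching rule in entry [27] (ignore case, singular/plural, and minor variations) amounts to comparing normalized names. The following sketch shows one possible normalization, assuming a deliberately naive plural rule; `normalize` is a hypothetical helper and not part of the paper's prompts.

```python
import re


def normalize(name):
    """Canonical form of a column mention for exact-match comparison.

    Lowercases, treats underscores and punctuation as separators, and drops
    one trailing 's' from words longer than three characters (but not 'ss').
    The plural rule is naive: a word such as 'status' is also shortened,
    which is harmless only because both sides are normalized the same way.
    """
    words = re.findall(r"[a-z0-9]+", name.lower())
    stripped = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return " ".join(stripped)


if __name__ == "__main__":
    variants = ["Customer Count", "customer count", "CUSTOMER COUNT",
                "customer_count", "customer counts"]
    print({normalize(v) for v in variants})
```

All five variants listed in the entry collapse to a single normalized string under this rule, so a term counts as an exact match whenever its normalized form equals that of a schema column.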
[28]

- There could be more than one terms whose exact match does not exist - If exact match with the schema columns exist for the identified terms then use 'None Found' for response

Identify the list of terms that refer to a column but do not exactly match any column names in the schema. - There could be more than one terms whose exact match does not exist - If exact match with the schema columns exist for the identified terms then use 'None Found' for response

work page
[29]

The reason should be specific to the given schema

Provide a reason for the assessment. The reason should be specific to the given schema. **Format**: Strictly generate the response using the given output format, recording terms and reason seperately. DO NOT generate any other content or text. ### **Inputs** Datbase Schema: {db_schema} Natural Language Question: {ac_query} ### **Outputs** {format_instruct...

work page arXiv

[1] [1]

nvbench 2.0: A benchmark for natural language to visualization under ambi- guity.arXiv preprint arXiv:2503.12880, 2025

nvbench 2.0: Resolving ambiguity in text- to-visualization through stepwise reasoning.arXiv preprint arXiv:2503.12880. Mihai Nad ˘as, , Laura Dio s, an, and Andreea Tomescu

work page arXiv

[2] [2]

Simone Papicchio, Luca Cagliero, and Paolo Papotti

Synthetic data generation using large language models: Advances in text and code.IEEE Access. Simone Papicchio, Luca Cagliero, and Paolo Papotti

work page

[3] [3]

Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun, Xingchen Wan, Hailong Li, Azalia Mirhoseini, Amin Saberi, Sercan Arik, and 1 others

Squab: Evaluating llm robustness to ambigu- ous and unanswerable questions in semantic parsing. InProceedings of the 2025 Conference on Empiri- cal Methods in Natural Language Processing, pages 17937–17957. Mohammadreza Pourreza and Davood Rafiei. 2023. Din-sql: Decomposed in-context learning of text- to-sql with self-correction. InAdvances in Neural Info...

work page arXiv 2025

[4] [4]

InFindings of the Association for Computational Linguistics: ACL 2023, pages 5701–5714

Know what i don’t know: Handling ambiguous and unknown questions for text-to-sql. InFindings of the Association for Computational Linguistics: ACL 2023, pages 5701–5714. Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingn- ing Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A large-scale hu...

work page 2023

[5] [5]

how manytablesdid we have?

Sphinteract: Resolving ambiguities in nl2sql through user interaction.Proceedings of the VLDB Endowment, 18(4):1145–1158. A The CLARITYFramework - Appendix This appendix presents a taxonomy of A/U in CLARITY. We define core A/U concepts (Table 8), characterize user clarification behaviors in multi-turn interactions (Table 9), provide representative conver...

work page

[6] [6]

- The column of interest and selected columns must share common tokens, words or segments

**Group Selection**: Identify groups of columns that are lexically similar to the given column of interest - Identify the columns that have similar names and writing style or structure to the column of interest and group them. - The column of interest and selected columns must share common tokens, words or segments. - If there are many column names which ...

work page

[7] [7]

employee name

**Term Generation**: Generate an ambiguous term for the column of interest that can likely be asked by a human database manager and cause ambiguity with the selected columns group using the following rules: - The ambiguous term should be lexically similar to column of interest in terms of wording, length, and style. - The ambiguous term should NOT be an e...

work page

[8] [8]

- For the ambiguous term, DO NOT use tokens, words or segments from columns of the selected group that do not appear in the column of interest

**Constraints**: - The ambiguous term and selected group must be specific to the column of interest, i.e., the ambiguous term is generated for the column of interest which causes ambiguity with the selected columns group. - For the ambiguous term, DO NOT use tokens, words or segments from columns of the selected group that do not appear in the column of i...

work page

[9] [9]

Monthly Orders

**Format**: Strictly generate the response using the given output format. DO NOT generate any other content or text. ### **Examples** Example 1: Column of Interest: "Monthly Orders" Given List of Similar Columns: ["Total Orders", "Yearly Orders", "Order Dispatch Date" , "Order Invoice ID"] History: None Selected Group: ["Total Orders", "Yearly Orders"] Am...

work page

[10] [10]

* Focus on the conditions of the ambiguity criterion to identify valid ambiguous term

**Analyze**: * Compare the ‘Given Term‘ with each column of the ‘Given Column List‘ using the given ‘Ambiguity‘ definition. * Focus on the conditions of the ambiguity criterion to identify valid ambiguous term

work page

[11] [11]

valid"‘ if the term is ambiguous. Otherwise, its ‘

**Decide**: Outcome is ‘"valid"‘ if the term is ambiguous. Otherwise, its ‘"invalid"‘

work page

[12] [12]

**Explain**: Provide a brief reason for your decision

work page

[13] [13]

* The reason should be specific to the given term and list of columns

**Constraints**: * Base your reason strictly on the provided definitions. * The reason should be specific to the given term and list of columns. * Do not use any external knowledge or make assumptions

work page

[14] [14]

Your task is to generate a natural language query from the given SQL query

**Format**: Structure your entire response according to the ‘Output Format‘ instructions ### **Inputs** Given Term: {term} Given List of Columns: {dataset_columns} ### **Outputs** **Your response must follow these instructions**: {format_instructions} NL Query Generator Prompt You are a SQL expert specializing in natural language (NL)-to-SQL translation a...

work page

[15] [15]

Do not make any assumptions - All the column names mentioned in the SQL query must be explicitly mentioned in the natural language query

**Natural Language Query Generation**: - Using the given SQL query, write a natural language query that would generate the given SQL query - Strictly use the column names mentioned in the SQL query and its context to generate the natural language query. Do not make any assumptions - All the column names mentioned in the SQL query must be explicitly mentio...

work page

[16] [16]

Use these to generate an ambiguous query where the given column is replaced with the ambiguous term

**Ambiguous Query Generation**: - You are given a list of column names and corresponding ambiguous terms. Use these to generate an ambiguous query where the given column is replaced with the ambiguous term. - strictly use the given column name to replace it with the corresponding ambiguous term for the ambiguous query. - Use the generated natural language...

work page

[17] [17]

customer_id

**Constraints**: - The generated natural language query and its ambiguous variant should correspond to the same underlying SQL query. The aim of the queries should not change. - The generated natural language query should be written in a human-like style and should not use the exact column names format found in the SQL query. Use the provided examples as ...

work page

[18] [18]

Monthly Orders

**Format**: - Strictly generate response requested in the output instructions. - DO NOT GENERATE any other supplementary explanation or description. Strictly generate output as mentioned in the output format ### **Examples** Example 1: Given SQL Query: 'SELECT count("Monthly Orders"), "Employee ID" FROM my_data GROUP BY "Employee ID" ORDER BY Employee Nam...

[19]

**Consistency Assessment**: The ambiguous query should be compared with the original SQL query for assessment.
- The ambiguous term(s) and the corresponding true column(s) that resolve the ambiguity in the given query are provided.
- When the ambiguous term(s) is replaced by the true column(s), the resulting query should be consistent with the underlying SQL que...

[20]

**Response Generation**: After replacing the ambiguous term(s) with the true column(s) in the given ambiguous query, if the resulting query corresponds to the underlying SQL query, then the outcome is 'valid'. Otherwise, it is 'invalid'.
- The 'valid' assessment must address the two consistency conditions.
- Provide a reason for the assessment as well
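The 'valid' / 'invalid' outcome above is produced by an LLM judge. As a rough illustration, a rule-based pre-check could test two necessary conditions before calling the judge: every ambiguous term occurs in the ambiguous query, and every true column is used by the SQL. The sketch below assumes exactly that. `precheck_consistency` is a hypothetical helper, the two checks are not the prompt's own (truncated) consistency conditions, and the SQL string is shortened from Example 1 of the generator prompt with its ORDER BY clause left out.

```python
import re


def precheck_consistency(ambiguous_query, ambiguous_terms, true_columns, sql):
    """Cheap necessary conditions for a 'valid' assessment (hypothetical helper)."""
    if len(ambiguous_terms) != len(true_columns):
        return {"outcome": "invalid",
                "reason": "Every ambiguous term needs exactly one true column."}
    for term, column in zip(ambiguous_terms, true_columns):
        # The ambiguous term has to occur in the query it is said to obscure.
        pattern = r"\b" + re.escape(term) + r"\b"
        if not re.search(pattern, ambiguous_query, re.IGNORECASE):
            return {"outcome": "invalid",
                    "reason": f"Term '{term}' does not occur in the ambiguous query."}
        # The true column has to be used by the underlying SQL query.
        if column.lower() not in sql.lower():
            return {"outcome": "invalid",
                    "reason": f"Column '{column}' is not used by the SQL query."}
    return {"outcome": "valid",
            "reason": "All terms occur in the query and all true columns occur in the SQL."}


# Shortened SQL (assumption: ORDER BY clause omitted).
sql = 'SELECT count("Monthly Orders"), "Employee ID" FROM my_data GROUP BY "Employee ID"'
query = "show count of orders by employee sorted on employee name"

ok = precheck_consistency(query, ["orders", "employee"], ["Monthly Orders", "Employee ID"], sql)
bad = precheck_consistency(query, ["orders"], ["Order Date"], sql)
print(ok["outcome"], bad["outcome"])
```

Passing this pre-check does not establish that the resolved query and the SQL mean the same thing. It only filters out pairs that cannot be consistent.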

[21]

**Constraints**:
- Base your reason strictly on the provided instructions. The response should be specific to the given queries, ambiguous term and true column information.
- Do not use any external knowledge or make assumptions.
- Refrain from providing a SQL-based response; provide a natural language response instead

[22]

**Response Format**: Strictly generate the response using the given output format. DO NOT generate any other content or text.

### **Examples**

Example 1:

Ambiguous Natural Language Query: 'show count of orders by employee sorted on employee name'

Ambiguous terms: ['orders','employee']

True Columns: ['Monthly Orders','Employee ID']

Underlying SQL Query: SE...

[23]

**Lexical Column Ambiguity**: The query includes tokens or terms that refer to a column, but there is no exact match with any column in the given schema. However, the term or token is lexically similar to two or more columns of the schema, making it unclear which column the query is referring to.
- Lexical similarity means that the term shares similar nam...

[24]

**Semantic Column Ambiguity**: The query includes tokens or terms that refer to a column, but there is no exact match with any column in the schema. However, the term or token is semantically similar to two or more columns of the schema, making it unclear which column the query is referring to.
- Semantic similarity means that the term represents highe...

[25]

**Column Confusion**: The query contains tokens or terms that cannot be mapped to any columns in the schema. The term or token is neither lexically nor semantically similar to any columns in the schema. The column does not exist in the schema

[26]

**Unambiguous**: The query contains tokens or terms referring to a column, with each term being mapped to a single column in the schema. Unambiguous queries can be translated to a SQL query without any human intervention

### **Instructions**:

You are given a natural language query that may belong to one of the given categories. Your task is to identify term...
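The four categories defined above can be pictured as a decision rule over candidate columns. The sketch below is a loose approximation and not the paper's classifier, which is an LLM prompt. It assumes that lexical similarity can be approximated by token containment and that semantic similarity is supplied by a hand-written hint table. `classify_term`, `normalize_tokens`, `semantic_hints` and the 'Yearly Orders', 'Bug Count' and 'Complaint Count' columns are all hypothetical.

```python
import re


def normalize_tokens(name: str) -> frozenset[str]:
    """Lowercase, split on spaces/underscores, strip a trailing plural 's'."""
    raw = re.split(r"[\s_]+", name.strip().lower())
    return frozenset(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in raw if t)


def classify_term(term: str, columns: list[str], semantic_hints: dict[str, list[str]]) -> str:
    term_tokens = normalize_tokens(term)
    if not term_tokens:
        return "Column Confusion"
    # An exact match (up to case, plural and separator) is unambiguous.
    if any(normalize_tokens(col) == term_tokens for col in columns):
        return "Unambiguous"
    # Assumed proxy for lexical similarity: all term tokens appear in the column name.
    lexical = [col for col in columns if term_tokens <= normalize_tokens(col)]
    # Assumed proxy for semantic similarity: a hand-written hint table.
    semantic = [col for col in semantic_hints.get(term.lower(), []) if col in columns]
    candidates = set(lexical) | set(semantic)
    if not candidates:
        return "Column Confusion"
    if len(candidates) == 1:
        return "Unambiguous"
    # Mixed lexical/semantic candidates fall through to the semantic label.
    return "Lexical Column Ambiguity" if len(lexical) >= 2 else "Semantic Column Ambiguity"


columns = ["Monthly Orders", "Yearly Orders", "Employee ID", "Bug Count", "Complaint Count"]
hints = {"issue": ["Bug Count", "Complaint Count"]}

for term in ["orders", "issue", "employee_id", "weather"]:
    print(term, "->", classify_term(term, columns, hints))
```

Treating a term with a single candidate column as unambiguous follows the definition above, under which each term of an unambiguous query maps to a single column.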

[27]

For example: 'Customer Count', 'customer count', 'CUSTOMER COUNT', 'customer_count' and 'customer counts' represent the same entity

Analyse the natural language query to identify the terms that refer to a column in the schema, taking into account the given **Type of Categories**
- The schema can contain multiple tables; compare the terms against each table
- Ignore case (uppercase or lowercase), singular/plural forms and minor variations during the comparison. For example: -'Customer Count',...

[28]

Identify the list of terms that refer to a column but do not exactly match any column names in the schema.
- There could be more than one term whose exact match does not exist
- If an exact match with the schema columns exists for the identified terms, then use 'None Found' for the response
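The comparison rule (ignore case, singular/plural and minor variations) and the 'None Found' convention can be sketched as follows. This is one possible normalization, assumed for illustration. `canonical` and `terms_without_exact_match` are hypothetical names, the plural handling is a naive trailing-'s' strip, and the `my_data` schema with an 'Employee Name' column is made up to match the running example.

```python
import re


def canonical(name: str) -> str:
    """Lowercase, unify spaces/underscores, strip a trailing plural 's'."""
    words = re.split(r"[\s_]+", name.strip().lower())
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words if w)


def terms_without_exact_match(terms, schema):
    """Return terms matching no column in any table, or 'None Found'."""
    known = {canonical(col) for cols in schema.values() for col in cols}
    missing = [t for t in terms if canonical(t) not in known]
    return missing if missing else "None Found"


variants = ["Customer Count", "customer count", "CUSTOMER COUNT",
            "customer_count", "customer counts"]
# Hypothetical single-table schema.
schema = {"my_data": ["Monthly Orders", "Employee ID", "Employee Name"]}

print({canonical(v) for v in variants})
print(terms_without_exact_match(["orders", "employee name"], schema))
print(terms_without_exact_match(["EMPLOYEE_ID"], schema))
```

All five variants collapse to one canonical form, so they count as an exact match for each other. 'orders' is reported because no column is named just that.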

[29]

Provide a reason for the assessment. The reason should be specific to the given schema.

**Format**: Strictly generate the response using the given output format, recording terms and reason separately. DO NOT generate any other content or text.

### **Inputs**

Database Schema: {db_schema}

Natural Language Question: {ac_query}

### **Outputs**

{format_instruct...
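The placeholders in the Inputs/Outputs block suggest that the prompt is a template filled in at run time, with the response recording terms and reason as separate fields. The sketch below shows one way this could look. The third placeholder is truncated in the text above and is assumed here to be the same `{format_instructions}` that appears in the earlier prompt. The JSON response shape with `terms` and `reason` keys, and the names `render_prompt`, `parse_response` and `TermAssessment`, are illustrative assumptions, not the paper's output format.

```python
import json
from dataclasses import dataclass

# Tail of the prompt; the last placeholder name is an assumption.
PROMPT_INPUTS = (
    "### **Inputs**\n"
    "Database Schema: {db_schema}\n"
    "Natural Language Question: {ac_query}\n"
    "### **Outputs**\n"
    "{format_instructions}"
)


@dataclass
class TermAssessment:
    terms: list[str]  # empty when the model answers 'None Found'
    reason: str


def render_prompt(db_schema: str, ac_query: str, format_instructions: str) -> str:
    return PROMPT_INPUTS.format(
        db_schema=db_schema,
        ac_query=ac_query,
        format_instructions=format_instructions,
    )


def parse_response(raw: str) -> TermAssessment:
    """Parse an assumed JSON reply with separate 'terms' and 'reason' fields."""
    data = json.loads(raw)
    terms = data["terms"]
    if terms == "None Found":
        terms = []
    return TermAssessment(terms=list(terms), reason=str(data["reason"]))


prompt = render_prompt(
    'my_data("Monthly Orders", "Employee ID")',
    "show count of orders by employee",
    'Return JSON with the keys "terms" and "reason".',
)
result = parse_response(
    '{"terms": ["orders", "employee"], "reason": "No column is named exactly orders or employee."}'
)
print(prompt)
print(result)
```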
