Feasible Plan Generation with Ambiguity-Boundedness in Cross-Model Query Processing

Amarnath Gupta; Subhasis Dasgupta

arxiv: 2605.19197 · v1 · pith:AJM4KNW5new · submitted 2026-05-18 · 💻 cs.DB

Feasible Plan Generation with Ambiguity-Boundedness in Cross-Model Query Processing

Subhasis Dasgupta , Amarnath Gupta This is my paper

Pith reviewed 2026-05-20 06:53 UTC · model grok-4.3

classification 💻 cs.DB

keywords Packed Plan ForestFeasibility ConstraintsIntermediate Logical PlansCross-Model Query ProcessingNatural Language InterfacesDatabase Query PlanningAmbiguity Handling

0 comments

The pith

The Packed Plan Forest encodes all feasible intermediate logical plans for natural language database queries while pruning infeasible ones early.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Natural language questions to databases across different models often produce many ambiguous candidate plans, many of which fail due to type errors, missing connections, or engine rules. The paper defines local feasibility constraints to spot these problems without full enumeration and introduces the Packed Plan Forest as a compact structure to hold every remaining valid plan together. This structure stays polynomial in size when operator arity and annotation sets are bounded, and experiments show it represents exponentially many plans at low cost. A reader would care because the approach makes natural language access to heterogeneous databases practical by avoiding the explosion of useless plans.

Core claim

We address this challenge with feasibility constraints for detecting local inconsistencies and introduce the Packed Plan Forest (PPF) a polynomially bounded structure that compactly encodes all feasible ILPs while pruning infeasible ones early. Extending packed parse forest ideas to multi-model settings, PPF supports efficient feasibility analysis through annotated operators. Formal results show polynomial size under bounded arity and annotation vocabularies, and experiments confirm that PPFs capture exponentially many ILPs with minimal overhead, establishing a scalable foundation for NL-to-DB query planning across heterogeneous systems.

What carries the argument

The Packed Plan Forest (PPF), a polynomially bounded structure extending packed parse forests to multi-model settings that uses annotated operators and local feasibility constraints to encode all feasible intermediate logical plans while pruning infeasible ones early.

If this is right

The PPF size remains polynomial when operator arity and annotation vocabularies are bounded.
Experiments demonstrate that PPFs represent exponentially many ILPs while incurring only minimal overhead.
The structure supplies a scalable base for natural language to database query planning in systems that combine multiple data models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar packing could be tested on other high-ambiguity planning tasks such as multi-engine workflow generation.
The local-to-global pruning assumption could be checked by comparing PPF output against complete global optimization on larger schemas.
The representation might be extended to update incrementally when new engine constraints or schema changes appear.

Load-bearing premise

Local feasibility constraints based on type mismatches, missing bindings, and engine-specific rules are sufficient to prune infeasible ILPs without missing globally feasible plans or requiring full enumeration.

What would settle it

An exhaustive enumeration of all intermediate logical plans on a small heterogeneous schema that reveals either a globally feasible plan absent from the PPF or a PPF whose size exceeds the proven polynomial bound when arity or annotation vocabulary is increased.

Figures

Figures reproduced from arXiv: 2605.19197 by Amarnath Gupta, Subhasis Dasgupta.

**Figure 1.** Figure 1: Performance metrics across multiple parameters. Each subplot shows pairwise correlation (r) between key variables. (a) pruned vs. feasible nodes, (b) runtime vs. feasible nodes, (c) throughput vs. total unique nodes, (d) runtime vs. total unique nodes, (e) feasible vs. total unique nodes, (f) runtime vs. total nodes. algorithm can (a) reduce redundancy through node packing, (b) prune infeasible branches ea… view at source ↗

read the original abstract

Natural language (NL) interfaces to databases broaden access to heterogeneous data but often yield many ambiguous intermediate logical plans (ILPs) due to uncertain operator scope and predicate semantics. Many candidates are infeasible because of type mismatches, missing bindings, or engine-specific constraints. We address this challenge with \emph{feasibility constraints} for detecting local inconsistencies and introduce the Packed Plan Forest (PPF) a polynomially bounded structure that compactly encodes all feasible ILPs while pruning infeasible ones early. Extending packed parse forest ideas to multi-model settings, PPF supports efficient feasibility analysis through annotated operators. Formal results show polynomial size under bounded arity and annotation vocabularies, and experiments confirm that PPFs capture exponentially many ILPs with minimal overhead, establishing a scalable foundation for NL-to-DB query planning across heterogeneous systems

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces a Packed Plan Forest to compactly encode feasible plans from ambiguous NL queries in multi-model databases using local feasibility annotations.

read the letter

The main point is that this work takes packed parse forest ideas and adapts them to cross-model query planning by building a Packed Plan Forest that represents many intermediate logical plans while dropping infeasible ones early through local checks on types, bindings, and engine rules. The formal claim is that the structure stays polynomial sized when arity and annotation vocabularies are bounded, and the experiments indicate it can stand in for exponentially many plans with low overhead. That directly targets a usability bottleneck in natural language interfaces to heterogeneous data systems. The extension to multi-model settings with annotated operators is the concrete addition here. The local pruning approach is a reasonable engineering move for avoiding full enumeration. The soft spot is the completeness of those local constraints for global feasibility. In cross-model cases, operator interactions across engines can create feasibility that only appears after combining annotations, and nothing in the high-level description shows the rules are closed under that composition. If the full paper has a proof that every globally feasible plan satisfies the local checks without exception, the polynomial guarantee holds; otherwise the pruning could miss valid plans. The abstract leaves the derivation and baseline details thin, so the soundness rests on unshown steps. This is for people working on NL-to-DB systems and multi-model optimization. A reader focused on query planning under ambiguity would get value from the structure and the feasibility annotation idea. It deserves peer review to check the formal bounds and whether the experiments properly test against cases where local and global feasibility diverge.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Packed Plan Forest (PPF) as a compact, polynomially bounded structure for representing all feasible intermediate logical plans (ILPs) arising from ambiguous natural-language queries over heterogeneous data models. Feasibility constraints based on local checks (type mismatches, missing bindings, engine-specific rules) are used to prune infeasible candidates early; the PPF is claimed to extend packed-parse-forest techniques to multi-model settings. Formal results assert polynomial size under bounded arity and annotation vocabularies, while experiments are said to show that PPFs encode exponentially many ILPs with low overhead, providing a scalable foundation for NL-to-DB query planning across systems.

Significance. If the local constraints are shown to be complete for global feasibility and the size bounds are rigorously established, the PPF construction would offer a practical mechanism for managing plan ambiguity in cross-model environments. The structural innovation of packing feasible plans while supporting annotated-operator analysis could influence query optimizers for multi-model databases, provided the completeness argument holds.

major comments (2)

[§3 and Formal Results] The central claim that local feasibility constraints suffice to prune infeasible ILPs without missing any globally feasible ones is load-bearing for both the polynomial-size guarantee and the 'scalable foundation' conclusion. In §3 (Feasibility Constraints) and the formal-results section, the manuscript defines constraints via type mismatches, missing bindings, and engine-specific rules, but provides no argument or inductive proof that these local annotations are closed under cross-model operator composition. In heterogeneous settings, feasibility can emerge only after combining annotations from distinct engines; without such a completeness argument, the pruning step risks discarding valid global plans.
[§4 (Formal Results)] The polynomial-size theorem (stated in the abstract and §4) is asserted under 'bounded arity and annotation vocabularies,' yet the manuscript supplies neither the precise statement of the theorem, the bounding parameters, nor a proof sketch. Because the size bound is the primary formal contribution supporting scalability, the absence of this derivation prevents verification that the PPF remains polynomial when annotation vocabularies grow with additional engines.

minor comments (2)

[Experiments] The experimental section lacks descriptions of the datasets, the number of NL queries tested, the baseline planners compared, and the precise overhead metrics (e.g., memory vs. enumeration time). Adding these details would allow readers to assess whether the 'minimal overhead' claim is substantiated.
[§2] Notation for annotated operators and the PPF construction is introduced without a consolidated table of symbols; a small notation table would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive major comments. We address each point below and indicate the revisions we will incorporate to strengthen the formal arguments.

read point-by-point responses

Referee: [§3 and Formal Results] The central claim that local feasibility constraints suffice to prune infeasible ILPs without missing any globally feasible ones is load-bearing for both the polynomial-size guarantee and the 'scalable foundation' conclusion. In §3 (Feasibility Constraints) and the formal-results section, the manuscript defines constraints via type mismatches, missing bindings, and engine-specific rules, but provides no argument or inductive proof that these local annotations are closed under cross-model operator composition. In heterogeneous settings, feasibility can emerge only after combining annotations from distinct engines; without such a completeness argument, the pruning step risks discarding valid global plans.

Authors: We agree that an explicit completeness argument is required to justify that local pruning preserves all globally feasible plans. The manuscript defines the constraints locally in §3 but does not supply an inductive proof of closure under cross-model composition. In the revised version we will add a formal subsection to §4 that provides an inductive argument: base cases cover single operators via the listed local checks, and the inductive step shows that consistent annotations on sub-plans remain consistent after composition because type and binding constraints are independent of engine ordering in our model. revision: yes
Referee: [§4 (Formal Results)] The polynomial-size theorem (stated in the abstract and §4) is asserted under 'bounded arity and annotation vocabularies,' yet the manuscript supplies neither the precise statement of the theorem, the bounding parameters, nor a proof sketch. Because the size bound is the primary formal contribution supporting scalability, the absence of this derivation prevents verification that the PPF remains polynomial when annotation vocabularies grow with additional engines.

Authors: We concur that the theorem requires a precise statement and proof sketch for verifiability. The manuscript states the result informally under bounded arity k and fixed annotation vocabulary size |V|. We will revise §4 to include the formal theorem: the PPF has size at most O(n · |V|^k) for a query with n candidate operators, together with a short proof sketch based on dynamic programming over the packed forest nodes. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation introduces independent structure and formal bounds

full rationale

The paper defines feasibility constraints on local inconsistencies (type mismatches, missing bindings, engine rules) and introduces the Packed Plan Forest (PPF) as a polynomially bounded encoding that prunes infeasible ILPs while capturing feasible ones. Formal results on polynomial size are stated under explicit bounds on arity and annotation vocabularies, with experiments confirming coverage of exponentially many ILPs. No quoted step reduces a claimed result to a fitted parameter, self-definition, or self-citation chain; the construction and size proof are presented as independent of prior fitted quantities. The central claim therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The polynomial-size claim rests on the domain assumption of bounded arity and annotation vocabularies; PPF is introduced as a new compact encoding structure without independent evidence beyond the paper's formal results.

axioms (1)

domain assumption Arity and annotation vocabularies are bounded
Invoked to establish polynomial size of the Packed Plan Forest under the formal results.

invented entities (1)

Packed Plan Forest (PPF) no independent evidence
purpose: Compactly encode all feasible intermediate logical plans while pruning infeasible ones early
New data structure proposed to support feasibility analysis through annotated operators in multi-model settings

pith-pipeline@v0.9.0 · 5668 in / 1172 out tokens · 24983 ms · 2026-05-20T06:53:48.545338+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The Packed Plan Forest (PPF) is a polynomially bounded structure that compactly encodes all feasible ILPs while pruning infeasible ones early... Formal results show polynomial size under bounded arity and annotation vocabularies
IndisputableMonolith/Foundation/Atomicity.lean atomic_tick echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

feasibility constraints for detecting local inconsistencies... bottom-up labeling algorithm checks feasibility... infeasibility certificates

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1]

VLDB Journal28(5), 709–751 (2019)

Affolter, K., Stockinger, K., Bernstein, A.: A survey of natural lan- guage interfaces to databases. VLDB Journal28(5), 709–751 (2019). https://doi.org/10.1007/s00778-019-00557-9

work page doi:10.1007/s00778-019-00557-9 2019
[2]

In: International Conference on Coop- erative Information Systems

Barret, N., Ebel, S., Galizzi, T., Manolescu, I., Mohanty, M.: User-friendly explo- ration of highly heterogeneous data lakes. In: International Conference on Coop- erative Information Systems. pp. 488–496. Springer (2023)

work page 2023
[3]

In: Proceedings of the 20th Interna- tional Conference on Extending Database Technology (EDBT)

Benedikt, M., Konstantinou, N., Ley-Wild, R., Murlak, F., Vrgoc, D.: Querying with access patterns and integrity constraints. In: Proceedings of the 20th Interna- tional Conference on Extending Database Technology (EDBT). pp. 231–242 (2017)

work page 2017
[4]

arXiv preprint arXiv:1909.01822 (2019)

Dar, H.S., Lali, M.I., Din, M.U., Malik, K.M., Bukhari, S.A.C.: Frameworks for querying databases using natural language: a literature review. arXiv preprint arXiv:1909.01822 (2019)

work page arXiv 1909
[5]

In: 2016 IEEE International Conference on Big Data (Big Data)

Dasgupta, S., Coakley, K., Gupta, A.: Analytics-driven data ingestion and deriva- tion in the awesome polystore. In: 2016 IEEE International Conference on Big Data (Big Data). pp. 2555–2564. IEEE (2016)

work page 2016
[6]

Proceedings of ICDE pp

Graefe, G., McKenna, W.J.: The volcano optimizer generator: Extensibility and efficient search. Proceedings of ICDE pp. 209–218 (1993)

work page 1993
[7]

In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguis- tics (ACL)

Guo, J., Zhan, Z., Gao, Y., Xiao, Y., Lou, J.G., Liu, T., Zhang, D.: To- wards complex text-to-SQL in cross-domain databases. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguis- tics (ACL). pp. 4524–4535. ACL (2019). https://doi.org/10.18653/v1/P19-1443, https://aclanthology.org/P19-1443

work page doi:10.18653/v1/p19-1443 2019
[8]

VLDB Journal10(4), 270– 294 (2001)

Halevy, A.Y.: Answering queries using views: A survey. VLDB Journal10(4), 270– 294 (2001)

work page 2001
[9]

In: Proceedings of the 2016 International Conference on Management of Data

Kolev, B., Bondiombouy, C., Valduriez, P., Jiménez-Peris, R., Pau, R., Pereira, J.: The cloudmdsql multistore system. In: Proceedings of the 2016 International Conference on Management of Data. pp. 2113–2116 (2016)

work page 2016
[10]

VLDB Journal10(4), 270–294 (1996)

Levy, A.Y., Rajaraman, A., Ordille, J.J.: Answering queries using views: A survey. VLDB Journal10(4), 270–294 (1996)

work page 1996
[11]

Proceedings of the VLDB Endowment8(1), 73–84 (2014)

Li, F., Jagadish, H.V.: Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment8(1), 73–84 (2014). https://doi.org/10.14778/2735461.2735468

work page doi:10.14778/2735461.2735468 2014
[12]

In: Proceedings of the VLDB Endowment

Li, F., Jagadish, H.: Constructing natural language interfaces to databases. In: Proceedings of the VLDB Endowment. vol. 8, pp. 73–84 (2014)

work page 2014
[13]

Proceedings of the VLDB Endowment11(7), 819–831 (2018)

Stonebraker, M., Balazinska, M., Cetintemel, U., Cherniack, M., Zdonik, S.: Big- dawg: A polystore system for analytics on heterogeneous data. Proceedings of the VLDB Endowment11(7), 819–831 (2018)

work page 2018
[14]

In: Proceedings of the 44th ACM SIGPLAN Sympo- sium on Principles of Programming Languages (POPL)

Yaghmazadeh, N., Wang, Y., Dillig, I., Dillig, T.: SQLizer: Query synthesis from natural language. In: Proceedings of the 44th ACM SIGPLAN Sympo- sium on Principles of Programming Languages (POPL). pp. 63–76. ACM (2017). https://doi.org/10.1145/3009837.3009879

work page doi:10.1145/3009837.3009879 2017
[15]

In: Proceedings of POPL

Yaghmazadeh, N., Wang, Y., Dillig, I., Dillig, T.: Sqlizer: query synthesis from natural language. In: Proceedings of POPL. pp. 63–79 (2017)

work page 2017
[16]

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Informa- tion Retrieval

Yuan, Q., Yuan, Y., Wen, Z., Wang, H., Tang, S.: An effective framework for en- hancing query answering in a heterogeneous data lake. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Informa- tion Retrieval. pp. 770–780 (2023)

work page 2023
[17]

In: Proceedings of ACL (2017)

Zhong, V., Xiong, C., Socher, R.: Seq2sql: Generating structured queries from natural language using reinforcement learning. In: Proceedings of ACL (2017)

work page 2017

[1] [1]

VLDB Journal28(5), 709–751 (2019)

Affolter, K., Stockinger, K., Bernstein, A.: A survey of natural lan- guage interfaces to databases. VLDB Journal28(5), 709–751 (2019). https://doi.org/10.1007/s00778-019-00557-9

work page doi:10.1007/s00778-019-00557-9 2019

[2] [2]

In: International Conference on Coop- erative Information Systems

Barret, N., Ebel, S., Galizzi, T., Manolescu, I., Mohanty, M.: User-friendly explo- ration of highly heterogeneous data lakes. In: International Conference on Coop- erative Information Systems. pp. 488–496. Springer (2023)

work page 2023

[3] [3]

In: Proceedings of the 20th Interna- tional Conference on Extending Database Technology (EDBT)

Benedikt, M., Konstantinou, N., Ley-Wild, R., Murlak, F., Vrgoc, D.: Querying with access patterns and integrity constraints. In: Proceedings of the 20th Interna- tional Conference on Extending Database Technology (EDBT). pp. 231–242 (2017)

work page 2017

[4] [4]

arXiv preprint arXiv:1909.01822 (2019)

Dar, H.S., Lali, M.I., Din, M.U., Malik, K.M., Bukhari, S.A.C.: Frameworks for querying databases using natural language: a literature review. arXiv preprint arXiv:1909.01822 (2019)

work page arXiv 1909

[5] [5]

In: 2016 IEEE International Conference on Big Data (Big Data)

Dasgupta, S., Coakley, K., Gupta, A.: Analytics-driven data ingestion and deriva- tion in the awesome polystore. In: 2016 IEEE International Conference on Big Data (Big Data). pp. 2555–2564. IEEE (2016)

work page 2016

[6] [6]

Proceedings of ICDE pp

Graefe, G., McKenna, W.J.: The volcano optimizer generator: Extensibility and efficient search. Proceedings of ICDE pp. 209–218 (1993)

work page 1993

[7] [7]

In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguis- tics (ACL)

Guo, J., Zhan, Z., Gao, Y., Xiao, Y., Lou, J.G., Liu, T., Zhang, D.: To- wards complex text-to-SQL in cross-domain databases. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguis- tics (ACL). pp. 4524–4535. ACL (2019). https://doi.org/10.18653/v1/P19-1443, https://aclanthology.org/P19-1443

work page doi:10.18653/v1/p19-1443 2019

[8] [8]

VLDB Journal10(4), 270– 294 (2001)

Halevy, A.Y.: Answering queries using views: A survey. VLDB Journal10(4), 270– 294 (2001)

work page 2001

[9] [9]

In: Proceedings of the 2016 International Conference on Management of Data

Kolev, B., Bondiombouy, C., Valduriez, P., Jiménez-Peris, R., Pau, R., Pereira, J.: The cloudmdsql multistore system. In: Proceedings of the 2016 International Conference on Management of Data. pp. 2113–2116 (2016)

work page 2016

[10] [10]

VLDB Journal10(4), 270–294 (1996)

Levy, A.Y., Rajaraman, A., Ordille, J.J.: Answering queries using views: A survey. VLDB Journal10(4), 270–294 (1996)

work page 1996

[11] [11]

Proceedings of the VLDB Endowment8(1), 73–84 (2014)

Li, F., Jagadish, H.V.: Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment8(1), 73–84 (2014). https://doi.org/10.14778/2735461.2735468

work page doi:10.14778/2735461.2735468 2014

[12] [12]

In: Proceedings of the VLDB Endowment

Li, F., Jagadish, H.: Constructing natural language interfaces to databases. In: Proceedings of the VLDB Endowment. vol. 8, pp. 73–84 (2014)

work page 2014

[13] [13]

Proceedings of the VLDB Endowment11(7), 819–831 (2018)

Stonebraker, M., Balazinska, M., Cetintemel, U., Cherniack, M., Zdonik, S.: Big- dawg: A polystore system for analytics on heterogeneous data. Proceedings of the VLDB Endowment11(7), 819–831 (2018)

work page 2018

[14] [14]

In: Proceedings of the 44th ACM SIGPLAN Sympo- sium on Principles of Programming Languages (POPL)

Yaghmazadeh, N., Wang, Y., Dillig, I., Dillig, T.: SQLizer: Query synthesis from natural language. In: Proceedings of the 44th ACM SIGPLAN Sympo- sium on Principles of Programming Languages (POPL). pp. 63–76. ACM (2017). https://doi.org/10.1145/3009837.3009879

work page doi:10.1145/3009837.3009879 2017

[15] [15]

In: Proceedings of POPL

Yaghmazadeh, N., Wang, Y., Dillig, I., Dillig, T.: Sqlizer: query synthesis from natural language. In: Proceedings of POPL. pp. 63–79 (2017)

work page 2017

[16] [16]

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Informa- tion Retrieval

Yuan, Q., Yuan, Y., Wen, Z., Wang, H., Tang, S.: An effective framework for en- hancing query answering in a heterogeneous data lake. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Informa- tion Retrieval. pp. 770–780 (2023)

work page 2023

[17] [17]

In: Proceedings of ACL (2017)

Zhong, V., Xiong, C., Socher, R.: Seq2sql: Generating structured queries from natural language using reinforcement learning. In: Proceedings of ACL (2017)

work page 2017