arxiv: 2605.19246 · v1 · pith:2LNSWDY3new · submitted 2026-05-19 · 💻 cs.DB

Example-Driven Intent Synthesis for Constrained Data Bundle Retrieval: Focused Text Snippet Extraction and Beyond

Pith reviewed 2026-05-20 03:00 UTC · model grok-4.3

classification 💻 cs.DB
keywords bundle retrieval · package queries · example-driven intent · aggregate constraints · constraint relaxation · text snippet extraction · data bundles · combinatorial retrieval

The pith

Users provide example bundles to let the system infer aggregate constraints and synthesize package queries for retrieving matching groups of items.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bundle retrieval requires finding sets of items that together satisfy multi-dimensional constraints, a combinatorial task that is NP-hard in general and difficult for users to express directly as queries. The paper shows that providing a few example bundles lets the system deduce implicit aggregate constraints, such as bounds on sums or averages, that define the desired intent. These constraints are turned into package queries, and if they yield no results on the target data, the bounds are adjusted minimally in a data-aware way to restore feasibility while staying close to the examples. The approach is demonstrated on focused text snippet extraction and tested on real datasets plus a user study. This matters because it replaces the need for precise query writing with an example-based interface that works even under changes in the underlying data distribution.
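As a rough illustration of the inference step described above, the following Python sketch derives count, sum, and average bounds as the min-max envelope over the example bundles. The envelope rule, the function name `infer_bounds`, and the `topic_score` attribute are assumptions made for illustration only; the paper's actual synthesis procedure is not reproduced here.

```python
def infer_bounds(example_bundles, attribute):
    """Return (lower, upper) bounds on COUNT, SUM and AVG of `attribute`.

    Assumption: bounds are the min-max envelope over the examples.
    This is a hypothetical rule, not the paper's algorithm.
    """
    if not example_bundles or any(len(b) == 0 for b in example_bundles):
        raise ValueError("need at least one non-empty example bundle")
    counts = [len(b) for b in example_bundles]
    sums = [sum(item[attribute] for item in b) for b in example_bundles]
    avgs = [s / c for s, c in zip(sums, counts)]
    return {
        "count": (min(counts), max(counts)),
        "sum": (min(sums), max(sums)),
        "avg": (min(avgs), max(avgs)),
    }


# Two hypothetical example bundles of sentences with a topic score each.
examples = [
    [{"topic_score": 0.75}, {"topic_score": 0.25}],
    [{"topic_score": 0.5}, {"topic_score": 0.5}, {"topic_score": 1.0}],
]
bounds = infer_bounds(examples, "topic_score")
print(bounds["count"], bounds["sum"])
```

With only two examples the envelope is necessarily coarse; a tighter or looser rule would be an equally plausible design choice under the same assumptions.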

Core claim

Ex2Bundle enables users to specify their intent through example bundles and automatically synthesizes package queries that capture the intent implicit in those example bundles via aggregate constraints, while addressing infeasibility through data-aware constraint relaxation. The framework is instantiated for focused text snippet extraction by example, and experiments confirm that it improves usability and returns intent-aligned bundles even under distributional shifts of the target database.
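To make the step from aggregate constraints to a package query more concrete, here is a hedged sketch that renders bounds as PaQL-style text. The clause layout follows the general shape of PaQL in the package-query literature; the relation `Sentences`, the attributes, the objective, and the helper `build_paql` are hypothetical, and a real engine may expect different syntax.

```python
def build_paql(relation, bounds, objective):
    """Render aggregate bounds as a PaQL-style query string.

    `bounds` maps (aggregate, attribute) -> (lower, upper). The attribute
    is ignored for COUNT, which is taken over the whole package.
    All names here are illustrative assumptions.
    """
    clauses = []
    for (agg, attr), (lo, hi) in bounds.items():
        target = "P.*" if agg == "COUNT" else f"P.{attr}"
        clauses.append(f"{agg}({target}) BETWEEN {lo} AND {hi}")
    return "\n".join([
        "SELECT PACKAGE(R) AS P",
        # REPEAT 0: each tuple may appear at most once in the package.
        f"FROM {relation} R REPEAT 0",
        "SUCH THAT " + " AND ".join(clauses),
        objective,
    ])


query = build_paql(
    "Sentences",
    {("COUNT", None): (2, 3), ("SUM", "topic_score"): (1.0, 2.0)},
    "MAXIMIZE SUM(P.quality)",
)
print(query)
```

Keeping the objective as a separate argument mirrors the idea that the quality function is fixed ahead of time while the bounds come from the user's examples.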

What carries the argument

Example-driven synthesis of aggregate constraints from user-provided bundles, which are then used to form package queries with optional data-aware relaxation to handle empty results.
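The relaxation idea can be sketched on a toy scale as follows. This is a minimal illustration, assuming a single SUM constraint and a fixed bundle size, and it uses exhaustive enumeration, which is only viable for tiny tables. The function `relax_sum_bounds` and its "widen to the nearest achievable sum" rule are assumptions, not the paper's data-aware method.

```python
from itertools import combinations


def relax_sum_bounds(values, size, lo, hi):
    """Widen [lo, hi] just enough that some `size`-subset sums into it.

    Hypothetical toy rule: if no subset is feasible, move the violated
    bound to the achievable sum closest to the original interval.
    """
    achievable = sorted({sum(c) for c in combinations(values, size)})
    if not achievable:
        raise ValueError("no bundle of the requested size exists")
    if any(lo <= s <= hi for s in achievable):
        return lo, hi  # already feasible, nothing to relax
    # Distance of an infeasible sum from the interval [lo, hi].
    nearest = min(achievable, key=lambda s: lo - s if s < lo else s - hi)
    return min(lo, nearest), max(hi, nearest)


# Sums of pairs from [1, 2, 3, 4] range over 3..7, so [10, 12] is infeasible.
print(relax_sum_bounds([1, 2, 3, 4], size=2, lo=10, hi=12))
```

Only one bound moves, and only as far as the data requires, which is one simple way to read "minimal" relaxation; other distance measures would give different results.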

If this is right

  • Users can retrieve intent-aligned bundles without manually writing or tuning complex package queries.
  • Infeasible aggregate constraints are resolved automatically while keeping results close to the provided examples.
  • The same mechanism supports applications such as focused text snippet extraction and recommendation bundles.
  • Results remain aligned with user intent even when the target data distribution differs from the examples.
  • The framework reduces the user effort needed for combinatorial retrieval tasks across databases and summarization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The synthesis step could be made interactive so that users add or remove examples to iteratively tighten or loosen the inferred constraints.
  • The data-aware relaxation method might transfer to other infeasible query settings in database systems that involve aggregate conditions.
  • In settings with very few examples the inferred constraints could be validated or augmented by sampling additional bundles from the database itself.

Load-bearing premise

That aggregate constraints inferred from a small number of user-provided example bundles will be both representative of the user's true intent and amenable to minimal relaxation that preserves alignment when the original constraints are infeasible over the target data.

What would settle it

A controlled user study or experiment in which the bundles returned after relaxation are judged by participants as no longer matching the intent shown in the original examples, or where relaxation produces empty results or large deviations under a known distributional shift.

Figures

Figures reproduced from arXiv: 2605.19246 by Alexandra Meliou, Anna Fariha, Kuangfei Long, Mahmood Jasim, Matteo Brucato, Peter J. Haas, Whanhee Cho.

Figure 5. Ex2Bundle workflow: before end-user interaction, (0) a domain expert defines a quality function, which Ex2Bundle encodes as the objective. During use, (1) the user provides example bundles, from which (2) Ex2Bundle synthesizes initial constraint bounds and (3) relaxes them to ensure feasibility. A PaQL query is (4) formed using these bounds and the objective and (5) executed to retrieve a result bundle. Fo…

Figure 7. Slider for Demographics. Users can adjust topic emphasis from −100 (very little) to +100 (a lot). Each slider position maps to specific constraint bounds; the neutral position (0) corresponds to the feasible bounds synthesized (and relaxed) from user examples. […] users directly edit the upper and lower bounds of the synthesized constraints. While this offers full transparency and suits experts, it reduces usa…

Figure 9. For focused text snippet extraction on the …

Figure 12. Relaxation Analysis for Ex2Bundle across varying size of examples. The number of relaxations triggered increases with the number of example sentences. (Plot labels recovered from extraction: Time (sec), Example Size; Single-Topic, Random, Multi-Topic.)

Figure 13. Average learning (left), retrieval (center), and total times …
Abstract

Selecting a bundle of items that collectively satisfies constraints is a fundamental task across databases, recommender systems, and text summarization. Unlike traditional retrieval that returns individual or top-k items, bundle retrieval is inherently combinatorial and, in general, NP-hard. Although package queries can efficiently retrieve bundles given a well-formed query, two key user-centric challenges remain: (1) expressing and tuning multi-dimensional bundle intent through a user-friendly interface, and (2) ensuring feasibility when the query yields empty results. We introduce Ex2Bundle, an Example-driven Bundle retrieval framework that enables users to specify their intent through example bundles and automatically synthesizes package queries that capture the intent implicit in those example bundles via aggregate constraints. Ex2Bundle also addresses a challenge unique to bundle retrieval: when inferred aggregate constraints are infeasible over the target data, our data-aware constraint relaxation minimally adjusts the constraint bounds while preserving alignment with user intent. We instantiate a specific application of focused text snippet extraction by example to demonstrate the efficacy of the Ex2Bundle framework. Extensive experiments over real-world datasets and a user study demonstrate that Ex2Bundle improves usability and consistently returns intent-aligned bundles even under distributional shifts of the target database.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Ex2Bundle, an example-driven framework for bundle retrieval in databases and related domains. Users specify intent via example bundles; the system infers aggregate constraints (sum, count, average bounds) to synthesize package queries. A data-aware relaxation step adjusts infeasible constraints while aiming to preserve intent alignment. The framework is instantiated for focused text snippet extraction, with claims of improved usability, intent-aligned results, and robustness under distributional shifts supported by experiments on real-world datasets and a user study.

Significance. If the central claims hold, the work offers a practical advance in user-centric combinatorial retrieval by replacing explicit multi-dimensional query writing with example-based specification. The data-aware relaxation mechanism addresses a common pain point in package queries. Credit is due for the concrete application to text snippet extraction and the inclusion of a user study alongside quantitative experiments.

major comments (2)
  1. [§4] §4 (Constraint Synthesis): the procedure for inferring aggregate constraints from a small set of example bundles is described at a high level but lacks an explicit algorithm, optimization formulation, or pseudocode. This is load-bearing for the claim that the synthesized constraints are representative of latent user intent; without it, reproducibility and stability under example variation cannot be assessed.
  2. [§6.2, Table 2] §6.2 and Table 2 (Distributional shift experiments): the reported robustness to distributional shifts is presented as an empirical outcome, yet no sensitivity analysis quantifies how constraint bounds or retrieved bundles change when the provided examples are perturbed or replaced by equally plausible alternatives. This directly affects the weakest assumption that small example sets yield stable, intent-aligned constraints.
minor comments (2)
  1. [§3] Notation for aggregate constraint templates (e.g., how bounds are extracted from examples) is introduced without a dedicated table or running example that walks through the full pipeline from bundle to query.
  2. [§7] The user study description would benefit from explicit sample size, task design, and statistical significance tests for the usability claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and describe the revisions planned for the next version of the manuscript.

Point-by-point responses
  1. Referee: [§4] §4 (Constraint Synthesis): the procedure for inferring aggregate constraints from a small set of example bundles is described at a high level but lacks an explicit algorithm, optimization formulation, or pseudocode. This is load-bearing for the claim that the synthesized constraints are representative of latent user intent; without it, reproducibility and stability under example variation cannot be assessed.

    Authors: We agree that an explicit formulation is necessary for reproducibility. In the revised manuscript we will augment Section 4 with a formal optimization problem that computes aggregate constraint bounds (sum, count, and average) from the provided example bundles, together with pseudocode for the inference procedure. This addition will make the mapping from examples to constraints fully transparent and allow direct assessment of stability under example variation. revision: yes

  2. Referee: [§6.2, Table 2] §6.2 and Table 2 (Distributional shift experiments): the reported robustness to distributional shifts is presented as an empirical outcome, yet no sensitivity analysis quantifies how constraint bounds or retrieved bundles change when the provided examples are perturbed or replaced by equally plausible alternatives. This directly affects the weakest assumption that small example sets yield stable, intent-aligned constraints.

    Authors: We acknowledge that a dedicated sensitivity analysis would strengthen the robustness claim. We will add new experiments that systematically perturb the example bundles (by replacing individual examples with plausible alternatives or introducing small controlled variations) and measure the resulting variation in synthesized bounds and retrieved bundles. These results will be reported in an expanded Section 6.2 or an accompanying appendix. revision: yes
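One simple form such a sensitivity analysis could take is a leave-one-out check on the inferred bounds. The sketch below is an assumption-laden illustration, not the authors' planned experiment: it represents each bundle as a list of numeric attribute values, uses a min-max rule for SUM bounds, and the helper names `sum_bounds` and `leave_one_out_spread` are hypothetical.

```python
def sum_bounds(bundles):
    """Min-max envelope of per-bundle sums (assumed inference rule)."""
    sums = [sum(b) for b in bundles]
    return min(sums), max(sums)


def leave_one_out_spread(bundles):
    """Largest shift of the lower and upper SUM bound when one example is dropped."""
    if len(bundles) < 2:
        raise ValueError("need at least two example bundles")
    base_lo, base_hi = sum_bounds(bundles)
    lo_shift = hi_shift = 0
    for i in range(len(bundles)):
        rest = bundles[:i] + bundles[i + 1:]  # all examples except the i-th
        lo, hi = sum_bounds(rest)
        lo_shift = max(lo_shift, abs(lo - base_lo))
        hi_shift = max(hi_shift, abs(hi - base_hi))
    return lo_shift, hi_shift


# Three hypothetical example bundles with per-bundle sums 7, 10 and 11.
print(leave_one_out_spread([[3, 4], [5, 5], [2, 9]]))
```

A large shift relative to the width of the original bounds would suggest that a single example dominates the inferred constraint, which is the kind of instability the referee asks about.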

Circularity Check

0 steps flagged

No significant circularity; derivation chain is self-contained without reductions to inputs by construction

Full rationale

The paper introduces Ex2Bundle as a new synthesis framework that infers aggregate constraints from user examples and applies data-aware relaxation for infeasibility. No equations, derivations, or formal proofs are referenced in the provided abstract or description that would reduce any prediction or result to fitted parameters or self-referential definitions. Claims rest on the novelty of the example-driven approach and empirical validation via experiments and user study, with no load-bearing self-citations or ansatzes imported from prior author work. The inference of constraints from examples is presented as a core algorithmic contribution rather than a tautological fit or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the framework rests on the assumption that user intent can be captured via aggregate constraints derived from examples and that minimal relaxation preserves intent. No explicit free parameters are detailed, and the only newly introduced entity is the Ex2Bundle framework itself.

axioms (1)
  • domain assumption: Bundle retrieval is inherently combinatorial and NP-hard in general.
    Stated directly in the abstract as background for the problem.
invented entities (1)
  • Ex2Bundle framework (no independent evidence)
    purpose: To synthesize package queries from example bundles and relax infeasible constraints
    Newly introduced system described in the abstract; no independent evidence provided outside the paper.

pith-pipeline@v0.9.0 · 5765 in / 1335 out tokens · 51349 ms · 2026-05-20T03:00:44.401756+00:00 · methodology
