Example-Driven Intent Synthesis for Constrained Data Bundle Retrieval: Focused Text Snippet Extraction and Beyond

Alexandra Meliou; Anna Fariha; Kuangfei Long; Mahmood Jasim; Matteo Brucato; Peter J. Haas; Whanhee Cho

arxiv: 2605.19246 · v1 · pith:2LNSWDY3 · submitted 2026-05-19 · 💻 cs.DB


Whanhee Cho, Kuangfei Long, Mahmood Jasim, Matteo Brucato, Alexandra Meliou, Peter J. Haas, Anna Fariha

Pith reviewed 2026-05-20 03:00 UTC · model grok-4.3

classification 💻 cs.DB

keywords: bundle retrieval, package queries, example-driven intent, aggregate constraints, constraint relaxation, text snippet extraction, data bundles, combinatorial retrieval


The pith

Users provide example bundles to let the system infer aggregate constraints and synthesize package queries for retrieving matching groups of items.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bundle retrieval requires finding sets of items that together satisfy multi-dimensional constraints, a combinatorial task that is NP-hard in general and hard for users to express directly as a query. The paper shows that providing a few example bundles lets the system deduce implicit aggregate constraints, such as sums or averages, that define the desired intent. These constraints are turned into package queries; if a query yields no results on the target data, its bounds are adjusted minimally, in a data-aware way, to restore feasibility while staying close to the examples. The approach is demonstrated on focused text snippet extraction and evaluated on real-world datasets and in a user study. This matters because it replaces precise query writing with an example-based interface that, according to the experiments, stays aligned with user intent even when the distribution of the target data shifts.
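A passage from the paper, quoted later on this page, states that the initial bounds are the topic-wise minimum and maximum of the example bundles' feature profiles. Below is a minimal Python sketch of that step. It assumes each example bundle has already been reduced to a dictionary of per-topic aggregate values. The profile representation, the topic names, and the zero default for a topic missing from a profile are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed representation: topic name -> aggregate value F_j(E_i) for one bundle.
Profile = Dict[str, float]


def synthesize_bounds(example_profiles: List[Profile]) -> Dict[str, Tuple[float, float]]:
    """Return per-topic (lb_j, ub_j) as the min and max over the example profiles."""
    if not example_profiles:
        raise ValueError("at least one example bundle is required")
    topics = sorted(set().union(*example_profiles))
    bounds = {}
    for j in topics:
        # Assumption: a topic absent from an example's profile counts as 0.0.
        values = [p.get(j, 0.0) for p in example_profiles]
        bounds[j] = (min(values), max(values))
    return bounds


examples = [
    {"health": 0.6, "economy": 0.2},
    {"health": 0.4, "economy": 0.3},
    {"health": 0.5, "economy": 0.1},
]
bounds = synthesize_bounds(examples)
print(bounds)  # {'economy': (0.1, 0.3), 'health': (0.4, 0.6)}
```

Taking the min and max makes every example satisfy the synthesized constraints. The trade-off is that a single outlying example widens the bounds for that topic.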

Core claim

Ex2Bundle enables users to specify their intent through example bundles and automatically synthesizes package queries that capture the intent implicit in those example bundles via aggregate constraints, while addressing infeasibility through data-aware constraint relaxation. The framework is instantiated for focused text snippet extraction by example, and experiments confirm that it improves usability and returns intent-aligned bundles even under distributional shifts of the target database.
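To make "synthesizes package queries" concrete, the sketch below assembles a query string in the PaQL style (`SELECT PACKAGE ... SUCH THAT ... MAXIMIZE`) from per-topic bounds and an objective column. The table name `Sentences`, the topic columns, and the `quality` objective column are hypothetical. The exact query text Ex2Bundle emits may differ.

```python
from typing import Dict, Tuple


def build_paql(
    table: str,
    bounds: Dict[str, Tuple[float, float]],
    objective_column: str,
    maximize: bool = True,
) -> str:
    """Assemble a PaQL-style package query from per-column bounds and an objective."""
    if not bounds:
        raise ValueError("at least one constrained column is required")
    # One SUM(...) BETWEEN lb AND ub clause per constrained column.
    constraints = " AND\n  ".join(
        f"SUM(P.{col}) BETWEEN {lb} AND {ub}" for col, (lb, ub) in bounds.items()
    )
    direction = "MAXIMIZE" if maximize else "MINIMIZE"
    return (
        "SELECT PACKAGE(R) AS P\n"
        f"FROM {table} R REPEAT 0\n"  # REPEAT 0: no tuple appears more than once
        f"SUCH THAT {constraints}\n"
        f"{direction} SUM(P.{objective_column})"
    )


query = build_paql("Sentences", {"health": (4, 6), "economy": (0, 2)}, "quality")
print(query)
```

Here the bounds come from the examples and the objective comes from the expert-defined quality function. Keeping these two inputs separate lets the same objective be reused across users.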

What carries the argument

Example-driven synthesis of aggregate constraints from user-provided bundles, which are then used to form package queries with optional data-aware relaxation to handle empty results.
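The relaxation rule quoted later on this page widens each bound by $\rho\cdot\varepsilon$ while the package query returns nothing. Lower bounds are clamped at 0 and upper bounds at $F_j(T_q)$. The sketch below follows that rule under stated assumptions. The excerpt does not define $\rho$ and $\varepsilon$ precisely, so both are treated here as constants. Ex2Bundle checks feasibility by executing a package query, but this sketch instead passes in a hypothetical brute-force oracle over a tiny target set. The oracle sums sentence-level topic counts and only considers bundles of exactly two sentences.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]


def relax_bounds(
    bounds: Bounds,
    target_totals: Dict[str, float],
    is_feasible: Callable[[Bounds], bool],
    rho: float = 1.0,
    eps: float = 0.05,
    max_rounds: int = 1000,
) -> Bounds:
    """Widen each [lb_j, ub_j] by rho * eps per round until is_feasible holds.

    Lower bounds are clamped at 0 and upper bounds at target_totals[j], which
    plays the role of F_j(T_q) in the quoted update rule.
    """
    step = rho * eps
    current = dict(bounds)
    for _ in range(max_rounds):
        if is_feasible(current):
            return current
        widened = {
            j: (max(lb - step, 0.0), min(ub + step, target_totals[j]))
            for j, (lb, ub) in current.items()
        }
        if widened == current:  # every bound is already clamped; give up
            break
        current = widened
    raise RuntimeError("no feasible bundle within the relaxation budget")


# Toy target data (hypothetical): per-sentence topic counts; a bundle's
# profile is the sum over its sentences, and bundles hold exactly 2 sentences.
items = [
    {"health": 2, "economy": 5},
    {"health": 1, "economy": 1},
    {"health": 3, "economy": 0},
]
totals = {j: sum(it[j] for it in items) for j in ("health", "economy")}


def exists_bundle(b: Bounds) -> bool:
    """Brute-force stand-in for executing the package query."""
    return any(
        all(lb <= sum(it[j] for it in pair) <= ub for j, (lb, ub) in b.items())
        for pair in combinations(items, 2)
    )


relaxed = relax_bounds({"health": (6, 6), "economy": (0, 0)}, totals, exists_bundle, eps=1.0)
print(relaxed)
```

Widening every bound uniformly is the simplest reading of the rule. Because the loop stops at the first feasible round, the result stays as close to the original bounds as this step size allows.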

If this is right

Users can retrieve intent-aligned bundles without manually writing or tuning complex package queries.
Infeasible aggregate constraints are resolved automatically while keeping results close to the provided examples.
The same mechanism supports applications such as focused text snippet extraction and recommendation bundles.
Results remain aligned with user intent even when the target data distribution differs from the examples.
The framework reduces the user effort needed for combinatorial retrieval tasks across databases and summarization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The synthesis step could be made interactive so that users add or remove examples to iteratively tighten or loosen the inferred constraints.
The data-aware relaxation method might transfer to other infeasible query settings in database systems that involve aggregate conditions.
In settings with very few examples the inferred constraints could be validated or augmented by sampling additional bundles from the database itself.

Load-bearing premise

That aggregate constraints inferred from a small number of user-provided example bundles will be both representative of the user's true intent and amenable to minimal relaxation that preserves alignment when the original constraints are infeasible over the target data.

What would settle it

A controlled user study or experiment in which the bundles returned after relaxation are judged by participants as no longer matching the intent shown in the original examples, or where relaxation produces empty results or large deviations under a known distributional shift.

Figures

Figures reproduced from arXiv: 2605.19246 by Alexandra Meliou, Anna Fariha, Kuangfei Long, Mahmood Jasim, Matteo Brucato, Peter J. Haas, Whanhee Cho.

**Figure 5.** Ex2Bundle workflow: before end-user interaction, (0) a domain expert defines a quality function, which Ex2Bundle encodes as the objective. During use, (1) the user provides example bundles, from which (2) Ex2Bundle synthesizes initial constraint bounds and (3) relaxes them to ensure feasibility. A PaQL query is (4) formed using these bounds and the objective and (5) executed to retrieve a result bundle. Fo…

**Figure 7.** Slider for Demographics. Users can adjust topic emphasis from −100 (very little) to +100 (a lot). Each slider position maps to specific constraint bounds; the neutral position (0) corresponds to the feasible bounds synthesized (and relaxed) from user examples. … users directly edit the upper and lower bounds of the synthesized constraints. While this offers full transparency and suits experts, it reduces usa…

**Figure 9.** For focused text snippet extraction on the …

**Figure 12.** Relaxation analysis for Ex2Bundle across varying example sizes. The number of relaxations triggered increases with the number of example sentences. (Panels: x-axis Example Size from 0 to 100; one panel reports Time (sec); series Single-Topic, Random, and Multi-Topic.)

**Figure 13.** Average learning (left), retrieval (center), and total times …
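The Figure 7 caption says that each slider position maps to specific constraint bounds and that position 0 corresponds to the synthesized (and relaxed) bounds. The mapping itself is not described on this page. One hypothetical mapping, shown only to illustrate the idea, interpolates linearly. Positive positions move both bounds toward the topic's aggregate over the target data, $F_j(T_q)$, and negative positions move them toward 0. A moved bound is not guaranteed to stay feasible.

```python
from typing import Tuple


def slider_to_bounds(
    position: int, base: Tuple[float, float], target_total: float
) -> Tuple[float, float]:
    """Hypothetical linear map from a slider position in [-100, 100] to bounds.

    0 returns the synthesized (relaxed) bounds unchanged; +100 pushes both
    bounds up to target_total; -100 pushes both bounds down to 0.
    """
    if not -100 <= position <= 100:
        raise ValueError("slider position must lie in [-100, 100]")
    lb, ub = base
    t = abs(position) / 100.0
    if position >= 0:
        # Move each bound a fraction t of the way up to the data maximum.
        return (lb + t * (target_total - lb), ub + t * (target_total - ub))
    # Move each bound a fraction t of the way down to zero.
    return (lb * (1 - t), ub * (1 - t))


print(slider_to_bounds(50, (4.0, 6.0), 10.0))   # (7.0, 8.0)
print(slider_to_bounds(-50, (4.0, 6.0), 10.0))  # (2.0, 3.0)
```

When the base bounds satisfy $0 \le lb \le ub \le F_j(T_q)$, both branches keep $lb \le ub$. This means a slider move never produces an inverted interval.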

Original abstract

Selecting a bundle of items that collectively satisfies constraints is a fundamental task across databases, recommender systems, and text summarization. Unlike traditional retrieval that returns individual or top-k items, bundle retrieval is inherently combinatorial and, in general, NP-hard. Although package queries can efficiently retrieve bundles given a well-formed query, two key user-centric challenges remain: (1) expressing and tuning multi-dimensional bundle intent through a user-friendly interface, and (2) ensuring feasibility when the query yields empty results. We introduce Ex2Bundle, an Example-driven Bundle retrieval framework that enables users to specify their intent through example bundles and automatically synthesizes package queries that capture the intent implicit in those example bundles via aggregate constraints. Ex2Bundle also addresses a challenge unique to bundle retrieval: when inferred aggregate constraints are infeasible over the target data, our data-aware constraint relaxation minimally adjusts the constraint bounds while preserving alignment with user intent. We instantiate a specific application of focused text snippet extraction by example to demonstrate the efficacy of the Ex2Bundle framework. Extensive experiments over real-world datasets and a user study demonstrate that Ex2Bundle improves usability and consistently returns intent-aligned bundles even under distributional shifts of the target database.
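The abstract describes bundle retrieval as combinatorial and, in general, NP-hard. The sketch below makes the search space and the "empty result" case concrete. It exhaustively scores every bundle of up to `max_size` items, so its cost grows with the number of subsets. This is not how package queries are evaluated: the workflow in Figure 5 executes a PaQL query instead. The item fields and the sum-based objective are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Item = Dict[str, float]


def best_bundle(
    items: Sequence[Item],
    bounds: Dict[str, Tuple[float, float]],
    objective: str,
    max_size: int,
) -> Optional[Tuple[int, ...]]:
    """Exhaustively search bundles of 1..max_size items.

    Returns the indices of the feasible bundle with the largest SUM of
    `objective`, or None when every bundle violates some aggregate bound
    (the empty-result case that motivates relaxation).
    """
    best, best_score = None, float("-inf")
    for k in range(1, max_size + 1):
        for combo in combinations(range(len(items)), k):
            totals = {j: sum(items[i][j] for i in combo) for j in bounds}
            if all(lb <= totals[j] <= ub for j, (lb, ub) in bounds.items()):
                score = sum(items[i][objective] for i in combo)
                if score > best_score:
                    best, best_score = combo, score
    return best


items = [
    {"health": 2, "economy": 5, "quality": 0.9},
    {"health": 1, "economy": 1, "quality": 0.4},
    {"health": 3, "economy": 0, "quality": 0.7},
]
print(best_bundle(items, {"health": (4, 6), "economy": (0, 2)}, "quality", 2))  # (1, 2)
print(best_bundle(items, {"health": (7, 9), "economy": (0, 2)}, "quality", 2))  # None
```

For $n$ items the loop visits $\sum_{k=1}^{m} \binom{n}{k}$ bundles. That count is tiny here but intractable at database scale.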

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Ex2Bundle makes bundle retrieval more accessible via examples but the constraint inference could use more robustness checks.


Hey, the thing to know about this paper is that it tries to make bundle retrieval easier by letting users show a few examples of what they want instead of crafting a full query with constraints. Then it infers the aggregates and relaxes them if needed to get results. What's new here is that specific framing: example-driven synthesis for package queries plus the data-aware relaxation. The prior package query stuff is there, but this user interface angle with examples and the relaxation for infeasibility feels like a fresh take. They show it on text snippet extraction, which makes sense for the domain.

The paper handles the practical aspects okay. They ran experiments on real data and did a user study, which suggests it helps with usability and keeps things aligned even when data changes a bit. Where it could be stronger is on the inference part. The stress test is right to flag that inferring constraints from small example sets might not be stable. There's no mention of formal guarantees or tests showing what happens if you swap out one example for another similar one. That could be a real issue for reliability, and the experiments probably don't cover enough variation to prove it out.

This is the kind of paper for people in databases who care about making query systems more user-friendly, or those in text and recsys applications. Someone building tools for complex data selection would pick up useful ideas from the framework.

It has enough going on to go to a serious referee. I'd say send it for review. The contribution is practical and the evaluation is there, but expect comments on making the robustness clearer.

Referee Report

2 major / 2 minor

Summary. The paper introduces Ex2Bundle, an example-driven framework for bundle retrieval in databases and related domains. Users specify intent via example bundles; the system infers aggregate constraints (sum, count, average bounds) to synthesize package queries. A data-aware relaxation step adjusts infeasible constraints while aiming to preserve intent alignment. The framework is instantiated for focused text snippet extraction, with claims of improved usability, intent-aligned results, and robustness under distributional shifts supported by experiments on real-world datasets and a user study.

Significance. If the central claims hold, the work offers a practical advance in user-centric combinatorial retrieval by replacing explicit multi-dimensional query writing with example-based specification. The data-aware relaxation mechanism addresses a common pain point in package queries. Credit is due for the concrete application to text snippet extraction and the inclusion of a user study alongside quantitative experiments.

major comments (2)

[§4] Constraint Synthesis: the procedure for inferring aggregate constraints from a small set of example bundles is described at a high level but lacks an explicit algorithm, optimization formulation, or pseudocode. This is load-bearing for the claim that the synthesized constraints represent latent user intent; without it, reproducibility and stability under example variation cannot be assessed.
[§6.2, Table 2] Distributional shift experiments: the reported robustness to distributional shifts is presented as an empirical outcome, yet no sensitivity analysis quantifies how constraint bounds or retrieved bundles change when the provided examples are perturbed or replaced by equally plausible alternatives. This bears directly on the weakest assumption: that small example sets yield stable, intent-aligned constraints.
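The sensitivity analysis the referee asks for could take many forms. One simple instance is sketched below; it is not the analysis the authors commit to. It assumes the min/max bound synthesis quoted later on this page, drops each example in turn, and records the largest resulting change in any bound. A large value flags a single example that dominates the inferred intent. The profile format is assumed.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def minmax_bounds(profiles: List[Profile]) -> Bounds:
    """Per-topic (min, max) over profiles; a missing topic counts as 0.0."""
    topics = sorted(set().union(*profiles))
    return {
        j: (min(p.get(j, 0.0) for p in profiles), max(p.get(j, 0.0) for p in profiles))
        for j in topics
    }


def leave_one_out_shift(profiles: List[Profile]) -> List[float]:
    """For each example, the largest bound change caused by removing it."""
    if len(profiles) < 2:
        raise ValueError("need at least two examples")
    full = minmax_bounds(profiles)
    shifts = []
    for i in range(len(profiles)):
        reduced = minmax_bounds(profiles[:i] + profiles[i + 1:])
        shift = max(
            max(abs(full[j][0] - reduced.get(j, (0.0, 0.0))[0]),
                abs(full[j][1] - reduced.get(j, (0.0, 0.0))[1]))
            for j in full
        )
        shifts.append(shift)
    return shifts


profiles = [
    {"health": 10, "economy": 2},
    {"health": 4, "economy": 3},
    {"health": 5, "economy": 1},
]
print(leave_one_out_shift(profiles))  # [5, 1, 1]
```

Here the first example alone sets the health upper bound, so removing it moves that bound from 10 to 5.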

minor comments (2)

[§3] Notation for aggregate constraint templates (e.g., how bounds are extracted from examples) is introduced without a dedicated table or running example that walks through the full pipeline from bundle to query.
[§7] The user study description would benefit from explicit sample size, task design, and statistical significance tests for the usability claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below and describe the revisions planned for the next version of the manuscript.


Referee: [§4] Constraint Synthesis: the procedure for inferring aggregate constraints from a small set of example bundles is described at a high level but lacks an explicit algorithm, optimization formulation, or pseudocode. This is load-bearing for the claim that the synthesized constraints are representative of latent user intent; without it, reproducibility and stability under example variation cannot be assessed.

Authors: We agree that an explicit formulation is necessary for reproducibility. In the revised manuscript we will augment Section 4 with a formal optimization problem that computes aggregate constraint bounds (sum, count, and average) from the provided example bundles, together with pseudocode for the inference procedure. This addition will make the mapping from examples to constraints fully transparent and allow direct assessment of stability under example variation. revision: yes
Referee: [§6.2, Table 2] Distributional shift experiments: the reported robustness to distributional shifts is presented as an empirical outcome, yet no sensitivity analysis quantifies how constraint bounds or retrieved bundles change when the provided examples are perturbed or replaced by equally plausible alternatives. This bears directly on the weakest assumption: that small example sets yield stable, intent-aligned constraints.

Authors: We acknowledge that a dedicated sensitivity analysis would strengthen the robustness claim. We will add new experiments that systematically perturb the example bundles (by replacing individual examples with plausible alternatives or introducing small controlled variations) and measure the resulting variation in synthesized bounds and retrieved bundles. These results will be reported in an expanded Section 6.2 or an accompanying appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity: the derivation chain is self-contained, with no reductions to inputs by construction.

full rationale

The paper introduces Ex2Bundle as a new synthesis framework that infers aggregate constraints from user examples and applies data-aware relaxation for infeasibility. No equations, derivations, or formal proofs are referenced in the provided abstract or description that would reduce any prediction or result to fitted parameters or self-referential definitions. Claims rest on the novelty of the example-driven approach and empirical validation via experiments and user study, with no load-bearing self-citations or ansatzes imported from prior author work. The inference of constraints from examples is presented as a core algorithmic contribution rather than a tautological fit or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based solely on the abstract, the framework rests on the assumption that user intent can be captured by aggregate constraints derived from examples and that minimal relaxation preserves that intent. No explicit free parameters are detailed, and the only invented entity is the Ex2Bundle framework itself.

axioms (1)

Domain assumption: bundle retrieval is inherently combinatorial and NP-hard in general.
Stated directly in the abstract as background for the problem.

invented entities (1)

Ex2Bundle framework (no independent evidence)
Purpose: to synthesize package queries from example bundles and relax infeasible constraints.
Newly introduced system described in the abstract; no independent evidence provided outside the paper.

pith-pipeline@v0.9.0 · 5765 in / 1335 out tokens · 51349 ms · 2026-05-20T03:00:44.401756+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We derive the initial constraint bounds $\Theta_{\text{init}}$ by taking the topic-wise minimum and maximum of the feature profiles of bundles in $E$: $lb_j = \min_{E_i \in E} F_j(E_i)$, $ub_j = \max_{E_i \in E} F_j(E_i)$. ... Bound relaxation algorithm ... while $PQ(\Theta, T_q) = \emptyset$ ... $lb_j \leftarrow \max(lb_j - \rho\cdot\varepsilon,\ 0)$, $ub_j \leftarrow \min(ub_j + \rho\cdot\varepsilon,\ F_j(T_q))$
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

Ex2Bundle ... consistently returns intent-aligned bundles even under distributional shifts

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

91 extracted references · 91 canonical work pages · 1 internal anchor

[1]

Discover Weekly keeps giving me the same genre that I’m completely sick of

2020. Discover Weekly keeps giving me the same genre that I’m completely sick of. https://community.spotify.com/t5/Your-Library/Discover-Weekly-keeps- giving-me-the-same-genre-that-I-m/td-p/5065068

work page arXiv 2020
[2]

LangChain

2026. LangChain. https://github.com/langchain-ai/langchain Accessed: 2026- 01-07

work page 2026
[3]

Recommendations: Figuring out how to bring unique joy to each member

2026. Recommendations: Figuring out how to bring unique joy to each member. https://research.netflix.com/research-area/recommendations

work page 2026
[4]

Jinze Bai, Chang Zhou, Junshuai Song, Xiaoru Qu, Weiting An, Zhao Li, and Jun Gao. 2019. Personalized Bundle List Recommendation. InThe World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13-17, 2019. ACM, 60–71

work page 2019
[5]

Federico Bianchi, Silvia Terragni, Dirk Hovy, Debora Nozza, and Elisabetta Fersini. 2021. Cross-lingual Contextualized Topic Models with Zero-shot Learn- ing. InProceedings of the 16th Conference of the European Chapter of the Associa- tion for Computational Linguistics: Main Volume. Association for Computational Linguistics, Online, 1676–1683

work page 2021
[6]

Blei, Andrew Y

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation.J. Mach. Learn. Res.3 (2003), 993–1022

work page 2003
[7]

Jay Bonggolto. 2025. Google is testing a new ‘Daily Discover’ feed on YouTube Music. https://tech.yahoo.com/streaming/articles/google-testing-daily-discover- feed-170300058.html

work page 2025
[8]

Angela Bonifati, Radu Ciucanu, and Slawek Staworko. 2016. Learning Join Queries from User Examples.ACM Trans. Database Syst.40, 4 (2016), 24:1–24:38

work page 2016
[9]

2012.Thematic analysis.American Psycho- logical Association

Virginia Braun and Victoria Clarke. 2012.Thematic analysis.American Psycho- logical Association

work page 2012
[10]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

work page 2020
[11]

Matteo Brucato, Juan Felipe Beltran, Azza Abouzied, and Alexandra Meliou. 2016. Scalable Package Queries in Relational Database Systems.Proc. VLDB Endow.9, 7 (2016), 576–587

work page 2016
[12]

Jianxin Chang, Chen Gao, Xiangnan He, Depeng Jin, and Yong Li. 2023. Bundle Recommendation and Generation With Graph Neural Networks.IEEE Trans. Knowl. Data Eng.35, 3 (2023), 2326–2340

work page 2023
[13]

Whanhee Cho and Anna Fariha. 2026. Data-Semantics-Aware Recommendation of Diverse Pivot Tables.Proc. ACM Manag. Data4, 1, Article 23 (April 2026), 28 pages

work page 2026
[14]

Haas, and Anna Fariha

Whanhee Cho, Kuangfei Long, Mahmood Jasim, Matteo Brucato, Alexandra Meliou, Peter J. Haas, and Anna Fariha. 2026. Example-Driven Intent Synthesis for Constrained Data Bundle Retrieval: Focused Text Snippet Extraction and Beyond (Technical Report).Preprint(2026). https://users.cs.utah.edu/~afariha/ ex2bundle_tech_rep.pdf

work page 2026
[15]

John Joon Young Chung and Eytan Adar. 2023. PromptPaint: Steering Text-to- Image Generation Through Paint Medium-like Interactions. InProceedings of the 36th Annual ACM Symposium on User Interface Software and Technology(San Francisco, CA, USA)(UIST ’23). Association for Computing Machinery, New York, NY, USA, Article 6, 17 pages

work page 2023
[16]

Matthew JC Crump, John V McDonnell, and Todd M Gureckis. 2013. Evaluating Amazon’s Mechanical Turk as a tool for experimental behavioral research.PloS One8, 3 (2013), e57410

work page 2013
[17]

Daniel Deutch and Amir Gilad. 2016. QPlain: Query by explanation. In32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016. IEEE Computer Society, 1358–1361

work page 2016
[18]

Heejin Do, Sangwon Ryu, Jonghwi Kim, and Gary Lee. 2025. Multi-Facet Blending for Faceted Query-by-Example Retrieval. InACL. 28577–28590

work page 2025
[19]

Marina Drosou and Evaggelia Pitoura. 2012. DisC diversity: result diversification based on dissimilarity and coverage.Proc. VLDB Endow.6, 1 (2012), 13–24

work page 2012
[20]

Shih Hsin Fang, Eric Hsueh-Chan Lu, and Vincent S. Tseng. 2014. Trip Recom- mendation with Multiple User Constraints by Integrating Point-of-Interests and Travel Packages. InIEEE 15th International Conference on Mobile Data Manage- ment, MDM 2014, Brisbane, Australia, July 14-18, 2014 - Volume 1. IEEE Computer Society, 33–42

work page 2014
[21]

Haas, and Alexandra Meliou

Anna Fariha, Matteo Brucato, Peter J. Haas, and Alexandra Meliou. 2020. SuDocu: Summarizing Documents by Example.Proc. VLDB Endow.13, 12 (2020), 2861– 2864

work page 2020
[22]

Anna Fariha, Lucy Cousins, Narges Mahyar, and Alexandra Meliou. 2026. Example-driven semantic-similarity-aware query intent discovery: Empowering users to cross the SQL barrier through query by example.Inf. Syst.138 (2026), 102687

work page 2026
[23]

Anna Fariha and Alexandra Meliou. 2019. Example-Driven Query Intent Discov- ery: Abductive Reasoning using Semantic Similarity.Proc. VLDB Endow.12, 11 (2019), 1262–1275

work page 2019
[24]

Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, and David Bau. 2024. Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models. InComputer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XL (Lecture Notes in Computer Science). Springer, 172–188

work page 2024
[25]

Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2024. Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation.Proc. VLDB Endow.17, 5 (2024), 1132–1145

work page 2024
[26]

Jennifer C Greene, Valerie J Caracelli, and Wendy F Graham. 1989. Toward a conceptual framework for mixed-method evaluation designs.Educational evaluation and policy analysis11, 3 (1989), 255–274

work page 1989
[27]

Nianlong Gu, Elliott Ash, and Richard H. R. Hahnloser. 2022. MemSum: Extractive Summarization of Long Documents Using Multi-Step Episodic Markov Decision Processes. (2022), 6507–6522. https://doi.org/10.18653/V1/2022.ACL-LONG.450

work page doi:10.18653/v1/2022.acl-long.450 2022
[28]

Qi Gu, Jian Cao, and Yancen Liu. 2022. CSBR: A Compositional Semantics-Based Service Bundle Recommendation Approach for Mashup Development.IEEE Trans. Serv. Comput.15, 6 (2022), 3170–3183

work page 2022
[29]

Sumit Gulwani. 2011. Automating string processing in spreadsheets using input- output examples. InProceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011. ACM, 317–330

work page 2011
[30]

Sumit Gulwani and Prateek Jain. 2017. Programming by Examples: PL Meets ML. InProgramming Languages and Systems - 15th Asian Symposium, APLAS 2017, Suzhou, China, November 27-29, 2017, Proceedings (Lecture Notes in Computer Science), Vol. 10695. Springer, 3–20

work page 2017
[31]

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang

work page
[32]

InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research), Vol

Retrieval Augmented Language Model Pre-Training. InProceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research), Vol. 119. PMLR, 3929–3938

work page 2020
[33]

Karl Moritz Hermann, Tomás Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching Machines to Read and Comprehend. InNeurIPS. 1693–1701

work page 2015
[34]

Clemens Heuberger. 2004. Inverse Combinatorial Optimization: A Survey on Problems, Methods, and Results.Journal of Combinatorial Optimization8, 3 (2004), 329–361

work page 2004
[35]

Po Hu, Donghong Ji, Chong Teng, and Yujing Guo. 2012. Context-Enhanced Personalized Social Summarization. InProceedings of COLING 2012. The COLING 2012 Organizing Committee, Mumbai, India, 1223–1238

work page 2012
[36]

Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, and Lu Wang. 2021. Efficient Attentions for Long Document Summarization. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computa- tional Linguistics: Human Language Technologies. Association for Computational Linguistics, Online, 1419–1436

work page 2021
[37]

Wanderley, and Matthew Paradis

Andy Hunt, Marcelo M. Wanderley, and Matthew Paradis. 2002. The Importance of Parameter Mapping in Electronic Instrument Design. InNew Interfaces for Musical Expression, NIME-02, Proceedings, Dublin, Ireland, May 24-26, 2002. Media Lab Europe, 149–154

work page 2002
[38]

IBM ILOG CPLEX Optimization Studio

IBM ILOG CPLEX Optimization Studio [n.d.]. IBM ILOG CPLEX Optimization Studio. https://www.ibm.com/docs/en/icos

work page
[39]

Stratos Idreos, Olga Papaemmanouil, and Surajit Chaudhuri. 2015. Overview of Data Exploration Techniques. InSIGMOD. ACM, 277–281

work page 2015
[40]

Hal Daumé III and Daniel Marcu. 2006. Bayesian Query-Focused Summarization. InACL 2006, 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Sydney, Australia, 17-21 July 2006, Nicoletta Calzolari, Claire Cardie, and Pierre Isabelle (Eds.). The A...

work page doi:10.3115/1220175.1220214 2006
[41]

Rahul Jain, Amit Goel, Koichiro Niinuma, and Aakar Gupta. 2025. AdaptiveSliders: User-aligned Semantic Slider-based Editing of Text-to-Image Model Output. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25). Association for Computing Machinery, New York, NY, USA, Article 541, 27 pages

work page 2025
[42]

Chris Kedzie, Kathleen McKeown, and Hal Daumé III. 2018. Content Selection in Deep Learning Models of Summarization. InProceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 1818–1828

work page 2018
[43]

Shahedul Huq Khandkar. 2009. Open coding.University of Calgary23, 2009 (2009)

work page 2009
[44]

Akash Khatri, Mahathir Mohammad, and El Kindi Rezig. 2025. Sort it Like You Mean It: Discovering Semantically Interesting Attribute Augmentations to Sort Tables.Proc. VLDB Endow.18, 12 (2025), 5427–5430. https://doi.org/10.14778/ 3750601.3750688

work page arXiv 2025
[45]

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim 13 Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. InNeurIPS

work page 2020
[46]

Yanran Li and Sujian Li. 2014. Query-focused Multi-Document Summariza- tion: Combining a Topic Model with Graph-based Semi-supervised Learning. InProceedings of COLING 2014, the 25th International Conference on Computa- tional Linguistics: Technical Papers. Dublin City University and Association for Computational Linguistics, Dublin, Ireland, 1197–1207

work page 2014
[47]

Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. InText summarization branches out. 74–81

work page 2004
[48]

Marina Litvak and Natalia Vanetik. 2017. Query-based summarization using MDL principle. InProceedings of the multiling 2017 workshop on summarization and summary evaluation across source types and genres. 22–31

work page 2017
[49]

2022.LlamaIndex

Jerry Liu. 2022.LlamaIndex. https://github.com/jerryjliu/llama_index

work page 2022
[50]

Xiaohao Liu, Jie Wu, Zhulin Tao, Yunshan Ma, Yinwei Wei, and Tat-Seng Chua

work page
[51]

In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, KDD 2025, Toronto, ON, Canada, August 3-7, 2025

Fine-tuning Multimodal Large Language Models for Product Bundling. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, KDD 2025, Toronto, ON, Canada, August 3-7, 2025. ACM, 848–858

work page 2025
[52]

Yang Liu and Mirella Lapata. 2019. Text Summarization with Pretrained Encoders. (2019), 3728–3738. https://doi.org/10.18653/V1/D19-1387

work page doi:10.18653/v1/d19-1387 2019
[53]

Xuan Lu, Sifan Liu, Bochao Yin, Yongqi Li, Xinghao Chen, Hui Su, Yaohui Jin, Wenjun Zeng, and Xiaoyu Shen. 2025. MultiConIR: Towards Multi-Condition Information Retrieval. InFindings of the Association for Computational Linguistics: EMNLP 2025. Association for Computational Linguistics, Suzhou, China, 13471– 13494

work page 2025
[55]

Gang Luo. 2006. Efficient Detection of Empty-Result Queries. InProceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006. ACM, 1015–1025

work page 2006
[56]

Mai, Pengyu Wang, Azza Abouzied, Matteo Brucato, Peter J

Anh L. Mai, Pengyu Wang, Azza Abouzied, Matteo Brucato, Peter J. Haas, and Alexandra Meliou. 2024. Scaling Package Queries to a Billion Tuples via Hier- archical Partitioning and Customized Optimization.Proc. VLDB Endow.17, 5 (2024), 1146–1158

work page 2024
[57]

Mir Mahathir Mohammad and El Kindi Rezig. 2026. Qualitative Join Discovery in Data Lakes using Examples.Proc. ACM Manag. Data4, 1, Article 68 (April 2026), 28 pages. https://doi.org/10.1145/3786682

work page doi:10.1145/3786682 2026
[58]

Sheshera Mysore, Arman Cohan, and Tom Hope. 2022. Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity. InPro- ceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Marine Carpuat, Marie-Catherine de Marneffe, and Ivan Vladimir Meza ...

work page 2022
[59]

Arnab Nandi and H. V. Jagadish. 2011. Guided Interaction: Rethinking the Query-Result Paradigm.Proc. VLDB Endow.4, 12 (2011), 1466–1469

work page 2011
[60]

Ani Nenkova, Sameer Maskey, and Yang Liu. 2011. Automatic Summarization. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA - Tutorial Abstracts. The Association for Computer Linguistics, 3

[61]

OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]

[62]

Apurva Pathak, Kshitiz Gupta, and Julian J. McAuley. 2017. Generating and Personalizing Bundle Recommendations on Steam. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, August 7-11, 2017. ACM, 1073–1076

[63]

Fotis Psallidas, Bolin Ding, Kaushik Chakrabarti, and Surajit Chaudhuri. 2015. S4: Top-k Spreadsheet-Style Search for Query Discovery. In SIGMOD. ACM, 2001–2016

[64]

El Kindi Rezig, Anshul Bhandari, Anna Fariha, Benjamin Price, Allan Vanterpool, Vijay Gadepally, and Michael Stonebraker. 2021. DICE: Data Discovery by Example. Proc. VLDB Endow. 14, 12 (2021), 2819–2822

[65]

Thibault Sellam and Martin L. Kersten. 2013. Meet Charles, big data query advisor. In Sixth Biennial Conference on Innovative Data Systems Research, CIDR 2013, Asilomar, CA, USA, January 6-9, 2013, Online Proceedings. www.cidrdb.org

[66]

Vidya Setlur, Sarah E. Battersby, Melanie Tory, Rich Gossweiler, and Angel X. Chang. 2016. Eviza: A Natural Language Interface for Visual Analysis. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, UIST 2016, Tokyo, Japan, October 16-19, 2016. ACM, 365–377

[67]

Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, and Lev Novik. 2014. Discovering queries based on example tuples. In SIGMOD. ACM, 493–504

[68]

Yunxiao Shi, Haoning Shang, Xing Zi, Wujiang Xu, Yue Feng, and Min Xu. 2025. Answering Narrative-Driven Recommendation Queries via a Retrieve–Rank Paradigm and the OCG-Agent. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Suzhou, China, 13181–13202

[70]

Aviv Slobodkin, Niv Nachum, Shmuel Amar, Ori Shapira, and Ido Dagan. 2023. SummHelper: Collaborative Human-Computer Summarization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - System Demonstrations, Singapore, December 6-10, 2023. Association for Computational Linguistics, 554–565

[71]

Spotify Advertising Team. 2020. Five years of discovery and engagement through Discover Weekly. https://ads.spotify.com/en-US/news-and-insights/five-years-of-discovery-and-engagement-through-discover-weekly/. Accessed: 2026-03-17

[72]

Hendrik Strobelt, Albert Webson, Victor Sanh, Benjamin Hoover, Johanna Beyer, Hanspeter Pfister, and Alexander M. Rush. 2023. Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. IEEE Trans. Vis. Comput. Graph. 29, 1 (2023), 1146–1156

[73]

Wenqi Sun, Ruobing Xie, Junjie Zhang, Wayne Xin Zhao, Leyu Lin, and Ji-Rong Wen. 2023. Generative Next-Basket Recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023, Singapore, Singapore, September 18-22, 2023. ACM, 737–743

[74]

Hiroaki Takatsu, Takahiro Kashikawa, Koichi Kimura, Ryota Ando, and Yoichi Matsuyama. 2021. Personalized Extractive Summarization Using an Ising Machine Towards Real-time Generation of Efficient and Coherent Dialogue Scenarios. In Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI. Association for Computational Linguis...

[75]

Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. 2021. Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational...

[76]

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual

[77]

Louis L. Thurstone. 1974. A Law of Comparative Judgment (1st ed.). Routledge. 12 pages

[78]

Transaction Processing Performance Council (TPC). [n.d.]. TPC-H: Decision Support Benchmark. Technical Report. TPC. http://www.tpc.org/tpch/

[79]

Joseph Tso, Preston Schmittou, Quan Huynh, and Jibran Hutchins. 2026. ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization. arXiv preprint arXiv:2602.22465 (2026)

[80]

Jianyou Wang, Kaicheng Wang, Xiaoyue Wang, Prudhviraj Naidu, Leon Bergen, and Ramamohan Paturi. 2023. DORIS-MAE: scientific document retrieval using multi-level aspect-based queries. Article 1668 (2023), 16 pages

[81]

Yue Wang, Alexandra Meliou, and Gerome Miklau. 2018. RC-Index: Diversifying Answers to Range Queries. Proc. VLDB Endow. 11, 7 (2018), 773–786

