Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

Anastasia Ailamaki; Kyoungmin Kim; Martin Catheland

arxiv: 2606.08090 · v1 · pith:32UD7DB3new · submitted 2026-06-06 · 💻 cs.DB · cs.AI

Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method

Kyoungmin Kim , Martin Catheland , Anastasia Ailamaki This is my paper

Pith reviewed 2026-06-27 19:03 UTC · model grok-4.3

classification 💻 cs.DB cs.AI

keywords semantic filteringLLM cascadesadaptive two-phase methodsoft labelsconfidence calibrationproxy trainingdocument corpusyes/no predicates

0 comments

The pith

Adaptive two-phase cascade of clustering then confidence-trained proxy cuts LLM calls 1.6-2.0x at 90% accuracy target

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework for semantic filtering cascades and demonstrates that an adaptive two-phase method—model-free clustering first, followed by an online proxy only when needed—outperforms any single fixed cascade family. The proxy is trained on the oracle LLM's per-document confidence scores as soft labels rather than binary decisions, uses hybrid token-aware models, and applies a calibration that adds safety margin only where the labeled sample is sparse. On three 10K-document corpora this yields 1.6-2.0 times fewer oracle calls than the best prior method while meeting the 90% accuracy target on 95% of queries. The same confidence scores also serve as a query difficulty compass and establish a lower bound showing 4-20x additional headroom remains. Readers building LLM data pipelines would care because the technique directly reduces the dominant cost of evaluating yes/no predicates over large collections.

Core claim

The central claim is that composing clustering and an online proxy in two phases, training the proxy on the oracle's continuous per-document confidence scores as soft labels, replacing bi-encoders with hybrid token-aware models, and calibrating safety margins only on sparse samples produces a cascade that meets a 90% accuracy target on 95% of queries while using 1.6-2.0x fewer oracle calls than the best prior method on each of three 10K-document corpora. The oracle confidence scores are used for the first time as a query difficulty compass, a lower bound on minimum oracle calls any proxy-based cascade can achieve, and the proxy's soft training label.

What carries the argument

The adaptive two-phase cascade that applies model-free clustering first and invokes the online proxy only when clustering alone cannot guarantee the accuracy target, with oracle calls shared across phases and the proxy trained on soft labels from the oracle's per-document confidence.

Load-bearing premise

The oracle LLM's per-document confidence scores supply unbiased soft labels for proxy training and a reliable lower bound on minimum oracle calls even after the adaptive phase switch.

What would settle it

Measuring whether accuracy falls below the target on queries where the initial clustering step fails to separate positive and negative documents would show if the subsequent proxy phase preserves the guarantee without extra oracle cost.

Figures

Figures reproduced from arXiv: 2606.08090 by Anastasia Ailamaki, Kyoungmin Kim, Martin Catheland.

**Figure 2.** Figure 2: The unified framework as a DAG. The six steps of Algorithm 1 arranged left to right. Blue boxes are document sets; [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Method-by-step matrix view. Rows are methods; columns are the six steps. Cell color identifies the family (red = [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Three model architectures. (a) Bi-encoder: query [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: T-SNE visualization on raw embedding space (left) [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Target accuracy vs. end-to-end cost. Curves further [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Per-query end-to-end time breakdown at 𝛼 =0.9. Bars stack proxy time and oracle-LLM time across five segments: proxy train/score, Phase-1 sample labeling, training-set labeling, calibration labeling, and cascade. The cross-hatched cascade segment dominates everywhere; Phase-2 and Two-Phase shrink it on essentially every query; Two-Phase additionally removes the training-labeling segment on Phase-1-resolved… view at source ↗

**Figure 8.** Figure 8: Per-query lower-envelope on PubMed at 𝛼 = 0.9. Queries are sorted along the 𝑥-axis by the per-query cheapest deployable plan (the dashed envelope); 𝑦-axis is end-to-end time. Two-Phase (gray) tracks the envelope across the whole axis; CSV (red) wins outright on the leftmost ∼6 queries and then explodes by 5–25× on the rest; ScaleDoc (teal) and BARGAIN (orange) sit far above the envelope almost everywhere … view at source ↗

**Figure 9.** Figure 9: BER as a compass for which family wins, PubMed. [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

read the original abstract

Evaluating a natural-language yes/no predicate over a document corpus under an accuracy target - the semantic filter - is a cornerstone of LLM-based data processing. Calling the LLM on every document (the oracle) is prohibitive, so cascades pair the oracle with a fast proxy. As deployed today, they leave four limitations on the table. (1) Each cascade family - model-free clustering, prebuilt small-LLM proxies, online-trained proxies - commits to a single representation and pipeline, and wins on only a narrow query regime. (2) The strongest online proxy invests in a custom training scheme on a bi-encoder over dense embeddings, missing the token-level evidence richer predicates require. (3) The proxy is trained against binary yes/no labels, wasting the LLM's per-document confidence at the boundary documents it most needs to learn. (4) Existing calibrations add a uniform safety margin, conflating genuine proxy uncertainty with small-sample noise and inflating cascade cost. We address these by (1) composing families adaptively - model-free clustering first, online proxy only when needed, with oracle calls shared across phases; (2) replacing the cosine bi-encoder with a hybrid of off-the-shelf token-aware models; (3) training the proxy with the oracle's per-document confidence as a soft label; and (4) a calibration that adds the safety margin only where the labeled sample is sparse. We are also the first to use the oracle's per-document confidence for three purposes: a query-level difficulty compass, a lower bound on the minimum oracle calls any proxy-based cascade can make, and the proxy's soft training label. At a 90% accuracy target on three 10K-document corpora, our methods are 1.6-2.0x faster than the best prior method per corpus and meet the target on 95% of queries; the BER-derived lower bound indicates a further ~4-20x of headroom for future work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The adaptive composition and oracle-confidence tricks are the real novelty, but the abstract gives zero experimental details so the 1.6-2x speedup and lower-bound claims are impossible to assess.

read the letter

The paper's core move is to stop committing to one cascade family and instead run model-free clustering first, then fall back to an online-trained proxy only when needed, sharing the oracle calls across phases. It also swaps the usual bi-encoder for a hybrid token-aware proxy, trains that proxy on the oracle's per-document confidence as soft labels instead of binary labels, and replaces uniform safety margins with a calibration that only adds margin where the sample is sparse. They further claim to be first in using oracle confidence as a query difficulty compass, a lower bound (BER) on any proxy cascade's oracle calls, and the soft training signal.

Those four targeted fixes line up directly with the four limitations they list, and the claimed result is 1.6-2.0x faster than the best prior method at 90% accuracy on three 10k-document sets while hitting the target on 95% of queries, with the BER bound suggesting another 4-20x headroom.

The obvious soft spot is that the abstract contains no experimental section at all—no baseline descriptions, no dataset characteristics, no statistical tests, no ablation on the adaptive switch—so none of the speed or accuracy numbers can be checked. The stress-test concern about the composition possibly violating the lower bound on queries where clustering fails also looks live from the abstract alone; nothing there shows how the phases interact or that the bound still holds after an uninformative cluster sample triggers the proxy phase.

This is aimed at people building LLM data pipelines inside databases. If the full paper supplies the missing experiments and directly tests the lower-bound preservation, it is worth a serious referee. Otherwise the empirical claims stay unverified.

Referee Report

2 major / 2 minor

Summary. The paper introduces a unified framework for semantic filtering (evaluating natural-language yes/no predicates over document corpora under an accuracy target) and proposes an adaptive two-phase method that begins with model-free clustering, invokes an online-trained proxy only when needed, and shares oracle LLM calls across phases. It replaces cosine bi-encoder proxies with hybrid token-aware models, trains the proxy using the oracle's per-document confidence as soft labels, and introduces a calibration that applies safety margins only on sparse samples. The oracle confidence is also used as a query difficulty compass and as the basis for a BER lower bound on minimum oracle calls. On three 10K-document corpora at a 90% accuracy target the method achieves 1.6-2.0x speedups over the best prior per-corpus baseline while meeting the target on 95% of queries, with the BER bound indicating 4-20x additional headroom.

Significance. If the empirical speedups, accuracy rates, and the claim that the adaptive composition preserves the BER lower bound all hold, the work would meaningfully improve the practicality of LLM-based semantic queries in database systems by moving beyond single-family cascades and by extracting more signal from oracle confidence scores. The lower-bound construction, if rigorously established, would also supply a useful benchmark for future proxy-cascade research.

major comments (2)

[Abstract and §3] Abstract and §3 (method): the claim that the adaptive two-phase construction (clustering first, then online proxy) inherits the BER lower bound on oracle calls is not demonstrated. No argument or proof sketch shows that the composition avoids new failure modes precisely on queries where the initial clustering sample is uninformative; if such queries exist, both the reported 1.6-2.0x speedup and the 4-20x headroom can be invalidated while the nominal accuracy target is still met on the remaining queries.
[§5] §5 (experiments): the headline speedups at 90% accuracy rest on the oracle confidence scores serving as unbiased soft labels for proxy training; the manuscript must supply an explicit check (e.g., calibration plots or bias metrics on held-out oracle scores) that this assumption holds across the three corpora, because any systematic bias would directly undermine both the proxy quality and the claimed speedups.

minor comments (2)

[Abstract] Abstract: the experimental claims (speedups, accuracy rates, 95% query coverage) are stated without any mention of dataset characteristics, baseline implementations, or statistical significance tests; these details belong in the abstract or a dedicated results paragraph.
[Abstract] Notation: the acronym BER is introduced without an explicit expansion or equation reference on first use.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (method): the claim that the adaptive two-phase construction (clustering first, then online proxy) inherits the BER lower bound on oracle calls is not demonstrated. No argument or proof sketch shows that the composition avoids new failure modes precisely on queries where the initial clustering sample is uninformative; if such queries exist, both the reported 1.6-2.0x speedup and the 4-20x headroom can be invalidated while the nominal accuracy target is still met on the remaining queries.

Authors: We agree that the manuscript does not contain an explicit argument or proof sketch establishing that the adaptive two-phase construction inherits the BER lower bound without new failure modes on queries where the initial clustering sample is uninformative. In the revision we will add a proof sketch in §3 showing that the composition preserves the bound: the proxy phase activates only after clustering fails to meet the accuracy target on its sample, oracle calls are shared, and the BER construction applies directly to the combined labeled set. revision: yes
Referee: [§5] §5 (experiments): the headline speedups at 90% accuracy rest on the oracle confidence scores serving as unbiased soft labels for proxy training; the manuscript must supply an explicit check (e.g., calibration plots or bias metrics on held-out oracle scores) that this assumption holds across the three corpora, because any systematic bias would directly undermine both the proxy quality and the claimed speedups.

Authors: We agree that an explicit check is required. The revision will add calibration plots and bias metrics (e.g., expected calibration error and mean absolute deviation) on held-out oracle confidence scores for each of the three corpora to verify that the soft-label assumption holds. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical comparisons only

full rationale

The paper describes an adaptive two-phase semantic filtering method that composes model-free clustering with online proxy training, using oracle per-document confidence scores for training labels, difficulty assessment, and a claimed lower bound on oracle calls. All performance claims (1.6-2.0x speedups at 90% accuracy) are presented as direct empirical measurements against prior cascades on fixed 10K-document corpora, with no equations, derivations, or fitted parameters that reduce the reported quantities to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in the provided text to justify core results. The method is self-contained via experimental validation rather than any definitional or predictive reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all technical details on training, calibration, and clustering are absent.

pith-pipeline@v0.9.1-grok · 5902 in / 1056 out tokens · 17855 ms · 2026-06-27T19:03:24.755155+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

50 extracted references · 12 canonical work pages · 2 internal anchors

[1]

Angelopoulos and Stephen Bates

Anastasios N. Angelopoulos and Stephen Bates. 2021. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.CoRR abs/2107.07511 (2021). arXiv:2107.07511 https://arxiv.org/abs/2107.07511

Pith/arXiv arXiv 2021
[2]

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset.CoRR abs/1611.09268 (2016)

Pith/arXiv arXiv 2016
[3]

Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, and Michael I. Jordan. 2021. Distribution-free, Risk-controlling Prediction Sets.J. ACM68, 6 (2021), 43:1–43:34. doi:10.1145/3478535

work page doi:10.1145/3478535 2021
[4]

Bertsekas

Dimitri P. Bertsekas. 1999.Nonlinear Programming(2nd ed.). Athena Scientific, Belmont, MA

1999
[5]

Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adap- tive Neural Networks for Efficient Inference. InProceedings of the 34th Inter- national Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Research). PMLR, 527–536. http://proceedings.mlr.press/v70/bolukbasi17a.html

2017
[6]

2004.Convex Optimization

Stephen Boyd and Lieven Vandenberghe. 2004.Convex Optimization. Cambridge University Press

2004
[7]

Brown, T

Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. 2001. Interval Estima- tion for a Binomial Proportion.Statist. Sci.16, 2 (2001), 101–133

2001
[8]

Lingjiao Chen, Matei Zaharia, and James Zou. 2024. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance.Trans. Mach. Learn. Res.2024 (2024). https://openreview.net/forum?id=cSimKw5p6R

2024
[9]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning (ICML)

2020
[10]

Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier, Yves- Laurent Kom Samo, Pushkar Kadilkar, Xianshun Chen, Sam Idicula, Fatma Özcan, Alon Halevy, and Yannis Papakonstantinou. 2026. 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models.CoRRabs/2603.15970 (2026). arXiv:2603.15970 https:...

Pith/arXiv arXiv 2026
[11]

C. J. Clopper and E. S. Pearson. 1934. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial.Biometrika26, 4 (1934), 404–413. doi:10. 2307/2331986

1934
[12]

Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan

Andrew Cotter, Heinrich Jiang, Maya R. Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan. 2019. Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals.Journal of Machine Learning Research20 (2019), 1–59

2019
[13]

Voorhees

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. 2020. Overview of the TREC 2019 Deep Learning Track. InThe Twenty- Eighth Text REtrieval Conference (TREC) Proceedings

2020
[14]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Associa- tion for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, ...

work page doi:10.18653/v1/n19-1423 2019
[15]

1990.Introduction to Statistical Pattern Recognition(2 ed.)

Keinosuke Fukunaga. 1990.Introduction to Statistical Pattern Recognition(2 ed.). Academic Press. doi:10.1016/C2009-0-27872-X

work page doi:10.1016/c2009-0-27872-x 1990
[16]

Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. InProceedings of the 2021 Conference on Em- pirical Methods in Natural Language Processing (EMNLP). Conference’17, July 2017, Washington, DC, USA Kyoungmin Kim, Martin Catheland, and Anastasia Ailamaki

2021
[17]

Yonatan Geifman and Ran El-Yaniv. 2017. Selective Classification for Deep Neural Networks. InAdvances in Neural Information Processing Systems (NeurIPS) 30. arXiv:1705.08500 https://proceedings.neurips.cc/paper_files/paper/2017/hash/ 4a8423d5e91fda00bb7e46540e2b0cf1-Abstract.html

Pith/arXiv arXiv 2017
[18]

Isaac Gibbs and Emmanuel J. Candès. 2021. Adaptive Conformal Inference Under Distribution Shift. InAdvances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 1660–1672. https://proceedings.neurips.cc/ paper/2021/hash/0d441de75945e5acbc865406fc9a2559-Abs...

2021
[19]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network.CoRRabs/1503.02531 (2015). https://arxiv.org/abs/1503.02531

Pith/arXiv arXiv 2015
[20]

Hou et al

A. Hou et al . 2026. Beyond Linear LLM Invocation: An Efficient and Effec- tive Semantic Filter Paradigm (CSV / UniCSV / SimCSV).Proceedings of the VLDB Endowment(2026). https://arxiv.org/abs/2603.04799 To appear; preprint arXiv:2603.04799

arXiv 2026
[21]

Daniel Kang, Edward Gan, Peter Bailis, Tatsunori Hashimoto, and Matei Zaharia
[22]

InProceedings of the VLDB Endowment, Vol

Approximate Selection with Guarantees using Proxies. InProceedings of the VLDB Endowment, Vol. 13. 1990–2003. doi:10.14778/3407790.3407811

work page doi:10.14778/3407790.3407811 1990
[23]

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen tau Yih. 2020. Dense Passage Retrieval for Open- Domain Question Answering. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2020
[24]

Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. InProceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020. ACM, 39–48. doi:10. 1145/3397271.3401075

arXiv 2020
[25]

Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, and Samuel Madden. 2026. KEN: An Execution Engine for Unstructured Database Systems. Proc. VLDB Endow.19, 5 (2026), 902–916. doi:10.14778/3796195.3796204

work page doi:10.14778/3796195.3796204 2026
[26]

Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. 2025. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=lgsyLSsDRe

2025
[27]

2021.Pretrained Transformers for Text Ranking: BERT and Beyond

Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. 2021.Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool Publishers

2021
[28]

Chunwei Liu et al . 2024. Palimpzest: Optimizing AI-Powered Analytics with Semantic Operators. InConference on Innovative Data Systems Research (CIDR)

2024
[29]

Shichen Liu, Fei Xiao, Wenwu Ou, and Luo Si. 2017. Cascade Ranking for Opera- tional E-commerce Search. InProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017. ACM, 1557–1565. doi:10.1145/3097983.3098011

work page doi:10.1145/3097983.3098011 2017
[30]

Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, and Surajit Chaudhuri. 2018. Accelerating Machine Learning Inference with Probabilistic Predicates. InPro- ceedings of the 2018 International Conference on Management of Data, SIG- MOD Conference 2018, Houston, TX, USA, June 10-15, 2018. ACM, 1493–1508. doi:10.1145/3183713.3183751

work page doi:10.1145/3183713.3183751 2018
[31]

Qiuyang Mang, Yufan Xiang, Hangrui Zhou, Runyuan He, Jiaxiang Yu, Hanchen Li, Aditya Parameswaran, and Alvin Cheung. 2026. PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans.CoRRabs/2604.09944 (2026). arXiv:2604.09944 https://arxiv.org/abs/2604.09944

Pith/arXiv arXiv 2026
[32]

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. 2023. MTEB: Massive Text Embedding Benchmark. InProceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)

2023
[33]

Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. CoRRabs/1901.04085 (2019). arXiv:1901.04085 http://arxiv.org/abs/1901.04085

Pith/arXiv arXiv 2019
[34]

Northcutt, Lu Jiang, and Isaac L

Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang. 2021. Confident Learning: Estimating Uncertainty in Dataset Labels.J. Artif. Intell. Res.70 (2021), 1373–1411. doi:10.1613/JAIR.1.12125

work page doi:10.1613/jair.1.12125 2021
[35]

Liana Patel et al . 2024. LOTUS: Enabling Semantic Queries with LLMs over Tables. InProceedings of the VLDB Endowment

2024
[36]

Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey E. Hinton. 2017. Regularizing Neural Networks by Penalizing Confident Output Distributions. InWorkshop Track of the 5th International Conference on Learning Representations (ICLR)

2017
[37]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

2019
[38]

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2022. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. InProceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

2022
[39]

Parameswaran, and Eugene Wu

Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, and Eugene Wu. 2025. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing.Proc. VLDB Endow.18 (2025). arXiv:2410.12189 doi:10. 14778/3746405.3746426

arXiv 2025
[40]

Panagiotis Sioulas, Viktor Sanca, Ioannis Mytilinis, and Anastasia Ailamaki. 2021. Accelerating Complex Analytics using Speculation. InConference on Innova- tive Data Systems Research (CIDR). https://www.cidrdb.org/cidr2021/papers/ cidr2021_paper03.pdf

2021
[41]

Llama Team. 2024. The Llama 3 Herd of Models.CoRRabs/2407.21783 (2024). arXiv:2407.21783 doi:10.48550/ARXIV.2407.21783

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.21783 2024
[42]

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. InProceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks

2021
[43]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lam- ple. 2023. LLaMA: Open and Efficient Foundation Language Models.CoRR abs/2302.13971 (2023)

Pith/arXiv arXiv 2023
[44]

2005.Algorithmic Learning in a Random World

Vladimir Vovk, Alex Gammerman, and Glenn Shafer. 2005.Algorithmic Learning in a Random World. Springer. doi:10.1007/b106715

work page doi:10.1007/b106715 2005
[45]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training.CoRRabs/2212.03533 (2022)

Pith/arXiv arXiv 2022
[46]

Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, and Wenjuan Han. 2023. Zero-Shot Information Extraction via Chatting with ChatGPT.CoRR abs/2302.10205 (2023). arXiv:2302.10205 doi:10.48550/ARXIV.2302.10205

work page doi:10.48550/arxiv.2302.10205 2023
[47]

Edwin B. Wilson. 1927. Probable Inference, the Law of Succession, and Statistical Inference.J. Amer. Statist. Assoc.22, 158 (1927), 209–212

1927
[48]

Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse, and Aymeric Dieuleveut. 2022. Adaptive Conformal Predictions for Time Series. InInter- national Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research). PMLR, 25834–25866. https://proceedings.mlr.press/v162/zaffran22a.html

2022
[49]

Sepanta Zeighami, Shreya Shankar, and Aditya Parameswaran. 2025. BAR- GAIN: Budget-Aware Risk-Adaptive Guarantees for Accuracy-Improved LLM Cascades over Documents.Proceedings of the ACM SIGMOD International Con- ference on Management of Data(2025). https://arxiv.org/abs/2509.02896 Preprint arXiv:2509.02896

arXiv 2025
[50]

Hengrui Zhang, Yulong Hui, Yihao Liu, and Huanchen Zhang. 2025. Scale- Doc: Scaling LLM-based Predicates over Large Document Collections.CoRR abs/2509.12610 (2025). arXiv:2509.12610 doi:10.48550/ARXIV.2509.12610

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.12610 2025

[1] [1]

Angelopoulos and Stephen Bates

Anastasios N. Angelopoulos and Stephen Bates. 2021. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.CoRR abs/2107.07511 (2021). arXiv:2107.07511 https://arxiv.org/abs/2107.07511

Pith/arXiv arXiv 2021

[2] [2]

Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, and Tong Wang. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset.CoRR abs/1611.09268 (2016)

Pith/arXiv arXiv 2016

[3] [3]

Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, and Michael I. Jordan. 2021. Distribution-free, Risk-controlling Prediction Sets.J. ACM68, 6 (2021), 43:1–43:34. doi:10.1145/3478535

work page doi:10.1145/3478535 2021

[4] [4]

Bertsekas

Dimitri P. Bertsekas. 1999.Nonlinear Programming(2nd ed.). Athena Scientific, Belmont, MA

1999

[5] [5]

Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adap- tive Neural Networks for Efficient Inference. InProceedings of the 34th Inter- national Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Research). PMLR, 527–536. http://proceedings.mlr.press/v70/bolukbasi17a.html

2017

[6] [6]

2004.Convex Optimization

Stephen Boyd and Lieven Vandenberghe. 2004.Convex Optimization. Cambridge University Press

2004

[7] [7]

Brown, T

Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. 2001. Interval Estima- tion for a Binomial Proportion.Statist. Sci.16, 2 (2001), 101–133

2001

[8] [8]

Lingjiao Chen, Matei Zaharia, and James Zou. 2024. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance.Trans. Mach. Learn. Res.2024 (2024). https://openreview.net/forum?id=cSimKw5p6R

2024

[9] [9]

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning (ICML)

2020

[10] [10]

Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier, Yves- Laurent Kom Samo, Pushkar Kadilkar, Xianshun Chen, Sam Idicula, Fatma Özcan, Alon Halevy, and Yannis Papakonstantinou. 2026. 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models.CoRRabs/2603.15970 (2026). arXiv:2603.15970 https:...

Pith/arXiv arXiv 2026

[11] [11]

C. J. Clopper and E. S. Pearson. 1934. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial.Biometrika26, 4 (1934), 404–413. doi:10. 2307/2331986

1934

[12] [12]

Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan

Andrew Cotter, Heinrich Jiang, Maya R. Gupta, Serena Wang, Taman Narayan, Seungil You, and Karthik Sridharan. 2019. Optimization with Non-Differentiable Constraints with Applications to Fairness, Recall, Churn, and Other Goals.Journal of Machine Learning Research20 (2019), 1–59

2019

[13] [13]

Voorhees

Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, and Ellen M. Voorhees. 2020. Overview of the TREC 2019 Deep Learning Track. InThe Twenty- Eighth Text REtrieval Conference (TREC) Proceedings

2020

[14] [14]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Associa- tion for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, ...

work page doi:10.18653/v1/n19-1423 2019

[15] [15]

1990.Introduction to Statistical Pattern Recognition(2 ed.)

Keinosuke Fukunaga. 1990.Introduction to Statistical Pattern Recognition(2 ed.). Academic Press. doi:10.1016/C2009-0-27872-X

work page doi:10.1016/c2009-0-27872-x 1990

[16] [16]

Tianyu Gao, Xingcheng Yao, and Danqi Chen. 2021. SimCSE: Simple Contrastive Learning of Sentence Embeddings. InProceedings of the 2021 Conference on Em- pirical Methods in Natural Language Processing (EMNLP). Conference’17, July 2017, Washington, DC, USA Kyoungmin Kim, Martin Catheland, and Anastasia Ailamaki

2021

[17] [17]

Yonatan Geifman and Ran El-Yaniv. 2017. Selective Classification for Deep Neural Networks. InAdvances in Neural Information Processing Systems (NeurIPS) 30. arXiv:1705.08500 https://proceedings.neurips.cc/paper_files/paper/2017/hash/ 4a8423d5e91fda00bb7e46540e2b0cf1-Abstract.html

Pith/arXiv arXiv 2017

[18] [18]

Isaac Gibbs and Emmanuel J. Candès. 2021. Adaptive Conformal Inference Under Distribution Shift. InAdvances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 1660–1672. https://proceedings.neurips.cc/ paper/2021/hash/0d441de75945e5acbc865406fc9a2559-Abs...

2021

[19] [19]

Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the Knowledge in a Neural Network.CoRRabs/1503.02531 (2015). https://arxiv.org/abs/1503.02531

Pith/arXiv arXiv 2015

[20] [20]

Hou et al

A. Hou et al . 2026. Beyond Linear LLM Invocation: An Efficient and Effec- tive Semantic Filter Paradigm (CSV / UniCSV / SimCSV).Proceedings of the VLDB Endowment(2026). https://arxiv.org/abs/2603.04799 To appear; preprint arXiv:2603.04799

arXiv 2026

[21] [21]

Daniel Kang, Edward Gan, Peter Bailis, Tatsunori Hashimoto, and Matei Zaharia

[22] [22]

InProceedings of the VLDB Endowment, Vol

Approximate Selection with Guarantees using Proxies. InProceedings of the VLDB Endowment, Vol. 13. 1990–2003. doi:10.14778/3407790.3407811

work page doi:10.14778/3407790.3407811 1990

[23] [23]

Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen tau Yih. 2020. Dense Passage Retrieval for Open- Domain Question Answering. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2020

[24] [24]

Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. InProceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020. ACM, 39–48. doi:10. 1145/3397271.3401075

arXiv 2020

[25] [25]

Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, and Samuel Madden. 2026. KEN: An Execution Engine for Unstructured Database Systems. Proc. VLDB Endow.19, 5 (2026), 902–916. doi:10.14778/3796195.3796204

work page doi:10.14778/3796195.3796204 2026

[26] [26]

Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. 2025. NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models. InThe Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id=lgsyLSsDRe

2025

[27] [27]

2021.Pretrained Transformers for Text Ranking: BERT and Beyond

Jimmy Lin, Rodrigo Nogueira, and Andrew Yates. 2021.Pretrained Transformers for Text Ranking: BERT and Beyond. Morgan & Claypool Publishers

2021

[28] [28]

Chunwei Liu et al . 2024. Palimpzest: Optimizing AI-Powered Analytics with Semantic Operators. InConference on Innovative Data Systems Research (CIDR)

2024

[29] [29]

Shichen Liu, Fei Xiao, Wenwu Ou, and Luo Si. 2017. Cascade Ranking for Opera- tional E-commerce Search. InProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017. ACM, 1557–1565. doi:10.1145/3097983.3098011

work page doi:10.1145/3097983.3098011 2017

[30] [30]

Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, and Surajit Chaudhuri. 2018. Accelerating Machine Learning Inference with Probabilistic Predicates. InPro- ceedings of the 2018 International Conference on Management of Data, SIG- MOD Conference 2018, Houston, TX, USA, June 10-15, 2018. ACM, 1493–1508. doi:10.1145/3183713.3183751

work page doi:10.1145/3183713.3183751 2018

[31] [31]

Qiuyang Mang, Yufan Xiang, Hangrui Zhou, Runyuan He, Jiaxiang Yu, Hanchen Li, Aditya Parameswaran, and Alvin Cheung. 2026. PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans.CoRRabs/2604.09944 (2026). arXiv:2604.09944 https://arxiv.org/abs/2604.09944

Pith/arXiv arXiv 2026

[32] [32]

Niklas Muennighoff, Nouamane Tazi, Loïc Magne, and Nils Reimers. 2023. MTEB: Massive Text Embedding Benchmark. InProceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)

2023

[33] [33]

Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT. CoRRabs/1901.04085 (2019). arXiv:1901.04085 http://arxiv.org/abs/1901.04085

Pith/arXiv arXiv 2019

[34] [34]

Northcutt, Lu Jiang, and Isaac L

Curtis G. Northcutt, Lu Jiang, and Isaac L. Chuang. 2021. Confident Learning: Estimating Uncertainty in Dataset Labels.J. Artif. Intell. Res.70 (2021), 1373–1411. doi:10.1613/JAIR.1.12125

work page doi:10.1613/jair.1.12125 2021

[35] [35]

Liana Patel et al . 2024. LOTUS: Enabling Semantic Queries with LLMs over Tables. InProceedings of the VLDB Endowment

2024

[36] [36]

Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, and Geoffrey E. Hinton. 2017. Regularizing Neural Networks by Penalizing Confident Output Distributions. InWorkshop Track of the 5th International Conference on Learning Representations (ICLR)

2017

[37] [37]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. InProceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

2019

[38] [38]

Keshav Santhanam, Omar Khattab, Jon Saad-Falcon, Christopher Potts, and Matei Zaharia. 2022. ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction. InProceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

2022

[39] [39]

Parameswaran, and Eugene Wu

Shreya Shankar, Tristan Chambers, Tarak Shah, Aditya G. Parameswaran, and Eugene Wu. 2025. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing.Proc. VLDB Endow.18 (2025). arXiv:2410.12189 doi:10. 14778/3746405.3746426

arXiv 2025

[40] [40]

Panagiotis Sioulas, Viktor Sanca, Ioannis Mytilinis, and Anastasia Ailamaki. 2021. Accelerating Complex Analytics using Speculation. InConference on Innova- tive Data Systems Research (CIDR). https://www.cidrdb.org/cidr2021/papers/ cidr2021_paper03.pdf

2021

[41] [41]

Llama Team. 2024. The Llama 3 Herd of Models.CoRRabs/2407.21783 (2024). arXiv:2407.21783 doi:10.48550/ARXIV.2407.21783

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.21783 2024

[42] [42]

Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. InProceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks

2021

[43] [43]

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lam- ple. 2023. LLaMA: Open and Efficient Foundation Language Models.CoRR abs/2302.13971 (2023)

Pith/arXiv arXiv 2023

[44] [44]

2005.Algorithmic Learning in a Random World

Vladimir Vovk, Alex Gammerman, and Glenn Shafer. 2005.Algorithmic Learning in a Random World. Springer. doi:10.1007/b106715

work page doi:10.1007/b106715 2005

[45] [45]

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. 2022. Text Embeddings by Weakly-Supervised Contrastive Pre-training.CoRRabs/2212.03533 (2022)

Pith/arXiv arXiv 2022

[46] [46]

Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, and Wenjuan Han. 2023. Zero-Shot Information Extraction via Chatting with ChatGPT.CoRR abs/2302.10205 (2023). arXiv:2302.10205 doi:10.48550/ARXIV.2302.10205

work page doi:10.48550/arxiv.2302.10205 2023

[47] [47]

Edwin B. Wilson. 1927. Probable Inference, the Law of Succession, and Statistical Inference.J. Amer. Statist. Assoc.22, 158 (1927), 209–212

1927

[48] [48]

Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse, and Aymeric Dieuleveut. 2022. Adaptive Conformal Predictions for Time Series. InInter- national Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research). PMLR, 25834–25866. https://proceedings.mlr.press/v162/zaffran22a.html

2022

[49] [49]

Sepanta Zeighami, Shreya Shankar, and Aditya Parameswaran. 2025. BAR- GAIN: Budget-Aware Risk-Adaptive Guarantees for Accuracy-Improved LLM Cascades over Documents.Proceedings of the ACM SIGMOD International Con- ference on Management of Data(2025). https://arxiv.org/abs/2509.02896 Preprint arXiv:2509.02896

arXiv 2025

[50] [50]

Hengrui Zhang, Yulong Hui, Yihao Liu, and Huanchen Zhang. 2025. Scale- Doc: Scaling LLM-based Predicates over Large Document Collections.CoRR abs/2509.12610 (2025). arXiv:2509.12610 doi:10.48550/ARXIV.2509.12610

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2509.12610 2025