pith. sign in

arxiv: 2604.19753 · v1 · submitted 2026-03-20 · 💻 cs.AI · cs.CL· cs.LG

Algorithm Selection with Zero Domain Knowledge via Text Embeddings

Pith reviewed 2026-05-15 08:40 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.LG
keywords algorithm selectiontext embeddingspretrained modelszero domain knowledgeweighted k-nearest neighborsASlib scenarioscombinatorial problems
0
0 comments X

The pith

Pretrained text embeddings enable algorithm selection across domains without hand-crafted features or domain knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a fixed three-step pipeline can select algorithms effectively by treating raw problem instance files as plain text, embedding them with a pretrained model, and choosing via weighted k-nearest neighbors. This replaces the usual requirement for domain-specific feature engineering and works uniformly on SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems. Experiments across 11 ASlib scenarios show the method beats a random forest trained on hand-crafted features in 10 scenarios with one configuration and in all 11 with two-seed voting, often by large margins. An ablation study isolates inverse-distance weighting, line shuffling, and Manhattan distance as the key design choices that make the pipeline succeed.

Core claim

Our approach, ZeroFolio, reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key observation is that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training, allowing the same serialize-embed-select pipeline to apply across diverse problem domains with text-based instance formats.

What carries the argument

The ZeroFolio pipeline of serializing the instance to plain text, embedding it with a pretrained model, and performing weighted k-nearest-neighbor selection using those embeddings.

If this is right

  • The same fixed pipeline applies without modification to any new domain whose instances have a text format.
  • Inverse-distance weighting, line shuffling, and Manhattan distance are required for the observed performance gains.
  • Soft voting between the embedding selector and a traditional feature-based selector can improve accuracy on scenarios where both perform competitively.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method implies that general text embeddings already encode enough structural signal to replace manual feature design in combinatorial problem domains.
  • It opens the possibility of rapidly building selectors for entirely new problem classes by simply collecting instance files and running the same pipeline.
  • One could test whether converting non-text instance formats into textual descriptions would let the approach extend beyond the current seven domains.

Load-bearing premise

Pretrained embeddings produce representations that distinguish problem instances in a way that correlates with which algorithm solves them best.

What would settle it

A new set of instances from one of the seven domains where the embedding-based selector chooses worse algorithms than the hand-crafted-feature random forest on the majority of cases.

Figures

Figures reproduced from arXiv: 2604.19753 by Stefan Szeider.

Figure 1
Figure 1. Figure 1: Percentage of the SBS–VBS gap closed. RF on handcrafted features; Zero￾Folio. ZeroFolio closes a larger fraction on all 11 scenarios; RF falls below SBS on two. We propose an approach that does not require feature engineering. The basic idea is to read the raw instance file as plain text, embed it with a pretrained text embedding model, and subsequently select an algorithm via weighted k-nearest neighbors.… view at source ↗
Figure 2
Figure 2. Figure 2: The ZeroFolio pipeline: serialize, embed, select. The same three steps apply [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: t-SNE projection of Gemini embeddings for SAT12-ALL (1367 instances), colored [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗
read the original abstract

We propose a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings. Our method, ZeroFolio, proceeds in three steps: it reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key to our approach is the observation that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training. This allows us to apply the same three-step pipeline (serialize, embed, select) across diverse problem domains with text-based instance formats. We evaluate our approach on 11 ASlib scenarios spanning 7 domains (SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems). Our experiments show that this approach outperforms a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting; the margin is often substantial. Our ablation study shows that inverse-distance weighting, line shuffling, and Manhattan distance are the key design choices. On scenarios where both selectors are competitive, combining embeddings with hand-crafted features via soft voting yields further improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes ZeroFolio, a domain-agnostic algorithm selection method that serializes raw instance files as plain text, embeds them using a fixed pretrained text embedding model, and applies weighted k-nearest neighbors to select the best algorithm. It evaluates this pipeline on 11 ASlib scenarios across SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems, claiming consistent outperformance over a random forest baseline trained on hand-crafted features (10/11 scenarios with one fixed configuration, all 11 with two-seed voting) and further gains from soft voting with hand-crafted features.

Significance. If the empirical results hold under tighter controls, the work would be significant for demonstrating that general-purpose pretrained embeddings can serve as effective, zero-knowledge instance representations for algorithm selection. This could reduce reliance on domain-specific feature engineering and enable rapid deployment to new problem classes with text-serializable instances. The ablation identifying inverse-distance weighting, line shuffling, and Manhattan distance as key choices adds practical value.

major comments (3)
  1. [§4] §4 (Experiments) and abstract: The reported outperformance margins lack statistical significance tests, error bars, or variance across multiple runs; without these, it is impossible to determine whether the gains over the random forest baseline are reliable or could be explained by random variation in the 11 scenarios.
  2. [§3.1] §3.1 and §5.1: The central assumption that pretrained embeddings capture solver-relevant structure (rather than superficial textual artifacts such as variable naming or comment density) is not directly validated; an analysis correlating embedding distances with actual performance differences or hardness metrics would be required to rule out that the k-NN selector is succeeding on irrelevant surface features.
  3. [Table 2] Table 2 (or equivalent results table): The baseline random forest is described as trained on hand-crafted features, but no details are given on hyperparameter optimization, feature normalization, or whether the same cross-validation protocol was applied; unequal tuning effort could inflate the apparent advantage of the embedding approach.
minor comments (2)
  1. [§2] §2: The description of the embedding model (e.g., which specific pretrained model and dimensionality) should be stated explicitly rather than left as 'a pretrained embedding model'.
  2. [§5.2] §5.2 (Ablation): The line-shuffling ablation would benefit from a quantitative measure of how much performance drops when order is preserved, to clarify its contribution beyond qualitative description.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important aspects of empirical rigor and transparency that we will address in the revision. Below we respond point-by-point to the major comments.

read point-by-point responses
  1. Referee: [§4] §4 (Experiments) and abstract: The reported outperformance margins lack statistical significance tests, error bars, or variance across multiple runs; without these, it is impossible to determine whether the gains over the random forest baseline are reliable or could be explained by random variation in the 11 scenarios.

    Authors: We agree that statistical tests and variance reporting are necessary for reliable interpretation. The embedding pipeline is deterministic given fixed pretrained models and data splits, but we will add Wilcoxon signed-rank tests comparing ZeroFolio against the random forest across the 11 scenarios, report mean and standard deviation over the cross-validation folds, and include error bars in the revised results tables and abstract. These additions will appear in Section 4. revision: yes

  2. Referee: [§3.1] §3.1 and §5.1: The central assumption that pretrained embeddings capture solver-relevant structure (rather than superficial textual artifacts such as variable naming or comment density) is not directly validated; an analysis correlating embedding distances with actual performance differences or hardness metrics would be required to rule out that the k-NN selector is succeeding on irrelevant surface features.

    Authors: This concern is valid and our current evidence is primarily indirect via performance and ablations. In the revision we will add a new analysis (in Section 5 or an appendix) that computes Spearman correlations between embedding distances (under the chosen Manhattan metric) and instance hardness metrics or pairwise runtime differences from the ASlib data. We will also discuss potential surface-feature confounds and note any limitations of the correlation results. revision: yes

  3. Referee: [Table 2] Table 2 (or equivalent results table): The baseline random forest is described as trained on hand-crafted features, but no details are given on hyperparameter optimization, feature normalization, or whether the same cross-validation protocol was applied; unequal tuning effort could inflate the apparent advantage of the embedding approach.

    Authors: We acknowledge the omission. The random forest baseline used the identical 10-fold cross-validation protocol as ZeroFolio. Hyperparameters were selected via grid search on a held-out validation portion of each scenario, and features were standardized with z-score normalization. We will expand the experimental setup subsection (currently §4.1) with these details, including the exact hyperparameter ranges and library versions, to ensure full reproducibility and fair comparison. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical pipeline (serialize instance as text, embed with external pretrained model, apply weighted kNN) evaluated directly on 11 ASlib scenarios. Performance claims rest on experimental comparisons to random forests using hand-crafted features, with ablations on design choices such as inverse-distance weighting. No equations or derivations reduce a result to its own inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing premise depends on self-citation chains or uniqueness theorems. The central observation that pretrained embeddings distinguish instances is treated as an empirical premise verified by the reported results rather than assumed via internal definition.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests primarily on the domain assumption that general-purpose text embeddings distinguish computational problem instances, presented as an empirical observation rather than a derived result.

free parameters (2)
  • number of neighbors k
    Hyperparameter in the weighted k-nearest neighbors selection step, chosen as part of the fixed configuration.
  • distance metric
    Manhattan distance selected after ablation as a key design choice.
axioms (1)
  • domain assumption Pretrained text embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training.
    This is stated as the key observation enabling the zero-knowledge approach.

pith-pipeline@v0.9.0 · 5500 in / 1453 out tokens · 53305 ms · 2026-05-15T08:40:42.096648+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. [1]

    H., Hutter, F., Leyton - Brown, K., Tierney, K., and Vanschoren, J

    Bischl, B., Kerschke, P., Kotthoff, L., Lindauer, M., Malitsky, Y., Fr \' e chette, A., Hoos, H. H., Hutter, F., Leyton - Brown, K., Tierney, K., and Vanschoren, J. (2016). Aslib: A benchmark library for algorithm selection. Artif. Intell. , 237:41--58

  2. [2]

    Fatemi, B., Halcrow, J., and Perozzi, B. (2024). Talk like a graph: Encoding graphs for large language models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net

  3. [3]

    Gao, C., Shang, H., Xue, K., and Qian, C. (2025). Neural solver selection for combinatorial optimization. In Proceedings of the 42nd International Conference on Machine Learning ( ICML )

  4. [4]

    Kotthoff, L. (2016). Algorithm selection for combinatorial search problems: A survey. Data Min. Knowl. Discov. , 48(3):149--190

  5. [5]

    H., Hutter, F., and Schaub, T

    Lindauer, M., Hoos, H. H., Hutter, F., and Schaub, T. (2015). Autofolio: An automatically configured algorithm selector. J. Artif. Intell. Res. , 53:745--778

  6. [6]

    N., and Kotthoff, L

    Lindauer, M., van Rijn, J. N., and Kotthoff, L. (2019). The algorithm selection competitions 2015 and 2017. Artif. Intell. , 272:86--100

  7. [7]

    Liu, T., Amadini, R., Gabbrielli, M., and Mauro, J. (2021). sunny-as2: Enhancing SUNNY for algorithm selection. J. Artif. Intell. Res. , 72:329--376

  8. [8]

    Loreggia, A., Malitsky, Y., Samulowitz, H., and Saraswat, V. A. (2016). Deep learning for algorithm portfolios. In Proceedings of the 30th AAAI Conference on Artificial Intelligence , pages 1280--1286

  9. [9]

    H., Devkar, A., and Shoham, Y

    Nudelman, E., Leyton-Brown, K., Hoos, H. H., Devkar, A., and Shoham, Y. (2004). Understanding random SAT : Beyond the clauses-to-variables ratio. In Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming ( CP ) , pages 438--452

  10. [10]

    Pellegrino, A., Akg \" u n, \" O ., Dang, N., Kiziltan, Z., and Miguel, I. (2025). Transformer-based feature learning for algorithm selection in combinatorial optimisation. In 31st International Conference on Principles and Practice of Constraint Programming ( CP ) , volume 307 of LIPIcs , pages 31:1--31:19

  11. [11]

    Rice, J. R. (1976). The algorithm selection problem. In Advances in Computers , volume 15, pages 65--118. Elsevier

  12. [12]

    Salinas-Pinto, A., Alvarado-Ulloa, B., Hochbaum, D., Francia-Carrami \ n ana, M., \ N anculef, R., and As \' n-Ach \' a , R. (2024). Text-based feature-free automatic algorithm selection. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management ( KDIR ) , pages 267--274

  13. [13]

    and Hoos, H

    Shavit, H. and Hoos, H. H. (2024). Revisiting satzilla features in 2024. In Chakraborty, S. and Jiang, J. R., editors, 27th International Conference on Theory and Applications of Satisfiability Testing, SAT 2024, Pune, India, August 21-24, 2024 , volume 305 of LIPIcs , pages 27:1--27:26. Schloss Dagstuhl - Leibniz-Zentrum f \" u r Informatik

  14. [14]

    Wu, X., Zhong, Y., Wu, J., Jiang, B., and Tan, K. C. (2024). Large language model-enhanced algorithm selection: Towards comprehensive algorithm representation. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence ( IJCAI ) , pages 5243--5251

  15. [15]

    H., and Leyton - Brown, K

    Xu, L., Hutter, F., Hoos, H. H., and Leyton - Brown, K. (2008). Satzilla: Portfolio-based algorithm selection for SAT . J. Artif. Intell. Res. , 32:565--606

  16. [16]

    Yin, H., Kim, J., Mathur, P., Sarker, K., and Bansal, V. (2025). How to talk to language models: Serialization strategies for structured entity matching. In Chiruzzo, L., Ritter, A., and Wang, L., editors, Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29 - May 4, 2025 , volume NAACL 2025 of Find...

  17. [17]

    Zhang, Z., Ch \' e telat, D., Cotnareanu, J., Ghose, A., Xiao, W., Zhen, H.-L., Zhang, Y., Hao, J., Coates, M., and Yuan, M. (2024). GraSS : Combining graph neural networks with expert knowledge for SAT solver selection. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining ( KDD )