Algorithm Selection with Zero Domain Knowledge via Text Embeddings
Pith reviewed 2026-05-15 08:40 UTC · model grok-4.3
The pith
Pretrained text embeddings enable algorithm selection across domains without hand-crafted features or domain knowledge.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our approach, ZeroFolio, reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key observation is that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training, allowing the same serialize-embed-select pipeline to apply across diverse problem domains with text-based instance formats.
What carries the argument
The ZeroFolio pipeline of serializing the instance to plain text, embedding it with a pretrained model, and performing weighted k-nearest-neighbor selection using those embeddings.
If this is right
- The same fixed pipeline applies without modification to any new domain whose instances have a text format.
- Inverse-distance weighting, line shuffling, and Manhattan distance are required for the observed performance gains.
- Soft voting between the embedding selector and a traditional feature-based selector can improve accuracy on scenarios where both perform competitively.
Where Pith is reading between the lines
- The method implies that general text embeddings already encode enough structural signal to replace manual feature design in combinatorial problem domains.
- It opens the possibility of rapidly building selectors for entirely new problem classes by simply collecting instance files and running the same pipeline.
- One could test whether converting non-text instance formats into textual descriptions would let the approach extend beyond the current seven domains.
Load-bearing premise
Pretrained embeddings produce representations that distinguish problem instances in a way that correlates with which algorithm solves them best.
What would settle it
A new set of instances from one of the seven domains where the embedding-based selector chooses worse algorithms than the hand-crafted-feature random forest on the majority of cases.
Figures
read the original abstract
We propose a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings. Our method, ZeroFolio, proceeds in three steps: it reads the raw instance file as plain text, embeds it with a pretrained embedding model, and selects an algorithm via weighted k-nearest neighbors. The key to our approach is the observation that pretrained embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training. This allows us to apply the same three-step pipeline (serialize, embed, select) across diverse problem domains with text-based instance formats. We evaluate our approach on 11 ASlib scenarios spanning 7 domains (SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems). Our experiments show that this approach outperforms a random forest trained on hand-crafted features in 10 of 11 scenarios with a single fixed configuration, and in all 11 with two-seed voting; the margin is often substantial. Our ablation study shows that inverse-distance weighting, line shuffling, and Manhattan distance are the key design choices. On scenarios where both selectors are competitive, combining embeddings with hand-crafted features via soft voting yields further improvements.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ZeroFolio, a domain-agnostic algorithm selection method that serializes raw instance files as plain text, embeds them using a fixed pretrained text embedding model, and applies weighted k-nearest neighbors to select the best algorithm. It evaluates this pipeline on 11 ASlib scenarios across SAT, MaxSAT, QBF, ASP, CSP, MIP, and graph problems, claiming consistent outperformance over a random forest baseline trained on hand-crafted features (10/11 scenarios with one fixed configuration, all 11 with two-seed voting) and further gains from soft voting with hand-crafted features.
Significance. If the empirical results hold under tighter controls, the work would be significant for demonstrating that general-purpose pretrained embeddings can serve as effective, zero-knowledge instance representations for algorithm selection. This could reduce reliance on domain-specific feature engineering and enable rapid deployment to new problem classes with text-serializable instances. The ablation identifying inverse-distance weighting, line shuffling, and Manhattan distance as key choices adds practical value.
major comments (3)
- [§4] §4 (Experiments) and abstract: The reported outperformance margins lack statistical significance tests, error bars, or variance across multiple runs; without these, it is impossible to determine whether the gains over the random forest baseline are reliable or could be explained by random variation in the 11 scenarios.
- [§3.1] §3.1 and §5.1: The central assumption that pretrained embeddings capture solver-relevant structure (rather than superficial textual artifacts such as variable naming or comment density) is not directly validated; an analysis correlating embedding distances with actual performance differences or hardness metrics would be required to rule out that the k-NN selector is succeeding on irrelevant surface features.
- [Table 2] Table 2 (or equivalent results table): The baseline random forest is described as trained on hand-crafted features, but no details are given on hyperparameter optimization, feature normalization, or whether the same cross-validation protocol was applied; unequal tuning effort could inflate the apparent advantage of the embedding approach.
minor comments (2)
- [§2] §2: The description of the embedding model (e.g., which specific pretrained model and dimensionality) should be stated explicitly rather than left as 'a pretrained embedding model'.
- [§5.2] §5.2 (Ablation): The line-shuffling ablation would benefit from a quantitative measure of how much performance drops when order is preserved, to clarify its contribution beyond qualitative description.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight important aspects of empirical rigor and transparency that we will address in the revision. Below we respond point-by-point to the major comments.
read point-by-point responses
-
Referee: [§4] §4 (Experiments) and abstract: The reported outperformance margins lack statistical significance tests, error bars, or variance across multiple runs; without these, it is impossible to determine whether the gains over the random forest baseline are reliable or could be explained by random variation in the 11 scenarios.
Authors: We agree that statistical tests and variance reporting are necessary for reliable interpretation. The embedding pipeline is deterministic given fixed pretrained models and data splits, but we will add Wilcoxon signed-rank tests comparing ZeroFolio against the random forest across the 11 scenarios, report mean and standard deviation over the cross-validation folds, and include error bars in the revised results tables and abstract. These additions will appear in Section 4. revision: yes
-
Referee: [§3.1] §3.1 and §5.1: The central assumption that pretrained embeddings capture solver-relevant structure (rather than superficial textual artifacts such as variable naming or comment density) is not directly validated; an analysis correlating embedding distances with actual performance differences or hardness metrics would be required to rule out that the k-NN selector is succeeding on irrelevant surface features.
Authors: This concern is valid and our current evidence is primarily indirect via performance and ablations. In the revision we will add a new analysis (in Section 5 or an appendix) that computes Spearman correlations between embedding distances (under the chosen Manhattan metric) and instance hardness metrics or pairwise runtime differences from the ASlib data. We will also discuss potential surface-feature confounds and note any limitations of the correlation results. revision: yes
-
Referee: [Table 2] Table 2 (or equivalent results table): The baseline random forest is described as trained on hand-crafted features, but no details are given on hyperparameter optimization, feature normalization, or whether the same cross-validation protocol was applied; unequal tuning effort could inflate the apparent advantage of the embedding approach.
Authors: We acknowledge the omission. The random forest baseline used the identical 10-fold cross-validation protocol as ZeroFolio. Hyperparameters were selected via grid search on a held-out validation portion of each scenario, and features were standardized with z-score normalization. We will expand the experimental setup subsection (currently §4.1) with these details, including the exact hyperparameter ranges and library versions, to ensure full reproducibility and fair comparison. revision: yes
Circularity Check
No significant circularity
full rationale
The paper presents an empirical pipeline (serialize instance as text, embed with external pretrained model, apply weighted kNN) evaluated directly on 11 ASlib scenarios. Performance claims rest on experimental comparisons to random forests using hand-crafted features, with ablations on design choices such as inverse-distance weighting. No equations or derivations reduce a result to its own inputs by construction, no fitted parameters are relabeled as predictions, and no load-bearing premise depends on self-citation chains or uniqueness theorems. The central observation that pretrained embeddings distinguish instances is treated as an empirical premise verified by the reported results rather than assumed via internal definition.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of neighbors k
- distance metric
axioms (1)
- domain assumption Pretrained text embeddings produce representations that distinguish problem instances without any domain knowledge or task-specific training.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We propose a feature-free approach to algorithm selection that replaces hand-crafted instance features with pretrained text embeddings... serialize, embed, select
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
inverse-distance weighting, line shuffling, and Manhattan distance are the key design choices
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
H., Hutter, F., Leyton - Brown, K., Tierney, K., and Vanschoren, J
Bischl, B., Kerschke, P., Kotthoff, L., Lindauer, M., Malitsky, Y., Fr \' e chette, A., Hoos, H. H., Hutter, F., Leyton - Brown, K., Tierney, K., and Vanschoren, J. (2016). Aslib: A benchmark library for algorithm selection. Artif. Intell. , 237:41--58
work page 2016
-
[2]
Fatemi, B., Halcrow, J., and Perozzi, B. (2024). Talk like a graph: Encoding graphs for large language models. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net
work page 2024
-
[3]
Gao, C., Shang, H., Xue, K., and Qian, C. (2025). Neural solver selection for combinatorial optimization. In Proceedings of the 42nd International Conference on Machine Learning ( ICML )
work page 2025
-
[4]
Kotthoff, L. (2016). Algorithm selection for combinatorial search problems: A survey. Data Min. Knowl. Discov. , 48(3):149--190
work page 2016
-
[5]
Lindauer, M., Hoos, H. H., Hutter, F., and Schaub, T. (2015). Autofolio: An automatically configured algorithm selector. J. Artif. Intell. Res. , 53:745--778
work page 2015
-
[6]
Lindauer, M., van Rijn, J. N., and Kotthoff, L. (2019). The algorithm selection competitions 2015 and 2017. Artif. Intell. , 272:86--100
work page 2019
-
[7]
Liu, T., Amadini, R., Gabbrielli, M., and Mauro, J. (2021). sunny-as2: Enhancing SUNNY for algorithm selection. J. Artif. Intell. Res. , 72:329--376
work page 2021
-
[8]
Loreggia, A., Malitsky, Y., Samulowitz, H., and Saraswat, V. A. (2016). Deep learning for algorithm portfolios. In Proceedings of the 30th AAAI Conference on Artificial Intelligence , pages 1280--1286
work page 2016
-
[9]
Nudelman, E., Leyton-Brown, K., Hoos, H. H., Devkar, A., and Shoham, Y. (2004). Understanding random SAT : Beyond the clauses-to-variables ratio. In Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming ( CP ) , pages 438--452
work page 2004
-
[10]
Pellegrino, A., Akg \" u n, \" O ., Dang, N., Kiziltan, Z., and Miguel, I. (2025). Transformer-based feature learning for algorithm selection in combinatorial optimisation. In 31st International Conference on Principles and Practice of Constraint Programming ( CP ) , volume 307 of LIPIcs , pages 31:1--31:19
work page 2025
-
[11]
Rice, J. R. (1976). The algorithm selection problem. In Advances in Computers , volume 15, pages 65--118. Elsevier
work page 1976
-
[12]
Salinas-Pinto, A., Alvarado-Ulloa, B., Hochbaum, D., Francia-Carrami \ n ana, M., \ N anculef, R., and As \' n-Ach \' a , R. (2024). Text-based feature-free automatic algorithm selection. In Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management ( KDIR ) , pages 267--274
work page 2024
-
[13]
Shavit, H. and Hoos, H. H. (2024). Revisiting satzilla features in 2024. In Chakraborty, S. and Jiang, J. R., editors, 27th International Conference on Theory and Applications of Satisfiability Testing, SAT 2024, Pune, India, August 21-24, 2024 , volume 305 of LIPIcs , pages 27:1--27:26. Schloss Dagstuhl - Leibniz-Zentrum f \" u r Informatik
work page 2024
-
[14]
Wu, X., Zhong, Y., Wu, J., Jiang, B., and Tan, K. C. (2024). Large language model-enhanced algorithm selection: Towards comprehensive algorithm representation. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence ( IJCAI ) , pages 5243--5251
work page 2024
-
[15]
Xu, L., Hutter, F., Hoos, H. H., and Leyton - Brown, K. (2008). Satzilla: Portfolio-based algorithm selection for SAT . J. Artif. Intell. Res. , 32:565--606
work page 2008
-
[16]
Yin, H., Kim, J., Mathur, P., Sarker, K., and Bansal, V. (2025). How to talk to language models: Serialization strategies for structured entity matching. In Chiruzzo, L., Ritter, A., and Wang, L., editors, Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29 - May 4, 2025 , volume NAACL 2025 of Find...
work page 2025
-
[17]
Zhang, Z., Ch \' e telat, D., Cotnareanu, J., Ghose, A., Xiao, W., Zhen, H.-L., Zhang, Y., Hao, J., Coates, M., and Yuan, M. (2024). GraSS : Combining graph neural networks with expert knowledge for SAT solver selection. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining ( KDD )
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.