pith. sign in

arxiv: 1907.06010 · v1 · pith:GNDH4S3Bnew · submitted 2019-07-13 · 💻 cs.LG · stat.ML

The Futility of Bias-Free Learning and Search

Pith reviewed 2026-05-24 22:10 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords bias in machine learninginformation resourcesconservation of biassearch view of learningno-free-lunch implicationsgeometric success measure
0
0 comments X

The pith

Machine learning requires bias toward a target, and that bias cannot favor many targets at once without trade-offs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats learning as a search process over possible datasets or information resources. It proves that for any fixed amount of bias favoring one particular target, only a limited share of those resources can actually help the search succeed. Bias behaves as a conserved quantity, so an algorithm cannot spread favorable bias across many different targets simultaneously. Success probability equals the geometric angle between the task's true structure and the assumptions encoded in the algorithm's bias. Finding a distribution over resources that supplies useful bias is itself provably hard unless the resources already align with the task.

Core claim

For a given degree of bias toward a fixed target, the proportion of favorable information resources is strictly bounded from above. Bias is a conserved quantity such that no algorithm can be favorably biased toward many distinct targets simultaneously. The probability of success for a task equals the angle of agreement between what holds for the actual task and what the algorithm assumes through its bias. Finding a favorably biasing distribution over a fixed set of information resources is provably difficult unless the set of resources itself is already favorable with respect to the given task and algorithm.

What carries the argument

Bias defined as a measure relative to a collection of possible information resources, which bounds the fraction of helpful resources and enforces conservation across targets.

If this is right

  • Success on one task necessarily reduces the resources available for success on other tasks.
  • The geometric angle between task and bias directly limits achievable success probability.
  • Any attempt to select a biasing distribution over resources faces a hardness barrier unless the resources already match the task.
  • Bias encodes unavoidable trade-offs between different learning targets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The conservation result suggests that multi-task learning must involve explicit sharing mechanisms rather than hoping one bias serves all tasks.
  • The bounded proportion of favorable resources offers a quantitative way to compare how much prior structure different algorithms inject.
  • If the view of learning as search over information resources holds, then claims of completely general learners without task-specific assumptions become impossible to sustain.

Load-bearing premise

Machine learning can be viewed productively as search over a collection of possible datasets or information resources, with bias defined relative to that collection.

What would settle it

An explicit algorithm that achieves higher-than-random success on multiple unrelated targets while maintaining the same bias level toward each, or a bias-free learner whose success rate exceeds the uniform random baseline on a non-trivial task.

Figures

Figures reproduced from arXiv: 1907.06010 by Akshay Trikha, Dominique Macias, George D. Montanez, Jonathan Hayase, Julia Vendemiatti, Julius Lauw.

Figure 1
Figure 1. Figure 1: As a black-box optimization algorithm samples from Ω, it produc [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: These examples visualize the target divergence for various p [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
read the original abstract

Building on the view of machine learning as search, we demonstrate the necessity of bias in learning, quantifying the role of bias (measured relative to a collection of possible datasets, or more generally, information resources) in increasing the probability of success. For a given degree of bias towards a fixed target, we show that the proportion of favorable information resources is strictly bounded from above. Furthermore, we demonstrate that bias is a conserved quantity, such that no algorithm can be favorably biased towards many distinct targets simultaneously. Thus bias encodes trade-offs. The probability of success for a task can also be measured geometrically, as the angle of agreement between what holds for the actual task and what is assumed by the algorithm, represented in its bias. Lastly, finding a favorably biasing distribution over a fixed set of information resources is provably difficult, unless the set of resources itself is already favorable with respect to the given task and algorithm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper models machine learning as search over a fixed collection of information resources (e.g., datasets) and argues that bias—defined as excess success probability relative to uniform search—is necessary. It claims that for any fixed degree of bias toward one target, the fraction of favorable resources is strictly upper-bounded; that bias is conserved across targets (no algorithm can be favorably biased toward many distinct targets simultaneously); that success probability equals the angle of agreement between the algorithm's bias and the actual task; and that finding a favorably biasing distribution over resources is hard unless the resources are already favorable for the task.

Significance. If the modeling assumptions hold, the work supplies a parameter-free derivation of bias necessity, an explicit conservation law, and a geometric view of agreement, all obtained directly from averaging and linearity over the resource collection. These results formalize well-known intuitions about inductive bias and multi-task trade-offs without fitted parameters or invented entities, providing a clean theoretical lens that could inform algorithm analysis and the design of bias-aware search procedures.

major comments (2)
  1. [Abstract and main text (no numbered equations or definitions supplied)] The central claims rest on the definition of bias as excess success probability relative to the collection; the upper bound follows by averaging and conservation by linearity. However, the manuscript states these results in the abstract and text without exhibiting the explicit definitions, the averaging step, or the linearity argument, so the derivations cannot be inspected for edge cases (e.g., when success probabilities are zero or when the collection is finite but small).
  2. [Modeling premise (weakest assumption noted in reader's report)] The modeling choice that the collection of information resources is fixed and exhaustive is load-bearing for both the bound and the conservation claim. The paper does not examine what happens if the collection is misspecified, incomplete, or task-dependent, which would alter the averaging argument.
minor comments (3)
  1. Add a short formal section or appendix with the bias definition, the averaging argument for the proportion bound, and the linearity argument for conservation; this would raise soundness from its current low level without changing the central claim.
  2. The geometric angle interpretation is described only verbally; a one-paragraph vector-space example or a simple diagram would clarify the inner-product view of agreement between assumed and actual task properties.
  3. The final claim that finding a favorably biasing distribution is 'provably difficult' is stated without reference to a complexity result or reduction; either supply the argument or qualify the statement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will incorporate revisions to improve the clarity and scope of the manuscript.

read point-by-point responses
  1. Referee: [Abstract and main text (no numbered equations or definitions supplied)] The central claims rest on the definition of bias as excess success probability relative to the collection; the upper bound follows by averaging and conservation by linearity. However, the manuscript states these results in the abstract and text without exhibiting the explicit definitions, the averaging step, or the linearity argument, so the derivations cannot be inspected for edge cases (e.g., when success probabilities are zero or when the collection is finite but small).

    Authors: We agree that the derivations should be presented explicitly. In the revised version we will add formal definitions of bias (as excess success probability relative to the resource collection), the averaging argument establishing the upper bound on the fraction of favorable resources, and the linearity argument for bias conservation. We will also include a dedicated subsection examining edge cases, including zero success probabilities and small finite collections, to confirm the results hold under these conditions. revision: yes

  2. Referee: [Modeling premise (weakest assumption noted in reader's report)] The modeling choice that the collection of information resources is fixed and exhaustive is load-bearing for both the bound and the conservation claim. The paper does not examine what happens if the collection is misspecified, incomplete, or task-dependent, which would alter the averaging argument.

    Authors: The fixed exhaustive collection is indeed the modeling premise that permits the direct averaging and linearity arguments. We will add a new discussion section in the revision that explicitly addresses the consequences of relaxing this assumption, including how misspecification, incompleteness, or task-dependence would modify the bounds and conservation law. This addition will delineate the scope of the current results without altering the core model. revision: partial

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper models learning as search over a fixed collection of information resources and defines bias as excess success probability relative to that collection. The upper bound on favorable resources follows directly from averaging over the collection (a mathematical consequence of the definition), and bias conservation follows from linearity of the excess-probability sum across targets. These are standard implications of the chosen formalization rather than reductions of outputs to inputs by construction. No equations equate a derived quantity to a fitted parameter, no self-citation chain supplies a uniqueness theorem, and no ansatz is smuggled via prior work. The argument is self-contained against external benchmarks and contains no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the modeling choice that learning equals search over information resources and on the relative definition of bias; no free parameters or new entities are introduced in the abstract.

axioms (1)
  • domain assumption Machine learning can be viewed as search over a collection of possible datasets or information resources
    Explicitly invoked in the first sentence of the abstract as the foundation for all subsequent claims.

pith-pipeline@v0.9.0 · 5699 in / 1119 out tokens · 21702 ms · 2026-05-24T22:10:34.326561+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

  1. [1]

    Addison-Wesley Longman Publishing Company (1999)

    Goldberg, D.: Genetic algorithms in search optimization and machine learning. Addison-Wesley Longman Publishing Company (1999)

  2. [2]

    Journal of Machine Learning Research 17(8), 1–32 (2016)

    G¨ ul¸ cehre, C ¸ ., Bengio, Y.: Knowledge matters: Importance of prior information for optimization. Journal of Machine Learning Research 17(8), 1–32 (2016)

  3. [3]

    no free lunch

    McDermott, J.: When and why metaheuristics re- searchers can ignore “no free lunch” theorems. Metaheuris- tics (Mar 2019). https://doi.org/10.1007/s42257-019-00 002-6, https://doi.org/10.1007/s42257-019-00002-6

  4. [4]

    In: Rutgers Univer- sity: CBM-TR-117 (1980)

    Mitchell, T.D.: The need for biases in learning generaliz ations. In: Rutgers Univer- sity: CBM-TR-117 (1980)

  5. [5]

    In: 2017 IEEE International Conference on Systems, M an, and Cybernetics (SMC)

    Monta ˜ nez, G.D.: The famine of forte: Few search problems greatly favor your algo- rithm. In: 2017 IEEE International Conference on Systems, M an, and Cybernetics (SMC). pp. 477–482. IEEE (2017)

  6. [6]

    In: Dissert ation

    Monta ˜ nez, G.D.: Why machine learning works. In: Dissert ation. pp. 52–59. Carnegie Mellon University (2017)

  7. [7]

    In: Proc eedings of the 13th International Conference on Neural Information Processin g Systems

    Rasmussen, C.E., Ghahramani, Z.: Occam’s razor. In: Proc eedings of the 13th International Conference on Neural Information Processin g Systems. pp. 276–282. NIPS’00, MIT Press, Cambridge, MA, USA (2000)

  8. [8]

    Reeves, C., Rowe, J.E.: Genetic algorithms: principles a nd perspectives: a guide to GA theory, vol. 20. Springer Science & Business Media (2002)

  9. [9]

    Systems, Man, and Cybernetics, Part C: Applications and Rev iews, IEEE Trans- actions on 35, 233 – 243 (06 2005)

    Runarsson, T., Yao, X.: Search biases in constrained evol utionary optimization. Systems, Man, and Cybernetics, Part C: Applications and Rev iews, IEEE Trans- actions on 35, 233 – 243 (06 2005). https://doi.org/10.1109/TSMCC.2004 .841906

  10. [10]

    In: Machine Learn- ing Proceedings 1994, pp

    Schaffer, C.: A conservation law for generalization perf ormance. In: Machine Learn- ing Proceedings 1994, pp. 259–265. Elsevier (1994)

  11. [11]

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp

    Ulyanov, D., Vedaldi, A., Lempitsky, V.: Deep image prio r. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp. 9446–9454 (2018)

  12. [12]

    Wolpert, D.H., Macready, W.G.: No free lunch theorems fo r optimization. Trans. Evol. Comp 1(1), 67–82 (Apr 1997). https://doi.org/10.1109/4235.585 893 The Futility of Bias-Free Learning and Search 13 7 Appendix: Proofs Lemma 1 (Expected Per Query Performance F rom Expected Distr i- bution). This lemma has been proven by Monta˜ nez [5] and is directly dr ...

  13. [13]

    Theorem 3 (F amine of F avorable Information Resources)

    So, Pr(q(t, F ) ≥ qmin) ≤ ∥t∥ qmin ( t⊤ ED[P F ] ∥t∥∥ED[P F ]∥ ) = ∥t∥ qmin cos ( arccos ( t⊤ ED[P F ] ∥t∥∥ED[P F ]∥ )) By the definition of target divergence, we have Pr(q(t, F ) ≥ qmin) ≤ ∥t∥ cos(θ) qmin . Theorem 3 (F amine of F avorable Information Resources). Let B be a finite set of information resources and let t ⊆ Ω be an arbitrary fixed k-size targe...