arxiv: 2607.00144 · v1 · pith:QJSPAD5Jnew · submitted 2026-06-30 · 💻 cs.CV · cs.AI · cs.LG

A Mechanism-Driven Theory of Phase Transitions in Active Learning

Pith reviewed 2026-07-02 19:34 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords: active learning, phase transitions, generalization mechanisms, PAC risk, strategy alignment, data-driven phase, model-driven phase, segmented regression

The pith

Dominance shifts between generalization mechanisms create unavoidable phase transitions that partition active learning into data-driven, transition, and model-driven regimes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes active learning budget regimes as shifts in the dominant generalization mechanism rather than arbitrary label counts. By treating PAC-style risk components as dynamic interacting terms, it proves that dominance shifts must occur, creating a moving bottleneck. This leads to a tripartite taxonomy of phases identified through measurable proxies and segmented regression. The framework accounts for why representativeness, coverage, and uncertainty strategies perform differently at various stages. Experiments on imaging datasets confirm that efficiency hinges on matching the strategy's bias to the current active bottleneck, with self-supervised representations advancing the transition point.

Core claim

Dominance shifts between generalization mechanisms are structurally unavoidable in active learning, creating a moving bottleneck for generalization. This yields a tripartite taxonomy consisting of data-driven, transition, and model-driven phases. The alignment between a strategy's inductive bias and the active bottleneck determines active learning efficiency across natural and medical imaging tasks.

What carries the argument

The moving bottleneck arising from unavoidable dominance shifts among PAC-style risk components reinterpreted as dynamic terms, operationalized via measurable proxies and segmented regression to identify the three phases.
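
The segmented-regression step can be illustrated with a minimal sketch. It assumes a single proxy trajectory indexed by labeling round and searches exhaustively for two breakpoints that split it into three separately fitted straight lines. The function name `fit_two_breakpoints`, the `min_seg` parameter, and the synthetic proxy are hypothetical; the paper's actual procedure and proxies are not specified on this page.

```python
import numpy as np

def fit_two_breakpoints(x, y, min_seg=3):
    """Grid-search two breakpoints for a three-segment piecewise-linear fit.

    Each segment gets its own least-squares line. Returns (i, j, sse) where
    the segments are x[:i], x[i:j], x[j:] and sse is the total squared error.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    def sse(lo, hi):
        # Residual sum of squares of a straight line fitted on x[lo:hi].
        design = np.column_stack([x[lo:hi], np.ones(hi - lo)])
        coef, *_ = np.linalg.lstsq(design, y[lo:hi], rcond=None)
        resid = y[lo:hi] - design @ coef
        return float(resid @ resid)

    best = (None, None, np.inf)
    for i in range(min_seg, n - 2 * min_seg + 1):
        for j in range(i + min_seg, n - min_seg + 1):
            total = sse(0, i) + sse(i, j) + sse(j, n)
            if total < best[2]:
                best = (i, j, total)
    return best

# Synthetic proxy (an assumption for illustration): steep decay, slower
# decay, then a plateau, with kinks at rounds 20 and 40 plus small noise.
rng = np.random.default_rng(0)
rounds = np.arange(60, dtype=float)
proxy = np.where(rounds < 20, 12 - 0.4 * rounds,
                 np.where(rounds < 40, 4 - 0.1 * (rounds - 20), 2.0))
proxy = proxy + rng.normal(0.0, 0.01, size=rounds.size)

b1, b2, total_sse = fit_two_breakpoints(rounds, proxy)
print(f"estimated breakpoints at rounds {b1} and {b2} (SSE={total_sse:.4f})")
```

The exhaustive search is quadratic in the number of rounds, which is acceptable for the short trajectories typical of active learning but would need a dynamic-programming or pruning approach for long ones.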

If this is right

  • Representativeness, coverage, and uncertainty strategies excel at different phases due to their alignment with the active bottleneck.
  • Self-supervised representation learning causes the transition phase to occur earlier in the labeling trajectory.
  • AL efficiency is maximized when the query strategy matches the dominant generalization mechanism at each stage.
  • The phases can be identified in practice using proxies without relying on post-hoc definitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Algorithms could dynamically switch between strategies as the detected phase changes along the budget.
  • The same phase structure might appear in other supervised learning settings beyond active learning.
  • Representation quality acts as a modulator that alters the length and position of each phase.
  • Testing on non-imaging domains would reveal whether the tripartite structure generalizes.

Load-bearing premise

PAC-style risk components can be reinterpreted as dynamic interacting terms whose dominance shifts are provably unavoidable and can be identified via measurable proxies and segmented regression without the fitting process itself defining the phases.

What would settle it

A dataset and model where segmented regression on the chosen proxies fails to detect consistent phase boundaries across multiple random seeds, or where dominance shifts can be eliminated by altering the risk decomposition.

Figures

Figures reproduced from arXiv: 2607.00144 by Julia Machnio, Mads Nielsen, Mostafa Mehdipour Ghazi.

Figure 1: Active learning exhibits dynamic bottleneck transitions
Figure 2: Evolution of accuracy and operational proxies on CIFAR-100. Insets provide a granular view of cold-start dynamics. Representativeness proxies decay sharply in Phase I, where TypiClust yields superior marginal gains by minimizing distributional discrepancy. Geometric proxies exhibit discriminative stabilization during the Phase II transition, where coverage-based methods excel. ER and Comp. exhibit the non…
Figure 3: Spatiotemporal selection dynamics on ImageNet-50. t-SNE visualizations (MoCov3 embeddings) compare AL methods across increasing budgets (B). Red points denote new acquisitions and black previously labeled. TypiClust prioritizes manifold typicality, reducing distributional discrepancy (Phase I), while Entropy targets ambiguous boundary regions to minimize risk. This shows how selection bias aligns with th…
Figure 4: Identification of AL regimes as probability maps of each proxy vs annotation budget. The τ points show detected phase transitions with piecewise regression. Beyond τ₂, the trajectory enters the model-driven regime. For CIFAR-10, ER and Conf. become the dominant signals, indicating that performance improvements are driven mainly by ER refinement. In contrast, CIFAR-100 retains a noticeable secondary influe…
Figure 5: Spatiotemporal alignment between proxy importance and acquisition strategy rankings. Top: Relative ranking of operational proxies based on their contribution to the generalization bound. Bottom: Corresponding accuracy-based rankings. Red circle refers to equivalent rank. 4.4 Alignment Between Proxy Dominance and Method Ranking To evaluate whether minimizing specific components of the theoretical bound improv…
Figure 6: Low-budget performance of the AL methods on ImageNet-100 using different embeddings. SSL enhances early-stage accuracy by providing meaningful features, particularly benefiting TypiClust. For ProbCover, the fixed-radius selection hampers training across embeddings. Structural Significance of Breakpoints. The transition points τ identified via segmented regression coincide with the most volatile regions o…
Figure 7: Proxy differences relative to Random on CIFAR-100. Positive values indicate improvements relative to random sampling. Early gains are primarily associated with reductions in discrepancy proxies (LD and FD), indicating that representativeness dominates low-budget performance. As labeling progresses, improvements shift toward geometric coverage and later to empirical risk and complexity, reflecting the tra…
Figure 8: Absolute proxy trajectories for CIFAR-100 with SimCLR features. Improved representation quality leads to faster accuracy growth and earlier stabilization of proxies, compressing the early data-driven regime
Figure 9: Proxy differences relative to Random on CIFAR-100 with SimCLR features. Relative advantages become smaller and shorter-lived compared to the supervised setting, indicating that stronger representations reduce the discrepancy burden and shift improvements toward refinement
Figure 10: Absolute proxy trajectories on CIFAR-10 with acquisition batch size b = 100. Discrepancy proxies decrease rapidly, and accuracy saturates earlier than on CIFAR-100, reflecting the lower intrinsic complexity of the dataset. finally empirical-risk refinement remains invariant. This confirms that our observed phase structure is not an artifact of specific acquisition steps but a fundamental property of the…
Figure 11: Proxy differences relative to Random on CIFAR-10 (b = 100). Relative method advantages are concentrated in early rounds and diminish quickly as the labeled pool becomes representative of the dataset
Figure 12: Absolute proxy trajectories on CIFAR-10 with smaller acquisition batch size (b = 10). The same phase structure remains visible, but transitions become smoother due to finer acquisition granularity. Notably, there is a visible correlation between proxy values and performance; for instance, UHerding maintains the highest values of LD, FD, and GC, which directly corresponds to its status as the worst-perform…
Figure 13: Proxy differences relative to Random on CIFAR-10 (b = 10). Smaller acquisition steps produce smoother deviations from the baseline while preserving the same ordering of active learning mechanisms. ISIC 2019: Class Imbalance and Heterogeneity. Figures 14–17 illustrate the evolution of operational proxies on the ISIC 2019 clinical dataset. Compared to the CIFAR benchmarks, the proxy dynamics in this domai…
Figure 14: Absolute proxy trajectories on ISIC with acquisition batch size b = 1000. Discrepancy and coverage proxies decrease gradually, reflecting the higher variability and class imbalance of the dataset compared to CIFAR benchmarks
Figure 15: Proxy differences relative to Random on ISIC (b = 1000). Relative method advantages are smaller and more variable than on CIFAR, indicating that acquisition strategies struggle more to consistently improve over random sampling in this heterogeneous medical dataset
Figure 16: Absolute proxy trajectories on ISIC with smaller acquisition batch size (b = 8). The same phase structure remains visible but evolves more smoothly due to finer acquisition granularity. budget range and exhibit significantly higher variability across strategies. This behavior is a direct consequence of the extreme class imbalance and high intra-class heterogeneity inherent to skin lesion dermoscopy. In th…
Figure 17: Proxy differences relative to Random on ISIC (b = 8). Smaller acquisition steps produce smoother deviations from the baseline while preserving the same qualitative ordering of active learning mechanisms. distribution remains the primary generalization bottleneck for a substantial portion of the trajectory. Conversely, in the low-budget setting (b = 8), the initial data-driven phase appears temporally co…
Figure 18: t-SNE visualization of sample selection behavior across different AL strategies on CIFAR-10 at increasing annotation budgets. Points are projected using SimCLR embeddings and colored by ground-truth class. Black points indicate previously labeled samples, while red points denote newly selected samples in the current round. Representation-based methods (e.g., TypiClust) emphasize class diversity early on…
Figure 19: t-SNE visualization of sample selection behavior on ImageNet-50 at increasing annotation budgets. Points are projected using MoCov3 embeddings and colored by ground-truth class. Representativeness-based methods achieve broad class coverage early (see ○A), while uncertainty-based strategies focus on ambiguous regions (see ○B). ProbCover occasionally oversamples dense regions due to its predefined covera…
Figure 20: Segmented proxy importance across the labeling trajectory on CIFAR-10 for different numbers of segments K. Heatmaps show normalized proxy importance as a function of cumulative labeling budget. Vertical lines indicate estimated regime breakpoints τ. E.3 Regime Identification via Segmented Regression To further analyze the phase structure of active learning dynamics, we perform segmented regression over t…
Figure 21: Segmented regression analysis on CIFAR-100 showing proxy importance across budgets for increasing numbers of segments K. As K increases, distinct breakpoints emerge that reveal transitions between discrepancy-dominated, coverage-driven, and refinement regimes. changes in others. Such dynamics are often obscured when viewing raw numerical values or simple rankings. For K = 1, the regression model assumes …
Figure 22: Segmented proxy importance on CIFAR-100 using SimCLR representations. Stronger pretrained features shift regime transitions earlier in the labeling trajectory while preserving the same three-phase structure observed in the supervised representation. the feature manifold, the active bottleneck shifts from global alignment to the local density and spatial coverage of the representation space. – Late Stages…
Figure 23: Segmented regression analysis on ISIC. Compared to CIFAR datasets, regime transitions occur later and exhibit greater variability due to higher visual diversity and class imbalance characteristic of medical imaging datasets. Importantly, the three-regime model (K = 3) consistently captures these core transitions across all benchmarks while maintaining model parsimony. While larger values of K introduce ad…
Figure 24: Observed versus predicted risk under the segmented regression model (K = 3) on CIFAR-10. Points are colored according to the regime assigned by the model. High R² values across acquisition strategies indicate strong predictive accuracy. R² values. This strong agreement confirms that the selected operational proxies effectively capture the underlying drivers of generalization performance: – CIFAR-10 & CIF…
Figure 25: Regression validation for CIFAR-100 under the K = 3 segmented model. Predicted and observed risks align closely across acquisition strategies, supporting the validity of the identified regime structure.
Figure 26: Regression validation for CIFAR-100 with SimCLR features under the K = 3 segmented model. The strong agreement between predicted and observed risk confirms that the regime decomposition remains consistent when stronger representations are used. E.5 Alignment between Operational Proxies and Acquisition Efficacy To demonstrate the predictive utility of our framework, we visualize the temporal alignment betw…
Figure 27: Observed versus predicted risk for ISIC under the K = 3 segmented regression model. Despite higher dataset variability, the segmented model captures the main structure of the risk trajectory across acquisition strategies. 2. The Top Strip (Dominant Proxy): Identifies which specific component of the generalization bound is the primary bottleneck at any given step t. 3. The Bottom Strip (Dominant Method): …
Figure 28: Proxy–method alignment across the labeling trajectory on CIFAR-10. The top strip shows the dominant proxy at each labeling budget, while the heatmap displays normalized proxy importance. The bottom strip indicates the best-performing active learning method. Vertical lines correspond to regime breakpoints identified by segmented regression. Mitigating the Cold-Start with SSL. SSL representations organize …
Figure 29: Proxy–method alignment on CIFAR-100. Early stages are dominated by discrepancy proxies, while coverage becomes more important at intermediate budgets. The best-performing acquisition strategies evolve accordingly. training workflow. By utilizing a structured, frozen feature space, we provide the model with a foundation of high-level semantic knowledge from the onset. Consequently, the downstream classifi…
Figure 30: Proxy–method alignment for CIFAR-100 using SimCLR representations. Improved feature quality shifts regime transitions earlier in the labeling trajectory while preserving the same proxy–method alignment pattern. driven regimes (Phase III) is not only theoretically motivated by generalization bounds but also practically advantageous for large-scale system efficiency
Figure 31: Proxy–method alignment on ISIC with b = 1000. Compared to natural image datasets, proxy transitions occur later and exhibit higher variability due to dataset complexity and class imbalance
Figure 32: Proxy–method alignment on ISIC with b = 8 presenting early phase detailed proxy importance. Compared to natural image datasets, proxy transitions occur later and exhibit higher variability due to dataset complexity and class imbalance
Figure 33: Low-budget performance of active learning methods on ImageNet-50 using different self-supervised embeddings. Structured SSL representations significantly improve early-stage performance across most methods. Representativeness-based strategies such as TypiClust benefit most from the semantically clustered feature space, while methods relying on fixed geometric assumptions (e.g., ProbCover) may experience…
Original abstract

Active learning (AL) performance is known to be budget-dependent, yet regimes are typically defined by heuristic label counts that fail to generalize across datasets or architectures. We characterize AL dynamics by reframing budget regimes as shifts in the dominant generalization mechanism. By reinterpreting PAC-style risk components as dynamic interacting terms, we prove that dominance shifts are structurally unavoidable, creating a moving bottleneck for generalization. We operationalize this using measurable proxies and a segmented regression procedure to identify a tripartite taxonomy: data-driven, transition, and model-driven phases. Our framework explains the long-standing observation that representativeness, coverage, and uncertainty strategies excel at different stages. Experiments across natural and medical imaging show that AL efficiency depends on the alignment between the strategy's inductive bias and the active bottleneck. Moreover, self-supervised representations shift transitions earlier along the labeling trajectory, highlighting the role of representation quality in shaping AL dynamics. Overall, this work provides a unified framework for the next generation of transition-aware AL algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to develop a mechanism-driven theory of phase transitions in active learning by reframing budget regimes as shifts in dominant generalization mechanisms. It reinterprets PAC-style risk components as dynamic interacting terms and proves that dominance shifts between them are structurally unavoidable, creating a moving bottleneck. This yields a tripartite taxonomy of data-driven, transition, and model-driven phases, which is operationalized via measurable proxies and segmented regression. The framework explains why representativeness, coverage, and uncertainty strategies excel at different stages. Experiments on natural and medical imaging datasets show that AL efficiency depends on alignment between a strategy's inductive bias and the active bottleneck, and that self-supervised representations move the phase transitions earlier along the labeling trajectory.

Significance. If the structural proof is independent of the regression procedure and the phases are recoverable from a-priori signatures, the work could offer a unified, non-heuristic account of AL dynamics that explains strategy performance variation and guides adaptive algorithm design. The emphasis on mechanism alignment and representation quality provides a falsifiable lens for analyzing budget-dependent behavior across domains.

major comments (2)
  1. [Theoretical derivation of dominance shifts] The central proof that dominance shifts are structurally unavoidable (referenced in the abstract as reinterpreting PAC risk components) must explicitly derive a-priori detectable signatures for the phase boundaries; without this, the segmented regression on proxies risks defining the tripartite taxonomy post-hoc rather than recovering theoretically predicted transitions, rendering the taxonomy descriptive instead of derived.
  2. [Phase identification and experimental validation] The operationalization via proxies and segmented regression (abstract) must demonstrate that phase boundaries are recovered independently of the fitting procedure applied to the same data used to claim strategy alignment effects; otherwise the explanation for why strategies excel at different stages becomes circular.
minor comments (1)
  1. [Abstract] The abstract could specify the measurable proxies used for the data-driven, transition, and model-driven phases to aid immediate assessment of operationalization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which help clarify the distinction between structural unavoidability and operational phase recovery. We address each point below and will revise the manuscript to make the theoretical signatures explicit and to include robustness checks for phase identification.

Point-by-point responses
  1. Referee: [Theoretical derivation of dominance shifts] The central proof that dominance shifts are structurally unavoidable (referenced in the abstract as reinterpreting PAC risk components) must explicitly derive a-priori detectable signatures for the phase boundaries; without this, the segmented regression on proxies risks defining the tripartite taxonomy post-hoc rather than recovering theoretically predicted transitions, rendering the taxonomy descriptive instead of derived.

    Authors: The proof in Section 3 establishes structural unavoidability by showing that the three PAC risk terms possess distinct asymptotic scalings with labeled-set size, guaranteeing at least two dominance crossings. To address the request for a-priori signatures, we will add an explicit derivation of the boundary loci as the solutions to the equation where the ratio of any two risk terms equals unity, expressed in terms of the Lipschitz constants and covering numbers of the hypothesis class. These loci constitute detectable signatures that can be estimated from unlabeled data statistics before regression is applied. The revision will incorporate this derivation so that the taxonomy is recovered from theoretically predicted transitions. revision: yes

  2. Referee: [Phase identification and experimental validation] The operationalization via proxies and segmented regression (abstract) must demonstrate that phase boundaries are recovered independently of the fitting procedure applied to the same data used to claim strategy alignment effects; otherwise the explanation for why strategies excel at different stages becomes circular.

    Authors: The proxies (gradient alignment, representation shift, and uncertainty entropy) are computed from model internals and unlabeled statistics that do not depend on the downstream AL strategy performance curves. To demonstrate independence from the particular segmented-regression fit, the revision will add (i) results using an alternative change-point algorithm (PELT) on the same proxy trajectories and (ii) a cross-validation protocol in which phase boundaries are estimated on the first 60 % of the labeling trajectory and alignment effects are evaluated on the held-out remainder. These checks will confirm that the reported strategy-phase alignments persist under different identification procedures. revision: yes
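
The crossing condition described in the first response (the ratio of two risk terms equals unity) can be sketched numerically. The sketch below assumes, purely for illustration, that each term follows a power law a · n^(−α) in the labeled-set size n. The page does not give the paper's actual functional forms, which are said to involve Lipschitz constants and covering numbers. The function `crossing_size`, the term names, and all coefficients are hypothetical.

```python
import math

def crossing_size(a_i, alpha_i, a_j, alpha_j):
    """Return n* solving a_i * n**(-alpha_i) == a_j * n**(-alpha_j).

    Rearranging gives n* = (a_i / a_j) ** (1 / (alpha_i - alpha_j)).
    """
    if a_i <= 0 or a_j <= 0:
        raise ValueError("coefficients must be positive")
    if math.isclose(alpha_i, alpha_j):
        raise ValueError("equal decay rates: the two terms never cross")
    return (a_i / a_j) ** (1.0 / (alpha_i - alpha_j))

# Hypothetical (coefficient, decay rate) pairs, chosen only for illustration.
terms = {
    "discrepancy": (50.0, 1.0),
    "coverage": (5.0, 0.5),
    "refinement": (0.5, 0.25),
}

n1 = crossing_size(*terms["discrepancy"], *terms["coverage"])
n2 = crossing_size(*terms["coverage"], *terms["refinement"])
print(f"discrepancy/coverage crossing near n = {n1:.0f}")
print(f"coverage/refinement crossing near n = {n2:.0f}")
```

Under these assumed scalings, faster-decaying terms dominate early and slower-decaying terms dominate late, which yields two crossings and therefore three regimes. Whether the paper's bound components behave this way is exactly what the referee asks the authors to derive.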

Circularity Check

1 step flagged

Segmented regression on proxies defines phases post-hoc rather than recovering theory-predicted shifts

specific steps
  1. fitted input called prediction [Abstract]
    "By reinterpreting PAC-style risk components as dynamic interacting terms, we prove that dominance shifts are structurally unavoidable, creating a moving bottleneck for generalization. We operationalize this using measurable proxies and a segmented regression procedure to identify a tripartite taxonomy: data-driven, transition, and model-driven phases."

    The proof is claimed to establish unavoidable shifts, yet the taxonomy itself is produced by applying segmented regression to proxies on the labeling data. The regression procedure necessarily partitions the trajectory into segments; therefore the three phases and their boundaries are defined by the fit rather than being a priori signatures recovered from the structural argument.

full rationale

The paper asserts a structural proof that dominance shifts between reinterpreted PAC risk components are unavoidable and yield a tripartite taxonomy. However, the taxonomy is located via measurable proxies plus segmented regression, which by construction detects change points in the observed trajectories. This makes the phase boundaries and the resulting data-driven/transition/model-driven classification an output of the fitting procedure applied to the same labeling trajectories used to claim the taxonomy, rather than an independent consequence of the proof. The alignment explanation for AL strategies therefore rests on a fitted description.
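
The concern that a segmented fit returns segments regardless of the data can be illustrated with a small sketch. It assumes a hypothetical proxy that follows a single straight line plus noise, so there is no phase change to find. A three-segment least-squares search still reports two breakpoints, and its error cannot exceed that of the one-segment fit, because a single line is a special case of three lines. The helper names and the synthetic data are illustrative and not taken from the paper.

```python
import numpy as np

def line_sse(x, y):
    """Sum of squared residuals of an ordinary least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return float(resid @ resid)

def best_three_segments(x, y, min_seg=3):
    """Exhaustively search two breakpoints; each segment gets its own line."""
    n = len(x)
    best = (np.inf, None, None)
    for i in range(min_seg, n - 2 * min_seg + 1):
        for j in range(i + min_seg, n - min_seg + 1):
            total = (line_sse(x[:i], y[:i])
                     + line_sse(x[i:j], y[i:j])
                     + line_sse(x[j:], y[j:]))
            if total < best[0]:
                best = (total, i, j)
    return best

# Hypothetical proxy with NO phase structure: one linear trend plus noise.
rng = np.random.default_rng(0)
x = np.arange(40, dtype=float)
y = 5.0 - 0.1 * x + rng.normal(0.0, 0.05, size=x.size)

sse_one = line_sse(x, y)
sse_three, k1, k2 = best_three_segments(x, y)
print(f"one segment SSE = {sse_one:.4f}")
print(f"three segments SSE = {sse_three:.4f}, breakpoints at {k1} and {k2}")
```

A lower fitting error for three segments is therefore not evidence of real phase boundaries on its own. Checks that are independent of the fit, such as the held-out evaluation proposed in the rebuttal, are one way to separate real transitions from this effect.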

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The reinterpretation of PAC risk components as dynamic terms is treated as a modeling choice whose independence cannot be checked.


