A Mechanism-Driven Theory of Phase Transitions in Active Learning

Julia Machnio; Mads Nielsen; Mostafa Mehdipour Ghazi

arxiv: 2607.00144 · v1 · pith:QJSPAD5Jnew · submitted 2026-06-30 · 💻 cs.CV · cs.AI · cs.LG

Pith reviewed 2026-07-02 19:34 UTC · model grok-4.3

keywords: active learning, phase transitions, generalization mechanisms, PAC risk, strategy alignment, data-driven phase, model-driven phase, segmented regression

The pith

Dominance shifts between generalization mechanisms create unavoidable phase transitions that partition active learning into data-driven, transition, and model-driven regimes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reframes active learning budget regimes as shifts in the dominant generalization mechanism rather than arbitrary label counts. By treating PAC-style risk components as dynamic interacting terms, it proves that dominance shifts must occur, creating a moving bottleneck. This leads to a tripartite taxonomy of phases identified through measurable proxies and segmented regression. The framework accounts for why representativeness, coverage, and uncertainty strategies perform differently at various stages. Experiments on imaging datasets confirm that efficiency hinges on matching the strategy's bias to the current active bottleneck, with self-supervised representations advancing the transition point.

Core claim

Dominance shifts between generalization mechanisms are structurally unavoidable in active learning, creating a moving bottleneck for generalization. This yields a tripartite taxonomy consisting of data-driven, transition, and model-driven phases. The alignment between a strategy's inductive bias and the active bottleneck determines active learning efficiency across natural and medical imaging tasks.

What carries the argument

The moving bottleneck arising from unavoidable dominance shifts among PAC-style risk components reinterpreted as dynamic terms, operationalized via measurable proxies and segmented regression to identify the three phases.
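
To make the segmented-regression step concrete, here is a minimal sketch, not the paper's implementation. It assumes a single proxy trajectory, an independent least-squares line per segment, and an exhaustive search over two breakpoints. The function names, the `min_len` parameter, and the synthetic trajectory are illustrative assumptions.

```python
from itertools import combinations


def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_two_breakpoints(budgets, proxy, min_len=3):
    """Search all pairs of split indices and keep the pair with the lowest total SSE.

    Returns (sse, tau1, tau2), where tau1 and tau2 are the budgets at which
    the second and third segments start.
    """
    n = len(budgets)
    best = None
    for i, j in combinations(range(min_len, n - min_len + 1), 2):
        if j - i < min_len:
            continue  # keep every segment long enough to fit a line
        sse = (line_sse(budgets[:i], proxy[:i])
               + line_sse(budgets[i:j], proxy[i:j])
               + line_sse(budgets[j:], proxy[j:]))
        if best is None or sse < best[0]:
            best = (sse, budgets[i], budgets[j])
    return best


def synthetic_proxy(b):
    """Hypothetical proxy: fast decay, slow decay, then a plateau."""
    if b < 10:
        return 40.0 - 2.0 * b
    if b < 20:
        return 18.0 - 0.5 * (b - 10)
    return 12.0


budgets = list(range(30))
proxy = [synthetic_proxy(b) for b in budgets]
sse, tau1, tau2 = fit_two_breakpoints(budgets, proxy)
```

On this noise-free toy trajectory the search recovers the two regime changes that were built into the data. On real proxy curves the same search always returns two breakpoints, so its output says nothing by itself about whether three regimes exist.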

If this is right

Representativeness, coverage, and uncertainty strategies excel at different phases due to their alignment with the active bottleneck.
Self-supervised representation learning causes the transition phase to occur earlier in the labeling trajectory.
AL efficiency is maximized when the query strategy matches the dominant generalization mechanism at each stage.
The phases can be identified in practice using proxies without relying on post-hoc definitions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Algorithms could dynamically switch between strategies as the detected phase changes along the budget.
The same phase structure might appear in other supervised learning settings beyond active learning.
Representation quality acts as a modulator that alters the length and position of each phase.
Testing on non-imaging domains would reveal whether the tripartite structure generalizes.
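
The strategy-switching idea in the first item can be sketched as follows. This is an illustration under stated assumptions, not an algorithm from the paper: the breakpoints `tau1` and `tau2` are assumed to be already estimated, and the phase-to-strategy mapping is a hypothetical reading of the representativeness, coverage, and uncertainty ordering described above.

```python
from bisect import bisect_right

PHASES = ("data-driven", "transition", "model-driven")

# Hypothetical mapping from phase to strategy family (an assumption for illustration).
STRATEGY_FOR_PHASE = {
    "data-driven": "representativeness",
    "transition": "coverage",
    "model-driven": "uncertainty",
}


def phase_at(budget, tau1, tau2):
    """Return the phase for a labeling budget, given two breakpoints tau1 < tau2.

    A budget exactly equal to a breakpoint is assigned to the later phase.
    """
    return PHASES[bisect_right([tau1, tau2], budget)]


def pick_strategy(budget, tau1, tau2):
    """Choose the strategy family aligned with the current phase."""
    return STRATEGY_FOR_PHASE[phase_at(budget, tau1, tau2)]


# Example with made-up breakpoints at 1,000 and 5,000 labels.
schedule = [pick_strategy(b, 1000, 5000) for b in (500, 1000, 6000)]
```

A real system would also have to re-estimate the breakpoints online, which this sketch leaves out.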

Load-bearing premise

PAC-style risk components can be reinterpreted as dynamic interacting terms whose dominance shifts are provably unavoidable and can be identified via measurable proxies and segmented regression without the fitting process itself defining the phases.

What would settle it

A dataset and model where segmented regression on the chosen proxies fails to detect consistent phase boundaries across multiple random seeds, or where dominance shifts can be eliminated by altering the risk decomposition.
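
One way to operationalize the seed-consistency test is sketched below. The breakpoint values are made-up placeholders, and the stability criterion (largest deviation from the median no greater than a tolerance, for example one acquisition batch) is an assumption rather than a rule taken from the paper.

```python
from statistics import median


def breakpoint_spread(taus):
    """Return (median, largest absolute deviation from the median)."""
    centre = median(taus)
    return centre, max(abs(t - centre) for t in taus)


def boundaries_are_stable(taus_per_boundary, tolerance):
    """True if every boundary stays within `tolerance` labels of its median."""
    return all(
        breakpoint_spread(taus)[1] <= tolerance
        for taus in taus_per_boundary.values()
    )


# Hypothetical breakpoints (in labels) detected under five random seeds.
detected = {
    "tau1": [900, 1000, 1000, 1100, 1000],
    "tau2": [4800, 5000, 5200, 5000, 4900],
}

spread_tau1 = breakpoint_spread(detected["tau1"])
spread_tau2 = breakpoint_spread(detected["tau2"])
stable_loose = boundaries_are_stable(detected, tolerance=200)
stable_tight = boundaries_are_stable(detected, tolerance=100)
```

The verdict depends entirely on the tolerance, so that value should be fixed before looking at the detected boundaries.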

Figures

Figures reproduced from arXiv: 2607.00144 by Julia Machnio, Mads Nielsen, Mostafa Mehdipour Ghazi.

**Figure 2.** Evolution of accuracy and operational proxies on CIFAR-100. Insets provide a granular view of cold-start dynamics. Representativeness proxies decay sharply in Phase I, where TypiClust yields superior marginal gains by minimizing distributional discrepancy. Geometric proxies exhibit discriminative stabilization during the Phase II transition, where coverage-based methods excel. ER and Comp. exhibit the non…

**Figure 3.** Spatiotemporal selection dynamics on ImageNet-50. t-SNE visualizations (MoCov3 embeddings) compare AL methods across increasing budgets (B). Red points denote new acquisitions; black points denote previously labeled samples. TypiClust prioritizes manifold typicality, reducing distributional discrepancy (Phase I), while Entropy targets ambiguous boundary regions to minimize risk. This shows how selection bias aligns with th…

**Figure 4.** Identification of AL regimes as probability maps of each proxy vs. annotation budget. The τ points mark phase transitions detected with piecewise regression. Beyond τ2, the trajectory enters the model-driven regime. For CIFAR-10, ER and Conf. become the dominant signals, indicating that performance improvements are driven mainly by ER refinement. In contrast, CIFAR-100 retains a noticeable secondary influe…

**Figure 5.** Spatiotemporal alignment between proxy importance and acquisition strategy rankings. Top: Relative ranking of operational proxies based on their contribution to the generalization bound. Bottom: Corresponding accuracy-based rankings. Red circle refers to equivalent rank. 4.4 Alignment Between Proxy Dominance and Method Ranking. To evaluate whether minimizing specific components of the theoretical bound impr…

**Figure 6.** Low-budget performance of the AL methods on ImageNet-100 using different embeddings. SSL enhances early-stage accuracy by providing meaningful features, particularly benefiting TypiClust. For ProbCover, the fixed-radius selection hampers training across embeddings. Structural Significance of Breakpoints. The transition points τ identified via segmented regression coincide with the most volatile regions o…

**Figure 7.** Proxy differences relative to Random on CIFAR-100. Positive values indicate improvements relative to random sampling. Early gains are primarily associated with reductions in discrepancy proxies (LD and FD), indicating that representativeness dominates low-budget performance. As labeling progresses, improvements shift toward geometric coverage and later to empirical risk and complexity, reflecting the tra…

**Figure 8.** Absolute proxy trajectories for CIFAR-100 with SimCLR features. Improved representation quality leads to faster accuracy growth and earlier stabilization of proxies, compressing the early data-driven regime.

**Figure 9.** Proxy differences relative to Random on CIFAR-100 with SimCLR features. Relative advantages become smaller and shorter-lived compared to the supervised setting, indicating that stronger representations reduce the discrepancy burden and shift improvements toward refinement.

**Figure 10.** Absolute proxy trajectories on CIFAR-10 with acquisition batch size b = 100. Discrepancy proxies decrease rapidly, and accuracy saturates earlier than on CIFAR-100, reflecting the lower intrinsic complexity of the dataset. finally empirical-risk refinement remains invariant. This confirms that our observed phase structure is not an artifact of specific acquisition steps but a fundamental property of the…

**Figure 11.** Proxy differences relative to Random on CIFAR-10 (b = 100). Relative method advantages are concentrated in early rounds and diminish quickly as the labeled pool becomes representative of the dataset.

**Figure 12.** Absolute proxy trajectories on CIFAR-10 with smaller acquisition batch size (b = 10). The same phase structure remains visible, but transitions become smoother due to finer acquisition granularity. Notably, there is a visible correlation between proxy values and performance; for instance, UHerding maintains the highest values of LD, FD, and GC, which directly corresponds to its status as the worst-perform…

**Figure 13.** Proxy differences relative to Random on CIFAR-10 (b = 10). Smaller acquisition steps produce smoother deviations from the baseline while preserving the same ordering of active learning mechanisms. ISIC 2019: Class Imbalance and Heterogeneity. Figures 14–17 illustrate the evolution of operational proxies on the ISIC 2019 clinical dataset. Compared to the CIFAR benchmarks, the proxy dynamics in this domai…

**Figure 14.** Absolute proxy trajectories on ISIC with acquisition batch size b = 1000. Discrepancy and coverage proxies decrease gradually, reflecting the higher variability and class imbalance of the dataset compared to CIFAR benchmarks.

**Figure 15.** Proxy differences relative to Random on ISIC (b = 1000). Relative method advantages are smaller and more variable than on CIFAR, indicating that acquisition strategies struggle more to consistently improve over random sampling in this heterogeneous medical dataset.

**Figure 16.** Absolute proxy trajectories on ISIC with smaller acquisition batch size (b = 8). The same phase structure remains visible but evolves more smoothly due to finer acquisition granularity. budget range and exhibit significantly higher variability across strategies. This behavior is a direct consequence of the extreme class imbalance and high intraclass heterogeneity inherent to skin lesion dermoscopy. In th…

**Figure 17.** Proxy differences relative to Random on ISIC (b = 8). Smaller acquisition steps produce smoother deviations from the baseline while preserving the same qualitative ordering of active learning mechanisms. distribution remains the primary generalization bottleneck for a substantial portion of the trajectory. Conversely, in the low-budget setting (b = 8), the initial data-driven phase appears temporally co…

**Figure 18.** t-SNE visualization of sample selection behavior across different AL strategies on CIFAR-10 at increasing annotation budgets. Points are projected using SimCLR embeddings and colored by ground-truth class. Black points indicate previously labeled samples, while red points denote newly selected samples in the current round. Representation-based methods (e.g., TypiClust) emphasize class diversity early on…

**Figure 19.** t-SNE visualization of sample selection behavior on ImageNet-50 at increasing annotation budgets. Points are projected using MoCov3 embeddings and colored by ground-truth class. Representativeness-based methods achieve broad class coverage early (see ○A), while uncertainty-based strategies focus on ambiguous regions (see ○B). ProbCover occasionally oversamples dense regions due to its predefined covera…

**Figure 20.** Segmented proxy importance across the labeling trajectory on CIFAR-10 for different numbers of segments K. Heatmaps show normalized proxy importance as a function of cumulative labeling budget. Vertical lines indicate estimated regime breakpoints τ. E.3 Regime Identification via Segmented Regression. To further analyze the phase structure of active learning dynamics, we perform segmented regression over t…

**Figure 21.** Segmented regression analysis on CIFAR-100 showing proxy importance across budgets for increasing numbers of segments K. As K increases, distinct breakpoints emerge that reveal transitions between discrepancy-dominated, coverage-driven, and refinement regimes. changes in others. Such dynamics are often obscured when viewing raw numerical values or simple rankings. For K = 1, the regression model assumes …

**Figure 22.** Segmented proxy importance on CIFAR-100 using SimCLR representations. Stronger pretrained features shift regime transitions earlier in the labeling trajectory while preserving the same three-phase structure observed in the supervised representation. the feature manifold, the active bottleneck shifts from global alignment to the local density and spatial coverage of the representation space. – Late Stages…

**Figure 23.** Segmented regression analysis on ISIC. Compared to CIFAR datasets, regime transitions occur later and exhibit greater variability due to higher visual diversity and class imbalance characteristic of medical imaging datasets. Importantly, the three-regime model (K = 3) consistently captures these core transitions across all benchmarks while maintaining model parsimony. While larger values of K introduce ad…

**Figure 24.** Observed versus predicted risk under the segmented regression model (K = 3) on CIFAR-10. Points are colored according to the regime assigned by the model. High R² values across acquisition strategies indicate strong predictive accuracy. R² values. This strong agreement confirms that the selected operational proxies effectively capture the underlying drivers of generalization performance: – CIFAR-10 & CIF…

**Figure 25.** Regression validation for CIFAR-100 under the K = 3 segmented model. Predicted and observed risks align closely across acquisition strategies, supporting the validity of the identified regime structure.

**Figure 26.** Regression validation for CIFAR-100 with SimCLR features under the K = 3 segmented model. The strong agreement between predicted and observed risk confirms that the regime decomposition remains consistent when stronger representations are used. E.5 Alignment between Operational Proxies and Acquisition Efficacy. To demonstrate the predictive utility of our framework, we visualize the temporal alignment betw…

**Figure 27.** Observed versus predicted risk for ISIC under the K = 3 segmented regression model. Despite higher dataset variability, the segmented model captures the main structure of the risk trajectory across acquisition strategies. 2. The Top Strip (Dominant Proxy): Identifies which specific component of the generalization bound is the primary bottleneck at any given step t. 3. The Bottom Strip (Dominant Method): …

**Figure 28.** Proxy–method alignment across the labeling trajectory on CIFAR-10. The top strip shows the dominant proxy at each labeling budget, while the heatmap displays normalized proxy importance. The bottom strip indicates the best-performing active learning method. Vertical lines correspond to regime breakpoints identified by segmented regression. Mitigating the Cold-Start with SSL. SSL representations organize …

**Figure 29.** Proxy–method alignment on CIFAR-100. Early stages are dominated by discrepancy proxies, while coverage becomes more important at intermediate budgets. The best-performing acquisition strategies evolve accordingly. training workflow. By utilizing a structured, frozen feature space, we provide the model with a foundation of high-level semantic knowledge from the onset. Consequently, the downstream classifi…

**Figure 30.** Proxy–method alignment for CIFAR-100 using SimCLR representations. Improved feature quality shifts regime transitions earlier in the labeling trajectory while preserving the same proxy–method alignment pattern. driven regimes (Phase III) is not only theoretically motivated by generalization bounds but also practically advantageous for large-scale system efficiency.

**Figure 31.** Proxy–method alignment on ISIC with b = 1000. Compared to natural image datasets, proxy transitions occur later and exhibit higher variability due to dataset complexity and class imbalance.

**Figure 32.** Proxy–method alignment on ISIC with b = 8, showing detailed early-phase proxy importance. Compared to natural image datasets, proxy transitions occur later and exhibit higher variability due to dataset complexity and class imbalance.

**Figure 33.** Low-budget performance of active learning methods on ImageNet-50 using different self-supervised embeddings. Structured SSL representations significantly improve early-stage performance across most methods. Representativeness-based strategies such as TypiClust benefit most from the semantically clustered feature space, while methods relying on fixed geometric assumptions (e.g., ProbCover) may experience…

Original abstract

Active learning (AL) performance is known to be budget-dependent, yet regimes are typically defined by heuristic label counts that fail to generalize across datasets or architectures. We characterize AL dynamics by reframing budget regimes as shifts in the dominant generalization mechanism. By reinterpreting PAC-style risk components as dynamic interacting terms, we prove that dominance shifts are structurally unavoidable, creating a moving bottleneck for generalization. We operationalize this using measurable proxies and a segmented regression procedure to identify a tripartite taxonomy: data-driven, transition, and model-driven phases. Our framework explains the long-standing observation that representativeness, coverage, and uncertainty strategies excel at different stages. Experiments across natural and medical imaging show that AL efficiency depends on the alignment between the strategy's inductive bias and the active bottleneck. Moreover, self-supervised representations shift transitions earlier along the labeling trajectory, highlighting the role of representation quality in shaping AL dynamics. Overall, this work provides a unified framework for the next generation of transition-aware AL algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims unavoidable phase shifts in active learning from dynamic PAC risk terms, yielding a tripartite taxonomy that explains strategy performance, but the segmented regression step risks post-hoc phase definition.

The core claim is that dominance shifts between generalization mechanisms are structurally unavoidable as labeling budget grows, producing data-driven, transition, and model-driven phases whose alignment with a strategy's bias determines efficiency.

The new element is the explicit taxonomy plus the argument that these shifts follow from reinterpreting PAC components as interacting dynamic terms. The experiments on natural and medical imaging datasets show the phases appearing in practice and that self-supervised representations move the transition point earlier. That part connects the framework to an existing observation about when different heuristics work.

The operationalization uses measurable proxies and segmented regression to locate the boundaries. This is where the main softness sits. If the proof supplies explicit, a-priori signatures that the regression merely recovers, the taxonomy is derived; if the regression is what carves the data into phases and the theory then explains them, the account becomes more descriptive than predictive. The abstract does not make the distinction clear enough to judge.

The experiments appear to test the alignment prediction, which is a concrete step. No obvious circularity in the reported results, but the regression procedure itself needs scrutiny for whether boundaries are stable under different proxy choices or segmentation criteria.

This is for readers already working on active learning theory or algorithm design who want a way to think about budget-dependent behavior beyond heuristics. A serious referee should see it because the claim is falsifiable in principle and the empirical part is grounded in multiple datasets, even if the theoretical step requires careful checking.

Referee Report

2 major / 1 minor

Summary. The paper claims to develop a mechanism-driven theory of phase transitions in active learning by reframing budget regimes as shifts in dominant generalization mechanisms. It reinterprets PAC-style risk components as dynamic interacting terms and proves that dominance shifts between them are structurally unavoidable, creating a moving bottleneck. This yields a tripartite taxonomy of data-driven, transition, and model-driven phases, which is operationalized via measurable proxies and segmented regression. The framework explains why representativeness, coverage, and uncertainty strategies excel at different stages, with experiments on natural and medical imaging datasets showing that AL efficiency depends on alignment between strategy inductive bias and the active bottleneck; self-supervised representations are also shown to shift the transitions earlier along the labeling trajectory.

Significance. If the structural proof is independent of the regression procedure and the phases are recoverable from a-priori signatures, the work could offer a unified, non-heuristic account of AL dynamics that explains strategy performance variation and guides adaptive algorithm design. The emphasis on mechanism alignment and representation quality provides a falsifiable lens for analyzing budget-dependent behavior across domains.

major comments (2)

[Theoretical derivation of dominance shifts] The central proof that dominance shifts are structurally unavoidable (referenced in the abstract as reinterpreting PAC risk components) must explicitly derive a-priori detectable signatures for the phase boundaries; without this, the segmented regression on proxies risks defining the tripartite taxonomy post-hoc rather than recovering theoretically predicted transitions, rendering the taxonomy descriptive instead of derived.
[Phase identification and experimental validation] The operationalization via proxies and segmented regression (abstract) must demonstrate that phase boundaries are recovered independently of the fitting procedure applied to the same data used to claim strategy alignment effects; otherwise the explanation for why strategies excel at different stages becomes circular.

minor comments (1)

[Abstract] The abstract could specify the measurable proxies used for the data-driven, transition, and model-driven phases to aid immediate assessment of operationalization.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which help clarify the distinction between structural unavoidability and operational phase recovery. We address each point below and will revise the manuscript to make the theoretical signatures explicit and to include robustness checks for phase identification.

Referee: [Theoretical derivation of dominance shifts] The central proof that dominance shifts are structurally unavoidable (referenced in the abstract as reinterpreting PAC risk components) must explicitly derive a-priori detectable signatures for the phase boundaries; without this, the segmented regression on proxies risks defining the tripartite taxonomy post-hoc rather than recovering theoretically predicted transitions, rendering the taxonomy descriptive instead of derived.

Authors: The proof in Section 3 establishes structural unavoidability by showing that the three PAC risk terms possess distinct asymptotic scalings with labeled-set size, guaranteeing at least two dominance crossings. To address the request for a-priori signatures, we will add an explicit derivation of the boundary loci as the solutions to the equation where the ratio of any two risk terms equals unity, expressed in terms of the Lipschitz constants and covering numbers of the hypothesis class. These loci constitute detectable signatures that can be estimated from unlabeled data statistics before regression is applied. The revision will incorporate this derivation so that the taxonomy is recovered from theoretically predicted transitions. revision: yes
Referee: [Phase identification and experimental validation] The operationalization via proxies and segmented regression (abstract) must demonstrate that phase boundaries are recovered independently of the fitting procedure applied to the same data used to claim strategy alignment effects; otherwise the explanation for why strategies excel at different stages becomes circular.

Authors: The proxies (gradient alignment, representation shift, and uncertainty entropy) are computed from model internals and unlabeled statistics that do not depend on the downstream AL strategy performance curves. To demonstrate independence from the particular segmented-regression fit, the revision will add (i) results using an alternative change-point algorithm (PELT) on the same proxy trajectories and (ii) a cross-validation protocol in which phase boundaries are estimated on the first 60% of the labeling trajectory and alignment effects are evaluated on the held-out remainder. These checks will confirm that the reported strategy-phase alignments persist under different identification procedures. revision: yes
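
The "ratio equals unity" boundary described in the first response can be illustrated with a toy calculation. This sketch assumes each risk term follows a power law a · n^(−p) in the labeled-set size n. The power-law form, the constants, and the term names are hypothetical and do not come from the paper.

```python
def crossing_budget(a_i, p_i, a_j, p_j):
    """Budget n at which a_i * n**(-p_i) equals a_j * n**(-p_j); requires p_i != p_j."""
    return (a_i / a_j) ** (1.0 / (p_i - p_j))


# Hypothetical (constant, decay exponent) pairs for three risk terms.
TERMS = {
    "discrepancy": (20.0, 1.0),   # large at first, decays fastest
    "coverage": (2.0, 0.5),
    "refinement": (0.2, 0.25),    # small at first, decays slowest
}


def dominant_term(n):
    """Name of the largest term at labeled-set size n."""
    return max(TERMS, key=lambda name: TERMS[name][0] * n ** (-TERMS[name][1]))


# Boundary loci: where the ratio of two neighbouring terms equals one.
tau1 = crossing_budget(*TERMS["discrepancy"], *TERMS["coverage"])
tau2 = crossing_budget(*TERMS["coverage"], *TERMS["refinement"])

sequence = [dominant_term(n) for n in (10, 1000, 10**6)]
```

With these constants the dominant term changes twice, near n = 100 and n = 10,000. Different constants can reorder the pairwise crossings, so the number of dominance changes depends on the constants as well as the exponents.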

Circularity Check

1 step flagged

Segmented regression on proxies defines phases post-hoc rather than recovering theory-predicted shifts

specific steps

fitted input called prediction [Abstract]
"By reinterpreting PAC-style risk components as dynamic interacting terms, we prove that dominance shifts are structurally unavoidable, creating a moving bottleneck for generalization. We operationalize this using measurable proxies and a segmented regression procedure to identify a tripartite taxonomy: data-driven, transition, and model-driven phases."

The proof is claimed to establish unavoidable shifts, yet the taxonomy itself is produced by applying segmented regression to proxies on the labeling data. The regression procedure necessarily partitions the trajectory into segments; therefore the three phases and their boundaries are defined by the fit rather than being a priori signatures recovered from the structural argument.

full rationale

The paper asserts a structural proof that dominance shifts between reinterpreted PAC risk components are unavoidable and yield a tripartite taxonomy. However, the taxonomy is located via measurable proxies plus segmented regression, which by construction detects change points in the observed trajectories. This makes the phase boundaries and the resulting data-driven/transition/model-driven classification an output of the fitting procedure applied to the same labeling trajectories used to claim the taxonomy, rather than an independent consequence of the proof. The alignment explanation for AL strategies therefore rests on a fitted description.
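
The concern that a segmented fit partitions any trajectory can be checked with a simple model-comparison sketch. The code below compares a single line against the best three-segment fit using BIC. The BIC form for Gaussian errors, the parameter counts, and both synthetic trajectories are assumptions chosen for illustration; nothing here reproduces the paper's procedure.

```python
import math
from itertools import combinations


def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))


def best_three_segment_sse(xs, ys, min_len=3):
    """Lowest total SSE over all ways to cut the trajectory into three segments."""
    n = len(xs)
    best = math.inf
    for i, j in combinations(range(min_len, n - min_len + 1), 2):
        if j - i >= min_len:
            sse = (line_sse(xs[:i], ys[:i]) + line_sse(xs[i:j], ys[i:j])
                   + line_sse(xs[j:], ys[j:]))
            best = min(best, sse)
    return best


def bic(sse, n, n_params):
    """BIC under Gaussian errors: n * ln(SSE / n) + k * ln(n)."""
    return n * math.log(sse / n) + n_params * math.log(n)


def prefers_three_segments(xs, ys):
    """True if the three-segment model has lower BIC than a single line."""
    n = len(xs)
    bic_one = bic(line_sse(xs, ys), n, 2)                  # slope + intercept
    bic_three = bic(best_three_segment_sse(xs, ys), n, 8)  # 3 lines + 2 breakpoints
    return bic_three < bic_one


def three_regime_value(x):
    """Hypothetical proxy with two built-in regime changes."""
    if x < 13:
        return 40.0 - 2.0 * x
    if x < 26:
        return 14.0 - 0.5 * (x - 13)
    return 8.0


xs = list(range(40))
wiggle = [0.1 * (-1) ** x for x in xs]  # small deterministic perturbation
featureless = [0.5 * x + w for x, w in zip(xs, wiggle)]
three_regimes = [three_regime_value(x) + w for x, w in zip(xs, wiggle)]

null_case = prefers_three_segments(xs, featureless)
regime_case = prefers_three_segments(xs, three_regimes)
```

The three-segment search returns breakpoints for both trajectories, but the penalized comparison supports them only where regime changes were built in. A check of this kind addresses whether breakpoints are warranted at all. It does not show that they match boundaries predicted by theory.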

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no explicit free parameters, axioms, or invented entities are stated. The reinterpretation of PAC risk components as dynamic terms is treated as a modeling choice whose independence cannot be checked.

Reference graph

Works this paper leans on

31 extracted references · 1 canonical work pages · 1 internal anchor

[1]

In: Pacific-Asia Conf

Aghaee, A., Ghadiri, M., Baghshah, M.S.: Active distance-based clustering using k-medoids. In: Pacific-Asia Conf. Adv. Knowl. Discov. Data Min. pp. 253–264 (2016)

2016
[2]

Ash, J.T., Goel, S., Krishnamurthy, A., Kakade, S.M.: Gone fishing: Neural active learning with fisher embeddings. In: Adv. Neural Inform. Process. Syst. pp. 8927– 8939 (2021)

2021
[3]

Ash, J.T., Zhang, C., Krishnamurthy, A., Langford, J., Agarwal, A.: Deep batch active learning by diverse, uncertain gradient lower bounds. In: Int. Conf. Learn. Represent. (2020)

2020
[4]

Bae,W.,Sutherland,D.J.,Oliveira,G.L.:UncertaintyHerding:Oneactivelearning method for all label budgets. In: Int. Conf. Learn. Represent. (2025)

2025
[5]

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.E.: A simple framework for con- trastive learning of visual representations. In: Int. Conf. Mach. Learn. pp. 1597– 1607 (2020)

2020
[6]

Improved Baselines with Momentum Contrastive Learning

Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum con- trastive learning. arXiv:2003.04297 [cs.CV] (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2003
[7]

Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. In: Int. Conf. Comput. Vis. pp. 9640–9649 (2021)

2021
[8]

Cortes, C., Mohri, M., Riley, M., Rostamizadeh, A.: Sample selection bias correc- tion theory. In: Int. Conf. Algorithmic Learn. Theory. pp. 38–53 (2008)

2008
[9]

Gal, Y., Islam, R., Ghahramani, Z.: Deep bayesian active learning with image data. In: Int. Conf. Mach. Learn. pp. 1183–1192 (2017)

2017
[10]

MethodsX 7, 100864 (2020)

Gessert, N., Nielsen, M., Shaikh, M., Werner, R., Schlaefer, A.: Skin lesion classifi- cation using ensembles of multi-resolution EfficientNets with meta data. MethodsX 7, 100864 (2020)

2020
[11]

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Do- ersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. In: Adv. Neural Inform. Process. Syst. pp. 21271–21284 (2020)

2020
[12]

Hacohen, G., Dekel, A., Weinshall, D.: Active learning on a budget: Opposite strategies suit high and low budgets. In: Int. Conf. Mach. Learn. pp. 8175–8195 (2022)

2022
[13]

Springer (2009)

Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer (2009)

2009
[14]

In: IEEE Conf

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conf. Comput. Vis. Pattern Recog. pp. 770–778 (2016)

2016
[15]

In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

Kaushal, V., Iyer, R., Kothawade, S., Mahadev, R., Doctor, K., Ramakrishnan, G.: Learning from less data: A unified data subset selection and active learning framework for computer vision. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 1289–1299. IEEE (2019)

2019
[16]

Advances in neural information pro- cessing systems32(2019)

Kirsch, A., Van Amersfoort, J., Gal, Y.: Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning. Advances in neural information pro- cessing systems32(2019)

2019
[17]

Krizhevsky, A.: Learning multiple layers of features from tiny images (2009)

2009
[18]

SIGIR Forum29, 13–19 (1995)

Lewis, D.D.: A sequential algorithm for training text classifiers: Corrigendum and additional data. SIGIR Forum29, 13–19 (1995)

1995
[19]

In: Chinese Conf

Liang, H., Qiang, S., Ma, H., Wan, J., Liang, Y.: Semantic segmentation active learning with scene coverage coreset. In: Chinese Conf. Biom. Recognit. pp. 238– 247 (2024) A Mechanism-Driven Theory of Phase Transitions in Active Learning 17

[20]

Maalouf, A., Eini, G., Mussay, B., Feldman, D., Osadchy, M.: A unified approach to coreset learning. IEEE Trans. Neural Networks Learn. Syst. pp. 6893–6905 (2024)

[21]

Menden, V., Saleh, Y., Iske, A.: Bounds on the generalization error in active learning. In: Northern Lights Deep Learn. Conf. pp. 168–175 (2025)

[22]

Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press (2012)

[23]

Scheffer, T., Decomain, C., Wrobel, S.: Active hidden Markov models for information extraction. In: Int. Conf. Intell. Data Anal. pp. 309–318 (2001)

[24]

Sener, O., Savarese, S.: Active learning for convolutional neural networks: A core-set approach. In: Int. Conf. Learn. Represent. (2018)

[25]

Settles, B.: Active learning literature survey (2009)

[26]

Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.: On the empirical estimation of integral probability metrics. Electron. J. Stat. 6, 1550–1599 (2012)

[27]

Valiant, L.G.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)

[28]

Voevodski, K., Balcan, M.F., Röglin, H., Teng, S.H., Xia, Y.: Active clustering of biological sequences. J. Mach. Learn. Res. 13, 203–225 (2012)

[29]

Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active learning. In: Int. Conf. Mach. Learn. pp. 1954–1963 (2015)

[30]

Xu, Z., Yu, K., Tresp, V., Xu, X., Wang, J.: Representative sampling for text classification using support vector machines. In: Eur. Conf. Inf. Retr. pp. 393–407 (2003)

[31]

Yehuda, O., Dekel, A., Hacohen, G., Weinshall, D.: Active learning through a covering lens. In: Adv. Neural Inform. Process. Syst. pp. 22354–22367 (2022)

Appendix A Structural Properties of Bound Components

Lemma 1 (Non-monotonic empirical risk under adaptive sampling). Let $\ell$ be a bounded loss and let $S_t \subset S_{t+1}$ with $S_{t+1} = S_t \cup A_t$. Suppose the learner retur...
