Cross-sample prediction churn between bootstrap-trained classifiers reaches 8-22% on chemistry benchmarks; K-bootstrap bagging reduces it 40-54% and twin-bootstrap with sym-KL consistency loss reduces it a further median 45% at matched 2x compute.
Machine Learning , year =
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
Diverse ensembles of prompted and fine-tuned GPT-4.1-Mini monitors achieve 2.4x better detection of flawed code solutions than homogeneous ensembles on adversarial inputs.
citing papers explorer
-
Reducing cross-sample prediction churn in scientific machine learning
Cross-sample prediction churn between bootstrap-trained classifiers reaches 8-22% on chemistry benchmarks; K-bootstrap bagging reduces it 40-54% and twin-bootstrap with sym-KL consistency loss reduces it a further median 45% at matched 2x compute.
-
LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
-
Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute
Diverse ensembles of prompted and fine-tuned GPT-4.1-Mini monitors achieve 2.4x better detection of flawed code solutions than homogeneous ensembles on adversarial inputs.