and Guyon, Isabelle M
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification
  LG-CoTrain, an LLM-guided co-training method, outperforms classical semi-supervised baselines for crisis tweet classification in low-resource settings with 5-25 labeled examples per class.
- TabKDE: Simple and Scalable Tabular Data Generation with Kernel Density Estimates
  TabKDE generates synthetic tabular data using copula transformations followed by kernel density estimation, matching prior accuracy with negligible training time and reduced storage via coresets.
- Towards Automated Selection of Quantum Encoding Circuits via Meta-Learning
  Meta-learning with 24 classical complexity metrics predicts the optimal quantum encoding circuit among 9 candidates with up to 85.7% top-3 accuracy.
- MLS: A Large-Scale Multilingual Dataset for Speech Research
  MLS is a new large-scale multilingual speech corpus derived from LibriVox with 44.5k hours of English and 6k hours across seven other languages, plus baseline ASR and LM models.
- Large margin classifier with graph-based adaptive regularization
  Per-class regularization hyperparameters in Gabriel graph classifiers create flexible thresholds that eliminate outliers and address class imbalance, improving performance per Friedman test.
- AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes
  NLP-derived attributes from construction incident reports remain strongly predictive of independently labeled safety outcomes even after removing potential label leakage, with injury severity now well predicted on a dataset of more than 90,000 reports.
- Automatically Learning Construction Injury Precursors from Text
  Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.