LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
Damien Ernst, Pierre Geurts, and Louis Wehenkel
10 Pith papers cite this work, alongside 4,684 external citations. Polarity classification is still indexing.
Citation roles: method (2)
Citation polarities: use method (2)
citing papers explorer
- Hypothesis generation and updating in large language models
  LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
- The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
  Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.
- Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition
  UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.
- Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning
  Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.
- Compositional Q-learning for electrolyte repletion with imbalanced patient sub-populations
  CFQI extends fitted Q-iteration by using separate modules for compositional task variants to learn policies robust to imbalanced patient sub-populations in medical RL.
- Benchmarking Machine Learning Architectures for Antimicrobial Stewardship in Pediatric ICUs
  Benchmarking in pediatric ICU antimicrobial stewardship shows performance depends mainly on target prevalence and dataset traits rather than model complexity, with sequence models improving precision-recall at 24-hour resolution but showing poorer calibration than tabular models.
- Multi-task learning on partially labeled datasets via invariant/equivariant semi-supervised learning
  Invariant and equivariant semi-supervised learning improves multi-task detection and segmentation performance on partially labeled vision datasets compared to supervised baselines.
- Opportunistic Bone-Loss Screening from Routine Knee Radiographs Using a Multi-Task Deep Learning Framework with Sensitivity-Constrained Threshold Optimization
  STR-Net achieves AUROC of 0.933 for binary bone-loss screening and 0.801 correlation for T-score estimation from knee X-rays on a held-out test set.
- SG-UniBuc-NLP at SemEval-2026 Task 6: Multi-Head RoBERTa with Chunking for Long-Context Evasion Detection
  A multi-head RoBERTa model with overlapping chunking and max-pooling achieves Macro-F1 of 0.80 on 3-way clarity classification and 0.51 on 9-way evasion strategy detection, ranking 11th in both subtasks of SemEval-2026 Task 6.
- YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
  A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base with independent task modeling and class weighting is reported as effective for multilingual, multicultural, and multievent online polarization detection.