LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
Damien Ernst, Pierre Geurts, and Louis Wehenkel
11 Pith papers cite this work, alongside 4,684 external citations. Polarity classification is still being indexed.
Citation-role summary
Roles: method (2)
Citation-polarity summary
Polarities: use method (2)
Representative citing papers
Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.
UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.
Fine-tuned transformers with multi-task learning recover substantial wording-derived signal for item difficulty at small sample sizes typical in applied testing.
CFQI extends fitted Q-iteration by using separate modules for compositional task variants to learn policies robust to imbalanced patient sub-populations in medical RL.
SAGMTL decomposes dynamic sparse OD demand prediction into joint structural state modeling and flow intensity estimation via node-edge collaborative graph representations.
Benchmarking in pediatric ICU antimicrobial stewardship shows that performance depends mainly on target prevalence and dataset characteristics rather than model complexity; sequence models improve precision-recall at 24-hour resolution but are less well calibrated than tabular models.
Invariant and equivariant semi-supervised learning improves multi-task detection and segmentation performance on partially labeled vision datasets compared to supervised baselines.
STR-Net achieves AUROC of 0.933 for binary bone-loss screening and 0.801 correlation for T-score estimation from knee X-rays on a held-out test set.
A multi-head RoBERTa model with overlapping chunking and max-pooling achieves Macro-F1 of 0.80 on 3-way clarity classification and 0.51 on 9-way evasion strategy detection, ranking 11th in both subtasks of SemEval-2026 Task 6.
A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base with independent task modeling and class weighting is reported as effective for multilingual, multicultural, and multievent online polarization detection.
Citing papers explorer
Response-free item difficulty modelling for multiple-choice items with fine-tuned transformers: Component-wise representation and multi-task learning