Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Cross-entropy (CE) is the default training loss for supervised classification, but its sample efficiency is limited when labels are scarce. Existing remedies primarily act on the data side, via augmentation, synthesis, or transfer from pretrained models; the training objective itself is rarely revisited. We revisit it here. Drawing on the classical observation that generative classifiers reach their asymptotic error with fewer samples than discriminative ones, we propose Generative Cross-Entropy (GenCE), a drop-in replacement for CE that introduces a generative learning principle into a standard discriminative network without altering the architecture or fitting a separate density model. GenCE follows from a Bayesian rewrite of the class-conditional likelihood and, in the mini-batch approximation, reduces to normalizing each sample's softmax score against the model's predictions on the batch, coupling the training signal across examples sharing a class. We extend the proper-scoring-rule framework to such non-local losses and prove that GenCE is strictly proper under a mild completeness condition: its population risk is uniquely minimized at the true posterior. Across three datasets, on two architectures and in both balanced small-data and class-imbalanced regimes, GenCE outperforms CE and other widely used losses, while also producing better-calibrated probabilities and stronger out-of-distribution detection.
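The batch-level normalization described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the Bayesian rewrite, in which each sample's softmax score for its own label is divided by the sum of the model's scores for that label over the whole mini-batch. The function names (`softmax`, `cross_entropy`, `batch_normalized_ce`) are hypothetical, and the exact GenCE objective is the one defined in the paper, which may differ in details such as class priors or how the batch sum is formed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Standard CE: mean of -log p(y_i | x_i), each sample scored alone."""
    probs = [softmax(z) for z in batch_logits]
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def batch_normalized_ce(batch_logits, labels):
    """Assumed batch-normalized variant (illustrative, not the authors' code).

    Each sample's score p(y_i | x_i) is divided by sum_j p(y_i | x_j),
    the total score the model assigns to class y_i across the batch.
    """
    probs = [softmax(z) for z in batch_logits]
    num_classes = len(probs[0])
    # Per-class mass: how much probability the batch as a whole gives class c.
    class_mass = [sum(p[c] for p in probs) for c in range(num_classes)]
    losses = [-math.log(p[y] / class_mass[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    batch_logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [1.0, 1.0, 1.0]]
    labels = [0, 1, 0]
    print("CE:", cross_entropy(batch_logits, labels))
    print("batch-normalized:", batch_normalized_ce(batch_logits, labels))
```

Under this reading, a sample's loss depends on every other sample in the batch through the shared denominator, which is the sense in which the loss is non-local. The denominator always includes the sample's own score, so each ratio is at most 1 and the loss is non-negative.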
fields
physics.chem-ph 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
ElemeNet: Multiscale Molecular Machine Learning with Uncertainty Quantification Across the Periodic Table
ElemeNet is a unified ML software package for molecular property prediction across elements 1-100 with built-in uncertainty quantification and competitive benchmarks on diverse chemistry datasets.