HERMES provides a reusable hierarchical labeling substrate for pre-training data that reveals granularity-specific effects in data mixing rules during model training.
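The sketch below is one way a granularity-specific effect in a mixing rule can show up. It is a minimal illustration under stated assumptions and is not taken from HERMES: the three-level label paths, the token counts, and the use of temperature sampling as the mixing rule are all hypothetical. The same rule is applied to buckets defined at depth 1, 2, or 3 of the label hierarchy, and the resulting mixture is projected back onto the top level.

```python
from collections import defaultdict

# Hypothetical corpus: (label path from coarsest to finest, token count).
docs = [
    (("STEM", "math", "algebra"), 400),
    (("STEM", "math", "geometry"), 100),
    (("STEM", "code", "python"), 300),
    (("humanities", "history", "europe"), 200),
]

def counts_at_depth(docs, depth):
    """Sum token counts per label prefix of length `depth` (1 = coarsest level)."""
    totals = defaultdict(int)
    for path, tokens in docs:
        totals[path[:depth]] += tokens
    return dict(totals)

def temperature_mixture(counts, tau):
    """Temperature sampling: weight each bucket by n ** (1 / tau), then normalize."""
    scaled = {bucket: n ** (1.0 / tau) for bucket, n in counts.items()}
    total = sum(scaled.values())
    return {bucket: w / total for bucket, w in scaled.items()}

def top_level_share(mixture):
    """Project a mixture over label prefixes onto the coarsest level."""
    share = defaultdict(float)
    for prefix, p in mixture.items():
        share[prefix[0]] += p
    return dict(share)

# Apply the same rule (tau = 2) with buckets defined at each depth of the hierarchy.
shares = {}
for depth in (1, 2, 3):
    mix = temperature_mixture(counts_at_depth(docs, depth), tau=2.0)
    shares[depth] = top_level_share(mix)
    print(depth, {name: round(p, 3) for name, p in shares[depth].items()})
```

With these toy counts the humanities share falls from about 0.33 at depth 1 to about 0.26 at depth 2 and 0.23 at depth 3: finer buckets split the STEM mass into several pieces, and temperature sampling flattens each piece separately.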
International Conference on Learning Representations
3 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (1).
Citation polarities: background (1).
Citing papers by year: 2026 (3).
Representative citing papers:
Drop-by-Drop uses additive codebooks and Matryoshka-style training to produce a single LLM whose ordered codebook subsets give accurate reconstructions at successively higher bitwidths under a weighted MSE distortion.
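The nested-bitwidth idea behind additive codebooks can be sketched as follows. This covers only the decode side and is not the paper's method: a greedy residual encoder stands in for Matryoshka-style training, the diagonal importance weights `h` are an assumed form of the weighted MSE, and all function names are hypothetical. A group of `d` weights is stored as one index per codebook; decoding with the first `m` codebooks sums `m` codewords and costs `m * log2(K) / d` bits per weight.

```python
import math
import random

def reconstruct(codebooks, codes, m):
    """Decode with only the first m codebooks: sum one codeword from each."""
    d = len(codebooks[0][0])
    out = [0.0] * d
    for cb, idx in zip(codebooks[:m], codes[:m]):
        for j in range(d):
            out[j] += cb[idx][j]
    return out

def weighted_mse(x, x_hat, h):
    """Weighted squared error; h holds hypothetical per-coordinate importances."""
    return sum(hj * (a - b) ** 2 for a, b, hj in zip(x, x_hat, h)) / len(x)

def greedy_encode(x, codebooks, h):
    """Greedy residual encoding (an assumption, not the paper's procedure):
    codebook k picks the codeword that best fits what codebooks 0..k-1 left over."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        best = min(range(len(cb)), key=lambda i: weighted_mse(residual, cb[i], h))
        codes.append(best)
        residual = [r - c for r, c in zip(residual, cb[best])]
    return codes

def bits_per_weight(m, num_codewords, group_size):
    """Each codebook stores log2(K) bits per group of group_size weights."""
    return m * math.log2(num_codewords) / group_size

# Toy setup: groups of d=4 weights, M=3 codebooks of K=16 codewords each.
rng = random.Random(0)
d, K, M = 4, 16, 3
# Index 0 is the zero vector, so adding a codebook can never increase the error;
# later codebooks use a smaller scale, mimicking residual refinement.
codebooks = [
    [[0.0] * d] + [[rng.gauss(0, 0.5 ** k) for _ in range(d)] for _ in range(K - 1)]
    for k in range(M)
]
x = [rng.gauss(0, 1) for _ in range(d)]
h = [1.0, 2.0, 0.5, 1.0]

codes = greedy_encode(x, codebooks, h)
distortions = [weighted_mse(x, reconstruct(codebooks, codes, m), h) for m in range(M + 1)]
for m, dist in enumerate(distortions):
    print(f"{m} codebooks: {bits_per_weight(m, K, d):.2f} bits/weight, weighted MSE {dist:.4f}")
```

Because every codebook in this toy setup contains a zero codeword, the greedy choice at each stage cannot raise the weighted error, so the distortion is non-increasing as more codebooks are used. Trained codebooks would not need that constraint; it only makes the prefix property easy to see here.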
Citing papers explorer
- HERMES: A Multi-Granularity Labeling Substrate for Pre-training Data Mixtures
- Multi-Bitwidth Quantization for LLMs Using Additive Codebooks