Data mixing laws: Optimizing data mixtures by predicting language modeling performance

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · baseline: 1

representative citing papers

On the Invariance and Generality of Neural Scaling Laws

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Neural scaling laws are invariant under bijective data transformations and change predictably with information resolution ρ under non-bijective transformations, enabling cross-domain transport of fitted exponents.

Data and Evaluation Closed-Loop for Model Capability Enhancement

cs.AI · 2026-06-26 · unverdicted · novelty 6.0

Proposes capability slices with dual taxonomies and mapping rules to form a closed loop converting benchmark failures into targeted data interventions, validated via two opposing case studies on BBH and math reasoning.

Scaling Laws for Mixture Pretraining Under Data Constraints

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.

Knowledge Transfer Scaling Laws for 3D Medical Imaging

cs.CV · 2026-05-07 · conditional · novelty 6.0

Transfer-aware data allocation derived from observed power-law scaling laws for asymmetric knowledge transfer in 3D medical imaging outperforms standard proportional sampling by up to 58% and generalizes to new budgets.

Evaluation-driven Scaling for Scientific Discovery

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdős constructions.

citing papers explorer

Showing 7 of 7 citing papers after filters.

  • D³: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training
    cs.CL · 2026-05-29 · unverdicted · none · ref 27

    D³ introduces a dynamic directional graph-constrained framework that models sample interactions via loss dependencies to derive an optimized training sequence for LLMs.

  • On the Invariance and Generality of Neural Scaling Laws
    cs.LG · 2026-05-08 · unverdicted · none · ref 51

    Neural scaling laws are invariant under bijective data transformations and change predictably with information resolution ρ under non-bijective transformations, enabling cross-domain transport of fitted exponents.

  • Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
    cs.LG · 2025-07-21 · unverdicted · none · ref 47

    An RL agent learns domain re-weighting policies from evaluation feedback to improve balanced performance in continual pre-training of LLMs across source and target domains.

  • Data and Evaluation Closed-Loop for Model Capability Enhancement
    cs.AI · 2026-06-26 · unverdicted · none · ref 28

    Proposes capability slices with dual taxonomies and mapping rules to form a closed loop converting benchmark failures into targeted data interventions, validated via two opposing case studies on BBH and math reasoning.

  • Scaling Laws for Mixture Pretraining Under Data Constraints
    cs.LG · 2026-05-12 · unverdicted · none · ref 32

    Empirical study shows mixture pretraining tolerates higher target data repetition than single-source training, with a new repetition-aware scaling law enabling principled mixture selection based on data size, compute, and model scale.

  • Evaluation-driven Scaling for Scientific Discovery
    cs.LG · 2026-04-21 · unverdicted · none · ref 165

    SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdős constructions.

  • MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training
    cs.DC · 2025-04-14 · unverdicted · none · ref 75

    MegaScale-Data is a distributed data loading system that disaggregates preprocessing and applies auto-partitioning to deliver 4.5x higher end-to-end training throughput and 13.5x lower CPU memory usage for multisource large foundation models.