DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks
Pith reviewed 2026-05-23 03:54 UTC · model grok-4.3
The pith
DUET converges to the optimal training data mixture for an unseen task using only performance feedback.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DUET is a global-to-local algorithm that interleaves influence functions, used as a data selection method, with Bayesian optimization to optimize the data mixture using feedback from a specific unseen evaluation task. By analyzing DUET's cumulative regret, the paper argues that DUET converges to the optimal training data mixture for an unseen task without any knowledge of the task's data.
What carries the argument
Global-to-local interleaving of influence functions for approximating data utility and Bayesian optimization for searching mixture proportions.
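The interleaving can be illustrated with a small sketch. This is not the paper's implementation: the GP-UCB acquisition, the softmax weighting of influence scores, and every name here (`duet_like_loop`, `if_weighted_sample`, `toy_feedback`) are assumptions chosen for illustration. The global step proposes mixture ratios on the simplex, the local step samples points within each domain in proportion to precomputed influence scores, and only a scalar feedback value is observed.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two sets of mixture vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, cand, noise=1e-3):
    """Zero-mean GP posterior mean and std at candidate mixtures."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(cand, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def if_weighted_sample(scores, k, rng):
    """Local step: draw k distinct indices, favouring high influence scores."""
    if k == 0:
        return np.empty(0, dtype=int)
    p = np.exp(scores - scores.max())  # softmax weighting is an assumption
    p /= p.sum()
    return rng.choice(len(scores), size=k, replace=False, p=p)

def toy_feedback(r, subsets, r_star=np.array([0.7, 0.2, 0.1])):
    """Hypothetical black box. A real system would fine-tune on `subsets`
    and return a task rating; here only the mixture ratios matter."""
    return -float(((r - r_star) ** 2).sum())

def duet_like_loop(feedback, domain_scores, budget, rounds, rng, beta=2.0):
    """Alternate a global GP-UCB step over mixtures with local IF sampling."""
    n_dom = len(domain_scores)
    X, y = [], []
    for _ in range(rounds):
        cand = rng.dirichlet(np.ones(n_dom), size=256)  # points on the simplex
        if X:
            y_arr = np.array(y)
            mu, sd = gp_posterior(np.array(X), y_arr - y_arr.mean(), cand)
            r = cand[np.argmax(mu + beta * sd)]  # upper confidence bound
        else:
            r = cand[0]  # no observations yet: start from a random mixture
        counts = np.floor(r * budget).astype(int)
        subsets = [if_weighted_sample(s, min(int(c), len(s)), rng)
                   for s, c in zip(domain_scores, counts)]
        X.append(r)
        y.append(feedback(r, subsets))
    best = int(np.argmax(y))
    return X[best], y[best]
```

The influence scores are passed in as precomputed arrays, so the per-round cost is dominated by the feedback call rather than by influence computation.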
If this is right
- DUET applies to cases where task data is encrypted or private.
- Under the paper's assumptions, a sublinear cumulative-regret bound implies convergence to the optimal mixture.
- In the paper's experiments, it outperforms existing data selection and mixing methods when task data is unavailable.
- Multiple rounds of model deployment feedback are sufficient to guide the optimization.
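The step from a regret bound to convergence can be spelled out as follows. This is a generic sketch using the standard noiseless definition of cumulative regret. The symbols $f$ (feedback as a function of the mixture), $r_t$ (mixture chosen in round $t$) and $r^*$ (optimal mixture) follow the fragments quoted in the reference graph, and the paper's own "attained" regret, which includes a noise term, may differ in detail.

```latex
% Cumulative regret after T rounds of feedback
R_T \;=\; \sum_{t=1}^{T} \bigl( f(r^*) - f(r_t) \bigr)

% Sublinear regret means the average regret vanishes
\lim_{T \to \infty} \frac{R_T}{T} \;=\; 0

% The best mixture tried so far is at least as good as the average one
\min_{1 \le t \le T} \bigl( f(r^*) - f(r_t) \bigr) \;\le\; \frac{R_T}{T} \;\longrightarrow\; 0
```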
Where Pith is reading between the lines
- This approach could support fine-tuning models on user-specific interactions while preserving privacy.
- The framework might generalize to optimizing data for other machine learning models beyond LLMs.
- Testing on tasks with known optima could validate the regret analysis in practice.
Load-bearing premise
The influence function provides a sufficiently accurate local approximation of how data affects performance on the unseen task.
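For reference, the classical influence function can be written as below. This is the textbook first-order form, stated here as an assumption about what the premise refers to. The review material does not show which variant or approximation the paper uses. Here $\hat\theta$ denotes the trained parameters, $L$ the per-example loss, $z$ a training point and $z'$ a reference point.

```latex
% Empirical Hessian of the training loss at the fitted parameters
H_{\hat\theta} \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta)

% Effect on the loss at z' of infinitesimally up-weighting training point z
\mathcal{I}(z, z') \;=\; -\,\nabla_\theta L(z', \hat\theta)^{\top} \, H_{\hat\theta}^{-1} \, \nabla_\theta L(z, \hat\theta)
```

The expression is a first-order Taylor argument around $\hat\theta$, so it is only guaranteed to be accurate for small perturbations and an invertible (ideally positive definite) Hessian, which is why the premise is described as local.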
What would settle it
Running DUET on a controlled task where the true optimal mixture is known in advance and observing whether it reaches that mixture or gets stuck due to poor influence estimates.
Original abstract
The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, the data involved in an unseen evaluation task is often unknown (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data are relevant for fine-tuning the LLM to maximize its performance on the specific unseen evaluation task. Instead, one can only deploy the LLM on the unseen task to gather multiple rounds of feedback on how well the model performs (e.g., user ratings). This novel setting offers a refreshing perspective towards optimizing training data mixtures via feedback from an unseen evaluation task, which prior data mixing and selection works do not consider. Our paper presents DUET, a novel global-to-local algorithm that interleaves influence function as a data selection method with Bayesian optimization to optimize data mixture via feedback from a specific unseen evaluation task. By analyzing DUET's cumulative regret, we theoretically show that DUET converges to the optimal training data mixture for an unseen task even without any data knowledge of the task. Finally, our experiments across a variety of language tasks demonstrate that DUET outperforms existing data selection and mixing methods in the unseen-task setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DUET, a global-to-local algorithm that interleaves influence-function-based data selection with Bayesian optimization to tune LLM training data mixtures using only feedback (e.g., user ratings) from an unseen evaluation task. It claims a cumulative-regret analysis proving convergence to the optimal mixture without any knowledge of the task data, and reports experimental outperformance versus existing data-selection and mixing baselines across language tasks.
Significance. If the regret bound holds, the work supplies a theoretically grounded method for data-mixture optimization in privacy-sensitive regimes where task data cannot be inspected. The global-to-local interleaving and the explicit regret guarantee are the primary contributions; the experiments provide supporting empirical evidence but are secondary to the theoretical claim.
major comments (2)
- [Regret analysis (global-to-local interleaving)] Regret analysis section (derivation of cumulative regret bound): the sub-linear regret guarantee treats the influence-function estimate as a sufficiently faithful local utility surrogate for the Bayesian optimization step to make progress toward the global optimum. No explicit error term, bias bound, or Lipschitz-style control on the approximation gap is supplied when the evaluation task is unseen; this assumption is load-bearing for the convergence claim.
- [Global-to-local interleaving description] Description of the influence-function step (global-to-local procedure): the analysis assumes the local approximation remains accurate enough across rounds of unseen-task feedback, yet no quantitative control is given on how non-convexity of the LLM or distribution shift between training mixtures and the unseen task affects the surrogate quality. This directly affects whether the regret bound remains valid.
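One way to see why both comments call the approximation gap load-bearing is the following decomposition. It is a sketch under an assumption that is not in the paper: write $\hat f$ for the surrogate utility, $\hat r^*$ for its maximizer, and suppose a uniform gap $\sup_r |f(r) - \hat f(r)| \le \varepsilon$.

```latex
% Per-round regret split into surrogate regret plus approximation error
f(r^*) - f(r_t)
  \;=\; \underbrace{f(r^*) - \hat f(r^*)}_{\le\, \varepsilon}
  \;+\; \underbrace{\hat f(r^*) - \hat f(\hat r^*)}_{\le\, 0}
  \;+\; \underbrace{\hat f(\hat r^*) - \hat f(r_t)}_{\text{surrogate regret}}
  \;+\; \underbrace{\hat f(r_t) - f(r_t)}_{\le\, \varepsilon}

% Summing over T rounds
R_T \;\le\; \hat R_T \;+\; 2\,\varepsilon\,T
```

Under this assumption a sublinear surrogate regret $\hat R_T$ only yields sublinear true regret if $\varepsilon$ is zero or shrinks over rounds. A fixed gap contributes a term that grows linearly in $T$.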
minor comments (2)
- [Abstract] The abstract states that the regret bound is derived but does not indicate the section or equation numbers where the full proof appears, making it difficult to locate the precise assumptions.
- [Experiments] Experimental section: the description of how influence functions are computed for each candidate mixture and how the Bayesian optimization acquisition function is defined could be expanded for reproducibility.
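To make the reproducibility request concrete, here is a minimal sketch of the kind of detail that would help, using L2-regularized logistic regression in place of an LLM. Everything in it is an assumption for illustration: the model, the function names `fit_logreg` and `influence_scores`, and the use of a held-out reference set (the review does not say what the paper scores influence against).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lam=1e-2, steps=25):
    """Newton's method for L2-regularized logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) / len(y) + lam * w
        H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(X.shape[1])
        w -= np.linalg.solve(H, g)
    return w

def influence_scores(X, y, X_ref, y_ref, w, lam=1e-2):
    """First-order influence of up-weighting each training point on the
    mean reference loss. Negative values mean the point lowers that loss."""
    p = sigmoid(X @ w)
    # Regularized empirical Hessian of the training objective at w
    H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(X.shape[1])
    # Per-example data-loss gradients, shape (n, d); regularizer omitted
    grad_train = X * (p - y)[:, None]
    # Gradient of the mean loss on the reference set, shape (d,)
    g_ref = X_ref.T @ (sigmoid(X_ref @ w) - y_ref) / len(y_ref)
    return -grad_train @ np.linalg.solve(H, g_ref)
```

For an LLM the explicit Hessian solve is infeasible, so a paper would need to state which approximation replaces it.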
Simulated Author's Rebuttal
We thank the referee for the detailed and insightful comments on the theoretical foundations of DUET. We address each major comment below and will incorporate clarifications into the revised manuscript.
Point-by-point responses
- Referee: Regret analysis section (derivation of cumulative regret bound): the sub-linear regret guarantee treats the influence-function estimate as a sufficiently faithful local utility surrogate for the Bayesian optimization step to make progress toward the global optimum. No explicit error term, bias bound, or Lipschitz-style control on the approximation gap is supplied when the evaluation task is unseen; this assumption is load-bearing for the convergence claim.
  Authors: The cumulative regret bound is derived with respect to the surrogate utility defined by the influence-function estimates; under this surrogate the global-to-local interleaving yields sublinear regret. We agree that the manuscript does not supply an explicit error term, bias bound, or Lipschitz control quantifying the gap between the surrogate and the true (unseen-task) utility. In the revision we will add an explicit statement of this modeling assumption together with a short discussion of its role in the convergence claim.
  Revision: partial
- Referee: Description of the influence-function step (global-to-local procedure): the analysis assumes the local approximation remains accurate enough across rounds of unseen-task feedback, yet no quantitative control is given on how non-convexity of the LLM or distribution shift between training mixtures and the unseen task affects the surrogate quality. This directly affects whether the regret bound remains valid.
  Authors: Non-convexity of the LLM loss and distribution shift between training mixtures and the unseen evaluation task can indeed degrade surrogate quality. The current analysis treats the influence function as a first-order local approximation and establishes regret relative to that surrogate; the global-to-local loop uses fresh feedback to periodically refresh the selection. We will revise the text to state clearly that the regret guarantee is conditional on the surrogate remaining sufficiently faithful and to note that quantitative controls on non-convexity and shift effects are left for future work.
  Revision: partial
Circularity Check
No circularity: regret bound is a standard analysis under stated assumptions
The paper's central claim is a cumulative-regret bound showing convergence of the DUET interleaving of influence functions and Bayesian optimization to the optimal mixture for an unseen task. The abstract and description present this as a derived theoretical result rather than a tautology, fit, or reduction to prior self-citation. No equations or text in the supplied material exhibit self-definitional structure, a fitted parameter renamed as prediction, or load-bearing self-citation. The influence-function approximation is treated as an explicit modeling assumption whose validity is external to the bound itself; this does not constitute circularity under the evaluation criteria. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Influence functions yield a reliable local estimate of data point importance for the current model.
- domain assumption: The black-box feedback function satisfies conditions that allow standard Bayesian optimization regret bounds to apply.
Forward citations
Cited by 2 Pith papers
- BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks
  BoLT is a benchmark of surrogate models fitted to real LLM experiment data that enables evaluation of Bayesian and black-box optimization methods on multi-fidelity, multi-objective, high-dimensional LLM tasks.
- Data Mixing for Large Language Models Pretraining: A Survey and Outlook
  A survey that taxonomizes data mixing strategies for LLM pretraining into static rule-based, learning-based, and dynamic adaptive families while highlighting transferability challenges and evaluation gaps.
Reference graph
Works this paper leans on
-
[1]
Efficient online data mixing for language model pre-training
Alon Albalak, Liangming Pan, Colin Raffel, and William Yang Wang. Efficient online data mixing for language model pre-training. arXiv:2312.02406,
-
[2]
A survey on data selection for language models
Alon Albalak, Yanai Elazar, Sang Michael Xie, Shayne Longpre, Nathan Lambert, Xinyi Wang, Niklas Muennighoff, Bairu Hou, Liangming Pan, Haewon Jeong, Colin Raffel, Shiyu Chang, Tatsunori Hashimoto, and William Yang Wang. A survey on data selection for language models. arXiv:2402.16827,
-
[3]
Mayee F. Chen, Michael Y. Hu, Nicholas Lourie, Kyunghyun Cho, and Christopher Ré. Aioli: A unified optimization framework for language model data mixing. arXiv:2411.05735, 2024a. Zhiliang Chen, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. Towards AutoAI: Optimizing a machine learning system with black-box and differentiable components. In Proc. ICML, 2024...
-
[4]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems. arXiv:2110.14168,
-
[5]
Doge: Domain reweighting with generalization estimation
Simin Fan, Matteo Pagliardini, and Martin Jaggi. Doge: Domain reweighting with generalization estimation. arXiv:2310.15393,
-
[6]
Bayesian optimization with inequality constraints
URL https://zenodo.org/records/12608602. Jacob Gardner, Matt Kusner, Xu Zhixiang, Kilian Weinberger, and John Cunningham. Bayesian optimization with inequality constraints. In Proc. ICML,
-
[7]
Bimix: A bivariate data mixing law for language model pretraining
Ce Ge, Zhijian Ma, Daoyuan Chen, Yaliang Li, and Bolin Ding. Bimix: A bivariate data mixing law for language model pretraining. arXiv:2405.14908,
-
[8]
Towards lossless dataset distillation via difficulty-aligned trajectory matching
doi: 10.1109/ACCESS.2020.2966228. Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, and Yang You. Towards lossless dataset distillation via difficulty-aligned trajectory matching. arXiv:2310.05773,
-
[9]
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models. arXiv:2106.09685,
-
[10]
Fastshap: Real-time shapley value estimation
Neil Jethani, Mukund Sudarshan, Ian Covert, Su-In Lee, and Rajesh Ranganath. Fastshap: Real-time shapley value estimation. arXiv:2107.07436,
-
[11]
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W. Cohen, and Xinghua Lu. Pubmedqa: A dataset for biomedical research question answering. arXiv:1909.06146,
-
[12]
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension. arXiv:1705.03551,
-
[13]
Generalizing the german tank problem
Anthony Lee and Steven J Miller. Generalizing the german tank problem. arXiv:2210.15339,
-
[14]
Human-centered privacy research in the age of large language models
Tianshi Li, Sauvik Das, Hao-Ping Lee, Dakuo Wang, Bingsheng Yao, and Zhiping Zhang. Human-centered privacy research in the age of large language models. arXiv:2402.01994,
-
[15]
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. arXiv:2109.07958,
-
[16]
Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv:1609.07843,
-
[17]
Coresets for data-efficient training of machine learning models
Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for data-efficient training of machine learning models. arXiv:1906.01827,
-
[18]
Domain Generalization via Invariant Feature Representation
Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. Domain generalization via invariant feature representation. arXiv:1301.2115,
-
[19]
Estimating training data influence by tracing gradient descent
Garima Pruthi, Frederick Liu, Mukund Sundararajan, and Satyen Kale. Estimating training data influence by tracing gradient descent. arXiv:2002.08484,
-
[20]
Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...
-
[21]
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv:1811.00937,
-
[22]
Optimal Sub-sampling with Influence Functions
Daniel Ting and Eric Brochu. Optimal sub-sampling with influence functions. arXiv:1709.01716,
-
[23]
Memorization without overfitting: Analyzing the training dynamics of large language models
Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, and Armen Aghajanyan. Memorization without overfitting: Analyzing the training dynamics of large language models. arXiv:2205.10770,
-
[24]
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. Llama: Open and efficient foundation language models. arXiv:2302.13971,
- [25]
-
[26]
Jingtan Wang, Xiaoqiang Lin, Rui Qiao, Chuan-Sheng Foo, and Bryan Kian Hsiang Low. Helpful or harmful data? fine-tuning-free shapley attribution for explaining language model predictions. In Proc. ICML, 2024a. Peiqi Wang, Yikang Shen, Zhen Guo, Matthew Stallone, Yoon Kim, Polina Golland, and Rameswar Panda. Diversity measurement and subset selection for i...
-
[27]
Less: Selecting influential data for targeted instruction tuning
Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, and Danqi Chen. Less: Selecting influential data for targeted instruction tuning. arXiv:2402.04333,
-
[28]
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. Hellaswag: Can a machine really finish your sentence? arXiv:1905.07830,
-
[29]
Few-shot adaptation of pre-trained networks for domain shift
Wenyu Zhang, Li Shen, Wanyue Zhang, and Chuan-Sheng Foo. Few-shot adaptation of pre-trained networks for domain shift. arXiv:2205.15234,
-
[30]
Speculative coreset selection for task-specific fine-tuning
Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen, Tianlin Li, Weipeng Jiang, and Yang Liu. Speculative coreset selection for task-specific fine-tuning. arXiv:2410.01296,
-
[31]
2.2) can be gathered from the task using a trained LLM
A Technical Appendices and Supplementary Material B Additional Discussions B.1 Real-world examples of our problem setting In our problem setting, (a) there is no direct access to the data (e.g., its domain, distribution, or labels) involved in the unseen evaluation task but (b) multiple rounds of coarse feedback (details covered in Sec. 2.2) can be gather...
-
[32]
In addition, data mixing works (Xie et al., 2023; Ge et al., 2025; Albalak et al.,
showed that training a model with strategically selected data points allows it to perform better. In addition, data mixing works (Xie et al., 2023; Ge et al., 2025; Albalak et al.,
-
[33]
DUET for extremely large datasets used in pre-training
irrelevant information that are difficult to be overwritten in later BO iterations. DUET for extremely large datasets used in pre-training. We can amortize the computational cost of IF computation by pre-computing and storing them beforehand (App. B.4) in our paper’s fine-tuning setting. However, the size of datasets used in pre-training could be extremel...
-
[34]
IF values can be pre-computed and stored
In our algorithm, we repeat this procedure for every data domain. IF values can be pre-computed and stored. In addition, we just need to pre-compute the IF values of every data point once before reusing them repeatedly at every BO iteration to perform IF-weighted sampling. This greatly improves our algorithm’s efficiency and runtime, as compared to other...
-
[35]
$\delta_1 = \sqrt{\delta}$ • (4) $\le$ uses Chebyshev’s inequality over $\epsilon_t$ with probability at least $1 - \delta_2$
w.r.t. $\delta_1 = \sqrt{\delta}$
• (4) $\le$ uses Chebyshev’s inequality over $\epsilon_t$ with probability at least $1 - \delta_2$.
• (5) $=$ uses $\sum_{t=1}^{T} \sigma_{t-1}(x_t) \le O(\sqrt{T \gamma_T})$, as shown in Lemma 4 of Chowdhury & Gopalan (2017).
• (6) $=$ uses the fact that $\epsilon_t$ is bounded on $[0, c]$ and all bounded random variables are $R$-sub-Gaussian with $R = c^2/4$ (Arbel et al., 2019). Next, we ...
-
[36]
Derive attainable cumulative regret. Lastly, we analyze the convergence rate of our algorithm using the growth of attained cumulative regret (Chen et al., 2024b) $\tilde{R}_T = \sum_{t=1}^{T} |\tilde{y}^{*}_{r_t} - f(r_t)| = \sum_{t=1}^{T} |f(r^*) + \epsilon_t - f(r_t)|$ for $T$ BO iterations. Since the error term $\epsilon_t$ has the same expectation and variance of our estimator, we can use the results from Step...