
arxiv: 2605.23595 · v1 · pith:FAZV7SP2new · submitted 2026-05-22 · 💻 cs.LG · cs.AI · cs.CV · cs.ET · cs.PF

Cost-Effective Model Evaluation with Meta-Learning

Pith reviewed 2026-05-25 04:42 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV · cs.ET · cs.PF
keywords meta-learning · model evaluation · unlabeled data · performance estimation · model-agnostic · cost-effective · machine learning benchmarking · label-free assessment

The pith

Meta-learning from reference models enables accurate evaluation of new models on completely unlabeled data, with no per-model adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MetaEvaluator as a framework that meta-learns from evaluating many reference models to quickly judge the performance of entirely new models on data that carries no labels. Standard evaluation pipelines require either costly new annotations or repeated fine-tuning for each incoming model, which becomes unsustainable with rapid model releases across domains. By training once on a pool of reference models to create a reusable initialization, the method claims to amortize that cost and apply the result directly to unseen models of different architectures and modalities. This would allow practical, scalable checks on how well fresh models perform on real-world unlabeled datasets.

Core claim

MetaEvaluator leverages meta-learning over a pool of reference models to obtain a transferable initialization, enabling accurate evaluation of new models on entirely unlabeled datasets while amortizing cost across the pool and removing the need for per-model retraining; it is presented as the first model-agnostic framework capable of this.

What carries the argument

Meta-learning over a pool of reference models to obtain a transferable initialization for label-free evaluation of new models on unlabeled target data.
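To make "transferable initialization" concrete, here is a minimal sketch of one well-known first-order meta-learning scheme, Reptile, on a toy regression problem. It is not the paper's method: the abstract does not say which meta-learning algorithm MetaEvaluator uses, and the task family, the function names (`sample_task`, `sgd_steps`, `reptile`), and all hyperparameters below are assumptions chosen only for illustration. Each toy task stands in for "evaluating one reference model", and the learned `theta0` stands in for the shared starting point.

```python
import numpy as np

def sample_task(rng):
    """Hypothetical toy task: fit y = a*x + b with task-specific (a, b)."""
    a = rng.normal(2.0, 0.1)
    b = rng.normal(-1.0, 0.1)
    x = rng.uniform(-1.0, 1.0, size=32)
    return x, a * x + b

def sgd_steps(theta, x, y, lr=0.1, steps=5):
    """Run a few gradient steps on mean squared error, starting from theta."""
    w, c = theta
    for _ in range(steps):
        err = w * x + c - y
        w -= lr * 2.0 * np.mean(err * x)
        c -= lr * 2.0 * np.mean(err)
    return np.array([w, c])

def reptile(n_iters=500, meta_lr=0.1, seed=0):
    """Learn a shared initialization by nudging it toward each task's adapted weights."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(n_iters):
        x, y = sample_task(rng)
        adapted = sgd_steps(theta.copy(), x, y)
        theta += meta_lr * (adapted - theta)  # Reptile outer update
    return theta

theta0 = reptile()
print(theta0)  # ends up close to the centre of the task family
```

The point of the sketch is the outer loop: the cost of visiting many tasks is paid once, and the resulting initialization is reused as-is. Whether such an initialization transfers to unseen model families is exactly the premise the paper has to defend.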

If this is right

  • Performance estimates for new models remain stable and accurate even when the target dataset has no labels at all.
  • The computational and annotation cost of evaluation is shared across many models rather than repeated individually.
  • The same initialization works across diverse model architectures and data modalities without modification.
  • No additional fine-tuning or adaptation step is required when a new model is presented for evaluation.
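The cost-sharing point can be illustrated with a simple break-even calculation. This is a hedged sketch of a generic amortization model, not a cost model taken from the paper: the one-time cost, the per-model costs, and the function names are all hypothetical, and the numbers in the usage line are placeholders with no connection to the paper's experiments.

```python
import math

def total_cost_conventional(n_models, per_model_cost):
    """Every model pays the full evaluation cost (labels or fine-tuning)."""
    return n_models * per_model_cost

def total_cost_amortized(n_models, one_time_cost, per_model_cost):
    """One-time meta-training cost plus a small cost per evaluated model."""
    return one_time_cost + n_models * per_model_cost

def break_even_models(one_time_cost, amortized_per_model, conventional_per_model):
    """Smallest number of models at which the amortized scheme is strictly cheaper."""
    saving = conventional_per_model - amortized_per_model
    if saving <= 0:
        return None  # amortization never pays off
    return math.floor(one_time_cost / saving) + 1

# Placeholder numbers in arbitrary cost units (assumptions, not measurements).
n_star = break_even_models(one_time_cost=100, amortized_per_model=1, conventional_per_model=11)
print(n_star)  # 11
```

Under this model the amortized approach only wins once enough new models are evaluated, so the claimed saving depends on how many models arrive and on the size of the one-time meta-training cost.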

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the initialization transfers reliably, the framework could support ongoing monitoring of deployed models on private unlabeled streams where labeling is prohibited.
  • Similar amortization might apply to other post-training tasks such as model selection or drift detection on unlabeled data.
  • The approach could reduce dependence on fixed labeled benchmarks by enabling evaluation on fresh, domain-specific unlabeled collections.

Load-bearing premise

Meta-learning over reference models yields a transferable initialization that generalizes to new model families, architectures, and modalities on completely unlabeled target data, with no per-model adaptation.

What would settle it

Apply the method to a model from a new architecture family and modality absent from the reference pool and compare its performance estimates against ground-truth accuracy obtained with labels; large systematic errors would falsify the claim.
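The check described above reduces to comparing estimated and ground-truth accuracies for held-out models. A minimal sketch follows, assuming the estimates and the labeled ground truth are already available as lists. The function name `estimation_report` and the input numbers are hypothetical placeholders, not results from the paper. Mean absolute error captures overall accuracy of the estimates, the signed mean error exposes systematic over- or under-estimation, and Pearson correlation shows whether the ranking of models is preserved.

```python
import numpy as np

def estimation_report(estimated_acc, true_acc):
    """Compare label-free accuracy estimates with ground-truth accuracies."""
    est = np.asarray(estimated_acc, dtype=float)
    true = np.asarray(true_acc, dtype=float)
    errors = est - true
    return {
        "mae": float(np.mean(np.abs(errors))),   # average size of the error
        "bias": float(np.mean(errors)),          # systematic over/under-estimation
        "pearson_r": float(np.corrcoef(est, true)[0, 1]),
    }

# Placeholder values for three held-out models (assumed, for illustration only).
report = estimation_report([0.70, 0.80, 0.90], [0.60, 0.70, 0.80])
print(report)
```

In this toy input the correlation is perfect while every estimate is off by the same amount, which is why a high correlation alone would not settle the claim: a large bias on an unseen family would still count against it.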

Figures

Figures reproduced from arXiv: 2605.23595 by Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen, Trinh Pham, Viet Huynh.

Figure 1: Unlike existing methods that struggle to assess …
Figure 2: MetaEvaluator applies meta-learning over a pool of reference models, using data from MetaDataset to learn how to …
Figure 3: t-SNE of semantic coverage across modalities.
Figure 4: Calibration of accuracy estimation across transfers.
Figure 5: Latency–MAE trade-offs on unseen models.
Figure 6: Total training and evaluation latency as the number …
Figure 7: Meta-learning improves with pool size. Inset: Hes …
Figure 9: MetaEvaluator consistently reduces both MAE and …
Original abstract

The rapid growth of machine learning has produced an ever-expanding ecosystem of models, making it increasingly challenging to verify the reliability of newly released models on unseen, unlabeled data. Conventional evaluation pipelines depend on expensive annotation, repeated fine-tuning, or narrow assumptions that fail to transfer across model families. We present MetaEvaluator, a cost-effective, model-agnostic framework for rapid, label-free assessment of unseen models spanning diverse architectures and modalities. MetaEvaluator leverages meta-learning over a pool of reference models to obtain a transferable initialization, enabling accurate evaluation of new models while amortizing cost across the pool and removing the need for per-model retraining. To the best of our knowledge, this is the first model-agnostic framework capable of evaluating new models on entirely unlabeled datasets. Extensive experiments show that MetaEvaluator produces stable and accurate performance estimates at substantially reduced cost compared to conventional approaches, making scalable benchmarking of emerging models on unlabeled data practical.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces MetaEvaluator, a meta-learning framework that trains a transferable initialization over a pool of reference models to enable label-free performance estimation for entirely new, unseen models on unlabeled target data. It claims to be the first model-agnostic method for this task, amortizing evaluation cost across the reference pool without requiring per-model retraining or labels, and asserts that extensive experiments demonstrate stable, accurate estimates at substantially lower cost than conventional approaches.

Significance. If the transferability claim holds, the work could meaningfully lower the barrier to benchmarking new models on unlabeled data across architectures and modalities. The amortization of meta-learning cost and removal of annotation requirements address a practical pain point in ML deployment and evaluation pipelines.

major comments (2)
  1. [Abstract] Abstract: the central claim that a single meta-learned initialization generalizes to 'entirely new model families, architectures, and modalities' on completely unlabeled target data without any per-model adaptation or labels is presented without any supporting cross-family, cross-architecture, or cross-modal results, quantitative metrics, or description of the reference-pool diversity; this assumption is load-bearing for the model-agnostic and label-free assertions.
  2. [Abstract] Abstract: the statement that 'extensive experiments show that MetaEvaluator produces stable and accurate performance estimates' is made without reference to any datasets, baselines, evaluation metrics, number of trials, or numerical results, so it is impossible to determine whether the data actually support the accuracy and cost-reduction claims.
minor comments (1)
  1. [Abstract] The abstract uses the phrase 'to the best of our knowledge' for the 'first model-agnostic framework' claim but provides no comparison table or citation list to prior meta-learning or label-free evaluation methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these comments on the abstract. We agree that the abstract would be strengthened by incorporating more specific details from the experiments and will revise it accordingly. We address each point below.

point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that a single meta-learned initialization generalizes to 'entirely new model families, architectures, and modalities' on completely unlabeled target data without any per-model adaptation or labels is presented without any supporting cross-family, cross-architecture, or cross-modal results, quantitative metrics, or description of the reference-pool diversity; this assumption is load-bearing for the model-agnostic and label-free assertions.

    Authors: The manuscript body (Sections 4–5) contains the supporting cross-family, cross-architecture, and cross-modal results, including quantitative metrics and a description of the reference-pool composition and diversity. The abstract is a concise summary of these findings. We will revise the abstract to briefly note the reference-pool diversity and key generalization metrics so that the model-agnostic and label-free claims are better grounded within the abstract itself. revision: yes

  2. Referee: [Abstract] Abstract: the statement that 'extensive experiments show that MetaEvaluator produces stable and accurate performance estimates' is made without reference to any datasets, baselines, evaluation metrics, number of trials, or numerical results, so it is impossible to determine whether the data actually support the accuracy and cost-reduction claims.

    Authors: The full manuscript details the datasets, baselines, metrics (e.g., MAE, correlation), number of trials, and numerical results supporting stability and accuracy. We will revise the abstract to include concise references to the experimental scope (e.g., number of datasets and main performance metrics) to make these claims more verifiable from the abstract alone. revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework proposal is empirically grounded rather than self-referential by construction

full rationale

The paper presents MetaEvaluator as a meta-learning framework trained on a reference pool of models and then applied to held-out new models on unlabeled data. This follows the standard meta-learning train/test split on distinct model sets and does not reduce any claimed performance estimate to a fitted parameter or self-citation by definition. No equations, uniqueness theorems, or ansatzes are shown that would make the output equivalent to the input by construction. The transfer claim to new families/modalities is an empirical assertion whose validity is independent of the meta-training procedure itself.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; full text would be required to populate the ledger.

pith-pipeline@v0.9.0 · 5705 in / 1012 out tokens · 20690 ms · 2026-05-25T04:42:24.612571+00:00 · methodology

