pith. machine review for the scientific record.

arxiv: 2604.16309 · v1 · submitted 2026-01-29 · 💻 cs.SE · cs.CR

AgentGuard: A Multi-Agent Framework for Robust Package Confusion Detection via Hybrid Search and Metadata-Content Fusion

Pith reviewed 2026-05-16 09:55 UTC · model grok-4.3

keywords package confusion detection · software supply chain · multi-agent framework · hybrid similarity search · metadata content fusion · false positive reduction · adversarial evasion · open source security

The pith

AgentGuard detects confused packages by fusing metadata and content analysis after hybrid name search in a multi-agent setup.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AgentGuard as a multi-agent framework to catch package confusion attacks, where malicious code is published under names that closely resemble legitimate open-source packages. Existing single-signal methods that use only lexical or semantic name matching produce high false-positive rates because they ignore package content and so cannot distinguish benign packages with similar names from malicious impersonations whose code differs substantially from the target. AgentGuard first locates candidate targets through hybrid similarity search on fine-tuned word embeddings, then applies a fused machine learning model that merges multi-dimensional metadata features with a new package content analysis group. Evaluation on the ConfuDB and NeupaneDB datasets shows higher precision and lower false-positive rates than the ConfuGuard and Typomind baselines while also surfacing the actual confused package. A reader would care because the approach directly targets a practical supply-chain risk that current tools leave unresolved.

Core claim

AgentGuard is a multi-agent framework that first discovers potential confusion targets using fine-tuned word embedding models with hybrid similarity search and then evaluates risk via a fused machine learning model that combines a multi-dimensional metadata group with a novel package content analysis group, thereby reducing false positives and mitigating adversarial evasion.
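The discovery stage described in the core claim can be pictured with a minimal sketch. It assumes a lexical score from Python's `difflib.SequenceMatcher` and, as a crude stand-in for the paper's fine-tuned embeddings (which are not specified here), a cosine similarity over character-trigram counts. The function names, the `alpha` weight, and the package list are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def trigram_vector(name: str) -> Counter:
    """Character-trigram counts: a stand-in for a learned name embedding."""
    padded = f"  {name.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(query: str, candidate: str, alpha: float = 0.5) -> float:
    """Weighted mix of a lexical ratio and a vector-space similarity."""
    lexical = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    vector_sim = cosine(trigram_vector(query), trigram_vector(candidate))
    return alpha * lexical + (1 - alpha) * vector_sim


def top_k_targets(query: str, popular: list[str], k: int = 3) -> list[str]:
    """Return the k popular package names most similar to the query."""
    ranked = sorted(popular, key=lambda name: hybrid_score(query, name), reverse=True)
    return ranked[:k]


# Hypothetical list of legitimate package names to search against.
popular_packages = ["requests", "numpy", "lodash", "express"]
candidates = top_k_targets("reqeusts", popular_packages, k=2)
```

In this sketch the transposed name `reqeusts` ranks `requests` first. A real system would swap the trigram vectors for trained embeddings and an approximate nearest-neighbour index.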

What carries the argument

The fused machine learning model that integrates a multi-dimensional metadata group and a novel package content analysis group after hybrid similarity search.
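One way to read "fused" here is early fusion: features from both groups are concatenated and scored by a single model. The sketch below is an assumption-laden illustration, not the paper's model. The feature names, the hand-set weights, and the logistic form are all hypothetical, chosen only to show how a content signal can override a suspicious-looking name.

```python
import math
from dataclasses import dataclass


@dataclass
class MetadataFeatures:
    # Hypothetical metadata signals; the paper's feature list is not given here.
    name_similarity: float   # 0..1, similarity to the suspected target's name
    same_maintainer: float   # 1.0 if candidate and target share a maintainer
    popularity_gap: float    # 0..1, how much less popular the candidate is


@dataclass
class ContentFeatures:
    # Hypothetical content signal.
    code_similarity: float   # 0..1, similarity of package contents to the target


# Illustrative hand-set weights, not learned values.
WEIGHTS = {
    "name_similarity": 3.0,
    "same_maintainer": -4.0,
    "popularity_gap": 2.0,
    "code_similarity": -3.0,
}
BIAS = -1.0


def fused_risk(meta: MetadataFeatures, content: ContentFeatures) -> float:
    """Concatenate both feature groups and map them to a risk score in (0, 1)."""
    features = {**vars(meta), **vars(content)}
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# A benign fork: similar name, same maintainer, near-identical code.
benign = fused_risk(MetadataFeatures(0.9, 1.0, 0.5), ContentFeatures(0.9))
# An impersonation: similar name, unrelated maintainer, dissimilar code.
malicious = fused_risk(MetadataFeatures(0.9, 0.0, 0.9), ContentFeatures(0.1))
```

Both examples have the same name similarity, so a name-only detector would treat them alike. The fused score separates them only because the other signals differ.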

If this is right

  • Precision improves by 12% to 49% relative to ConfuGuard and Typomind on the evaluated datasets.
  • False-positive rate drops by 11% to 35% while still surfacing the confused package.
  • The hybrid search plus fused-model pipeline resists simple name-only evasion attempts.
  • The framework scales to real-world OSS repositories without relying on single-signal retrieval.
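For readers checking the two headline metrics, both follow directly from confusion-matrix counts. The sketch below uses made-up counts purely to show the arithmetic; they are not taken from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged packages that are truly confusing: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of benign packages that get flagged: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Illustrative counts only (not results from the paper).
tp, fp, tn, fn = 80, 20, 180, 20
example_precision = precision(tp, fp)      # 80 / 100
example_fpr = false_positive_rate(fp, tn)  # 20 / 200
```

Note that the two metrics have different denominators, so a detector can improve one without moving the other. This is why the summary reports both.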

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Package managers could embed the fused model to flag uploads in real time before they reach users.
  • The same metadata-plus-content fusion pattern may apply to detecting other supply-chain impersonations such as domain or library-name collisions.
  • Security teams could reduce manual triage volume by routing only high-risk fused-model scores for review.
  • Periodic retraining of the content-analysis group on newly published packages would keep the detector current.

Load-bearing premise

The fused model that merges metadata and content signals will reliably separate benign similar-named packages from malicious ones without hidden biases or the need for post-hoc tuning.

What would settle it

Run AgentGuard on a fresh set of adversarial packages engineered to match both names and surface-level content patterns from the ConfuDB dataset and measure whether the reported false-positive rate stays below the baseline levels.

Figures

Figures reproduced from arXiv: 2604.16309 by Hao Liu, Junyi Tao, Lingxiao Jiang, Qiang Hu, Wei Ma, Ye Liu, Yongqiang Lyu, Yu Li, Zhi Chen.

Figure 1. The overall architecture and workflow of the AgentGuard system. The Orchestrator Agent coordinates three specialized …
Figure 2. Relationship of F1-score with threshold. The plot …
Figure 4. Target Discovery Rate (TDR@k) comparison of …
Figure 5. SHAP dot plot showing the contribution of all 18 …
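The kind of curve Figure 2 describes, F1-score as a function of the decision threshold, can be reproduced in outline with a simple sweep. The sketch below is a generic illustration under assumed inputs: the scores, labels, and threshold grid are invented and do not come from the paper.

```python
def f1_at_threshold(scores: list[float], labels: list[int], threshold: float) -> float:
    """F1 when every score at or above the threshold is flagged as confusing."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented risk scores and ground-truth labels (1 = confusing package).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
thresholds = [0.2, 0.5, 0.7, 0.9]

curve = {t: f1_at_threshold(scores, labels, t) for t in thresholds}
best_threshold = max(curve, key=curve.get)
```

Picking the threshold on the same data used for reporting would inflate the result, so in practice the sweep belongs on a held-out validation split.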
Original abstract

The proliferation of open-source software (OSS) has made software supply chains prime targets for attacks like Package Confusion, where adversaries publish malicious packages with names deceptively similar to legitimate ones. To protect against such attacks and safeguard the use of OSS, multiple confusion detection methods have been proposed. However, existing methods are limited to single-signal retrieval strategies (relying solely on lexical or semantic metrics), struggle with high false positive rates (FPR), and are vulnerable to adversarial evasion. Critically, as content-agnostic approaches, they fundamentally fail to distinguish benign packages with high naming similarity from malicious, code-dissimilar impersonations, leading to persistent high FPR. To address these limitations, we introduce AgentGuard, a novel multi-agents based framework for package confusion detection. Specifically, it first discovers potential confusion targets using fine-tuned word embedding models with hybrid similarity search. It then evaluates risk via a fused machine learning model that uniquely combines: (1) a multi-dimensional metadata group and (2) a novel package content analysis group, to reduce the FPR and mitigate the impact of adversarial evasion. To assess the effectiveness of AgentGuard, we evaluate it on challenging ConfuDB and NeupaneDB datasets. Our results demonstrate that AgentGuard significantly outperforms state-of-the-art baselines, ConfuGuard and Typomind, improving precision by 12%-49% while simultaneously reducing the FPR by 11%-35%, and effectively discovers the confused package.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces AgentGuard, a multi-agent framework for detecting package confusion in open-source software supply chains. It first identifies candidate confusing packages via fine-tuned word embeddings and hybrid similarity search, then applies a fused machine learning model that combines a multi-dimensional metadata group with a novel package content analysis group to assess risk, lower false positive rates, and resist adversarial evasion. Evaluation on the ConfuDB and NeupaneDB datasets reports that AgentGuard outperforms baselines ConfuGuard and Typomind, with precision gains of 12%-49% and FPR reductions of 11%-35%.

Significance. If the performance claims are substantiated, the work could meaningfully advance software supply chain security by moving beyond single-signal lexical or semantic methods to a hybrid metadata-content approach that directly targets the content-agnostic limitations of prior detectors. The multi-agent structure and explicit content analysis group offer a concrete path toward lower-FPR, more evasion-resistant detection, which would be valuable for package registries and dependency tools.

major comments (3)
  1. [Evaluation] Evaluation section: the headline claim that the fused metadata-content model drives the 12%-49% precision lift and 11%-35% FPR drop is unsupported without ablation results (metadata-only, content-only, hybrid-search-only, and full-fusion runs). The reported gains could be attributable to the upstream fine-tuned embedding step alone.
  2. [Methodology] Methodology section: no training details, feature definitions, hyper-parameter settings, or cross-validation procedure are supplied for the fused ML model or the fine-tuned word embeddings. Without these, the precision and FPR numbers cannot be reproduced or verified.
  3. [§3] §3 (or equivalent): the multi-agent framework is described at a high level but lacks concrete specification of agent roles, communication protocol, and decision aggregation, which are load-bearing for the claimed robustness against evasion.
minor comments (2)
  1. [Abstract] The abstract and introduction use the term 'multi-agents based' inconsistently; standardize to 'multi-agent'.
  2. [Evaluation] Dataset descriptions for ConfuDB and NeupaneDB should include size, label distribution, and any preprocessing steps applied before evaluation.
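The ablation requested in major comment 1 amounts to scoring the same evaluation set with each signal group switched on or off. The sketch below shows the shape of such a loop. Everything in it is an assumption: the per-group scores are hand-picked toy values, the averaging rule is a placeholder for the real fused model, and the numbers say nothing about AgentGuard's actual behaviour.

```python
# (metadata_score, content_score, is_malicious): toy values chosen by hand.
samples = [
    (0.9, 0.9, 1),
    (0.8, 0.7, 1),
    (0.4, 0.9, 1),
    (0.7, 0.1, 0),  # benign look-alike: suspicious metadata, benign content
    (0.2, 0.6, 0),
    (0.1, 0.1, 0),
]

# Which signal groups each configuration is allowed to use.
CONFIGS = {
    "metadata-only": (True, False),
    "content-only": (False, True),
    "full-fusion": (True, True),
}


def score(meta: float, content: float, use_meta: bool, use_content: bool) -> float:
    """Average the enabled group scores (a placeholder for the fused model)."""
    parts = [s for s, enabled in ((meta, use_meta), (content, use_content)) if enabled]
    return sum(parts) / len(parts)


def evaluate(use_meta: bool, use_content: bool, threshold: float = 0.5):
    """Return (precision, false-positive rate) for one configuration."""
    tp = fp = tn = 0
    for meta, content, label in samples:
        flagged = score(meta, content, use_meta, use_content) >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, fpr


results = {name: evaluate(*flags) for name, flags in CONFIGS.items()}
```

A table built from `results`, computed on ConfuDB and NeupaneDB with the real models, is what would show whether the gains come from fusion or from the upstream search step.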

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the current manuscript would benefit from additional ablation studies, expanded methodological details, and more concrete specifications of the multi-agent components. We will incorporate these revisions to strengthen the paper's reproducibility and clarity.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the headline claim that the fused metadata-content model drives the 12%-49% precision lift and 11%-35% FPR drop is unsupported without ablation results (metadata-only, content-only, hybrid-search-only, and full-fusion runs). The reported gains could be attributable to the upstream fine-tuned embedding step alone.

    Authors: We agree that ablation studies are required to isolate the contribution of the fused metadata-content model. In the revised manuscript we will add a dedicated ablation subsection reporting precision and FPR for (1) metadata-only, (2) content-only, (3) hybrid-search-only, and (4) the full fusion configuration on both ConfuDB and NeupaneDB. These results will demonstrate that the reported gains are not solely attributable to the embedding step. revision: yes

  2. Referee: [Methodology] Methodology section: no training details, feature definitions, hyper-parameter settings, or cross-validation procedure are supplied for the fused ML model or the fine-tuned word embeddings. Without these, the precision and FPR numbers cannot be reproduced or verified.

    Authors: We acknowledge the omission of implementation details. The revised manuscript will include a new subsection that fully specifies: the training corpus and procedure for the fine-tuned word embeddings, the exact feature definitions for the metadata and content groups, all hyper-parameter values and selection method, the loss function, optimizer, and the cross-validation protocol (including fold count and stratification). This will enable full reproducibility of the reported metrics. revision: yes

  3. Referee: [§3] §3 (or equivalent): the multi-agent framework is described at a high level but lacks concrete specification of agent roles, communication protocol, and decision aggregation, which are load-bearing for the claimed robustness against evasion.

    Authors: We will expand the description of the multi-agent framework in §3 (and add an accompanying figure and pseudocode listing). The revision will explicitly define each agent's role, the message format and protocol used for inter-agent communication, and the decision-aggregation rule (including how metadata and content signals are fused). These additions will clarify how the architecture contributes to evasion resistance. revision: yes
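The cross-validation protocol promised in response 2 (fold count and stratification) can be pinned down in a few lines. The sketch below is a generic stratified k-fold split written with the standard library. The fold count, seed, and class balance are assumptions for illustration, not the authors' settings.

```python
import random
from collections import defaultdict


def stratified_folds(labels: list[int], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into k folds that preserve the class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a proportional share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Assumed class balance: 10 confusing packages among 50 samples.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels, k=5)

for held_out in range(len(folds)):
    test_indices = folds[held_out]
    train_indices = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # A real run would fit the model on train_indices and score test_indices here.
```

Stratification matters for this task because confusing packages are typically a small minority, and an unstratified split can leave a fold with too few positives to estimate precision.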

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external datasets

Full rationale

The paper introduces an empirical multi-agent framework evaluated on the external ConfuDB and NeupaneDB datasets with comparisons to baselines ConfuGuard and Typomind. No equations, derivations, or first-principles predictions appear in the provided text. Performance improvements are reported as measured outcomes rather than quantities forced by construction from fitted parameters or self-referential definitions. No load-bearing self-citations to prior author work are invoked to justify uniqueness or forbid alternatives. The skeptic's concern about missing ablations addresses experimental completeness, not circular reduction of any derivation chain to its inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on the unverified effectiveness of the hybrid search step and the fusion model; no independent evidence or parameter-free derivation is supplied in the abstract.

free parameters (2)
  • fine-tuned word embedding parameters
    Fine-tuning implies parameters learned from data to support the hybrid similarity search.
  • fused ML model parameters and weights
    The machine learning model that combines metadata and content groups requires fitted parameters.
axioms (2)
  • [domain assumption] Hybrid similarity search on fine-tuned embeddings discovers relevant confusion targets
    Invoked as the first stage of the framework.
  • [domain assumption] Fused metadata and content analysis reduces FPR and mitigates adversarial evasion
    Central premise of the risk evaluation stage.

