pith. sign in

arxiv: 2605.09817 · v2 · pith:TAVMBJ67new · submitted 2026-05-10 · 💻 cs.SE · cs.CR

Evaluating Tool Cloning in Agentic-AI Ecosystems

Pith reviewed 2026-05-20 22:26 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords tool cloningagentic AILLM agentscode duplicationsimilarity metricsMCP ecosystembenchmark contaminationsoftware provenance
0
0 comments X

The pith

Tool cloning is pervasive in agentic AI ecosystems, with most high-similarity repository pairs confirmed as clones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Agent tools let LLM agents reach external services, yet marketplaces may contain many cloned or lightly modified versions that inflate apparent diversity. The paper assembles a dataset of 8,861 repositories containing over 100,000 tools and runs pairwise similarity checks with lexical and fuzzy metrics across MCP and Skills platforms. Manual checks on sampled high-similarity pairs show that 60 percent of high-Jaccard cases and 85 percent of high-ssdeep cases in the MCP ecosystem are genuine clones. These findings mean raw tool counts overstate ecosystem variety and can introduce hidden duplication into benchmarks, security analyses, and generalization tests.

Core claim

Cloning is not an isolated artifact: high-similarity regions appear across comparison settings, and 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem are manually verified as clones. These results indicate that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems.

What carries the argument

repository-level auditing pipeline that computes pairwise similarity using complementary lexical (Jaccard) and fuzzy-structural (ssdeep) metrics on extracted tool code

If this is right

  • Raw counts of tools in marketplaces substantially overstate actual ecosystem diversity.
  • Benchmark splits that ignore repository provenance become contaminated by duplicate implementations.
  • Vulnerable code can propagate more widely through cloned tools.
  • Measurements of tool-use generalization become biased when cloned variants are treated as independent.
  • Provenance, attribution, and intellectual-property questions arise for derived tools.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Evaluation protocols for agent tool use should incorporate similarity-based deduplication before reporting performance numbers.
  • Marketplace operators could add automated clone detection to surface derivative tools for users.
  • Security scanning efforts would benefit from treating high-similarity clusters as single entities rather than independent codebases.

Load-bearing premise

The 100 sampled high-similarity pairs per ecosystem represent the full distribution of clones and manual review reliably separates true cloning from coincidental similarity or shared templates.

What would settle it

A full manual audit of every high-similarity pair that finds verification rates well below 60 percent for Jaccard and 85 percent for ssdeep would falsify the claim of pervasive cloning.

Figures

Figures reproduced from arXiv: 2605.09817 by David Jiang, Neil Gong, Taein Kim, Yuepeng Hu, Yuqi Jia.

Figure 1
Figure 1. Figure 1: Description length distributions for MCP and Skills tools. [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Functionality and description-space analysis of MCP tools. (a) MCP functionality dis [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Developer contribution distributions in the MCP and Skills ecosystems. [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Pairwise repository similarity distributions across three comparison groups. The top row [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Distribution of MCP and Skills repository sizes measured by normalized source tokens. [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Distribution of tool counts for the top 40 authors in the MCP tool ecosystem. [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Distribution of skill counts for the top 40 authors in the Skills tool ecosystem. [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Log-log distributions of developer contribution frequency. (a) MCP ecosystem. (b) Skills [PITH_FULL_IMAGE:figures/full_fig_p019_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Distribution of MCP and Skills tools for authors present in both ecosystems (log scale). [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Functionality and description-space analysis of Skills. (a) Skills functionality distribu [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗
read the original abstract

Agent tools are becoming a core interface through which LLM agents access external data, services, and execution environments. As these tools are distributed through public marketplaces, raw tool counts may substantially overstate ecosystem diversity if many repositories are cloned, lightly modified, or derived from shared templates. Such hidden duplication can contaminate benchmark splits, propagate vulnerable implementations, bias measurements of tool-use generalization, and raise provenance, attribution, and intellectual-property concerns. We present, to our knowledge, the first large-scale measurement study of tool cloning in agentic AI ecosystems. We curate a unified dataset from multiple public platforms, covering 7,508 Model Context Protocol (MCP) repositories with 87,564 extracted tools and 1,353 Skills repositories with 12,447 tools, for a total of 8,861 repositories and 100,011 tool entries. To measure implementation-level duplication, we build a repository-level auditing pipeline using complementary lexical and fuzzy-structural similarity metrics, and compute pairwise similarity across MCP-to-MCP, Skills-to-Skills, and MCP-to-Skills repository pairs. We further manually verify 100 sampled pairs per MCP and Skills ecosystem across similarity-score buckets to calibrate how often high similarity reflects true code cloning. Our analysis shows that cloning is not an isolated artifact: high-similarity regions appear across comparison settings, and 60\% of high-Jaccard candidates and 85\% of high-ssdeep candidates in the MCP ecosystem are manually verified as clones. These results indicate that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems. They further suggest that agent-tool datasets and benchmarks should account for repository provenance and implementation similarity when measuring tool diversity or constructing evaluation splits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the first large-scale empirical measurement of tool cloning in agentic-AI ecosystems. It curates a unified dataset of 8,861 repositories and 100,011 tools from the Model Context Protocol (MCP) and Skills platforms, computes pairwise repository-level similarities using complementary lexical (Jaccard) and fuzzy-structural (ssdeep) metrics across MCP-to-MCP, Skills-to-Skills, and cross-ecosystem pairs, and manually verifies 100 sampled high-similarity pairs per ecosystem across similarity-score buckets. The central finding is that cloning is pervasive rather than isolated, with 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem manually confirmed as clones, implying that raw tool counts overstate diversity and that benchmarks must account for provenance and implementation similarity.

Significance. If the sampling and verification procedures prove robust, the work supplies a valuable baseline for an emerging problem at the intersection of software engineering, AI agents, and tool ecosystems. The scale of the curated dataset, the use of multiple complementary similarity metrics, and the explicit linkage to downstream concerns (benchmark contamination, vulnerability propagation, IP) are clear strengths. The empirical focus on hidden duplication addresses a practical gap that existing tool-use literature has largely overlooked.

major comments (2)
  1. Abstract and auditing-pipeline description: The headline clone rates (60% high-Jaccard, 85% high-ssdeep in MCP) are computed solely from the 100 manually inspected pairs per ecosystem. The text states that sampling occurs “across similarity-score buckets” but supplies neither the bucket boundaries, the total number of candidate pairs falling into each bucket, the exact sampling procedure within buckets, nor the verification rubric used to distinguish true implementation clones from shared templates or coincidental similarity. Because these rates are the primary quantitative support for the pervasiveness claim, the absence of this information prevents assessment of representativeness and reproducibility.
  2. Manual-verification procedure: No inter-rater agreement statistic, number of annotators, or explicit decision criteria for labeling a pair as a clone are reported. Without these details it is impossible to gauge the reliability of the 60% and 85% figures or to determine whether the manual labels systematically separate cloning from boilerplate reuse.
minor comments (2)
  1. The abstract would be clearer if it stated the concrete similarity thresholds that define the “high-Jaccard” and “high-ssdeep” buckets from which the 100 pairs were drawn.
  2. The manuscript should report the total number of pairwise comparisons performed in each setting (MCP-to-MCP, Skills-to-Skills, MCP-to-Skills) so readers can contextualize the scale of the high-similarity regions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive suggestions. We agree that additional details on the sampling and verification procedures are important for assessing the robustness of our findings. We address each major comment below and will incorporate the requested information into the revised manuscript.

read point-by-point responses
  1. Referee: Abstract and auditing-pipeline description: The headline clone rates (60% high-Jaccard, 85% high-ssdeep in MCP) are computed solely from the 100 manually inspected pairs per ecosystem. The text states that sampling occurs “across similarity-score buckets” but supplies neither the bucket boundaries, the total number of candidate pairs falling into each bucket, the exact sampling procedure within buckets, nor the verification rubric used to distinguish true implementation clones from shared templates or coincidental similarity. Because these rates are the primary quantitative support for the pervasiveness claim, the absence of this information prevents assessment of representativeness and reproducibility.

    Authors: We acknowledge that these methodological details were not included in the submitted manuscript. To address this, we will add a new subsection in the Methods section that specifies the similarity-score buckets used (for example, for Jaccard similarity: [0.7, 0.8), [0.8, 0.9), [0.9, 1.0]; similar for ssdeep), the total number of repository pairs in each bucket for the MCP ecosystem, the stratified sampling approach (selecting a fixed number of pairs from each bucket to ensure coverage across the similarity range), and the detailed verification rubric. The rubric defines a true clone as a pair where one repository is a direct copy or minor modification of the other, excluding cases of shared templates or coincidental similarity based on code structure and functionality. This will allow readers to evaluate the representativeness of the 100 sampled pairs. revision: yes

  2. Referee: Manual-verification procedure: No inter-rater agreement statistic, number of annotators, or explicit decision criteria for labeling a pair as a clone are reported. Without these details it is impossible to gauge the reliability of the 60% and 85% figures or to determine whether the manual labels systematically separate cloning from boilerplate reuse.

    Authors: We agree that reporting these details is essential for transparency. The manual verification was conducted by two independent annotators (both authors with expertise in software engineering), who reviewed each pair and labeled it as clone or not based on explicit criteria. Disagreements were resolved through discussion with a third author. We will report the inter-rater agreement using Cohen's kappa statistic in the revised version. The decision criteria include: (1) substantial code overlap (>60% after removing comments and whitespace), (2) evidence of direct copying such as identical file structures and function names with minor variable changes, or (3) one repository being a fork with limited modifications. Pairs with only shared dependencies or boilerplate code were not labeled as clones. We will include these details and the agreement statistic in the updated manuscript. revision: yes

Circularity Check

0 steps flagged

Empirical measurement study with no derivations or self-referential reductions

full rationale

The paper is a large-scale empirical measurement study that curates datasets, applies standard lexical and fuzzy similarity metrics (Jaccard, ssdeep), computes pairwise scores, and reports direct counts from manual verification of 100 sampled high-similarity pairs per ecosystem. No equations, fitted parameters, predictions, or derivations appear in the provided text. The reported clone fractions (60% Jaccard, 85% ssdeep) are literal outcomes of the verification step rather than quantities forced by construction from the similarity scores themselves. No self-citations, uniqueness theorems, or ansatzes are invoked to justify core claims. The analysis therefore remains self-contained and does not reduce to its inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of the curated dataset and the validity of similarity metrics plus manual review as proxies for cloning. No free parameters are fitted to produce the headline percentages; thresholds appear chosen for analysis rather than optimized to the result.

axioms (2)
  • domain assumption High lexical or fuzzy-structural similarity between repositories indicates cloning or derivation rather than independent implementation of similar functionality.
    Invoked when interpreting high-Jaccard and high-ssdeep scores as evidence of cloning.
  • domain assumption The public marketplaces sampled are representative of the broader agent-tool ecosystem.
    Used to generalize from the 8,861 repositories to the ecosystem as a whole.

pith-pipeline@v0.9.0 · 5840 in / 1421 out tokens · 23191 ms · 2026-05-20T22:26:53.766654+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    We build a repository-level auditing pipeline using complementary lexical and fuzzy-structural similarity metrics, and compute pairwise similarity across MCP-to-MCP, Skills-to-Skills, and MCP-to-Skills repository pairs. We further manually verify 100 sampled pairs per MCP and Skills ecosystem across similarity-score buckets

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 2 internal anchors

  1. [1]

    Proceedings

    Clone detection using abstract syntax trees , author=. Proceedings. International Conference on Software Maintenance , year=

  2. [2]

    Queen’s School of computing TR , year=

    A survey on software clone detection research , author=. Queen’s School of computing TR , year=

  3. [3]

    Science of computer programming , year=

    Comparison and evaluation of code clone detection techniques and tools: A qualitative approach , author=. Science of computer programming , year=

  4. [4]

    2007 , publisher=

    Survey of research on software clones , author=. 2007 , publisher=

  5. [5]

    IEEE Transactions on software engineering , year=

    Comparison and evaluation of clone detection tools , author=. IEEE Transactions on software engineering , year=

  6. [6]

    2009 IEEE 31st International Conference on Software Engineering , pages=

    Do code clones matter? , author=. 2009 IEEE 31st International Conference on Software Engineering , pages=. 2009 , organization=

  7. [7]

    , author=

    On finding duplication and near-duplication in large software systems. , author=. wcre , volume=

  8. [8]

    1 , author=

    The distribution of the flora in the alpine zone. 1 , author=. New phytologist , volume=. 1912 , publisher=

  9. [9]

    Digital investigation , volume=

    Identifying almost identical files using context triggered piecewise hashing , author=. Digital investigation , volume=. 2006 , publisher=

  10. [10]

    , author=

    A comparison of string distance metrics for name-matching tasks. , author=. IIWeb , volume=

  11. [11]

    Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval , pages=

    Finding near-duplicate web pages: a large-scale evaluation of algorithms , author=. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval , pages=

  12. [12]

    , author=

    Drebin: Effective and explainable detection of android malware in your pocket. , author=. Ndss , volume=. 2014 , organization=

  13. [13]

    Empirical Software Engineering , volume=

    Empirical study of android repackaged applications , author=. Empirical Software Engineering , volume=. 2019 , publisher=

  14. [14]

    The 2014 ACM international conference on Measurement and modeling of computer systems , pages=

    A measurement study of google play , author=. The 2014 ACM international conference on Measurement and modeling of computer systems , pages=

  15. [15]

    GraphCodeBERT: Pre-training Code Representations with Data Flow

    Graphcodebert: Pre-training code representations with data flow , author=. arXiv preprint arXiv:2009.08366 , year=

  16. [16]

    Findings of the association for computational linguistics: EMNLP 2020 , pages=

    Codebert: A pre-trained model for programming and natural languages , author=. Findings of the association for computational linguistics: EMNLP 2020 , pages=

  17. [17]

    2024 , howpublished =

    Introducing the Model Context Protocol , author =. 2024 , howpublished =

  18. [18]

    2025 , howpublished =

    Equipping Agents for the Real World with Agent Skills , author =. 2025 , howpublished =

  19. [20]

    The twelfth international conference on learning representations , year=

    Toolllm: Facilitating large language models to master 16000+ real-world apis , author=. The twelfth international conference on learning representations , year=

  20. [21]

    International Conference on Learning Representations (ICLR) , year=

    ReAct: Synergizing Reasoning and Acting in Language Models , author=. International Conference on Learning Representations (ICLR) , year=

  21. [22]

    Advances in neural information processing systems , volume=

    Reflexion: Language agents with verbal reinforcement learning , author=. Advances in neural information processing systems , volume=

  22. [23]

    Advances in Neural Information Processing Systems , volume=

    Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face , author=. Advances in Neural Information Processing Systems , volume=

  23. [24]

    Advances in neural information processing systems , volume=

    Toolformer: Language models can teach themselves to use tools , author=. Advances in neural information processing systems , volume=

  24. [25]

    2012 IEEE symposium on security and privacy , pages=

    Dissecting android malware: Characterization and evolution , author=. 2012 IEEE symposium on security and privacy , pages=. 2012 , organization=

  25. [26]

    2025 , howpublished =

    Llama 4: Open Foundation Models for Multimodal and Efficient AI , author =. 2025 , howpublished =

  26. [27]

    Advances in Neural Information Processing Systems , year=

    Webshop: Towards scalable real-world web interaction with grounded language agents , author=. Advances in Neural Information Processing Systems , year=

  27. [28]

    The Twelfth International Conference on Learning Representations , year=

    AgentBench: Evaluating LLMs as Agents , author=. The Twelfth International Conference on Learning Representations , year=

  28. [29]

    The twelfth international conference on learning representations , year=

    Swe-bench: Can language models resolve real-world github issues? , author=. The twelfth international conference on learning representations , year=

  29. [30]

    Advances in Neural Information Processing Systems , year=

    Gorilla: Large language model connected with massive apis , author=. Advances in Neural Information Processing Systems , year=

  30. [31]

    IEEE transactions on software engineering , year=

    CCFinder: A multilinguistic token-based code clone detection system for large scale source code , author=. IEEE transactions on software engineering , year=

  31. [32]

    29th International Conference on Software Engineering (ICSE'07) , year=

    Deckard: Scalable and accurate tree-based detection of code clones , author=. 29th International Conference on Software Engineering (ICSE'07) , year=

  32. [33]

    Proceedings of the 38th international conference on software engineering , year=

    Sourcerercc: Scaling code clone detection to big-code , author=. Proceedings of the 38th international conference on software engineering , year=

  33. [34]

    IEEE Transactions on software Engineering , year=

    CP-Miner: Finding copy-paste and related bugs in large-scale software code , author=. IEEE Transactions on software Engineering , year=

  34. [35]

    Proceedings of the second ACM conference on Data and Application Security and Privacy , year=

    Detecting repackaged smartphone applications in third-party android marketplaces , author=. Proceedings of the second ACM conference on Data and Application Security and Privacy , year=

  35. [36]

    European Symposium on Research in Computer Security , year=

    Attack of the clones: Detecting cloned applications on android markets , author=. European Symposium on Research in Computer Security , year=

  36. [37]

    European Symposium on Research in Computer Security , year=

    Andarwin: Scalable detection of semantically similar android applications , author=. European Symposium on Research in Computer Security , year=

  37. [38]

    MCP.so , year = 2025, howpublished =

  38. [39]

    MCPServers.org , year = 2025, howpublished =

  39. [40]

    MCP Market , title =

  40. [41]

    Introducing the model context protocol

    Anthropic . Introducing the model context protocol. https://www.anthropic.com/news/model-context-protocol, 2024

  41. [42]

    Equipping agents for the real world with agent skills

    Anthropic . Equipping agents for the real world with agent skills. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills, 2025

  42. [43]

    Clone detection using abstract syntax trees

    Ira D Baxter, Andrew Yahin, Leonardo Moura, Marcelo Sant'Anna, and Lorraine Bier. Clone detection using abstract syntax trees. In Proceedings. International Conference on Software Maintenance, 1998

  43. [44]

    Comparison and evaluation of clone detection tools

    Stefan Bellon, Rainer Koschke, Giulio Antoniol, Jens Krinke, and Ettore Merlo. Comparison and evaluation of clone detection tools. IEEE Transactions on software engineering, 2007

  44. [45]

    Attack of the clones: Detecting cloned applications on android markets

    Jonathan Crussell, Clint Gibler, and Hao Chen. Attack of the clones: Detecting cloned applications on android markets. In European Symposium on Research in Computer Security, 2012

  45. [46]

    Andarwin: Scalable detection of semantically similar android applications

    Jonathan Crussell, Clint Gibler, and Hao Chen. Andarwin: Scalable detection of semantically similar android applications. In European Symposium on Research in Computer Security, 2013

  46. [47]

    Deckard: Scalable and accurate tree-based detection of code clones

    Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stephane Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In 29th International Conference on Software Engineering (ICSE'07), 2007

  47. [48]

    Swe-bench: Can language models resolve real-world github issues? In The twelfth international conference on learning representations, 2023

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R Narasimhan. Swe-bench: Can language models resolve real-world github issues? In The twelfth international conference on learning representations, 2023

  48. [49]

    Do code clones matter? In 2009 IEEE 31st International Conference on Software Engineering, pages 485--495

    Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, and Stefan Wagner. Do code clones matter? In 2009 IEEE 31st International Conference on Software Engineering, pages 485--495. IEEE, 2009

  49. [50]

    Ccfinder: A multilinguistic token-based code clone detection system for large scale source code

    Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. Ccfinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE transactions on software engineering, 2002

  50. [51]

    Identifying almost identical files using context triggered piecewise hashing

    Jesse Kornblum. Identifying almost identical files using context triggered piecewise hashing. Digital investigation, 3: 0 91--97, 2006

  51. [52]

    Survey of research on software clones

    Rainer Koschke. Survey of research on software clones. 2007

  52. [53]

    Cp-miner: Finding copy-paste and related bugs in large-scale software code

    Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. Cp-miner: Finding copy-paste and related bugs in large-scale software code. IEEE Transactions on software Engineering, 2006

  53. [54]

    Agentbench: Evaluating llms as agents

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. Agentbench: Evaluating llms as agents. In The Twelfth International Conference on Learning Representations, 2024

  54. [55]

    Mcp market

    MCP Market. Mcp market. https://mcpmarket.com/, 2025

  55. [56]

    Llama 4: Open foundation models for multimodal and efficient ai

    Meta AI . Llama 4: Open foundation models for multimodal and efficient ai. https://ai.meta.com/llama/, 2025. Accessed: 2026-05-06

  56. [57]

    Gorilla: Large language model connected with massive apis

    Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive apis. In Advances in Neural Information Processing Systems, 2024

  57. [58]

    Toolllm: Facilitating large language models to master 16000+ real-world apis

    Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. Toolllm: Facilitating large language models to master 16000+ real-world apis. In The twelfth international conference on learning representations, 2023

  58. [59]

    Comparison and evaluation of code clone detection techniques and tools: A qualitative approach

    Chanchal K Roy, James R Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of computer programming, 2009

  59. [60]

    A survey on software clone detection research

    Chanchal Kumar Roy and James R Cordy. A survey on software clone detection research. Queen’s School of computing TR, 2007

  60. [61]

    Sourcerercc: Scaling code clone detection to big-code

    Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, and Cristina V Lopes. Sourcerercc: Scaling code clone detection to big-code. In Proceedings of the 38th international conference on software engineering, 2016

  61. [62]

    Toolformer: Language models can teach themselves to use tools

    Timo Schick, Jane Dwivedi-Yu, Roberto Dess \` , Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in neural information processing systems, 36: 0 68539--68551, 2023

  62. [63]

    Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face

    Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face. Advances in Neural Information Processing Systems, 36: 0 38154--38180, 2023

  63. [64]

    Reflexion: Language agents with verbal reinforcement learning

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in neural information processing systems, 36: 0 8634--8652, 2023

  64. [65]

    Skillsmp

    SkillsMP. Skillsmp. https://skillsmp.com/, 2025

  65. [66]

    ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

    Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, Boxi Cao, and Le Sun. Toolalpaca: Generalized tool learning for language models with 3000 simulated cases. arXiv preprint arXiv:2306.05301, 2023

  66. [67]

    Mcpservers.org

    MCPServers.org. Mcpservers.org. https://mcpservers.org/, 2025

  67. [68]

    MCP.so. Mcp.so. https://mcp.so/, 2025

  68. [69]

    Webshop: Towards scalable real-world web interaction with grounded language agents

    Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. Webshop: Towards scalable real-world web interaction with grounded language agents. In Advances in Neural Information Processing Systems, 2022

  69. [70]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

  70. [71]

    Detecting repackaged smartphone applications in third-party android marketplaces

    Wu Zhou, Yajin Zhou, Xuxian Jiang, and Peng Ning. Detecting repackaged smartphone applications in third-party android marketplaces. In Proceedings of the second ACM conference on Data and Application Security and Privacy, 2012

  71. [72]

    Dissecting android malware: Characterization and evolution

    Yajin Zhou and Xuxian Jiang. Dissecting android malware: Characterization and evolution. In 2012 IEEE symposium on security and privacy, pages 95--109. IEEE, 2012