Evaluating Tool Cloning in Agentic-AI Ecosystems

David Jiang; Neil Gong; Taein Kim; Yuepeng Hu; Yuqi Jia

arxiv: 2605.09817 · v2 · pith:TAVMBJ67new · submitted 2026-05-10 · 💻 cs.SE · cs.CR

Pith reviewed 2026-05-20 22:26 UTC · model grok-4.3

keywords tool cloning · agentic AI · LLM agents · code duplication · similarity metrics · MCP ecosystem · benchmark contamination · software provenance

The pith

Tool cloning is pervasive in agentic AI ecosystems, with most high-similarity repository pairs confirmed as clones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Agent tools let LLM agents reach external services, yet marketplaces may contain many cloned or lightly modified versions that inflate apparent diversity. The paper assembles a dataset of 8,861 repositories containing over 100,000 tools and runs pairwise similarity checks with lexical and fuzzy metrics across MCP and Skills platforms. Manual checks on sampled high-similarity pairs show that 60 percent of high-Jaccard cases and 85 percent of high-ssdeep cases in the MCP ecosystem are genuine clones. These findings mean raw tool counts overstate ecosystem variety and can introduce hidden duplication into benchmarks, security analyses, and generalization tests.

Core claim

Cloning is not an isolated artifact: high-similarity regions appear across comparison settings, and 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem are manually verified as clones. These results indicate that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems.

What carries the argument

A repository-level auditing pipeline that computes pairwise similarity on extracted tool code using two complementary metrics: lexical (Jaccard) and fuzzy-structural (ssdeep).
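
The lexical half of such a pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tokenizer regex, the function names, and the toy repositories are all assumptions made here. The fuzzy-structural half (ssdeep, a context-triggered piecewise hash) needs a third-party binding and is left out.

```python
import re
from itertools import combinations

# Hypothetical normalization: identifiers and integers, lowercased.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+")


def tokenize(source: str) -> set[str]:
    """Return the set of normalized tokens found in a source string."""
    return {tok.lower() for tok in TOKEN_RE.findall(source)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(repos: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of repositories once."""
    tokens = {name: tokenize(src) for name, src in repos.items()}
    return {
        (x, y): jaccard(tokens[x], tokens[y])
        for x, y in combinations(sorted(tokens), 2)
    }


# Toy inputs (invented): repo_b is repo_a with one identifier renamed.
repos = {
    "repo_a": "def get_weather(city): return fetch(city)",
    "repo_b": "def get_weather(town): return fetch(town)",
    "repo_c": "class Mailer: pass",
}
scores = pairwise_jaccard(repos)
print(scores[("repo_a", "repo_b")])  # 4 shared tokens out of 6 distinct
print(scores[("repo_a", "repo_c")])  # no shared tokens
```

A set-based score ignores token order and frequency, which is one reason a second, order-sensitive metric is a useful complement.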

If this is right

Raw counts of tools in marketplaces substantially overstate actual ecosystem diversity.
Benchmark splits that ignore repository provenance become contaminated by duplicate implementations.
Vulnerable code can propagate more widely through cloned tools.
Measurements of tool-use generalization become biased when cloned variants are treated as independent.
Provenance, attribution, and intellectual-property questions arise for derived tools.
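
The benchmark-contamination point can be made concrete with a group-aware split, in which whole clone clusters go to one side of the split. The Python sketch below assumes each repository already carries a cluster label; `grouped_split`, `cluster_of`, and the cluster ids are hypothetical names introduced here, not anything taken from the paper.

```python
import random
from collections import defaultdict


def grouped_split(cluster_of: dict[str, str], test_fraction: float = 0.2, seed: int = 0):
    """Assign whole clone clusters to train or test so no cluster spans both."""
    groups = defaultdict(list)
    for repo, cluster in cluster_of.items():
        groups[cluster].append(repo)

    cluster_ids = sorted(groups)
    random.Random(seed).shuffle(cluster_ids)

    target = test_fraction * len(cluster_of)
    train, test = [], []
    for cid in cluster_ids:
        # Fill the test side cluster by cluster until the target size is met,
        # then send every remaining cluster to train.
        (test if len(test) < target else train).extend(groups[cid])
    return sorted(train), sorted(test)


# Invented labels: repo_a/repo_b and repo_d/repo_e are clone pairs.
cluster_of = {
    "repo_a": "c1", "repo_b": "c1",
    "repo_c": "c2",
    "repo_d": "c3", "repo_e": "c3",
    "repo_f": "c4",
}
train, test = grouped_split(cluster_of, test_fraction=0.3)
print(train, test)
```

Because clusters are indivisible, the realized test fraction can overshoot the target slightly; that is the price of keeping clones together.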

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Evaluation protocols for agent tool use should incorporate similarity-based deduplication before reporting performance numbers.
Marketplace operators could add automated clone detection to surface derivative tools for users.
Security scanning efforts would benefit from treating high-similarity clusters as single entities rather than independent codebases.
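
Treating high-similarity clusters as single entities amounts to taking connected components over the pairs that clear a threshold. A small union-find sketch in Python follows; the 0.8 threshold, the function name, and the scores are illustrative assumptions, since the paper's thresholds are not given in this summary.

```python
def cluster_by_similarity(repos, scored_pairs, threshold):
    """Group repositories linked by any chain of pairs scoring >= threshold."""
    parent = {r: r for r in repos}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scored_pairs.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for r in repos:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(members) for members in clusters.values())


# Invented scores: a~b and b~c are above the threshold, c~d is not.
repos = ["repo_a", "repo_b", "repo_c", "repo_d"]
scored_pairs = {
    ("repo_a", "repo_b"): 0.92,
    ("repo_b", "repo_c"): 0.85,
    ("repo_c", "repo_d"): 0.10,
}
clusters = cluster_by_similarity(repos, scored_pairs, threshold=0.8)
print(clusters)  # [['repo_a', 'repo_b', 'repo_c'], ['repo_d']]
```

Connected components are single-linkage clusters, so a chain of moderately similar pairs can merge repositories that are not directly similar to each other. A stricter grouping rule may be preferable when that matters.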

Load-bearing premise

The 100 sampled high-similarity pairs per ecosystem represent the full distribution of clones and manual review reliably separates true cloning from coincidental similarity or shared templates.

What would settle it

A full manual audit of every high-similarity pair that finds verification rates well below 60 percent for Jaccard and 85 percent for ssdeep would falsify the claim of pervasive cloning.
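
How far a full audit could plausibly drift from a sampled rate depends on the sample size behind that rate. The Python sketch below computes a Wilson score interval for a sampled proportion. The counts are hypothetical: the number of pairs in each high-similarity bucket is not reported in the material summarized here, so 12 of 20 and 120 of 200 are stand-ins that both correspond to a 60% rate.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical sample sizes giving the same 60% point estimate.
small = wilson_interval(12, 20)
large = wilson_interval(120, 200)
print(f"n=20:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=200: {large[0]:.3f} to {large[1]:.3f}")
```

The interval narrows as the sample grows, which is why the per-bucket counts matter for judging what would count as "well below" the reported rates.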

Figures

Figures reproduced from arXiv: 2605.09817 by David Jiang, Neil Gong, Taein Kim, Yuepeng Hu, Yuqi Jia.

**Figure 2.** Functionality and description-space analysis of MCP tools. (a) MCP functionality dis

**Figure 3.** Developer contribution distributions in the MCP and Skills ecosystems.

**Figure 4.** Pairwise repository similarity distributions across three comparison groups. The top row

**Figure 5.** Distribution of MCP and Skills repository sizes measured by normalized source tokens.

**Figure 6.** Distribution of tool counts for the top 40 authors in the MCP tool ecosystem.

**Figure 7.** Distribution of skill counts for the top 40 authors in the Skills tool ecosystem.

**Figure 8.** Log-log distributions of developer contribution frequency. (a) MCP ecosystem. (b) Skills

**Figure 9.** Distribution of MCP and Skills tools for authors present in both ecosystems (log scale).

**Figure 10.** Functionality and description-space analysis of Skills. (a) Skills functionality distribu

Original abstract

Agent tools are becoming a core interface through which LLM agents access external data, services, and execution environments. As these tools are distributed through public marketplaces, raw tool counts may substantially overstate ecosystem diversity if many repositories are cloned, lightly modified, or derived from shared templates. Such hidden duplication can contaminate benchmark splits, propagate vulnerable implementations, bias measurements of tool-use generalization, and raise provenance, attribution, and intellectual-property concerns. We present, to our knowledge, the first large-scale measurement study of tool cloning in agentic AI ecosystems. We curate a unified dataset from multiple public platforms, covering 7,508 Model Context Protocol (MCP) repositories with 87,564 extracted tools and 1,353 Skills repositories with 12,447 tools, for a total of 8,861 repositories and 100,011 tool entries. To measure implementation-level duplication, we build a repository-level auditing pipeline using complementary lexical and fuzzy-structural similarity metrics, and compute pairwise similarity across MCP-to-MCP, Skills-to-Skills, and MCP-to-Skills repository pairs. We further manually verify 100 sampled pairs per MCP and Skills ecosystem across similarity-score buckets to calibrate how often high similarity reflects true code cloning. Our analysis shows that cloning is not an isolated artifact: high-similarity regions appear across comparison settings, and 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem are manually verified as clones. These results indicate that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems. They further suggest that agent-tool datasets and benchmarks should account for repository provenance and implementation similarity when measuring tool diversity or constructing evaluation splits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper gives the first large-scale numbers on tool cloning in public agentic AI repos, with manual checks backing up high duplication rates, though sampling and verification details are thin.

Hey,

The punchline is that this paper finds tool cloning to be pretty common in these agentic AI tool repositories, with manual checks confirming 60% of high Jaccard similarity pairs and 85% of high ssdeep ones as actual clones in the MCP set. It's the first study to measure this at scale in this specific area.

What stands out as new is the curation of a big unified dataset from public platforms: 7,508 MCP repos with over 87k tools and 1,353 Skills repos with 12k tools. They run pairwise similarity using both lexical and fuzzy metrics across MCP-MCP, Skills-Skills, and cross comparisons. The manual verification of 100 sampled pairs per ecosystem across score buckets gives some calibration to turn similarity scores into clone rates. This addresses a practical issue for benchmarks and security in growing agentic systems.

The paper does well by using complementary metrics to catch different kinds of duplication and by including manual review instead of relying only on thresholds. That makes the pervasiveness claim more believable than pure automation would.

The soft spots are around the manual verification step. We don't get the exact similarity score buckets, the distribution of candidates in each, or any measure of agreement between the people doing the checks. The sampling is described as across buckets, but without more on how it was done or the verification rubric, it's possible the rates don't fully generalize. This is a moderate issue for trusting the exact percentages, but the overall finding of high duplication regions still seems supported by the data they have.

This kind of work is for researchers building or evaluating agent tools, or those concerned with data quality in AI ecosystems. A reader looking for empirical evidence on duplication in code repos for LLMs would get useful numbers and a starting dataset. It deserves a serious referee to sort out the method details and see if the conclusions hold with more transparency.

I'd recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript presents the first large-scale empirical measurement of tool cloning in agentic-AI ecosystems. It curates a unified dataset of 8,861 repositories and 100,011 tools from the Model Context Protocol (MCP) and Skills platforms, computes pairwise repository-level similarities using complementary lexical (Jaccard) and fuzzy-structural (ssdeep) metrics across MCP-to-MCP, Skills-to-Skills, and cross-ecosystem pairs, and manually verifies 100 sampled high-similarity pairs per ecosystem across similarity-score buckets. The central finding is that cloning is pervasive rather than isolated, with 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem manually confirmed as clones, implying that raw tool counts overstate diversity and that benchmarks must account for provenance and implementation similarity.

Significance. If the sampling and verification procedures prove robust, the work supplies a valuable baseline for an emerging problem at the intersection of software engineering, AI agents, and tool ecosystems. The scale of the curated dataset, the use of multiple complementary similarity metrics, and the explicit linkage to downstream concerns (benchmark contamination, vulnerability propagation, IP) are clear strengths. The empirical focus on hidden duplication addresses a practical gap that existing tool-use literature has largely overlooked.

major comments (2)

Abstract and auditing-pipeline description: The headline clone rates (60% high-Jaccard, 85% high-ssdeep in MCP) are computed solely from the 100 manually inspected pairs per ecosystem. The text states that sampling occurs “across similarity-score buckets” but supplies neither the bucket boundaries, the total number of candidate pairs falling into each bucket, the exact sampling procedure within buckets, nor the verification rubric used to distinguish true implementation clones from shared templates or coincidental similarity. Because these rates are the primary quantitative support for the pervasiveness claim, the absence of this information prevents assessment of representativeness and reproducibility.
Manual-verification procedure: No inter-rater agreement statistic, number of annotators, or explicit decision criteria for labeling a pair as a clone are reported. Without these details it is impossible to gauge the reliability of the 60% and 85% figures or to determine whether the manual labels systematically separate cloning from boilerplate reuse.

minor comments (2)

The abstract would be clearer if it stated the concrete similarity thresholds that define the “high-Jaccard” and “high-ssdeep” buckets from which the 100 pairs were drawn.
The manuscript should report the total number of pairwise comparisons performed in each setting (MCP-to-MCP, Skills-to-Skills, MCP-to-Skills) so readers can contextualize the scale of the high-similarity regions.
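
For scale, the comparison counts can be estimated from the repository totals in the abstract. The Python snippet below assumes every unordered pair is compared exactly once with no pre-filtering; the manuscript summary does not confirm this, so the figures are upper-bound estimates rather than reported numbers.

```python
from math import comb

N_MCP = 7_508     # MCP repositories (from the abstract)
N_SKILLS = 1_353  # Skills repositories (from the abstract)

# Assumption: all unordered pairs within an ecosystem, all cross pairs between them.
pairs = {
    "MCP-to-MCP": comb(N_MCP, 2),
    "Skills-to-Skills": comb(N_SKILLS, 2),
    "MCP-to-Skills": N_MCP * N_SKILLS,
}
for setting, count in pairs.items():
    print(f"{setting}: {count:,}")
# MCP-to-MCP: 28,181,278
# Skills-to-Skills: 914,628
# MCP-to-Skills: 10,158,324

total = sum(pairs.values())
print(f"total: {total:,}")  # total: 39,254,230
```

Under that assumption, 100 verified pairs per ecosystem is a very small slice of the candidate space, which is why the bucket sizes the referee asks for matter.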

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive suggestions. We agree that additional details on the sampling and verification procedures are important for assessing the robustness of our findings. We address each major comment below and will incorporate the requested information into the revised manuscript.

Referee: Abstract and auditing-pipeline description: The headline clone rates (60% high-Jaccard, 85% high-ssdeep in MCP) are computed solely from the 100 manually inspected pairs per ecosystem. The text states that sampling occurs “across similarity-score buckets” but supplies neither the bucket boundaries, the total number of candidate pairs falling into each bucket, the exact sampling procedure within buckets, nor the verification rubric used to distinguish true implementation clones from shared templates or coincidental similarity. Because these rates are the primary quantitative support for the pervasiveness claim, the absence of this information prevents assessment of representativeness and reproducibility.

Authors: We acknowledge that these methodological details were not included in the submitted manuscript. To address this, we will add a new subsection in the Methods section that specifies the similarity-score buckets used (for example, for Jaccard similarity: [0.7, 0.8), [0.8, 0.9), [0.9, 1.0]; similar for ssdeep), the total number of repository pairs in each bucket for the MCP ecosystem, the stratified sampling approach (selecting a fixed number of pairs from each bucket to ensure coverage across the similarity range), and the detailed verification rubric. The rubric defines a true clone as a pair where one repository is a direct copy or minor modification of the other, excluding cases of shared templates or coincidental similarity based on code structure and functionality. This will allow readers to evaluate the representativeness of the 100 sampled pairs.
revision: yes

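
The stratified sampling described in this response could look like the Python sketch below. The bucket edges are the example values quoted in the response; the function names, the per-bucket quota, and the synthetic scores are assumptions for illustration only.

```python
import random

# Example Jaccard bucket edges quoted in the response; the last bucket is closed at 1.0.
BUCKETS = [(0.7, 0.8), (0.8, 0.9), (0.9, 1.0)]


def bucket_index(score: float):
    """Index of the bucket holding score, or None if it is below the lowest edge."""
    for i, (lo, hi) in enumerate(BUCKETS):
        is_last = i == len(BUCKETS) - 1
        if lo <= score < hi or (is_last and score == hi):
            return i
    return None


def stratified_sample(scored_pairs, per_bucket: int, seed: int = 0):
    """Draw up to per_bucket pairs uniformly at random from each bucket."""
    rng = random.Random(seed)
    strata = {i: [] for i in range(len(BUCKETS))}
    for pair, score in sorted(scored_pairs.items()):
        i = bucket_index(score)
        if i is not None:
            strata[i].append(pair)
    return {
        BUCKETS[i]: rng.sample(pairs, min(per_bucket, len(pairs)))
        for i, pairs in strata.items()
    }


# Synthetic scores from 0.50 to 1.00 in steps of 0.01 (invented data).
scored_pairs = {(f"r{i}", f"r{i + 1}"): i / 100 for i in range(50, 101)}
sample = stratified_sample(scored_pairs, per_bucket=3)
for bucket, picked in sample.items():
    print(bucket, picked)
```

A fixed quota per bucket gives coverage across the score range, but a pooled clone rate then needs reweighting by bucket size before it describes the full candidate population.
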
Referee: Manual-verification procedure: No inter-rater agreement statistic, number of annotators, or explicit decision criteria for labeling a pair as a clone are reported. Without these details it is impossible to gauge the reliability of the 60% and 85% figures or to determine whether the manual labels systematically separate cloning from boilerplate reuse.

Authors: We agree that reporting these details is essential for transparency. The manual verification was conducted by two independent annotators (both authors with expertise in software engineering), who reviewed each pair and labeled it as clone or not based on explicit criteria. Disagreements were resolved through discussion with a third author. We will report the inter-rater agreement using Cohen's kappa statistic in the revised version. The decision criteria include: (1) substantial code overlap (>60% after removing comments and whitespace), (2) evidence of direct copying such as identical file structures and function names with minor variable changes, or (3) one repository being a fork with limited modifications. Pairs with only shared dependencies or boilerplate code were not labeled as clones. We will include these details and the agreement statistic in the updated manuscript.
revision: yes
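
For reference, Cohen's kappa for two annotators can be computed as in the Python sketch below. The label lists are invented purely to exercise the function; no annotation data appears in the material summarized here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if each annotator labeled independently at their own rates.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten pairs; the annotators disagree on one pair.
annotator_1 = ["clone"] * 6 + ["not"] * 4
annotator_2 = ["clone"] * 5 + ["not"] * 5
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.8
```

Here observed agreement is 0.9 and chance agreement is 0.5, so kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.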

Circularity Check

0 steps flagged

Empirical measurement study with no derivations or self-referential reductions

The paper is a large-scale empirical measurement study that curates datasets, applies standard lexical and fuzzy similarity metrics (Jaccard, ssdeep), computes pairwise scores, and reports direct counts from manual verification of 100 sampled high-similarity pairs per ecosystem. No equations, fitted parameters, predictions, or derivations appear in the provided text. The reported clone fractions (60% Jaccard, 85% ssdeep) are literal outcomes of the verification step rather than quantities forced by construction from the similarity scores themselves. No self-citations, uniqueness theorems, or ansatzes are invoked to justify core claims. The analysis therefore remains self-contained and does not reduce to its inputs by definition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the representativeness of the curated dataset and the validity of similarity metrics plus manual review as proxies for cloning. No free parameters are fitted to produce the headline percentages; thresholds appear chosen for analysis rather than optimized to the result.

axioms (2)

domain assumption: High lexical or fuzzy-structural similarity between repositories indicates cloning or derivation rather than independent implementation of similar functionality.
Invoked when interpreting high-Jaccard and high-ssdeep scores as evidence of cloning.
domain assumption: The public marketplaces sampled are representative of the broader agent-tool ecosystem.
Used to generalize from the 8,861 repositories to the ecosystem as a whole.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

We build a repository-level auditing pipeline using complementary lexical and fuzzy-structural similarity metrics, and compute pairwise similarity across MCP-to-MCP, Skills-to-Skills, and MCP-to-Skills repository pairs. We further manually verify 100 sampled pairs per MCP and Skills ecosystem across similarity-score buckets

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023

work page 2023
[71]

Detecting repackaged smartphone applications in third-party android marketplaces

Wu Zhou, Yajin Zhou, Xuxian Jiang, and Peng Ning. Detecting repackaged smartphone applications in third-party android marketplaces. In Proceedings of the second ACM conference on Data and Application Security and Privacy, 2012

work page 2012
[72]

Dissecting android malware: Characterization and evolution

Yajin Zhou and Xuxian Jiang. Dissecting android malware: Characterization and evolution. In 2012 IEEE symposium on security and privacy, pages 95--109. IEEE, 2012

work page 2012
