arxiv: 2509.24496 · v3 · submitted 2025-09-29 · 💻 cs.LG · cs.AI

LLM DNA: Tracing Model Evolution via Functional Representations

Pith reviewed 2026-05-18 12:16 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords LLM DNA · model evolution · functional representations · phylogenetic trees · fine-tuning · model management · bi-Lipschitz embedding · training-free extraction

The pith

A low-dimensional representation of functional behavior encodes the evolutionary relationships among large language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper mathematically defines LLM DNA as a compact encoding that captures how models perform on various tasks. It proves that this representation follows rules similar to biological DNA, including inheritance from base models to fine-tuned versions and genetic determinism. From this, the authors develop a method to extract the DNA without any extra training or task-specific data. A sympathetic reader would care because it offers a way to organize the chaotic landscape of millions of LLMs by their relationships rather than just their sizes or benchmarks.

Core claim

We prove that LLM DNA satisfies inheritance and genetic determinism properties and establish the existence of DNA. Building on this theory, we derive a general, scalable, training-free pipeline for DNA extraction. In experiments across 305 LLMs, DNA aligns with prior studies and uncovers previously undocumented relationships, allowing construction of an evolutionary tree that reflects architectural and temporal progressions.

What carries the argument

LLM DNA as a low-dimensional bi-Lipschitz representation of functional behavior that carries the inheritance and evolutionary tracing properties.
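
For orientation, the bi-Lipschitz condition named here can be written out explicitly. The form below is the standard textbook definition, not a formula quoted from the paper; the symbols $\mathcal{F}$ (the set of models viewed as functions), $d_{\mathcal{F}}$ (a metric on functional behavior), $\phi$ (the DNA map), and $k$ (the DNA dimension) are assumed notation.

```latex
% Assumed notation, introduced only for illustration:
%   \mathcal{F}      set of LLMs viewed as input-output functions
%   d_{\mathcal{F}}  a metric measuring functional (behavioral) distance
%   \phi             the DNA map, \phi : \mathcal{F} \to \mathbb{R}^k
\[
  c_1 \, d_{\mathcal{F}}(f, g)
  \;\le\; \lVert \phi(f) - \phi(g) \rVert_2
  \;\le\; c_2 \, d_{\mathcal{F}}(f, g)
  \qquad \text{for all } f, g \in \mathcal{F}, \quad 0 < c_1 \le c_2 .
\]
```

The lower bound keeps functionally different models apart in DNA space, and the upper bound keeps functionally similar models close, which is what would allow distances between DNA vectors to stand in for functional distance.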

If this is right

  • DNA extraction works across arbitrary model families and tokenizers without training.
  • Comparisons of DNA reveal undocumented relationships among LLMs.
  • The evolutionary tree built from DNA matches shifts to decoder-only architectures and temporal progression.
  • Different LLM families exhibit distinct evolutionary speeds.
  • Performance on specific tasks is superior to or competitive with existing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the functional embedding holds, it could be applied to track evolution in other AI domains like vision or reinforcement learning models.
  • Model developers might use DNA similarity to detect unauthorized adaptations or clones of their models.
  • The approach suggests that model relationships can be studied independently of specific tasks or architectures.

Load-bearing premise

That a single low-dimensional bi-Lipschitz embedding of functional behavior exists and is sufficient to encode the evolutionary relationships induced by fine-tuning, distillation, or adaptation across arbitrary model families and tokenizers.

What would settle it

Observing that the extracted representations of parent and child models do not cluster together in the low-dimensional space according to known fine-tuning relationships would falsify the central claim.
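
One way such a check could be run is sketched below. It is a minimal illustration, not the paper's evaluation protocol: the model names, the two-dimensional vectors, and the helper names `nearest_neighbor` and `lineage_recovery_rate` are all hypothetical, and "the parent is the nearest neighbour" is only one possible, fairly strict, reading of "cluster together".

```python
import math

def nearest_neighbor(name, dna):
    """Return the other model whose DNA vector is closest (Euclidean) to `name`."""
    others = (m for m in dna if m != name)
    return min(others, key=lambda m: math.dist(dna[name], dna[m]))

def lineage_recovery_rate(dna, parent_of):
    """Fraction of children whose nearest neighbour in DNA space is their documented parent."""
    hits = sum(
        1 for child, parent in parent_of.items()
        if nearest_neighbor(child, dna) == parent
    )
    return hits / len(parent_of)

# Toy, made-up DNA vectors (hypothetical; real DNA vectors would come from the extraction pipeline).
dna = {
    "base-a": (0.0, 0.0),
    "base-a-chat": (0.1, 0.0),
    "base-b": (5.0, 5.0),
    "base-b-instruct": (5.0, 5.2),
}
# Documented fine-tuning relationships: child -> parent (also hypothetical).
parent_of = {"base-a-chat": "base-a", "base-b-instruct": "base-b"}

rate = lineage_recovery_rate(dna, parent_of)
print(f"lineage recovery rate: {rate:.2f}")
```

A rate near chance on models with well-documented lineage would count against the central claim. A high rate would be consistent with it but would not prove it, since sibling fine-tunes can legitimately sit closer to each other than to their shared parent.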

Figures

Figures reproduced from arXiv: 2509.24496 by Bingsheng He, Haodong Zhao, Jizhou Guo, Qian Wang, Zhaomin Wu, Ziyang Wang.

Figure 1: Visualization of LLM DNA extraction workflow.
Figure 2: shows the DNA distribution of the eight models (four correlated, four independent) in …
Figure 4: Visualization of DNAs by t-SNE. Colors denote organizations releasing LLMs. Organi…
Figure 5: Phylogenetic Tree of LLM families built from DNA
Figure 6: Phylogenetic tree of representative LLMs constructed using the Neighbor-Joining algo…
Abstract

The explosive growth of large language models (LLMs) has created a vast but opaque landscape: millions of models exist, yet their evolutionary relationships through fine-tuning, distillation, or adaptation are often undocumented or unclear, complicating LLM management. Existing methods are limited by task specificity, fixed model sets, or strict assumptions about tokenizers or architectures. Inspired by biological DNA, we address these limitations by mathematically defining LLM DNA as a low-dimensional, bi-Lipschitz representation of functional behavior. We prove that LLM DNA satisfies inheritance and genetic determinism properties and establish the existence of DNA. Building on this theory, we derive a general, scalable, training-free pipeline for DNA extraction. In experiments across 305 LLMs, DNA aligns with prior studies on limited subsets and achieves superior or competitive performance on specific tasks. Beyond these tasks, DNA comparisons uncover previously undocumented relationships among LLMs. We further construct the evolutionary tree of LLMs using phylogenetic algorithms, which align with shifts from encoder-decoder to decoder-only architectures, reflect temporal progression, and reveal distinct evolutionary speeds across LLM families.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LLM DNA as a low-dimensional bi-Lipschitz functional representation of LLMs' behavior. It claims to mathematically define this representation, prove that it satisfies inheritance and genetic determinism properties, establish its existence, and derive a general, scalable, training-free pipeline for DNA extraction. Experiments on 305 LLMs demonstrate alignment with prior studies, superior performance on some tasks, discovery of undocumented relationships, and construction of an evolutionary tree consistent with architectural shifts and temporal progression.

Significance. If the theoretical foundations hold, this approach could offer a unified, architecture- and tokenizer-agnostic method for tracing LLM evolution, addressing limitations of task-specific or assumption-heavy existing methods. The scale of the experiments (305 models) and the training-free pipeline are notable strengths that could make the method practical for large-scale model management. The application of phylogenetic algorithms to reveal evolutionary patterns in LLMs is innovative.

major comments (2)
  1. [Abstract] The abstract asserts proofs of inheritance, genetic determinism, and existence of LLM DNA but supplies no derivation steps, explicit definition of the functional representation, or details on how the bi-Lipschitz property is constructed or verified; this is load-bearing for the central claim that the representation encodes evolutionary relationships.
  2. [Theoretical sections (e.g., §3–4 on definition and existence)] The claimed low-dimensional bi-Lipschitz embedding requires a canonical metric on output distributions or behaviors that remains comparable across incompatible tokenizers and architectures, yet no explicit construction or bound showing dimension independence from model family is provided; without this the inheritance and genetic-determinism properties do not necessarily transfer.
minor comments (2)
  1. [Experiments section] Quantitative comparisons showing how DNA aligns with or exceeds prior studies on limited subsets should include explicit metrics and baselines for each task.
  2. [Evolutionary tree figures] Labels for architectural transitions (encoder-decoder to decoder-only) and temporal markers could be added for clearer interpretation of the phylogenetic results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for acknowledging the significance of the theoretical foundations, the scale of the 305-model experiments, and the training-free pipeline. We address the major comments point by point below and will revise the manuscript to improve clarity on the abstract and theoretical details while preserving the core claims and results.

Point-by-point responses
  1. Referee: [Abstract] The abstract asserts proofs of inheritance, genetic determinism, and existence of LLM DNA but supplies no derivation steps, explicit definition of the functional representation, or details on how the bi-Lipschitz property is constructed or verified; this is load-bearing for the central claim that the representation encodes evolutionary relationships.

    Authors: We agree that the abstract, constrained by length, does not include derivation steps or explicit constructions. These elements are developed in Sections 3 and 4, where the functional representation is defined, the bi-Lipschitz property is established, and the inheritance and genetic-determinism properties are proven. To better orient readers, we will revise the abstract to briefly reference the definition of the low-dimensional representation and the key properties, while retaining its summary character. revision: yes

  2. Referee: [Theoretical sections (e.g., §3–4 on definition and existence)] The claimed low-dimensional bi-Lipschitz embedding requires a canonical metric on output distributions or behaviors that remains comparable across incompatible tokenizers and architectures, yet no explicit construction or bound showing dimension independence from model family is provided; without this the inheritance and genetic-determinism properties do not necessarily transfer.

    Authors: We appreciate this observation on the necessity of a canonical, comparable metric. Section 3 defines the functional representation via a behavior-based metric intended to be architecture- and tokenizer-agnostic through a shared output space. Nevertheless, we concur that an explicit construction of this metric together with dimension-independence bounds would make the transfer of the inheritance and genetic-determinism properties more transparent. We will revise Sections 3 and 4 to supply the detailed metric construction and the required bounds. revision: yes

Circularity Check

0 steps flagged

No significant circularity; theoretical claims rest on independent mathematical definitions and empirical validation.

Full rationale

The paper defines LLM DNA as a low-dimensional bi-Lipschitz functional representation, proves inheritance/genetic-determinism properties, and derives an extraction pipeline. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or ansatz smuggled from prior work by the same authors. The existence claim is presented as a mathematical result rather than a renaming or self-definition, and experiments across 305 models supply external validation independent of the core definitions. The derivation chain is therefore self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review; the central claim rests on the existence of a low-dimensional bi-Lipschitz map that preserves evolutionary semantics. No explicit free parameters or axioms are stated in the provided text; the only invented entity is LLM DNA itself.

invented entities (1)
  • LLM DNA (no independent evidence)
    purpose: Low-dimensional bi-Lipschitz representation of functional behavior that encodes inheritance
    The paper introduces this as a new mathematical object; no independent evidence outside the definition is supplied in the abstract.

pith-pipeline@v0.9.0 · 5724 in / 1222 out tokens · 34645 ms · 2026-05-18T12:16:33.519284+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ModelLens: Finding the Best for Your Task from Myriads of Models

    cs.LG 2026-05 unverdicted novelty 6.0

    ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · cited by 1 Pith paper · 8 internal anchors

  1. [1]

    Transferring backdoors between large language models by knowledge distillation

    Pengzhou Cheng, Zongru Wu, Tianjie Ju, Wei Du, Zhuosheng Zhang, and Gongshen Liu. Transferring backdoors between large language models by knowledge distillation. arXiv preprint arXiv:2408.09878,

  2. [2]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457,

  3. [3]

    LLM Multi-Agent Systems: Challenges and Open Problems

    Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, and Zhaozhuo Xu. LLM multi-agent systems: Challenges and open problems. arXiv preprint arXiv:2402.03578,

  4. [4]

    Extensions of Lipschitz mappings into a Hilbert space

    William B Johnson, Joram Lindenstrauss, et al. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26(189-206):1,

  5. [5]

    The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction

    Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint arXiv:1411.2404,

  6. [6]

    Quantification of large language model distillation

    Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Haihong Wu, Tianci Liu, Jiaheng Liu, Hamid Alinejad-Rokny, et al. Quantification of large language model distillation. arXiv preprint arXiv:2501.12619,

  7. [7]

    Model provenance testing for large language models

    Ivica Nikolic, Teodora Baluta, and Prateek Saxena. Model provenance testing for large language models. arXiv preprint arXiv:2502.00706,

  8. [8]

    HuggingGraph: Understanding the supply chain of LLM ecosystem

    Mohammad Shahedur Rahman, Runbang Hu, Peng Gao, and Yuede Ji. HuggingGraph: Understanding the supply chain of LLM ecosystem. arXiv preprint arXiv:2507.14240,

  9. [9]

    SQuAD: 100,000+ Questions for Machine Comprehension of Text

    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text.arXiv preprint arXiv:1606.05250,

  10. [10]

    CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

    Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. Commonsenseqa: A question answering challenge targeting commonsense knowledge.arXiv preprint arXiv:1811.00937,

  11. [11]

    Towards the resistance of neural network watermarking to fine-tuning

    Ling Tang, Yuefeng Chen, Hui Xue, and Quanshi Zhang. Towards the resistance of neural network watermarking to fine-tuning. arXiv preprint arXiv:2505.01007,

  12. [12]

    RoFL: Robust fingerprinting of language models

    Yun-Yun Tsai, Chuan Guo, Junfeng Yang, and Laurens van der Maaten. RoFL: Robust fingerprinting of language models. arXiv preprint arXiv:2505.12682,

  13. [13]

    Towards unified task embeddings across multiple models: Bridging the gap for prompt-based large language models and beyond

    Xinyu Wang, Hainiu Xu, Lin Gui, and Yulan He. Towards unified task embeddings across multiple models: Bridging the gap for prompt-based large language models and beyond. arXiv preprint arXiv:2402.14522,

  14. [14]

    Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends

    Zhenhua Xu, Xubin Yue, Zhebo Wang, Qichen Liu, Xixiang Zhao, Jingxuan Zhang, Wenjun Zeng, Wengpeng Xing, Dezhang Kong, Changting Lin, et al. Copyright protection for large language models: A survey of methods, challenges, and trends. arXiv preprint arXiv:2508.11548,

  15. [15]

    HellaSwag: Can a Machine Really Finish Your Sentence?

    Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830,

  16. [16]

    NSmark: Null space based black-box watermarking defense framework for language models

    Haodong Zhao, Jinming Hu, Peixuan Li, Fangqi Li, Jinrui Sha, Tianjie Ju, Peixuan Chen, Zhuosheng Zhang, and Gongshen Liu. NSmark: Null space based black-box watermarking defense framework for language models. arXiv preprint arXiv:2410.13907,

  17. [17]

    A Survey of Large Language Models

    Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2),

  18. [18]

    EmbedLLM: Learning compact representations of large language models

    Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, and Kannan Ramchandran. EmbedLLM: Learning compact representations of large language models. arXiv preprint arXiv:2410.02223,

  19. [19]

    evolutionary effort,

    This is achieved by applying the Johnson-Lindenstrauss (JL) Lemma. First, we define a symmetric distortion parameter $\epsilon$ and a scaling factor $\alpha$ from our target constants $c_1$ and $c_2$:

    $$\epsilon = \frac{c_2 - c_1}{c_2 + c_1}, \qquad \alpha = \frac{c_1 + c_2}{2} \tag{6}$$

    Since $c_2 > c_1 > 0$, it follows that $\epsilon \in (0,1)$, which is a valid distortion parameter for the JL Lemma. We invoke the JL Lemma with this $\epsilon$. The lemma ...
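
    The choice of constants in Equation (6) can be sanity-checked numerically. The sketch below assumes, since the excerpt is truncated before saying so, that the scaling factor multiplies a map with JL-style distortion between (1 - ϵ) and (1 + ϵ); under that assumption the scaled bounds land exactly on c1 and c2. The function name `jl_parameters` and the example values are hypothetical.

```python
import math

def jl_parameters(c1, c2):
    """Compute the distortion eps and scaling alpha of Equation (6) from target constants c1 < c2."""
    if not (c2 > c1 > 0):
        raise ValueError("expected c2 > c1 > 0")
    eps = (c2 - c1) / (c2 + c1)   # symmetric distortion parameter, lies in (0, 1)
    alpha = (c1 + c2) / 2         # scaling factor
    return eps, alpha

# Example target constants (made up for illustration).
c1, c2 = 0.5, 1.5
eps, alpha = jl_parameters(c1, c2)

# Assumption: the scaled map has distortion between (1 - eps) * alpha and (1 + eps) * alpha.
lower = (1 - eps) * alpha   # algebraically equal to c1
upper = (1 + eps) * alpha   # algebraically equal to c2

print(eps, alpha, lower, upper)
```

    Algebraically, (1 - ϵ)α = (2c1 / (c1 + c2)) · ((c1 + c2) / 2) = c1, and likewise (1 + ϵ)α = c2, so the pair (ϵ, α) is a reparameterization of the two bi-Lipschitz constants. Whether the paper applies the bounds to norms or squared norms is not visible in the excerpt.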