LLM DNA: Tracing Model Evolution via Functional Representations
Pith reviewed 2026-05-18 12:16 UTC · model grok-4.3
The pith
A low-dimensional representation of functional behavior encodes the evolutionary relationships among large language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We prove that LLM DNA satisfies inheritance and genetic determinism properties and establish the existence of DNA. Building on this theory, we derive a general, scalable, training-free pipeline for DNA extraction. In experiments across 305 LLMs, DNA aligns with prior studies and uncovers previously undocumented relationships, allowing construction of an evolutionary tree that reflects architectural and temporal progressions.
What carries the argument
LLM DNA as a low-dimensional bi-Lipschitz representation of functional behavior that carries the inheritance and evolutionary tracing properties.
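The bi-Lipschitz condition can be probed empirically on a finite set of models: for every pair, the ratio of DNA distance to functional distance should stay between two positive constants c1 and c2. The sketch below is a minimal illustration under stated assumptions: functional distances are supplied externally, DNA distance is taken to be Euclidean, and the function name and toy numbers are hypothetical rather than taken from the paper.

```python
import math
from itertools import combinations

def bilipschitz_constants(func_dist, dna):
    """Empirical (c1, c2) with c1 * d_H <= d_tau <= c2 * d_H on every sampled pair.

    func_dist: dict mapping frozenset({f, g}) -> functional distance d_H(f, g) > 0
    dna:       dict mapping model name -> DNA vector (list of floats)
    """
    ratios = []
    for f, g in combinations(sorted(dna), 2):
        d_tau = math.dist(dna[f], dna[g])  # Euclidean distance in DNA space (assumption)
        ratios.append(d_tau / func_dist[frozenset((f, g))])
    # The tightest constants that hold on this sample are the extreme ratios.
    return min(ratios), max(ratios)

# Made-up example: three models with 2-D DNA vectors.
toy_dna = {"m1": [0.0, 0.0], "m2": [3.0, 4.0], "m3": [6.0, 8.0]}
toy_func_dist = {
    frozenset(("m1", "m2")): 5.0,
    frozenset(("m1", "m3")): 8.0,
    frozenset(("m2", "m3")): 10.0,
}
c1_hat, c2_hat = bilipschitz_constants(toy_func_dist, toy_dna)
```

A ratio c2_hat / c1_hat close to 1 on held-out pairs would indicate that the embedding distorts functional distances only mildly; a very small c1_hat would suggest distinct models collapsing together.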
If this is right
- DNA extraction works across arbitrary model families and tokenizers without training.
- Comparisons of DNA reveal undocumented relationships among LLMs.
- The evolutionary tree built from DNA matches the shift from encoder-decoder to decoder-only architectures and reflects temporal progression.
- Different LLM families exhibit distinct evolutionary speeds.
- Performance on specific tasks is superior to or competitive with existing methods.
Where Pith is reading between the lines
- If the functional embedding holds, it could be applied to track evolution in other AI domains like vision or reinforcement learning models.
- Model developers might use DNA similarity to detect unauthorized adaptations or clones of their models.
- The approach suggests that model relationships can be studied independently of specific tasks or architectures.
Load-bearing premise
That a single low-dimensional bi-Lipschitz embedding of functional behavior exists and is sufficient to encode the evolutionary relationships induced by fine-tuning, distillation, or adaptation across arbitrary model families and tokenizers.
What would settle it
Observing that the extracted representations of parent and child models do not cluster together in the low-dimensional space according to known fine-tuning relationships would falsify the central claim.
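One way to make this falsification test concrete is a nearest-neighbour check: for each model with a documented parent, ask whether the parent is its closest neighbour in DNA space. The following is a minimal sketch, assuming DNA vectors are already extracted and compared with Euclidean distance; the helper `parent_recovery_rate`, the model names, and the vectors are hypothetical and are not the paper's evaluation protocol.

```python
import math

def parent_recovery_rate(dna, lineage):
    """Fraction of child models whose nearest neighbour in DNA space is the documented parent.

    dna:     dict mapping model name -> DNA vector (list of floats)
    lineage: dict mapping child name -> documented parent name
    """
    hits = 0
    for child, parent in lineage.items():
        # Closest other model to the child under Euclidean distance.
        nearest = min(
            (name for name in dna if name != child),
            key=lambda name: math.dist(dna[child], dna[name]),
        )
        hits += nearest == parent
    return hits / len(lineage)

# Made-up vectors for two hypothetical families.
toy_dna = {
    "base-a": [0.0, 0.0],
    "base-a-chat": [0.1, 0.0],
    "base-b": [5.0, 5.0],
    "base-b-chat": [5.0, 5.2],
}
toy_lineage = {"base-a-chat": "base-a", "base-b-chat": "base-b"}
rate = parent_recovery_rate(toy_dna, toy_lineage)
```

A rate far below what known fine-tuning relationships would predict, on a sufficiently large set of documented pairs, is the kind of observation the paragraph above describes as falsifying.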
Original abstract
The explosive growth of large language models (LLMs) has created a vast but opaque landscape: millions of models exist, yet their evolutionary relationships through fine-tuning, distillation, or adaptation are often undocumented or unclear, complicating LLM management. Existing methods are limited by task specificity, fixed model sets, or strict assumptions about tokenizers or architectures. Inspired by biological DNA, we address these limitations by mathematically defining LLM DNA as a low-dimensional, bi-Lipschitz representation of functional behavior. We prove that LLM DNA satisfies inheritance and genetic determinism properties and establish the existence of DNA. Building on this theory, we derive a general, scalable, training-free pipeline for DNA extraction. In experiments across 305 LLMs, DNA aligns with prior studies on limited subsets and achieves superior or competitive performance on specific tasks. Beyond these tasks, DNA comparisons uncover previously undocumented relationships among LLMs. We further construct the evolutionary tree of LLMs using phylogenetic algorithms, which align with shifts from encoder-decoder to decoder-only architectures, reflect temporal progression, and reveal distinct evolutionary speeds across LLM families.
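The abstract mentions phylogenetic algorithms without naming one here. As an illustration only, the sketch below uses UPGMA, a classic distance-based tree-building method, which is an assumption and may differ from what the authors ran. The function name, leaf labels, and distances are hypothetical.

```python
def upgma(names, dist):
    """Build a tree by repeatedly merging the two closest clusters (UPGMA).

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
    Returns a nested tuple such as (("a", "b"), "c").
    """
    # Each cluster is (subtree, list_of_leaf_labels).
    clusters = [(name, [name]) for name in names]

    def avg_dist(left, right):
        # Mean pairwise leaf distance between two clusters.
        total = sum(dist[frozenset((a, b))] for a in left[1] for b in right[1])
        return total / (len(left[1]) * len(right[1]))

    while len(clusters) > 1:
        # Pick the closest pair of clusters (first pair wins on ties).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return clusters[0][0]

# Made-up DNA distances: two tight parent/child pairs, far from each other.
toy_names = ["a", "a-ft", "b", "b-ft"]
toy_dist = {frozenset((x, y)): 10.0 for x in toy_names for y in toy_names if x != y}
toy_dist[frozenset(("a", "a-ft"))] = 1.0
toy_dist[frozenset(("b", "b-ft"))] = 1.0
tree = upgma(toy_names, toy_dist)
```

UPGMA assumes a roughly constant rate of change along all branches, so a method without that assumption, such as neighbor joining, may be a better fit if families really do evolve at different speeds.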
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes LLM DNA as a low-dimensional bi-Lipschitz functional representation of LLMs' behavior. It claims to mathematically define this representation, prove that it satisfies inheritance and genetic determinism properties, establish its existence, and derive a general, scalable, training-free pipeline for DNA extraction. Experiments on 305 LLMs demonstrate alignment with prior studies, superior performance on some tasks, discovery of undocumented relationships, and construction of an evolutionary tree consistent with architectural shifts and temporal progression.
Significance. If the theoretical foundations hold, this approach could offer a unified, architecture- and tokenizer-agnostic method for tracing LLM evolution, addressing limitations of task-specific or assumption-heavy existing methods. The scale of the experiments (305 models) and the training-free pipeline are notable strengths that could make the method practical for large-scale model management. The application of phylogenetic algorithms to reveal evolutionary patterns in LLMs is innovative.
major comments (2)
- [Abstract] The abstract asserts proofs of inheritance, genetic determinism, and existence of LLM DNA but supplies no derivation steps, explicit definition of the functional representation, or details on how the bi-Lipschitz property is constructed or verified; this is load-bearing for the central claim that the representation encodes evolutionary relationships.
- [Theoretical sections (e.g., §3–4 on definition and existence)] The claimed low-dimensional bi-Lipschitz embedding requires a canonical metric on output distributions or behaviors that remains comparable across incompatible tokenizers and architectures, yet no explicit construction or bound showing dimension independence from model family is provided; without this the inheritance and genetic-determinism properties do not necessarily transfer.
minor comments (2)
- [Experiments section] Quantitative comparisons showing how DNA aligns with or exceeds prior studies on limited subsets should include explicit metrics and baselines for each task.
- [Evolutionary tree figures] Labels for architectural transitions (encoder-decoder to decoder-only) and temporal markers could be added for clearer interpretation of the phylogenetic results.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for acknowledging the significance of the theoretical foundations, the scale of the 305-model experiments, and the training-free pipeline. We address the major comments point by point below and will revise the manuscript to improve clarity on the abstract and theoretical details while preserving the core claims and results.
-
Referee: [Abstract] The abstract asserts proofs of inheritance, genetic determinism, and existence of LLM DNA but supplies no derivation steps, explicit definition of the functional representation, or details on how the bi-Lipschitz property is constructed or verified; this is load-bearing for the central claim that the representation encodes evolutionary relationships.
Authors: We agree that the abstract, constrained by length, does not include derivation steps or explicit constructions. These elements are developed in Sections 3 and 4, where the functional representation is defined, the bi-Lipschitz property is established, and the inheritance and genetic-determinism properties are proven. To better orient readers, we will revise the abstract to briefly reference the definition of the low-dimensional representation and the key properties, while retaining its summary character.
Revision: yes
-
Referee: [Theoretical sections (e.g., §3–4 on definition and existence)] The claimed low-dimensional bi-Lipschitz embedding requires a canonical metric on output distributions or behaviors that remains comparable across incompatible tokenizers and architectures, yet no explicit construction or bound showing dimension independence from model family is provided; without this the inheritance and genetic-determinism properties do not necessarily transfer.
Authors: We appreciate this observation on the necessity of a canonical, comparable metric. Section 3 defines the functional representation via a behavior-based metric intended to be architecture- and tokenizer-agnostic through a shared output space. Nevertheless, we concur that an explicit construction of this metric together with dimension-independence bounds would make the transfer of the inheritance and genetic-determinism properties more transparent. We will revise Sections 3 and 4 to supply the detailed metric construction and the required bounds.
Revision: yes
Circularity Check
No significant circularity; theoretical claims rest on independent mathematical definitions and empirical validation.
full rationale
The paper defines LLM DNA as a low-dimensional bi-Lipschitz functional representation, proves inheritance/genetic-determinism properties, and derives an extraction pipeline. No load-bearing step reduces by construction to a fitted parameter, self-citation chain, or ansatz smuggled from prior work by the same authors. The existence claim is presented as a mathematical result rather than a renaming or self-definition, and experiments across 305 models supply external validation independent of the core definitions. The derivation chain is therefore self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
invented entities (1)
-
LLM DNA
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
We formally define LLM DNA as a low-dimensional vector that is obtained via a bi-Lipschitz map from the LLM functional space... $c_1 \cdot d_H(f_1, f_2) \le d_\tau(\tau_{f_1}, \tau_{f_2}) \le c_2 \cdot d_H(f_1, f_2)$
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking
Relation between the paper passage and the cited Recognition theorem: unclear
a DNA representation ... with target dimension $L = O\left(\left(\frac{c_2 + c_1}{c_2 - c_1}\right)^2 \log K\right)$
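To get a feel for the scale of this bound, the target dimension can be evaluated for concrete constants. The quoted passage gives only the asymptotic rate, so the sketch below plugs the distortion (c2 - c1) / (c2 + c1) into one standard sufficient condition for the Johnson-Lindenstrauss lemma (the Dasgupta-Gupta form). That choice of constant, the function name, and the example values of c1 and c2 are assumptions, not figures from the paper.

```python
import math

def jl_target_dimension(c1, c2, num_models):
    """A sufficient JL dimension for distortion eps = (c2 - c1) / (c2 + c1).

    Uses k >= 4 ln(K) / (eps^2 / 2 - eps^3 / 3), an assumed choice of constant;
    the quoted passage only states L = O(eps^-2 log K).
    """
    if not 0 < c1 < c2:
        raise ValueError("need 0 < c1 < c2")
    eps = (c2 - c1) / (c2 + c1)
    return math.ceil(4 * math.log(num_models) / (eps ** 2 / 2 - eps ** 3 / 3))

# K = 305 matches the number of models in the experiments; c1 and c2 are made up.
dim_loose = jl_target_dimension(1.0, 3.0, 305)   # eps = 0.5
dim_tight = jl_target_dimension(1.0, 1.5, 305)   # eps = 0.2
```

Tightening the gap between c1 and c2 shrinks the distortion and therefore raises the required dimension, while the dependence on the number of models K is only logarithmic.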
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
ModelLens: Finding the Best for Your Task from Myriads of Models
ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
Reference graph
Works this paper leans on
-
[1]
Pengzhou Cheng, Zongru Wu, Tianjie Ju, Wei Du, Zhuosheng Zhang, and Gongshen Liu. Transferring backdoors between large language models by knowledge distillation. arXiv preprint arXiv:2408.09878,
-
[2]
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457,
-
[3]
LLM Multi-Agent Systems: Challenges and Open Problems
Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, and Zhaozhuo Xu. Llm multi-agent systems: Challenges and open problems. arXiv preprint arXiv:2402.03578,
-
[4]
Extensions of Lipschitz mappings into a Hilbert space
William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189-206,
-
[5]
The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction
Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint arXiv:1411.2404,
-
[6]
Quantification of large language model distillation
Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, Sirui He, Haihong Wu, Tianci Liu, Jiaheng Liu, Hamid Alinejad-Rokny, et al. Quantification of large language model distillation. arXiv preprint arXiv:2501.12619,
-
[7]
Model provenance testing for large language models
Ivica Nikolic, Teodora Baluta, and Prateek Saxena. Model provenance testing for large language models. arXiv preprint arXiv:2502.00706,
-
[8]
Mohammad Shahedur Rahman, Runbang Hu, Peng Gao, and Yuede Ji. Hugginggraph: Understanding the supply chain of llm ecosystem. arXiv preprint arXiv:2507.14240,
-
[9]
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. Squad: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250,
-
[10]
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. Commonsenseqa: A question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937,
-
[11]
Ling Tang, Yuefeng Chen, Hui Xue, and Quanshi Zhang. Towards the resistance of neural network watermarking to fine-tuning. arXiv preprint arXiv:2505.01007,
-
[12]
Rofl: Robust fingerprinting of language models
Yun-Yun Tsai, Chuan Guo, Junfeng Yang, and Laurens van der Maaten. Rofl: Robust fingerprinting of language models. arXiv preprint arXiv:2505.12682,
-
[13]
Xinyu Wang, Hainiu Xu, Lin Gui, and Yulan He. Towards unified task embeddings across multiple models: Bridging the gap for prompt-based large language models and beyond. arXiv preprint arXiv:2402.14522,
-
[14]
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and Trends
Zhenhua Xu, Xubin Yue, Zhebo Wang, Qichen Liu, Xixiang Zhao, Jingxuan Zhang, Wenjun Zeng, Wengpeng Xing, Dezhang Kong, Changting Lin, et al. Copyright protection for large language models: A survey of methods, challenges, and trends. arXiv preprint arXiv:2508.11548,
-
[15]
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. Hellaswag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830,
-
[16]
Haodong Zhao, Jinming Hu, Peixuan Li, Fangqi Li, Jinrui Sha, Tianjie Ju, Peixuan Chen, Zhuosheng Zhang, and Gongshen Liu. Nsmark: Null space based black-box watermarking defense framework for language models. arXiv preprint arXiv:2410.13907,
-
[17]
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2),
-
[18]
Embedllm: Learning compact representations of large language models
Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, and Kannan Ramchandran. Embedllm: Learning compact representations of large language models. arXiv preprint arXiv:2410.02223,
-
[19]
This is achieved by applying the Johnson-Lindenstrauss (JL) Lemma. First, we define a symmetric distortion parameter $\epsilon$ and a scaling factor $\alpha$ from our target constants $c_1$ and $c_2$:
$$\epsilon = \frac{c_2 - c_1}{c_2 + c_1}, \qquad \alpha = \frac{c_1 + c_2}{2} \tag{6}$$
Since $c_2 > c_1 > 0$, it follows that $\epsilon \in (0,1)$, which is a valid distortion parameter for the JL Lemma. We invoke the JL Lemma with this $\epsilon$. The lemma ...
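The excerpt is truncated before it says how the scaling factor is used. A natural reading, stated here as an assumption, is that a JL map with distortion ε is multiplied by α, so that the bounds (1 - ε) and (1 + ε) become α(1 - ε) = c1 and α(1 + ε) = c2. The sketch below checks that arithmetic and shows a scaled Gaussian random projection, which is one common way to realize a JL map; the function names and input vectors are hypothetical.

```python
import math
import random

def jl_parameters(c1, c2):
    """Distortion eps and scale alpha computed from the target constants c1 < c2."""
    eps = (c2 - c1) / (c2 + c1)
    alpha = (c1 + c2) / 2
    return eps, alpha

def scaled_random_projection(vectors, out_dim, alpha, seed=0):
    """Project vectors with a Gaussian matrix scaled by alpha / sqrt(out_dim)."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    # One row of i.i.d. standard normal weights per output coordinate.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    scale = alpha / math.sqrt(out_dim)
    return [
        [scale * sum(w * x for w, x in zip(row, vec)) for row in matrix]
        for vec in vectors
    ]

eps, alpha = jl_parameters(1.0, 3.0)
# Scaling the JL bounds (1 - eps) and (1 + eps) by alpha recovers c1 and c2.
lower, upper = alpha * (1 - eps), alpha * (1 + eps)
projected = scaled_random_projection(
    [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]], out_dim=4, alpha=alpha
)
```

The distance guarantee itself holds only with high probability and only when `out_dim` is large enough for the chosen ε, so the tiny `out_dim=4` here demonstrates the mechanics, not the bound.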