Learning the statistics and landscape of somatic mutation-induced insertions and deletions in antibodies
Pith reviewed 2026-05-24 12:26 UTC · model grok-4.3
The pith
The lengths of insertions and deletions during antibody affinity maturation follow a geometric distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a probabilistic inference tool that learns the statistics of indels from repertoire sequencing data, which overcomes the pitfalls and biases of standard annotation methods. The model includes antibody-specific maturation ages to account for variable mutational loads in the repertoire. After validation on synthetic data, application to human immunoglobulin heavy chains reveals distinct insertion and deletion hotspots and shows that the distribution of lengths of indels follows a geometric distribution.
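To make the length claim concrete, here is a minimal Python sketch of what "follows a geometric distribution" means for a list of indel lengths. It assumes lengths start at 1, with P(ℓ) = p(1 − p)^(ℓ−1). The function names are hypothetical. The sketch fits already-called lengths by maximum likelihood. It does not reproduce the paper's inference, which learns length statistics jointly with the alignment.

```python
from collections import Counter

def fit_geometric(lengths):
    """Maximum-likelihood p for P(l) = p * (1 - p)**(l - 1), l = 1, 2, ...

    For this parameterization the MLE is simply 1 / (mean length).
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("need a non-empty list of integer lengths >= 1")
    return len(lengths) / sum(lengths)

def geometric_pmf(l, p):
    """Probability of an indel of length l under the geometric model."""
    return p * (1 - p) ** (l - 1)

def compare_histogram(lengths):
    """Observed vs. expected counts per length. On a log scale a geometric
    distribution is a straight line with slope log(1 - p)."""
    p = fit_geometric(lengths)
    n = len(lengths)
    observed = Counter(lengths)
    return {l: (observed[l], n * geometric_pmf(l, p)) for l in sorted(observed)}

# Usage with made-up lengths (illustrative only, not data from the paper)
toy_lengths = [1, 1, 1, 2, 2, 3, 1, 4, 2, 1]
print(round(fit_geometric(toy_lengths), 3))  # prints 0.556 (= 10 / 18)
for l, (obs, exp) in compare_histogram(toy_lengths).items():
    print(l, obs, round(exp, 2))
```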
What carries the argument
A probabilistic inference tool that incorporates antibody-specific maturation ages to infer indel statistics directly from sequencing data.
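A deliberately simplified toy model can show why a per-antibody maturation age helps. This model is our assumption, not the paper's. Each sequence s has its own age μ_s. Point substitutions are Poisson with mean μ_s·L, and indel events are Poisson with mean r·μ_s·L for a shared ratio r. If every μ_s is a free parameter, maximizing the joint likelihood gives r̂ = (total indel events)/(total substitutions). The ages cancel, so heterogeneous mutational loads do not bias r̂. All names and parameter values below are hypothetical.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_repertoire(n_seqs, seq_len, indel_ratio, mean_age, rng):
    """Toy data: returns (substitutions, indel events) per sequence.

    Each sequence draws its own maturation age mu_s (exponential here, an
    arbitrary choice), so mutational loads vary widely across the repertoire.
    """
    data = []
    for _ in range(n_seqs):
        mu_s = rng.expovariate(1.0 / mean_age)
        subs = sample_poisson(mu_s * seq_len, rng)
        indels = sample_poisson(indel_ratio * mu_s * seq_len, rng)
        data.append((subs, indels))
    return data

def estimate_indel_ratio(data):
    """Joint MLE of r when every mu_s is free: the ages cancel and
    r_hat = total indel events / total substitutions."""
    total_subs = sum(subs for subs, _ in data)
    total_indels = sum(indels for _, indels in data)
    return total_indels / total_subs

rng = random.Random(0)
toy = simulate_repertoire(5000, 300, 0.01, 0.05, rng)
print(estimate_indel_ratio(toy))  # close to the true ratio 0.01
```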
If this is right
- Mechanistic models of somatic hypermutation must produce geometric length distributions for indels.
- Insertion and deletion events occur at distinct sequence hotspots in heavy chains.
- Universal statistical features of indels exist across human heavy chain repertoires.
- The inferred model can be used to annotate indels in new sequencing datasets.
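Hotspot calls like those listed above depend on a per-position event rate. Below is a minimal sketch. It assumes indel start positions are already mapped to a common germline coordinate, and that `coverage` counts how many sequences span each position. The fold-over-mean threshold is an arbitrary illustrative rule, not the paper's criterion. Computing insertion and deletion profiles separately is what allows the two sets of hotspots to differ.

```python
from collections import Counter

def event_profile(event_positions, coverage):
    """Fraction of covering sequences with an event starting at each position.

    event_positions: germline coordinates of event starts (one entry per event)
    coverage: {position: number of sequences whose alignment spans it}
    """
    counts = Counter(event_positions)
    return {pos: counts[pos] / n for pos, n in coverage.items() if n > 0}

def call_hotspots(profile, fold=3.0):
    """Positions whose rate exceeds `fold` times the mean rate over all positions."""
    if not profile:
        return []
    mean_rate = sum(profile.values()) / len(profile)
    return sorted(pos for pos, rate in profile.items() if rate > fold * mean_rate)

# Made-up example: 100 sequences covering positions 0..9
coverage = {pos: 100 for pos in range(10)}
deletion_starts = [3] * 20 + [7] * 2 + [1]
insertion_starts = [8] * 15 + [2]
print(call_hotspots(event_profile(deletion_starts, coverage)))   # [3]
print(call_hotspots(event_profile(insertion_starts, coverage)))  # [8]
```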
Where Pith is reading between the lines
- The geometric length distribution may point to a memoryless process in how DNA segments are added or removed during hypermutation.
- The same inference approach could be tested on light chains or on data from other species to check for conserved features.
- If the geometric property holds, it simplifies simulation of antibody sequence diversity in computational immunology.
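Both the memoryless reading and the simpler simulation could follow from a hypothetical stepwise extension rule. An event starts with one base and gains one more base with constant probability q, independent of its current length. The sketch below assumes this rule purely for illustration. The rule yields P(ℓ) = (1 − q) q^(ℓ−1), a mean length of 1/(1 − q), and the memoryless property P(L > m + n | L > m) = P(L > n) = q^n, which the code checks empirically.

```python
import random

def sample_indel_length(q, rng):
    """Hypothetical memoryless mechanism: start at one base, then keep
    extending by one base with constant probability q."""
    length = 1
    while rng.random() < q:
        length += 1
    return length

def survival(lengths, n):
    """Empirical P(L > n)."""
    return sum(1 for l in lengths if l > n) / len(lengths)

rng = random.Random(1)
q = 0.6
lengths = [sample_indel_length(q, rng) for _ in range(200_000)]

print(sum(lengths) / len(lengths))  # about 1 / (1 - q) = 2.5
# Memorylessness: P(L > 3 | L > 2) should match the unconditional P(L > 1).
print(survival(lengths, 3) / survival(lengths, 2), survival(lengths, 1))  # both about q = 0.6
```

Simulating lengths this way needs only the single parameter q, which is the simplification the last point above refers to.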
Load-bearing premise
The probabilistic inference tool overcomes the pitfalls and biases of standard annotation methods without introducing comparable new biases of its own.
What would settle it
New repertoire sequencing data in which the length histogram of indels deviates significantly from a geometric distribution after the same inference procedure.
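This falsification test can be framed as a goodness-of-fit check. The sketch below assumes the indel lengths have already been produced by the inference procedure. It fits a geometric distribution by maximum likelihood and computes a G statistic over length bins 1, …, K−1 plus a pooled tail bin ≥ K. It then calibrates the statistic with a parametric bootstrap that refits p on every simulated sample. K and the number of replicates are arbitrary choices. The bootstrap is a technique swapped in here and is not necessarily how the paper assesses fit.

```python
import math
import random

def fit_p(lengths):
    """Geometric MLE for P(l) = p * (1 - p)**(l - 1), l >= 1."""
    return len(lengths) / sum(lengths)

def sample_geometric(p, rng):
    length = 1
    while rng.random() >= p:  # extend with probability 1 - p
        length += 1
    return length

def g_statistic(lengths, max_bin):
    """G = 2 * sum O * log(O / E) over bins 1..max_bin-1 and a tail bin >= max_bin."""
    n, p = len(lengths), fit_p(lengths)
    observed = [0] * max_bin
    for l in lengths:
        observed[min(l, max_bin) - 1] += 1
    probs = [p * (1 - p) ** (k - 1) for k in range(1, max_bin)]
    probs.append((1 - p) ** (max_bin - 1))  # P(L >= max_bin)
    return 2 * sum(o * math.log(o / (n * q)) for o, q in zip(observed, probs) if o > 0)

def bootstrap_pvalue(lengths, max_bin=6, n_boot=200, seed=0):
    """Share of geometric samples (same size, fitted p) at least as extreme as the data."""
    rng = random.Random(seed)
    g_obs = g_statistic(lengths, max_bin)
    p_hat = fit_p(lengths)
    exceed = 0
    for _ in range(n_boot):
        simulated = [sample_geometric(p_hat, rng) for _ in lengths]
        if g_statistic(simulated, max_bin) >= g_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

# A bimodal toy sample (only lengths 3 and 6) should be rejected
toy = [3] * 300 + [6] * 300
print(bootstrap_pvalue(toy))  # about 0.005 (= 1/201): no replicate is as extreme
```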
Original abstract
Affinity maturation is crucial for improving the binding affinity of antibodies to antigens. This process is mainly driven by point substitutions caused by somatic hypermutations of the immunoglobulin gene. It also includes deletions and insertions of genomic material known as indels. While the landscape of point substitutions has been extensively studied, a detailed statistical description of indels is still lacking. Here we present a probabilistic inference tool to learn the statistics of indels from repertoire sequencing data, which overcomes the pitfalls and biases of standard annotation methods. The model includes antibody-specific maturation ages to account for variable mutational loads in the repertoire. After validation on synthetic data, we applied our tool to a large dataset of human immunoglobulin heavy chains. The inferred model allows us to identify universal statistical features of indels in heavy chains. We report distinct insertion and deletion hotspots, and show that the distribution of lengths of indels follows a geometric distribution, which puts constraints on future mechanistic models of the hypermutation process.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a probabilistic inference tool to extract statistics of somatic hypermutation-induced insertions and deletions (indels) from antibody repertoire sequencing data. The model incorporates antibody-specific maturation ages to account for variable mutational loads. After validation on synthetic data, the tool is applied to a large set of human immunoglobulin heavy chain sequences, identifying distinct insertion and deletion hotspots and reporting that indel length distributions follow a geometric distribution, which constrains mechanistic models of hypermutation.
Significance. If the inference tool recovers true indel length statistics without introducing new length-dependent biases, the geometric distribution result would supply an important empirical constraint on hypermutation mechanisms, addressing a gap relative to the extensively characterized point-mutation landscape.
major comments (1)
- [Abstract] Validation on synthetic data is stated without quantitative performance metrics, error analysis, or comparisons to baselines. This is load-bearing for the central claim that indel lengths follow a geometric distribution, because the result depends on the tool correctly extracting length statistics from real data after overcoming annotation biases.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the single major comment below and will revise the manuscript to improve the presentation of our synthetic validation results.
Point-by-point responses
- Referee: [Abstract] Validation on synthetic data is stated without quantitative performance metrics, error analysis, or comparisons to baselines. This is load-bearing for the central claim that indel lengths follow a geometric distribution, because the result depends on the tool correctly extracting length statistics from real data after overcoming annotation biases.
Authors: We agree that the abstract would be strengthened by including quantitative performance metrics, error analysis, and baseline comparisons from the synthetic validation, as these details support the reliability of the inferred geometric length distributions. The full manuscript provides these in the Methods (model validation procedure) and Results (recovery accuracy, length-dependent bias quantification, and comparisons to standard annotation pipelines) sections, including metrics such as precision/recall on simulated indels of varying lengths and error bars across replicate simulations. However, the abstract currently summarizes this only qualitatively. We will revise the abstract to incorporate key quantitative results (e.g., overall recovery rate of indel lengths and reduction in annotation bias relative to baselines) while remaining within length limits. This addresses the load-bearing concern without altering the central claims. revision: yes
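The referee asks for per-length recovery on synthetic data, which could be tabulated along these lines. This sketch rests on our own assumptions. Each indel is a tuple (sequence id, kind, germline position, length), and an inferred indel counts as correct only if an identical tuple appears in the simulated truth. Exact matching may be too strict in repetitive sequence, where several placements of the same indel yield the same read.

```python
import math
from collections import defaultdict

def per_length_precision_recall(true_events, inferred_events):
    """Per-length precision and recall for indel calls on simulated data.

    Events are tuples (seq_id, kind, position, length); a call is a true
    positive only if the identical tuple exists in the simulated truth.
    Returns {length: (precision, recall)}, with NaN where a ratio is undefined.
    """
    true_set, inferred_set = set(true_events), set(inferred_events)
    hits, n_true, n_inferred = defaultdict(int), defaultdict(int), defaultdict(int)
    for event in true_set:
        n_true[event[3]] += 1
        if event in inferred_set:
            hits[event[3]] += 1
    for event in inferred_set:
        n_inferred[event[3]] += 1
    table = {}
    for length in sorted(set(n_true) | set(n_inferred)):
        precision = hits[length] / n_inferred[length] if n_inferred[length] else math.nan
        recall = hits[length] / n_true[length] if n_true[length] else math.nan
        table[length] = (precision, recall)
    return table

# Made-up example; the deletion called at position 6 instead of 5 counts as a miss
truth = [(0, "del", 10, 3), (0, "ins", 40, 1), (1, "del", 5, 1)]
calls = [(0, "del", 10, 3), (1, "del", 6, 1), (1, "ins", 20, 2)]
for length, (prec, rec) in per_length_precision_recall(truth, calls).items():
    print(length, prec, rec)
```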
Circularity Check
No significant circularity detected
The paper introduces a probabilistic inference tool that incorporates antibody-specific maturation ages, validates it on synthetic data, and applies the tool to real human IgH repertoire data to extract indel statistics. The reported geometric length distribution is presented as an output of this inference on the empirical data rather than an input assumption or a quantity fitted by construction. No equations, self-citations, or ansatzes are shown in the provided text that reduce the central claim to a renaming, a prior, or a self-referential definition. The derivation chain remains self-contained against external benchmarks.