pith. sign in

arxiv: 1907.03351 · v1 · pith:CKMVSQYMnew · submitted 2019-07-07 · 🧬 q-bio.MN

Network analysis of synonymous codon usage

Pith reviewed 2026-05-25 01:03 UTC · model grok-4.3

classification 🧬 q-bio.MN
keywords synonymous codonsprotein structure networksnetwork centralityco-translational foldingrare codonsprotein functionevolutionary conservation
0
0 comments X

The pith

In 84% of proteins, at least one codon category occupies significantly different network-central positions than the others.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models protein three-dimensional structures as networks to compare the network positions of amino acids encoded by evolutionary conserved rare codons, evolutionary non-conserved rare codons, and commonly used codons. It reports that in 84% of the analyzed proteins, at least one of these three categories shows statistically significant differences in centrality compared with the other categories. Proteins are then grouped according to the specific pattern of these centrality differences, and the groups turn out to be enriched for distinct biological functions. This supplies evidence for a connection between codon usage at the sequence level, the structural positions of those codons, and the functional roles of the finished proteins.

Core claim

By representing protein structures as networks and analyzing the network centrality of residues encoded by three codon categories—evolutionary conserved rare, evolutionary non-conserved rare, and commonly used—the analysis reveals that in 84% of the proteins at least one codon category occupies significantly more or less central positions than the others. Protein groups defined by their distinct codon-centrality trends are enriched in different biological functions, implying a link between codon usage, protein folding, and protein function.

What carries the argument

Network centrality measures computed on amino-acid nodes in graphs derived from protein three-dimensional structures, with nodes partitioned by the synonymous codon category that encodes each amino acid.

If this is right

  • Protein groups defined by different codon-centrality trends are enriched in different biological functions.
  • The placement of rare codons may be tuned to the folding requirements of particular protein classes.
  • A connection exists between codon usage patterns, co-translational folding, and final protein function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Network centrality could serve as a filter to flag candidate codon sites whose mutation would most affect folding in a given protein.
  • The functional enrichment patterns could be used to generate testable predictions about which protein classes are most sensitive to codon optimization choices.
  • Overlaying codon-centrality data with known chaperone-binding sites might reveal whether central rare codons coincide with folding bottlenecks.

Load-bearing premise

The chosen network representation of each protein structure and the chosen centrality measure accurately capture positions that matter for co-translational folding.

What would settle it

A set of proteins in which the three codon categories show no difference in measured folding kinetics or chaperone interaction when their network-central positions are experimentally swapped would falsify the claimed link.

Figures

Figures reproduced from arXiv: 1907.03351 by Gabriel Wright, Jacob Piland, Jun Li, Khalique Newaz, Patricia Clark, Scott Emrich, Tijana Milenkovic.

Figure 1
Figure 1. Figure 1: The six possible relationships between amino acids in a protein (i.e., nodes in a PSN) encoded by conserved rare, non-conserved rare, and common codons. We perform the above six comparisons (i.e., test the six relationships) for each of the 63 proteins (i.e., PSNs), using each of the six network centrality measures. Hence, we perform 6×63×6 =2,268 comparisons, i.e., Wilcoxon signed-rank tests, and obtain 2… view at source ↗
Figure 2
Figure 2. Figure 2: The 17 different codon centrality trends (i.e., different combinations of relationships between PSN positions of the three codon categories) present in our data. 3.2 No meaningful codon usage trends can be observed from randomized codon usage data If there is some biochemical signal behind our identified codon usage groups, we expect that if we randomize the codon usage data (i.e., randomly reshuffle label… view at source ↗
Figure 3
Figure 3. Figure 3: Numbers of proteins having the different codon usage trends. The 16 trends (i.e., codon usage groups) that exhibit at least one relationship with respect to at least one centrality measure are shown. The 17th “no codon usage” group with 10 proteins is left out, since no relationship is exhibited with respect to any centrality measure. The figure can be interpreted as follows. As an illustration, there are … view at source ↗
Figure 4
Figure 4. Figure 4: Functional enrichment of codon usage groups in terms of biological process GO terms. We consider those 13 out of all 17 groups that have more than two proteins, and we consider only those biological process GO terms that annotate at least two proteins in at least one of the 13 groups (Section 3.1). In the figure, a colored matrix cell indicates that the given GO term annotates at least two proteins in the … view at source ↗
read the original abstract

Most amino acids are encoded by multiple synonymous codons. For an amino acid, some of its synonymous codons are used much more rarely than others. Analyses of positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that positions of some rare codons are evolutionary conserved. Analyses of positions of rare codons in proteins' 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding. We analyze a protein set recently annotated with codon usage information, considering non-redundant proteins with sufficient structural information. We model the proteins' structures as networks and study potential differences between network positions of amino acids encoded by evolutionary conserved rare, evolutionary non-conserved rare, and commonly used codons. In 84% of the proteins, at least one of the three codon categories occupies significantly more or less network-central positions than the other codon categories. Different protein groups showing different codon centrality trends (i.e., different types of relationships between network positions of the three codon categories) are enriched in different biological functions, implying the existence of a link between codon usage, protein folding, and protein function.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes a set of non-redundant proteins annotated with codon usage data. Protein structures are modeled as residue contact networks. The authors compare network centrality positions of residues encoded by three codon categories (evolutionary conserved rare codons, non-conserved rare codons, and common codons). They report that in 84% of proteins at least one category occupies significantly more or less central positions than the others. Proteins are grouped by their codon-centrality trend patterns; these groups show distinct functional enrichments, which the authors interpret as evidence for a link between codon usage, co-translational folding, and protein function.

Significance. If the network centrality differences are shown to mark positions relevant to folding, the 84% statistic and the function-specific trend enrichments would constitute a useful empirical observation linking codon bias to structural positioning in a large fraction of proteins. The work is grounded in existing annotations rather than new derivations or simulations and supplies no parameter-free predictions or machine-checked results. Its interpretive reach depends on the untested assumption that the chosen graph model and centrality statistic proxy co-translational folding constraints.

major comments (2)
  1. [Abstract / Results] Abstract and Results: the central claim that 'in 84% of the proteins, at least one of the three codon categories occupies significantly more or less network-central positions' cannot be evaluated because the manuscript supplies no information on protein selection criteria, the precise definition of the residue contact network, the centrality measure employed, the statistical test used, or any multiple-testing correction. These omissions are load-bearing for the quantitative result.
  2. [Abstract / Discussion] Abstract and Discussion: the interpretation that the observed centrality trends imply a link to co-translational folding is not anchored by any validation that the chosen network representation or centrality statistic identifies positions known to affect folding kinetics. No comparison is made to folding data, alternative structural encodings, or experimentally verified rare-codon sites.
minor comments (2)
  1. [Abstract] The abstract states 'non-redundant proteins with sufficient structural information' without defining the redundancy or structural-quality thresholds applied.
  2. [Throughout] Notation for the three codon categories is introduced only in the abstract; consistent labels should be used throughout the text and figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address each major point below, with clarifications on methodology and adjustments to interpretation where appropriate.

read point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results: the central claim that 'in 84% of the proteins, at least one of the three codon categories occupies significantly more or less network-central positions' cannot be evaluated because the manuscript supplies no information on protein selection criteria, the precise definition of the residue contact network, the centrality measure employed, the statistical test used, or any multiple-testing correction. These omissions are load-bearing for the quantitative result.

    Authors: The protein selection criteria (non-redundant proteins with sufficient structural information from the annotated set), residue contact network definition (residues in spatial proximity), centrality measure, statistical tests for comparing category positions, and multiple-testing approach are detailed in the Methods section. To make the 84% result directly evaluable from the Abstract and Results without cross-reference, we will add a concise summary of these elements to the Results section. revision: yes

  2. Referee: [Abstract / Discussion] Abstract and Discussion: the interpretation that the observed centrality trends imply a link to co-translational folding is not anchored by any validation that the chosen network representation or centrality statistic identifies positions known to affect folding kinetics. No comparison is made to folding data, alternative structural encodings, or experimentally verified rare-codon sites.

    Authors: We acknowledge that the manuscript does not include direct comparisons to folding kinetics data or experimental rare-codon sites, as the study is observational and relies on existing structural annotations and functional enrichment analysis. The residue-contact network and centrality measures are motivated by established literature linking network centrality to structurally critical positions. We will revise the Discussion to present the co-translational folding link as an interpretive hypothesis supported by the observed patterns and functional enrichments, rather than a validated conclusion, and explicitly note the absence of direct kinetic validation. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical statistical comparison of network centralities on external annotations

full rationale

The manuscript performs direct computation of network centralities on residue-contact graphs derived from PDB structures, then applies standard statistical tests to compare positions of three codon classes drawn from an externally annotated protein set. No equations are presented, no parameters are fitted and relabeled as predictions, and no self-citations supply uniqueness theorems or ansatzes that the results depend upon. The 84 % figure and functional-enrichment observations are data-driven counts, not reductions to the modeling choices by construction. The modeling assumptions (contact definition, centrality measure) are stated but remain external to the reported statistics; their biological interpretation is a separate validation question, not a circularity issue.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Abstract-only review; ledger populated from stated elements in the abstract. The analysis depends on standard definitions of codon rarity and conservation plus the validity of the network model.

free parameters (2)
  • rare-codon frequency threshold
    Abstract does not specify how 'rare' versus 'common' codons are defined; this cutoff is required to assign amino acids to categories.
  • evolutionary conservation criterion
    Abstract does not detail the sequence-alignment or conservation score threshold used to label codons as 'evolutionary conserved rare'.
axioms (2)
  • domain assumption Protein structures can be represented as networks in which node centrality reflects positions relevant to co-translational folding.
    The paper invokes this when interpreting differences in network positions as informative about folding.
  • domain assumption The selected non-redundant proteins with sufficient structural information form a representative sample for the reported trends.
    Abstract states the protein set but provides no justification or sampling details.

pith-pipeline@v0.9.0 · 5748 in / 1486 out tokens · 30797 ms · 2026-05-25T01:03:41.562489+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

  1. [1]

    Sharp, P . M. & Li, W .-H. An evolutionary perspective on synon ymous codon usage in unicellular organisms. Journal of Molecular Evolution 24, 28–38 (1986)

  2. [2]

    Chaney, J. L. & Clark, P . L. Roles for synonymous codon usage i n protein biogenesis. Annual Review of Biophysics 44, 143–166 (2015)

  3. [3]

    S., Hockenberry, A

    Liu, S. S., Hockenberry, A. J., Jewett, M. C. & Amaral, L. A. A n ovel framework for evaluating the performance of codon usage bias metrics. Journal of The Royal Society Interface 15, 20170667 (2018)

  4. [4]

    Codon usage and trna content in unicellular and m ulticellular organisms

    Ikemura, T. Codon usage and trna content in unicellular and m ulticellular organisms. Molecular Biology and Evolution 2, 13–34 (1985)

  5. [5]

    Sharp, P . M. & Li, W .-H. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research 15, 1281–1295 (1987)

  6. [6]

    Kramer, E. B. & Farabaugh, P . J. The frequency of translation al misreading errors in e. coli is largely determined by trna competition. RNA 13, 87–96 (2007)

  7. [7]

    & Wilke, C

    Zhou, T., Weems, M. & Wilke, C. O. Translationally optimal co dons associate with structurally sensitive sites in protei ns. Molecular Biology and Evolution 26, 1571–1580 (2009)

  8. [8]

    & Hurst, L

    Warnecke, T. & Hurst, L. D. Groel dependency affects codon us age—support for a critical role of misfolding in gene evolution. Molecular Systems Biology 6, 340 (2010)

  9. [9]

    & Frydman, J

    Pechmann, S. & Frydman, J. Evolutionary conservation of cod on optimality reveals hidden signatures of cotranslationa l folding. Nature Structural & Molecular Biology 20, 237 (2013)

  10. [10]

    & Liu, Y

    Zhou, M., Wang, T., Fu, J., Xiao, G. & Liu, Y . Nonoptimal codon usage influences protein structure in intrinsically disordered regions. Molecular Microbiology 97, 974–987 (2015)

  11. [11]

    Komar, A. A. A pause for thought along the co-translational f olding pathway. Trends in Biochemical Sciences 34, 16–24 (2009)

  12. [12]

    Kimchi-Sarfaty, C. et al. A ”silent” polymorphism in the mdr1 gene changes substrate s pecificity. Science 315, 525–528 (2007)

  13. [13]

    Zhou, M. et al. Non-optimal codon usage affects expression, structure and function of clock protein frq. Nature 495, 111 (2013)

  14. [14]

    A., Lesnik, T

    Komar, A. A., Lesnik, T. & Reiss, C. Synonymous codon substit utions affect ribosome traffic and protein folding during in vitro translation. FEBS Letters 462, 387–391 (1999)

  15. [15]

    M., Chaney, J

    Sander, I. M., Chaney, J. L. & Clark, P . L. Expanding Anfinsen’ s principle: contributions of synonymous codon selection to rational protein design. Journal of the American Chemical Society 136, 858–861 (2014). 10/12

  16. [16]

    Buhr, F. et al. Synonymous codons direct cotranslational folding toward d ifferent protein conformations. Molecular Cell 61, 341–351 (2016)

  17. [17]

    Jacobson, G. N. & Clark, P . L. Quality over quantity: optimiz ing co-translational protein folding with non- ‘optimal’synonymous codons. Current Opinion in Structural Biology 38, 102–110 (2016)

  18. [18]

    Illerg˚ ard, K., Ardell, D. H. & Elofsson, A. Structure is thr ee to ten times more conserved than sequence—a study of structural response in protein cores. Proteins: Structure, Function, and Bioinformatics 77, 499–508 (2009)

  19. [19]

    Chaney, J. L. et al. Widespread position-specific conservation of synonymous rare codons within coding sequences. PLoS Computational Biology 13, e1005531 (2017)

  20. [20]

    Jacobs, W . M. & Shakhnovich, E. I. Evidence of evolutionary s election for cotranslational folding. Proceedings of the National Academy of Sciences 114, 11434–11439 (2017)

  21. [21]

    Ba, A. N. N. et al. Proteome-wide discovery of evolutionary conserved sequences in disordered regions. Science Signaling 5, rs1–rs1 (2012)

  22. [22]

    & Medina, F

    Gonz´ alez-Camacho, F. & Medina, F. J. Nucleolins from different model organisms have conserved sequences reflecting the conservation of key cellular functions through evoluti on. Journal of Applied Biomedicine 2, 151–161 (2004)

  23. [23]

    & Ghosh, T

    Gupta, S., Majumdar, S., Bhattacharya, T. & Ghosh, T. Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochemical and Biophysical Research Communications 269, 692–696 (2000)

  24. [24]

    Networks (Oxford University Press, 2018)

    Newman, M. Networks (Oxford University Press, 2018)

  25. [25]

    & Prˇ zulj, N

    Milenkovi´ c, T., Filippis, I., Lappe, M. & Prˇ zulj, N. Optimized null model for protein structure networks. PLoS ONE 4, e5967 (2009)

  26. [26]

    Faisal, F. E. et al. GRAFENE: Graphlet-based alignment-free network approach integrates 3d structural and sequence (residue order) data to improve protein structural compari son. Scientific Reports 7, 14890 (2017)

  27. [27]

    Newaz, K., Rahnama, A., Ghalehnovi, M., Antsaklis, P . J. & Mi lenkovic, T. Network-based protein structural classifica- tion. arXiv:1804.04725v2 (2018)

  28. [28]

    & Rosenstr¨ om, P

    Holm, L. & Rosenstr¨ om, P . Dali server: conservation mapping in 3d. Nucleic Acids Research 38, W545 (2010)

  29. [29]

    & Skolnick, J

    Zhang, Y . & Skolnick, J. Tm-align: a protein structure alignment algorithm based on the tm-score. Nucleic Acids Research 33, 2302–2309 (2005)

  30. [30]

    & Hamelryck, T

    Harder, T., Borg, M., Boomsma, W ., Røgen, P . & Hamelryck, T. F ast large-scale clustering of protein structures using gauss integrals. Bioinformatics 28, 510–515 (2012)

  31. [31]

    & Y ang, J

    Xia, J., Peng, Z., Qi, D., Mu, H. & Y ang, J. An ensemble approac h to protein fold classification by integration of template-based assignment and support vector machine clas sifier. Bioinformatics 33, 863–870 (2016)

  32. [32]

    V ., Paci, E

    V endruscolo, M., Dokholyan, N. V ., Paci, E. & Karplus, M. Sma ll-world view of the amino acids that play a key role in protein folding. Physical Review E 65, 061910 (2002)

  33. [33]

    Amitai, G. et al. Network analysis of protein structures identifies function al residues. Journal of Molecular Biology 344, 1135–1146 (2004)

  34. [34]

    D., Fujihashi, H., Amoros, D

    Sol, A. D., Fujihashi, H., Amoros, D. & Nussinov, R. Residue c entrality, functionally important residues, and active si te shape: analysis of enzyme and non-enzyme families. Protein Science 15, 2120–2128 (2006)

  35. [35]

    M., Lonardi, S

    V acic, V ., Iakoucheva, L. M., Lonardi, S. & Radivojac, P . Graphlet kernels for prediction of functional residues in prot ein structures. Journal of Computational Biology 17, 55–72 (2010)

  36. [36]

    & Lensink, M

    Brysbaert, G., Mauri, T., de Ruyck, J. & Lensink, M. F. Identi fication of key residues in proteins through centrality analysis and flexibility prediction with rinspector. Current Protocols in Bioinformatics e66 (2018)

  37. [37]

    Berman, H. M. et al. The protein data bank. Nucleic Acids Research 28, 235–242 (2000)

  38. [38]

    Faisal, F. E. & Milenkovi´ c, T. Dynamic networks reveal key players in aging. Bioinformatics 30, 1721 (2014)

  39. [39]

    Clarke, T. F. & Clark, P . L. Rare codons cluster. PloS ONE 3, e3412 (2008)

  40. [40]

    F., Gish, W ., Miller, W ., Myers, E

    Altschul, S. F., Gish, W ., Miller, W ., Myers, E. W . & Lipman, D . J. Basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990)

  41. [41]

    Predicting function: from genes to genomes and back

    Bork, P .et al. Predicting function: from genes to genomes and back. Journal of Molecular Biology 283, 707–725 (1998). 11/12

  42. [42]

    & Sander, C

    Holm, L. & Sander, C. Removing near-neighbour redundancy fr om large protein sequence collections. Bioinformatics 14, 423–429 (1998)

  43. [43]

    & Carugo, O

    Sikic, K. & Carugo, O. Protein sequence redundancy reductio n: comparison of various method. Bioinformation 5, 234 (2010)

  44. [44]

    & Prˇ zulj, N

    Milenkovi´ c, T., Memiˇ sevi´ c, V ., Bonato, A. & Prˇ zulj, N. Dominating biological networks. PloS ONE 6, e23016 (2011)

  45. [45]

    Greene, L. H. et al. The CA TH domain structure database: new protocols and class ification levels give a more compre- hensive resource for exploring evolution. Nucleic Acids Research 35, D291–D297 (2006)

  46. [46]

    G., Brenner, S

    Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 247, 536–540 (1995)

  47. [47]

    Prˇ zulj, N., Corneil, D. G. & Jurisica, I. Modeling interact ome: scale-free or geometric? Bioinformatics 20, 3508–3515 (2004)

  48. [48]

    & Gentleman, R

    Falcon, S. & Gentleman, R. Hypergeometric testing used for g ene set enrichment analysis. In Bioconductor Case Studies, 207–220 (Springer, 2008)

  49. [49]

    Feise, R. J. Do multiple outcome measures require p-value ad justment? BMC Medical Research Methodology 2, 8 (2002)

  50. [50]

    & Hochberg, Y

    Benjamini, Y . & Hochberg, Y . Controlling the false discovery rate: a practical and powerful approach to multiple testin g. Journal of the Royal Statistical Society. Series B (Methodo logical) 289–300 (1995). 12/12