arxiv: 2511.10708 · v4 · pith:WRRIUJJTnew · submitted 2025-11-13 · 🧬 q-bio.QM

MOSAIC: Codon Harmonization of Monte Carlo-Based Simulated Annealing for Linked Codons in Heterologous Protein Expression

Pith reviewed 2026-05-21 18:48 UTC · model grok-4.3

classification 🧬 q-bio.QM
keywords codon harmonization · linked codons · simulated annealing · Monte Carlo · heterologous expression · ribosomal proteins · protein solubility · translation efficiency

The pith

Monte Carlo annealing on linked codon sets produces more soluble ribosomal protein than wild-type genes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops MOSAIC, a Monte Carlo-based simulated annealing method that harmonizes entire groups of linked codons at once rather than adjusting codons one by one. This strategy is intended to better preserve native translation speeds and co-translational folding for proteins that are sensitive to those rates. When the authors applied the method to ribosomal proteins and expressed the harmonized S18 gene, it produced visibly higher total protein amounts with a larger soluble fraction than the unmodified wild-type sequence. A reader would care because codon choice directly affects whether recombinant proteins can be made reliably and in usable form for biotechnology. The work treats codon sets as the unit of optimization to capture dependencies that single-codon methods miss.

Core claim

The central claim is that applying Monte Carlo simulated annealing to sets of linked codons generates harmonized gene versions that, when expressed heterologously, deliver higher total yields of ribosomal protein S18 together with substantially more soluble protein than the corresponding wild-type gene.

What carries the argument

MOSAIC (Monte Carlo-based Simulated Annealing for Linked Codons), an algorithm that jointly optimizes groups of consecutive codons to match native translation timing.
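
As a rough illustration of what optimizing linked codons jointly can mean, the following Python sketch anneals over windows of consecutive codons. It is not the authors' implementation: the abstract gives no energy function or schedule, so the sliding-window usage profile, the squared-distance energy, the geometric cooling parameters, and the toy usage tables (`NATIVE_USAGE`, `HOST_USAGE`) are all assumptions made for this example.

```python
import math
import random

# Toy relative codon usage per amino acid (illustrative values, not real data).
NATIVE_USAGE = {
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "TTA": 0.13, "CTC": 0.37},
    "S": {"AGC": 0.60, "TCT": 0.40},
}
HOST_USAGE = {
    "K": {"AAA": 0.40, "AAG": 0.60},
    "L": {"CTG": 0.20, "TTA": 0.55, "CTC": 0.25},
    "S": {"AGC": 0.30, "TCT": 0.70},
}
CODON_TO_AA = {c: aa for aa, table in HOST_USAGE.items() for c in table}


def profile(codons, usage, window):
    """Mean relative usage over each sliding window of consecutive codons."""
    vals = [usage[CODON_TO_AA[c]][c] for c in codons]
    return [sum(vals[i:i + window]) / window
            for i in range(len(vals) - window + 1)]


def energy(candidate, target_profile, window):
    """Assumed energy: squared distance between the candidate's host-usage
    profile and the wild type's native-usage profile."""
    cand = profile(candidate, HOST_USAGE, window)
    return sum((a - b) ** 2 for a, b in zip(cand, target_profile))


def anneal(wild_type, window=3, t_start=1.0, t_end=1e-3,
           cooling=0.95, sweeps=50, seed=0):
    """Simulated annealing with moves that resample a whole codon window."""
    rng = random.Random(seed)
    target = profile(wild_type, NATIVE_USAGE, window)
    current = list(wild_type)
    e_cur = energy(current, target, window)
    best, e_best = list(current), e_cur
    t = t_start
    while t > t_end:
        for _ in range(sweeps):
            # Propose synonymous codons for every position in one window.
            start = rng.randrange(len(current) - window + 1)
            trial = list(current)
            for i in range(start, start + window):
                aa = CODON_TO_AA[trial[i]]
                trial[i] = rng.choice(list(HOST_USAGE[aa]))
            e_new = energy(trial, target, window)
            # Metropolis criterion: always accept downhill moves, accept
            # uphill moves with probability exp(-dE / T).
            if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
                current, e_cur = trial, e_new
                if e_cur < e_best:
                    best, e_best = list(current), e_cur
        t *= cooling  # geometric cooling (assumed schedule)
    return best, e_best


wild_type = ["AAA", "CTG", "AGC", "AAG", "TTA", "TCT", "AAA", "CTC"]
harmonized, score = anneal(wild_type)
print(harmonized, round(score, 4))
```

Every move swaps only synonymous codons, so the encoded amino-acid sequence is unchanged. The sketch differs from a single-codon search only in that each move touches an entire window, which is one plausible reading of "linked" here.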

If this is right

  • Higher expression levels become possible for other proteins whose folding depends on controlled translation speed.
  • The fraction of soluble, active protein rises when codon groups are harmonized together.
  • Recombinant production of sensitive proteins becomes more predictable without extensive trial-and-error codon trials.
  • The method extends naturally to additional ribosomal or folding-sensitive targets beyond the four tested.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same linked-set approach could be tested on membrane proteins or enzymes that misfold under rapid translation.
  • Combining the algorithm with measured ribosome dwell times from ribosome profiling might tighten the match to native kinetics.
  • If the yield benefit holds across hosts, the technique could reduce the need for rare-codon supplementation in industrial strains.

Load-bearing premise

The measured gains in total protein and soluble fraction for S18 arise specifically from the linked-codon harmonization rather than from unmeasured differences in vector, host, or growth conditions.

What would settle it

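Express three S18 constructs under the same vector, host, and induction conditions: the wild-type gene, the MOSAIC-harmonized gene, and a control harmonized codon by codon without the linked-codon step. If the MOSAIC construct shows no increase in yield or solubility over that control, the gain cannot be credited to linked-codon harmonization.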

Original abstract

Codon usage bias has a crucial impact on the translation efficiency and co-translational folding of proteins, necessitating the algorithmic development of codon optimization/harmonization methods, particularly for heterologous recombinant protein expression. Codon harmonization is especially valuable for proteins sensitive to translation rates, because it can potentially replicate native translation speeds, preserving proper folding and maintaining protein activity. This work proposes a Monte Carlo-based codon harmonization algorithm, MOSAIC (Monte Carlo-based Simulated Annealing for Linked Codons), for the harmonization of a set of linked codons, which differs from conventional codon harmonization, by focusing on the codon sets rather than individual ones. Our MOSAIC demonstrates robust computational performance on ribosomal proteins (S18, S15, S10, and L11) as model systems. Among them, the harmonized gene of RP S18 was expressed and compared with the expression of the wild-type gene. The harmonized gene clearly yielded a larger quantity of the protein, from which the amount of the soluble protein was also significant. These results underscored the potential of the linked codon harmonization approach to enhance the expression and functionality of sensitive proteins, setting the stage for more efficient production of recombinant proteins in various biotechnological and pharmaceutical applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces MOSAIC, a Monte Carlo-based simulated annealing algorithm for harmonizing sets of linked codons (rather than individual codons) to optimize heterologous protein expression while preserving native translation kinetics. It reports computational application to ribosomal proteins S18, S15, S10, and L11 as model systems, and provides experimental comparison for RP S18 showing that the harmonized gene produced a larger quantity of total protein with a significant soluble fraction relative to the wild-type gene.

Significance. If the experimental results can be shown to isolate the effect of linked-codon harmonization through matched controls, replicates, and quantitative metrics, the work could contribute a new search strategy for codon optimization that targets co-translational folding in sensitive proteins. This has potential relevance for recombinant protein production in biotechnology, though the current support for the central experimental claim remains limited.

major comments (2)
  1. [Abstract] Abstract (and corresponding results section): The claim that the MOSAIC-harmonized RP S18 gene 'clearly yielded a larger quantity of the protein' with 'the amount of the soluble protein was also significant' supplies no quantitative values, replicate numbers, statistical tests, error bars, or controls for confounding variables such as vector backbone, promoter, ribosome binding site, host strain, growth media, induction protocol, or protein quantification method. Without these, the observed difference cannot be attributed specifically to the linked-codon harmonization strategy.
  2. [Methods] Methods/Results (algorithm validation): The description of MOSAIC as a 'robust' Monte Carlo simulated annealing procedure for linked codons does not include sufficient implementation details (e.g., exact energy function for codon sets, annealing schedule parameters, convergence criteria, or comparison baselines against standard codon harmonization tools) to evaluate reproducibility or to confirm that performance gains arise from the linked-codon formulation rather than generic optimization.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by adding at least one quantitative metric (e.g., fold-change in yield or solubility percentage) and a brief statement of the number of biological replicates performed.
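
The metrics the referee asks for are simple to compute once replicate measurements exist. The Python sketch below shows one way to report fold-change, soluble fraction, and a Welch t statistic. The function names are hypothetical, and the numbers in the usage lines are placeholders chosen for illustration, not values from the paper.

```python
from statistics import mean, stdev


def fold_change(treated, control):
    """Ratio of mean yields (treated / control)."""
    return mean(treated) / mean(control)


def soluble_fraction(soluble, total):
    """Share of total protein recovered in the soluble fraction."""
    return soluble / total


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (var_a + var_b) ** 0.5


# Placeholder yields (arbitrary units) for three hypothetical replicates each.
wild_type_total = [10.0, 12.0, 11.0]
harmonized_total = [20.0, 22.0, 24.0]

print("fold change:", fold_change(harmonized_total, wild_type_total))
print("soluble fraction:", soluble_fraction(6.0, 12.0))
print("Welch t:", welch_t(harmonized_total, wild_type_total))
```

A p-value would additionally require the Welch-Satterthwaite degrees of freedom and a t distribution, which the standard library does not provide; a statistics package would normally handle that step.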

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed review of our manuscript on MOSAIC. We address each major comment below and describe the revisions that will be made to improve the clarity, reproducibility, and rigor of the work.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and corresponding results section): The claim that the MOSAIC-harmonized RP S18 gene 'clearly yielded a larger quantity of the protein' with 'the amount of the soluble protein was also significant' supplies no quantitative values, replicate numbers, statistical tests, error bars, or controls for confounding variables such as vector backbone, promoter, ribosome binding site, host strain, growth media, induction protocol, or protein quantification method. Without these, the observed difference cannot be attributed specifically to the linked-codon harmonization strategy.

    Authors: We agree that the current presentation of the experimental results for RP S18 lacks the quantitative detail needed to fully support the claims and to isolate the contribution of linked-codon harmonization. In the revised manuscript we will expand both the abstract and the corresponding results section to report specific protein yield values (total and soluble fractions), the number of biological and technical replicates, appropriate statistical tests with p-values, error bars or standard deviations, and a complete description of all experimental controls including vector backbone, promoter, RBS, host strain, media, induction conditions, and the protein quantification method employed. These additions will allow readers to evaluate the magnitude and specificity of the observed improvement. revision: yes

  2. Referee: [Methods] Methods/Results (algorithm validation): The description of MOSAIC as a 'robust' Monte Carlo simulated annealing procedure for linked codons does not include sufficient implementation details (e.g., exact energy function for codon sets, annealing schedule parameters, convergence criteria, or comparison baselines against standard codon harmonization tools) to evaluate reproducibility or to confirm that performance gains arise from the linked-codon formulation rather than generic optimization.

    Authors: We acknowledge that the current Methods section does not provide enough implementation specifics for independent reproduction or for distinguishing the benefit of the linked-codon formulation. In the revised manuscript we will supply the precise energy function used to score sets of linked codons, the full annealing schedule (initial temperature, cooling rate, number of iterations per temperature), the convergence criteria, and explicit benchmark comparisons against representative individual-codon harmonization methods. These details will be presented in a new or expanded subsection to demonstrate that the reported performance improvements derive from the linked-codon treatment. revision: yes

Circularity Check

0 steps flagged

No circularity: new Monte Carlo algorithm validated by direct experimental comparison

Full rationale

The paper introduces MOSAIC as a Monte Carlo simulated annealing procedure for linked-codon harmonization and reports computational performance on ribosomal proteins plus one experimental expression comparison (harmonized RP S18 vs wild-type). No derivation chain, fitted parameter, or self-citation is invoked to generate the reported yield improvement; the result is an empirical observation rather than a quantity forced by construction from the algorithm's inputs or prior self-referential claims. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review performed on abstract alone; full manuscript would be needed to enumerate specific free parameters in the annealing schedule, any domain assumptions about translation kinetics, or invented entities. The abstract invokes the general premise that codon usage bias affects translation efficiency and co-translational folding.

free parameters (1)
  • Monte Carlo and annealing schedule parameters
    Likely tuned values for temperature schedule, acceptance criteria, or codon set size that control the search but are not quantified in the abstract.
axioms (1)
  • domain assumption: Codon usage bias has a crucial impact on the translation efficiency and co-translational folding of proteins
    Opening premise of the abstract that justifies the need for harmonization methods.
