An Evolutionary Approach for Designing Stable and Highly Expressible Low-Immunogenicity Therapeutic mRNA Sequences

Dhawa Sang Dong; Mausam Gurung; Suraj Kandel

arxiv: 2605.27986 · v1 · pith:4JLQO36Onew · submitted 2026-05-27 · 💻 cs.CL · q-bio.QM

An Evolutionary Approach for Designing Stable and Highly Expressible Low-Immunogenicity Therapeutic mRNA Sequences

Dhawa Sang Dong , Mausam Gurung , Suraj Kandel This is my paper

Pith reviewed 2026-06-29 13:34 UTC · model grok-4.3

classification 💻 cs.CL q-bio.QM

keywords mRNA sequence optimizationgenetic algorithmCodonTransformercodon adaptation indexRNA minimum free energyimmunogenicity motifstherapeutic mRNA designevolutionary computation

0 comments

The pith

A BERT-like model followed by genetic algorithm optimization yields mRNA sequences with improved balance of translation efficiency, folding stability, and lower immunogenicity than prior methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a two-stage computational pipeline for creating therapeutic mRNA. A pretrained CodonTransformer language model first produces candidate sequences that code for a chosen protein. A genetic algorithm then evolves those candidates through targeted codon swaps, scoring each version on a combination of translation speed, RNA folding energy, base composition, and immune-triggering motifs. The process reaches sequences with higher CAI and tAI, more accessible 5' regions, balanced global folding energy, and fewer immune motifs than two comparison approaches. Readers would care because the method offers an automated way to generate candidate sequences that satisfy multiple practical constraints at once.

Core claim

The BERT-GA framework produces sequences that reach CAI values of 0.73-0.74 and tAI values of 0.63-0.64 (over 6 percent gains), codon-pair bias near 0.97, 5' unpaired fraction of 0.87, global MFE between -346 and -356 kcal/mol for roughly 84 percent base-paired structure, and average immune penalty of 27.3, thereby locating an equilibrium between translation, stability, and immunogenicity that Linear Design overshoots with excessive rigidity and BiLSTM-CRF misses by ignoring structural constraints.

What carries the argument

The genetic algorithm stage, which performs codon-aware crossover and synonymous mutation on outputs from the CodonTransformer and scores them with a multi-objective fitness function that includes CAI, tAI, codon-pair bias, local and global MFE from RNAfold, GC content, and CpG/UpA motif counts.

If this is right

CAI and tAI rise by more than 6 percent across generations while codon-pair bias stays high and stable.
5' end ribosomal accessibility improves to an unpaired fraction of 0.87.
Global MFE settles in the -346 to -356 kcal/mol range that supports both stability and translation.
Average immune penalty drops to 27.3 through reduced CpG and UpA motifs.
The resulting sequences avoid both the hyper-stability of Linear Design and the unconstrained high-CAI focus of BiLSTM-CRF.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The sequences could be synthesized and run through cell-based expression assays to test the fitness predictions directly.
The same pipeline might be applied to design mRNA for non-antigen proteins such as enzymes or receptors.
Adding measured translation or immunogenicity data back into the fitness function could tighten the optimization loop.
Wider use of this in-silico route could shorten the early design phase for new mRNA candidates.

Load-bearing premise

The chosen numerical fitness functions serve as adequate stand-ins for real translation rates, structural behavior, and immune stimulation inside living cells.

What would settle it

Transfecting the final optimized sequences into human cells and measuring actual protein output plus cytokine release would show whether the predicted improvements in the fitness scores correspond to real performance.

Figures

Figures reproduced from arXiv: 2605.27986 by Dhawa Sang Dong, Mausam Gurung, Suraj Kandel.

**Figure 1.** Figure 1: Work Flow Diagram start of genetic evolutionary optimization with linguistically and biologically realistic mRNA candidates that resemble naturally expressed transcripts, but it does not consider the stability of secondary structures of mRNA explicitly, and GA optimization considers stability in the case of our optimization framework. b. Genetic Algorithm Optimization: The sequences generated by the CodonT… view at source ↗

**Figure 2.** Figure 2: GA Optimization over the Generation 6 [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Comparative radar plot of the top three optimized mRNA candidates (seq 2, 6, and 9) from the [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: mRNA Secondary Structures 8 [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Messenger RNA (mRNA) sequences as therapeutics require optimized design to ensure efficient translation, structural stability, and minimal immunogenicity. This study presents a two-stage in-silico framework that integrates deep learning and evolutionary computation for rational mRNA optimization instead of existing state-of-the-art models. In the first stage, a pretrained CodonTransformer (BERT-like Large Language Model) generates biologically coherent mRNA sequences encoding the target antigen. In the second stage, a genetic algorithm (GA) evolves these candidate sequences through codon-aware crossover and synonymous mutation guided by human codon usage preferences. Fitness functions for evaluation combined translation-related metrics (CAI, tAI, codon-pair bias), mRNA structural stability (local and global MFE via RNAfold, GC content), and reduced immunogenicity (CpG/UpA motif frequency). Over successive generations (38th, 40th, and 42nd), the GA improved (achieved CAI values of 0.73 to 0.74 and tAI values of 0.63 to 0.64) CAI and tAI by over 6% and codon-pair bias is high and consistent (0.97 ) and improved ribosomal accessibility at the 5' end, with an unpaired_30 fraction reaching 0.87; Global Minimum Free Energy (MFE) converged to a balanced range of -346 to -356 kcal/mol, achieving approximately 84% base-paired structural stability, and reduced immune-stimulatory motifs - lowering the average immune penalty to 27.3 in the final generation. Linear Design produces hyper-stable transcripts (MFE < - 2000 kcal/mol) that risk translation inefficiency due to extreme rigidity, and BiLSTM-CRF focuses solely on high CAI (0.96 to 0.98) without structural constraints, our framework achieves an optimal translation-stability equilibrium, highlighting the proposed BERT-GA framework as an effective, data-driven approach for the design and optimization of in-silico mRNA sequences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The reported gains in CAI, tAI, and MFE are direct outputs of the fitness function the GA was told to maximize, so the central claim does not stand without external checks.

read the letter

The paper takes a pretrained CodonTransformer to produce initial mRNA candidates for a target antigen, then runs a genetic algorithm that applies codon-aware crossover and mutation while selecting on a composite score of CAI, tAI, codon-pair bias, local and global MFE from RNAfold, GC content, and CpG/UpA counts. It shows the population moving over roughly 40 generations, with the numbers shifting in the expected direction.

What it does reasonably is lay out the fitness components and the GA operators in plain terms. The description of how the algorithm balances the different terms and the explicit final values (CAI 0.74, tAI 0.64, MFE around -350 kcal/mol, immune penalty 27.3) are easy to follow. The comparison to Linear Design and BiLSTM-CRF is also stated clearly, even if it stays inside the same metric set.

The main weakness is that every reported improvement is tautological to the selection rule. The abstract gives no statistical test against random search, no error bars across runs, no held-out validation, and no wet-lab data on actual translation or immunogenicity. The claim of reaching an “optimal translation-stability equilibrium” therefore rests entirely on the untested premise that these computational proxies are sufficient. The hyperparameters of the GA are also not justified or ablated.

This is a straightforward in-silico pipeline paper aimed at computational biologists who already work on mRNA design and want to see one more example of combining a language model with evolutionary search. It does not introduce new primitives or falsifiable predictions that would move the broader field.

I would not bring it to reading group or cite it. It does not have enough independent evidence to justify sending it to peer review.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a two-stage in-silico framework for therapeutic mRNA design: a pretrained BERT-like CodonTransformer generates candidate sequences, which are then evolved via a genetic algorithm using codon-aware operators and a composite fitness function (CAI, tAI, codon-pair bias, local/global MFE from RNAfold, GC content, CpG/UpA motif counts). The abstract reports metric gains across generations 38–42 (CAI 0.73→0.74, tAI 0.63→0.64, unpaired_30=0.87, MFE −346 to −356 kcal/mol, immune penalty 27.3) and claims the method reaches an “optimal translation-stability equilibrium” superior to Linear Design and BiLSTM-CRF.

Significance. If the computational proxies were shown to predict in-vivo performance, the framework could supply a practical tool for mRNA sequence optimization. At present the work demonstrates only that an optimizer improves the scalar it is explicitly told to maximize; this limits significance to a methodological illustration rather than a validated design advance.

major comments (3)

[Abstract] Abstract: the stated improvements (CAI 0.73→0.74, tAI 0.63→0.64, MFE −346 to −356 kcal/mol, immune penalty 27.3) are direct outputs of the fitness function used for GA selection. The central claim of an “optimal translation-stability equilibrium” therefore reduces to the statement that the optimizer improved the quantity it was instructed to improve, without an orthogonal test.
[Abstract] Abstract: no statistical tests, error bars, held-out validation set, or comparison against random search are supplied to establish that the reported generational gains exceed what would be obtained by chance or by alternative hyperparameter choices. GA hyperparameters (population size, mutation rate, crossover probability, number of generations) are also left unspecified.
[Abstract] Abstract: comparisons to Linear Design (MFE < −2000 kcal/mol) and BiLSTM-CRF (CAI 0.96–0.98) are performed entirely within the same metric suite used for fitness; no independent biological assay or external dataset is used to test whether the resulting sequences actually improve translation efficiency or reduce immunogenicity in cells.

minor comments (1)

[Abstract] Abstract contains awkward phrasing that obscures the reported numbers: “improved (achieved CAI values of 0.73 to 0.74 and tAI values of 0.63 to 0.64) CAI and tAI by over 6%”.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed comments. Our work presents a purely computational two-stage framework for mRNA sequence optimization and does not include experimental validation. We address each major comment below and indicate where revisions will be made to clarify scope and add missing details.

read point-by-point responses

Referee: [Abstract] Abstract: the stated improvements (CAI 0.73→0.74, tAI 0.63→0.64, MFE −346 to −356 kcal/mol, immune penalty 27.3) are direct outputs of the fitness function used for GA selection. The central claim of an “optimal translation-stability equilibrium” therefore reduces to the statement that the optimizer improved the quantity it was instructed to improve, without an orthogonal test.

Authors: We agree that the reported metric changes are produced by the composite fitness function. The manuscript's contribution is the multi-objective optimization that simultaneously balances translation efficiency, structural stability, and immunogenicity reduction, in contrast to Linear Design (which produces extreme MFE values) and BiLSTM-CRF (which targets only high CAI). We will revise the abstract and discussion to explicitly frame the result as achieving a balanced trade-off across objectives rather than claiming an independently validated biological optimum. revision: partial
Referee: [Abstract] Abstract: no statistical tests, error bars, held-out validation set, or comparison against random search are supplied to establish that the reported generational gains exceed what would be obtained by chance or by alternative hyperparameter choices. GA hyperparameters (population size, mutation rate, crossover probability, number of generations) are also left unspecified.

Authors: We acknowledge these omissions. The methods section will be expanded to report all GA hyperparameters. We will also add results from multiple independent runs with error bars and a comparison against random search to demonstrate that the observed improvements are not attributable to chance. revision: yes
Referee: [Abstract] Abstract: comparisons to Linear Design (MFE < −2000 kcal/mol) and BiLSTM-CRF (CAI 0.96–0.98) are performed entirely within the same metric suite used for fitness; no independent biological assay or external dataset is used to test whether the resulting sequences actually improve translation efficiency or reduce immunogenicity in cells.

Authors: The comparisons are performed using the same computational proxies because the study is strictly in-silico. We will revise the abstract and conclusions to remove any implication of biological superiority and to state clearly that the framework provides an in-silico design tool whose sequences would require subsequent experimental testing. revision: yes

standing simulated objections not resolved

Independent biological assays or in-vivo validation of the optimized sequences, which lies outside the computational scope of the current manuscript.

Circularity Check

2 steps flagged

Reported gains in CAI/tAI/MFE/immune penalty are direct outputs of the GA fitness function

specific steps

fitted input called prediction [Abstract]
"Over successive generations (38th, 40th, and 42nd), the GA improved (achieved CAI values of 0.73 to 0.74 and tAI values of 0.63 to 0.64) CAI and tAI by over 6% and codon-pair bias is high and consistent (0.97 ) and improved ribosomal accessibility at the 5' end, with an unpaired_30 fraction reaching 0.87; Global Minimum Free Energy (MFE) converged to a balanced range of -346 to -356 kcal/mol, achieving approximately 84% base-paired structural stability, and reduced immune-stimulatory motifs - lowering the average immune penalty to 27.3 in the final generation."

The GA's fitness function is defined as the composite of precisely these quantities (CAI, tAI, codon-pair bias, MFE via RNAfold, GC content, CpG/UpA motif counts). The reported post-optimization values are therefore the direct numerical consequence of the selection rule and do not constitute an independent prediction.
fitted input called prediction [Abstract]
"Linear Design produces hyper-stable transcripts (MFE < - 2000 kcal/mol) that risk translation inefficiency due to extreme rigidity, and BiLSTM-CRF focuses solely on high CAI (0.96 to 0.98) without structural constraints, our framework achieves an optimal translation-stability equilibrium"

The superiority claim is evaluated solely on the same fitness metrics that the GA was optimized against; the equilibrium is declared optimal because it scores well on the objective the algorithm was told to optimize.

full rationale

The paper generates candidate sequences via a pretrained BERT-like model then applies GA selection whose fitness is defined exactly as the linear combination of CAI, tAI, codon-pair bias, local/global MFE (RNAfold), GC%, and CpG/UpA counts. All reported numerical improvements (CAI 0.73→0.74, tAI 0.63→0.64, MFE −346 to −356 kcal/mol, immune penalty 27.3, unpaired_30=0.87) are therefore the quantities the optimizer was explicitly instructed to maximize; the claim of an “optimal translation-stability equilibrium” relative to Linear Design and BiLSTM-CRF is likewise internal to the same metric set. No independent biological readout or external validation is supplied, so the central result reduces by construction to the statement that the GA improved its own objective function.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The framework depends on standard domain assumptions about codon bias and RNA structure prediction tools; no new entities are postulated and the only free parameters are the unspecified GA hyperparameters.

free parameters (1)

GA hyperparameters (population size, mutation rate, crossover probability, number of generations)
Required to run the evolutionary stage but not reported in the abstract.

axioms (2)

domain assumption Human codon usage tables and tAI accurately predict translation efficiency for therapeutic mRNAs
Invoked when defining the fitness function that guides synonymous mutations.
domain assumption RNAfold minimum free energy calculations are reliable proxies for in vivo mRNA structural stability
Used to compute both local and global stability terms in the fitness function.

pith-pipeline@v0.9.1-grok · 5913 in / 1646 out tokens · 80324 ms · 2026-06-29T13:34:23.895758+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 29 canonical work pages

[1]

Current landscape of mrna technologies and delivery systems for new modality therapeutics,

R.-M. Lu, H.-E. Hsu, S. J. L. P. Perez, M. Kumari, G.-H. Chen, M.-H. Hong, Y .-S. Lin, C.-H. Liu, S.-H. Ko, C. A. P. Concio, Y .-J. Su, Y .-H. Chang, W.-S. Li, and H.-C. Wu, “Current landscape of mrna technologies and delivery systems for new modality therapeutics,”Journal of Biomedical Science, vol. 31, no. 1, 2024. [Online]. Available: http://dx.doi.org...

work page doi:10.1186/s12929-024-01080-z 2024
[2]

A comprehensive review of mrna vaccines,

V . Gote, P. K. Bolla, N. Kommineni, A. Butreddy, P. K. Nukala, S. S. Palakurthi, and W. Khan, “A comprehensive review of mrna vaccines,”International Journal of Molecular Sciences, vol. 24, no. 3, p. 2700, Jan. 2023. [Online]. Available: http://dx.doi.org/10.3390/ijms24032700

work page doi:10.3390/ijms24032700 2023
[3]

mrna-based vaccines and therapeutics: an in-depth survey of current and upcoming clinical applications,

Y .-S. Wang, M. Kumari, G.-H. Chen, M.-H. Hong, J. P.-Y . Yuan, J.-L. Tsai, and H.-C. Wu, “mrna-based vaccines and therapeutics: an in-depth survey of current and upcoming clinical applications,”Journal of Biomedical Science, vol. 30, no. 1, Oct. 2023. [Online]. Available: http://dx.doi.org/10.1186/s12929-023-00977-5 9 A PREPRINT

work page doi:10.1186/s12929-023-00977-5 2023
[4]

mrna therapeutic modalities design, formulation and manufacturing under pharma 4.0 principles,

A. Ouranidis, T. Vavilis, E. Mandala, C. Davidopoulou, E. Stamoula, C. K. Markopoulou, A. Karagianni, and K. Kachrimanis, “mrna therapeutic modalities design, formulation and manufacturing under pharma 4.0 principles,”Biomedicines, vol. 10, no. 1, p. 50, Dec. 2021. [Online]. Available: http://dx.doi.org/10.3390/biomedicines10010050

work page doi:10.3390/biomedicines10010050 2021
[5]

Current analytical strategies for mrna-based therapeutics,

J. Camperi, K. Chatla, E. Freund, C. Galan, S. Lippold, and A. Guilbaud, “Current analytical strategies for mrna-based therapeutics,”Molecules, vol. 30, no. 7, p. 1629, Apr. 2025. [Online]. Available: http://dx.doi.org/10.3390/molecules30071629

work page doi:10.3390/molecules30071629 2025
[6]

Recent advances and prospects in rna drug development,

H. Tani, “Recent advances and prospects in rna drug development,”International Journal of Molecular Sciences, vol. 25, no. 22, p. 12284, Nov. 2024. [Online]. Available: http://dx.doi.org/10.3390/ijms252212284

work page doi:10.3390/ijms252212284 2024
[7]

Large language models facilitating modern molecular biology and novel drug development,

X.-h. Liu, Z.-h. Lu, T. Wang, and F. Liu, “Large language models facilitating modern molecular biology and novel drug development,”Frontiers in Pharmacology, vol. 15, Dec. 2024. [Online]. Available: http://dx.doi.org/10.3389/fphar.2024.1458739

work page doi:10.3389/fphar.2024.1458739 2024
[8]

mrna vaccines — a new era in vaccinology,

N. Pardi, M. J. Hogan, F. W. Porter, and D. Weissman, “mrna vaccines — a new era in vaccinology,”Nature Reviews Drug Discovery, vol. 17, no. 4, p. 261–279, Jan. 2018. [Online]. Available: http://dx.doi.org/10.1038/nrd.2017.243

work page doi:10.1038/nrd.2017.243 2018
[9]

Recent advances in mrna vaccine technology,

N. Pardi, M. J. Hogan, and D. Weissman, “Recent advances in mrna vaccine technology,”Current Opinion in Immunology, vol. 65, p. 14–20, Aug. 2020. [Online]. Available: http://dx.doi.org/10.1016/j.coi.2020.01.008

work page doi:10.1016/j.coi.2020.01.008 2020
[10]

Generative ai and large language models: A new frontier in reverse vaccinology,

K. Hayawi, S. Shahriar, H. Alashwal, and M. A. Serhani, “Generative ai and large language models: A new frontier in reverse vaccinology,”Informatics in Medicine Unlocked, vol. 48, p. 101533, 2024. [Online]. Available: http://dx.doi.org/10.1016/j.imu.2024.101533

work page doi:10.1016/j.imu.2024.101533 2024
[11]

A new deep-learning-based approach for mrna optimization: High fidelity, computation efficiency, and multiple optimization factors,

Z. Gong, Z. Jiang, W. Gao, D. Zhuo, and L. Ma, “A new deep-learning-based approach for mrna optimization: High fidelity, computation efficiency, and multiple optimization factors,” 2025. [Online]. Available: https://arxiv.org/abs/2505.23862

work page arXiv 2025
[12]

mrna-lm: full-length integrated slm for mrna analysis,

S. Li, S. Noroozizadeh, S. Moayedpour, L. Kogler-Anele, Z. Xue, D. Zheng, F. U. Montoya, V . Agarwal, Z. Bar-Joseph, and S. Jager, “mrna-lm: full-length integrated slm for mrna analysis,”Nucleic Acids Research, vol. 53, no. 3, Jan. 2025. [Online]. Available: http://dx.doi.org/10.1093/nar/gkaf044

work page doi:10.1093/nar/gkaf044 2025
[13]

Emerging opportunities of using large language models for translation between drug molecules and indications,

D. Oniani, J. Hilsman, C. Zang, J. Wang, L. Cai, J. Zawala, and Y . Wang, “Emerging opportunities of using large language models for translation between drug molecules and indications,”Scientific Reports, vol. 14, no. 1, May
[14]

Available: http://dx.doi.org/10.1038/s41598-024-61124-0

[Online]. Available: http://dx.doi.org/10.1038/s41598-024-61124-0

work page doi:10.1038/s41598-024-61124-0
[15]

mrna-based therapeutics: powerful and versatile tools to combat diseases,

S. Qin, X. Tang, Y . Chen, K. Chen, N. Fan, W. Xiao, Q. Zheng, G. Li, Y . Teng, M. Wu, and X. Song, “mrna-based therapeutics: powerful and versatile tools to combat diseases,”Signal Transduction and Targeted Therapy, vol. 7, no. 1, May 2022. [Online]. Available: http://dx.doi.org/10.1038/s41392-022-01007-w

work page doi:10.1038/s41392-022-01007-w 2022
[16]

A survey of large language model for drug research and development,

H. Guo, X. Xing, Y . Zhou, W. Jiang, X. Chen, T. Wang, Z. Jiang, Y . Wang, J. Hou, Y . Jiang, and J. Xu, “A survey of large language model for drug research and development,”IEEE Access, vol. 13, p. 51110–51129, 2025. [Online]. Available: http://dx.doi.org/10.1109/ACCESS.2025.3552256

work page doi:10.1109/access.2025.3552256 2025
[17]

Comprehensive benchmarking of large language models for rna secondary structure prediction,

L. I. Zablocki, L. A. Bugnon, M. Gerard, L. Di Persia, G. Stegmayer, and D. H. Milone, “Comprehensive benchmarking of large language models for rna secondary structure prediction,”Briefings in Bioinformatics, vol. 26, no. 2, Mar. 2025. [Online]. Available: http://dx.doi.org/10.1093/bib/bbaf137

work page doi:10.1093/bib/bbaf137 2025
[18]

Rna secondary structure prediction using deep learning with thermodynamic integration,

K. Sato, M. Akiyama, and Y . Sakakibara, “Rna secondary structure prediction using deep learning with thermodynamic integration,”Nature Communications, vol. 12, no. 1, Feb. 2021. [Online]. Available: http://dx.doi.org/10.1038/s41467-021-21194-4

work page doi:10.1038/s41467-021-21194-4 2021
[19]

Algorithm for optimized mrna design improves stability and immunogenicity,

H. Zhang, L. Zhang, A. Lin, C. Xu, Z. Li, K. Liu, B. Liu, X. Ma, F. Zhao, H. Jiang, C. Chen, H. Shen, H. Li, D. H. Mathews, Y . Zhang, and L. Huang, “Algorithm for optimized mrna design improves stability and immunogenicity,”Nature, vol. 621, no. 7978, p. 396–403, May 2023. [Online]. Available: http://dx.doi.org/10.1038/s41586-023-06127-z

work page doi:10.1038/s41586-023-06127-z 2023
[20]

Codon optimization with deep learning to enhance protein expression,

H. Fu, Y . Liang, X. Zhong, Z. Pan, L. Huang, H. Zhang, Y . Xu, W. Zhou, and Z. Liu, “Codon optimization with deep learning to enhance protein expression,”Scientific Reports, vol. 10, no. 1, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1038/s41598-020-74091-z

work page doi:10.1038/s41598-020-74091-z 2020
[21]

mrnaid, an open-source platform for therapeutic mrna design and optimization strategies,

N. V ostrosablin, S. Lim, P. Gopal, K. Brazdilova, S. Parajuli, X. Wei, A. Gromek, D. Prihoda, M. Spale, A. Muzdalo, J. Greig, C. Yeo, J. Wardyn, P. Mejzlik, B. Henry, A. W. Partridge, and D. A. Bitton, “mrnaid, an open-source platform for therapeutic mrna design and optimization strategies,”NAR Genomics and Bioinformatics, vol. 6, no. 1, Jan. 2024. [Onli...

work page doi:10.1093/nargab/lqae028 2024
[22]

Large language models for drug discovery and development,

Y . Zheng, H. Y . Koh, J. Ju, M. Yang, L. T. May, G. I. Webb, L. Li, S. Pan, and G. Church, “Large language models for drug discovery and development,”Patterns, vol. 6, no. 10, p. 101346, Oct. 2025. [Online]. Available: http://dx.doi.org/10.1016/j.patter.2025.101346 10 A PREPRINT

work page doi:10.1016/j.patter.2025.101346 2025
[23]

Codonbert: Large language models for mrna design and optimization,

S. Li, S. Moayedpour, R. Li, M. Bailey, S. Riahi, L. Kogler-Anele, M. Miladi, J. Miner, D. Zheng, J. Wang, A. Balsubramani, K. Tran, M. Zacharia, M. Wu, X. Gu, R. Clinton, C. Asquith, J. Skaleski, L. Boeglin, S. Chivukula, A. Dias, F. U. Montoya, V . Agarwal, Z. Bar-Joseph, and S. Jager, “Codonbert: Large language models for mrna design and optimization,”...

work page doi:10.1101/2023.09.09.556981 2023
[24]

Codon and amino acid content are associated with mrna stability in mammalian cells,

M. E. Forrest, O. Pinkard, S. Martin, T. J. Sweet, G. Hanson, and J. Coller, “Codon and amino acid content are associated with mrna stability in mammalian cells,”PLOS ONE, vol. 15, no. 2, p. e0228730, Feb. 2020. [Online]. Available: http://dx.doi.org/10.1371/journal.pone.0228730

work page doi:10.1371/journal.pone.0228730 2020
[25]

Codonbert: a bert-based architecture tailored for codon optimization using the cross-attention mechanism,

Z. Ren, L. Jiang, Y . Di, D. Zhang, J. Gong, J. Gong, Q. Jiang, Z. Fu, P. Sun, B. Zhou, and M. Ni, “Codonbert: a bert-based architecture tailored for codon optimization using the cross-attention mechanism,”Bioinformatics, vol. 40, no. 7, May 2024. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/btae330

work page doi:10.1093/bioinformatics/btae330 2024
[26]

Helix-mrna: A hybrid foundation model for full sequence mrna therapeutics,

M. Wood, M. Klop, and M. Allard, “Helix-mrna: A hybrid foundation model for full sequence mrna therapeutics,”
[27]

Available: https://arxiv.org/abs/2502.13785

[Online]. Available: https://arxiv.org/abs/2502.13785

work page arXiv
[28]

Deep generative optimization of mrna codon sequences for enhanced mrna translation and therapeutic efficacy,

Y . Li, F. Wang, J. Yang, Z. Han, L. Chen, W. Jiang, H. Zhou, T. Li, Z. Tang, J. Deng, X. He, G. Zha, Z. Hu, Y . Hu, L. Wu, C. Zhan, C. Sun, Y . He, and Z. Xie, “Deep generative optimization of mrna codon sequences for enhanced mrna translation and therapeutic efficacy,”Nature Communications, vol. 16, no. 1, Nov. 2025. [Online]. Available: http://dx.doi.o...

work page doi:10.1038/s41467-025-64894-x 2025
[29]

Integrated mrna sequence optimization using deep learning,

H. Gong, J. Wen, R. Luo, Y . Feng, J. Guo, H. Fu, and X. Zhou, “Integrated mrna sequence optimization using deep learning,”Briefings in Bioinformatics, vol. 24, no. 1, Jan. 2023. [Online]. Available: http://dx.doi.org/10.1093/bib/bbad001

work page doi:10.1093/bib/bbad001 2023
[30]

mrnabert: advancing mrna sequence design with a universal language model and comprehensive dataset,

Y . Xiong, A. Wang, Y . Kang, C. Shen, C.-Y . Hsieh, and T. Hou, “mrnabert: advancing mrna sequence design with a universal language model and comprehensive dataset,”Nature Communications, vol. 16, no. 1, Nov. 2025. [Online]. Available: http://dx.doi.org/10.1038/s41467-025-65340-8

work page doi:10.1038/s41467-025-65340-8 2025
[31]

mrna2vec: mrna embedding with language model in the 5’utr-cds for mrna design,

H. Zhang, X. Gao, J. Zhang, and L. Lai, “mrna2vec: mrna embedding with language model in the 5’utr-cds for mrna design,” 2024. [Online]. Available: https://arxiv.org/abs/2408.09048 11

work page arXiv 2024

[1] [1]

Current landscape of mrna technologies and delivery systems for new modality therapeutics,

R.-M. Lu, H.-E. Hsu, S. J. L. P. Perez, M. Kumari, G.-H. Chen, M.-H. Hong, Y .-S. Lin, C.-H. Liu, S.-H. Ko, C. A. P. Concio, Y .-J. Su, Y .-H. Chang, W.-S. Li, and H.-C. Wu, “Current landscape of mrna technologies and delivery systems for new modality therapeutics,”Journal of Biomedical Science, vol. 31, no. 1, 2024. [Online]. Available: http://dx.doi.org...

work page doi:10.1186/s12929-024-01080-z 2024

[2] [2]

A comprehensive review of mrna vaccines,

V . Gote, P. K. Bolla, N. Kommineni, A. Butreddy, P. K. Nukala, S. S. Palakurthi, and W. Khan, “A comprehensive review of mrna vaccines,”International Journal of Molecular Sciences, vol. 24, no. 3, p. 2700, Jan. 2023. [Online]. Available: http://dx.doi.org/10.3390/ijms24032700

work page doi:10.3390/ijms24032700 2023

[3] [3]

mrna-based vaccines and therapeutics: an in-depth survey of current and upcoming clinical applications,

Y .-S. Wang, M. Kumari, G.-H. Chen, M.-H. Hong, J. P.-Y . Yuan, J.-L. Tsai, and H.-C. Wu, “mrna-based vaccines and therapeutics: an in-depth survey of current and upcoming clinical applications,”Journal of Biomedical Science, vol. 30, no. 1, Oct. 2023. [Online]. Available: http://dx.doi.org/10.1186/s12929-023-00977-5 9 A PREPRINT

work page doi:10.1186/s12929-023-00977-5 2023

[4] [4]

mrna therapeutic modalities design, formulation and manufacturing under pharma 4.0 principles,

A. Ouranidis, T. Vavilis, E. Mandala, C. Davidopoulou, E. Stamoula, C. K. Markopoulou, A. Karagianni, and K. Kachrimanis, “mrna therapeutic modalities design, formulation and manufacturing under pharma 4.0 principles,”Biomedicines, vol. 10, no. 1, p. 50, Dec. 2021. [Online]. Available: http://dx.doi.org/10.3390/biomedicines10010050

work page doi:10.3390/biomedicines10010050 2021

[5] [5]

Current analytical strategies for mrna-based therapeutics,

J. Camperi, K. Chatla, E. Freund, C. Galan, S. Lippold, and A. Guilbaud, “Current analytical strategies for mrna-based therapeutics,”Molecules, vol. 30, no. 7, p. 1629, Apr. 2025. [Online]. Available: http://dx.doi.org/10.3390/molecules30071629

work page doi:10.3390/molecules30071629 2025

[6] [6]

Recent advances and prospects in rna drug development,

H. Tani, “Recent advances and prospects in rna drug development,”International Journal of Molecular Sciences, vol. 25, no. 22, p. 12284, Nov. 2024. [Online]. Available: http://dx.doi.org/10.3390/ijms252212284

work page doi:10.3390/ijms252212284 2024

[7] [7]

Large language models facilitating modern molecular biology and novel drug development,

X.-h. Liu, Z.-h. Lu, T. Wang, and F. Liu, “Large language models facilitating modern molecular biology and novel drug development,”Frontiers in Pharmacology, vol. 15, Dec. 2024. [Online]. Available: http://dx.doi.org/10.3389/fphar.2024.1458739

work page doi:10.3389/fphar.2024.1458739 2024

[8] [8]

mrna vaccines — a new era in vaccinology,

N. Pardi, M. J. Hogan, F. W. Porter, and D. Weissman, “mrna vaccines — a new era in vaccinology,”Nature Reviews Drug Discovery, vol. 17, no. 4, p. 261–279, Jan. 2018. [Online]. Available: http://dx.doi.org/10.1038/nrd.2017.243

work page doi:10.1038/nrd.2017.243 2018

[9] [9]

Recent advances in mrna vaccine technology,

N. Pardi, M. J. Hogan, and D. Weissman, “Recent advances in mrna vaccine technology,”Current Opinion in Immunology, vol. 65, p. 14–20, Aug. 2020. [Online]. Available: http://dx.doi.org/10.1016/j.coi.2020.01.008

work page doi:10.1016/j.coi.2020.01.008 2020

[10] [10]

Generative ai and large language models: A new frontier in reverse vaccinology,

K. Hayawi, S. Shahriar, H. Alashwal, and M. A. Serhani, “Generative ai and large language models: A new frontier in reverse vaccinology,”Informatics in Medicine Unlocked, vol. 48, p. 101533, 2024. [Online]. Available: http://dx.doi.org/10.1016/j.imu.2024.101533

work page doi:10.1016/j.imu.2024.101533 2024

[11] [11]

A new deep-learning-based approach for mrna optimization: High fidelity, computation efficiency, and multiple optimization factors,

Z. Gong, Z. Jiang, W. Gao, D. Zhuo, and L. Ma, “A new deep-learning-based approach for mrna optimization: High fidelity, computation efficiency, and multiple optimization factors,” 2025. [Online]. Available: https://arxiv.org/abs/2505.23862

work page arXiv 2025

[12] [12]

mrna-lm: full-length integrated slm for mrna analysis,

S. Li, S. Noroozizadeh, S. Moayedpour, L. Kogler-Anele, Z. Xue, D. Zheng, F. U. Montoya, V . Agarwal, Z. Bar-Joseph, and S. Jager, “mrna-lm: full-length integrated slm for mrna analysis,”Nucleic Acids Research, vol. 53, no. 3, Jan. 2025. [Online]. Available: http://dx.doi.org/10.1093/nar/gkaf044

work page doi:10.1093/nar/gkaf044 2025

[13] [13]

Emerging opportunities of using large language models for translation between drug molecules and indications,

D. Oniani, J. Hilsman, C. Zang, J. Wang, L. Cai, J. Zawala, and Y . Wang, “Emerging opportunities of using large language models for translation between drug molecules and indications,”Scientific Reports, vol. 14, no. 1, May

[14] [14]

Available: http://dx.doi.org/10.1038/s41598-024-61124-0

[Online]. Available: http://dx.doi.org/10.1038/s41598-024-61124-0

work page doi:10.1038/s41598-024-61124-0

[15] [15]

mrna-based therapeutics: powerful and versatile tools to combat diseases,

S. Qin, X. Tang, Y . Chen, K. Chen, N. Fan, W. Xiao, Q. Zheng, G. Li, Y . Teng, M. Wu, and X. Song, “mrna-based therapeutics: powerful and versatile tools to combat diseases,”Signal Transduction and Targeted Therapy, vol. 7, no. 1, May 2022. [Online]. Available: http://dx.doi.org/10.1038/s41392-022-01007-w

work page doi:10.1038/s41392-022-01007-w 2022

[16] [16]

A survey of large language model for drug research and development,

H. Guo, X. Xing, Y . Zhou, W. Jiang, X. Chen, T. Wang, Z. Jiang, Y . Wang, J. Hou, Y . Jiang, and J. Xu, “A survey of large language model for drug research and development,”IEEE Access, vol. 13, p. 51110–51129, 2025. [Online]. Available: http://dx.doi.org/10.1109/ACCESS.2025.3552256

work page doi:10.1109/access.2025.3552256 2025

[17] [17]

Comprehensive benchmarking of large language models for rna secondary structure prediction,

L. I. Zablocki, L. A. Bugnon, M. Gerard, L. Di Persia, G. Stegmayer, and D. H. Milone, “Comprehensive benchmarking of large language models for rna secondary structure prediction,”Briefings in Bioinformatics, vol. 26, no. 2, Mar. 2025. [Online]. Available: http://dx.doi.org/10.1093/bib/bbaf137

work page doi:10.1093/bib/bbaf137 2025

[18] [18]

Rna secondary structure prediction using deep learning with thermodynamic integration,

K. Sato, M. Akiyama, and Y . Sakakibara, “Rna secondary structure prediction using deep learning with thermodynamic integration,”Nature Communications, vol. 12, no. 1, Feb. 2021. [Online]. Available: http://dx.doi.org/10.1038/s41467-021-21194-4

work page doi:10.1038/s41467-021-21194-4 2021

[19] [19]

Algorithm for optimized mrna design improves stability and immunogenicity,

H. Zhang, L. Zhang, A. Lin, C. Xu, Z. Li, K. Liu, B. Liu, X. Ma, F. Zhao, H. Jiang, C. Chen, H. Shen, H. Li, D. H. Mathews, Y . Zhang, and L. Huang, “Algorithm for optimized mrna design improves stability and immunogenicity,”Nature, vol. 621, no. 7978, p. 396–403, May 2023. [Online]. Available: http://dx.doi.org/10.1038/s41586-023-06127-z

work page doi:10.1038/s41586-023-06127-z 2023

[20] [20]

Codon optimization with deep learning to enhance protein expression,

H. Fu, Y . Liang, X. Zhong, Z. Pan, L. Huang, H. Zhang, Y . Xu, W. Zhou, and Z. Liu, “Codon optimization with deep learning to enhance protein expression,”Scientific Reports, vol. 10, no. 1, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1038/s41598-020-74091-z

work page doi:10.1038/s41598-020-74091-z 2020

[21] [21]

mrnaid, an open-source platform for therapeutic mrna design and optimization strategies,

N. V ostrosablin, S. Lim, P. Gopal, K. Brazdilova, S. Parajuli, X. Wei, A. Gromek, D. Prihoda, M. Spale, A. Muzdalo, J. Greig, C. Yeo, J. Wardyn, P. Mejzlik, B. Henry, A. W. Partridge, and D. A. Bitton, “mrnaid, an open-source platform for therapeutic mrna design and optimization strategies,”NAR Genomics and Bioinformatics, vol. 6, no. 1, Jan. 2024. [Onli...

work page doi:10.1093/nargab/lqae028 2024

[22] [22]

Large language models for drug discovery and development,

Y . Zheng, H. Y . Koh, J. Ju, M. Yang, L. T. May, G. I. Webb, L. Li, S. Pan, and G. Church, “Large language models for drug discovery and development,”Patterns, vol. 6, no. 10, p. 101346, Oct. 2025. [Online]. Available: http://dx.doi.org/10.1016/j.patter.2025.101346 10 A PREPRINT

work page doi:10.1016/j.patter.2025.101346 2025

[23] [23]

Codonbert: Large language models for mrna design and optimization,

S. Li, S. Moayedpour, R. Li, M. Bailey, S. Riahi, L. Kogler-Anele, M. Miladi, J. Miner, D. Zheng, J. Wang, A. Balsubramani, K. Tran, M. Zacharia, M. Wu, X. Gu, R. Clinton, C. Asquith, J. Skaleski, L. Boeglin, S. Chivukula, A. Dias, F. U. Montoya, V . Agarwal, Z. Bar-Joseph, and S. Jager, “Codonbert: Large language models for mrna design and optimization,”...

work page doi:10.1101/2023.09.09.556981 2023

[24] [24]

Codon and amino acid content are associated with mrna stability in mammalian cells,

M. E. Forrest, O. Pinkard, S. Martin, T. J. Sweet, G. Hanson, and J. Coller, “Codon and amino acid content are associated with mrna stability in mammalian cells,”PLOS ONE, vol. 15, no. 2, p. e0228730, Feb. 2020. [Online]. Available: http://dx.doi.org/10.1371/journal.pone.0228730

work page doi:10.1371/journal.pone.0228730 2020

[25] [25]

Codonbert: a bert-based architecture tailored for codon optimization using the cross-attention mechanism,

Z. Ren, L. Jiang, Y . Di, D. Zhang, J. Gong, J. Gong, Q. Jiang, Z. Fu, P. Sun, B. Zhou, and M. Ni, “Codonbert: a bert-based architecture tailored for codon optimization using the cross-attention mechanism,”Bioinformatics, vol. 40, no. 7, May 2024. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/btae330

work page doi:10.1093/bioinformatics/btae330 2024

[26] [26]

Helix-mrna: A hybrid foundation model for full sequence mrna therapeutics,

M. Wood, M. Klop, and M. Allard, “Helix-mrna: A hybrid foundation model for full sequence mrna therapeutics,”

[27] [27]

Available: https://arxiv.org/abs/2502.13785

[Online]. Available: https://arxiv.org/abs/2502.13785

work page arXiv

[28] [28]

Deep generative optimization of mrna codon sequences for enhanced mrna translation and therapeutic efficacy,

Y . Li, F. Wang, J. Yang, Z. Han, L. Chen, W. Jiang, H. Zhou, T. Li, Z. Tang, J. Deng, X. He, G. Zha, Z. Hu, Y . Hu, L. Wu, C. Zhan, C. Sun, Y . He, and Z. Xie, “Deep generative optimization of mrna codon sequences for enhanced mrna translation and therapeutic efficacy,”Nature Communications, vol. 16, no. 1, Nov. 2025. [Online]. Available: http://dx.doi.o...

work page doi:10.1038/s41467-025-64894-x 2025

[29] [29]

Integrated mrna sequence optimization using deep learning,

H. Gong, J. Wen, R. Luo, Y . Feng, J. Guo, H. Fu, and X. Zhou, “Integrated mrna sequence optimization using deep learning,”Briefings in Bioinformatics, vol. 24, no. 1, Jan. 2023. [Online]. Available: http://dx.doi.org/10.1093/bib/bbad001

work page doi:10.1093/bib/bbad001 2023

[30] [30]

mrnabert: advancing mrna sequence design with a universal language model and comprehensive dataset,

Y . Xiong, A. Wang, Y . Kang, C. Shen, C.-Y . Hsieh, and T. Hou, “mrnabert: advancing mrna sequence design with a universal language model and comprehensive dataset,”Nature Communications, vol. 16, no. 1, Nov. 2025. [Online]. Available: http://dx.doi.org/10.1038/s41467-025-65340-8

work page doi:10.1038/s41467-025-65340-8 2025

[31] [31]

mrna2vec: mrna embedding with language model in the 5’utr-cds for mrna design,

H. Zhang, X. Gao, J. Zhang, and L. Lai, “mrna2vec: mrna embedding with language model in the 5’utr-cds for mrna design,” 2024. [Online]. Available: https://arxiv.org/abs/2408.09048 11

work page arXiv 2024