Protein-Conditioned Multi-Objective Reinforcement Learning for Full-Length mRNA Design
Pith reviewed 2026-05-09 15:01 UTC · model grok-4.3
The pith
ProMORNA generates full-length mRNAs from protein sequences using multi-objective RL, improving predicted half-life and translation efficiency on unseen targets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ProMORNA produces complete mRNA transcripts de novo directly from a target protein sequence. It trains a BART-style encoder-decoder model on over 6 million natural protein-mRNA pairs and then applies Multi-Objective Group Relative Policy Optimization (MO-GRPO) to optimize various biological objectives simultaneously. As a case study on the firefly luciferase target excluded from training data, ProMORNA improves the in silico Pareto frontier for predicted half-life and translation efficiency relative to standard supervised baselines and achieves higher predicted functional scores than a state-of-the-art baseline under the same evaluation pipeline.
What carries the argument
Multi-Objective Group Relative Policy Optimization (MO-GRPO), which unifies optimization across multiple biological objectives within a single reinforcement learning stage after initial supervised pretraining on protein-mRNA pairs.
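The group-relative part of this mechanism can be sketched in a few lines. The sketch below follows the generic GRPO recipe (each sampled sequence's reward is centered and scaled by the statistics of its own sampling group); the paper's exact MO-GRPO formulation is not given in this summary, so the weighted-sum scalarization, the use of the population standard deviation, the function names, and the example scores are all assumptions for illustration.

```python
from statistics import mean, pstdev

def scalarize(objectives, weights):
    """Weighted sum of per-objective scores (assumed scalarization, not confirmed by the paper)."""
    return sum(weights[name] * value for name, value in objectives.items())

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of four mRNA candidates sampled for one protein prompt,
# each scored by two predictors already scaled to [0, 1].
group = [
    {"half_life": 0.2, "translation": 0.8},
    {"half_life": 0.6, "translation": 0.6},
    {"half_life": 0.9, "translation": 0.3},
    {"half_life": 0.7, "translation": 0.9},
]
weights = {"half_life": 0.5, "translation": 0.5}  # illustrative values only

rewards = [scalarize(candidate, weights) for candidate in group]
advantages = group_relative_advantages(rewards)
```

Because the advantages are centered within the group, candidates are only rewarded for beating their siblings on the same prompt, which is why no separate value network is needed in this family of methods.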
If this is right
- The method produces full-length mRNA sequences conditioned only on the amino-acid sequence of the target protein.
- Generated sequences improve the in-silico trade-off curve between predicted half-life and translation efficiency.
- Predicted functional scores exceed those of a state-of-the-art baseline when evaluated with the same pipeline.
- The approach demonstrates feasibility for de-novo mRNA design on protein targets withheld from both training and prompt data.
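For readers unfamiliar with the term, a Pareto frontier over two maximized objectives is the set of candidates that no other candidate beats on both axes at once. The sketch below is a generic illustration, not the paper's evaluation code; the score pairs are invented, and the "improvement" check at the end is one possible reading of an improved frontier.

```python
def dominates(q, p):
    """True if q is at least as good as p on both objectives and strictly better on one."""
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])

def pareto_front(points):
    """Return the points not dominated by any other point (both objectives maximized)."""
    front = [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]
    return sorted(front)

# Hypothetical (predicted half-life, predicted translation efficiency) pairs
baseline = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.5, 1.5)]
candidate = [(1.5, 3.5), (2.5, 2.5), (3.5, 1.5)]

baseline_front = pareto_front(baseline)
candidate_front = pareto_front(candidate)

# One possible criterion: every baseline frontier point is dominated
# by at least one point on the candidate frontier.
frontier_improved = all(
    any(dominates(c, b) for c in candidate_front) for b in baseline_front
)
```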
Where Pith is reading between the lines
- If the predictors prove accurate in wet-lab tests, the method could shorten the initial candidate screening phase for mRNA therapeutics.
- Additional objectives such as reduced innate immune activation could be folded into the same MO-GRPO objective without changing the overall architecture.
- The same protein-to-mRNA conditioning plus multi-objective RL pattern might transfer to other nucleic-acid design tasks such as guide RNAs or antisense oligos.
Load-bearing premise
The in-silico predictors for half-life, translation efficiency, and functional scores accurately reflect real biological performance for sequences generated on unseen targets.
What would settle it
Cell-based experiments that directly measure the half-life and protein production rate of ProMORNA-designed mRNAs versus those from supervised baselines, using the firefly luciferase system or an equivalent reporter.
Original abstract
Designing therapeutic messenger RNA (mRNA) requires creating full-length transcripts that carefully balance stability, translation efficiency, and immune safety. To address this challenge, we propose ProMORNA, a multi-objective generation framework that produces complete mRNA transcripts de novo directly from a target protein sequence. Our approach begins by training a BART-style encoder-decoder model on over 6 million natural protein-mRNA pairs. We then introduce Multi-Objective Group Relative Policy Optimization (MO-GRPO) to simultaneously optimize for various biological objectives in a unified way. As a case study, we evaluated ProMORNA on the widely used firefly luciferase target, excluding it from both our supervised training data and the prompt pool. The results indicate that ProMORNA improves the in silico Pareto frontier for predicted half-life and translation efficiency relative to standard supervised baselines. Additionally, it achieves higher predicted functional scores than a state-of-the-art baseline under the same evaluation pipeline. These computational findings demonstrate the feasibility of using multi-objective reinforcement learning for full-length mRNA design on unseen targets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ProMORNA, a protein-conditioned framework for de novo full-length mRNA design. It pretrains a BART-style encoder-decoder on over 6 million natural protein-mRNA pairs, then applies Multi-Objective Group Relative Policy Optimization (MO-GRPO) to jointly optimize for predicted half-life, translation efficiency, and functional scores. On the held-out firefly luciferase target, the method reports improvements to the in-silico Pareto frontier relative to supervised baselines and higher predicted functional scores than a state-of-the-art baseline under the same evaluation pipeline.
Significance. If the in-silico predictors remain calibrated on RL-generated sequences, the work would demonstrate a practical route to multi-objective optimization of complete mRNA transcripts directly from protein sequence, with potential utility for therapeutic design. The pretraining scale and unified MO-GRPO formulation are strengths, but the current evidence is limited to in-silico metrics without wet-lab confirmation or explicit validation of predictor generalization.
major comments (3)
- [Abstract / luciferase case study] Abstract and luciferase case study: the reported gains in predicted half-life and translation efficiency are obtained by directly optimizing the policy against the same predictors used for evaluation. No ablation, calibration check, or out-of-distribution test is provided to show that these predictors remain reliable on de novo sequences produced by MO-GRPO rather than natural mRNAs.
- [Methods] Methods (reward formulation): the multi-objective reward weights are listed as free parameters, yet the manuscript supplies neither the specific values used, the procedure for balancing them, nor sensitivity analysis. This leaves the Pareto-frontier improvements difficult to reproduce or interpret.
- [Results] Results: no error bars, confidence intervals, or statistical tests accompany the Pareto-frontier or functional-score comparisons, and no ablation of the RL stage versus the supervised pretraining baseline is reported. These omissions make it impossible to assess whether the observed differences are robust.
minor comments (2)
- [Methods] Notation for the MO-GRPO objective and the precise definition of the group-relative advantage should be stated explicitly in a single equation block for clarity.
- [Experimental setup] The manuscript should clarify whether the luciferase target was excluded only from the prompt pool or also from the entire pretraining corpus, and report any leakage checks performed.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for recognizing the scale of pretraining and the unified MO-GRPO formulation as strengths of the work. We address each major comment below, indicating the revisions we will make to improve clarity, reproducibility, and statistical rigor while maintaining the computational focus of the study.
- Referee: [Abstract / luciferase case study] Abstract and luciferase case study: the reported gains in predicted half-life and translation efficiency are obtained by directly optimizing the policy against the same predictors used for evaluation. No ablation, calibration check, or out-of-distribution test is provided to show that these predictors remain reliable on de novo sequences produced by MO-GRPO rather than natural mRNAs.
Authors: We agree that optimizing directly against the evaluation predictors raises a valid concern about potential over-optimism. The half-life and translation-efficiency predictors are fixed, literature-derived models that were never retrained on our generated sequences, and the luciferase target was excluded from all training data. Nevertheless, we acknowledge the absence of explicit calibration or distribution-shift analysis. In the revised manuscript we will add a new subsection under Results that compares predictor score distributions on held-out natural mRNAs versus a sample of MO-GRPO-generated sequences and will include a brief limitations paragraph discussing the in-silico nature of the evaluation. A full wet-lab validation of predictor generalization lies outside the scope of this computational paper.
Revision: partial
- Referee: [Methods] Methods (reward formulation): the multi-objective reward weights are listed as free parameters, yet the manuscript supplies neither the specific values used, the procedure for balancing them, nor sensitivity analysis. This leaves the Pareto-frontier improvements difficult to reproduce or interpret.
Authors: The referee is correct; the specific weight values and balancing procedure were omitted. In the revised Methods section we will explicitly report the weights used (λ_half-life = 0.4, λ_translation = 0.4, λ_functional = 0.2), describe the per-objective normalization to [0,1] followed by the weighted-sum formulation, and add a sensitivity analysis in which each weight is varied by ±20% while keeping the others fixed. The resulting Pareto frontiers and functional scores will be shown to confirm robustness.
Revision: yes
- Referee: [Results] Results: no error bars, confidence intervals, or statistical tests accompany the Pareto-frontier or functional-score comparisons, and no ablation of the RL stage versus the supervised pretraining baseline is reported. These omissions make it impossible to assess whether the observed differences are robust.
Authors: We accept this criticism. The revised Results section will report all Pareto-frontier and functional-score metrics as means ± standard deviation across five independent random seeds, include Wilcoxon rank-sum p-values for the key comparisons, and add an explicit ablation study that isolates the contribution of the MO-GRPO stage relative to the supervised BART baseline alone. These additions will allow readers to evaluate the statistical reliability of the reported improvements.
Revision: yes
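The statistical reporting promised in the last response (mean ± standard deviation over five seeds, plus a Wilcoxon rank-sum comparison) could look roughly like the sketch below. It uses only the Python standard library and computes an exact two-sided rank-sum p-value by enumerating every way of splitting the pooled scores into two groups, which is practical for five seeds per arm. The per-seed scores and helper names are invented for illustration and are not results from the paper.

```python
from itertools import combinations
from statistics import mean, stdev

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_exact_p(x, y):
    """Two-sided exact Wilcoxon rank-sum p-value via full enumeration."""
    ranks = midranks(list(x) + list(y))
    n, total = len(x), len(x) + len(y)
    expected = n * (total + 1) / 2            # rank sum of x under the null
    observed = abs(sum(ranks[:n]) - expected)
    hits = count = 0
    for idx in combinations(range(total), n):
        count += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-12:
            hits += 1
    return hits / count

# Hypothetical functional scores, one per random seed
supervised = [0.61, 0.63, 0.60, 0.62, 0.64]
mo_grpo = [0.70, 0.72, 0.69, 0.71, 0.73]

summary = f"{mean(mo_grpo):.3f} ± {stdev(mo_grpo):.3f}"
p_value = rank_sum_exact_p(supervised, mo_grpo)
```

One consequence worth keeping in mind: with five seeds per arm there are only 252 possible splits, so the smallest attainable two-sided p-value is 2/252 (about 0.008), even when the two arms are completely separated.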
Circularity Check
RL stage optimizes directly against the same in-silico predictors used for final evaluation
specific steps
- fitted input called prediction
[Abstract and Section 3 (MO-GRPO description)]
"We then introduce Multi-Objective Group Relative Policy Optimization (MO-GRPO) to simultaneously optimize for various biological objectives in a unified way. ... ProMORNA improves the in silico Pareto frontier for predicted half-life and translation efficiency relative to standard supervised baselines."
The policy parameters are updated to maximize the identical predicted scores (half-life, translation efficiency) that are later used to declare an improved Pareto frontier. The 'improvement' is therefore the expected outcome of the optimization objective rather than an out-of-sample verification independent of the fitted rewards.
full rationale
The paper pretrains a BART model on natural protein-mRNA pairs (external data), then applies MO-GRPO, whose reward is explicitly the predicted half-life, translation efficiency, and functional scores. Reported gains are measured on exactly those same predictors for sequences generated by the optimized policy. This creates moderate circularity: improvements on the Pareto frontier are the direct consequence of maximizing the evaluation objectives rather than the result of an independent test, and excluding luciferase from the training data does not address predictor calibration on the RL-generated distribution. No self-citation chain or definitional loop is present, so the score remains at 4 rather than 6+.
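A first-pass diagnostic for this concern, in the spirit of the distribution comparison the authors propose in their rebuttal, is to measure how far the predictor scores of generated sequences sit from those of held-out natural mRNAs. The sketch below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) using only the standard library. The score lists are invented, and a large gap would only indicate that the predictor is being queried outside its familiar range; it would not by itself establish or refute calibration.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical predicted half-life scores (arbitrary units)
natural_scores = [0.30, 0.35, 0.40, 0.45, 0.50]
generated_scores = [0.48, 0.55, 0.60, 0.65, 0.70]

shift = ks_statistic(natural_scores, generated_scores)
```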
Axiom & Free-Parameter Ledger
free parameters (1)
- multi-objective reward weights
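To show what this free parameter controls, the sketch below applies the weights quoted in the simulated rebuttal (0.4, 0.4, 0.2) to per-objective scores normalized to [0, 1], then repeats the ranking with each weight scaled by ±20% while the others stay fixed. Min-max scaling is an assumption (the rebuttal only says "normalization to [0,1]"), the raw scores are invented, and checking whether the top-ranked candidate changes is a simplified stand-in for the full Pareto-frontier sensitivity analysis described there.

```python
WEIGHTS = {"half_life": 0.4, "translation": 0.4, "functional": 0.2}  # values quoted in the rebuttal

def min_max(values):
    """Scale values to [0, 1]; assumed normalization, constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def weighted_rewards(raw_scores, weights):
    """raw_scores maps objective name -> list of raw predictor outputs, one per candidate."""
    normalized = {name: min_max(vals) for name, vals in raw_scores.items()}
    n = len(next(iter(raw_scores.values())))
    return [sum(weights[name] * normalized[name][i] for name in weights) for i in range(n)]

def perturbed_weights(weights, rel=0.2):
    """Yield weight sets with exactly one weight scaled by (1 - rel) or (1 + rel)."""
    for name in weights:
        for factor in (1 - rel, 1 + rel):
            changed = dict(weights)
            changed[name] = weights[name] * factor
            yield changed

def argmax(values):
    return values.index(max(values))

# Hypothetical raw predictor outputs for three candidate sequences
raw = {
    "half_life": [2.0, 6.0, 10.0],
    "translation": [0.9, 0.5, 0.1],
    "functional": [0.3, 0.9, 0.6],
}

rewards = weighted_rewards(raw, WEIGHTS)
best = argmax(rewards)
stable = all(argmax(weighted_rewards(raw, w)) == best for w in perturbed_weights(WEIGHTS))
```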
axioms (1)
- domain assumption: In-silico predictors for mRNA half-life and translation efficiency are sufficiently accurate proxies for real cellular behavior