STELLA: A Multimodal LLM for Protein Functional Annotation via Unified Sequence-Structure Encoding
Pith reviewed 2026-05-19 12:00 UTC · model grok-4.3
The pith
STELLA multimodal LLM achieves state-of-the-art results in protein functional annotation by aligning sequence-structure data with text.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
STELLA leverages ESM3 for unified bimodal encoding and Llama-3.1-8B-Instruct for natural language modeling, achieving state-of-the-art performance in two critical tasks: Functional Description Prediction and Enzyme-catalyzed Reaction Prediction by synergistically aligning bimodal (sequence-structure) representations with the textual modality.
What carries the argument
Synergistic alignment of bimodal sequence-structure representations from ESM3 with the textual modality inside the multimodal LLM.
If this is right
- Multimodal LLMs can outperform pure protein language models on functional annotation tasks.
- Incorporating structural geometry alongside sequence data yields more accurate predictions of protein roles and enzyme reactions.
- Textual context from natural language descriptions supports richer functional annotations than sequence or structure alone.
Where Pith is reading between the lines
- The same alignment strategy might improve function prediction for proteins with incomplete or low-resolution structural data.
- Similar multimodal fusion could be tested on related tasks such as protein-protein interaction prediction.
Load-bearing premise
High-resolution structural information from ESM3 can be effectively aligned with textual context inside a multimodal LLM to produce measurably better functional annotations than sequence-only models.
What would settle it
A head-to-head test on the same benchmarks showing that STELLA performs no better than or worse than sequence-only models on Functional Description Prediction or Enzyme-catalyzed Reaction Prediction would falsify the central claim.
Figures
read the original abstract
Understanding the intricate interplay among sequence, structure, and function remains a fundamental challenge in proteomics. The sequence-structure-function paradigm posits that biological roles are governed by the tertiary geometric conformations encoded within primary sequences; consequently, integrating these multi-modal descriptors is imperative for accurate functional annotation. While protein language models (pLMs) have achieved significant progress via representation learning on massive sequence data, they often lack the capacity to incorporate high-resolution structural information and the rich textual context that characterizes protein roles. In this work, we present STELLA, a multimodal LLM that synergistically aligns bimodal (sequence-structure) representations with the textual modality to advance protein functional annotation. By leveraging ESM3 for unified bimodal encoding and Llama-3.1-8B-Instruct for natural language modeling, STELLA achieves state-of-the-art performance in two critical tasks: Functional Description Prediction and Enzyme-catalyzed Reaction Prediction. This study demonstrates that multimodal LLMs represent a paradigm shift beyond pure pLMs, offering a new frontier for protein biology and biomedical discovery. The codes can be accessed via https://github.com/ocx-lab/STELLA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces STELLA, a multimodal LLM that uses ESM3 for unified sequence-structure encoding of proteins and aligns the resulting bimodal representations with textual context inside a Llama-3.1-8B-Instruct backbone. It reports state-of-the-art results on Functional Description Prediction and Enzyme-catalyzed Reaction Prediction, arguing that this multimodal approach constitutes a paradigm shift beyond sequence-only protein language models.
Significance. If the performance gains are shown to arise specifically from the structure-aware encoding rather than from model scale, instruction tuning, or dataset differences, the work could meaningfully advance multimodal methods in proteomics and support more accurate functional annotation. The public code release is a clear strength for reproducibility.
major comments (1)
- [Results] Results section (and associated tables/figures): the manuscript contains no ablation that isolates the contribution of ESM3 structure tokens. A controlled comparison that freezes the Llama-3.1 backbone, training regime, and prompting while removing or masking structure information is required to attribute any reported gains to the claimed bimodal synergy rather than to the larger LLM capacity or other factors. This directly tests the central claim and is therefore load-bearing.
minor comments (2)
- [Abstract] The abstract states that STELLA 'achieves state-of-the-art performance' but does not name the specific baselines, datasets, or metrics used; the main text should make these explicit in the first results subsection for immediate clarity.
- [Methods] Notation for the alignment module (e.g., how sequence and structure embeddings are projected and fused before the LLM) should be defined once in a dedicated methods subsection rather than introduced piecemeal in the text.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our work. Below we address the major comment point by point.
read point-by-point responses
-
Referee: [Results] Results section (and associated tables/figures): the manuscript contains no ablation that isolates the contribution of ESM3 structure tokens. A controlled comparison that freezes the Llama-3.1 backbone, training regime, and prompting while removing or masking structure information is required to attribute any reported gains to the claimed bimodal synergy rather than to the larger LLM capacity or other factors. This directly tests the central claim and is therefore load-bearing.
Authors: We acknowledge that the current manuscript does not include an explicit ablation study isolating the contribution of the structure tokens from ESM3. To address this, we will add a new experiment in the revised version. Specifically, we will train and evaluate a sequence-only variant of STELLA by disabling the structure encoding component of ESM3 while freezing the Llama-3.1-8B-Instruct backbone, maintaining the same training regime, dataset, and prompting strategy. This controlled comparison will directly demonstrate whether the performance gains stem from the bimodal sequence-structure encoding as claimed. revision: yes
Circularity Check
No circularity in empirical model and benchmark claims
full rationale
The paper presents STELLA as an engineering contribution that combines pre-trained ESM3 bimodal encodings with a Llama-3.1 backbone and reports experimental SOTA results on two annotation tasks. No derivation chain, equations, fitted parameters renamed as predictions, or self-citation load-bearing uniqueness theorems appear in the provided text. Performance numbers are obtained from standard train/test splits on external benchmarks rather than any quantity that is defined in terms of itself or forced by the model's own training objective. The central claim therefore remains an independent empirical statement rather than a tautological reduction to its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The sequence-structure-function paradigm posits that biological roles are governed by the tertiary geometric conformations encoded within primary sequences.
Forward citations
Cited by 1 Pith paper
-
Multimodal Protein Language Models for Enzyme Kinetic Parameters: From Substrate Recognition to Conformational Adaptation
ERBA is a new staged multimodal adapter that improves protein language model predictions of enzyme kinetic parameters by separately modeling substrate recognition and induced-fit conformational changes.
Reference graph
Works this paper leans on
-
[1]
Berman, John Westbrook, Zukang Feng, Gary Gilliland, T
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, and Philip E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28(1):235–242, 01 2000
work page 2000
-
[2]
Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur V ora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kle...
work page 2021
-
[3]
John M. Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Zídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A A Kohl, Andy Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David A. Reiman,...
work page 2021
-
[4]
SaProt: Protein language modeling with structure-aware vocabulary
Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. SaProt: Protein language modeling with structure-aware vocabulary. bioRxiv 2023.10.01.560349, 2023
work page 2023
-
[5]
Jiashan Li, Xi Chen, He Huang, Mingliang Zeng, Jingcheng Yu, Xinqi Gong, and Qiwei Ye. Sable: bridging the gap in protein structure understanding with an empowering and versatile pre-training paradigm. Briefings in Bioinformatics, 26(2):bbaf120, 03 2025
work page 2025
-
[6]
Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers
Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, and Michalis Vazirgiannis. Prot2Text: Multimodal protein’s function generation with gnns and transformers. arXiv preprint arXiv:2307.14367, 2023
-
[7]
ProteinGPT: Multimodal LLM for protein property prediction and structure understanding
Yijia Xiao, Edward Sun, Yiqiao Jin, Qifan Wang, and Wei Wang. ProteinGPT: Multimodal LLM for protein property prediction and structure understanding. arXiv preprint arXiv:2408.11363, 2024
-
[8]
ProtChatGPT: towards understanding proteins with large language models
Chao Wang, Hehe Fan, Ruijie Quan, and Yi Yang. ProtChatGPT: Towards understanding proteins with large language models. arXiv preprint arXiv:2402.09649, 2024
-
[9]
arXiv preprint arXiv:2009.01411 , year=
Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael JL Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons. arXiv preprint arXiv:2009.01411, 2020
-
[10]
Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C Lawrence Zitnick, Jerry Ma, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15):e2016239118, 2021
work page 2021
-
[11]
NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning
Michael Schantz Klausen, Martin Closter Jespersen, Henrik Nielsen, Kamilla Kjaergaard Jensen, Vanessa Isabell Jurtz, Casper Kaae Soenderby, Morten Otto Alexander Sommer, Ole Winther, Morten Nielsen, Bent Petersen, et al. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins: Structure, Function, and Bioinfo...
work page 2019
-
[12]
Learning inverse folding from millions of predicted structures
Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, and Alexander Rives. Learning inverse folding from millions of predicted structures. In International conference on machine learning, pages 8946–8970. PMLR, 2022
work page 2022
-
[13]
Simulating 500 million years of evolution with a language model
Tomas Hayes, Roshan Rao, Halil Akin, Nicholas J Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Q Tran, Jonathan Deaton, Marius Wiggert, et al. Simulating 500 million years of evolution with a language model. bioRxiv, pages 2024–07, 2024
work page 2024
-
[14]
arXiv preprint arXiv:2501.10282 (2025)
Wenqi Fan, Yi Zhou, Shijie Wang, Yuyao Yan, Hui Liu, Qian Zhao, Le Song, and Qing Li. Computational protein science in the era of large language models (llms). arXiv preprint arXiv:2501.10282, 2025
-
[15]
ProtST: Multi-modality learning of protein sequences and biomedical texts
Minghao Xu, Xinyu Yuan, Santiago Miret, and Jian Tang. ProtST: Multi-modality learning of protein sequences and biomedical texts. arXiv preprint arXiv:2301.12040, 2023
-
[16]
A text-guided protein design framework
Shengchao Liu, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, and Anima Anandkumar. A text-guided protein design framework. arXiv preprint arXiv:2302.04611, 2023
-
[17]
ProTokens: A machine-learned language for compact and informative encoding of protein 3D structures
Xiaohan Lin, Zhenyu Chen, Yanheng Li, Xingyu Lu, Chuanliu Fan, Ziqiang Cao, Shihao Feng, Yi Qin Gao, and Jun Zhang. ProTokens: A machine-learned language for compact and informative encoding of protein 3D structures. bioRxiv, pages 2023–11, 2023
work page 2023
-
[18]
Instruct- Protein: Aligning human and protein language via knowledge instruction
Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, and Huajun Chen. Instruct- Protein: Aligning human and protein language via knowledge instruction. arXiv preprint arXiv:2310.03269, 2023
-
[19]
Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine, 2023
Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, and Zaiqing Nie. Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine, 2023
work page 2023
-
[20]
Language models of protein sequences at the scale of evolution enable accurate structure prediction
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv, 2022:500902, 2022
work page 2022
-
[21]
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[22]
S2ORC: The semantic scholar open research corpus
Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. S2ORC: The semantic scholar open research corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4969–4983, Online, July 2020. Association for Computational Linguistics
work page 2020
-
[23]
Grotjahn, Elizabeth Villa, Le Song, and Pengtao Xie
Mingjia Huo, Han Guo, Xingyi Cheng, Digvijay Singh, Hamidreza Rahmani, Shen Li, Philipp Gerlof, Trey Ideker, Danielle A. Grotjahn, Elizabeth Villa, Le Song, and Pengtao Xie. Multi-modal large language model enables protein function prediction. bioRxiv, 2024
work page 2024
-
[24]
xtrimopglm: unified 100b-scale pre-trained transformer for deciphering the language of protein
Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, et al. xtrimopglm: unified 100b-scale pre-trained transformer for deciphering the language of protein. arXiv preprint arXiv:2401.06199, 2024
-
[25]
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging llm-as-a-judge with mt-bench and chatbot arena, 2023
work page 2023
-
[26]
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. arXiv preprint arXiv:2304.08485, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[27]
Efficient multimodal learning from data-centric perspective, 2024
Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, and Bo Zhao. Efficient multimodal learning from data-centric perspective, 2024
work page 2024
-
[28]
Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[29]
Measuring massive multitask language understanding, 2021
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding, 2021
work page 2021
-
[30]
Mmlu-pro: A more robust and challenging multi-task language understanding benchmark, 2024
Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, and Wenhu Chen. Mmlu-pro: A more robust and challenging multi-task language understanding benchmark, 2024
work page 2024
-
[31]
Instruction-following evaluation for large language models, 2023
Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models, 2023
work page 2023
-
[32]
Training verifiers to solve math word problems, 2021
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems, 2021
work page 2021
-
[33]
Measuring mathematical problem solving with the math dataset, 2021
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the math dataset, 2021
work page 2021
-
[34]
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R. Bowman. Gpqa: A graduate-level google-proof q&a benchmark, 2023
work page 2023
-
[35]
Think you have solved question answering? try arc, the ai2 reasoning challenge, 2018
Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge, 2018
work page 2018
-
[36]
Evaluating large language models trained on code, 2021
Mark Chen and Jerry Tworek et al. Evaluating large language models trained on code, 2021
work page 2021
-
[37]
Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation, 2023
work page 2023
-
[38]
Patil, Ion Stoica, and Joseph E
Fanjia Yan, Huanzhi Mao, Charlie Cheng-Jie Ji, Tianjun Zhang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. Berkeley function calling leaderboard. 2024
work page 2024
-
[39]
Nexusraven: a commercially-permissive language model for function calling
Venkat Krishna Srinivasan, Zhen Dong, Banghua Zhu, Brian Yu, Hanzi Mao, Damon Mosk-Aoyama, Kurt Keutzer, Jiantao Jiao, and Jian Zhang. Nexusraven: a commercially-permissive language model for function calling. In NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following, 2023
work page 2023
-
[40]
∞bench: Extending long context evaluation beyond 100k tokens, 2024
Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, and Maosong Sun. ∞bench: Extending long context evaluation beyond 100k tokens, 2024
work page 2024
-
[41]
Language models are multilingual chain-of- thought reasoners, 2022
Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush V osoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, and Jason Wei. Language models are multilingual chain-of- thought reasoners, 2022
work page 2022
-
[42]
Jose M Dana, Aleksandras Gutmanas, Nidhi Tyagi, Guoying Qi, Claire O’Donovan, Maria Martin, and Sameer Velankar. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Research, 47(D1):D482–D489, 11 2018
work page 2018
-
[43]
Intrinsic-extrinsic convolution and pooling for learning on 3D protein structures
Pedro Hermosilla, Marco Schäfer, Matej Lang, Gloria Fackelmann, Pere-Pau Vázquez, Barbora Kozlikova, Michael Krone, Tobias Ritschel, and Timo Ropinski. Intrinsic-extrinsic convolution and pooling for learning on 3D protein structures. In International Conference on Learning Representations, 2021
work page 2021
-
[44]
Fast and accurate protein structure search with foldseek
Michel Van Kempen, Stephanie S Kim, Charlotte Tumescheit, Milot Mirdita, Jeongjae Lee, Cameron LM Gilchrist, Johannes Söding, and Martin Steinegger. Fast and accurate protein structure search with foldseek. Nature biotechnology, 42(2):243–246, 2024
work page 2024
-
[45]
Continuous-discrete convolution for geometry- sequence modeling in proteins
Hehe Fan, Zhangyang Wang, Yi Yang, and Mohan Kankanhalli. Continuous-discrete convolution for geometry- sequence modeling in proteins. In The Eleventh International Conference on Learning Representations, 2022
work page 2022
-
[46]
Unified rational protein engineering with sequence-based deep representation learning
Ethan C Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, and George M Church. Unified rational protein engineering with sequence-based deep representation learning. Nature methods, 16(12):1315–1322, 2019
work page 2019
-
[47]
Deep convolutional networks for quality assessment of protein folds
Georgy Derevyanko, Sergei Grudinin, Yoshua Bengio, and Guillaume Lamoureux. Deep convolutional networks for quality assessment of protein folds. Bioinformatics, 34(23):4046–4053, 2018
work page 2018
-
[48]
Evaluating protein transfer learning with TAPE
Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Peter Chen, John Canny, Pieter Abbeel, and Yun Song. Evaluating protein transfer learning with TAPE. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019
work page 2019
-
[49]
HH-suite3 for fast remote homology detection and deep protein annotation
Martin Steinegger, Markus Meier, Milot Mirdita, Harald Vöhringer, Stephan J Haunsberger, and Johannes Söding. HH-suite3 for fast remote homology detection and deep protein annotation. BMC bioinformatics, 20:1–15, 2019
work page 2019
-
[50]
arXiv preprint arXiv:2203.06125 , year=
Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, and Jian Tang. Protein representation learning by geometric structure pretraining. arXiv preprint arXiv:2203.06125, 2022
-
[51]
Contrastive representation learning for 3D protein structures
Pedro Hermosilla and Timo Ropinski. Contrastive representation learning for 3D protein structures
-
[52]
Structure-based protein function prediction using graph convolutional networks
Vladimir Gligorijevi´c, P Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C Taylor, Ian M Fisk, Hera Vlamakis, et al. Structure-based protein function prediction using graph convolutional networks. Nature communications, 12(1):3168, 2021
work page 2021
-
[53]
ProtTrans: Toward understanding the language of life through self-supervised learning
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):7112–7127, 2022
work page 2022
-
[54]
Lawrence Zitnick, Jerry Ma, and Rob Fergus
Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences , 118(15):e2016239118, 2021
work page 2021
-
[55]
Contrastive representation learning for 3D protein structures.arXiv preprint arXiv:2205.15675, 2022
Pedro Hermosilla and Timo Ropinski. Contrastive representation learning for 3D protein structures.arXiv preprint arXiv:2205.15675, 2022
-
[56]
Large multi-modal model for video captioning
Wenhao Chai. Large multi-modal model for video captioning. Master’s thesis, University of Washington, 2025
work page 2025
- [57]
-
[58]
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7b, 2023
work page 2023
-
[59]
Phi-3 technical report: A highly capable language model locally on your phone, 2024
Marah Abdin, Jyoti Aneja, and et al Hany Awadalla. Phi-3 technical report: A highly capable language model locally on your phone, 2024
work page 2024
-
[60]
Language models are super mario: Absorbing abilities from homologous models as a free lunch, 2024
Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, and Yongbin Li. Language models are super mario: Absorbing abilities from homologous models as a free lunch, 2024
work page 2024
-
[61]
Biomistral: A collection of open-source pretrained large language models for medical domains, 2024
Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. Biomistral: A collection of open-source pretrained large language models for medical domains, 2024. A Example demonstration of STELLA’s capabilities through case studies Figure 3 shows two case studies of STELLA-ESM3-Llama-3.1-8B-Instruct to uncover ...
work page 2024
-
[62]
For example: "Required for accurate and efficient protein synthesis under certain stress conditions
Prepare ground truth functional descriptions as LLM input: We start with accurate, expert-reviewed descriptions of protein functions. For example: "Required for accurate and efficient protein synthesis under certain stress conditions. May act as a fidelity factor of the translation reaction by catalyzing a one-codon backward translocation of tRNAs on impr...
-
[63]
Prompt Llama-2-13B-Chat to generate conversational data: We utilize the Llama-2-13B-Chat model to convert these structured descriptions into conversational question-answer pairs. Specifically, we employ the following prompt to ensure detailed and meaningful dialogues: "Given a functional description of the protein, design two or three rounds of questions ...
-
[64]
Save the augmentated data in the format shown in the example L.2 in Appendix L. K Diversified instructions generated by ChatGPT (GPT-3.5) This section presents a comprehensive collection of diversified natural language instructions (see K.1- K.3) generated by ChatGPT (GPT-3.5), designed for two tasks–FP and EP. These instructions aim to simulate realistic...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.