Recognition: no theorem link
PPI2Text: Captioning Protein-Protein Interactions with Coordinate-Aligned Pair-Map Decoding
Pith reviewed 2026-05-12 01:56 UTC · model grok-4.3
The pith
A multimodal model generates free-text descriptions of protein-protein interactions from amino acid sequences alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PPI2Text encodes each protein with an ESM3 encoder, builds a pair map across all residue pairs to represent interactions, and autoregressively produces free-text descriptions with a Qwen3 decoder; a coordinate-aligned positional encoding (PaCo-RoPE) ensures each axis of the pair grid matches the residue positions of the corresponding protein. Trained on the 351k-pair PPI2Text-Dataset, the model surpasses strong baselines on linguistic metrics against synthesized references and on factuality metrics where an LLM judge scores outputs against raw biological evidence.
What carries the argument
The coordinate-aligned pair map, which represents every possible residue pair between two proteins in a grid and uses PaCo-RoPE to align positional information along each protein's residue axis before decoding into text.
If this is right
- Free-text PPI descriptions support richer biological detail and direct integration with literature knowledge bases compared with controlled-vocabulary labels.
- The model records higher linguistic metric scores against synthesized references and higher factuality scores when an LLM judge compares outputs to raw biological evidence.
- Ablation results indicate that both the full pair-map construction and the coordinate-aligned positional encoding contribute measurably to performance.
- The released 351k-pair dataset enables further training and benchmarking of text-based PPI models.
Where Pith is reading between the lines
- Automated generation of interaction summaries could assist literature curation or hypothesis generation in systems biology.
- The same pair-map approach might extend to modeling interactions within larger protein complexes once suitable training data are available.
- Independent wet-lab validation on interactions absent from the training synthesis would test whether the Gemini-generated references introduce systematic biases.
Load-bearing premise
The 351k-pair dataset of descriptions synthesized by Gemini from curated databases supplies reliable and unbiased ground truth for both model training and factuality evaluation.
What would settle it
Evaluating the model's generated descriptions against a set of experimentally verified interactions drawn from primary literature that was never used in the Gemini synthesis step, and measuring systematic mismatches in reported binding details or functional effects.
Figures
read the original abstract
Protein-protein interaction (PPI) modeling has been widely studied as a binary or multi-label classification task. While emerging multimodal large language models (LLMs) can now describe single proteins, they remain unable to generate free-form descriptions of interactions between protein pairs. Moving beyond controlled vocabulary annotations, we propose to model PPI using free-text description, enabling richer expressiveness, improved interpretability, and better integration with literature knowledge base. We present PPI2Text, a multimodal LLM for free-form PPI captioning from amino acid sequences, that encodes each protein using ESM3 encoder, constructs a pair map from the two representations to capture interactions across all residue pairs, and autoregressively generates descriptions using a Qwen3 language decoder. We further introduce PaCo-RoPE, a coordinate-aligned positional encoding that aligns each axis of the pair grid with the residue positions of the corresponding protein. In addition, we release PPI2Text-Dataset, a 351k-pair corpus of free-form PPI descriptions aggregated from ten curated biological databases and further synthesized with Gemini under evidence-tiered prompting. PPI2Text consistently outperforms strong baselines across multiple ablation settings and evaluation protocols. It not only achieves higher scores on linguistic metrics against synthesized references, but also excels on factuality metrics, where an LLM-based judge evaluates outputs against raw biological evidence.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PPI2Text, a multimodal LLM for generating free-form textual captions of protein-protein interactions (PPIs) from amino acid sequences. Proteins are encoded separately with ESM3, a pair-map is constructed to model all residue-pair interactions, and a Qwen3 decoder performs autoregressive generation. A coordinate-aligned positional encoding (PaCo-RoPE) is proposed to align the pair-grid axes with each protein's residue indices. The authors also release the PPI2Text-Dataset: 351k free-form PPI descriptions aggregated from ten biological databases and synthesized via Gemini under evidence-tiered prompting. The central claim is that PPI2Text outperforms strong baselines on linguistic metrics (versus the synthesized references) and on factuality metrics (LLM judge versus raw biological evidence) across multiple ablation settings and evaluation protocols.
Significance. If the reported gains can be shown to arise from the architectural contributions rather than artifacts of the shared synthetic data pipeline, the work would meaningfully extend PPI modeling beyond binary or multi-label classification toward richer, literature-aligned textual descriptions. The public release of the 351k-pair corpus and the introduction of PaCo-RoPE constitute concrete, reusable contributions that could support follow-on research in multimodal biological sequence modeling.
major comments (2)
- [Abstract and Dataset Construction] Abstract and dataset construction: linguistic metrics are computed against Gemini-synthesized references that were also used to create the training data; this creates a circularity risk in which reported improvements may reflect imitation of the synthesis style or knowledge rather than independent modeling gains. The factuality protocol (LLM judge versus raw evidence) does not eliminate the concern, as no held-out human-validated test set or non-synthesized reference corpus is described.
- [Evaluation Protocols] Evaluation protocols: the abstract asserts consistent outperformance 'across multiple ablation settings and evaluation protocols' yet supplies no numerical baseline scores, ablation deltas, error bars, or statistical tests. Without these details the load-bearing claim of superiority cannot be assessed.
minor comments (1)
- [Model Architecture] The description of how the pair-map is constructed from the two ESM3 embeddings would benefit from an explicit equation or small diagram showing the dimensionality and interaction aggregation step.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the concerns about evaluation circularity and missing quantitative details below, with proposed revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract and Dataset Construction] Abstract and dataset construction: linguistic metrics are computed against Gemini-synthesized references that were also used to create the training data; this creates a circularity risk in which reported improvements may reflect imitation of the synthesis style or knowledge rather than independent modeling gains. The factuality protocol (LLM judge versus raw evidence) does not eliminate the concern, as no held-out human-validated test set or non-synthesized reference corpus is described.
Authors: We acknowledge the circularity risk for linguistic metrics, as both training and reference captions originate from the same Gemini synthesis pipeline grounded in database evidence. The test split is held-out, and the factuality protocol evaluates generated text directly against raw database evidence via LLM judge, independent of synthesized references. This mitigates but does not fully eliminate the concern. We will add a small human-validated held-out subset (with inter-annotator agreement) to the revised manuscript and report results against both synthesized and human references. revision: partial
-
Referee: [Evaluation Protocols] Evaluation protocols: the abstract asserts consistent outperformance 'across multiple ablation settings and evaluation protocols' yet supplies no numerical baseline scores, ablation deltas, error bars, or statistical tests. Without these details the load-bearing claim of superiority cannot be assessed.
Authors: The full manuscript includes tables with exact baseline scores, ablation deltas, standard deviations across runs, and statistical tests (paired t-tests with p-values). We will revise the abstract to summarize key numerical results (e.g., BLEU/ROUGE gains and factuality improvements with significance) and ensure all figures/tables explicitly report error bars and tests. revision: yes
Circularity Check
No significant circularity in model architecture or evaluation chain
full rationale
The paper introduces a multimodal architecture (ESM3 encoder + pair-map + PaCo-RoPE + Qwen3 decoder) trained on a dataset aggregated from ten databases and synthesized via Gemini. Linguistic metrics are reported against held-out synthesized references and factuality metrics use an LLM judge against raw biological evidence. No mathematical derivations, equations, or first-principles results are presented that reduce to inputs by construction. No self-citations are invoked as load-bearing. Standard supervised training and split-based evaluation on explicitly constructed data does not match any enumerated circularity pattern.
Axiom & Free-Parameter Ledger
free parameters (1)
- PaCo-RoPE scaling factors
axioms (1)
- domain assumption ESM3 and Qwen3 encoders/decoders provide sufficiently rich representations for interaction semantics
invented entities (1)
-
PaCo-RoPE positional encoding
no independent evidence
Reference graph
Works this paper leans on
-
[1]
arXiv preprint arXiv:2505.11194 , year=
Prot2text-v2: Protein function prediction with multimodal contrastive alignment , author=. arXiv preprint arXiv:2505.11194 , year=
-
[2]
Prott3: Protein-to-text generation for text-based protein understanding , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[3]
Decoding the molecular language of proteins with evolla , author=. bioRxiv , pages=. 2025 , publisher=
work page 2025
-
[4]
Galactica: A Large Language Model for Science
Galactica: A large language model for science , author=. arXiv preprint arXiv:2211.09085 , year=
work page internal anchor Pith review arXiv
-
[5]
Protein-protein interaction prediction is achievable with large language models , author=. bioRxiv , pages=. 2023 , publisher=
work page 2023
-
[6]
arXiv preprint arXiv:2405.06649 , year=
ProLLM: protein chain-of-thoughts enhanced LLM for protein-protein interaction prediction , author=. arXiv preprint arXiv:2405.06649 , year=
-
[7]
Protllm: An interleaved protein-language llm with protein-as-word pre-training , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[8]
Large language and protein assistant for protein-protein interactions prediction , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[9]
RAGPPI: Retrieval-Augmented Generation Benchmark for Protein--Protein Interactions in Drug Discovery , author=. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
-
[10]
Computational and Structural Biotechnology Journal , volume=
Protein--protein interaction prediction with deep learning: A comprehensive review , author=. Computational and Structural Biotechnology Journal , volume=. 2022 , publisher=
work page 2022
-
[11]
OmniPath: guidelines and gateway for literature-curated signaling pathway resources , author=. Nature methods , volume=. 2016 , publisher=
work page 2016
-
[12]
Nucleic acids research , volume=
ConsensusPathDB—a database for integrating human functional interaction networks , author=. Nucleic acids research , volume=. 2009 , publisher=
work page 2009
-
[13]
Interdisciplinary Sciences: Computational Life Sciences , volume=
A novel protein mapping method for predicting the protein interactions in COVID-19 disease by deep learning , author=. Interdisciplinary Sciences: Computational Life Sciences , volume=. 2021 , publisher=
work page 2021
-
[14]
Deng, Lei and Zhao, Jiaojiao and Zhang, Jingpu , booktitle=. Predict the Protein-protein Interaction between Virus and Host through Hybrid Deep Neural Network , year=
-
[15]
The Journal of Physical Chemistry B , volume=
Residue-frustration-based prediction of protein--protein interactions using machine learning , author=. The Journal of Physical Chemistry B , volume=. 2022 , publisher=
work page 2022
-
[16]
Briefings in bioinformatics , volume=
LSTM-PHV: prediction of human-virus protein--protein interactions by LSTM with word2vec , author=. Briefings in bioinformatics , volume=. 2021 , publisher=
work page 2021
-
[17]
RAPPPID: towards generalizable protein interaction prediction with AWD-LSTM twin networks , author=. Bioinformatics , volume=. 2022 , publisher=
work page 2022
-
[18]
Analytical Biochemistry , volume=
Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism , author=. Analytical Biochemistry , volume=. 2024 , publisher=
work page 2024
-
[19]
Analytical biochemistry , volume=
Improving protein-protein interaction prediction using protein language model and protein network features , author=. Analytical biochemistry , volume=. 2024 , publisher=
work page 2024
-
[20]
SDNN-PPI: self-attention with deep neural network effect on protein-protein interaction prediction , author=. BMC genomics , volume=. 2022 , publisher=
work page 2022
-
[21]
Computational and Structural Biotechnology Journal , volume=
AttentionEP: Predicting essential proteins via fusion of multiscale features by attention mechanisms , author=. Computational and Structural Biotechnology Journal , volume=. 2024 , publisher=
work page 2024
-
[22]
A transformer-based ensemble framework for the prediction of protein--protein interaction sites , author=. Research , volume=. 2023 , publisher=
work page 2023
-
[23]
Briefings in Bioinformatics , volume=
HN-PPISP: a hybrid network based on MLP-Mixer for protein--protein interaction site prediction , author=. Briefings in Bioinformatics , volume=. 2023 , publisher=
work page 2023
-
[24]
Nucleic acids research , volume=
The IntAct molecular interaction database in 2012 , author=. Nucleic acids research , volume=. 2012 , publisher=
work page 2012
-
[25]
PubMed: the bibliographic database , author=. The NCBI handbook , volume=
-
[26]
Nucleic acids research , volume=
UniProt: a worldwide hub of protein knowledge , author=. Nucleic acids research , volume=. 2019 , publisher=
work page 2019
-
[27]
Nucleic acids research , volume=
3did: a catalog of domain-based interactions of known three-dimensional structure , author=. Nucleic acids research , volume=. 2014 , publisher=
work page 2014
-
[28]
Nucleic acids research , volume=
Pfam: The protein families database in 2021 , author=. Nucleic acids research , volume=. 2021 , publisher=
work page 2021
-
[29]
Nucleic acids research , volume=
STRING: a database of predicted functional associations between proteins , author=. Nucleic acids research , volume=. 2003 , publisher=
work page 2003
-
[30]
Nucleic acids research , volume=
SIGNOR: a database of causal relationships between biological entities , author=. Nucleic acids research , volume=. 2016 , publisher=
work page 2016
-
[31]
Nucleic acids research , volume=
Reactome: a database of reactions, pathways and biological processes , author=. Nucleic acids research , volume=. 2010 , publisher=
work page 2010
-
[32]
Nucleic acids research , volume=
CORUM: the comprehensive resource of mammalian protein complexes—2019 , author=. Nucleic acids research , volume=. 2019 , publisher=
work page 2019
-
[33]
Nucleic acids research , volume=
The complex portal-an encyclopaedia of macromolecular complexes , author=. Nucleic acids research , volume=. 2015 , publisher=
work page 2015
-
[34]
Nature biotechnology , volume=
MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets , author=. Nature biotechnology , volume=. 2017 , publisher=
work page 2017
-
[35]
Gemini: A Family of Highly Capable Multimodal Models
Gemini: a family of highly capable multimodal models , author=. arXiv preprint arXiv:2312.11805 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[36]
Improving language understanding by generative pre-training , author=. 2018 , publisher=
work page 2018
-
[37]
Constitutional AI: Harmlessness from AI Feedback
Constitutional ai: Harmlessness from ai feedback , author=. arXiv preprint arXiv:2212.08073 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[38]
Trends in biochemical sciences , volume=
Recent advances in predicting and modeling protein--protein interactions , author=. Trends in biochemical sciences , volume=. 2023 , publisher=
work page 2023
-
[39]
Thomas Hayes and Roshan Rao and Halil Akin and Nicholas J. Sofroniew and Deniz Oktay and Zeming Lin and Robert Verkuil and Vincent Q. Tran and Jonathan Deaton and Marius Wiggert and Rohil Badkundri and Irhum Shafkat and Jun Gong and Alexander Derry and Raul S. Molina and Neil Thomas and Yousuf A. Khan and Chetan Mishra and Carolyn Kim and Liam J. Bartie a...
-
[40]
Highly accurate protein structure prediction with AlphaFold , author=. nature , volume=. 2021 , publisher=
work page 2021
-
[41]
Learning the language of protein-protein interactions , author=. Nature Communications , year=
-
[42]
Qwen3-vl technical report , author=. arXiv preprint arXiv:2511.21631 , year=
work page internal anchor Pith review Pith/arXiv arXiv
- [43]
-
[44]
Briefings in Bioinformatics , volume =
Bernett, Judith and Blumenthal, David B and List, Markus , title =. Briefings in Bioinformatics , volume =. 2024 , month =. doi:10.1093/bib/bbae076 , url =
-
[45]
International journal of proteomics , volume=
Protein-protein interaction detection: methods and analysis , author=. International journal of proteomics , volume=. 2014 , publisher=
work page 2014
-
[46]
Protein- protein interactions: interface structure, binding thermodynamics, and mutational analysis , author=. Chemical reviews , volume=. 1997 , publisher=
work page 1997
-
[47]
Causal reasoning on biological networks: interpreting transcriptional changes , author=. Bioinformatics , volume=. 2012 , publisher=
work page 2012
-
[48]
Protein complex prediction with AlphaFold-Multimer , author=. biorxiv , pages=. 2021 , publisher=
work page 2021
-
[49]
Qwen3 Technical Report , author=. arXiv preprint arXiv:2505.09388 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[50]
Accurate structure prediction of biomolecular interactions with AlphaFold 3 , author=. Nature , volume=. 2024 , publisher=
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.