Edge-specific signal propagation on mature chromophore-region 3D mechanism graphs for fluorescent protein quantum-yield prediction

Steven Aw Yoong Kit; Swee Keong Yeap; Yuchen Xiong

arxiv: 2605.06644 · v2 · submitted 2026-05-07 · 💻 cs.LG


Pith reviewed 2026-05-12 04:34 UTC · model grok-4.3

classification 💻 cs.LG

keywords: fluorescent proteins · quantum yield · 3D mechanism graphs · chromophore · signal propagation · structure-based prediction · machine learning · protein engineering


The pith

A chromophore-centred 3D graph model captures local signal propagation to predict fluorescent protein quantum yield better than sequence models, with largest gains for remote homologs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Fluorescent protein quantum yield is controlled by the mature chromophore and its immediate three-dimensional microenvironment rather than sequence identity. The paper introduces a method that converts PDB structures into typed 3D residue graphs, registers them to the mature chromophore state, partitions the chromophore into phenolate, bridge and imidazolinone regions, and propagates signals along contact channels. The resulting 52 non-identity features feed band-specific regression and outperform protein language models on a 531-protein set. The advantage is clearest for proteins sharing less than 50 percent sequence similarity, and the same features recover band-specific mechanisms such as aromatic packing or charge balance. This structural approach supplies intrinsic interpretability because each feature directly encodes a channel, seed signal and target region.

Core claim

The paper claims that edge-specific signal propagation on mature chromophore-region 3D mechanism graphs supplies 52 non-identity features for band-specific ExtraTrees regression. The graphs are obtained by converting PDB structures to typed residue graphs, registering them to the mature-CRO state, partitioning the chromophore into regions, and applying channel-signal-region propagation. With these features the model reaches R = 0.772 and MAE = 0.131 under random cross-validation, exceeds the sequence baselines, ranks first in bright screening, and keeps its lead in the remote-homology bucket, while stable selected features recover band-specific mechanisms.

What carries the argument

The chromophore-centred mechanism graph that encodes typed 3D residue contacts as channels carrying seed signals to partitioned target regions in the mature chromophore.
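To make the channel, seed signal and target region idea concrete, here is a minimal sketch of how such features could be assembled. It is not the authors' implementation: the 4.5 Å contact cutoff, the dictionary layout, the function name and the one-hop aggregation (each contacting residue deposits its seed signals directly on the regions it touches) are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist

# Region names follow the paper's partition of the mature chromophore.
REGIONS = ("phenolate", "bridge", "imidazolinone")


def channel_signal_region_features(residues, cro_atoms, cutoff=4.5):
    """Sum residue seed signals into (channel, signal, region) bins.

    residues : list of dicts with keys 'xyz' (3-tuple), 'channel' (str)
               and 'signals' (dict mapping signal name -> float).
    cro_atoms: list of (region, xyz) pairs for the mature chromophore.
    cutoff   : contact distance in angstroms (assumed value, not from the paper).
    """
    features = defaultdict(float)
    for res in residues:
        # Chromophore regions with at least one atom within the cutoff.
        touched = {
            region
            for region, xyz in cro_atoms
            if dist(res["xyz"], xyz) <= cutoff
        }
        for region in touched:
            for signal, value in res["signals"].items():
                # One feature per channel-signal-region triple.
                features[(res["channel"], signal, region)] += value
    return dict(features)
```

Each key of the returned dictionary names a channel, a seed signal and a target region, which is the property the paper relies on for intrinsic interpretability. A real pipeline would also need typed edges between non-chromophore residues and multi-step propagation, which this sketch omits.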

If this is right

The largest performance margin appears in the remote-homology bucket below 50 percent sequence similarity.
The model achieves the highest Bright P@5 of 0.704 among tested methods.
Stable features recover distinct mechanisms for GFP-like, Red and Far-red bands without post-hoc analysis.
Removing identity shortcuts leaves 52 features that still support the reported accuracy.
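The Bright P@5 figure is a top-K precision. The sketch below shows one common way to compute it; the bright and dark thresholds (0.6 and 0.2) and the function names are assumptions, since this page does not state how the paper defines bright or dark proteins.

```python
BRIGHT_THRESHOLD = 0.6  # assumed cutoff for a "bright" protein
DARK_THRESHOLD = 0.2    # assumed cutoff for a "dark" protein


def precision_at_k(y_true, y_score, k, is_hit):
    """Fraction of the k highest-scored items whose true value is a hit."""
    if k <= 0 or not y_true:
        raise ValueError("need k > 0 and at least one item")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(1 for _, truth in top if is_hit(truth)) / len(top)


def bright_precision_at_k(y_true, y_pred, k=5):
    """Rank by highest predicted QY and count truly bright proteins."""
    return precision_at_k(y_true, y_pred, k, lambda qy: qy >= BRIGHT_THRESHOLD)


def dark_precision_at_k(y_true, y_pred, k=5):
    """Rank by lowest predicted QY and count truly dark proteins."""
    negated = [-p for p in y_pred]
    return precision_at_k(y_true, negated, k, lambda qy: qy <= DARK_THRESHOLD)
```

Ranking by the negated prediction lets the same helper serve both the bright and the dark screening frontiers.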

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph construction could be tested on other photophysical properties controlled by local microenvironments, such as photostability or emission wavelength shifts.
If the channel-based propagation model is accurate, targeted mutation of high-importance contact residues should produce predictable changes in measured quantum yield.
Adding explicit dynamics or solvent terms to the 3D graphs might further improve accuracy for flexible chromophore environments.
The method offers a route to structure-guided design that reduces dependence on evolutionary sequence patterns.

Load-bearing premise

Converting PDB structures into typed 3D residue graphs registered to a mature chromophore state and partitioned into chromophore regions produces features that reflect genuine physical signal propagation without introducing artifacts or selection bias.

What would settle it

Randomize the chromophore-region partition labels while leaving all contact channels and node features intact, then refit the regression. If performance is essentially unchanged, the region-specific propagation step is not responsible for the reported gains. If performance collapses toward the level of the sequence-only baselines, the region-specific step is doing the work the paper attributes to it.
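A label-permutation control of this kind can be sketched as follows. The contact tuple layout, the function names and the `score_fn` callable (standing in for whatever evaluation is used, for example a cross-validated R) are hypothetical; only the idea of shuffling region labels while keeping channels and signals fixed comes from the text above.

```python
import random
from collections import defaultdict


def aggregate(contacts):
    """Sum contact values into (channel, signal, region) features."""
    feats = defaultdict(float)
    for channel, signal, region, value in contacts:
        feats[(channel, signal, region)] += value
    return dict(feats)


def shuffle_regions(contacts, rng):
    """Permute region labels across contacts; channels, signals, values stay put."""
    regions = [c[2] for c in contacts]
    rng.shuffle(regions)
    return [(ch, sg, rg, v) for (ch, sg, _, v), rg in zip(contacts, regions)]


def region_permutation_control(contacts, score_fn, n_rounds=100, seed=0):
    """Return the observed score and a null distribution under shuffled regions."""
    rng = random.Random(seed)
    observed = score_fn(aggregate(contacts))
    null = [
        score_fn(aggregate(shuffle_regions(contacts, rng)))
        for _ in range(n_rounds)
    ]
    return observed, null
```

Comparing the observed score with the spread of the null scores indicates how much of the performance depends on the region assignment rather than on the contacts alone. In a real experiment the shuffle would be applied per protein and the score would come from the full regression pipeline.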

Figures

Figures reproduced from arXiv: 2605.06644 by Steven Aw Yoong Kit, Swee Keong Yeap, Yuchen Xiong.

**Figure 1.** Clean overview of the proposed algorithm. A PDB structure is converted into a typed 3D residue graph, registered to a mature chromophore state, partitioned into functional CRO regions, transformed by edge-specific signal propagation, filtered to remove identity shortcuts and routed to a band-specific predictor.
Adjacent body text (truncated): Mature-state chromophore registration. The immature precursor is represented as c_i^(0) = triad…

**Figure 2.** Prediction strength across random and homology-controlled regimes. The mechanism graph model is best under random CV and strongest in the most remote-homology bucket.
Adjacent body text (truncated): …labelled neighbours. In the <50% similarity bucket, the proposed method produced the best bright precision at K = 10, 15, 20 and 25, and the best dark precision at all tested K values from 5 to 25 (Figure 2). Bucket membership was determined by each test protein's maximum 5-mer Jaccard similarity to the fixed training set, not by random fold assignment. In the 0.70–0.85 bucket, the method reached R = 0.756, outperforming Band mean (0.643), ESM-C (0.672) and SaProt (0.701). In the 0.50–0.70 bucket, it reached R = 0.824, essentially matching Band mean (0.830) while clearly exceeding ESM-C (0.626) and SaProt (…

**Figure 3.** Top-K retrieval frontiers for bright and dark proteins. Left: random CV. Right: remote-homology bucket (<50% similarity). The mechanism graph model provides the strongest overall screening behaviour in the remote-homology setting.
Adjacent body text (truncated): …matches crystallographic and photophysical work on far-red proteins showing that chromophore isomerization, planarity and steric restriction of torsional relaxation are central …

**Figure 3.** Top-K retrieval frontiers for bright and dark proteins. Left: random CV. Right: the most remote fixed-split bucket, where each held-out protein has maximum 5-mer Jaccard similarity m_k < 0.50 to the fixed training set. The embedded panel label "Remote homology (< 50%)" is a compact label for this J_5 < 0.50 bucket and does not denote pairwise sequence identity.
Adjacent body text (truncated): …together with clamp asymmetry. This agrees with…

**Figure 4.** Stable selected features across seeds and folds. Bubble area is proportional to recurrence among top-10 selected features across five seeds and five folds. Colours denote feature families, not propagation channels. The activated propagation channels are steric and hydrophobic; feature families describe the physicochemical seed signal or clamp descriptor carried by those channels. The label clamp asymmetry …

**Figure 4.** Stable selected mechanism descriptors across seeds and folds. The three columns correspond to the GFP-like, Red and Far-red band-specific models. Each bubble denotes a channel–signal–region descriptor or a local clamp descriptor that repeatedly appeared among the top-10 selected features across five seeds and five folds; bubble area is proportional to recurrence. Colours denote feature families, not propag…
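The Figure 2 and Figure 3 captions define bucket membership by each test protein's maximum 5-mer Jaccard similarity to the fixed training set. A minimal sketch of that rule follows. The function names are illustrative, and the handling of bucket edges and of similarities at or above 0.85 is an assumption, since the captions only mention the <0.50, 0.50–0.70 and 0.70–0.85 buckets.

```python
def kmers(seq, k=5):
    """Set of all overlapping k-mers in a sequence (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_train_similarity(test_seq, train_seqs, k=5):
    """Maximum k-mer Jaccard similarity between one test sequence and the training set."""
    test_set = kmers(test_seq, k)
    return max((jaccard(test_set, kmers(s, k)) for s in train_seqs), default=0.0)


def homology_bucket(similarity):
    """Map a similarity to a bucket label (edge inclusivity is assumed)."""
    if similarity < 0.50:
        return "<0.50"
    if similarity < 0.70:
        return "0.50-0.70"
    if similarity < 0.85:
        return "0.70-0.85"
    return ">=0.85"
```

Because the similarity is computed against a fixed training set, a protein's bucket does not change with random fold assignment, which matches the caption's description.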

Original abstract

Fluorescent protein quantum yield (QY) is governed by the mature chromophore and its three-dimensional microenvironment rather than sequence identity alone. Protein language models and emission-band averages capture global trends, but do not model how local physical signals act on specific chromophore regions. We present a chromophore-centred mechanism graph algorithm for QY prediction. Each PDB structure is converted into a typed 3D residue graph, registered to a mature-CRO state, partitioned into phenolate, bridge and imidazolinone regions, and transformed by channel-signal-region propagation. The representation contains 121 enrichment features; after removing identity shortcuts, 52 non-identity features are used for band-specific ExtraTrees regression. Because each feature encodes a contact channel, seed signal and target CRO region, interpretation is intrinsic rather than post hoc. On a 531-protein benchmark, the method achieved the best random-CV performance among model-based baselines (R = 0.772 +/- 0.008, MAE = 0.131 +/- 0.002), exceeding Band mean (R = 0.632), ESM-C (R = 0.734) and SaProt (R = 0.731), and ranked first in bright screening (Bright P@5 = 0.704). Under homology control, the advantage was clearest in the remote bucket (<50% similarity; R = 0.697 versus 0.633, 0.575 and 0.408), with the strongest overall bright/dark Top-K screening. Stable selected features recovered band-specific mechanisms: aromatic packing and clamp asymmetry in GFP-like proteins, charge/clamp balance in Red proteins, and flexibility-risk/bulky-contact features in Far-red proteins. Source code, feature tables and evaluation scripts are available from the first author upon request. Contact: yuchenak05@gmail.com
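For orientation, the sketch below shows one plausible reading of the "Band mean" baseline together with the two reported metrics, Pearson R and MAE. The baseline definition (predict the mean training QY of the protein's emission band, falling back to the overall training mean for unseen bands) is an assumption; the abstract names the baseline but this page does not spell out its construction.

```python
from collections import defaultdict
from math import sqrt


def band_mean_predict(train_bands, train_qy, test_bands):
    """Predict each test QY as the mean training QY of its emission band."""
    totals = defaultdict(lambda: [0.0, 0])  # band -> [sum, count]
    for band, qy in zip(train_bands, train_qy):
        totals[band][0] += qy
        totals[band][1] += 1
    overall = sum(train_qy) / len(train_qy)  # fallback for unseen bands
    return [
        totals[b][0] / totals[b][1] if b in totals else overall
        for b in test_bands
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A band-mean predictor gives identical predictions within a band, so any R it achieves comes from between-band differences alone. That makes it a useful floor against which to judge within-band signal from the 52 features.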

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The 3D chromophore graph method beats sequence baselines on QY prediction but its remote-homology claim rests on a weak sequence-only control.


The paper introduces a chromophore-centred 3D mechanism graph that converts PDB structures into typed residue graphs, registers them to a mature state, partitions the chromophore into phenolate, bridge and imidazolinone regions, and applies edge-specific channel-signal propagation to produce 52 non-identity features for band-specific ExtraTrees regression. On the 531-protein set it reaches R = 0.772 under random CV and R = 0.697 in the <50% sequence-similarity bucket, ahead of ESM-C, SaProt and band means, with the best bright-screening P@5. The feature construction ties directly to contacts and regions, which gives built-in interpretability and recovers plausible band-specific patterns such as aromatic packing or charge balance. Code and feature tables are offered on request, which helps reproducibility. Those are the concrete advances over prior sequence or global models.

The soft spot is the homology control. The remote bucket is defined by sequence similarity alone (maximum 5-mer Jaccard similarity to the training set below 0.50), and low sequence similarity does not guarantee structural dissimilarity. Conserved folds and chromophore microenvironments can still produce matching 3D contact patterns, so the remote-bucket gain may partly reflect structural homology rather than the propagation mechanism itself. The reduction to 52 features and the ExtraTrees step are also data-driven, and without public data or full derivation details it is hard to confirm that graph registration and region assignment introduce no leakage. The numbers are reported clearly, but the validation does not yet isolate the claimed physical-signal advantage as cleanly as the abstract suggests.

This is for protein engineers or computational biologists who already work with fluorescent-protein structures and want a local 3D alternative to language models. A reader focused on graph representations of active sites will get usable ideas even if the controls need work.
It deserves a serious referee because the performance edge is measurable and the construction is distinct, though the paper would benefit from tighter structural-homology splits and public artifacts before final acceptance.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a chromophore-centred mechanism graph algorithm for predicting fluorescent protein quantum yields from PDB structures. Each structure is converted to a typed 3D residue graph, registered to a mature-CRO state, partitioned into phenolate/bridge/imidazolinone regions, and transformed via channel-signal-region propagation to produce 121 enrichment features; after removing identity shortcuts this yields 52 non-identity features for band-specific ExtraTrees regression. On a 531-protein benchmark the method reports the highest random-CV performance (R = 0.772 +/- 0.008, MAE = 0.131 +/- 0.002) among model-based baselines, outperforming Band mean, ESM-C and SaProt, with the largest advantage in the sequence-remote bucket (<50% similarity, R = 0.697) and strongest bright/dark Top-K screening; selected features recover band-specific mechanisms.

Significance. If the reported gains are shown to arise from genuine physical signal propagation rather than structural-homology leakage, the work would establish that explicit 3D contact-channel graphs can outperform sequence language models for QY prediction, especially for remote homologs. The intrinsic interpretability of each feature (contact channel + seed signal + target CRO region) and the provision of code, feature tables and evaluation scripts upon request are clear strengths that support reproducibility and mechanistic insight for fluorescent-protein engineering.

major comments (2)

[Homology control results] Homology-control section: the remote bucket is defined exclusively by sequence similarity (<50%, measured as maximum 5-mer Jaccard similarity to the training set). Because the model inputs are 3D residue graphs whose edges encode contacts and channels, structural similarity metrics (TM-score, Dali Z-score or equivalent) must be reported for proteins in this bucket. Sequence similarity below 50% frequently permits conserved folds and similar chromophore microenvironments, which could allow the 52 features to match training examples and inflate the reported R = 0.697 advantage over baselines.
[Methods / Feature derivation] Feature-selection paragraph: the reduction from 121 to 52 non-identity features by 'removing identity shortcuts' is described without stating whether the selection criteria and threshold were fixed a priori, performed inside each CV fold, or applied to the full labelled set. If the latter, this data-driven step risks circularity that undermines the cross-validation claims.
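The circularity concern in the second major comment comes down to where feature selection runs. The sketch below shows a cross-validation loop in which the selector only ever sees training rows. The `select` and `fit_predict` callables are hypothetical placeholders for the paper's selection rule and regressor, neither of which is specified on this page.

```python
import random


def kfold(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, select, fit_predict, k=5, seed=0):
    """Out-of-fold predictions with feature selection confined to each training fold.

    select(X_train, y_train) -> list of column indices to keep
    fit_predict(X_train, y_train, X_test) -> list of predictions
    """
    preds = [None] * len(y)
    for test_idx in kfold(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Selection uses training rows only, so held-out labels cannot leak.
        cols = select(X_tr, y_tr)
        X_tr_sel = [[row[c] for c in cols] for row in X_tr]
        X_te_sel = [[X[i][c] for c in cols] for i in test_idx]
        for i, p in zip(test_idx, fit_predict(X_tr_sel, y_tr, X_te_sel)):
            preds[i] = p
    return preds
```

If the selection rule is purely structural and fixed in advance, running it inside or outside the loop gives the same columns. The loop placement matters only when the rule consults labels or dataset-wide statistics.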

minor comments (2)

[Abstract] Abstract and results: clarify whether 'Band mean' is treated as a model-based baseline or a simple statistical baseline when claiming 'best ... among model-based baselines'.
[Data and code availability] Reproducibility statement: while code and scripts are offered upon request, public deposition (e.g., GitHub/Zenodo) would remove the barrier to independent verification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive and detailed review. We appreciate the focus on strengthening the homology controls and clarifying the feature derivation process. We address each major comment below and will incorporate the suggested revisions to improve the manuscript.


Referee: [Homology control results] Homology-control section: the remote bucket is defined exclusively by sequence similarity (<50%, measured as maximum 5-mer Jaccard similarity to the training set). Because the model inputs are 3D residue graphs whose edges encode contacts and channels, structural similarity metrics (TM-score, Dali Z-score or equivalent) must be reported for proteins in this bucket. Sequence similarity below 50% frequently permits conserved folds and similar chromophore microenvironments, which could allow the 52 features to match training examples and inflate the reported R = 0.697 advantage over baselines.

Authors: We agree that structural similarity metrics are necessary to rule out fold-level leakage in the remote-homology bucket. In the revised manuscript we will compute TM-scores (via TM-align) between every remote-bucket test protein and its nearest training-set neighbor (by sequence identity). We will report the distribution of these TM-scores together with performance stratified by TM-score thresholds (e.g., TM-score < 0.5). This addition will allow readers to assess whether the reported R = 0.697 advantage persists for structurally divergent proteins and will directly address the concern that conserved chromophore microenvironments may be driving the results. revision: yes
Referee: [Methods / Feature derivation] Feature-selection paragraph: the reduction from 121 to 52 non-identity features by 'removing identity shortcuts' is described without stating whether the selection criteria and threshold were fixed a priori, performed inside each CV fold, or applied to the full labelled set. If the latter, this data-driven step risks circularity that undermines the cross-validation claims.

Authors: The identity-shortcut removal follows a fixed, a priori rule defined during mechanism-graph construction: any feature whose propagation path connects a residue to itself without involving a chromophore-region channel is excluded. This deterministic criterion was applied independently inside each cross-validation training fold using only training data. We will revise the Methods section to state this procedure explicitly, including the precise definition of identity shortcuts and confirmation that selection never used test-set information. This clarification will remove any ambiguity regarding circularity. revision: yes
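The TM-score stratification proposed in the first response could be tabulated along the following lines. The inputs are assumed to be precomputed: obtaining TM-scores (for example with TM-align) is outside this sketch, and the function name and the 0.5 threshold default simply mirror the example in the response.

```python
def stratified_mae(tm_scores, y_true, y_pred, threshold=0.5):
    """Group proteins by TM-score to the nearest training structure and report MAE.

    Returns {'below': (count, mae_or_None), 'at_or_above': (count, mae_or_None)}.
    """
    errors = {"below": [], "at_or_above": []}
    for tm, truth, pred in zip(tm_scores, y_true, y_pred):
        key = "below" if tm < threshold else "at_or_above"
        errors[key].append(abs(truth - pred))
    return {
        key: (len(vals), sum(vals) / len(vals) if vals else None)
        for key, vals in errors.items()
    }
```

Reporting the count next to each error matters here, because the structurally divergent stratum may be small for a protein family that shares one fold.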

Circularity Check

0 steps flagged

No significant circularity; feature engineering and CV evaluation remain independent of target labels


The paper constructs typed 3D residue graphs from PDB structures, registers them to a mature chromophore state, partitions regions, and computes 121 enrichment features (later reduced to 52 non-identity features) that encode contact channels, seed signals, and target regions. These fixed structural descriptors are then fed to ExtraTrees regression. Performance is measured via random CV and sequence-homology buckets on a 531-protein benchmark, with no evidence that the regression outputs or selected features are algebraically equivalent to the QY labels by definition. No load-bearing self-citations, uniqueness theorems, or fitted-input-as-prediction steps appear in the derivation; the central claim rests on empirical generalization from pre-specified graph-derived features rather than tautological reduction to the training targets.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 2 invented entities

The method rests on standard structural biology inputs and introduces new algorithmic representations for signal propagation; no major new physical entities are postulated beyond the graph construction itself.

free parameters (2)

Selection threshold for 52 non-identity features
Chosen after removing identity shortcuts; exact selection criterion and whether it was tuned on the benchmark are not stated in the abstract.
ExtraTrees hyperparameters
Model parameters for regression not specified; typical for tree ensembles but still data-influenced.

axioms (2)

domain assumption: PDB structures provide accurate 3D coordinates for residue contacts around the chromophore.
Invoked when converting structures into typed 3D residue graphs.
domain assumption: Partitioning the chromophore into phenolate, bridge, and imidazolinone regions is physically meaningful for signal propagation.
Central to the mechanism-graph construction and feature encoding.

invented entities (2)

Mature chromophore-region 3D mechanism graph (no independent evidence)
purpose: To represent local physical signals acting on specific chromophore sub-regions.
New representation introduced by the algorithm.
Channel-signal-region propagation (no independent evidence)
purpose: To generate enrichment features that encode contact channels, seed signals, and target CRO regions.
Core step for creating the 121 (then 52) features.


Reference graph

Works this paper leans on

26 extracted references · 26 canonical work pages

[1] Tsien, R.Y. (1998) The green fluorescent protein. Annu. Rev. Biochem., 67, 509–544.
[2] Shaner, N.C., Steinbach, P.A. and Tsien, R.Y. (2005) A guide to choosing fluorescent proteins. Nat. Methods, 2, 905–909.
[3] Piatkevich, K.D. and Verkhusha, V.V. (2010) Advances in engineering of fluorescent proteins and photoactivatable proteins with red emission. Curr. Opin. Chem. Biol., 14, 23–29.
[4] Rives, A. et al. (2021) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA, 118, e2016239118.
[5] Geurts, P., Ernst, D. and Wehenkel, L. (2006) Extremely randomized trees. Mach. Learn., 63, 3–42.
[6] Jumper, J. et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589.
[7] Chalfie, M., Tu, Y., Euskirchen, G., Ward, W.W. and Prasher, D.C. (1994) Green fluorescent protein as a marker for gene expression. Science, 263, 802–805.
[8] Ormö, M., Cubitt, A.B., Kallio, K., Gross, L.A., Tsien, R.Y. and Remington, S.J. (1996) Crystal structure of the Aequorea victoria green fluorescent protein. Science, 273, 1392–1395.
[9] Wall, M.A., Socolich, M. and Ranganathan, R. (2000) The structural basis for red fluorescence in the tetrameric GFP homolog DsRed. Nat. Struct. Biol., 7, 1133–1138.
[10] Ferreira, J.R.M., Rodrigues, J.V., Silva, A.M.S. and da Silva, J.C.G.E. (2022) Locking the GFP fluorophore to enhance its emission intensity. Molecules, 28, 234.
[11] Follenius-Wund, A., Bourotte, M., Schmitt, M., Iyice, F., Lami, H., Bourguignon, J.-J., Haiech, J. and Pigault, C. (2003) Fluorescent derivatives of the GFP chromophore give a new insight into the GFP fluorescence process. Biophys. J., 85, 1839–1850.
[12] Park, J.W. and Rhee, Y.M. (2016) Electric field keeps chromophore planar and produces high yield fluorescence in green fluorescent protein. J. Am. Chem. Soc., 138, 13619–13629.
[13] Drobizhev, M., Molina, R.S., Callis, P.R., Scott, J.N., Lambert, G.G., Salih, A., Shaner, N.C. and Hughes, T.E. (2021) Local electric field controls fluorescence quantum yield of red and far-red fluorescent proteins. Front. Mol. Biosci., 8, 633217.
[14] Bindels, D.S., Haarbosch, L., van Weeren, L., Postma, M., Wiese, K.E., Mastop, M., Aumonier, S., Gotthard, G., Royant, A., Hink, M.A. and Gadella, T.W.J. (2017) mScarlet: a bright monomeric red fluorescent protein for cellular imaging. Nat. Methods, 14, 53–56.
[15] Legault, S., Fraser-Halberg, D.P., McAnelly, R.L., Eason, M.G., Thompson, M.C. and Chica, R.A. (2022) Generation of bright monomeric red fluorescent proteins via computational design of enhanced chromophore packing. Chem. Sci., 13, 1408–1418.
[16] Pletnev, S., Shcherbo, D., Chudakov, D.M., Pletneva, N., Merzlyak, E.M., Wlodawer, A., Dauter, Z. and Pletnev, V. (2008) A crystallographic study of bright far-red fluorescent protein mKate reveals pH-induced cis–trans isomerization of the chromophore. J. Biol. Chem., 283, 28980–28987.
[17] Petersen, J., Wilmann, P.G., Beddoe, T., Oakley, A.J., Devenish, R.J., Prescott, M. and Rossjohn, J. (2003) The 2.0-Å crystal structure of eqFP611, a far red fluorescent protein from the sea anemone Entacmaea quadricolor. J. Biol. Chem., 278, 44626–44631.
[18] Lambert, T.J. (2019) FPbase: a community-editable fluorescent protein database. Nat. Methods, 16, 277–278.
[19] Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
[20] The OpenFold3 Team. (2025) OpenFold3-preview: a fully open-source biomolecular structure prediction model based on AlphaFold3. Zenodo. https://doi.org/10.5281/zenodo.19001000
[21] EvolutionaryScale Team. (2024) ESM Cambrian: revealing the mysteries of proteins with unsupervised learning.
[22] Su, J., Han, C., Zhou, Y., Shan, J., Zhou, X. and Yuan, F. (2024) SaProt: protein language modeling with structure-aware vocabulary. International Conference on Learning Representations.
[23] Pedregosa, F. et al. (2011) Scikit-learn: machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
[24] Xie, Z., Zhang, P., Lin, Q., Zhang, Q. and Fan, Z. (2025) EM-PLA: environment-aware heterogeneous graph-based multimodal protein–ligand binding affinity prediction. Bioinformatics, 41, btaf298.
[25] Yang, J., Li, Z., Fan, X., Cheng, Y., Chu, Q. and Zhang, Q. (2022) Deep learning identifies explainable reasoning paths of mechanism of action for drug repurposing from multilayer biological network. Brief. Bioinform., 23, bbac469.
[26] Yang, J., Xu, Z., Wu, W.K.K., Chu, Q. and Zhang, Q. (2021) GraphSynergy: a network-inspired deep learning model for anticancer drug combination prediction. J. Am. Med. Inform. Assoc., 28, 2336–2345.

Adjacent body text (truncated): A. Implementation details of channel–signal–region features. A.1 Activated channels and reserved channels. The implementation separates candidate physical annotatio...
