ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics, 38(8): 2102–2110, February 2022.
5 Pith papers cite this work. Polarity classification is still indexing.
roles: baseline 1
polarities: baseline 1
representative citing papers
Masked-position MLM plus JEPA latent prediction outperforms MLM-only pretraining on 10-11 of 16 downstream tasks for 35M-150M protein models while JEPA alone fails.
Bucket Masking improves protein fitness prediction by up to 14% over random masking by preferentially masking structurally coupled residue groups on four downstream tasks.
EvoFlows uses evolutionary edit flows to generate controllable mutations on protein templates, producing variants consistent with natural families but farther from the starting sequence than prior methods.
PLASMA applies regularized optimal transport with Sinkhorn iterations to produce fast, interpretable residue-level alignments and similarity scores between protein structures.
citing papers explorer
Protein-Based Fish Species Identification: Dataset, Models, and Insights from Native Bangladeshi Fish
Introduces the first protein sequence dataset for nine Bangladeshi fish species and a deployable hybrid CNN-Transformer model that achieves 79.8% accuracy with strong efficiency advantages over ProtBERT.