Multi-modal Protein Knowledge Graph Construction and Applications
Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biological knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using the Gene Ontology and the UniProt knowledge base as a basis, we transform and integrate various kinds of knowledge, aligning descriptions with GO terms and protein sequences with protein entities. ProteinKG65 is mainly intended as a specialized protein knowledge graph that brings the knowledge of the Gene Ontology to protein function and structure prediction. We also illustrate the potential applications of ProteinKG65 with a prototype. Our dataset can be downloaded at https://w3id.org/proteinkg65.
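As a rough illustration of the structure the abstract describes, the sketch below models a tiny graph in which protein entities carry a sequence, GO-term entities carry a text description, and triples link the two. It is not the ProteinKG65 schema or loader: the class names, the `has_function` relation, the identifiers, and the sample sequence and description are all hypothetical placeholders chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node with one aligned modality (all field names are hypothetical)."""
    id: str
    kind: str      # "protein" or "go_term"
    payload: str   # amino-acid sequence for proteins, description for GO terms


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


class TinyProteinKG:
    """Minimal in-memory graph: entities by id plus outgoing triples per head."""

    def __init__(self):
        self.entities = {}
        self.out_edges = defaultdict(list)

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def add_triple(self, head, relation, tail):
        # Refuse edges whose endpoints were never registered.
        if head not in self.entities or tail not in self.entities:
            raise KeyError("both endpoints must be added before linking them")
        self.out_edges[head].append(Triple(head, relation, tail))

    def go_descriptions_for(self, protein_id):
        """Return descriptions of GO terms directly linked from a protein."""
        return [
            self.entities[t.tail].payload
            for t in self.out_edges.get(protein_id, [])
            if self.entities[t.tail].kind == "go_term"
        ]


# Placeholder data, invented for illustration only.
kg = TinyProteinKG()
kg.add_entity(Entity("PROT_0001", "protein", "MKTAYIAKQR"))
kg.add_entity(Entity("GO_EXAMPLE_1", "go_term", "example molecular function"))
kg.add_triple("PROT_0001", "has_function", "GO_EXAMPLE_1")

print(kg.go_descriptions_for("PROT_0001"))
```

Keeping the sequence or description on the entity itself, rather than in a separate table, mirrors the idea of aligning each node with its own modality, so that a downstream model could read both the graph neighborhood and the raw text or sequence.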
Forward citations
Cited by 2 Pith papers
- AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design
  AgentPLM equips pre-trained protein language models with reasoning-augmented decoding using tools such as ESMFold and FoldX plus contrastive agent policy optimization, reporting state-of-the-art gains on de novo enzym...
- OptimusKG: Unifying biomedical knowledge in a modern multimodal graph
  OptimusKG is a labeled property graph unifying biomedical knowledge from structured sources into 190,531 nodes of 10 types and 21.8 million edges of 26 types, with 70% of sampled edges supported by literature evidence...