pith. machine review for the scientific record.

arxiv: 2604.20241 · v1 · submitted 2026-04-22 · 💻 cs.CL · physics.comp-ph

Recognition: unknown

Construction of a Battery Research Knowledge Graph using a Global Open Catalog

Pith reviewed 2026-05-10 00:42 UTC · model grok-4.3

keywords: battery research, knowledge graph, OpenAlex, author vectors, KeyBERT, ChatGPT, RDF, Wikidata

The pith

A pipeline creates weighted research descriptor vectors for battery authors using OpenAlex and AI-extracted keyphrases to build an interoperable knowledge graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Tracking expertise in the fast-growing, interdisciplinary field of battery research is challenging due to its scale and cross-institutional nature. The paper outlines a method to build an author-centric knowledge graph from the OpenAlex bibliographic catalog covering 189,581 papers. Each author gets a vector of research descriptors that blends broad concepts from OpenAlex with precise keyphrases identified by KeyBERT and ChatGPT, with weights accounting for authorship role and paper date. These vectors power calculations of who has similar expertise, identification of research communities, and a web search tool. Finally, the entire graph is exported as RDF data linked to Wikidata entries so it can connect with other open datasets and apply to additional topics.

Core claim

We present a pipeline that constructs an author-centric knowledge graph of battery research from OpenAlex by deriving for each author a weighted vector of research descriptors. The vector combines coarse OpenAlex concepts with fine-grained keyphrases extracted via KeyBERT using ChatGPT as backend, with weights based on origin, authorship position, and recency. Applied to 189,581 works, the vectors enable author similarity computation, community detection, and browser-based exploratory search. The graph is serialized in RDF and linked to Wikidata identifiers for interoperability and extensibility.

What carries the argument

A weighted research-descriptor vector for each author that combines OpenAlex concepts with KeyBERT/ChatGPT keyphrases, with components weighted by descriptor origin, authorship position, and recency.
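A minimal sketch of how such a vector could be assembled is given here. The paper's actual weighting formula is not stated in the abstract, so every number (the origin weights, the position weights, the five-year half-life) and every name (`Paper`, `descriptor_vector`, `recency_weight`) is an illustrative assumption rather than the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass

# All numeric weights are illustrative assumptions, not values from the paper.
ORIGIN_WEIGHT = {"concept": 0.5, "keyphrase": 1.0}
POSITION_WEIGHT = {"first": 1.0, "last": 0.8, "middle": 0.5}
HALF_LIFE_YEARS = 5.0


@dataclass
class Paper:
    year: int
    position: str          # author's position: "first", "middle", or "last"
    concepts: list         # coarse descriptors, e.g. OpenAlex concepts
    keyphrases: list       # fine-grained descriptors, e.g. extracted keyphrases


def recency_weight(year: int, current_year: int) -> float:
    """Exponential decay: a paper HALF_LIFE_YEARS old counts half as much."""
    age = max(0, current_year - year)
    return 0.5 ** (age / HALF_LIFE_YEARS)


def descriptor_vector(papers: list, current_year: int) -> dict:
    """Sum origin x position x recency weights for each descriptor."""
    vector = defaultdict(float)
    for paper in papers:
        base = POSITION_WEIGHT[paper.position] * recency_weight(paper.year, current_year)
        for concept in paper.concepts:
            vector[concept] += ORIGIN_WEIGHT["concept"] * base
        for phrase in paper.keyphrases:
            vector[phrase] += ORIGIN_WEIGHT["keyphrase"] * base
    return dict(vector)


# Toy input with invented papers for a single author.
papers = [
    Paper(2026, "first", ["Electrolyte"], ["solid-state electrolyte"]),
    Paper(2021, "middle", ["Electrolyte"], []),
]
vector = descriptor_vector(papers, current_year=2026)
```

With these assumed weights, the recent first-author paper dominates: "Electrolyte" accumulates 0.5 from the 2026 paper and 0.125 from the 2021 middle-author paper, for a total of 0.625.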

If this is right

  • Author-author similarity computation becomes possible from the vectors.
  • Community detection can identify groups of researchers with overlapping expertise.
  • Exploratory search is supported through a browser-based interface.
  • The knowledge graph can be serialized in RDF format.
  • It links to Wikidata for interoperability with external linked open data sources.
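The first consequence above can be sketched with sparse vectors stored as dictionaries. The abstract does not name the similarity measure, so cosine similarity is an assumption here, and the author names and weights are invented toy data.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse descriptor vectors."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def most_similar(author: str, vectors: dict, k: int = 3) -> list:
    """Return the k other authors with the highest cosine similarity."""
    scores = [
        (other, cosine(vectors[author], vec))
        for other, vec in vectors.items()
        if other != author
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors; names and weights are hypothetical.
vectors = {
    "author_a": {"solid-state electrolyte": 1.0, "lithium metal anode": 0.5},
    "author_b": {"solid-state electrolyte": 0.8, "sulfide glass": 0.6},
    "author_c": {"redox flow battery": 1.0},
}
ranking = most_similar("author_a", vectors, k=2)
```

In this toy data `author_b` ranks first for `author_a` because the two share a descriptor, while `author_c` scores zero. Because the score depends only on shared descriptors, two authors can be matched without ever citing or co-authoring with each other.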

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The semantic grounding of similarities may identify potential collaborators missed by citation-based methods.
  • The approach is extensible to other research domains beyond battery science.
  • Integration with Wikidata opens possibilities for combining expertise data with institutional or funding information.

Load-bearing premise

The weighted combination of OpenAlex concepts and KeyBERT/ChatGPT-extracted keyphrases, adjusted by authorship position and recency, produces vectors that meaningfully represent an author's research expertise.

What would settle it

An experiment where battery domain experts rate the accuracy of suggested similar authors or community groupings derived from the vectors.

Original abstract

Battery research is a rapidly growing and highly interdisciplinary field, making it increasingly difficult to track relevant expertise and identify potential collaborators across institutional boundaries. In this work, we present a pipeline for constructing an author-centric knowledge graph of battery research built on OpenAlex, a large-scale open bibliographic catalogue. For each author, we derive a weighted research descriptors vector that combines coarse-grained OpenAlex concepts with fine-grained keyphrases extracted from titles and abstracts using KeyBERT with ChatGPT (gpt-3.5-turbo) as the backend model, selected after evaluating multiple alternatives. Vector components are weighted by research descriptor origin, authorship position, and temporal recency. The framework is applied to a corpus of 189,581 battery-related works. The resulting vectors support author-author similarity computation, community detection, and exploratory search through a browser-based interface. The knowledge graph is then serialized in RDF and linked to Wikidata identifiers, making it interoperable with external linked open data sources and extensible beyond the battery domain. Unlike prior author-centric analyses confined to institutional repositories, our approach operates at cross-institutional scale and grounds similarity in domain semantics rather than citation or co-authorship structure alone.
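The RDF serialization step can be illustrated with a small sketch that emits N-Triples lines by hand. The abstract does not describe the graph's vocabulary, so the use of FOAF (`foaf:Person`, `foaf:name`, `foaf:topic_interest`) is an assumption, and the author URI and the Wikidata identifier `Q0000000` are placeholders rather than real entries.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def literal(text: str) -> str:
    """Escape a plain string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def author_triples(author_uri: str, name: str, wikidata_ids: list) -> list:
    """Describe one author and link each research descriptor to a Wikidata entity."""
    lines = [
        f"<{author_uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f"<{author_uri}> <{FOAF}name> {literal(name)} .",
    ]
    for qid in wikidata_ids:
        lines.append(
            f"<{author_uri}> <{FOAF}topic_interest> <{WIKIDATA_ENTITY}{qid}> ."
        )
    return lines


# Hypothetical author URI and placeholder Wikidata identifier.
triples = author_triples(
    "https://example.org/author/a1", "Example Author", ["Q0000000"]
)
print("\n".join(triples))
```

Pointing descriptors at shared Wikidata entity URIs, rather than at local strings, is what lets a graph like this be joined with other linked open data that uses the same identifiers.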

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper outlines a pipeline for building an author-centric knowledge graph in battery research using data from the OpenAlex catalog. For 189,581 papers, it creates weighted vectors combining OpenAlex concepts with keyphrases from KeyBERT and gpt-3.5-turbo, adjusted for authorship position and recency. These vectors are used for similarity, community detection, and search, with the graph serialized in RDF and linked to Wikidata identifiers for broader interoperability.
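For orientation, community detection over author similarities can be sketched as follows. The abstract does not say which algorithm the authors use; this sketch substitutes a deliberately simple stand-in (connected components of a similarity graph cut at a threshold), and the threshold, author labels, and scores are all invented.

```python
def communities(similarity: dict, threshold: float) -> list:
    """Group authors into connected components of the thresholded similarity graph.

    `similarity` maps unordered author pairs (a, b) to a score in [0, 1].
    """
    neighbours = {}
    for (a, b), score in similarity.items():
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if score >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first traversal collects everyone reachable from `start`.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy pairwise similarities between hypothetical authors.
similarity = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.7,
    ("c", "d"): 0.1,
    ("d", "e"): 0.8,
}
groups = communities(similarity, threshold=0.5)
```

Here the weak link between `c` and `d` falls below the threshold, so the result is two groups: `{a, b, c}` and `{d, e}`. The grouping shifts with the threshold, which is one reason the validation requested in the major comment below matters.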

Significance. The construction of a large-scale, open, RDF-linked knowledge graph for battery research expertise represents a practical contribution to the field, particularly given the use of a global open catalog (OpenAlex) and the interoperability with Wikidata. If the vector representations are shown to be effective, this could facilitate cross-institutional collaboration in a rapidly evolving domain. The paper explicitly mentions evaluating multiple keyphrase extraction alternatives, which is a strength.

major comments (1)
  1. [Abstract] The central claim that the weighted combination of OpenAlex concepts and KeyBERT/ChatGPT-extracted keyphrases produces vectors that meaningfully represent research expertise and support valid author-author similarity, community detection, and exploratory search (Abstract) lacks supporting evidence: there are no ablation studies on the weighting scheme (authorship position and recency), no quantitative metrics for similarity accuracy (e.g., against known co-authorship overlaps), no comparison to citation-based baselines, and no error analysis or validation of the community detection results. This evaluation gap is load-bearing for the paper's assertion of utility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments and for recognizing the practical contribution of our work. We address the major comment below.

point-by-point responses
  1. Referee: [Abstract] The central claim that the weighted combination of OpenAlex concepts and KeyBERT/ChatGPT-extracted keyphrases produces vectors that meaningfully represent research expertise and support valid author-author similarity, community detection, and exploratory search (Abstract) lacks supporting evidence: there are no ablation studies on the weighting scheme (authorship position and recency), no quantitative metrics for similarity accuracy (e.g., against known co-authorship overlaps), no comparison to citation-based baselines, and no error analysis or validation of the community detection results. This evaluation gap is load-bearing for the paper's assertion of utility.

    Authors: We agree that the current manuscript does not include quantitative evaluations such as ablation studies on the weighting scheme, metrics for similarity accuracy against co-authorship data, comparisons to citation-based baselines, or validation of the community detection results. The paper focuses on the construction pipeline, the selection of keyphrase extraction methods after evaluating alternatives, and the resulting KG with an interface for exploratory use. While we believe the semantic grounding provides value beyond citation structures, we acknowledge that without these validations, the claims of utility for similarity and community detection are not fully substantiated. In the revised version, we will add an evaluation section including: (1) ablation studies varying the weights for authorship position and recency, (2) quantitative similarity evaluation using metrics like precision at k against known co-author overlaps or held-out papers, (3) comparison to simple citation-based author similarity, and (4) validation of detected communities against external knowledge of battery research groups or manual inspection with error analysis. We will update the abstract to reflect the added evaluations.

    revision: yes
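The precision at k metric mentioned in item (2) can be sketched as follows. This is a generic definition rather than the authors' evaluation code; treating known co-authors as the relevant set is one of the options the rebuttal names, and the author labels are invented.

```python
def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the top-k ranked authors that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for author in top_k if author in relevant)
    return hits / k


# Hypothetical ranking of similar authors and a hypothetical co-author set.
ranked_similar = ["b", "c", "d", "e"]
known_coauthors = {"b", "e"}
score = precision_at_k(ranked_similar, known_coauthors, k=2)
```

In this toy case one of the top two suggestions is a known co-author, so the score is 0.5. Averaging this score over many authors, once with and once without the position and recency weights, would give the kind of ablation described in item (1).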

Circularity Check

0 steps flagged

No circularity: pipeline uses external OpenAlex data and off-the-shelf NLP tools with explicit weighting rules

full rationale

The manuscript describes an author-vector construction pipeline that ingests bibliographic records from the external OpenAlex catalog, extracts keyphrases via KeyBERT and ChatGPT (chosen after comparing alternatives), and applies deterministic weighting by descriptor origin, authorship position, and recency. The resulting vectors are then used for similarity, community detection, and RDF serialization linked to Wikidata. No equation, parameter fit, or claimed result reduces by construction to a self-referential input; the central assertions rest on the external data source and standard tools rather than any closed loop or self-citation load-bearing step. This is the normal non-circular case for a data-processing pipeline paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim depends on the reliability of OpenAlex coverage for battery literature and the semantic accuracy of keyphrases from KeyBERT/ChatGPT; these are domain assumptions rather than derived results.

axioms (2)
  • domain assumption OpenAlex provides sufficiently complete coverage of battery-related publications
    The corpus of 189,581 works is taken as the basis for the graph without reported completeness checks.
  • domain assumption KeyBERT combined with ChatGPT produces keyphrases that accurately capture fine-grained research descriptors
    Selected after evaluating alternatives, but selection criteria and accuracy metrics are not detailed in the abstract.

pith-pipeline@v0.9.0 · 5524 in / 1353 out tokens · 36128 ms · 2026-05-10T00:42:11.453424+00:00 · methodology

