ProDock: From multi-target consensus docking into database-backed storage
Pith reviewed 2026-05-08 12:52 UTC · model grok-4.3
The pith
ProDock turns fragmented docking runs into explicit many-to-many campaigns stored in SQLite for direct comparison and reuse.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By representing studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends, ProDock converts fragmented engine-specific outputs into structured analytical results that are easier to compare, reuse, and audit. The toolkit integrates receptor and ligand preparation, reference-ligand-based box generation, batch docking, pose crawling, score extraction, interaction profiling, and database insertion within a single project-local workflow that accepts inputs from PDB identifiers to SMILES strings.
What carries the argument
Many-to-many campaigns that link receptors, ligands, and docking backends while routing all steps through preprocessing, execution, postprocessing, and SQLite storage.
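As a rough illustration of the many-to-many idea, a campaign can be thought of as the Cartesian product of its receptors, ligands, and backends. The sketch below is not ProDock's API: `DockingJob`, `expand_campaign`, and the placeholder labels (`rec_A`, `lig_1`, `engine_x`) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DockingJob:
    """One unit of work in a campaign (hypothetical structure)."""
    receptor: str
    ligand: str
    backend: str


def expand_campaign(receptors, ligands, backends):
    """Return one job per (receptor, ligand, backend) combination."""
    return [DockingJob(r, l, b) for r, l, b in product(receptors, ligands, backends)]


# Placeholder identifiers, not real inputs or real engine names.
jobs = expand_campaign(
    receptors=["rec_A", "rec_B"],
    ligands=["lig_1", "lig_2"],
    backends=["engine_x", "engine_y", "engine_z"],
)
print(len(jobs))  # 2 receptors * 2 ligands * 3 backends = 12 jobs
```

Making the combinations explicit up front is what allows every result to be keyed by the same three fields later, whatever engine produced it.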
If this is right
- Docking results across different receptors and ligands become directly queryable without manual parsing of separate output files.
- Provenance tracking is preserved through the full workflow so that any result can be traced back to its exact receptor preparation and docking settings.
- Comparative analysis of multiple docking engines on the same campaign is supported by uniform postprocessing and storage.
- Studies can be resumed or extended by adding new ligands or receptors to an existing campaign without rebuilding the entire workflow.
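The first and last consequences can be sketched with Python's standard `sqlite3` module. The `results` table, its columns, and the sample scores are assumptions made for this example and are not taken from ProDock's actual schema; the scores are invented placeholders, with lower values assumed to be better.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (receptor, ligand, backend) result.
conn.execute("""
    CREATE TABLE results (
        receptor TEXT NOT NULL,
        ligand   TEXT NOT NULL,
        backend  TEXT NOT NULL,
        score    REAL NOT NULL,
        UNIQUE (receptor, ligand, backend)
    )
""")


def add_results(rows):
    """Insert new results; rows already stored for the same key are skipped."""
    conn.executemany("INSERT OR IGNORE INTO results VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Initial campaign.
add_results([
    ("rec_A", "lig_1", "engine_x", -7.2),
    ("rec_A", "lig_1", "engine_y", -6.8),
    ("rec_B", "lig_1", "engine_x", -5.1),
])
# Extending the campaign: the repeated row is ignored, the new ligand is added.
add_results([
    ("rec_A", "lig_1", "engine_x", -7.2),
    ("rec_A", "lig_2", "engine_x", -8.0),
])

# Compare receptor/ligand pairs by their mean score across backends.
mean_scores = conn.execute("""
    SELECT receptor, ligand, COUNT(*) AS n_backends, AVG(score) AS mean_score
    FROM results
    GROUP BY receptor, ligand
    ORDER BY mean_score
""").fetchall()
for row in mean_scores:
    print(row)
```

The uniqueness constraint is one simple way to make extension idempotent: re-running an insertion step does not duplicate earlier results.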
Where Pith is reading between the lines
- The campaign model could scale to large virtual screening collections by treating entire ligand libraries as single campaign inputs.
- Database queries could be extended to feed directly into downstream machine-learning rescoring or binding-affinity prediction pipelines.
- Audit logs stored in the same database might support regulatory or publication requirements for documenting docking protocols.
Load-bearing premise
That the integrated preprocessing, batch execution, and database insertion layers will function without additional user scripting for the supported docking backends and input formats.
What would settle it
A test in which a user supplies a supported receptor file and ligand set to ProDock and finds that custom scripts are still required to complete preprocessing or database insertion for any of the integrated backends.
Original abstract
Protein-ligand docking is widely used in structure-based discovery, but routine studies often fail at the workflow level rather than at the scoring level. Receptor cleaning, ligand preparation, file conversion, box definition, run organization, and downstream parsing are frequently handled by fragmented scripts, which reduces reproducibility, obscures provenance, and complicates comparative analysis across targets, ligands, and docking settings. We present ProDock, an open-source Python toolkit for reproducible protein-ligand docking and postprocessing. ProDock organizes application-oriented docking into four connected layers: receptor and ligand preprocessing, provenance-aware docking execution, postprocessing of poses and interaction fingerprints, and SQLite-backed storage for later querying. The package supports inputs ranging from PDB identifiers and local receptor files to SMILES strings and prepared ligand directories, and integrates receptor preparation, ligand preparation, reference-ligand-based box generation, campaign serialization, batch docking, pose crawling, score extraction, interaction profiling, and database insertion within a consistent project-local workflow. By representing studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends, ProDock converts fragmented engine-specific outputs into structured analytical results that are easier to compare, reuse, and audit. ProDock is implemented in Python and released under an open-source license at https://github.com/Medicine-Artificial-Intelligence/ProDock. Documentation is available at https://prodock.readthedocs.io/en/latest.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ProDock, an open-source Python toolkit for protein-ligand docking that organizes studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends. It structures the workflow into four layers—receptor/ligand preprocessing, provenance-aware execution, postprocessing of poses and interaction fingerprints, and SQLite-backed storage—while supporting inputs such as PDB identifiers, local files, and SMILES strings. The central claim is that this design converts fragmented, engine-specific outputs into structured, queryable, and auditable results to improve reproducibility, comparison, and reuse across targets and settings.
Significance. If the implementation matches the described architecture and functions without additional user scripting, ProDock could meaningfully address workflow fragmentation in structure-based discovery by enabling consistent multi-target campaigns and database-driven analysis. The explicit campaign model and provenance tracking represent a practical advance over ad-hoc scripts, with potential to support larger-scale comparative studies; the open-source release and linked documentation are positive factors for adoption.
major comments (2)
- [Abstract and workflow layers description] The manuscript asserts that the four layers integrate preprocessing, batch execution, postprocessing, and database insertion within a consistent project-local workflow without additional user scripting for supported backends and formats, but provides no usage examples, code snippets, test cases, or validation data to demonstrate this integration (see the abstract and the description of the four connected layers). This leaves the weakest assumption untested and undermines assessment of the claimed reproducibility and provenance benefits.
- [Description of campaign model and storage] The central claim that ProDock converts fragmented outputs into structured analytical results easier to compare, reuse, and audit rests on the campaign serialization and SQLite schema, yet the manuscript contains no empirical illustration (e.g., a multi-receptor query example or audit trail) showing these advantages in practice.
minor comments (1)
- A workflow diagram or pseudocode example illustrating how a campaign object is serialized and passed through the four layers would improve clarity of the API and data flow.
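To make the request concrete, the following is one possible shape for campaign serialization, written only with the Python standard library. It is a sketch under stated assumptions rather than ProDock's implementation: the campaign dictionary layout, the function names, and the use of a SHA-256 fingerprint as a provenance key are all hypothetical.

```python
import hashlib
import json


def serialize_campaign(campaign: dict) -> str:
    """Serialize a campaign to canonical JSON (sorted keys, fixed separators)."""
    return json.dumps(campaign, sort_keys=True, separators=(",", ":"))


def campaign_fingerprint(campaign: dict) -> str:
    """Hash the canonical JSON so identical campaigns share one identifier."""
    payload = serialize_campaign(campaign).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical campaign description with placeholder identifiers.
campaign = {
    "receptors": ["rec_A", "rec_B"],
    "ligands": ["lig_1", "lig_2"],
    "backends": ["engine_x"],
    "settings": {"seed": 42},
}

text = serialize_campaign(campaign)
restored = json.loads(text)          # round trip back to a dictionary
fingerprint = campaign_fingerprint(campaign)
print(restored == campaign, fingerprint[:12])
```

Sorting keys makes the fingerprint independent of dictionary key order, although list order still matters, so a real implementation would need to decide whether receptor and ligand order is significant.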
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive evaluation of ProDock's potential impact. We have carefully considered the major comments and will revise the manuscript to include the requested demonstrations of workflow integration and empirical examples.
Point-by-point responses
-
Referee: [Abstract and workflow layers description] The manuscript asserts that the four layers integrate preprocessing, batch execution, postprocessing, and database insertion within a consistent project-local workflow without additional user scripting for supported backends and formats, but provides no usage examples, code snippets, test cases, or validation data to demonstrate this integration (see the abstract and the description of the four connected layers). This leaves the weakest assumption untested and undermines assessment of the claimed reproducibility and provenance benefits.
Authors: We agree with this observation. The current version of the manuscript describes the architecture but does not include concrete usage examples or code to illustrate the end-to-end workflow. In the revised manuscript, we will add a dedicated section with code snippets demonstrating a full campaign setup, execution, and database storage for supported backends, including a basic validation example that shows consistent results across runs. This addition will allow readers to assess the integration and reproducibility claims directly.
Revision: yes
-
Referee: [Description of campaign model and storage] The central claim that ProDock converts fragmented outputs into structured analytical results easier to compare, reuse, and audit rests on the campaign serialization and SQLite schema, yet the manuscript contains no empirical illustration (e.g., a multi-receptor query example or audit trail) showing these advantages in practice.
Authors: We acknowledge that an empirical illustration is necessary to substantiate the advantages of the campaign model and storage layer. We will incorporate into the revised manuscript an example of querying the SQLite database for results across multiple receptors, including extraction of an audit trail via provenance information. This will demonstrate in practice how the structured storage facilitates comparison and reuse.
Revision: yes
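A multi-receptor query with an audit trail of the kind described in this response might look like the sketch below. The `runs` and `poses` tables, their columns, and the settings keys are assumptions for illustration; the manuscript does not specify ProDock's schema, and the scores are invented placeholders.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table layout: run-level provenance plus per-pose scores.
conn.executescript("""
    CREATE TABLE runs (
        run_id   INTEGER PRIMARY KEY,
        receptor TEXT NOT NULL,
        backend  TEXT NOT NULL,
        settings TEXT NOT NULL          -- JSON-encoded docking settings
    );
    CREATE TABLE poses (
        run_id INTEGER NOT NULL REFERENCES runs(run_id),
        ligand TEXT NOT NULL,
        score  REAL NOT NULL
    );
""")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", [
    (1, "rec_A", "engine_x", json.dumps({"exhaustiveness": 8, "seed": 42})),
    (2, "rec_B", "engine_x", json.dumps({"exhaustiveness": 16, "seed": 42})),
])
conn.executemany("INSERT INTO poses VALUES (?, ?, ?)", [
    (1, "lig_1", -7.2),
    (1, "lig_1", -6.5),
    (2, "lig_1", -5.1),
])


def audit_best_pose(ligand):
    """Best (lowest) score per run for one ligand, with the settings that produced it."""
    rows = conn.execute("""
        SELECT r.receptor, r.backend, MIN(p.score), r.settings
        FROM poses AS p
        JOIN runs AS r ON r.run_id = p.run_id
        WHERE p.ligand = ?
        GROUP BY r.run_id
        ORDER BY r.receptor
    """, (ligand,)).fetchall()
    return [(rec, backend, score, json.loads(settings))
            for rec, backend, score, settings in rows]


trail = audit_best_pose("lig_1")
for entry in trail:
    print(entry)
```

Keeping the settings in the same database as the scores means a single join answers both "which receptor scored best?" and "under what settings was that result produced?".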
Circularity Check
No significant circularity
The paper is a software description of the ProDock Python toolkit for organizing protein-ligand docking studies as many-to-many campaigns across receptors, ligands, and backends, with layers for preprocessing, provenance-aware execution, postprocessing, and SQLite storage. No equations, derivations, predictions, fitted parameters, or mathematical claims appear anywhere in the text. The central claim that this structure converts fragmented outputs into queryable results follows directly from the presented API, campaign serialization, and schema without any reduction to self-definitional inputs, self-citations, or imported ansatzes. All described functionality is presented as an independent implementation detail rather than a derived result.