arxiv: 2604.21828 · v1 · submitted 2026-04-23 · 🧬 q-bio.QM


ProDock: From multi-target consensus docking into database-backed storage

Tieu-Long Phan, Lai Hoang Son Le, Thanh-An Pham, Nhu-Ngoc Nguyen Song, Tuyet-Minh Phan, Tuyen Ngoc Truong


Pith reviewed 2026-05-08 12:52 UTC · model grok-4.3

classification 🧬 q-bio.QM

keywords: protein-ligand docking · workflow reproducibility · multi-target campaigns · SQLite storage · docking postprocessing · open-source toolkit · structure-based discovery


The pith

ProDock turns fragmented docking runs into explicit many-to-many campaigns stored in SQLite for direct comparison and reuse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ProDock as a Python toolkit that organizes protein-ligand docking into four connected layers: preprocessing of receptors and ligands, provenance-aware execution across backends, postprocessing of poses and fingerprints, and insertion into a local SQLite database. It treats each study as a campaign that explicitly links multiple receptors, ligands, and docking engines rather than leaving these connections to scattered scripts. A reader would care because routine docking work often breaks at the level of file handling and result aggregation, which reduces reproducibility and makes it hard to compare results across targets or settings. The central mechanism is the campaign representation that converts engine-specific outputs into queryable structured data.
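Provenance-aware execution can be made concrete with a content fingerprint. The sketch below is not ProDock's API: the function `provenance_key` and the settings names `exhaustiveness` and `seed` are illustrative assumptions. It hashes the prepared receptor text together with the ligand, the backend name, and the engine settings, so identical inputs always yield the same key and changing any one of them yields a different key.

```python
import hashlib
import json


def provenance_key(receptor_pdbqt: str, ligand_smiles: str, backend: str, settings: dict) -> str:
    """Return a stable SHA-256 fingerprint for one docking run (hypothetical helper).

    Every input that determines a result is placed in one record and serialized
    canonically, so the key does not depend on dict insertion order.
    """
    record = {
        # hash the prepared receptor content rather than its path, so a re-prepared
        # receptor with different atoms gets a different key
        "receptor_sha256": hashlib.sha256(receptor_pdbqt.encode("utf-8")).hexdigest(),
        "ligand": ligand_smiles,
        "backend": backend,
        "settings": settings,
    }
    # sort_keys applies recursively, so nested settings dicts are canonical too
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Storing this key next to each result is one way to let a later reader trace a score back to the exact receptor preparation and settings that produced it.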

Core claim

By representing studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends, ProDock converts fragmented engine-specific outputs into structured analytical results that are easier to compare, reuse, and audit. The toolkit integrates receptor and ligand preparation, reference-ligand-based box generation, batch docking, pose crawling, score extraction, interaction profiling, and database insertion within a single project-local workflow that accepts inputs from PDB identifiers to SMILES strings.
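A common way to implement reference-ligand-based box generation is an axis-aligned bounding box around a co-crystallized ligand, enlarged by a margin on every side. The sketch below assumes that approach; the helper names and the 4 Å default padding are hypothetical choices, not values taken from ProDock. Coordinates are read from the fixed-width HETATM columns of a PDB file.

```python
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]


def ligand_coords_from_pdb(lines: Iterable[str], resname: str) -> List[Coord]:
    """Collect HETATM coordinates for one ligand residue name.

    PDB records are fixed width: residue name in columns 18-20 and
    x, y, z in columns 31-38, 39-46, 47-54 (1-based).
    """
    coords = []
    for line in lines:
        if line.startswith("HETATM") and line[17:20].strip() == resname:
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def box_from_reference(coords: List[Coord], padding: float = 4.0) -> Tuple[Coord, Coord]:
    """Return (center, size) of an axis-aligned box around the reference ligand.

    The padding is added on both sides of each axis; 4.0 Å is an illustrative default.
    """
    if not coords:
        raise ValueError("no reference-ligand atoms found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size
```

For Vina-style engines the returned values map onto the `center_x/y/z` and `size_x/y/z` configuration keys.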

What carries the argument

Many-to-many campaigns that link receptors, ligands, and docking backends while routing all steps through preprocessing, execution, postprocessing, and SQLite storage.
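One way to store such a many-to-many campaign is a join table in which each row links one receptor, one ligand, and one backend. The schema below is a hypothetical illustration using Python's built-in `sqlite3` module, not ProDock's actual table layout; the receptor labels `REC_A` and `REC_B` are placeholders.

```python
import sqlite3

# Hypothetical schema: each run row links one receptor, one ligand and one backend.
SCHEMA = """
CREATE TABLE receptor (id INTEGER PRIMARY KEY, label  TEXT UNIQUE NOT NULL);
CREATE TABLE ligand   (id INTEGER PRIMARY KEY, smiles TEXT UNIQUE NOT NULL);
CREATE TABLE backend  (id INTEGER PRIMARY KEY, name   TEXT UNIQUE NOT NULL);
CREATE TABLE run (
    receptor_id INTEGER NOT NULL REFERENCES receptor(id),
    ligand_id   INTEGER NOT NULL REFERENCES ligand(id),
    backend_id  INTEGER NOT NULL REFERENCES backend(id),
    pose_rank   INTEGER NOT NULL,
    score       REAL NOT NULL,  -- lower is better for Vina-like scoring
    PRIMARY KEY (receptor_id, ligand_id, backend_id, pose_rank)
);
"""


def get_id(con: sqlite3.Connection, table: str, column: str, value: str) -> int:
    """Insert value if absent and return its row id.

    Table and column names come only from the fixed calls below, never from user input.
    """
    con.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    return con.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()[0]


def add_result(con, receptor, smiles, backend, pose_rank, score):
    ids = (get_id(con, "receptor", "label", receptor),
           get_id(con, "ligand", "smiles", smiles),
           get_id(con, "backend", "name", backend))
    con.execute("INSERT OR REPLACE INTO run VALUES (?, ?, ?, ?, ?)", (*ids, pose_rank, score))


# Best (lowest) score per ligand, receptor and backend across the whole campaign.
BEST_SCORES = """
SELECT l.smiles, r.label, b.name, MIN(run.score) AS best
FROM run
JOIN receptor r ON r.id = run.receptor_id
JOIN ligand   l ON l.id = run.ligand_id
JOIN backend  b ON b.id = run.backend_id
GROUP BY l.smiles, r.label, b.name
ORDER BY l.smiles, best
"""

if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    add_result(con, "REC_A", "CCO", "vina", 1, -5.0)
    add_result(con, "REC_B", "CCO", "vina", 1, -6.5)
    for row in con.execute(BEST_SCORES):
        print(row)
```

Because receptors, ligands, and backends are stored once and referenced by id, adding a target or engine is an insert rather than a new directory tree, and cross-target comparison becomes a single `GROUP BY` query.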

If this is right

Docking results across different receptors and ligands become directly queryable without manual parsing of separate output files.
Provenance tracking is preserved through the full workflow so that any result can be traced back to its exact receptor preparation and docking settings.
Comparative analysis of multiple docking engines on the same campaign is supported by uniform postprocessing and storage.
Studies can be resumed or extended by adding new ligands or receptors to an existing campaign without rebuilding the entire workflow.
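Resuming or extending a campaign reduces to scheduling only the combinations not yet stored. A minimal sketch, assuming completed runs can be read back as (receptor, ligand, backend) triples; `pending_runs` is a hypothetical helper, not part of ProDock.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Job = Tuple[str, str, str]  # (receptor, ligand, backend)


def pending_runs(receptors: Iterable[str], ligands: Iterable[str],
                 backends: Iterable[str], completed: Set[Job]) -> List[Job]:
    """Return the receptor x ligand x backend triples that still need docking.

    `completed` holds triples already stored (for example, read back from the
    campaign database), so adding ligands or receptors schedules only new work.
    """
    return [job for job in product(receptors, ligands, backends) if job not in completed]


done = {(r, lig, "vina") for r in ["REC_A", "REC_B"] for lig in ["CCO", "c1ccccc1"]}
todo = pending_runs(["REC_A", "REC_B"], ["CCO", "c1ccccc1", "CC(=O)O"], ["vina"], done)
# todo == [("REC_A", "CC(=O)O", "vina"), ("REC_B", "CC(=O)O", "vina")]
```

Adding one ligand to a two-receptor campaign schedules two new runs and leaves the four finished ones untouched.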

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The campaign model could scale to large virtual screening collections by treating entire ligand libraries as single campaign inputs.
Database queries could be extended to feed directly into downstream machine-learning rescoring or binding-affinity prediction pipelines.
Audit logs stored in the same database might support regulatory or publication requirements for documenting docking protocols.

Load-bearing premise

That the integrated preprocessing, batch execution, and database insertion layers will function without additional user scripting for the supported docking backends and input formats.

What would settle it

A test in which a user supplies a supported receptor file and ligand set to ProDock and finds that custom scripts are still required to complete preprocessing or database insertion for any of the integrated backends.

Figures

Figures reproduced from arXiv: 2604.21828 by Lai Hoang Son Le, Nhu-Ngoc Nguyen Song, Thanh-An Pham, Tieu-Long Phan, Tuyen Ngoc Truong, Tuyet-Minh Phan.

**Figure 1.** Overview of the ProDock workflow. The package organizes docking into four connected stages: preprocessing of receptor and ligand inputs, docking execution, postprocessing of poses and interaction fingerprints, and an SQLite result repository for downstream querying and comparison.

… OpenMM [11] or Open Babel [3]. The prepared receptor is then converted into docking-compatible PDBQT using Open Babel or Meeko [12…
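The four stages in Figure 1 can be sketched as a chain of functions in which each stage consumes the previous stage's output. Everything here is hypothetical: the `Campaign` dataclass, its fields, and the stage functions are illustrative stand-ins that derive file names and job lists without calling any docking engine. The point is that the campaign is serialized to JSON once and every stage is driven from that single record.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Campaign:
    """Hypothetical many-to-many study definition."""
    name: str
    receptors: List[str]
    ligands: List[str]
    backends: List[str]
    settings: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Campaign":
        return cls(**json.loads(text))


def preprocess(c: Campaign) -> Dict[str, List[str]]:
    # stand-in for receptor cleaning and ligand embedding: only derives file names
    return {"receptors": [f"{r}.pdbqt" for r in c.receptors],
            "ligands": [f"lig{i}.pdbqt" for i in range(len(c.ligands))]}


def execute(c: Campaign, prepared: Dict[str, List[str]]) -> List[tuple]:
    # one job per receptor x ligand x backend combination (no engine is called)
    return [(r, lig, b) for r in prepared["receptors"]
            for lig in prepared["ligands"] for b in c.backends]


def postprocess(c: Campaign, jobs: List[tuple]) -> List[dict]:
    # stand-in for pose crawling, score extraction and fingerprinting
    return [{"receptor": r, "ligand": lig, "backend": b, "score": None} for r, lig, b in jobs]


def store(c: Campaign, rows: List[dict]) -> List[dict]:
    # stand-in for database insertion: tags each row with its campaign
    return [dict(row, campaign=c.name) for row in rows]


def run_campaign(serialized: str) -> List[dict]:
    c = Campaign.from_json(serialized)  # the serialized record drives every stage
    return store(c, postprocess(c, execute(c, preprocess(c))))
```

With two receptors, two ligands, and two backends, `run_campaign` returns eight rows, one per combination.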

Original abstract

Protein-ligand docking is widely used in structure-based discovery, but routine studies often fail at the workflow level rather than at the scoring level. Receptor cleaning, ligand preparation, file conversion, box definition, run organization, and downstream parsing are frequently handled by fragmented scripts, which reduces reproducibility, obscures provenance, and complicates comparative analysis across targets, ligands, and docking settings. We present ProDock, an open-source Python toolkit for reproducible protein-ligand docking and postprocessing. ProDock organizes application-oriented docking into four connected layers: receptor and ligand preprocessing, provenance-aware docking execution, postprocessing of poses and interaction fingerprints, and SQLite-backed storage for later querying. The package supports inputs ranging from PDB identifiers and local receptor files to SMILES strings and prepared ligand directories, and integrates receptor preparation, ligand preparation, reference-ligand-based box generation, campaign serialization, batch docking, pose crawling, score extraction, interaction profiling, and database insertion within a consistent project-local workflow. By representing studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends, ProDock converts fragmented engine-specific outputs into structured analytical results that are easier to compare, reuse, and audit. ProDock is implemented in Python and released under an open-source license at https://github.com/Medicine-Artificial-Intelligence/ProDock. Documentation is available at https://prodock.readthedocs.io/en/latest.
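Pose crawling and score extraction for Vina-family engines can rely on the `REMARK VINA RESULT:` line that AutoDock Vina writes for each pose in its output PDBQT: the affinity in kcal/mol followed by lower and upper RMSD bounds relative to the best pose. The parser below is a sketch under that assumption and is not ProDock's crawler; other backends in a campaign may report scores in different fields and would need their own parsers.

```python
from typing import List, NamedTuple


class PoseScore(NamedTuple):
    rank: int        # 1-based pose index in the output file
    affinity: float  # kcal/mol, lower is better
    rmsd_lb: float   # RMSD lower bound relative to the best pose
    rmsd_ub: float   # RMSD upper bound relative to the best pose


def crawl_vina_scores(pdbqt_text: str) -> List[PoseScore]:
    """Extract per-pose scores from the REMARK VINA RESULT lines of a Vina output PDBQT."""
    poses: List[PoseScore] = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # the three numbers after the colon: affinity, rmsd l.b., rmsd u.b.
            affinity, lb, ub = (float(v) for v in line.split(":", 1)[1].split()[:3])
            poses.append(PoseScore(len(poses) + 1, affinity, lb, ub))
    return poses
```

Each `PoseScore` maps directly onto one row of a results table, which is what makes scores from many output files comparable once they are stored together.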

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

ProDock wraps docking steps into campaigns with SQLite storage for better organization, but offers no comparisons or demos to prove it beats existing scripts or tools.


ProDock is a Python toolkit that models docking studies as explicit many-to-many campaigns linking receptors, ligands, and backends, then routes everything through preprocessing, provenance-aware runs, pose and fingerprint postprocessing, and SQLite storage. The central idea is to replace scattered scripts with a single project-local workflow that keeps results queryable and auditable later.

That structure is the main practical contribution here. It handles common inputs such as PDB identifiers, local receptor files, SMILES strings, and prepared ligand directories without forcing users to write their own converters for each step. The four-layer breakdown maps onto the usual pain points in routine docking campaigns, and the open-source release with documentation makes it straightforward to try. The campaign serialization and batch execution features could genuinely reduce the reproducibility problems that come from mixing different docking engines by hand. The description is internally consistent in how the layers connect and how provenance is tracked.

On the soft side, the paper stays at the level of architecture and API description. There are no usage examples, timing benchmarks, or side-by-side comparisons against KNIME, Galaxy, or even simple custom pipelines, so the claimed gains in ease of comparison and auditability remain untested in the text. It is also unclear how much extra setup the database layer requires for the supported backends, or whether real users would still need custom code for edge cases.

This is aimed at computational groups that run repeated multi-target docking and want less ad-hoc file management. A reader already building similar tooling would get value from the campaign schema and layer choices as a concrete reference point. It deserves peer review because the workflow problem is real and the proposed model is coherent, even if the full paper needs added examples and validation data to strengthen the case.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ProDock, an open-source Python toolkit for protein-ligand docking that organizes studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends. It structures the workflow into four layers—receptor/ligand preprocessing, provenance-aware execution, postprocessing of poses and interaction fingerprints, and SQLite-backed storage—while supporting inputs such as PDB identifiers, local files, and SMILES strings. The central claim is that this design converts fragmented, engine-specific outputs into structured, queryable, and auditable results to improve reproducibility, comparison, and reuse across targets and settings.

Significance. If the implementation matches the described architecture and functions without additional user scripting, ProDock could meaningfully address workflow fragmentation in structure-based discovery by enabling consistent multi-target campaigns and database-driven analysis. The explicit campaign model and provenance tracking represent a practical advance over ad-hoc scripts, with potential to support larger-scale comparative studies; the open-source release and linked documentation are positive factors for adoption.

major comments (2)

[Abstract and workflow layers description] The manuscript asserts that the four layers integrate preprocessing, batch execution, postprocessing, and database insertion within a consistent project-local workflow without additional user scripting for supported backends and formats, but provides no usage examples, code snippets, test cases, or validation data to demonstrate this integration (see the abstract and the description of the four connected layers). This leaves the weakest assumption untested and undermines assessment of the claimed reproducibility and provenance benefits.
[Description of campaign model and storage] The central claim that ProDock converts fragmented outputs into structured analytical results easier to compare, reuse, and audit rests on the campaign serialization and SQLite schema, yet the manuscript contains no empirical illustration (e.g., a multi-receptor query example or audit trail) showing these advantages in practice.

minor comments (1)

A workflow diagram or pseudocode example illustrating how a campaign object is serialized and passed through the four layers would improve clarity of the API and data flow.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and positive evaluation of ProDock's potential impact. We have carefully considered the major comments and will revise the manuscript to include the requested demonstrations of workflow integration and empirical examples.

Point-by-point responses

Referee: [Abstract and workflow layers description] The manuscript asserts that the four layers integrate preprocessing, batch execution, postprocessing, and database insertion within a consistent project-local workflow without additional user scripting for supported backends and formats, but provides no usage examples, code snippets, test cases, or validation data to demonstrate this integration (see the abstract and the description of the four connected layers). This leaves the weakest assumption untested and undermines assessment of the claimed reproducibility and provenance benefits.

Authors: We agree with this observation. The current version of the manuscript describes the architecture but does not include concrete usage examples or code to illustrate the end-to-end workflow. In the revised manuscript, we will add a dedicated section with code snippets demonstrating a full campaign setup, execution, and database storage for supported backends, including a basic validation example that shows consistent results across runs. This addition will allow readers to assess the integration and reproducibility claims directly. revision: yes
Referee: [Description of campaign model and storage] The central claim that ProDock converts fragmented outputs into structured analytical results easier to compare, reuse, and audit rests on the campaign serialization and SQLite schema, yet the manuscript contains no empirical illustration (e.g., a multi-receptor query example or audit trail) showing these advantages in practice.

Authors: We acknowledge that an empirical illustration is necessary to substantiate the advantages of the campaign model and storage layer. We will incorporate into the revised manuscript an example of querying the SQLite database for results across multiple receptors, including extraction of an audit trail via provenance information. This will demonstrate in practice how the structured storage facilitates comparison and reuse. revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper is a software description of the ProDock Python toolkit for organizing protein-ligand docking studies as many-to-many campaigns across receptors, ligands, and backends, with layers for preprocessing, provenance-aware execution, postprocessing, and SQLite storage. No equations, derivations, predictions, fitted parameters, or mathematical claims appear anywhere in the text. The central claim that this structure converts fragmented outputs into queryable results follows directly from the presented API, campaign serialization, and schema without any reduction to self-definitional inputs, self-citations, or imported ansatzes. All described functionality is presented as an independent implementation detail rather than a derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented scientific entities; the work is a software engineering contribution rather than a theoretical or empirical claim.



Reference graph

Works this paper leans on

14 extracted references

[1] Benjamin J. Bender et al. A practical guide to large-scale docking. Nature Protocols, 2021.
[2] RDKit: Open-source cheminformatics. https://www.rdkit.org, 2025.
[3] Noel M. O’Boyle, Michael Banck, Craig A. James, Chris Morley, Tim Vandermeersch, and Geoffrey R. Hutchison. Open Babel: An open chemical toolbox. Journal of Cheminformatics, 3:33, 2011.
[4] David R. Koes, Matthew P. Baumgartner, and Carlos J. Camacho. Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. Journal of Chemical Information and Modeling, 53(8):1893–1904, 2013.
[5] Amr Alhossary, Syaifie Handoko, Yuguang Mu, and Chee Keong Kwoh. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics, 31(13):2214–2216, 2015.
[6] Andrew T. McNutt, Paul Francoeur, Rishal Aggarwal, Tomohide Masuda, Rocco Meli, Matthew Ragoza, Jocelyn Sunseri, and David Ryan Koes. GNINA 1.0: molecular docking with deep learning. Journal of Cheminformatics, 13(1):43, 2021.
[7] Jeff Guo, Jon Paul Janet, Matthias R. Bauer, Eva Nittinger, Kathryn A. Giblin, Kostas Papadopoulos, Alexey Voronov, Atanas Patronov, Ola Engkvist, and Christian Margreitter. DockStream: a docking wrapper to enhance de novo molecular design. Journal of Cheminformatics, 13(1):89, 2021.
[8] Guzel Minibaeva, Aleksandra Ivanova, and Pavel Polishchuk. EasyDock: customizable and scalable docking tool. Journal of Cheminformatics, 15(1):102, 2023.
[9] David E. Graff and Connor W. Coley. pyscreener: A Python wrapper for computational docking software. Journal of Open Source Software, 7(71):3950, 2022.
[10] Peter Eastman, Jason Swails, John D. Chodera, Robert T. McGibbon, Yutong Zhao, Kyle A. Beauchamp, Lee-Ping Wang, Andrew C. Simmonett, Matthew P. Harrigan, Chaya D. Stern, Rafal P. Wiewiora, Bernard R. Brooks, and Vijay S. Pande. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology, 13(7):e1005659, 2017.
[11] Peter Eastman et al. OpenMM 8: Molecular dynamics simulation with machine learning potentials. Journal of Physical Chemistry B, 128(1):109–116, 2023.
[12] Diogo Santos-Martins, Yiran He, Jérôme Eberhardt, Parnika Sharma, et al. Meeko: Molecule parameterization and software interoperability for docking and beyond. Journal of Chemical Information and Modeling, 65(24), 2025.
[13] Cédric Bouysset and Sébastien Fiorucci. ProLIF: a library to encode molecular interactions as fingerprints. Journal of Cheminformatics, 13(1):72, 2021.
[14] Jerome Eberhardt, Diogo Santos-Martins, Andreas F. Tillack, and Stefano Forli. AutoDock Vina 1.2.0: New docking methods, expanded force field, and Python bindings. Journal of Chemical Information and Modeling, 61(8):3891–3898, 2021.