MOD-Finder: Identify multi-omics data sets related to defined chemical exposure

J. Hackermüller; J. Schor; S. Canzler

arxiv: 1907.06346 · v1 · pith:5RT2AGWXnew · submitted 2019-07-15 · 🧬 q-bio.QM

S. Canzler, J. Hackermüller, J. Schor

Pith reviewed 2026-05-24 21:24 UTC · model grok-4.3

classification 🧬 q-bio.QM

keywords multi-omics · chemical exposure · data integration · omics databases · web application · chemical identifiers · toxicology

The pith

MOD-Finder automates searches across public databases for multi-omics data sets related to a given chemical exposure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Finding multi-omics data on cellular responses to a specific chemical is tedious because the data live in separate repositories and chemicals carry multiple names and identifier systems that do not map uniquely. MOD-Finder lets a user supply a chemical name or identifier and then queries several public databases automatically to retrieve matching transcriptome, proteome, and other omics data sets. Returned results are shown in a simple list that also includes biological effects presumed to be triggered by the chemical. The service is delivered as a free web application written in R and Shiny, with the source code released under an open license. A sympathetic reader would care because the automation removes the need for repeated manual cross-checks that currently slow down efforts to combine different omics layers for exposure studies.

Core claim

The authors present MOD-Finder, a web application that searches multiple public repositories for omics data sets associated with exposure to a user-specified chemical, returning the data sets together with assumed effect information in an accessible format.

What carries the argument

MOD-Finder, an R Shiny web service that maps chemical identifiers and queries omics databases automatically.
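
As a rough illustration of the "query several databases automatically" step, the sketch below sends one search term to several repositories in parallel and groups the hits by omics layer. It is not MOD-Finder's code (the tool itself is written in R with Shiny). The two repository functions, the `REPOSITORIES` table, and the returned record strings are hypothetical placeholders standing in for real repository clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for repository clients. A real tool would call each
# repository's web API here; the returned records are made-up placeholders.
def search_transcriptome_repo(term: str) -> List[str]:
    return [f"transcriptome-record-for-{term}"]


def search_proteome_repo(term: str) -> List[str]:
    return [f"proteome-record-for-{term}"]


# Assumed mapping from omics layer to the function that searches it.
REPOSITORIES: Dict[str, Callable[[str], List[str]]] = {
    "transcriptome": search_transcriptome_repo,
    "proteome": search_proteome_repo,
}


def find_datasets(term: str) -> Dict[str, List[str]]:
    """Send the same search term to every repository and group hits by layer."""
    with ThreadPoolExecutor() as pool:
        futures = {layer: pool.submit(fn, term) for layer, fn in REPOSITORIES.items()}
        return {layer: future.result() for layer, future in futures.items()}


if __name__ == "__main__":
    for layer, hits in find_datasets("example-chemical").items():
        print(layer, hits)
```

Keeping the repositories in a lookup table means a further database or omics layer can be added without touching the fan-out logic.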

If this is right

Researchers can locate related multi-omics data sets without performing repeated manual cross-database searches.
Integration of transcriptome, proteome, and other layers for a given chemical exposure becomes more feasible.
Users receive effect information alongside the data-set listings to guide further analysis.
Both web access and local reuse are supported through the open-source release.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The mapping step may reduce naming inconsistencies that currently hinder large-scale comparisons in toxicology.
The service could be extended to additional databases or omics types without changing its core workflow.
Automated retrieval might enable systematic meta-analyses of chemical-response patterns across studies.

Load-bearing premise

Chemical names and identifiers can be mapped reliably enough across databases to retrieve the relevant data sets without missing important matches or including many false ones.
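
To make the premise concrete, here is a minimal sketch of name-to-identifier resolution that flags ambiguous names instead of silently picking one match. The synonym table, the `ID-001`/`ID-002` identifiers, and the function names are all hypothetical; the paper does not say which synonym resources or conflict rules MOD-Finder uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical synonym table of (name, identifier) pairs. The identifiers are
# placeholders, not real registry numbers.
SYNONYMS: List[Tuple[str, str]] = [
    ("compound a", "ID-001"),
    ("cmpd-a", "ID-001"),
    ("compound b", "ID-002"),
    ("shared trade name", "ID-001"),
    ("shared trade name", "ID-002"),
]


def build_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each normalized name to the set of identifiers it may refer to."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, identifier in pairs:
        index[name.strip().lower()].add(identifier)
    return index


def resolve(name: str, index: Dict[str, Set[str]]) -> Tuple[List[str], bool]:
    """Return candidate identifiers and whether the name is ambiguous."""
    candidates = sorted(index.get(name.strip().lower(), set()))
    return candidates, len(candidates) > 1


if __name__ == "__main__":
    index = build_index(SYNONYMS)
    print(resolve("CMPD-A", index))
    print(resolve("Shared Trade Name", index))
```

Returning the ambiguity flag alongside the candidates lets an interface ask the user to choose, which is one possible way to handle a name that does not map uniquely.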

What would settle it

Running the tool on a well-studied chemical such as bisphenol A and checking whether all known published omics data sets appear in the results or whether many irrelevant ones are returned due to identifier mismatches.
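
The check described above amounts to comparing the returned data sets against a manually curated list. A minimal sketch, assuming such a curated list exists, computes precision (share of returned data sets that are relevant) and recall (share of known data sets that were returned). The data set labels are invented for illustration and do not refer to real accessions.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Compare tool output (retrieved) with a curated reference list (relevant)."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels: what the tool returned vs. what a curator expects.
    retrieved = {"DS1", "DS2", "DS3", "DS4"}
    relevant = {"DS1", "DS2", "DS5"}
    precision, recall = precision_recall(retrieved, relevant)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low recall would point to missed matches (for example, a synonym that was never tried), while low precision would point to irrelevant hits from identifier mismatches.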

Original abstract

Summary: Integration of multi-omics data on chemical exposure of cells or organisms promises a more complete representation of the responding pathways than single omics data. Data of different omics layers, like transcriptome or proteome is deposited in different repositories. Additionally, precisely specifying a chemical of interest that was used in the exposure experiments suffers from different nomenclatures and non-uniquely mapping of chemical identifiers. The manual search for corresponding omics data sets of different layers for exposure with a chemical of interest is thus a tedious task. We have developed MOD-Finder (Multi-Omics Data set Finder) to efficiently search for chemical-related omics data sets in several publicly available databases in an automated manner. A plain and simple presentation of the returned omics data sets is augmented with effect information that are assumed to be triggered by the chemical of interest.

Availability and Implementation: MOD-Finder is implemented in R using the Shiny package. The web service is available at https://webapp.ufz.de/mod_finder and the source code under the GNU GPL v3 license at https://github.com/yigbt/MOD-Finder.

Supplementary information: Supplementary data are available at https://www.ufz.de/index.php?en=44919

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

MOD-Finder is a simple Shiny wrapper for querying omics databases on chemical exposures, released with code but without any accuracy checks.

The paper introduces MOD-Finder, an R/Shiny web tool that takes a chemical name or ID and pulls related multi-omics datasets from several public repositories, then displays them with some added effect annotations. The code is on GitHub under GPL and a live instance is hosted at UFZ. That is the actual contribution: a domain-specific convenience layer on top of existing databases rather than new data or new algorithms. The open release is the part that gives it immediate value; anyone can run or modify it without starting from scratch.

The motivation section correctly identifies the real friction: different chemical nomenclatures and non-unique mappings make manual cross-database searches slow. Automating the query step addresses that friction at the level of effort, even if it does not change the underlying data quality.

The soft spot is the complete lack of evaluation. The abstract itself calls out non-unique identifier mappings as the central manual-search problem, yet the description gives no account of which synonym tables are used, how conflicts are handled, or any precision/recall numbers on ambiguous cases. If the tool simply forwards the user string to each database, the same mapping headaches remain and the efficiency gain is only in keystrokes. No benchmark against manual search or against other existing query tools is reported, so soundness rests entirely on the public code rather than demonstrated behavior.

This is a methods/tool paper aimed at toxicologists and systems biologists who routinely need to locate existing exposure datasets. A reader who wants a ready-made interface might download it and test it themselves. A reader who needs evidence that the results are reliable or complete will not find it here.

I would send it for peer review. The implementation is concrete, the code is available for inspection, and referees can ask for the missing mapping details and validation in revision. It is not a high-impact result, but it is a usable piece of infrastructure that belongs in the methods literature once the gaps are addressed.

Referee Report

2 major / 1 minor

Summary. The manuscript describes MOD-Finder, an R/Shiny web application that automates searches across public omics repositories (e.g., GEO, ArrayExpress, PRIDE) for datasets linked to a user-specified chemical exposure. It returns matching multi-omics records augmented with effect annotations assumed to be triggered by the chemical, addressing the tedium of manual searches caused by inconsistent chemical nomenclatures and non-unique identifier mappings. The tool and source code are made publicly available.

Significance. If the mapping and search logic function as intended, the tool could reduce the manual effort required to locate multi-omics data for chemical-exposure studies and thereby support pathway-level integration across omics layers. The public release of both the web service and GPL-licensed source code is a clear strength that enables reproducibility and community extension.

major comments (2)

[Abstract] The manuscript explicitly identifies non-unique chemical-identifier mappings as the central obstacle to manual search, yet supplies no description of the synonym-resolution procedure (which synonym databases are queried, how conflicts are ranked or flagged to the user, or whether the tool simply forwards the raw input string). Without this, the automation claim reduces to parallel query submission rather than resolution of the stated problem.
No section provides quantitative validation (precision, recall, or coverage) of search results on a benchmark set of chemicals with known ambiguous names; soundness therefore rests solely on the code release rather than demonstrated performance on the core use case.

minor comments (1)

The supplementary information link is given only as a UFZ institutional page; a direct DOI or stable archive of the example queries and output would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and note planned revisions.

Referee: [Abstract] The manuscript explicitly identifies non-unique chemical-identifier mappings as the central obstacle to manual search, yet supplies no description of the synonym-resolution procedure (which synonym databases are queried, how conflicts are ranked or flagged to the user, or whether the tool simply forwards the raw input string). Without this, the automation claim reduces to parallel query submission rather than resolution of the stated problem.

Authors: We agree the abstract omits details on synonym resolution. The manuscript body references public chemical databases to handle nomenclature issues, but we will revise the abstract to briefly describe the mapping approach and expand the methods to specify queried resources (e.g., PubChem, ChEBI) and conflict-handling logic.
Revision: yes

Referee: [-] No section provides quantitative validation (precision, recall, or coverage) of search results on a benchmark set of chemicals with known ambiguous names; soundness therefore rests solely on the code release rather than demonstrated performance on the core use case.

Authors: We acknowledge that a benchmark evaluation would strengthen claims. However, constructing a gold-standard set of chemical-omics associations requires substantial manual curation not available in existing resources and lies outside the scope of this tool-description paper. The GPL-licensed code enables independent assessment. We will add a discussion paragraph on this limitation and avenues for future validation.
Revision: partial

Circularity Check

0 steps flagged

No circularity; software tool description without derivations or self-referential claims

The paper describes the implementation of MOD-Finder, an R/Shiny web tool for automated querying of public omics repositories using chemical identifiers. No equations, fitted parameters, predictions, uniqueness theorems, or ansatzes appear. The abstract and implementation section state the problem of non-unique chemical mappings but present the tool as a convenience layer for query submission and result display; no claim is made that the tool resolves ambiguities via any derived procedure. No self-citations are load-bearing. The contribution is self-contained as a software artifact and does not reduce any result to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper contributes only a software implementation that integrates existing public databases; no free parameters, mathematical axioms, or new postulated entities are introduced.

pith-pipeline@v0.9.0 · 5760 in / 915 out tokens · 23140 ms · 2026-05-24T21:24:30.126841+00:00 · methodology
