PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability

Federico J. Gonzalez

arxiv: 2606.05191 · v1 · pith:PTQXBSXSnew · submitted 2026-05-07 · 💻 cs.LG · eess.SP

PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability

Federico J. Gonzalez This is my paper

Pith reviewed 2026-06-30 23:05 UTC · model grok-4.3

classification 💻 cs.LG eess.SP

keywords equation discoveryODEstructural identifiabilityPython libraryhypothesis-driventime-seriesskeletondifferential equations

0 comments

The pith

PyCC is a Python library for hypothesis-driven ODE discovery using structural skeletons with identifiability properties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PyCC as a tool to address the ill-conditioned inverse problem of discovering ODEs from time-series data by allowing definition of structural skeletons. These skeletons represent families of equations and can incorporate user hypotheses and priors iteratively. Some skeletons offer structural identifiability, enabling checks on whether the skeleton is suitable. The library's modularity supports different discovery methods like neural networks and symbolic regression.

Core claim

In this work, we present the Python library PyCC, which condenses these efforts into a flexible tool that allows researchers and engineers to seamlessly define their skeletons and hypotheses to discover ODEs from time-dependent data. The methodology uses structural skeletons inspired by characteristic curves to define a hypothesis-driven approach where identifiability properties help validate the skeleton.

What carries the argument

The structural skeleton associated with a family of ODEs, enabling hypothesis incorporation and providing structural identifiability properties for validation.

If this is right

Users can check if a skeleton is correct or should be discarded using identifiability properties.
The search space is reduced by incorporating domain knowledge as priors.
Multiple equation discovery paradigms can be used due to modularity.
Iterative refinement of models is supported based on hypotheses.
The ill-conditioned nature of the inverse problem is mitigated by pre-defined constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might extend to discovering other types of equations beyond ODEs.
It could connect to identifiability analysis in parameter estimation problems.
Testing on standard benchmark datasets for equation discovery could validate the identifiability checks.
Automation of hypothesis suggestion could be a natural next step.

Load-bearing premise

Some skeletons have demonstrable structural identifiability properties that allow checking whether the skeleton is correct or should be discarded.

What would settle it

An experiment showing that identifiability analysis on a skeleton fails to distinguish between correct and incorrect models when applied to time-series data from a known system.

Figures

Figures reproduced from arXiv: 2606.05191 by Federico J. Gonzalez.

read the original abstract

Data-driven equation discovery is fundamentally an inverse problem that seeks to infer the governing differential equations of a system directly from time-series measurements. A known issue is the ill-conditioned nature of the inverse problem, which frequently produces multiple mathematical models that fit the data similarly well. One path to address this issue is by incorporating known hypotheses and constraints into the training phase beforehand. While this approach effectively reduces the search space, it still results in multiple candidate models, forcing practitioners to rely on post-hoc manual filtering based on their own domain expertise. A recent approach incorporates structural `skeletons' inspired by characteristic curves (CCs), defining a hypothesis-driven methodology. In this methodology, practitioners define a skeleton, which is associated with a family of ordinary differential equations (ODEs), and then add their hypotheses and priors based on their domain knowledge to refine the obtained model iteratively. An important advantage of this approach is that some skeletons have demonstrable structural identifiability properties, which are useful for checking whether the skeleton is correct or should be discarded. Furthermore, this formalism enables the use of multiple equation discovery paradigms due to its modularity (such as neural networks, symbolic regression, and sparse regression). In this work, we present the Python library PyCC, which condenses these efforts into a flexible tool that allows researchers and engineers to seamlessly define their skeletons and hypotheses to discover ODEs from time-dependent data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PyCC is a software package that wraps a recent skeleton-based method for ODE discovery, mainly adding convenience and modularity rather than new theory or results.

read the letter

The core of this paper is the release of PyCC, a Python library that lets users define structural skeletons for families of ODEs, fold in domain hypotheses, and apply identifiability checks to narrow down candidates from time-series data. It also supports swapping in different discovery engines.

The main thing it does is take an approach described in prior work and turn it into a single, modular tool. That modularity is a practical plus: the same skeleton setup can feed into neural networks, symbolic regression, or sparse methods without rewriting the hypothesis layer each time. The identifiability angle is borrowed from the skeletons themselves, which can sometimes be checked for structural properties before fitting.

The write-up stays high-level. It explains the workflow but does not include any concrete examples, fitted models, error metrics, or side-by-side comparisons with existing libraries. Without those, it is difficult to judge how much the identifiability step actually reduces the number of plausible models in practice or how robust the implementation is across different datasets.

This is aimed at researchers already working on data-driven ODE discovery who need a ready framework for incorporating priors. Someone looking for new mathematical results or large-scale benchmarks will not find them here.

The paper is coherent on its own terms and shows clear thinking about the workflow, even if the contribution is incremental. I would send it for peer review rather than desk-reject it, mainly so referees can check the code quality, documentation, and whether the examples (if added) demonstrate real utility. A software paper in this niche can still be worth archiving if the tool is usable and the claims about modularity hold up under inspection.

Referee Report

0 major / 3 minor

Summary. The manuscript presents PyCC (also styled PyCC.id), a Python library for hypothesis-driven discovery of ODEs from time-series data. Users define structural skeletons that represent families of ODEs, incorporate domain hypotheses and priors, and exploit structural identifiability properties of certain skeletons to prune or validate candidate models. The formalism is modular, supporting multiple discovery back-ends including neural networks, symbolic regression, and sparse regression.

Significance. If the implementation correctly delivers the advertised modularity and identifiability checks, the package would supply a practical, extensible framework that integrates prior knowledge directly into the discovery pipeline, potentially mitigating the ill-posedness of inverse problems in equation discovery. The explicit support for structural identifiability as a model-selection criterion is a distinctive feature that could improve reliability over purely data-driven approaches.

minor comments (3)

The title uses 'PyCC.id' while the abstract and body consistently refer to 'PyCC'; the naming convention should be unified and explained.
The abstract asserts that 'some skeletons have demonstrable structural identifiability properties' and that the formalism 'enables the use of multiple equation discovery paradigms,' yet provides no concrete code examples, skeleton definitions, or usage snippets to illustrate these features.
No installation instructions, dependency list, or link to the source repository appear in the provided text; these are standard requirements for a software-package manuscript.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the package's potential significance, and recommendation for minor revision. The report contains no specific major comments requiring point-by-point responses.

Circularity Check

0 steps flagged

No circularity: software library description with no derivations

full rationale

The manuscript presents PyCC as a modular Python package for users to define ODE skeletons and hypotheses. No equations, fitted parameters, predictions, or derivation steps appear in the abstract or description. The statement that 'some skeletons have demonstrable structural identifiability properties' is presented as a pre-existing advantage of the approach rather than a result derived or fitted within this paper. No self-citation chains, ansatzes, or renamings are invoked as load-bearing steps. The work is self-contained as a tool release and contains no internal reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms or invented entities; the contribution is a software library implementing an existing approach.

pith-pipeline@v0.9.1-grok · 5777 in / 999 out tokens · 25773 ms · 2026-06-30T23:05:06.867263+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

14 extracted references · 11 canonical work pages · 2 internal anchors

[1]

Scientific machine learning

Felix Dietrich and Wil Schilders. “Scientific machine learning”. In:Mathematische Semester- berichte72.2 (2025), pp. 89–115.issn: 1432-1815.doi:10.1007/s00591-025-00399-4

work page doi:10.1007/s00591-025-00399-4 2025
[2]

Carola-Bibiane Sch¨ onlieb and Zakhar Shumaylov.Data-driven approaches to inverse problems
[3]

arXiv:2506.11732 [math.NA].url:https://arxiv.org/abs/2506.11732

work page arXiv
[4]

Determination of the characteristic curves of a nonlinear first order system from Fourier analysis

Federico J. Gonzalez. “Determination of the characteristic curves of a nonlinear first order system from Fourier analysis”. en. In:Sci. Rep.13.1 (Feb. 2023), p. 1955.doi:10.1038/s41598-023- 29151-5

work page doi:10.1038/s41598-023- 2023
[5]

System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems

F. J. Gonzalez. “System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems”. In:Nonlinear Dyn. 112.18 (July 2024), pp. 16167–16197.issn: 1573-269X.doi:10.1007/s11071-024-09890-4

work page doi:10.1007/s11071-024-09890-4 2024
[6]

Interpretable neural network system identification method for two families of second-order systems based on characteristic curves

Federico J. Gonzalez and Luis P. Lara. “Interpretable neural network system identification method for two families of second-order systems based on characteristic curves”. In:Nonlin- ear Dyn.113.24 (Sept. 2025), pp. 33063–33086.issn: 1573-269X.doi:10.1007/s11071- 025- 11744-6

work page doi:10.1007/s11071- 2025
[7]

Integrating prior knowledge in equation discovery: Interpretable symmetry- informed neural networks and symbolic regression via characteristic curves

Federico J. Gonzalez. “Integrating prior knowledge in equation discovery: Interpretable symmetry- informed neural networks and symbolic regression via characteristic curves”. In: (2026). arXiv: 2601.21720 [nlin.CD].url:https://arxiv.org/abs/2601.21720

work page arXiv 2026
[8]

Nayfeh and P

Ali H. Nayfeh and P. Frank Pai.Linear and Nonlinear Structural Mechanics. New York, NY: John Wiley & Sons, Aug. 2004.doi:10.1002/9783527617562

work page doi:10.1002/9783527617562 2004
[9]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems

Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. “Discovering governing equations from data by sparse identification of nonlinear dynamical systems”. In:Proc. Natl. Acad. Sci. 113.15 (2016), pp. 3932–3937.doi:10.1073/pnas.1517384113

work page doi:10.1073/pnas.1517384113 2016
[10]

Miles Cranmer.Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
[11]

arXiv:2305.01582 [astro-ph.IM]

work page internal anchor Pith review Pith/arXiv arXiv
[12]

Ricky T. Q. Chen et al.Neural Ordinary Differential Equations. 2018. arXiv:1806 . 07366 [cs.LG].url:https://arxiv.org/abs/1806.07366

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

Raissi, P

M. Raissi, P. Perdikaris, and G.E. Karniadakis. “Physics-informed neural networks: A deep learn- ing framework for solving forward and inverse problems involving nonlinear partial differential equations”. In:Journal of Computational Physics378 (2019), pp. 686–707.issn: 0021-9991.doi: 10.1016/j.jcp.2018.10.045

work page doi:10.1016/j.jcp.2018.10.045 2019
[14]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In:NeurIPS 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 8024–8035. 11

2019

[1] [1]

Scientific machine learning

Felix Dietrich and Wil Schilders. “Scientific machine learning”. In:Mathematische Semester- berichte72.2 (2025), pp. 89–115.issn: 1432-1815.doi:10.1007/s00591-025-00399-4

work page doi:10.1007/s00591-025-00399-4 2025

[2] [2]

Carola-Bibiane Sch¨ onlieb and Zakhar Shumaylov.Data-driven approaches to inverse problems

[3] [3]

arXiv:2506.11732 [math.NA].url:https://arxiv.org/abs/2506.11732

work page arXiv

[4] [4]

Determination of the characteristic curves of a nonlinear first order system from Fourier analysis

Federico J. Gonzalez. “Determination of the characteristic curves of a nonlinear first order system from Fourier analysis”. en. In:Sci. Rep.13.1 (Feb. 2023), p. 1955.doi:10.1038/s41598-023- 29151-5

work page doi:10.1038/s41598-023- 2023

[5] [5]

System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems

F. J. Gonzalez. “System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems”. In:Nonlinear Dyn. 112.18 (July 2024), pp. 16167–16197.issn: 1573-269X.doi:10.1007/s11071-024-09890-4

work page doi:10.1007/s11071-024-09890-4 2024

[6] [6]

Interpretable neural network system identification method for two families of second-order systems based on characteristic curves

Federico J. Gonzalez and Luis P. Lara. “Interpretable neural network system identification method for two families of second-order systems based on characteristic curves”. In:Nonlin- ear Dyn.113.24 (Sept. 2025), pp. 33063–33086.issn: 1573-269X.doi:10.1007/s11071- 025- 11744-6

work page doi:10.1007/s11071- 2025

[7] [7]

Integrating prior knowledge in equation discovery: Interpretable symmetry- informed neural networks and symbolic regression via characteristic curves

Federico J. Gonzalez. “Integrating prior knowledge in equation discovery: Interpretable symmetry- informed neural networks and symbolic regression via characteristic curves”. In: (2026). arXiv: 2601.21720 [nlin.CD].url:https://arxiv.org/abs/2601.21720

work page arXiv 2026

[8] [8]

Nayfeh and P

Ali H. Nayfeh and P. Frank Pai.Linear and Nonlinear Structural Mechanics. New York, NY: John Wiley & Sons, Aug. 2004.doi:10.1002/9783527617562

work page doi:10.1002/9783527617562 2004

[9] [9]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems

Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. “Discovering governing equations from data by sparse identification of nonlinear dynamical systems”. In:Proc. Natl. Acad. Sci. 113.15 (2016), pp. 3932–3937.doi:10.1073/pnas.1517384113

work page doi:10.1073/pnas.1517384113 2016

[10] [10]

Miles Cranmer.Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

[11] [11]

arXiv:2305.01582 [astro-ph.IM]

work page internal anchor Pith review Pith/arXiv arXiv

[12] [12]

Ricky T. Q. Chen et al.Neural Ordinary Differential Equations. 2018. arXiv:1806 . 07366 [cs.LG].url:https://arxiv.org/abs/1806.07366

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Raissi, P

M. Raissi, P. Perdikaris, and G.E. Karniadakis. “Physics-informed neural networks: A deep learn- ing framework for solving forward and inverse problems involving nonlinear partial differential equations”. In:Journal of Computational Physics378 (2019), pp. 686–707.issn: 0021-9991.doi: 10.1016/j.jcp.2018.10.045

work page doi:10.1016/j.jcp.2018.10.045 2019

[14] [14]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In:NeurIPS 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 8024–8035. 11

2019