PyCC.id: A package for hypothesis-driven equation discovery with structural identifiability
Pith reviewed 2026-06-30 23:05 UTC · model grok-4.3
The pith
PyCC is a Python library for hypothesis-driven ODE discovery using structural skeletons with identifiability properties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In this work, we present the Python library PyCC, which condenses these efforts into a flexible tool that allows researchers and engineers to seamlessly define their skeletons and hypotheses to discover ODEs from time-dependent data. The methodology uses structural skeletons inspired by characteristic curves to define a hypothesis-driven approach where identifiability properties help validate the skeleton.
What carries the argument
The structural skeleton associated with a family of ODEs, enabling hypothesis incorporation and providing structural identifiability properties for validation.
If this is right
- Users can check if a skeleton is correct or should be discarded using identifiability properties.
- The search space is reduced by incorporating domain knowledge as priors.
- Multiple equation discovery paradigms can be used due to modularity.
- Iterative refinement of models is supported based on hypotheses.
- The ill-conditioned nature of the inverse problem is mitigated by pre-defined constraints.
Where Pith is reading between the lines
- This approach might extend to discovering other types of equations beyond ODEs.
- It could connect to identifiability analysis in parameter estimation problems.
- Testing on standard benchmark datasets for equation discovery could validate the identifiability checks.
- Automation of hypothesis suggestion could be a natural next step.
Load-bearing premise
Some skeletons have demonstrable structural identifiability properties that allow checking whether the skeleton is correct or should be discarded.
What would settle it
An experiment showing that identifiability analysis on a skeleton fails to distinguish between correct and incorrect models when applied to time-series data from a known system.
Figures
read the original abstract
Data-driven equation discovery is fundamentally an inverse problem that seeks to infer the governing differential equations of a system directly from time-series measurements. A known issue is the ill-conditioned nature of the inverse problem, which frequently produces multiple mathematical models that fit the data similarly well. One path to address this issue is by incorporating known hypotheses and constraints into the training phase beforehand. While this approach effectively reduces the search space, it still results in multiple candidate models, forcing practitioners to rely on post-hoc manual filtering based on their own domain expertise. A recent approach incorporates structural `skeletons' inspired by characteristic curves (CCs), defining a hypothesis-driven methodology. In this methodology, practitioners define a skeleton, which is associated with a family of ordinary differential equations (ODEs), and then add their hypotheses and priors based on their domain knowledge to refine the obtained model iteratively. An important advantage of this approach is that some skeletons have demonstrable structural identifiability properties, which are useful for checking whether the skeleton is correct or should be discarded. Furthermore, this formalism enables the use of multiple equation discovery paradigms due to its modularity (such as neural networks, symbolic regression, and sparse regression). In this work, we present the Python library PyCC, which condenses these efforts into a flexible tool that allows researchers and engineers to seamlessly define their skeletons and hypotheses to discover ODEs from time-dependent data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents PyCC (also styled PyCC.id), a Python library for hypothesis-driven discovery of ODEs from time-series data. Users define structural skeletons that represent families of ODEs, incorporate domain hypotheses and priors, and exploit structural identifiability properties of certain skeletons to prune or validate candidate models. The formalism is modular, supporting multiple discovery back-ends including neural networks, symbolic regression, and sparse regression.
Significance. If the implementation correctly delivers the advertised modularity and identifiability checks, the package would supply a practical, extensible framework that integrates prior knowledge directly into the discovery pipeline, potentially mitigating the ill-posedness of inverse problems in equation discovery. The explicit support for structural identifiability as a model-selection criterion is a distinctive feature that could improve reliability over purely data-driven approaches.
minor comments (3)
- The title uses 'PyCC.id' while the abstract and body consistently refer to 'PyCC'; the naming convention should be unified and explained.
- The abstract asserts that 'some skeletons have demonstrable structural identifiability properties' and that the formalism 'enables the use of multiple equation discovery paradigms,' yet provides no concrete code examples, skeleton definitions, or usage snippets to illustrate these features.
- No installation instructions, dependency list, or link to the source repository appear in the provided text; these are standard requirements for a software-package manuscript.
Simulated Author's Rebuttal
We thank the referee for their positive summary, recognition of the package's potential significance, and recommendation for minor revision. The report contains no specific major comments requiring point-by-point responses.
Circularity Check
No circularity: software library description with no derivations
full rationale
The manuscript presents PyCC as a modular Python package for users to define ODE skeletons and hypotheses. No equations, fitted parameters, predictions, or derivation steps appear in the abstract or description. The statement that 'some skeletons have demonstrable structural identifiability properties' is presented as a pre-existing advantage of the approach rather than a result derived or fitted within this paper. No self-citation chains, ansatzes, or renamings are invoked as load-bearing steps. The work is self-contained as a tool release and contains no internal reduction of outputs to inputs.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Felix Dietrich and Wil Schilders. “Scientific machine learning”. In:Mathematische Semester- berichte72.2 (2025), pp. 89–115.issn: 1432-1815.doi:10.1007/s00591-025-00399-4
-
[2]
Carola-Bibiane Sch¨ onlieb and Zakhar Shumaylov.Data-driven approaches to inverse problems
- [3]
-
[4]
Determination of the characteristic curves of a nonlinear first order system from Fourier analysis
Federico J. Gonzalez. “Determination of the characteristic curves of a nonlinear first order system from Fourier analysis”. en. In:Sci. Rep.13.1 (Feb. 2023), p. 1955.doi:10.1038/s41598-023- 29151-5
-
[5]
F. J. Gonzalez. “System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems”. In:Nonlinear Dyn. 112.18 (July 2024), pp. 16167–16197.issn: 1573-269X.doi:10.1007/s11071-024-09890-4
-
[6]
Federico J. Gonzalez and Luis P. Lara. “Interpretable neural network system identification method for two families of second-order systems based on characteristic curves”. In:Nonlin- ear Dyn.113.24 (Sept. 2025), pp. 33063–33086.issn: 1573-269X.doi:10.1007/s11071- 025- 11744-6
-
[7]
Federico J. Gonzalez. “Integrating prior knowledge in equation discovery: Interpretable symmetry- informed neural networks and symbolic regression via characteristic curves”. In: (2026). arXiv: 2601.21720 [nlin.CD].url:https://arxiv.org/abs/2601.21720
-
[8]
Ali H. Nayfeh and P. Frank Pai.Linear and Nonlinear Structural Mechanics. New York, NY: John Wiley & Sons, Aug. 2004.doi:10.1002/9783527617562
-
[9]
Discovering governing equations from data by sparse identification of nonlinear dynamical systems
Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. “Discovering governing equations from data by sparse identification of nonlinear dynamical systems”. In:Proc. Natl. Acad. Sci. 113.15 (2016), pp. 3932–3937.doi:10.1073/pnas.1517384113
-
[10]
Miles Cranmer.Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl
-
[11]
arXiv:2305.01582 [astro-ph.IM]
work page internal anchor Pith review Pith/arXiv arXiv
-
[12]
Ricky T. Q. Chen et al.Neural Ordinary Differential Equations. 2018. arXiv:1806 . 07366 [cs.LG].url:https://arxiv.org/abs/1806.07366
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[13]
M. Raissi, P. Perdikaris, and G.E. Karniadakis. “Physics-informed neural networks: A deep learn- ing framework for solving forward and inverse problems involving nonlinear partial differential equations”. In:Journal of Computational Physics378 (2019), pp. 686–707.issn: 0021-9991.doi: 10.1016/j.jcp.2018.10.045
-
[14]
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In:NeurIPS 32. Ed. by H. Wallach et al. Curran Associates, Inc., 2019, pp. 8024–8035. 11
2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.