arxiv: 2605.04081 · v2 · submitted 2026-04-15 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links


Time series causal discovery with variable lags

Bruno Petrungaro, Anthony C. Constantinou


Pith reviewed 2026-05-11 02:20 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: time series causal discovery · variable lags · Bayesian networks · structure learning · Tabu search · BIC scoring · causal inference


The pith

A Tabu search algorithm learns causal structures from time series by optimizing edge-specific lags up to a maximum bound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a structure learning method for causal Bayesian networks from time series data that permits each causal edge to have its own delay length. This contrasts with prior approaches that fix a single lag window for all dependencies. The algorithm combines a Tabu search with a decomposable score that penalizes longer lags and uses a separate effective sample size for each node. On simulated data it recovers the graph competitively with existing methods and estimates lags accurately whenever the true adjacencies are found; on UK COVID-19 policy records it finds mostly short delays alongside a substantial minority of longer ones. This matters because many real processes, from disease spread to policy responses, involve effects that unfold at uneven rates over time.

Core claim

The proposed Tabu-based algorithm searches over time-ordered directed acyclic graphs in which each edge is assigned its own lag, from 0 up to a user-specified maximum. Candidate graphs are scored with a BIC function that includes a lag penalty and node-specific effective sample sizes. The method recovers structure competitively and estimates lags accurately in simulations, and on real data it finds that short delays dominate while some longer ones remain.

What carries the argument

A Tabu search procedure over time-respecting causal graphs that assigns and optimizes per-edge lags while maintaining decomposable scoring for local updates.
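A minimal sketch of such a search, written for illustration and not taken from the paper. It assumes a graph is a dict mapping each (parent, child) pair to a single lag; moves are limited to add, delete, and change-lag (reversals are omitted); self-edges are excluded; and `local_score` is a hypothetical user-supplied function of a node and its set of (parent, lag) pairs. Only lag-0 edges are checked for cycles, because lagged edges always point forward in time. Decomposability means each move rescores only the child it touches.

```python
from collections import deque
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterator, Tuple

Edge = Tuple[int, int]                      # (parent, child)
Graph = Dict[Edge, int]                     # edge -> lag in 0..max_lag
ParentSet = FrozenSet[Tuple[int, int]]      # {(parent, lag), ...}
LocalScore = Callable[[int, ParentSet], float]


def parents_of(graph: Graph, child: int) -> ParentSet:
    """Parent set of `child` as (parent, lag) pairs."""
    return frozenset((p, lag) for (p, c), lag in graph.items() if c == child)


def lag0_acyclic(graph: Graph, n_vars: int) -> bool:
    """Lagged edges point forward in time, so only lag-0 edges can form cycles."""
    adj = {v: [c for (p, c), lag in graph.items() if p == v and lag == 0] for v in range(n_vars)}
    state = [0] * n_vars                    # 0 = unvisited, 1 = on DFS stack, 2 = finished

    def dfs(v: int) -> bool:
        state[v] = 1
        for w in adj[v]:
            if state[w] == 1 or (state[w] == 0 and not dfs(w)):
                return False                # back edge: a lag-0 cycle
        state[v] = 2
        return True

    return all(state[v] == 2 or dfs(v) for v in range(n_vars))


def neighbours(graph: Graph, n_vars: int, max_lag: int) -> Iterator[Tuple[Graph, int]]:
    """Yield (new_graph, touched_child) for delete, add and change-lag moves."""
    for p, c in product(range(n_vars), repeat=2):
        if p == c:
            continue                        # assumption: no self-edges in this sketch
        current = graph.get((p, c))
        if current is not None:
            g = dict(graph)
            del g[(p, c)]
            yield g, c
        for lag in range(max_lag + 1):
            if lag == current:
                continue
            g = dict(graph)
            g[(p, c)] = lag
            if lag > 0 or lag0_acyclic(g, n_vars):
                yield g, c


def tabu_search(n_vars: int, max_lag: int, local_score: LocalScore,
                max_iters: int = 100, tabu_len: int = 10) -> Tuple[Graph, float]:
    graph: Graph = {}
    node_scores = {v: local_score(v, parents_of(graph, v)) for v in range(n_vars)}
    best_graph, best_total = dict(graph), sum(node_scores.values())
    tabu = deque([frozenset(graph.items())], maxlen=tabu_len)
    for _ in range(max_iters):
        chosen = None
        for g, c in neighbours(graph, n_vars, max_lag):
            key = frozenset(g.items())
            if key in tabu:
                continue                    # recently visited state: forbidden
            new_c = local_score(c, parents_of(g, c))   # only the touched child is rescored
            delta = new_c - node_scores[c]
            if chosen is None or delta > chosen[0]:
                chosen = (delta, g, c, new_c, key)
        if chosen is None:
            break
        _, graph, c, new_c, key = chosen    # best non-tabu move, even if it lowers the score
        node_scores[c] = new_c
        tabu.append(key)
        total = sum(node_scores.values())
        if total > best_total:
            best_graph, best_total = dict(graph), total
    return best_graph, best_total
```

For simplicity, the tabu list here stores whole graph states. Storing recently applied moves is a common, lighter-weight alternative. Consider a toy `local_score` that penalises each (parent, lag) pair that differs from a fixed target graph. With that score, the search reaches the target and keeps it as the best graph visited.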

If this is right

The method captures heterogeneous delay structures in dynamic systems without forcing a uniform lag window.
It supplies theoretical guarantees of validity and local optimality for the returned graphs.
A parallel implementation improves scalability to larger numbers of variables.
When applied to policy data the approach surfaces both immediate and delayed causal influences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same variable-lag idea could be ported to other causal discovery algorithms beyond the Tabu-BIC combination.
Re-running the search with different maximum-lag bounds on the same data would show how sensitive the recovered delays are.
Pairing the learned structure with actual intervention records could test whether the estimated lags match observed response times.
The discrete-lag formulation offers a starting point for handling irregularly sampled or continuous-time series.

Load-bearing premise

The data is generated by a causal Bayesian network whose maximum relevant lag is within the user-specified limit and whose structure is best identified by the BIC score with the described penalties.

What would settle it

Generate synthetic time series from a known causal graph with specific lags within the bound, apply the algorithm, and observe whether it fails to recover the exact adjacencies or correct lag assignments.
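One way to run this check is sketched below, under assumptions that do not come from the paper:

- a linear-Gaussian generator;
- a true graph given as a dict from (parent, child) to (lag, coefficient), with lag-0 edges ordered by variable index;
- a learned graph given as a dict from (parent, child) to lag.

`simulate` and `evaluate` are hypothetical helpers. Edge precision and recall are computed on directed pairs. Lag accuracy is computed only over edges present in both graphs, mirroring the abstract's conditional phrasing.

```python
import numpy as np


def simulate(graph, n_vars, T, noise_sd=1.0, seed=0):
    """Linear-Gaussian series from a known variable-lag graph.

    graph: {(parent, child): (lag, coef)}. Lag-0 edges must satisfy parent < child,
    so filling variables in index order respects the contemporaneous ordering.
    """
    if any(lag == 0 and p >= c for (p, c), (lag, _) in graph.items()):
        raise ValueError("lag-0 edges must go from a lower to a higher index")
    rng = np.random.default_rng(seed)
    k_max = max((lag for lag, _ in graph.values()), default=0)
    X = np.zeros((T + k_max, n_vars))
    X[:k_max] = rng.normal(size=(k_max, n_vars))       # initial conditions, dropped below
    for t in range(k_max, T + k_max):
        for c in range(n_vars):
            X[t, c] = noise_sd * rng.normal()
            for (p, child), (lag, coef) in graph.items():
                if child == c:
                    X[t, c] += coef * X[t - lag, p]
    return X[k_max:]


def evaluate(true_graph, learned_graph):
    """Directed-edge precision and recall, plus lag accuracy on edges found in both graphs."""
    true_edges, learned_edges = set(true_graph), set(learned_graph)
    tp = true_edges & learned_edges
    precision = len(tp) / len(learned_edges) if learned_edges else 0.0
    recall = len(tp) / len(true_edges) if true_edges else 0.0
    if not tp:
        return precision, recall, float("nan")
    lag_acc = sum(true_graph[e][0] == learned_graph[e] for e in tp) / len(tp)
    return precision, recall, lag_acc
```

Any learner that returns a mapping from (parent, child) to lag can be run between the two calls. The coefficients must be chosen so that the simulated process stays stable.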

Original abstract

Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs require a cause-and-effect map of the variables under consideration, known as the network's structure. Learning the graphical structure of a causal model from data remains challenging; learning it from time-series data is even harder because dependencies may arise at different time lags. Existing time-series causal discovery methods often assume a fixed lag window and do not explicitly optimise edge-specific lags. We propose a Tabu-based structure learning algorithm that searches for a time-ordered directed structure (i.e., where every edge respects time) while allowing edge-specific lags up to a specified maximum lag. The approach uses a decomposable BIC-based score with node-specific effective sample sizes and an explicit lag-length penalty encouraging parsimonious delay assignments while preserving efficient local score updates. We provide theoretical guarantees of validity and local optimality, and we also describe a parallel implementation for improved scalability. In simulations, the method recovered graph structure competitively and estimated lags accurately when true adjacencies were recovered. On a real-world UK COVID-19 policy dataset, the learnt structure was dominated by short delays while retaining a substantial minority of longer-lag dependencies, consistent with delayed behavioural and epidemiological effects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a Tabu-search algorithm for time-ordered causal graphs that picks lags per edge instead of fixing a window, backed by a decomposable BIC variant, but the finite-sample ranking of that score for variable lags is not obviously solid from the details given.


The main thing here is the shift from fixed-lag windows to explicit per-edge lag selection inside a Tabu search over time-respecting graphs. They keep the search efficient with a decomposable score that adds a lag-length penalty and uses node-specific effective sample sizes. That matches real settings where some relationships act fast and others take longer, like policy impacts or disease spread. The parallel version and the local-optimality claim are useful engineering touches. Simulations are said to recover structure competitively and get lags right when the edges are correct, and the UK COVID example produces mostly short delays with a mix of longer ones, which lines up with what one would expect from behavioural and epidemiological data.

Referee Report

3 major / 2 minor

Summary. The paper proposes a Tabu-based structure learning algorithm for causal Bayesian networks from time-series data. It searches over time-ordered directed graphs allowing edge-specific lags up to a user-specified maximum, using a decomposable BIC score with node-specific effective sample sizes and an explicit lag-length penalty. Theoretical guarantees of validity and local optimality are claimed, along with a parallel implementation. Simulations show competitive graph structure recovery and accurate lag estimation conditional on correct adjacencies; application to UK COVID-19 policy data yields mostly short delays with a minority of longer lags.

Significance. If the BIC score with the stated adjustments reliably ranks structures, the method would advance time-series causal discovery by relaxing fixed-lag assumptions while preserving efficient local search. The decomposable score, theoretical guarantees, and dual synthetic/real-world evaluation are strengths that could make the approach useful in domains with delayed effects such as epidemiology.

major comments (3)

[Score function (Section 3)] The BIC score is described as using node-specific effective sample sizes together with a lag-length penalty, yet each candidate parent set with lag k uses only T-k observations. The manuscript does not derive or validate a per-lag ESS adjustment (as opposed to per-node), nor calibrate the penalty coefficient to the resulting information loss; this directly affects whether the score ranks true lag assignments highest in finite samples and therefore underpins both the validity guarantees and the reported simulation recovery.
[Theoretical guarantees] Local optimality of the Tabu search is asserted on the basis of score decomposability, but the guarantees are only as strong as the underlying score's consistency under the finite-sample regime of the experiments. No explicit derivation is supplied showing how the node-specific ESS and lag penalty preserve consistency when lags vary across edges.
[Experiments / Simulations] The claim of competitive structure recovery and accurate lag estimation (when adjacencies are recovered) is presented without error bars, the number of independent runs, or explicit data-exclusion rules for the reduced sample sizes at longer lags. This makes it impossible to assess whether the reported performance is robust or sensitive to post-hoc choices.

minor comments (2)

[Abstract] The phrase 'theoretical guarantees of validity' should be clarified to specify whether it refers to score consistency, asymptotic correctness, or another property.
[Real-world application] The real-world UK COVID-19 application would benefit from a brief statement of how the maximum lag bound was chosen and whether sensitivity checks were performed.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below, acknowledging where the manuscript is incomplete and outlining specific revisions to strengthen the derivations, guarantees, and experimental reporting.


Referee: [Score function (Section 3)] The BIC score is described as using node-specific effective sample sizes together with a lag-length penalty, yet each candidate parent set with lag k uses only T-k observations. The manuscript does not derive or validate a per-lag ESS adjustment (as opposed to per-node), nor calibrate the penalty coefficient to the resulting information loss; this directly affects whether the score ranks true lag assignments highest in finite samples and therefore underpins both the validity guarantees and the reported simulation recovery.

Authors: We agree that the current description of the score lacks an explicit derivation of the per-lag ESS adjustment and calibration of the penalty. In the revision we will expand Section 3 with a step-by-step derivation: for any parent set whose maximum lag is k the node's ESS is defined as T-k (reflecting the shifted observations actually available for that edge), the standard BIC penalty is applied to this reduced count, and the explicit lag-length penalty coefficient is calibrated via a small grid search on held-out synthetic data so that longer lags are penalised only when they do not improve the likelihood enough to offset the information loss. These additions will directly support the finite-sample ranking behaviour and the validity claims. revision: yes
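The single-node score described in this response can be sketched as follows. The sketch assumes a linear-Gaussian model fitted by least squares, and the paper's likelihood may differ. The ESS is n = T - k, where k is the largest lag in the parent set, and the BIC penalty uses log n. The explicit lag penalty is assumed to be `lam` times the sum of the parents' lags. `node_score` and its default `lam=1.0` are hypothetical; they are not the calibrated coefficient the authors describe.

```python
import numpy as np


def node_score(X, child, parents, lam=1.0):
    """Hypothetical variable-lag BIC for one node under a linear-Gaussian model.

    X: (T, p) array; parents: list of (parent_index, lag) pairs.
    ESS n = T - max lag; the BIC penalty uses log(n); lam * sum(lags) is the
    explicit lag-length penalty (its form and lam are assumptions).
    """
    T = X.shape[0]
    k_max = max((lag for _, lag in parents), default=0)
    n = T - k_max
    y = X[k_max:, child]                                    # child at t = k_max..T-1
    cols = [np.ones(n)] + [X[k_max - lag:T - lag, p] for p, lag in parents]
    A = np.column_stack(cols)                               # parent at t - lag, same rows
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    sigma2 = rss / n                                        # MLE of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = A.shape[1] + 1                               # coefficients + variance
    return loglik - 0.5 * n_params * np.log(n) - lam * sum(lag for _, lag in parents)


# Usage: y depends on x at lag 3; scan candidate lags 0..5 for the single edge x -> y.
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
y[3:] = 0.8 * x[:-3]
y += 0.5 * rng.normal(size=T)
X = np.column_stack([x, y])
scores = {k: node_score(X, 1, [(0, k)]) for k in range(6)}
best = max(scores, key=scores.get)                          # the true lag, 3, should win
```

Each candidate is fitted on its own n = T - k rows, so scores for different lags are computed on slightly different samples. The referee raises exactly this comparability question, and the sketch addresses it only by using n inside the penalty.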
Referee: [Theoretical guarantees] Local optimality of the Tabu search is asserted on the basis of score decomposability, but the guarantees are only as strong as the underlying score's consistency under the finite-sample regime of the experiments. No explicit derivation is supplied showing how the node-specific ESS and lag penalty preserve consistency when lags vary across edges.

Authors: Local optimality follows immediately from score decomposability, which is unchanged by the lag adjustments. We acknowledge, however, that an explicit consistency argument for variable lags is missing. The revised theoretical section will contain a short proof sketch: under the usual faithfulness and positivity assumptions, the likelihood term of the adjusted BIC grows linearly with the per-edge ESS (T-k), while both the standard BIC penalty and the additional lag penalty grow only logarithmically; consequently the score difference between the true structure and any incorrect lag assignment diverges to infinity with T, preserving consistency even when lags differ across edges. This derivation will be tied to the finite-sample regimes used in the experiments. revision: yes
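The argument in this response can be written out as a sketch. The score form below is an assumption matching the description given here (ESS n = T - k, a BIC penalty on n, and a lag penalty λ times the total lag). It is not taken from the paper, and it presumes a correctly specified model family and the usual regularity conditions. Because every lag is bounded by k_max while T grows, each candidate's ESS satisfies n/T → 1. The block distinguishes two failure modes: missing a true (parent, lag) pair, and adding a superfluous one.

```latex
% Assumed per-node score for child X_i with candidate parent set \Pi of (parent, lag) pairs (j, k)
S_i(\Pi) = \hat{\ell}_i(\Pi; n_\Pi) - \frac{d_\Pi}{2}\log n_\Pi - \lambda \sum_{(j,k)\in\Pi} k,
\qquad n_\Pi = T - \max_{(j,k)\in\Pi} k \in [\,T - k_{\max},\, T\,]

% (a) \Pi misses a true pair: by faithfulness the per-row expected log-likelihood gap c_\Pi > 0,
%     and all penalty terms are O(\log T)
S_i(\Pi^\ast) - S_i(\Pi) = c_\Pi\, T + O_p(\sqrt{T}) \to \infty

% (b) \Pi \supsetneq \Pi^\ast: the likelihood gain is O_p(1) (a \chi^2/2 term), the penalties grow
S_i(\Pi^\ast) - S_i(\Pi) = \frac{d_\Pi - d_{\Pi^\ast}}{2}\log T
  + \lambda \Big(\sum_{(j,k)\in\Pi} k - \sum_{(j,k)\in\Pi^\ast} k\Big) - O_p(1) \to \infty
```

In case (b), the lag-penalty term is non-negative because Π contains every pair in Π*. The divergence is therefore driven by the BIC term; under this assumed form, the lag penalty only adds to it.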
Referee: [Experiments / Simulations] The claim of competitive structure recovery and accurate lag estimation (when adjacencies are recovered) is presented without error bars, the number of independent runs, or explicit data-exclusion rules for the reduced sample sizes at longer lags. This makes it impossible to assess whether the reported performance is robust or sensitive to post-hoc choices.

Authors: We apologise for the omission of these statistical details. The revised Experiments section will state that all synthetic results are averages over 50 independent runs, will display means accompanied by standard-error bars, and will specify the exact data-handling rule: for any candidate lag k the score uses precisely the last T-k observations, and any replicate with T < 2*max_lag is discarded before aggregation to avoid unreliable estimates at the longest lags. These changes will allow readers to evaluate robustness directly. revision: yes
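The data-handling and reporting rules promised in this response can be sketched as follows. The helper names, the use of NumPy, and the reading of "T < 2*max_lag" as a per-replicate length check are assumptions. `mean_and_se` reports the mean with a standard error (sample standard deviation divided by the square root of the number of runs), which is one common way to draw the promised error bars.

```python
import numpy as np


def lagged_pair(x, y, k):
    """Align parent x at lag k with child y: pairs (x[t-k], y[t]) for t = k..T-1,
    i.e. exactly the last T - k observations of the child."""
    T = len(y)
    if not 0 <= k < T:
        raise ValueError("lag must satisfy 0 <= k < T")
    return x[:T - k], y[k:]


def keep_replicate(T, max_lag):
    """Discard rule stated in the response: drop replicates with T < 2 * max_lag."""
    return T >= 2 * max_lag


def mean_and_se(values):
    """Mean and standard error over independent runs (sample std with ddof=1)."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1) / np.sqrt(len(v)))
```

For example, with T = 10 and k = 3, `lagged_pair` returns x[0..6] paired with y[3..9]. These are the seven usable rows.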

Circularity Check

0 steps flagged

No significant circularity: standard BIC score and Tabu search applied to variable-lag time series

full rationale

The paper presents an algorithmic procedure that searches over time-ordered DAGs with edge-specific lags using a decomposable BIC score that incorporates node-specific effective sample sizes and an explicit lag penalty. This score is an externally defined information criterion applied to the data, not constructed from the output structure itself. Theoretical guarantees of validity and local optimality are stated to follow from the score's decomposability (enabling efficient local updates) and standard properties of Tabu search; these are independent of the target result and do not reduce the discovered lags or adjacencies to a fitted parameter by construction. No load-bearing premise relies on self-citation chains, uniqueness theorems imported from the authors' prior work, or ansatzes smuggled via citation. Simulations and the COVID-19 application serve as external empirical checks rather than derivations that loop back to inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; full paper likely contains additional parameter settings and background assumptions not visible here.

free parameters (2)

maximum lag bound
User-specified upper limit on searchable delays; directly controls search space size and must be chosen before running the algorithm.
lag-length penalty coefficient
Scalar multiplier on the lag penalty term inside the BIC score; encourages shorter delays and is part of the objective being optimized.
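The maximum lag bound K controls the search space, as the short count below illustrates. It hypothetically assumes that each ordered pair of distinct variables (j, i) is either absent or carries exactly one lag in {0, ..., K}, with no self-edges. If the paper allows autoregressive self-edges, the counts would change. The helper names are illustrative only.

```python
import math


def add_moves_from_empty(p, K):
    """'Add edge' moves from the empty graph: one per ordered pair (j, i), j != i, per lag."""
    return p * (p - 1) * (K + 1)


def log10_raw_configurations(p, K):
    """Upper bound ignoring acyclicity: each ordered pair is absent or has one lag in 0..K."""
    return p * (p - 1) * math.log10(K + 2)


for K in (0, 1, 5):
    print(K, add_moves_from_empty(10, K), round(log10_raw_configurations(10, K), 1))
```

For p = 10 variables, raising K from 0 to 5 increases the number of first-step add moves from 90 to 540. The search space therefore grows linearly in K per move, while the raw number of configurations grows exponentially in the number of pairs.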

axioms (2)

domain assumption The data are generated by a causal Bayesian network whose edges respect a strict temporal order.
Invoked by the requirement that every edge respects time and by the use of time-series causal discovery framing.
domain assumption The decomposable BIC score with node-specific effective sample sizes is a valid ranking criterion for structures with variable lags.
Underpins the local score updates and the claim of efficient search.

pith-pipeline@v0.9.0 · 5534 in / 1493 out tokens · 51573 ms · 2026-05-11T02:20:19.329559+00:00 · methodology

Review history: 2 revisions


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "The approach uses a decomposable BIC-based score with node-specific effective sample sizes and an explicit lag-length penalty"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat induction and orbit embedding · unclear
Paper passage: "searches for a time-ordered directed structure ... allowing edge-specific lags up to a specified maximum lag"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
