Time-Varying Interaction Estimation Using Ensemble Methods

Alfred Hero; Amir Sadeghian; Brandon Oselio; Silvio Savarese

arXiv: 1906.10746 · v1 · pith:NYYNSBMDnew · submitted 2019-06-25 · eess.SP · cs.IT · cs.LG · eess.IV · math.IT

Pith reviewed 2026-05-25 16:11 UTC · model grok-4.3

keywords: ensemble methods · adaptive directed information · time-varying interactions · directed information · non-stationary data · exploratory data analysis · interaction estimation

The pith

Ensemble methods applied to adaptive directed information produce a robust estimator for time-varying interactions by reducing parameter sensitivity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that ensemble learning can combine multiple adaptive directed information estimators to handle non-stationary time-directed dependencies in multivariate data. Directed information quantifies causal influence over time, and its adaptive version already accommodates changing interactions, but still demands many design choices that affect results. Ensembling averages across variants to stabilize the output for exploratory analysis. The approach is illustrated on pedestrian trajectories from a drone dataset, where interactions shift as people move through crowds. A reader would care because it offers a practical route to reliable interaction discovery without repeated manual tuning on each new dataset.
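For orientation, the quantity being estimated can be written down. The abstract does not restate it, so this assumes the paper follows Massey's standard definition of the directed information from a sequence X^N to a sequence Y^N:

```latex
I(X^N \to Y^N) \;=\; \sum_{n=1}^{N} I\!\left(X^{n};\, Y_n \,\middle|\, Y^{n-1}\right),
\qquad X^{n} = (X_1,\dots,X_n).
```

Mutual information uses the full sequence X^N in every term of the corresponding sum, whereas each term here uses only X up to time n, so the quantity is in general asymmetric in X and Y. Because the sum runs over the whole record, it yields one number per ordered pair, which is one way to see why the original formulation struggles with interactions that change over time.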

Core claim

Adaptive directed information estimators can be combined via ensemble methods to yield a more robust estimator of time-directed interactions that alleviates the impact of design decisions and parameters while preserving the ability to discover complex dependencies in non-stationary multivariate data.

What carries the argument

Ensemble aggregation of multiple adaptive directed information estimators, each with different parameter settings, whose outputs are combined to produce a single interaction estimate.
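The aggregation idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's estimator: `windowed_lagged_information` is a hypothetical stand-in for one adaptive estimator (a Gaussian lag-1 information computed over a trailing window), the window lengths are arbitrary, and the combination rule is a plain unweighted mean, which the source does not confirm.

```python
import numpy as np

def windowed_lagged_information(x, y, window, lag=1):
    """Hypothetical stand-in for a single adaptive estimator.

    At each time index, returns -0.5 * log(1 - r^2), where r is the sample
    correlation between x[t - lag] and y[t] over the trailing `window` pairs.
    Entries without a full window are NaN.
    """
    n = len(x)
    out = np.full(n, np.nan)
    for t in range(window + lag, n + 1):
        past_x = x[t - window - lag:t - lag]   # x delayed by `lag`
        cur_y = y[t - window:t]                # aligned y samples
        r = np.corrcoef(past_x, cur_y)[0, 1]
        r2 = min(r * r, 1.0 - 1e-12)           # keep the log finite
        out[t - 1] = -0.5 * np.log1p(-r2)
    return out

def ensemble_estimate(x, y, windows=(25, 50, 100), lag=1):
    """Average member estimates wherever every member is defined."""
    members = np.vstack(
        [windowed_lagged_information(x, y, w, lag) for w in windows]
    )
    valid = ~np.isnan(members).any(axis=0)
    estimate = np.full(members.shape[1], np.nan)
    estimate[valid] = members[:, valid].mean(axis=0)
    return estimate, members

# Toy data (assumed, not from the paper): x drives y with lag 1 after t = 300.
rng = np.random.default_rng(0)
n = 600
x = rng.standard_normal(n)
y = 0.3 * rng.standard_normal(n)
y[300:] += 0.9 * x[299:-1]
estimate, members = ensemble_estimate(x, y)
```

Short windows react quickly to the change but are noisy; long windows are smoother but lag behind. Averaging trades between the two without requiring one window to be picked in advance, though the set of windows and the averaging rule are themselves choices.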

Load-bearing premise

The ensemble of adaptive directed information estimators will deliver meaningfully more robust results than any single well-tuned estimator without introducing new biases.

What would settle it

A controlled simulation containing known time-varying directed interactions where the ensemble estimator either misses the true interactions or reports interactions absent from the ground truth at rates comparable to or worse than a single tuned estimator.
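A minimal sketch of such a test bed follows, assuming a simple linear lag-1 coupling that switches on and off. The function names, the coupling model, and the threshold-based scoring are all hypothetical choices made for illustration; the source does not specify a simulation design.

```python
import numpy as np

def simulate_switching_pair(n, on_intervals, coupling=0.9, noise=0.3, seed=0):
    """Return (x, y, truth) where y[t] depends on x[t-1] only when truth[t]."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    truth = np.zeros(n, dtype=bool)
    for start, stop in on_intervals:
        truth[start:stop] = True
    truth[0] = False  # no past sample of x exists at t = 0
    y = noise * rng.standard_normal(n)
    y[1:] += coupling * truth[1:] * x[:-1]
    return x, y, truth

def detection_rates(estimate, truth, threshold):
    """Miss rate and false-alarm rate of a thresholded interaction estimate.

    NaN entries of `estimate` (for example, warm-up samples) are ignored.
    """
    valid = ~np.isnan(estimate)
    detected = estimate[valid] > threshold
    active = truth[valid]
    miss = np.mean(~detected[active]) if active.any() else 0.0
    false_alarm = np.mean(detected[~active]) if (~active).any() else 0.0
    return float(miss), float(false_alarm)

x, y, truth = simulate_switching_pair(500, [(100, 200), (350, 450)])
```

Scoring a single tuned estimator and the ensemble with `detection_rates` over many seeds would give the kind of comparison described above; the threshold is one more free choice that such a study would need to report.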

Original abstract

Directed information (DI) is a useful tool to explore time-directed interactions in multivariate data. However, as originally formulated DI is not well suited to interactions that change over time. In previous work, adaptive directed information was introduced to accommodate non-stationarity, while still preserving the utility of DI to discover complex dependencies between entities. There are many design decisions and parameters that are crucial to the effectiveness of ADI. Here, we apply ideas from ensemble learning in order to alleviate this issue, allowing for a more robust estimator for exploratory data analysis. We apply these techniques to interaction estimation in a crowded scene, utilizing the Stanford drone dataset as an example.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper applies ensembles to adaptive directed information to reduce tuning parameters for time-varying interaction estimation, but the abstract provides no results to check if the net parameter burden actually drops.

The main point is that this work takes adaptive directed information, which already handles non-stationary interactions, and layers ensemble methods on top to make it less sensitive to design choices like window sizes or adaptation rates. They demonstrate the idea on the Stanford drone dataset for crowded scene analysis. That extension is the concrete new piece beyond the prior ADI papers they cite. The paper does a clear job naming the practical pain point with ADI and framing ensembles as a route to more stable exploratory estimates without claiming a new theoretical framework.

The stress-test concern lands: any ensemble still needs explicit decisions on which base ADI variants to include, the aggregation rule, and how to set combination weights, so it is not obvious that the total number of choices shrinks. The abstract contains no equations, no error bars, no comparison metrics, and no validation details, which leaves the robustness claim untested. This limits how far the soundness can be assessed from what is shown.

The work is aimed at researchers already using directed information on video or sensor data who want a more turnkey estimator for time-varying cases. A reader in that niche could pick up the idea as a practical suggestion, but it would need the full experiments and comparisons to be worth building on. I would send it to peer review because the underlying problem is real and the proposed direction is straightforward enough to evaluate with proper results.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that ensemble methods applied to adaptive directed information (ADI) estimators can alleviate the numerous design decisions and parameters required by ADI, producing a more robust estimator for time-directed interactions in non-stationary multivariate data; the approach is illustrated via interaction estimation on the Stanford drone dataset.

Significance. If the central claim holds with evidence that the ensemble net-reduces tuning burden while improving robustness without new biases, the work could make directed-information tools more practical for exploratory analysis of time-varying dependencies in domains such as crowd dynamics or neural recordings.

major comments (2)

[Abstract / Method description] The abstract asserts that ensemble learning alleviates ADI design decisions, yet the skeptic concern is not addressed: any ensemble still requires explicit selection of base ADI variants (window sizes, adaptation rates, history lengths), an aggregation rule, and meta-validation; without a demonstration that the net number of free choices is smaller than a single well-tuned ADI, the robustness benefit is not established.
[Experiments / Results] No quantitative results, error bars, or validation metrics appear in the provided text to test whether the ensemble combination yields meaningfully more robust estimates than a single ADI instance or avoids introducing aggregation bias; this leaves the weakest assumption unexamined.

minor comments (1)

[Abstract] The abstract refers to 'many design decisions and parameters' without enumerating them, making it hard to judge the scale of the problem the ensemble is meant to solve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We respond to each major comment below and indicate planned revisions where appropriate.

Referee: [Abstract / Method description] The abstract asserts that ensemble learning alleviates ADI design decisions, yet the skeptic concern is not addressed: any ensemble still requires explicit selection of base ADI variants (window sizes, adaptation rates, history lengths), an aggregation rule, and meta-validation; without a demonstration that the net number of free choices is smaller than a single well-tuned ADI, the robustness benefit is not established.

Authors: We agree that constructing the ensemble still requires selecting a collection of base ADI configurations and an aggregation rule. Our position is that the practical tuning burden is reduced because the user no longer needs to identify a single optimal parameter set in advance; instead, a modest fixed collection of plausible variants (e.g., several window lengths and adaptation rates) is combined, and the ensemble output is less sensitive to any individual choice. This is the sense in which we claim alleviation for exploratory analysis. We will revise the abstract and method section to state this distinction more explicitly and to include a brief enumeration of the decisions required for the ensemble versus a single ADI.

revision: yes

Referee: [Experiments / Results] No quantitative results, error bars, or validation metrics appear in the provided text to test whether the ensemble combination yields meaningfully more robust estimates than a single ADI instance or avoids introducing aggregation bias; this leaves the weakest assumption unexamined.

Authors: The Stanford drone example is presented as an illustrative case study on real data where ground-truth time-varying interactions are unavailable, so the manuscript emphasizes qualitative visualization of the resulting interaction graphs. We acknowledge that this leaves the robustness claim without quantitative support. We will add a new subsection containing controlled experiments on synthetic non-stationary data with known ground truth, reporting error metrics and comparisons against single ADI instances, including error bars across multiple realizations.

revision: yes

Circularity Check

0 steps flagged

No circularity: methodological proposal with no self-referential reductions shown

The provided abstract and context contain no equations, derivations, or load-bearing steps that reduce by construction to fitted inputs or self-citations. The paper describes applying ensemble methods to prior ADI work for robustness in exploratory analysis, but presents no self-definitional mappings, fitted parameters renamed as predictions, or uniqueness theorems imported from overlapping authors. The central claim remains an empirical methodological suggestion whose validity is independent of any circular reduction in the given text; external validation on datasets like Stanford drone would be required to assess performance but does not indicate circularity here.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities. The approach implicitly assumes that ensemble averaging over multiple adaptive directed information realizations improves robustness, but no details on how the ensemble is constructed or what parameters are varied are given.

