Tracking Temporal Evolution of Graphs using Non-Timestamped Data

Arquimedes Canedo; Palash Goyal; Sujit Rokka Chhetri

arxiv: 1907.02222 · v1 · pith:6IY5MDRXnew · submitted 2019-07-04 · 💻 cs.SI

Sujit Rokka Chhetri, Palash Goyal, Arquimedes Canedo

Pith reviewed 2026-05-25 08:50 UTC · model grok-4.3

classification 💻 cs.SI

keywords: dynamic graphs, temporal evolution, non-timestamped data, YouTube dataset, graph clustering, community migration, time series forecasting, evolving networks

The pith

The YoutubeGraph-Dyn dataset builds time-evolving graphs from non-timestamped YouTube interactions, with 416 snapshots taken at six-hour intervals over 104 days.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces YoutubeGraph-Dyn as a dataset of evolving graphs derived from real YouTube interactions to address the scarcity of temporal graph data. It supplies 416 snapshots taken every six hours across 104 days along with multi-modal relationships and attributes such as word embeddings and integers. The core method builds these time-evolving graphs even though the source data lacks timestamps. The authors compute graph statistics and apply clustering algorithms to track community migration while testing time series and recurrent neural network models to forecast non-timestamped values. A reader would care because existing datasets rarely provide fine intra-day resolution for studying how networks change.

Core claim

YoutubeGraph-Dyn provides intra-day time granularity with 416 snapshots taken every 6 hours for a period of 104 days, multi-modal relationships that capture different aspects of the data, and multiple attributes including timestamped, non-timestamped, word embeddings, and integers. The data collection methodology emphasizes the creation of time-evolving graphs from non-timestamped data. Graph statistics are supplied, and state-of-the-art clustering, time series, and recurrent neural network algorithms are tested on community migration and forecasting tasks.

What carries the argument

The data collection methodology that generates multiple timed snapshots to produce time-evolving graphs from originally non-timestamped interaction records.
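
The summary above does not spell out how readings are assigned to snapshots, so the following is only a minimal sketch of one plausible reading: a crawler polls the non-timestamped fields on a fixed 6-hour cadence and stamps each reading with its own collection time. The function names, the start date, and the record layout are hypothetical and are not taken from the paper or its repository. The sketch also checks the arithmetic behind the stated snapshot count (104 days at 4 snapshots per day).

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=6)  # cadence stated in the paper
NUM_DAYS = 104                          # collection period stated in the paper


def snapshot_index(collected_at: datetime, start: datetime) -> int:
    """Map a collection time to the 0-based index of its 6-hour window."""
    return (collected_at - start) // SNAPSHOT_INTERVAL


def expected_snapshot_count(days: int = NUM_DAYS) -> int:
    """Number of 6-hour windows that fit in the given number of days."""
    return timedelta(days=days) // SNAPSHOT_INTERVAL


# Hypothetical start time, chosen only for illustration.
start = datetime(2019, 1, 1, 0, 0)

# Hypothetical polled readings: (channel id, subscriber count, collection time).
readings = [
    ("chan_a", 1000, start + timedelta(hours=1)),
    ("chan_a", 1004, start + timedelta(hours=7)),
    ("chan_b", 250, start + timedelta(hours=13)),
]

# Group readings by the snapshot window in which they were collected.
snapshots = {}
for channel, subscribers, collected_at in readings:
    idx = snapshot_index(collected_at, start)
    snapshots.setdefault(idx, {})[channel] = subscribers
```

Under this assumption the time label of a value is the time it was observed, not the time the underlying interaction happened, which is exactly the distinction the referee comments further down ask the authors to document.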

If this is right

Graph clustering algorithms can be evaluated for their ability to detect community migration across the sequence of snapshots.
Time series analysis and recurrent neural network models can be tested on forecasting non-timestamped attributes using the fine-grained temporal structure.
The multi-modal relationships allow separate examination of different interaction types within the same evolving network.
The 416 snapshots supply a large number of time points for statistical analysis of graph properties over 104 days.
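
As a rough illustration of the first point, community migration between two consecutive snapshots can be measured once each snapshot has a node-to-community labelling. The sketch below is not the paper's procedure: it assumes labels come from any clustering algorithm, and it aligns label names across snapshots by greatest Jaccard overlap, a swapped-in heuristic chosen for brevity. All function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two node sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_label(labels: dict) -> dict:
    """Turn {node: label} into {label: set of nodes}."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return groups


def match_communities(prev: dict, curr: dict) -> dict:
    """Map each current label to the previous label it overlaps most."""
    prev_groups = group_by_label(prev)
    curr_groups = group_by_label(curr)
    if not prev_groups:
        return {}
    return {
        c_label: max(prev_groups, key=lambda p: jaccard(prev_groups[p], c_nodes))
        for c_label, c_nodes in curr_groups.items()
    }


def migrated_nodes(prev: dict, curr: dict) -> set:
    """Nodes present in both snapshots whose matched community changed."""
    mapping = match_communities(prev, curr)
    shared = prev.keys() & curr.keys()
    return {n for n in shared if mapping[curr[n]] != prev[n]}


# Toy labellings for two consecutive snapshots (illustrative values only).
prev_labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
curr_labels = {"a": 5, "b": 5, "c": 7, "d": 7, "e": 7}
moved = migrated_nodes(prev_labels, curr_labels)
```

Label alignment matters because most clustering algorithms assign arbitrary community identifiers in each run, so comparing raw labels across snapshots would overstate migration.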

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The snapshot construction approach could be adapted to other platforms whose interaction logs lack explicit timestamps.
Multi-modal edges might expose distinct evolution rates across relationship types that single-mode graphs would miss.
Forecasting performance on non-timestamped fields could serve as a proxy for how well dynamic models capture underlying user behavior.

Load-bearing premise

The data collection methodology can successfully create meaningful time-evolving graphs from non-timestamped YouTube interaction data.

What would settle it

If clustering algorithms applied to the 416 snapshots detect no consistent community migrations that correspond to external YouTube events or if forecasting accuracy on held-out non-timestamped attributes remains at random baseline levels, the generated graphs would lose practical value.
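
The forecasting half of this test can be made concrete by comparing a model's error against a trivial baseline. The sketch below uses a persistence baseline (repeat the last observed value) rather than a random one, which is a substitution made here for simplicity; the function names and the numbers are hypothetical and do not come from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(history, horizon):
    """Naive baseline: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def beats_baseline(history, actual, model_forecast):
    """True when the model's MAE is strictly lower than the baseline's."""
    baseline = persistence_forecast(history, len(actual))
    model_error = mean_absolute_error(actual, model_forecast)
    baseline_error = mean_absolute_error(actual, baseline)
    return model_error < baseline_error


# Illustrative subscriber counts at consecutive snapshots (made-up values).
history = [100, 102, 104]
held_out = [106, 108]
model_forecast = [105, 109]
result = beats_baseline(history, held_out, model_forecast)
```

A model that cannot beat persistence on held-out snapshots would suggest that the 6-hour structure adds little predictive signal for that attribute.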

Figures

Figures reproduced from arXiv: 1907.02222 by Arquimedes Canedo, Palash Goyal, Sujit Rokka Chhetri.

**Figure 2.** Statistics of the temporal graphs over 104 days (time interval = 6 hours).

**Figure 3.** Inducing graph from the dataset. graph slightly decreases over time. It changes from 0.8176 on day 1 to 0.8114 on day 104. Highest degree. The average degree connectivity of a graph is calculated as follows [10]: $k^{w}_{nn,i} = \frac{1}{s_i} \sum_{j \in N(i)} w_{ij} k_j$ (3), where $s_i$ represents the weighted degree of node $i$, $w_{ij}$ represents the weight of the edge that links $i$ and $j$, and $N(i)$ represents the neighbors of the no…

**Figure 5.** Channel’s subscriber count predictions.

**Figure 4.** Community clustering for the temporal graphs (for

**Figure 6.** Exploration of hyperparameters for ARIMA, LSTM, and GRU models.
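
The weighted neighbor-degree formula quoted in the Figure 3 excerpt, labelled (3) there, can be evaluated per node with a few lines of code. This is a minimal sketch under stated assumptions: the graph is undirected and stored as a nested dictionary of edge weights, and $k_j$ is taken to be the unweighted degree of neighbor $j$, since the excerpt is cut off before defining it. The function name and the toy graph are hypothetical.

```python
def weighted_neighbor_degree(adj: dict, node) -> float:
    """Compute (1 / s_i) * sum over neighbors j of w_ij * k_j for one node.

    adj maps each node to a dict {neighbor: edge weight}.
    s_i is the weighted degree (sum of incident edge weights).
    k_j is assumed to be the number of neighbors of j.
    """
    neighbors = adj[node]
    strength = sum(neighbors.values())  # s_i
    if strength == 0:
        return 0.0  # isolated node or all-zero weights
    weighted_sum = sum(w * len(adj[j]) for j, w in neighbors.items())
    return weighted_sum / strength


# Toy undirected weighted graph (illustrative values only).
graph = {
    "a": {"b": 2.0, "c": 1.0},
    "b": {"a": 2.0},
    "c": {"a": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}

value_a = weighted_neighbor_degree(graph, "a")  # (2*1 + 1*2) / 3
value_d = weighted_neighbor_degree(graph, "d")  # (1*2) / 1
```

Averaging these per-node values over all nodes of a given degree gives the degree-connectivity curve, which is the quantity a per-snapshot statistic of this kind would typically track over time.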

Original abstract

Datasets to study the temporal evolution of graphs are scarce. To encourage the research of novel dynamic graph learning algorithms we introduce YoutubeGraph-Dyn (available at https://github.com/palash1992/YoutubeGraph-Dyn), an evolving graph dataset generated from YouTube real-world interactions. YoutubeGraph-Dyn provides intra-day time granularity (with 416 snapshots taken every 6 hours for a period of 104 days), multi-modal relationships that capture different aspects of the data, multiple attributes including timestamped, non-timestamped, word embeddings, and integers. Our data collection methodology emphasizes the creation of time evolving graphs from non-timestamped data. In this paper, we provide various graph statistics of YoutubeGraph-Dyn and test state-of-the-art graph clustering algorithms to detect community migration, and time series analysis and recurrent neural network algorithms to forecast non-timestamped data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's real contribution is releasing YoutubeGraph-Dyn as a public temporal graph dataset, but the method for turning non-timestamped interactions into 6-hour snapshots is not described clearly enough to assess whether the claimed dynamics are reliable.

The main thing here is a new dataset called YoutubeGraph-Dyn pulled from YouTube interactions. It gives 416 snapshots spaced every six hours across 104 days, plus multi-modal edges and attributes that mix timestamped fields, non-timestamped ones, embeddings, and integers. They make the data available on GitHub and run a couple of off-the-shelf tests: community detection to track migration and some time-series plus RNN forecasting on the non-timestamped parts. That is the actual new piece: another public evolving graph collection with intra-day resolution, which is still uncommon. The authors are straightforward about the scarcity of such data and position the release as a resource for algorithm work. That part is useful on its face.

The weak point is the construction step. The abstract stresses that the methodology creates time-evolving graphs from non-timestamped sources, yet it gives no concrete account of how raw interactions get assigned to the six-hour windows or how the boundaries are set. Without that mapping or any cross-check against known timestamps, the temporal structure could be an artifact of the slicing rather than real evolution. The reported experiments do not include validation numbers or error analysis that would let a reader judge the quality of those slices.

The paper is therefore mainly a data release with some illustrative runs rather than a fully supported claim about recovered dynamics. Readers who need sample temporal graphs for testing clustering or forecasting code will find it worth downloading and inspecting. Anyone who needs to trust the time labels for serious modeling will want the missing construction details first. I would send it to peer review so referees can ask for the exact assignment procedure and any available validation steps; the dataset itself is the kind of thing that can be refined and used even if the initial write-up is light on evidence.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces YoutubeGraph-Dyn, a dataset of time-evolving graphs derived from YouTube interactions. It claims to supply 416 snapshots at 6-hour intervals over 104 days, constructed from non-timestamped data, with multi-modal relationships and attributes including word embeddings. The authors report graph statistics and evaluate state-of-the-art clustering algorithms for community migration as well as time-series and RNN methods for forecasting non-timestamped attributes.

Significance. If the snapshot construction accurately reflects genuine temporal dynamics rather than collection artifacts, the dataset would offer a useful resource for dynamic graph research due to its intra-day granularity and multi-modal structure, which are uncommon in existing benchmarks. The public release on GitHub supports reproducibility.

major comments (2)

[Data Collection Methodology] The data collection methodology (described in the abstract and presumably §3 or equivalent) provides no explicit mapping from raw non-timestamped interactions to the 6-hour snapshot boundaries. Without this, it is impossible to verify whether the 416 snapshots capture real evolution or arbitrary partitions, which is load-bearing for the central claim of meaningful intra-day temporal structure.
[Experiments / Results] The abstract states that graph statistics are provided and that clustering/forecasting algorithms are tested, yet no quantitative results, validation metrics, or error analysis appear in the summary description. This prevents assessment of whether the derived temporal structure is accurate (soundness concern noted in review).

minor comments (1)

[Dataset Description] Clarify the exact number of nodes, edges, and modalities per snapshot to allow direct comparison with other dynamic graph benchmarks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript introducing YoutubeGraph-Dyn. We address each major comment below and describe the planned revisions.

Referee: [Data Collection Methodology] The data collection methodology (described in the abstract and presumably §3 or equivalent) provides no explicit mapping from raw non-timestamped interactions to the 6-hour snapshot boundaries. Without this, it is impossible to verify whether the 416 snapshots capture real evolution or arbitrary partitions, which is load-bearing for the central claim of meaningful intra-day temporal structure.

Authors: We agree that greater explicitness is needed. Section 3 describes the aggregation of non-timestamped YouTube interactions into snapshots, but the boundary assignment logic can be clarified. In the revision we will add a dedicated subsection with pseudocode, a diagram of the collection-to-snapshot pipeline, and concrete examples showing how interaction timestamps determine the 6-hour intervals. This will confirm that the partitioning follows the actual data collection cadence rather than arbitrary cuts.

Revision: yes

Referee: [Experiments / Results] The abstract states that graph statistics are provided and that clustering/forecasting algorithms are tested, yet no quantitative results, validation metrics, or error analysis appear in the summary description. This prevents assessment of whether the derived temporal structure is accurate (soundness concern noted in review).

Authors: The referee's summary is drawn from the abstract, which is intentionally concise. The full manuscript (Sections 4–5) reports concrete graph statistics (node/edge counts, density, degree distributions per snapshot), clustering results (modularity and migration metrics across algorithms), and forecasting performance (MAE/RMSE for time-series and RNN models on non-timestamped attributes) together with validation details. To address the concern we will insert a short table of key quantitative highlights into the abstract and ensure all metrics are cross-referenced to the experimental sections.

Revision: partial

Circularity Check

0 steps flagged

No circularity; dataset release paper contains no derivations or self-referential reductions.

The manuscript introduces YoutubeGraph-Dyn as a new evolving-graph dataset and reports standard statistics plus off-the-shelf clustering and forecasting experiments. No equations, fitted parameters, predictions, or uniqueness theorems appear. The data-collection description is presented as a methodological contribution rather than a derivation that reduces to its own inputs. Consequently no step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper; no free parameters, mathematical axioms, or invented entities are involved in the contribution.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 4 internal anchors

[1] Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. YouTube-8M: A Large-Scale Video Classification Benchmark. CoRR abs/1609.08675 (2016). http://arxiv.org/abs/1609.08675

[2] Xu Cheng, Cameron Dale, and Jiangchuan Liu. 2008. Dataset for Statistics and Social Network of YouTube Videos. (2008). http://netsg.cs.sfu.ca/youtubedata/

[3] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated feedback recurrent neural networks. In International Conference on Machine Learning. 2067–2075.

[4] Javier Contreras, Rosario Espinola, Francisco J Nogales, and Antonio J Conejo. 2003. ARIMA models to predict next-day electricity prices. IEEE transactions on power systems 18, 3 (2003), 1014–1020.

[5] Johannes Gehrke, Paul Ginsparg, and Jon Kleinberg. 2003. Overview of the 2003 KDD Cup. SIGKDD Explor. Newsl. 5, 2 (Dec. 2003), 149–151.

[6] Google. 2016. Youtube-8M Dataset. (2016). https://research.google.com/youtube8m/

[7] Palash Goyal, Sujit Rokka Chhetri, and Arquimedes Canedo. 2018. dyngraph2vec: Capturing Network Dynamics using Dynamic Graph Representation Learning. CoRR abs/1809.02657 (2018). http://arxiv.org/abs/1809.02657

[8] Palash Goyal, Nitin Kamra, Xinran He, and Yan Liu. 2018. DynGEM: Deep Embedding Method for Dynamic Graphs. CoRR abs/1805.11273 (2018). http://arxiv.org/abs/1805.11273

[9] Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems 28, 10 (2017), 2222–2232.

[10] Aric Hagberg, Pieter Swart, and Daniel S Chult. 2008. Exploring network structure, dynamics, and function using NetworkX. Technical Report. Los Alamos National Lab. (LANL), Los Alamos, NM (United States).

[11] William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. CoRR abs/1709.05584 (2017). http://arxiv.org/abs/1709.05584

[12] Kaggle Mitchel J. 2017. Trending YouTube Video Statistics and Comments. (2017). https://www.kaggle.com/datasnaek/youtube

[13] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD ’05). ACM, 177–187.

[14] Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605.

[15] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. In Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07). San Diego, CA.

[16] Alan E. Mislove. 2009. Online Social Networks: Measurement, Analysis, and Applications to Distributed Information Systems. Ph.D. Dissertation. Rice University.

[17] Jari Saramäki, Mikko Kivelä, Jukka-Pekka Onnela, Kimmo Kaski, and Janos Kertesz. 2007. Generalizations of the clustering coefficient to weighted complex networks. Physical Review E 75, 2 (2007), 027105.

[18] Statsmodels. 2019. Statistics in Python. (2019). https://www.statsmodels.org/

[19] Stanford University. 2012. SNAP - Youtube social network and ground-truth communities. (2012). https://snap.stanford.edu/data/com-Youtube.html

[20] Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1225–1234.

[21] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A Comprehensive Survey on Graph Neural Networks. CoRR abs/1901.00596 (2019). http://arxiv.org/abs/1901.00596

[22] Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR abs/1205.6233 (2012). http://arxiv.org/abs/1205.6233

[23] YouTube. 2019. Data API. (2019). https://developers.google.com/youtube/v3/

[24] Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. 2018. Timers: Error-bounded svd restart on dynamic networks. In Thirty-Second AAAI Conference on Artificial Intelligence.

[25] Linhong Zhu, Dong Guo, Junming Yin, Greg Ver Steeg, and Aram Galstyan. 2016. Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Transactions on Knowledge and Data Engineering 28, 10 (2016), 2765–2777.
