pith.

arxiv: 1907.02222 · v1 · pith:6IY5MDRX · submitted 2019-07-04 · 💻 cs.SI

Tracking Temporal Evolution of Graphs using Non-Timestamped Data

Pith reviewed 2026-05-25 08:50 UTC · model grok-4.3

classification 💻 cs.SI
keywords dynamic graphs · temporal evolution · non-timestamped data · YouTube dataset · graph clustering · community migration · time series forecasting · evolving networks

The pith

YoutubeGraph-Dyn dataset constructs time-evolving graphs from non-timestamped YouTube interactions with 416 snapshots every six hours.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces YoutubeGraph-Dyn as a dataset of evolving graphs derived from real YouTube interactions to address the scarcity of temporal graph data. It supplies 416 snapshots taken every six hours across 104 days along with multi-modal relationships and attributes such as word embeddings and integers. The core method builds these time-evolving graphs even though the source data lacks timestamps. The authors compute graph statistics and apply clustering algorithms to track community migration while testing time series and recurrent neural network models to forecast non-timestamped values. A reader would care because existing datasets rarely provide fine intra-day resolution for studying how networks change.

Core claim

YoutubeGraph-Dyn provides intra-day time granularity with 416 snapshots taken every 6 hours for a period of 104 days, multi-modal relationships that capture different aspects of the data, multiple attributes including timestamped, non-timestamped, word embeddings, and integers. The data collection methodology emphasizes the creation of time evolving graphs from non-timestamped data. Graph statistics are supplied and state-of-the-art clustering, time series, and recurrent neural network algorithms are tested on community migration and forecasting tasks.

What carries the argument

The data collection methodology that generates multiple timed snapshots to produce time-evolving graphs from originally non-timestamped interaction records.
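A minimal sketch of how crawl-time snapshots could be formed, assuming a crawler that polls YouTube at a fixed cadence and stamps each record with its crawl time rather than an event time. The record layout `(crawl_time, source, target)`, the start date, and all function names are hypothetical; they are not taken from the paper or its repository. With a 6-hour window, 104 days give 104 × 24 / 6 = 416 snapshots, which matches the reported count.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)  # snapshot cadence reported in the paper
DAYS = 104                   # collection period reported in the paper


def num_snapshots(days: int, window: timedelta) -> int:
    """Number of whole windows in a collection period of `days` days."""
    return timedelta(days=days) // window


def snapshot_index(crawl_time: datetime, start: datetime,
                   window: timedelta = WINDOW) -> int:
    """Map a crawl time (when a record was observed) to a 0-based snapshot index."""
    if crawl_time < start:
        raise ValueError("crawl_time precedes the collection start")
    return (crawl_time - start) // window


def build_snapshots(records, start: datetime, window: timedelta = WINDOW):
    """Group hypothetical (crawl_time, source, target) records into per-snapshot edge sets."""
    snapshots = defaultdict(set)
    for crawl_time, source, target in records:
        snapshots[snapshot_index(crawl_time, start, window)].add((source, target))
    return dict(snapshots)


if __name__ == "__main__":
    start = datetime(2019, 1, 1)  # hypothetical collection start
    records = [
        (datetime(2019, 1, 1, 3), "channelA", "video1"),
        (datetime(2019, 1, 1, 7), "channelA", "video2"),
        (datetime(2019, 1, 1, 8), "channelB", "video1"),
    ]
    print(num_snapshots(DAYS, WINDOW))  # 416
    print(sorted(build_snapshots(records, start)))  # [0, 1]
```

The design point is that the snapshot boundary comes from when the crawler observed a record, not from any timestamp inside the data; the half-open windows [start + k·6h, start + (k+1)·6h) place every crawl in exactly one snapshot.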

If this is right

  • Graph clustering algorithms can be evaluated for their ability to detect community migration across the sequence of snapshots.
  • Time series analysis and recurrent neural network models can be tested on forecasting non-timestamped attributes using the fine-grained temporal structure.
  • The multi-modal relationships allow separate examination of different interaction types within the same evolving network.
  • The 416 snapshots supply a large number of time points for statistical analysis of graph properties over 104 days.
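Because clustering algorithms assign arbitrary community labels at each snapshot, detecting migration first requires matching communities across consecutive snapshots. The sketch below uses greedy best-Jaccard matching, which is one simple choice among several and not necessarily the procedure the paper uses; the function names are hypothetical. Partitions are plain `node -> label` dicts, such as a clustering algorithm might return per snapshot.

```python
from collections import defaultdict


def groups(partition):
    """Invert a node -> community-label dict into label -> set of nodes."""
    out = defaultdict(set)
    for node, label in partition.items():
        out[label].add(node)
    return dict(out)


def jaccard(a, b):
    """Jaccard overlap of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(prev, curr):
    """Map each community label in `prev` to its best-overlapping label in `curr`."""
    g_prev, g_curr = groups(prev), groups(curr)
    if not g_curr:
        return {}
    return {
        lp: max(g_curr, key=lambda lc: jaccard(nodes, g_curr[lc]))
        for lp, nodes in g_prev.items()
    }


def migrated_nodes(prev, curr):
    """Nodes present in both snapshots whose matched community changed."""
    mapping = match_communities(prev, curr)
    return {n for n in prev.keys() & curr.keys() if mapping[prev[n]] != curr[n]}


if __name__ == "__main__":
    t0 = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
    t1 = {"a": 5, "b": 5, "c": 7, "d": 7, "e": 7}  # labels renamed; only c moved
    print(migrated_nodes(t0, t1))  # {'c'}
```

Greedy matching lets two old communities map to the same new one, which is how a merge shows up; detecting splits or newly formed communities would need a threshold on the Jaccard score, omitted here.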

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The snapshot construction approach could be adapted to other platforms whose interaction logs lack explicit timestamps.
  • Multi-modal edges might expose distinct evolution rates across relationship types that single-mode graphs would miss.
  • Forecasting performance on non-timestamped fields could serve as a proxy for how well dynamic models capture underlying user behavior.

Load-bearing premise

The data collection methodology can successfully create meaningful time-evolving graphs from non-timestamped YouTube interaction data.

What would settle it

If clustering algorithms applied to the 416 snapshots detect no consistent community migrations that correspond to external YouTube events, or if forecasting accuracy on held-out non-timestamped attributes remains at random-baseline levels, the generated graphs would lose practical value.
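The forecasting half of this test amounts to comparing a model's error with a trivial baseline on held-out snapshots. The text mentions a random baseline; for a cumulative count such as subscribers, the usual trivial reference is the persistence (last-value) forecast, which this sketch uses instead. The drift forecaster is a stand-in for the paper's ARIMA, LSTM, and GRU models, and the series is invented for illustration.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def persistence_forecast(history, steps):
    """Trivial baseline: repeat the last observed value."""
    return [history[-1]] * steps


def drift_forecast(history, steps):
    """Stand-in model: extend the average per-step change seen in `history`."""
    if len(history) < 2:
        return persistence_forecast(history, steps)
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(steps)]


def skill_vs_baseline(series, holdout):
    """Return (model MAE, baseline MAE) on the last `holdout` points of `series`."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must leave at least one training point")
    train, test = series[:-holdout], series[-holdout:]
    return (mae(test, drift_forecast(train, holdout)),
            mae(test, persistence_forecast(train, holdout)))


if __name__ == "__main__":
    # Invented subscriber counts, one value per 6-hour snapshot.
    counts = [100, 110, 120, 130, 140, 150]
    print(skill_vs_baseline(counts, holdout=2))  # (0.0, 15.0)
```

A model whose error is no better than the persistence baseline across many held-out windows would meet the failure condition described above.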

Figures

Figures reproduced from arXiv: 1907.02222 by Arquimedes Canedo, Palash Goyal, Sujit Rokka Chhetri.

Figure 1: Sample graphs generated from the dataset.
Figure 2: Statistics of the temporal graphs over 104 days (time interval = 6 hours).
Figure 3: Inducing graph from the dataset.
…graph slightly decreases over time. It changes from 0.8176 on day 1 to 0.8114 on day 104. Highest degree. The average degree connectivity of a graph is calculated as follows [10]:
$$k^{w}_{nn,i} = \frac{1}{s_i} \sum_{j \in N(i)} w_{ij}\, k_j \qquad (3)$$
where $s_i$ represents the weighted degree of node $i$, $w_{ij}$ represents the weight of the edge that links $i$ and $j$, and $N(i)$ represents the neighbors of the no…
Figure 5: Channel’s subscriber count predictions.
Figure 4: Community clustering for the temporal graphs (for…
Figure 6: Exploration of hyperparameters for ARIMA, LSTM, and GRU models.
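Equation (3) in the Figure 3 excerpt gives the weighted average neighbor degree of node i. The pure-Python sketch below evaluates it for an undirected weighted edge list. The excerpt is truncated before k_j is defined, so the sketch assumes, as in the NetworkX documentation the paper cites [10], that k_j is the unweighted degree of neighbor j and s_i is the strength (sum of incident edge weights) of node i. The function name is hypothetical; for graphs without self-loops its values should be comparable to NetworkX's `average_neighbor_degree` called with a `weight` argument.

```python
from collections import defaultdict


def weighted_avg_neighbor_degree(edges):
    """Evaluate k^w_{nn,i} = (1 / s_i) * sum_{j in N(i)} w_ij * k_j for every node i.

    `edges` is an iterable of (i, j, w_ij) for an undirected graph without
    self-loops; a repeated pair keeps the last weight seen.
    k_j is the unweighted degree of j, s_i the weighted degree (strength) of i.
    """
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    strength = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    result = {}
    for i, nbrs in adj.items():
        total = sum(w_ij * degree[j] for j, w_ij in nbrs.items())
        result[i] = total / strength[i] if strength[i] else 0.0
    return result


if __name__ == "__main__":
    # Path a -(2.0)- b -(1.0)- c
    print(weighted_avg_neighbor_degree([("a", "b", 2.0), ("b", "c", 1.0)]))
    # {'a': 2.0, 'b': 1.0, 'c': 2.0}
```

On the path example, node b has strength 3.0 and two degree-1 neighbors, so its value is (2.0·1 + 1.0·1) / 3.0 = 1.0.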

Datasets to study the temporal evolution of graphs are scarce. To encourage the research of novel dynamic graph learning algorithms we introduce YoutubeGraph-Dyn (available at https://github.com/palash1992/YoutubeGraph-Dyn), an evolving graph dataset generated from YouTube real-world interactions. YoutubeGraph-Dyn provides intra-day time granularity (with 416 snapshots taken every 6 hours for a period of 104 days), multi-modal relationships that capture different aspects of the data, multiple attributes including timestamped, non-timestamped, word embeddings, and integers. Our data collection methodology emphasizes the creation of time evolving graphs from non-timestamped data. In this paper, we provide various graph statistics of YoutubeGraph-Dyn and test state-of-the-art graph clustering algorithms to detect community migration, and time series analysis and recurrent neural network algorithms to forecast non-timestamped data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces YoutubeGraph-Dyn, a dataset of time-evolving graphs derived from YouTube interactions. It claims to supply 416 snapshots at 6-hour intervals over 104 days, constructed from non-timestamped data, with multi-modal relationships and attributes including word embeddings. The authors report graph statistics and evaluate state-of-the-art clustering algorithms for community migration as well as time-series and RNN methods for forecasting non-timestamped attributes.

Significance. If the snapshot construction accurately reflects genuine temporal dynamics rather than collection artifacts, the dataset would offer a useful resource for dynamic graph research due to its intra-day granularity and multi-modal structure, which are uncommon in existing benchmarks. The public release on GitHub supports reproducibility.

major comments (2)
  1. [Data Collection Methodology] The data collection methodology (described in the abstract and presumably §3 or equivalent) provides no explicit mapping from raw non-timestamped interactions to the 6-hour snapshot boundaries. Without this, it is impossible to verify whether the 416 snapshots capture real evolution or arbitrary partitions, which is load-bearing for the central claim of meaningful intra-day temporal structure.
  2. [Experiments / Results] The abstract states that graph statistics are provided and that clustering/forecasting algorithms are tested, yet no quantitative results, validation metrics, or error analysis appear in the summary description. This prevents assessment of whether the derived temporal structure is accurate (soundness concern noted in review).
minor comments (1)
  1. [Dataset Description] Clarify the exact number of nodes, edges, and modalities per snapshot to allow direct comparison with other dynamic graph benchmarks.
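The counts this comment asks for are straightforward to tabulate once each edge carries a snapshot index and a relation type. The sketch below assumes a hypothetical flat edge list of `(snapshot, u, v, relation)` tuples; the relation names in the example are placeholders, not the dataset's actual modalities.

```python
from collections import defaultdict


def per_snapshot_summary(edges):
    """Summarise hypothetical (snapshot, u, v, relation) tuples per snapshot.

    Returns {snapshot: {"nodes": n, "edges": m, "by_relation": {relation: count}}}.
    Edges are treated as undirected and de-duplicated within a (snapshot, relation).
    """
    seen = defaultdict(set)   # (snapshot, relation) -> set of undirected edges
    nodes = defaultdict(set)  # snapshot -> set of nodes
    for snap, u, v, rel in edges:
        seen[(snap, rel)].add(frozenset((u, v)))
        nodes[snap].update((u, v))
    summary = {}
    for (snap, rel), es in seen.items():
        entry = summary.setdefault(
            snap, {"nodes": len(nodes[snap]), "edges": 0, "by_relation": {}})
        entry["by_relation"][rel] = len(es)
        entry["edges"] += len(es)
    return summary


if __name__ == "__main__":
    edges = [
        (0, "ch1", "vid1", "uploaded"),
        (0, "vid1", "ch1", "uploaded"),  # same undirected edge, counted once
        (0, "vid1", "vid2", "related"),
        (1, "ch1", "vid2", "uploaded"),
    ]
    print(per_snapshot_summary(edges)[0])
    # {'nodes': 3, 'edges': 2, 'by_relation': {'uploaded': 1, 'related': 1}}
```

A node pair linked by two relation types counts once per relation, so `edges` is a multigraph count; report unique pairs separately if a simple-graph count is wanted.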

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript introducing YoutubeGraph-Dyn. We address each major comment below and describe the planned revisions.

  1. Referee: [Data Collection Methodology] The data collection methodology (described in the abstract and presumably §3 or equivalent) provides no explicit mapping from raw non-timestamped interactions to the 6-hour snapshot boundaries. Without this, it is impossible to verify whether the 416 snapshots capture real evolution or arbitrary partitions, which is load-bearing for the central claim of meaningful intra-day temporal structure.

    Authors: We agree that greater explicitness is needed. Section 3 describes the aggregation of non-timestamped YouTube interactions into snapshots, but the boundary assignment logic can be clarified. In the revision we will add a dedicated subsection with pseudocode, a diagram of the collection-to-snapshot pipeline, and concrete examples showing how interaction timestamps determine the 6-hour intervals. This will confirm that the partitioning follows the actual data collection cadence rather than arbitrary cuts. revision: yes

  2. Referee: [Experiments / Results] The abstract states that graph statistics are provided and that clustering/forecasting algorithms are tested, yet no quantitative results, validation metrics, or error analysis appear in the summary description. This prevents assessment of whether the derived temporal structure is accurate (soundness concern noted in review).

    Authors: The referee's summary is drawn from the abstract, which is intentionally concise. The full manuscript (Sections 4–5) reports concrete graph statistics (node/edge counts, density, degree distributions per snapshot), clustering results (modularity and migration metrics across algorithms), and forecasting performance (MAE/RMSE for time-series and RNN models on non-timestamped attributes) together with validation details. To address the concern we will insert a short table of key quantitative highlights into the abstract and ensure all metrics are cross-referenced to the experimental sections. revision: partial

Circularity Check

0 steps flagged

No circularity; dataset release paper contains no derivations or self-referential reductions.


The manuscript introduces YoutubeGraph-Dyn as a new evolving-graph dataset and reports standard statistics plus off-the-shelf clustering and forecasting experiments. Beyond standard definitions of graph statistics (such as the average degree connectivity in Eq. (3)), no derivations or uniqueness theorems appear, and the fitted forecasting models are baselines rather than predictions on which the dataset's construction depends. The data-collection description is presented as a methodological contribution rather than a derivation that reduces to its own inputs. Consequently, no step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a dataset introduction paper; no free parameters, mathematical axioms, or invented entities are involved in the contribution.

pith-pipeline@v0.9.0 · 5685 in / 1030 out tokens · 61984 ms · 2026-05-25T08:50:31.293546+00:00 · methodology


Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · 4 internal anchors

  1. [1]

    Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. YouTube-8M: A Large-Scale Video Classification Benchmark. CoRR abs/1609.08675 (2016). http://arxiv.org/abs/1609.08675

  2. [2]

    Xu Cheng, Cameron Dale, and Jiangchuan Liu. 2008. Dataset for Statistics and Social Network of YouTube Videos. (2008). http://netsg.cs.sfu.ca/youtubedata/

  3. [3]

    Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated feedback recurrent neural networks. In International Conference on Machine Learning. 2067–2075

  4. [4]

    Javier Contreras, Rosario Espinola, Francisco J Nogales, and Antonio J Conejo. 2003. ARIMA models to predict next-day electricity prices. IEEE transactions on power systems 18, 3 (2003), 1014–1020

  5. [5]

    Johannes Gehrke, Paul Ginsparg, and Jon Kleinberg. 2003. Overview of the 2003 KDD Cup. SIGKDD Explor. Newsl. 5, 2 (Dec. 2003), 149–151

  6. [6]

    Google. 2016. Youtube-8M Dataset. (2016). https://research.google.com/youtube8m/

  7. [7]

    Palash Goyal, Sujit Rokka Chhetri, and Arquimedes Canedo. 2018. dyngraph2vec: Capturing Network Dynamics using Dynamic Graph Representation Learning. CoRR abs/1809.02657 (2018). http://arxiv.org/abs/1809.02657

  8. [8]

    Palash Goyal, Nitin Kamra, Xinran He, and Yan Liu. 2018. DynGEM: Deep Embedding Method for Dynamic Graphs. CoRR abs/1805.11273 (2018). http://arxiv.org/abs/1805.11273

  9. [9]

    Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. 2017. LSTM: A search space odyssey. IEEE transactions on neural networks and learning systems 28, 10 (2017), 2222–2232

  10. [10]

    Aric Hagberg, Pieter Swart, and Daniel S Chult. 2008. Exploring network structure, dynamics, and function using NetworkX. Technical Report. Los Alamos National Lab.(LANL), Los Alamos, NM (United States)

  11. [11]

    William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation Learning on Graphs: Methods and Applications. CoRR abs/1709.05584 (2017). http://arxiv.org/abs/1709.05584

  12. [12]

    Kaggle Mitchel J. 2017. Trending YouTube Video Statistics and Comments. (2017). https://www.kaggle.com/datasnaek/youtube

  13. [13]

    Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD ’05). ACM, 177–187

  14. [14]

    Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, Nov (2008), 2579–2605

  15. [15]

    Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. In Proceedings of the 5th ACM/Usenix Internet Measurement Conference (IMC’07). San Diego, CA

  16. [16]

    Alan E. Mislove. 2009. Online Social Networks: Measurement, Analysis, and Applications to Distributed Information Systems. Ph.D. Dissertation. Rice University

  17. [17]

    Jari Saramäki, Mikko Kivelä, Jukka-Pekka Onnela, Kimmo Kaski, and Janos Kertesz. 2007. Generalizations of the clustering coefficient to weighted complex networks. Physical Review E 75, 2 (2007), 027105

  18. [18]

    Statsmodels. 2019. Statistics in Python. (2019). https://www.statsmodels.org/

  19. [19]

    Stanford University. 2012. SNAP - Youtube social network and ground-truth communities. (2012). https://snap.stanford.edu/data/com-Youtube.html

  20. [20]

    Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1225–1234

  21. [21]

    Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2019. A Comprehensive Survey on Graph Neural Networks. CoRR abs/1901.00596 (2019). http://arxiv.org/abs/1901.00596

  22. [22]

    Jaewon Yang and Jure Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR abs/1205.6233 (2012). http://arxiv.org/abs/1205.6233

  23. [23]

    YouTube. 2019. Data API. (2019). https://developers.google.com/youtube/v3/

  24. [24]

    Ziwei Zhang, Peng Cui, Jian Pei, Xiao Wang, and Wenwu Zhu. 2018. Timers: Error-bounded svd restart on dynamic networks. In Thirty-Second AAAI Conference on Artificial Intelligence

  25. [25]

    Linhong Zhu, Dong Guo, Junming Yin, Greg Ver Steeg, and Aram Galstyan. 2016. Scalable temporal latent space inference for link prediction in dynamic social networks. IEEE Transactions on Knowledge and Data Engineering 28, 10 (2016), 2765–2777.