Exploring temporal dynamics in digital trace data: mining user-sequences for communication research

Jakob Ohme; Lion Wedel; Yangliu Fan

arxiv: 2505.18790 · v1 · submitted 2025-05-24 · 💻 cs.SI

Pith reviewed 2026-05-19 12:41 UTC · model grok-4.3

classification 💻 cs.SI

keywords digital trace data · user-sequences · temporal dynamics · communication research · sequence analysis · process mining · data donations · longitudinal data

The pith

Digital trace data can be analyzed as time-evolving user-sequences to capture the temporal dynamics of communication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Communication theory treats communication as a process that unfolds dynamically over time, yet many empirical methods aggregate data or ignore timestamps and therefore cannot test that view directly. This paper proposes keeping the fine-grained timestamps intact in donated digital traces and constructing time-evolving user-sequences that record user activity at high resolution. Six computational approaches, including sequence analysis, process mining, and language-based models, are applied to a real dataset of 1,262,775 traces from 309 users to illustrate how such sequences can be mined. A reader would care because the framework offers a practical way to align methods with the theoretical premise that timing and sequence matter in how people communicate.

Core claim

The paper claims that preserving hyper-longitudinal timestamp information in digital trace data and analyzing the resulting time-evolving user-sequences supplies rich, high-resolution information about user activity that non-dynamical methods miss, as demonstrated by applying sequence analysis, process mining, and related techniques to over a million timestamped traces collected via data donations from 309 users.

What carries the argument

Time-evolving user-sequences built from timestamped digital traces, which serve as structured input for sequence-analysis and process-mining tools to extract temporal patterns in communication behavior.
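
As a minimal sketch of what such a structure could look like, the Python below groups timestamped trace records by user and orders each group in time. The record fields (user id, ISO timestamp, action label), the sample rows, and the function name `build_user_sequences` are illustrative assumptions; none of them come from the paper or its dataset.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trace records: (user_id, ISO timestamp, action label).
TRACES = [
    ("u1", "2024-03-01T08:00:05", "open_app"),
    ("u2", "2024-03-01T08:00:07", "open_app"),
    ("u1", "2024-03-01T08:00:40", "watch_video"),
    ("u1", "2024-03-01T08:02:10", "like"),
    ("u2", "2024-03-01T08:01:00", "search"),
]


def build_user_sequences(traces):
    """Group traces by user and sort each group chronologically."""
    sequences = defaultdict(list)
    for user_id, ts, action in traces:
        sequences[user_id].append((datetime.fromisoformat(ts), action))
    for events in sequences.values():
        events.sort(key=lambda event: event[0])  # order by timestamp only
    return dict(sequences)


sequences = build_user_sequences(TRACES)
for user_id, events in sequences.items():
    print(user_id, [action for _, action in events])
```

Keeping the parsed timestamp next to each action, rather than only the action order, leaves both ordering and inter-event gaps available to whichever downstream method is applied.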

If this is right

Researchers can model typical sequences of user actions to test theories about the ordered flow of communication events.
Process-mining techniques can reveal common pathways through platforms or media over short time windows.
Language models applied to sequences can connect content with precise timing of when messages are sent or received.
The same dataset can support both minute-scale and month-scale analyses without losing temporal structure.
Data-donation studies become more valuable when collection emphasizes continuous timestamp recording rather than one-time snapshots.
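
One common building block in process mining is the directly-follows relation, which counts how often one activity is immediately followed by another. The sketch below computes those counts with the Python standard library only. The action labels and the two example sequences are invented for illustration, and this is not a reconstruction of the paper's own analysis.

```python
from collections import Counter

# Hypothetical per-user action sequences, already ordered by timestamp.
SEQUENCES = {
    "u1": ["open_app", "watch_video", "like", "watch_video", "close_app"],
    "u2": ["open_app", "search", "watch_video", "like", "close_app"],
}


def directly_follows(sequences):
    """Count how often action a is immediately followed by action b."""
    counts = Counter()
    for actions in sequences.values():
        # zip pairs each action with its successor in the same sequence.
        counts.update(zip(actions, actions[1:]))
    return counts


dfg = directly_follows(SEQUENCES)
for (a, b), n in dfg.most_common(3):
    print(f"{a} -> {b}: {n}")
```

The resulting counts can be read as weighted edges of a directly-follows graph, which is one way the "common pathways" mentioned above might be surfaced.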

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending the approach across multiple platforms could show how users shift between channels in real time.
Sequence patterns might be compared between demographic groups to identify differences in temporal habits that static measures overlook.
The framework could be tested by checking whether sequence-derived features improve predictions of future user engagement compared with non-sequential baselines.

Load-bearing premise

That applying existing sequence-analysis and process-mining tools to donated digital traces will produce demonstrable advances in communication theory beyond what static or aggregated methods already achieve.

What would settle it

A side-by-side analysis of the same donated trace dataset in which one team uses only static summaries or cross-sectional aggregates while another team mines the full user-sequences, then checks whether the sequence approach yields new, replicable findings about temporal ordering or change that the static approach does not.
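
A toy version of that comparison, under stated assumptions, is sketched below: two invented users have identical action frequencies, so a static summary cannot tell them apart, while a transition-based summary can. The labels `news` and `chat` and both sequences are hypothetical. The example only shows that ordering carries information the counts discard; it does not show that the difference is theoretically meaningful or replicable.

```python
from collections import Counter

# Two hypothetical users with identical action totals but different ordering.
user_a = ["news", "news", "chat", "chat", "news", "chat"]
user_b = ["news", "chat", "news", "chat", "news", "chat"]


def static_summary(actions):
    """Non-dynamical baseline: frequency of each action, order discarded."""
    return Counter(actions)


def transition_summary(actions):
    """Sequence-aware summary: counts of consecutive action pairs."""
    return Counter(zip(actions, actions[1:]))


same_static = static_summary(user_a) == static_summary(user_b)
same_sequence = transition_summary(user_a) == transition_summary(user_b)
print("static summaries equal:", same_static)
print("transition summaries equal:", same_sequence)
```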

Original abstract

Communication is commonly considered a process that is dynamically situated in a temporal context. However, there remains a disconnection between such theoretical dynamicality and the non-dynamical character of communication scholars' preferred methodologies. In this paper, we argue for a new research framework that uses computational approaches to leverage the fine-grained timestamps recorded in digital trace data. In particular, we propose to maintain the hyper-longitudinal information in the trace data and analyze time-evolving 'user-sequences,' which provide rich information about user activity with high temporal resolution. To illustrate our proposed framework, we present a case study that applied six approaches (e.g., sequence analysis, process mining, and language-based models) to real-world user-sequences containing 1,262,775 timestamped traces from 309 unique users, gathered via data donations. Overall, our study suggests a conceptual reorientation towards a better understanding of the temporal dimension in communication processes, resting on the exploding supply of digital trace data and the technical advances in analytical approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper makes a fair case for keeping temporal order in digital trace data via user-sequences, but the case study applies the tools without showing gains over static summaries.

The main takeaway is that this paper pushes communication researchers to keep the full temporal order in digital trace data instead of summarizing it away, and it shows how sequence analysis and process mining could help with that. The authors do a decent job laying out why this matters. Theory often talks about communication as unfolding over time, but data practices tend to use cross-sections or totals. By focusing on user-sequences built from timestamped events, the approach could capture things like the rhythm of app use or how one interaction leads to another. The case study draws on actual donated data with over a million traces, which gives it some grounding rather than staying purely abstract.

Where it falls short is in showing that these dynamical tools deliver insights you couldn't get from simpler non-temporal methods. The manuscript describes running six different approaches on the dataset but stops short of any quantitative comparison or ablation that would demonstrate added value. For instance, we don't see whether sequence patterns predict user behavior better than just looking at total time spent or average session length. That leaves the central claim more as a suggestion than a tested improvement.

The work is aimed at people in communication studies who are comfortable with computational methods and want to incorporate more process-oriented thinking. It won't appeal much to those hunting for big new empirical results or formal proofs. Overall, the thinking is clear, and the paper engages honestly with the literature on temporal dynamics. I would send this to peer review, mainly to get feedback on how to strengthen the empirical demonstration in the case study.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that communication theory emphasizes dynamic, temporally situated processes, yet prevailing methodologies remain non-dynamical. It proposes a framework that preserves hyper-longitudinal timestamps in digital trace data and analyzes time-evolving user-sequences via computational tools including sequence analysis, process mining, and language-based models. The framework is illustrated by a case study applying six such approaches to 1,262,775 timestamped events donated by 309 users, with the overall suggestion that this constitutes a conceptual reorientation toward the temporal dimension in communication research.

Significance. If the framework can be shown to generate falsifiable predictions or refined hypotheses unavailable from static summaries of the same traces, the work would hold moderate significance for communication research by leveraging abundant digital trace data. The use of real donated data in the case study is a concrete strength that supports reproducibility and grounds the proposal in empirical material.

major comments (2)

Abstract and case-study description: the manuscript states that the six approaches were applied to the 1,262,775 events yet reports no quantitative metrics, error bars, ablation results, or side-by-side comparisons against non-dynamical baselines (frequency counts, duration aggregates, or cross-sectional correlations). This leaves the central claim—that dynamical user-sequence analysis yields communication insights beyond static methods—unsupported by evidence rather than demonstrated.
Case-study section: without explicit benchmarking or metrics showing that the outputs of sequence analysis or process mining produce distinct theoretical contributions or falsifiable predictions unavailable from the identical dataset summarized statically, the weakest assumption identified in the stress-test note remains unaddressed and load-bearing for the proposed reorientation.

minor comments (1)

The abstract and main text would benefit from a brief enumeration of the exact six approaches and the precise sequence-mining or process-mining algorithms employed, to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting areas where the empirical illustration could be strengthened. We address each major comment below, clarifying the scope of the case study while indicating revisions to improve the manuscript.

Referee: Abstract and case-study description: the manuscript states that the six approaches were applied to the 1,262,775 events yet reports no quantitative metrics, error bars, ablation results, or side-by-side comparisons against non-dynamical baselines (frequency counts, duration aggregates, or cross-sectional correlations). This leaves the central claim—that dynamical user-sequence analysis yields communication insights beyond static methods—unsupported by evidence rather than demonstrated.

Authors: The manuscript's core contribution is a conceptual framework for preserving hyper-longitudinal timestamps and analyzing user-sequences with dynamical methods. The case study illustrates the application of six approaches to a real donated dataset of 1,262,775 events rather than serving as a comparative empirical test. We agree that the current presentation could more explicitly distinguish illustration from validation. In revision we will update the abstract and case-study description to emphasize the illustrative intent, add a limitations paragraph noting the absence of quantitative benchmarks against static baselines, and outline directions for future work that could include such metrics and ablation studies.

Revision: partial

Referee: Case-study section: without explicit benchmarking or metrics showing that the outputs of sequence analysis or process mining produce distinct theoretical contributions or falsifiable predictions unavailable from the identical dataset summarized statically, the weakest assumption identified in the stress-test note remains unaddressed and load-bearing for the proposed reorientation.

Authors: We acknowledge that explicit side-by-side benchmarking would more directly address whether dynamical outputs generate unique theoretical value. The case study demonstrates concrete outputs (e.g., sequence patterns and process models) from the donated data that static frequency or duration aggregates would not surface in the same form. However, the paper does not claim to have produced falsifiable predictions in this illustration. We will revise the case-study section to include qualitative examples of temporal patterns revealed by the dynamical methods that are not visible in static summaries, and we will add a forward-looking subsection on how communication researchers could design benchmarking studies to test for distinct contributions.

Revision: partial

Circularity Check

0 steps flagged

No circularity: proposal applies existing tools to trace data without self-referential reductions.

The paper proposes a conceptual framework for analyzing time-evolving user-sequences from digital trace data and illustrates it via a case study applying six established approaches (sequence analysis, process mining, language models) to 1,262,775 timestamped events. No equations, fitted parameters, predictions, or first-principles derivations appear that reduce to the inputs by construction. The argument for maintaining hyper-longitudinal information rests on the properties of donated trace data and advances in computational methods rather than tautological definitions or self-citation chains. The central claim is a methodological reorientation whose validity can be assessed externally against static baselines, with no load-bearing step that is equivalent to its own premise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that fine-grained timestamps in donated digital traces contain usable hyper-longitudinal information about communication processes; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: Digital trace data records fine-grained timestamps that can be preserved as hyper-longitudinal user-sequences.
Invoked in the proposal to maintain temporal resolution instead of aggregating data.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/ArrowOfTime.lean · arrow_from_z
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "we propose to maintain the hyper-longitudinal information in the trace data and analyze time-evolving 'user-sequences'"

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "sequence analysis, process mining, and language-based models"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.