ScholaWrite: A Dataset of End-to-End Scholarly Writing Process
Writing is a cognitively demanding activity that requires constant decision-making, heavy reliance on working memory, and frequent shifts between tasks with different goals. To build writing assistants that truly align with writers' cognition, we must capture and decode the complete thought process by which writers transform ideas into final texts. We present ScholaWrite, the first dataset of end-to-end scholarly writing, tracing the multi-month journey from initial drafts to final manuscripts. We contribute three key advances: (1) a Chrome extension that unobtrusively records keystrokes on Overleaf, enabling the collection of realistic, in-situ writing data; (2) a novel corpus of full scholarly manuscripts enriched with fine-grained annotations of cognitive writing intentions, comprising LaTeX-based edits from five computer science preprints and capturing nearly 62K text changes over four months; and (3) analyses and insights into the micro-dynamics of scholarly writing, highlighting gaps between human writing processes and the current capabilities of large language models (LLMs) in providing meaningful assistance. ScholaWrite underscores the value of capturing end-to-end writing data to develop future writing assistants that support, not replace, the cognitive work of scientists.
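Tracing a manuscript "from initial drafts to final manuscripts" amounts to storing an ordered log of text changes and replaying it. The abstract does not describe ScholaWrite's actual file format. The sketch below therefore assumes a simplified, hypothetical record. Each change is an offset plus deleted and inserted text, carrying one intention label. The class `TextChange`, its fields, the helper functions, and the label strings are all illustrative placeholders, not the dataset's real schema or taxonomy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextChange:
    """One recorded edit. HYPOTHETICAL schema, not ScholaWrite's actual format."""
    position: int   # character offset in the document before this edit
    deleted: str    # text removed starting at `position` (may be empty)
    inserted: str   # text inserted at `position` (may be empty)
    intention: str  # hypothetical intention label attached by an annotator


def apply_change(text: str, change: TextChange) -> str:
    """Apply one change, first checking that the deleted span actually matches."""
    end = change.position + len(change.deleted)
    if text[change.position:end] != change.deleted:
        raise ValueError(f"deleted text does not match at offset {change.position}")
    return text[:change.position] + change.inserted + text[end:]


def replay(initial: str, changes: list[TextChange]) -> str:
    """Replay changes in recorded order to reconstruct the latest document state."""
    text = initial
    for change in changes:
        text = apply_change(text, change)
    return text


def intention_counts(changes: list[TextChange]) -> Counter:
    """Count how many changes carry each intention label."""
    return Counter(change.intention for change in changes)


if __name__ == "__main__":
    # Illustrative three-step history; labels are placeholders, not the paper's taxonomy.
    history = [
        TextChange(0, "", "We propose a dataset.", "idea generation"),
        TextChange(3, "propose", "present", "word choice"),
        TextChange(20, "", " of writing", "elaboration"),
    ]
    print(replay("", history))          # We present a dataset of writing.
    print(intention_counts(history))
```

Replaying a prefix of the log (for example `replay("", history[:2])`) yields any intermediate draft. This is one simple way a keystroke-level log can support analyses of how a text evolved over time.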
Forward citations (cited by 2 Pith papers):
- Privacy-Preserving Proof of Human Authorship via Zero-Knowledge Process Attestation: ZK-PoP uses Groth16 proofs, Pedersen commitments, and Bulletproof range proofs to attest that behavioral feature vectors and content evolution match human patterns without exposing the raw data.
- Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification: Cognitive Load Correlation from keystroke timings distinguishes genuine human composition from mechanical transcription with an estimated 85-95% accuracy in a non-intrusive framework.