Methods, Data, and Conceptual Change: Reflections from Two Quantitative Diachronic Case Studies
Pith reviewed 2026-05-08 19:21 UTC · model grok-4.3
The pith
Comparative reflection on two quantitative studies shows dataset structure limits what kinds of semantic change frequency-based methods can detect.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through parallel examination of quad-based concept modelling on EEBO-TCP data (c. 1470s-1690s) and SynFlow analysis on the Royal Society Corpus (1750-1799), the paper establishes that dataset structure shapes the kinds of semantic change quantitative methods can reliably detect and thereby clarifies the inherent limits of approaches that rely solely on lexical frequency.
What carries the argument
Comparative methodological reflection that contrasts how each of the two chosen techniques operationalizes concepts, the data assumptions each carries, and the diachronic interpretations each supports.
Load-bearing premise
The assumption that the operational choices and interpretive limits observed in these two specific corpora and methods will hold for quantitative diachronic work in general.
What would settle it
A quantitative study using different corpora and methods that successfully detects all major types of semantic change without any detectable dependence on dataset structure would challenge the central claim.
Original abstract
This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. Drawing on two worked examples, we examine English data using quad-based concept modelling of Early Modern English discourse in EEBO-TCP (c. 1470s-1690s; 765M words) alongside SynFlow analysis of scientific writing in Royal Society Corpus 6.0.4 (1750-1799; drawn from a 78.6M-token open corpus). Through parallel comparison, the paper explores how each approach operationalises concepts, the data assumptions they entail, and the diachronic interpretations they support. We argue that comparative methodological reflection clarifies the limits of purely lexical, frequency-based approaches and highlights how dataset structure shapes the kinds of semantic change that quantitative methods can reliably detect.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. Drawing on two worked examples, it examines English data using quad-based concept modelling of Early Modern English discourse in EEBO-TCP (c. 1470s-1690s; 765M words) alongside SynFlow analysis of scientific writing in Royal Society Corpus 6.0.4 (1750-1799; drawn from a 78.6M-token open corpus). Through parallel comparison, the paper explores how each approach operationalises concepts, the data assumptions each entails, and the diachronic interpretations each supports. The central argument is that comparative methodological reflection clarifies the limits of purely lexical, frequency-based approaches and highlights how dataset structure shapes the kinds of semantic change that quantitative methods can reliably detect.
Significance. If the interpretive claims hold, the paper makes a useful contribution to computational historical linguistics by supplying concrete, parallel case studies that illustrate often-overlooked interactions between method and corpus structure. The explicit use of two large, publicly referenced corpora (EEBO-TCP and Royal Society Corpus) and the side-by-side comparison of distinct operationalizations provide practical grounding that strengthens the reflective argument. Such discussion pieces help the field move beyond purely lexical frequency counts toward more data-aware quantitative work.
minor comments (2)
- [Abstract] The abstract introduces 'quad-based concept modelling' and 'SynFlow analysis' without a one-sentence gloss or pointer to the relevant literature; adding brief definitional phrases would improve accessibility for readers outside the immediate subfield.
- [Case-study comparison section] The manuscript would benefit from a short table or paragraph that explicitly contrasts the two methods' handling of frequency information versus other features (e.g., co-occurrence patterns or syntactic context), making the claimed 'limits of purely lexical, frequency-based approaches' easier to evaluate directly from the examples.
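The contrast requested in the second comment can be illustrated with a toy sketch. It is not the paper's quad-based concept modelling and not SynFlow; it only shows, under simplifying assumptions, how a word's relative frequency can stay constant across two periods while its co-occurrence context changes completely. The two "periods", the stopword list, and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical stopword list, chosen only to keep the toy example readable.
STOPWORDS = {"the", "a", "and", "is", "has"}


def relative_frequency(tokens, target):
    """Share of all tokens that are exactly `target`."""
    return tokens.count(target) / len(tokens)


def cooccurrence_profile(tokens, target, window=3, stopwords=STOPWORDS):
    """Count non-stopword tokens within `window` positions of each `target`."""
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in stopwords:
                profile[tokens[j]] += 1
    return profile


def jaccard_overlap(profile_a, profile_b):
    """Jaccard similarity between the sets of co-occurring words."""
    keys_a, keys_b = set(profile_a), set(profile_b)
    if not keys_a and not keys_b:
        return 1.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)


# Invented toy data, NOT drawn from EEBO-TCP or the Royal Society Corpus.
period_1 = "the air is a subtle spirit and the air moves the soul".split()
period_2 = "the air has a measured weight and the air fills the pump".split()

freq_1 = relative_frequency(period_1, "air")
freq_2 = relative_frequency(period_2, "air")
overlap = jaccard_overlap(
    cooccurrence_profile(period_1, "air"),
    cooccurrence_profile(period_2, "air"),
)

print(f"relative frequency: {freq_1:.3f} vs {freq_2:.3f}")
print(f"context overlap (Jaccard): {overlap:.2f}")
```

In this toy case the frequency signal reports no change while the context overlap drops to zero, which is the kind of gap a purely frequency-based measure cannot see. Real corpora would need lemmatisation, larger windows, and an association measure rather than raw set overlap.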
Simulated Author's Rebuttal
We thank the referee for the positive evaluation of our discussion paper and the recommendation for minor revision. The assessment correctly identifies the value of our side-by-side comparison of quad-based concept modelling on EEBO-TCP and SynFlow analysis on the Royal Society Corpus in clarifying interactions between method, data structure, and detectable semantic change.
Circularity Check
No circularity: reflective discussion without derivations or self-referential predictions
full rationale
The paper is explicitly a methodological reflection that draws on two independent worked examples (quad-based modelling on EEBO-TCP and SynFlow on the Royal Society Corpus) to illustrate interactions between quantitative techniques and corpus structure. No equations, fitted parameters, predictions, or uniqueness theorems are claimed; the central argument is interpretive and rests on direct comparison of the case studies rather than any reduction to inputs by construction or self-citation chains. All load-bearing steps are external to the paper's own data processing and remain falsifiable through the cited corpora and methods.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Quantitative methods can operationalize abstract concepts from historical text collections in ways that support diachronic interpretation.
Reference graph
Works this paper leans on
[1] Fischer, S., Knappen, J., Menzel, K., & Teich, E. (2020). The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Twelfth Lan...
[2] Fitzmaurice, S., Robinson, J. A., Alexander, M., Hine, I. C., Mehl, S., & Dallachy, F. (2017). Linguistic DNA: investigating conceptual change in Early Modern English discourse. Studia Neophilologica, 89(sup1), 21-38.
[3] Fitzmaurice, S., & Mehl, S. (2022). Volatile concepts: Analysing discursive change through underspecification in co-occurrence quads. International Journal of Corpus Linguistics, 27(4), 428-450.
[4] Kermes, H., Degaetano-Ortlieb, S., Khamis, A., Knappen, J., & Teich, E. (2016). The Royal Society Corpus: From Uncharted Data to Corpus. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16) (pp. 1928–1931). European Language Resources Association (ELRA). https://aclanthology.org/L16-1305/
[5] Knappen, J., Fischer, S., Kermes, H., Teich, E., & Fankhauser, P. (2017). The Making of the Royal Society Corpus. In G. Bouma & Y. Adesam (Eds.), Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language (pp. 7–11). Linköping University Electronic Press. https://aclanthology.org/W17-0503/
[6] Menzel, K., Knappen, J., & Teich, E. (2021). Generating linguistically relevant metadata for the Royal Society Corpus. Research in Corpus Linguistics, 9(1), 1–18. https://doi.org/10.32714/ricl.09.01.02
[7] Phan-Tất, B. (2025). SynFlow [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.17414457
[8] Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python Natural Language Processing Toolkit for Many Human Languages (arXiv:2003.07082). arXiv. https://doi.org/10.48550/arXiv.2003.07082
[9] Text Creation Partnership (TCP). (2020). Early English Books Online Text Creation Partnership (EEBO-TCP): Phase I & II Transcriptions. https://...