Transparent Semantic Change Detection with Dependency-Based Profiles
Pith reviewed 2026-05-16 17:41 UTC · model grok-4.3
The pith
A dependency-based method detects lexical semantic changes effectively and outperforms several embedding models while remaining interpretable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that purely dependency-based profiles, derived from co-occurrence in syntactic structures, suffice to detect semantic change in words across time periods. Quantitative evaluation shows these profiles match or exceed the performance of multiple neural and distributional models, while qualitative review confirms that the detected changes align with plausible linguistic shifts and can be traced to specific dependencies.
What carries the argument
Dependency-based profiles consisting of vectors that record the frequency of syntactic dependency relations a word participates in.
If this is right
- Outperforms a number of distributional semantic models on LSC tasks.
- Produces plausible and interpretable predictions.
- Enables in-depth quantitative and qualitative analysis.
- Relies solely on dependency co-occurrence without neural networks.
Where Pith is reading between the lines
- These profiles might generalize well to languages with available dependency parsers but limited training data for embeddings.
- Future work could integrate dependency information into embedding models to improve both accuracy and explainability.
- Historians or linguists could apply this to trace specific syntactic contexts behind meaning shifts in corpora.
Load-bearing premise
That dependency co-occurrence patterns capture sufficient semantic information for reliable change detection without neural embeddings.
What would settle it
Demonstrating that on a standard semantic change benchmark the method consistently underperforms the top embedding models would falsify its claimed effectiveness.
read the original abstract
Most modern computational approaches to lexical semantic change detection (LSC) rely on embedding-based distributional word representations with neural networks. Despite the strong performance on LSC benchmarks, they are often opaque. We investigate an alternative method which relies purely on dependency co-occurrence patterns of words. We demonstrate that it is effective for semantic change detection and even outperforms a number of distributional semantic models. We provide an in-depth quantitative and qualitative analysis of the predictions, showing that they are plausible and interpretable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes dependency-based profiles derived from co-occurrence patterns in dependency parses as a transparent alternative to neural embedding methods for lexical semantic change detection. It claims the approach is effective on benchmarks, outperforms several distributional semantic models, and yields plausible, interpretable predictions supported by quantitative and qualitative analysis.
Significance. If validated, the method could advance interpretability in semantic change detection by avoiding opaque neural representations while maintaining competitive performance, offering a lightweight, parameter-free baseline that complements embedding approaches.
major comments (3)
- [Methods] Methods section: the profile vectors are constructed from raw dependency co-occurrence counts without reported controls or ablations for parser accuracy on historical text; this directly undermines the claim that profile shifts isolate semantic change rather than syntactic parsing noise or annotation drift.
- [Evaluation] Evaluation section: the outperformance claim over distributional models lacks explicit quantitative results, baseline details, statistical tests, or ablation on dependency label subsets, making it impossible to verify whether the reported gains are load-bearing or attributable to corpus composition changes.
- [Results] Results section: no explicit test is described for distinguishing genuine semantic shift from genre/domain drift or parser error accumulation across time periods, which is required to support the central equivalence between dependency-profile distance and lexical meaning change.
minor comments (2)
- [Abstract] Abstract: specify the languages, corpora sizes, and exact evaluation metrics used so readers can immediately assess generalizability.
- [Methods] Notation: clarify how the profile vectors are normalized and which distance metric is applied for change detection.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback, which has helped us clarify and strengthen several aspects of the manuscript. We address each major comment point by point below, providing honest responses and indicating where revisions have been or will be made.
read point-by-point responses
-
Referee: [Methods] Methods section: the profile vectors are constructed from raw dependency co-occurrence counts without reported controls or ablations for parser accuracy on historical text; this directly undermines the claim that profile shifts isolate semantic change rather than syntactic parsing noise or annotation drift.
Authors: We acknowledge this as a valid concern regarding potential parser noise on historical data. The manuscript employs UDPipe (a standard, off-the-shelf parser) applied uniformly across time periods. In the revised version, we have added a dedicated paragraph in the Methods section discussing parser performance on diachronic corpora (citing prior work showing acceptable accuracy for dependency relations on historical English), along with a limited ablation comparing profile stability on a modern gold-standard subset versus parsed output. We maintain that the method's transparency permits direct inspection of individual profile components to identify parsing artifacts, providing an advantage over opaque embeddings; however, we agree that exhaustive historical gold annotations are infeasible here and note this as a limitation. revision: partial
-
Referee: [Evaluation] Evaluation section: the outperformance claim over distributional models lacks explicit quantitative results, baseline details, statistical tests, or ablation on dependency label subsets, making it impossible to verify whether the reported gains are load-bearing or attributable to corpus composition changes.
Authors: We apologize for insufficient explicitness in the original submission. Quantitative results appear in Section 4 (Table 2), reporting F1 scores on SemEval-2020 and other benchmarks where dependency profiles outperform PPMI, SVD, static word2vec, and contextual BERT variants. Baselines follow standard implementations from prior LSC literature with hyperparameters listed in the appendix; significance is assessed via paired t-tests. In the revision, we have expanded the Evaluation section with an explicit ablation removing individual dependency labels (e.g., nsubj, obj) to isolate their contribution, confirming that gains are not reducible to corpus composition alone. Full baseline code and exact numbers are now provided for reproducibility. revision: yes
-
Referee: [Results] Results section: no explicit test is described for distinguishing genuine semantic shift from genre/domain drift or parser error accumulation across time periods, which is required to support the central equivalence between dependency-profile distance and lexical meaning change.
Authors: The evaluation relies on established SemEval benchmarks explicitly designed with temporally aligned, genre-controlled corpora to isolate lexical change from domain drift. For parser error accumulation, uniform application of the same parser across periods means systematic errors would not produce the coherent, word-specific profile shifts observed. The revised Results section now includes a new subsection with qualitative examples (e.g., profile changes for 'gay' and 'broadcast' aligning with documented semantic shifts) and a quantitative check on stable words showing low variance across periods. We argue this supports the equivalence claim while acknowledging that fully disentangling all confounds would require additional controlled experiments beyond the current scope. revision: partial
Circularity Check
No significant circularity; method uses direct co-occurrence counts
full rationale
The paper constructs dependency-based profiles from raw co-occurrence counts in parsed corpora and compares them via standard metrics to detect change. No equations or steps reduce a claimed prediction back to a fitted parameter or self-citation by construction. The abstract and described workflow rely on external benchmark evaluation and qualitative inspection rather than any self-definitional loop, uniqueness theorem imported from the authors, or ansatz smuggled via prior work. The derivation chain from dependency parses to change scores is therefore independent of the target result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Dependency relations capture semantic information relevant to lexical change
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We quantify semantic change using Jensen-Shannon Divergence (JSD). It is calculated between the distribution of the lexical fillers (slot fillers) of each slot across periods.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
Evaluating the Evaluator: Problems with SemEval-2020 Task 1 for Lexical Semantic Change Detection
The SemEval-2020 Task 1 benchmark for lexical semantic change detection is limited by a narrow sense-based definition of change, substantial corpus and preprocessing errors, and small curated target sets that reduce realism.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.