Evaluating the Utility of Document Embedding Vector Difference for Relation Learning
Pith reviewed 2026-05-24 19:37 UTC · model grok-4.3
The pith
Document embedding vector differences help detect similar documents but perform poorly for classifying multiple relation types.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors demonstrate that document-level difference vectors obtained by subtracting pretrained document embeddings have utility in assessing document-level similarity on duplicate detection tasks, but perform less well when used for multi-relational classification on dialogue act tagging.
What carries the argument
Document embedding vector differences, formed by subtracting one pretrained document embedding from another and passed to a linear classifier for relation prediction.
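The mechanism can be illustrated with a minimal sketch, assuming toy two-dimensional vectors in place of real pretrained document embeddings and a hand-rolled logistic regression as the linear classifier. The names `difference_vector`, `score`, and `train_logistic`, the synthetic offsets, and the training settings are all hypothetical and are not taken from the paper. The toy data is built so that each relation corresponds to a roughly constant offset, which is the favourable case for a linear model.

```python
import math
import random

def difference_vector(u, v):
    """Return the element-wise offset u - v between two embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    return [a - b for a, b in zip(u, v)]

def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_logistic(features, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression with per-example gradient steps."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
            g = p - y  # gradient of the log loss with respect to the score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Toy data (hypothetical): each relation shifts a document by a fixed offset
# plus a small amount of noise.
rng = random.Random(0)
offsets = {1: [1.0, 0.0], 0: [0.0, 1.0]}
features, labels = [], []
for i in range(40):
    label = i % 2
    v = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    u = [vi + oi + rng.uniform(-0.1, 0.1) for vi, oi in zip(v, offsets[label])]
    features.append(difference_vector(u, v))
    labels.append(label)

w, b = train_logistic(features, labels)
predictions = [1 if score(w, b, x) >= 0 else 0 for x in features]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(accuracy)
```

Real document embeddings are not guaranteed to have this constant-offset structure; whether they do is exactly what the paper's experiments probe.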
If this is right
- Simple linear models on embedding differences can serve as a baseline for document similarity tasks.
- The same approach is unlikely to replace richer models when the task requires distinguishing many distinct relation types.
- Pretrained document embeddings contain linearly extractable similarity signals at the pair level.
Where Pith is reading between the lines
- The weaker multi-relational results hint that document embedding spaces may be less relationally structured than word embedding spaces.
- Non-linear classifiers or task-specific fine-tuning of embeddings could be tested to close the gap on multi-class cases.
Load-bearing premise
The geometry of pretrained document embeddings already encodes relational information in a form that simple vector subtraction exposes to a linear classifier, as it does for word embeddings.
What would settle it
A controlled experiment on a new similarity dataset where difference vectors yield no improvement over random guessing or bag-of-words baselines would show the utility does not hold.
Original abstract
Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy. Inspired by this finding, in this paper, we extend the idea to the document level, in generating document-level embeddings, calculating the distance between them, and using a linear classifier to classify the relation between the documents. In the context of duplicate detection and dialogue act tagging tasks, we show that document-level difference vectors have utility in assessing document-level similarity, but perform less well in multi-relational classification.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends the vector-offset technique from word embeddings to document embeddings, computing difference vectors between pretrained document embeddings and training a linear classifier to predict relations between document pairs. It evaluates this approach on two tasks—duplicate detection (assessing similarity) and dialogue act tagging (multi-relational classification)—and reports that the difference vectors show utility for similarity but perform less well for multi-relational classification.
Significance. If the empirical results hold under proper controls, the work provides targeted evidence that document embedding geometry can encode certain relational signals in a linearly extractable form, at least for similarity detection. The scoped evaluation on two concrete tasks avoids overgeneralization and directly tests the linear-extractability assumption within those settings.
Major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): the directional findings are reported without specifying the document embedding models, training corpora, baseline methods, or any statistical significance tests; these omissions make it impossible to evaluate whether the observed utility for similarity is robust or task-specific.
- [§4.2] §4.2 (Dialogue act tagging results): the claim that difference vectors 'perform less well' in multi-relational classification requires quantitative comparison to at least one non-difference baseline (e.g., concatenated embeddings or a non-linear classifier) and an error analysis; without these, the contrast with the duplicate-detection results cannot be assessed.
Minor comments (3)
- [§3] Clarify whether the linear classifier is trained on the raw difference vectors or on additional features, and report any hyperparameter search.
- [§4.1] Add dataset statistics (size, class balance) and embedding dimensionality for both tasks.
- [§2] The related-work section should cite the original word-embedding offset papers (e.g., Mikolov et al.) and any prior document-level relation work.
Simulated Author's Rebuttal
We thank the referee for the positive assessment and constructive feedback. We address each major comment below and will revise the manuscript to address the points raised.
Point-by-point responses
- Referee: [Abstract and §4] Abstract and §4 (Experiments): the directional findings are reported without specifying the document embedding models, training corpora, baseline methods, or any statistical significance tests; these omissions make it impossible to evaluate whether the observed utility for similarity is robust or task-specific.
  Authors: We agree that the abstract and §4 would benefit from greater explicitness. The revised version will expand the abstract to name the document embedding models and training corpora, describe the baseline methods, and report statistical significance tests. Section 4 will be updated with the same details to permit evaluation of robustness and task-specificity.
  Revision: yes
- Referee: [§4.2] §4.2 (Dialogue act tagging results): the claim that difference vectors 'perform less well' in multi-relational classification requires quantitative comparison to at least one non-difference baseline (e.g., concatenated embeddings or a non-linear classifier) and an error analysis; without these, the contrast with the duplicate-detection results cannot be assessed.
  Authors: We accept that the current presentation of the multi-relational results is insufficient without direct baselines. The revision will add quantitative comparisons to at least one non-difference baseline (concatenated embeddings) and a non-linear classifier, together with an error analysis on the dialogue act tagging task, to make the contrast with duplicate detection results clearer.
  Revision: yes
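The baseline comparison requested above comes down to feeding the same document pairs to the same classifier under two feature constructions. The following is a minimal sketch under that assumption; the function names are hypothetical and the paper's actual pipeline is not specified in this review.

```python
def difference_features(u, v):
    """Offset representation: dimensionality d, sensitive to pair order."""
    return [a - b for a, b in zip(u, v)]

def concatenation_features(u, v):
    """Baseline representation: dimensionality 2d, keeps both embeddings."""
    return list(u) + list(v)

def build_feature_sets(pairs):
    """Build both feature matrices from identical (u, v) embedding pairs,
    so any downstream classifier sees the same examples in each setting."""
    diff = [difference_features(u, v) for u, v in pairs]
    concat = [concatenation_features(u, v) for u, v in pairs]
    return diff, concat

# Hypothetical toy pair of two-dimensional embeddings.
pairs = [([1.0, 2.0], [0.5, 0.5])]
diff, concat = build_feature_sets(pairs)
print(diff, concat)
```

Holding the pairs, the classifier, and the train/test split fixed across the two settings isolates the effect of the representation itself.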
Circularity Check
Empirical evaluation with no derivation chain
The paper is an empirical study that extends word-embedding offset ideas to document embeddings, computes differences, and evaluates a linear classifier on duplicate detection and dialogue act tagging tasks. No mathematical derivation, uniqueness theorem, or fitted-parameter prediction is claimed; results are reported directly from the experiments without any step that reduces to inputs by construction or self-citation load-bearing. This is a standard non-circular empirical evaluation.