Translator2Vec: Understanding and Representing Human Post-Editors

André F. T. Martins; António Góis

arxiv: 1907.10362 · v1 · pith:GZTFAUUWnew · submitted 2019-07-24 · 💻 cs.CL

Pith reviewed 2026-05-24 16:55 UTC · model grok-4.3

classification 💻 cs.CL

keywords: post-editing, machine translation, action sequences, user modeling, representation learning, post-editor identification, editing time prediction

The pith

Action sequences from post-editing sessions identify individual post-editors more accurately than the initial or final text alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper releases a dataset of 66,268 document-level post-editing sessions by 332 humans that records keystrokes, mouse actions, and waiting times. It shows these action sequences contain enough information to identify which post-editor performed the work, outperforming baselines that rely only on source and target text. The authors then learn continuous vector representations of post-editors from the sequences and demonstrate that the representations improve prediction of how long post-editing will take.

Core claim

Action sequences are informative enough to identify post-editors accurately compared to baselines that only look at the initial and final text. Continuous representations learned from these sequences improve the downstream task of predicting post-editing time.

What carries the argument

Translator2Vec, continuous vector representations of post-editors learned from their sequences of keystrokes, mouse actions, and pauses during post-editing.

If this is right

Post-editors exhibit consistent individual patterns in how they interact with machine translation output.
These patterns provide a stronger signal for modeling editing behavior than the linguistic content alone.
Representations derived from actions can be used as features to forecast post-editing duration more accurately.
Visualization of the learned vectors can expose groupings among different post-editing styles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same action-based modeling approach could be tested on other human-AI editing tasks such as code review or document revision.
If the representations prove stable over time, they might support adaptive interfaces that adjust suggestions to an individual post-editor's observed habits.
Aggregating representations across many editors could reveal population-level differences in editing efficiency linked to experience or language pair.

Load-bearing premise

The recorded action sequences contain stable, person-specific patterns that generalize beyond the training documents and are not dominated by document content or interface artifacts.

What would settle it

The claim would be refuted if a post-editor identifier trained on action sequences from one collection of documents failed to beat text-only baseline accuracy when tested on the same editors working on a new collection of different documents.
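
The test described above depends on a split in which no document appears on both sides. A minimal sketch of such a split follows, assuming each session is a dict with hypothetical `editor` and `doc_id` keys; the released dataset's actual field names may differ, and this is not taken from the paper's code.

```python
import random
from collections import defaultdict


def document_disjoint_split(sessions, test_fraction=0.2, seed=0):
    """Split sessions so that no document id appears in both train and test.

    `sessions` is a list of dicts with hypothetical keys "editor" and "doc_id";
    these names are illustrative assumptions, not the dataset's schema.
    """
    # Shuffle the unique document ids deterministically, then hold out a fraction.
    doc_ids = sorted({s["doc_id"] for s in sessions})
    rng = random.Random(seed)
    rng.shuffle(doc_ids)
    n_test = max(1, int(len(doc_ids) * test_fraction))
    test_docs = set(doc_ids[:n_test])

    train = [s for s in sessions if s["doc_id"] not in test_docs]
    test = [s for s in sessions if s["doc_id"] in test_docs]
    return train, test


def editors_seen_in_both(train, test):
    """Editors that can be scored: they have sessions in train and in test."""
    return {s["editor"] for s in train} & {s["editor"] for s in test}
```

Only editors returned by `editors_seen_in_both` can be evaluated, since an identifier cannot recognize an editor it never saw during training.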

Original abstract

The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch. To take full advantage of this combination, we need a fine-grained understanding of how human translators work, and which post-editing styles are more effective than others. In this paper, we release and analyze a new dataset with document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times. Our dataset comprises 66,268 full document sessions post-edited by 332 humans, the largest of the kind released to date. We show that action sequences are informative enough to identify post-editors accurately, compared to baselines that only look at the initial and final text. We build on this to learn and visualize continuous representations of post-editors, and we show that these representations improve the downstream task of predicting post-editing time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper releases a large dataset of post-editing action sequences and shows they identify editors better than text baselines while aiding time prediction, but the splits need checking to rule out document leakage.

The main takeaway is a new dataset of 66k full document post-editing sessions from 332 people, with keystroke, mouse, and timing logs. They use this to show action sequences identify the editor more accurately than baselines using only source and target text, then learn continuous embeddings of editors that improve prediction of editing time on held-out data. That combination of scale and downstream use is the concrete advance over prior translator modeling work. The dataset release itself is the part most likely to see reuse. The modeling applies standard embedding techniques to a new domain, which is fine but not a methodological leap.

The central empirical claims rest on the identification and time-prediction results. Those look plausible from the abstract, but the evaluation design is the soft spot. If train and test sessions share documents or if editors are assigned disjoint document sets, the sequence model can succeed by recognizing the translation task rather than stable person-specific patterns. The abstract does not state whether splits are document-disjoint or session-disjoint, so the reported gains could partly reflect content or interface artifacts. That needs explicit verification in the methods section.

The work is aimed at researchers building interactive MT systems or user models for post-editing. Anyone who needs real action logs at this scale will find the data useful even if they ignore the embeddings.

The paper is coherent on its own terms and engages the right prior literature on post-editing productivity. It deserves a serious referee because the dataset is a verifiable contribution and the claims are falsifiable once the splits are examined. I would send it to review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The paper releases a dataset of 66,268 document-level post-editing sessions by 332 translators, consisting of keystroke, mouse, and waiting-time action sequences. It claims these sequences enable more accurate post-editor identification than baselines using only initial and final text, and that continuous editor representations (Translator2Vec) learned from the sequences improve downstream prediction of post-editing time.

Significance. If the identification and representation results hold under document-disjoint evaluation, the work supplies the largest public resource for modeling individual post-editing behavior and demonstrates that action sequences carry stable, person-specific signals usable for both identification and time prediction. The dataset release itself is a clear contribution to translation process research.

major comments (2)

[Dataset and experimental protocol section] The description of the 66k sessions and 332 editors does not state whether editor-identification experiments employ document-disjoint or editor-session-disjoint splits. Without this control, sequence models can succeed by recognizing document identity or source-text properties rather than editor-specific patterns, directly undermining the central claim that action sequences are informative beyond text-only baselines.
[Time-prediction experiments (likely §5)] No ablation or error analysis is reported to show that the reported gains from Translator2Vec embeddings are not reducible to session-level statistics (e.g., average pause time or total keystrokes) already available to a non-embedding baseline. This is load-bearing for the claim that the learned representations improve the downstream task.
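
The session-level statistics named in the second comment can be computed directly from a logged session. A minimal sketch follows, assuming a hypothetical log format of `(kind, seconds)` tuples in which `kind` is `"key"`, `"mouse"`, or `"wait"`; the paper's actual action encoding is not described on this page and may differ.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    total_keystrokes: int
    mouse_actions: int
    mean_pause_seconds: float


def session_stats(actions):
    """Aggregate one session's action log into simple summary features.

    `actions` is a list of (kind, seconds) tuples in a hypothetical format:
    kind is "key", "mouse", or "wait"; seconds is read only for "wait" events.
    """
    keys = sum(1 for kind, _ in actions if kind == "key")
    mouse = sum(1 for kind, _ in actions if kind == "mouse")
    waits = [seconds for kind, seconds in actions if kind == "wait"]
    # Guard against sessions with no recorded pauses.
    mean_pause = sum(waits) / len(waits) if waits else 0.0
    return SessionStats(keys, mouse, mean_pause)
```

A time-prediction baseline given only these aggregates, with no learned embedding, is the comparison the comment asks for.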

minor comments (2)

[Abstract] The abstract states clear empirical results but the methods, exact baseline definitions, and evaluation metrics are not summarized; a one-sentence methods overview would improve readability.
[Notation for action sequences] Notation for action sequences (keystrokes, mouse, waits) should be defined once with a table of symbols before the first results table.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for identifying two points that require clarification to strengthen the experimental claims. We respond to each major comment below.

Referee: [Dataset and experimental protocol section] The description of the 66k sessions and 332 editors does not state whether editor-identification experiments employ document-disjoint or editor-session-disjoint splits. Without this control, sequence models can succeed by recognizing document identity or source-text properties rather than editor-specific patterns, directly undermining the central claim that action sequences are informative beyond text-only baselines.

Authors: We agree that the type of split is critical for validating that the models capture editor-specific signals rather than document or source-text properties. The revised manuscript will explicitly state that the editor-identification experiments employ document-disjoint splits (no document overlap between training and test for any editor). This detail was omitted from the original description but is consistent with the experimental intent; we will add the necessary protocol description in the Dataset and experimental protocol section. Revision: yes.

Referee: [Time-prediction experiments (likely §5)] No ablation or error analysis is reported to show that the reported gains from Translator2Vec embeddings are not reducible to session-level statistics (e.g., average pause time or total keystrokes) already available to a non-embedding baseline. This is load-bearing for the claim that the learned representations improve the downstream task.

Authors: We concur that demonstrating the added value of the learned embeddings over simple session-level aggregates is important. The revised version will include an ablation study in the time-prediction section that compares Translator2Vec-augmented models against baselines augmented with explicit session statistics (average pause time, total keystrokes, number of mouse actions, etc.). This will clarify whether the continuous representations provide gains beyond those aggregates. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; results are empirically derived from held-out data.

The paper trains sequence models on post-editing action data to perform editor identification (vs. text baselines) and to learn continuous representations, then evaluates those representations on a separate downstream task of post-editing time prediction. No equations or steps reduce the reported gains to fitted parameters by construction, no self-citation chains bear the central claims, and the evaluation uses held-out sessions, making the results falsifiable against external benchmarks rather than self-definitional.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 1 invented entity

Standard embedding learning assumptions plus the new dataset; no invented physical entities.

free parameters (1)

embedding dimension
Hyperparameter chosen for the representation model, typical in embedding work.

invented entities (1)

Translator2Vec (no independent evidence)
purpose: Name for the learned continuous post-editor representations
Descriptive label for the embedding model output, not a new postulated object with independent evidence.

pith-pipeline@v0.9.0 · 5690 in / 992 out tokens · 21421 ms · 2026-05-24T16:55:57.963252+00:00 · methodology
