Enhancing Scientific Discourse: Machine Translation for the Scientific Domain

Dimitris Roussis; Sokratis Sofianopoulos; Stelios Piperidis

arxiv: 2605.20912 · v1 · pith:2W5VBRUSnew · submitted 2026-05-20 · 💻 cs.CL

Pith reviewed 2026-05-21 04:46 UTC · model grok-4.3

classification 💻 cs.CL

keywords scientific machine translation · parallel corpora · neural machine translation · fine-tuning · domain adaptation · Spanish-English translation · French-English translation · Portuguese-English translation

The pith

Specialized scientific corpora can be used to fine-tune neural machine translation systems for improved performance in technical domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops collections of parallel and monolingual corpora for scientific texts in Spanish-English, French-English, and Portuguese-English language pairs. These include a large general scientific corpus and smaller ones focused on cancer research, energy research, neuroscience, and transportation for each pair. The authors fine-tune general-purpose neural machine translation systems using these corpora and evaluate the results to show their quality. A sympathetic reader would care because better machine translation could make scientific publications more accessible across language barriers, aiding international research collaboration.

Core claim

The authors present the development of parallel and monolingual corpora for the scientific domain targeting Spanish-English, French-English, and Portuguese-English. For each language pair, a large general scientific corpus is created along with four smaller corpora in Cancer Research, Energy Research, Neuroscience, and Transportation research. These corpora are utilized for fine-tuning general-purpose neural machine translation systems, with details on the creation process, fine-tuning strategies, and evaluation results demonstrating their quality.

What carries the argument

The creation of domain-specific parallel corpora for fine-tuning neural machine translation models.

If this is right

Fine-tuned NMT systems exhibit improved translation quality for scientific content.
Domain-specific corpora lead to better performance than general-purpose models in technical fields.
The approach supports access to international scientific publications across languages.
Monolingual corpora can complement the parallel data for additional training benefits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Extending these corpora to more language pairs could further globalize scientific communication.
Researchers might test whether these improvements carry over to real-world uses such as reading foreign-language papers.
Similar corpus-building methods could apply to other specialized domains such as law or engineering.

Load-bearing premise

The created parallel corpora are of sufficient quality, size, and domain representativeness to produce measurable improvements in NMT performance when used for fine-tuning.

What would settle it

The claim would be falsified by an experiment in which fine-tuning on these corpora yields no improvement over the baseline model in translation metrics such as BLEU on scientific test sets.
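
A minimal sketch of that comparison, assuming whitespace tokenization, one reference per segment, and an unsmoothed corpus-level BLEU written from scratch. The function names `corpus_bleu` and `fine_tuning_delta` and the example sentences are hypothetical and are not taken from the paper; scores meant for publication should come from a standard tool such as sacreBLEU rather than from this simplified version.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions: one reference per hypothesis, whitespace tokens, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def fine_tuning_delta(baseline_out, tuned_out, references):
    """Return (baseline score, fine-tuned score, difference) on one test set."""
    base = corpus_bleu(baseline_out, references)
    tuned = corpus_bleu(tuned_out, references)
    return base, tuned, tuned - base


# Hypothetical outputs, for illustration only.
refs = ["the tumor cells were treated with the drug"]
baseline = ["the cells were treated with drug"]
tuned = ["the tumor cells were treated with the drug"]
base_score, tuned_score, delta = fine_tuning_delta(baseline, tuned, refs)
print(f"baseline={base_score:.1f} tuned={tuned_score:.1f} delta={delta:+.1f}")
```

A delta that stays at or below zero across the held-out scientific test sets would be the falsifying outcome described above.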

read the original abstract

The increasing volume of scientific research necessitates effective communication across language barriers. Machine translation (MT) offers a promising solution for accessing international publications. However, the scientific domain presents unique challenges due to its specialized vocabulary and complex sentence structures. In this paper, we present the development of a collection of parallel and monolingual corpora for the scientific domain. The corpora target the language pairs Spanish-English, French-English, and Portuguese-English. For each language pair, we create a large general scientific corpus as well as four smaller corpora focused on the domains of: Cancer Research, Energy Research, Neuroscience, and Transportation research. To evaluate the quality of these corpora, we utilize them for fine-tuning general-purpose neural machine translation (NMT) systems. We provide details regarding the corpus creation process, the fine-tuning strategies employed, and we conclude with the evaluation results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

New scientific MT corpora for three language pairs with a subdomain focus, but the evaluation lacks baselines and numbers, so the gains are unproven.

Hey, the main thing to know is that this paper builds parallel and monolingual corpora for scientific text in Spanish-English, French-English, and Portuguese-English, with extra collections focused on cancer research, energy, neuroscience, and transportation. They then fine-tune general NMT systems on these resources and point to evaluation results as evidence of quality. That is the concrete output: new data targeted at those pairs and subfields rather than a new modeling trick.

Referee Report

2 major / 1 minor

Summary. The manuscript describes the development of parallel and monolingual corpora for the scientific domain targeting Spanish-English, French-English, and Portuguese-English. For each language pair, it creates a large general scientific corpus along with four smaller subdomain-specific corpora (Cancer Research, Energy Research, Neuroscience, and Transportation research). These resources are used to fine-tune general-purpose neural machine translation (NMT) systems, with the paper providing details on the corpus creation process, fine-tuning strategies, and concluding with evaluation results to assess corpus quality.

Significance. If the evaluations demonstrate clear improvements, the work would provide valuable domain-specific resources that address gaps in scientific MT, where specialized vocabulary and structures challenge general models. The multi-language and multi-subdomain design is a positive aspect for broader applicability in facilitating cross-lingual scientific access. The effort in new data creation for an important application area merits credit, though the absence of quantitative details limits assessment of its practical impact.

major comments (2)

[Evaluation section] Evaluation section: The manuscript references evaluation results demonstrating corpus quality but reports neither baseline scores for the untuned general-purpose NMT systems, absolute metrics (e.g., BLEU or other scores on held-out scientific test data), nor performance deltas attributable to fine-tuning with the new corpora. Without these, it is impossible to verify measurable improvements over base models or existing resources, undermining the central claim that the corpora are usable and of demonstrated quality.
[Corpus creation] Corpus creation and description: No information is given on corpus sizes, data sources, alignment techniques, or quality controls such as alignment error rates or filtering steps. These details are load-bearing for claims of domain representativeness, size sufficiency, and overall utility for fine-tuning.
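
The filtering steps the referee asks about are not described in the material above. As a rough illustration of what such quality controls could look like, the sketch below applies three common heuristics to sentence pairs: token-length bounds, a source/target length-ratio cap, and removal of untranslated copies and duplicates. The function name `filter_parallel_pairs`, the thresholds, and the example pairs are all assumptions made for this sketch, not the authors' pipeline.

```python
def filter_parallel_pairs(pairs, min_tokens=3, max_tokens=200, max_length_ratio=2.0):
    """Keep (source, target) pairs that pass simple heuristic checks.

    All thresholds are illustrative defaults, not values from the paper.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace so near-identical pairs deduplicate cleanly.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        src_len, tgt_len = len(src.split()), len(tgt.split())
        # Drop segments that are too short or too long on either side.
        if not (min_tokens <= src_len <= max_tokens):
            continue
        if not (min_tokens <= tgt_len <= max_tokens):
            continue
        # Drop pairs whose lengths differ too much (likely misalignment).
        if max(src_len, tgt_len) / max(1, min(src_len, tgt_len)) > max_length_ratio:
            continue
        # Drop untranslated copies where source and target are the same text.
        if src.lower() == tgt.lower():
            continue
        # Drop exact duplicates, ignoring case.
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append((src, tgt))
    return kept


# Hypothetical Spanish-English pairs, for illustration only.
pairs = [
    ("El tumor fue tratado con quimioterapia.", "The tumor was treated with chemotherapy."),
    ("El  tumor fue tratado con quimioterapia.", "The tumor was treated with chemotherapy."),
    ("Resumen", "Abstract"),
    ("Los resultados se muestran abajo.",
     "Results are shown below and discussed at length in the following sections of this paper."),
    ("Figure 1 shows results", "Figure 1 shows results"),
]
kept = filter_parallel_pairs(pairs)
print(len(kept), "of", len(pairs), "pairs kept")
```

Reporting how many pairs each rule removes, alongside a manually checked sample of the survivors, is one way to provide the alignment error estimates the comment calls for.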

minor comments (1)

[Abstract] The abstract could more explicitly summarize the evaluation outcomes or key metrics to better convey the paper's results at a glance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate the requested details, thereby strengthening the presentation of our contributions.

Referee: [Evaluation section] Evaluation section: The manuscript references evaluation results demonstrating corpus quality but reports neither baseline scores for the untuned general-purpose NMT systems, absolute metrics (e.g., BLEU or other scores on held-out scientific test data), nor performance deltas attributable to fine-tuning with the new corpora. Without these, it is impossible to verify measurable improvements over base models or existing resources, undermining the central claim that the corpora are usable and of demonstrated quality.

Authors: We agree that the evaluation section requires additional quantitative details to substantiate our claims. In the revised manuscript, we will report baseline scores from the untuned general-purpose NMT systems, absolute metrics such as BLEU scores on held-out scientific test sets for each language pair and subdomain, and the performance improvements (deltas) achieved through fine-tuning with the new corpora. These additions will enable direct verification of the corpora’s utility and quality.

revision: yes

Referee: [Corpus creation] Corpus creation and description: No information is given on corpus sizes, data sources, alignment techniques, or quality controls such as alignment error rates or filtering steps. These details are load-bearing for claims of domain representativeness, size sufficiency, and overall utility for fine-tuning.

Authors: The referee is correct that the current manuscript provides only a high-level overview of corpus creation. We will expand this section in the revision to include specific corpus sizes (number of sentence pairs for the general scientific corpora and each of the four subdomains), data sources (e.g., scientific publications from repositories such as PubMed, arXiv, and domain-specific journals), alignment techniques (including the tools and methods used for parallel sentence alignment), and quality controls such as filtering steps, estimated alignment error rates, and any post-alignment validation procedures. This will provide the necessary transparency to support claims of representativeness and utility.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical corpus creation and evaluation are self-contained

The paper's core contribution is the construction of new parallel and monolingual scientific corpora for specific language pairs and sub-domains, followed by their application to fine-tune general NMT models and the reporting of resulting evaluation metrics. No equations, fitted parameters, or predictions are defined in terms of themselves. No self-citations are invoked to justify uniqueness or load-bearing premises. The evaluation is presented as direct empirical measurement of corpus utility rather than a derived result that reduces to the input data by construction. This is a standard data-creation-plus-experiment workflow with no reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard assumptions from machine translation research rather than introducing new free parameters, axioms, or entities.

axioms (1)

domain assumption: Parallel corpora improve neural machine translation performance when used for fine-tuning.
Invoked implicitly when using the corpora to fine-tune general-purpose NMT systems and evaluate quality.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

We utilize them for fine-tuning general-purpose neural machine translation (NMT) systems... evaluation results demonstrating their quality.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

