arxiv: 2604.08977 · v1 · submitted 2026-04-10 · 💻 cs.CL

Recognition: 2 theorem links

· Lean Theorem

Testing the Assumptions of Active Learning for Translation Tasks with Few Samples

Lorenzo Jaime Yu Flores , Cesare Spinoso di-Piano , Ori Ernst , David Ifeoluwa Adelani , Jackie Chi Kit Cheung

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:19 UTC · model grok-4.3

classification 💻 cs.CL

keywords active learningmachine translationfew-shot learningdata selectionlow-resource NLPpre-trained models

0 comments

The pith

Active learning that selects informative or diverse samples does not improve translation performance with few annotations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates why active learning fails to outperform random sampling in machine translation when only 100 to 500 samples can be labeled. It directly tests the key assumptions that selecting more informative or more diverse data will lead to better test performance. The experiments find no correlation between standard informativeness and diversity measures and actual model accuracy. Instead, the sequence in which samples are presented and their compatibility with the model's initial pre-training exert stronger effects. This implies that current active learning approaches need revision to succeed in very low-data translation scenarios.

Core claim

Neither the informativeness nor diversity of the training data, which AL strategies optimize for, are correlated with test set performance. Instead, factors like the ordering of the training samples and interactions with pre-training data have a larger impact on performance.

What carries the argument

Correlation analysis between active learning selection criteria (informativeness and diversity) and downstream model performance on translation tasks.

If this is right

Active learning methods must incorporate considerations of sample ordering to be effective with few samples.
Interactions between selected data and pre-trained model weights play a key role in final performance.
Random sampling performs comparably to active learning strategies in this low-data regime.
New active learning designs should target ordering and pre-training compatibility rather than only informativeness or diversity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This finding may extend to other sequence generation tasks beyond translation where pre-trained models are used.
Researchers could test whether reordering the same selected samples changes outcomes in controlled experiments.
Future work might develop selection methods that explicitly model compatibility with pre-training data.
Similar assumptions in active learning for other low-resource NLP tasks could be tested using the same correlation approach.

Load-bearing premise

That the chosen metrics for informativeness and diversity, together with the specific translation datasets and model sizes tested, are sufficient to detect the correlations that would exist if the active learning assumptions held.

What would settle it

A new experiment using alternative metrics for informativeness or diversity that finds a clear positive correlation with translation test-set performance would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.08977 by Cesare Spinoso di-Piano, David Ifeoluwa Adelani, Jackie Chi Kit Cheung, Lorenzo Jaime Yu Flores, Ori Ernst.

**Figure 2.** Figure 2: Pre-SFT model performance on the samples chosen by AL strategies for Gemma-2 on Eng-Afr (left) and [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Test set ChrF+ (FLORES Plus) when fine-tuning models on unlabeled data (NLLB) with varying degrees [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Plot of % Filipino vocabulary per test example [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Fine-tuning on different subsets of the data [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Performance of model pre-SFT on candidates chosen by various strategies across three models and three [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗

**Figure 7.** Figure 7: Test performance when fine-tuning models on unlabeled data with varying degrees of difficulty (measured [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗

read the original abstract

Active learning (AL) is a training paradigm for selecting unlabeled samples for annotation to improve model performance on a test set, which is useful when only a limited number of samples can be annotated. These algorithms often work by optimizing for the informativeness and diversity of the training data to be annotated. Recent work found that AL strategies fail to outperform random sampling on various language generation tasks when using 100-500 samples. To understand AL's poor performance when only using few samples, we investigate whether the core assumptions underlying AL strategies hold. We find that neither the informativeness nor diversity of the training data, which AL strategies optimize for, are correlated with test set performance. Instead, factors like the ordering of the training samples and interactions with pre-training data have a larger impact on performance. This suggests that future AL methods must take these factors into account in order to work with very few samples.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Active learning's informativeness and diversity assumptions fail to predict performance in low-sample translation, while ordering and pre-training effects matter more.

read the letter

The key point is that standard active learning heuristics break down for translation when you only have a few hundred samples. The authors test the usual assumptions directly and find no correlation between informativeness or diversity metrics and test performance. Instead, sample ordering and interactions with pre-training data show stronger links to results. This lines up with earlier reports that AL often loses to random sampling in this regime, but they make the test specific to translation data.

Referee Report

2 major / 2 minor

Summary. The paper investigates why active learning (AL) strategies fail to outperform random sampling for neural machine translation when only 100-500 samples are available. It empirically tests the core AL assumptions by computing informativeness and diversity metrics on selected data and measuring their correlation with test-set performance, finding no such correlations. Instead, it reports that sample ordering and interactions with pre-training data exert larger effects on downstream performance, concluding that future AL methods must incorporate these factors to succeed in the few-sample regime.

Significance. If the no-correlation result holds under rigorous controls, the work provides a direct empirical challenge to the informativeness/diversity assumptions that underpin most AL algorithms in low-data MT settings. This could explain recent observations of AL underperforming random baselines and would motivate a shift toward ordering-aware or pre-training-aware selection criteria, with potential impact on efficient annotation pipelines for generation tasks.

major comments (2)

[Results] The central no-correlation claim between informativeness/diversity and test performance is load-bearing for the paper's argument against standard AL assumptions, yet the abstract and described setup provide no correlation coefficients, p-values, or power analysis; with only 100-500 samples this risks underpowered tests that could mask weak but real relationships.
[Discussion] The alternative claim that ordering and pre-training interactions have larger impact requires quantitative support (e.g., effect-size comparisons or ablation tables) to be proportionate to the null finding on informativeness/diversity; without these, the relative importance remains qualitative.

minor comments (2)

[Methods] Clarify the exact definitions and implementations of the informativeness and diversity metrics used for translation (e.g., uncertainty estimation in seq2seq models) so readers can judge whether they faithfully test the AL assumptions.
[Abstract] Report the number of random seeds, exact datasets, and model scales in the abstract or early methods paragraph to address reproducibility concerns for the correlation analyses.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and have revised the paper to incorporate additional statistical details and quantitative comparisons as requested.

read point-by-point responses

Referee: [Results] The central no-correlation claim between informativeness/diversity and test performance is load-bearing for the paper's argument against standard AL assumptions, yet the abstract and described setup provide no correlation coefficients, p-values, or power analysis; with only 100-500 samples this risks underpowered tests that could mask weak but real relationships.

Authors: We agree that explicit statistical measures strengthen the central claim. In the revised manuscript we now report Pearson and Spearman correlation coefficients with associated p-values for all informativeness and diversity metrics versus test performance. All correlations remain small and non-significant (|r| < 0.12, p > 0.15). We also added a power analysis (using the observed variances and n = 100–500) showing that the experiments have >80 % power to detect correlations of |r| ≥ 0.25 at α = 0.05. These additions confirm that the lack of correlation is not an artifact of underpowering. revision: yes
Referee: [Discussion] The alternative claim that ordering and pre-training interactions have larger impact requires quantitative support (e.g., effect-size comparisons or ablation tables) to be proportionate to the null finding on informativeness/diversity; without these, the relative importance remains qualitative.

Authors: We accept that the original discussion presented the relative importance of ordering and pre-training somewhat qualitatively. The revised version includes a new table (Table 4) that directly compares effect sizes (Cohen’s d and performance deltas in BLEU) across factors. Ordering and pre-training interactions produce deltas of 3–6 BLEU points, while informativeness/diversity variations produce <1 BLEU point under the same experimental controls. We also report the proportion of variance explained by each factor in a supplementary regression analysis. These quantitative results now allow a direct, proportionate comparison to the null findings on informativeness and diversity. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper performs an empirical investigation by selecting training samples via active learning strategies, computing informativeness and diversity metrics on those samples, and directly measuring correlations against observed test-set performance on translation tasks with 100-500 samples. No mathematical derivation, parameter fitting, or self-referential definition is present; the central claims rest on reported experimental correlations and comparisons to random sampling, which are externally falsifiable against the datasets and models used. The analysis is self-contained and does not reduce any result to the authors' own prior definitions or fitted quantities by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical study that relies on standard machine learning assumptions about train-test splits and correlation as a proxy for causation; no new free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5463 in / 1006 out tokens · 49364 ms · 2026-05-10T17:19:24.349509+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

neither the informativeness nor diversity of the training data, which AL strategies optimize for, are correlated with test set performance
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

ordering of the training samples and interactions with pre-training data have a larger impact

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 30 canonical work pages · 4 internal anchors

[1]

Satanjeev Banerjee and Alon Lavie. 2005. https://www.aclweb.org/anthology/W05-0909 METEOR : An automatic metric for MT evaluation with improved correlation with human judgments . In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , pages 65--72, Ann Arbor, Michigan. Association fo...

2005
[2]

Everlyn Asiko Chimoto and Bruce A. Bassett. 2022. https://doi.org/10.18653/v1/2022.findings-emnlp.348 COMET - QE and active learning for low-resource machine translation . In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4735--4740, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

work page doi:10.18653/v1/2022.findings-emnlp.348 2022
[3]

Scaling Instruction-Finetuned Language Models

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean,...

work page internal anchor Pith review doi:10.48550/arxiv.2210.11416 2022
[4]

D. A. Cohn, Z. Ghahramani, and M. I. Jordan. 1996. http://arxiv.org/abs/cs/9603104 Active learning with statistical models

work page arXiv 1996
[5]

Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, and Noam Slonim. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.638 A ctive L earning for BERT : A n E mpirical S tudy . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages ...

work page doi:10.18653/v1/2020.emnlp-main.638 2020
[6]

Lorenzo Jaime Yu Flores, Ori Ernst, and Jackie CK Cheung. 2025. https://doi.org/10.18653/v1/2025.acl-short.15 Improving the calibration of confidence scores in text generation using the output distribution ' s characteristics . In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 172--1...

work page doi:10.18653/v1/2025.acl-short.15 2025
[7]

Yarin Gal and Zoubin Ghahramani. 2016. http://arxiv.org/abs/1506.02142 Dropout as a bayesian approximation: Representing model uncertainty in deep learning

work page Pith review arXiv 2016
[8]

Yarin Gal, Riashat Islam, and Zoubin Ghahramani. 2017. https://api.semanticscholar.org/CorpusID:6318455 Deep bayesian active learning with image data . ArXiv, abs/1703.02910

work page arXiv 2017
[9]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava S...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[10]

Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. 2011. http://arxiv.org/abs/1112.5745 Bayesian active learning for classification and preference learning

work page arXiv 2011
[11]

Andreas Kirsch, Joost van Amersfoort, and Yarin Gal. 2019. http://arxiv.org/abs/1906.08158 Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning

work page arXiv 2019
[12]

Tyler LaBonte, Vidya Muthukumar, and Abhishek Kumar. 2022. https://openreview.net/forum?id=3OxII8ZB3A Dropout disagreement: A recipe for group robustness with fewer annotations . In NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications

2022
[13]

Elite Data Labs. 2025. A I D ata A nnotation C osts in 2025: P ricing, I nsights & V alue --- aidatalabelers.com. https://aidatalabelers.com/how-much-do-ai-data-annotation-services-cost-in-2025-the-complete-guide. [Accessed 17-05-2025]

2025
[14]

Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. http://arxiv.org/abs/1612.01474 Simple and scalable predictive uncertainty estimation using deep ensembles

work page Pith review arXiv 2017
[15]

Chuanming Liu and Jingqi Yu. 2023. https://doi.org/https://doi.org/10.1016/j.csl.2022.101444 Uncertainty-aware non-autoregressive neural machine translation . Computer Speech & Language, 78:101444

work page doi:10.1016/j.csl.2022.101444 2023
[16]

Andrey Malinin and Mark Gales. 2021. http://arxiv.org/abs/2002.07650 Uncertainty estimation in autoregressive structured prediction

work page arXiv 2021
[17]

Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, and Shafiq Joty. 2022. https://doi.org/10.18653/v1/2022.findings-emnlp.113 Data selection curriculum for neural machine translation . In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1569--1582, Abu Dhabi, United Arab Emirates. Association for C...

work page doi:10.18653/v1/2022.findings-emnlp.113 2022
[18]

Jianmo Ni, Gustavo Hernandez Abrego, Noah Constant, Ji Ma, Keith Hall, Daniel Cer, and Yinfei Yang. 2022. https://doi.org/10.18653/v1/2022.findings-acl.146 Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models . In Findings of the Association for Computational Linguistics: ACL 2022, pages 1864--1874, Dublin, Ireland. Association for...

work page doi:10.18653/v1/2022.findings-acl.146 2022
[19]

NLLB Team , Marta R. Costa-juss \`a , James Cross, Onur C elebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk...

work page doi:10.1038/s41586-024-07335-x 2024
[20]

Yotam Perlitz, Ariel Gera, Michal Shmueli-Scheuer, Dafna Sheinwald, Noam Slonim, and Liat Ein-Dor. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.611 Active learning for natural language generation . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9862--9877, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.emnlp-main.611 2023
[21]

arXiv preprint arXiv:1903.09848 , year=

Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabas Poczos, and Tom M. Mitchell. 2019. http://arxiv.org/abs/1903.09848 Competence-based curriculum learning for neural machine translation

work page arXiv 2019
[22]

Maja Popovi \'c . 2017. https://doi.org/10.18653/v1/W17-4770 chr F ++: words helping character n-grams . In Proceedings of the Second Conference on Machine Translation, pages 612--618, Copenhagen, Denmark. Association for Computational Linguistics

work page doi:10.18653/v1/w17-4770 2017
[23]

Ameya Prabhu, Charles Dognin, and Maneesh Singh. 2019. https://doi.org/10.18653/v1/D19-1417 Sampling bias in deep active classification: An empirical study . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4058--4068, H...

work page doi:10.18653/v1/d19-1417 2019
[24]

Bartezzaghi, Jasmina Bogojeska, Adelmo Cristiano Innocenza Malossi, and Thang Vu

Maximilian Schmidt, A. Bartezzaghi, Jasmina Bogojeska, Adelmo Cristiano Innocenza Malossi, and Thang Vu. 2022. https://api.semanticscholar.org/CorpusID:254044648 Combining data generation and active learning for low-resource question answering . In International Conference on Artificial Neural Networks

2022
[25]

Ozan Sener and Silvio Savarese. 2018. https://openreview.net/forum?id=H1aIuk-RW Active learning for convolutional neural networks: A core-set approach . In International Conference on Learning Representations

2018
[26]

Aditya Siddhant and Zachary C. Lipton. 2018. https://doi.org/10.18653/v1/D18-1318 Deep B ayesian active learning for natural language processing: Results of a large-scale empirical study . In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2904--2909, Brussels, Belgium. Association for Computational Linguistics

work page doi:10.18653/v1/d18-1318 2018
[27]

Smith, and Yejin Choi

Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, and Yejin Choi. 2020. http://arxiv.org/abs/2009.10795 Dataset cartography: Mapping and diagnosing datasets with training dynamics

work page arXiv 2020
[28]

Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, ...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[29]

NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe,...

work page internal anchor Pith review arXiv 2022
[30]

Wong, Yikai Zhou, Lidia S

Yu Wan, Baosong Yang, Derek F. Wong, Yikai Zhou, Lidia S. Chao, Haibo Zhang, and Boxing Chen. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.80 Self-paced learning for neural machine translation . In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1074--1080, Online. Association for Computational Li...

work page doi:10.18653/v1/2020.emnlp-main.80 2020
[31]

Polina Zablotskaia, Du Phan, Joshua Maynez, Shashi Narayan, Jie Ren, and Jeremiah Liu. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.197 On uncertainty calibration and selective generation in probabilistic neural summarization: A benchmark study . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2980--2992, Singapore...

work page doi:10.18653/v1/2023.findings-emnlp.197 2023
[32]

Xiangkai Zeng, Sarthak Garg, Rajen Chatterjee, Udhyakumar Nallasamy, and Matthias Paulik. 2019. https://doi.org/10.18653/v1/D19-6110 Empirical evaluation of active learning techniques for neural MT . In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 84--93, Hong Kong, China. Association for Computatio...

work page doi:10.18653/v1/d19-6110 2019
[33]

Ye Zhang, Matthew Lease, and Byron Wallace. 2017. https://doi.org/10.1609/aaai.v31i1.10962 Active discriminative text representation learning . Proceedings of the AAAI Conference on Artificial Intelligence, 31(1)

work page doi:10.1609/aaai.v31i1.10962 2017
[34]

Zhisong Zhang, Emma Strubell, and Eduard Hovy. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.414 A survey of active learning for natural language processing . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6166--6190, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

work page doi:10.18653/v1/2022.emnlp-main.414 2022
[35]

Yuekai Zhao, Haoran Zhang, Shuchang Zhou, and Zhihua Zhang. 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.162 Active learning approaches to enhancing neural machine translation . In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1796--1806, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2020.findings-emnlp.162 2020
[36]

URL: " 'urlintro :=

ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRINGS urlintro eprinturl eprintpr...
[37]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...