Can Reasoning Models Detect Changes to their Chains of Thought?

Chengyuan Xue; Miriam Wanner; Sathvik Napa; Utkarsh Singh; William Walden

arxiv: 2606.22085 · v1 · pith:ZIZW65ZR · submitted 2026-06-20 · cs.AI · cs.CL · cs.LG

Pith reviewed 2026-06-26 11:39 UTC · model grok-4.3

keywords: chain of thought · reasoning models · model intervention · detection · AI safety · prefilling

The pith

Reasoning models detect edits to their chains of thought only modestly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether current reasoning models can spot when their chain of thought (CoT) has been changed by an external intervention. This matters because many proposed uses of AI involve editing the model's reasoning steps to improve safety or performance, and such edits could fail if the model notices them and adjusts its output accordingly. The study finds that detection accuracy is low, that models usually cannot say what changed, and that performance is similar whether the original CoT was their own or another model's. These results suggest that CoT editing may often go undetected in practice.

Core claim

Recent reasoning models show only modest accuracy at detecting interventions on their chains of thought, whether those interventions occur during or after reasoning and whether the chains come from the model itself or from other models. Models also have difficulty specifying exactly how the chain was modified.

What carries the argument

The CoT detection task, in which models receive a reasoning trace that may have been prefilled or altered and are prompted to report whether and how it differs from what they would have generated.
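
A minimal sketch of how one trial of such a detection task could be assembled, assuming a CoT already split into steps and a single-step insertion as the edit. The distractor sentence, the 50% modification rate, the prompt wording, and the names `make_trial` and `build_prompt` are all hypothetical illustrations, not the paper's actual materials.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fixed distractor; the statement used in the paper is not given here.
IRRELEVANT_STEP = "The capital of France is Paris."


@dataclass
class Trial:
    steps: List[str]          # the CoT as it will be shown to the model
    modified: bool            # ground-truth label for the detection question
    position: Optional[int]   # index of the inserted step, or None if unmodified


def make_trial(cot_steps: List[str], rng: random.Random, p_modify: float = 0.5) -> Trial:
    """Return either the original CoT or a copy with one irrelevant step inserted."""
    if rng.random() >= p_modify:
        return Trial(list(cot_steps), False, None)
    pos = rng.randint(0, len(cot_steps))  # insertion index; both ends are allowed
    edited = cot_steps[:pos] + [IRRELEVANT_STEP] + cot_steps[pos:]
    return Trial(edited, True, pos)


def build_prompt(question: str, trial: Trial) -> str:
    """Format the (possibly edited) CoT and ask the yes/no detection question."""
    numbered = "\n".join(f"{i + 1}. {step}" for i, step in enumerate(trial.steps))
    return (
        f"Question: {question}\n\nReasoning:\n{numbered}\n\n"
        "Was this reasoning modified from what you would have written? "
        "Answer YES or NO."
    )
```

Keeping the ground-truth label and the insertion index on the trial object makes it straightforward to score both detection (was anything changed?) and localization (where?) from the same records.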

If this is right

CoT prefilling with stronger models' reasoning can likely proceed without the target model noticing the substitution.
Interventions to remove unsafe reasoning steps may succeed because the model does not flag the alteration.
Detection performance does not improve when the CoT belongs to the model being tested versus another model.
Modest detection holds both while the model is still reasoning and after the chain is complete.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers could use CoT editing as a reliable way to steer model behavior without triggering internal consistency checks.
Future experiments might vary the subtlety of the edits to find the threshold at which detection becomes reliable.
Similar tests on non-reasoning models could reveal whether chain-of-thought use itself affects self-monitoring ability.

Load-bearing premise

The prompts and edit methods used in the experiments measure genuine detection ability rather than being influenced by how the questions are worded or how the changes are introduced.

What would settle it

A follow-up experiment that achieves high accuracy in identifying both the presence and the specific nature of CoT modifications across multiple models and edit types would falsify the claim of only modest detection.

Figures

Figures reproduced from arXiv: 2606.22085 by Chengyuan Xue, Miriam Wanner, Sathvik Napa, Utkarsh Singh, William Walden.

**Figure 2.** Detection accuracy for the (alerted) completed (dark bars; §4.1) and partial (light bars; §4.2) conditions. Ione and Erep are shared across conditions; Dhalf and Epar are unique to completed and partial, respectively. Error bars indicate 95% CIs; ∗ denotes significantly different from 50% at α = 0.05 on a two-sided z-test. Surrounding text (§4, Detecting Changes): Here, we investigate whether models can detect modifications…

**Figure 4.** Model accuracy in identifying where a CoT modification was made on MMLU. Surrounding text (§5, Localizing Changes): We have shown that models have only moderate ability to detect CoT modifications. We now investigate whether they are able to identify where modifications occur. To facilitate clear localization of changes, we make modifications involving only a single step, either inserting (Ione), deleting (Done), or paraphrasi…

**Figure 5.** Unalerted results in the completed condition.

**Figure 6.** Complete results for the experiment described in §…

**Figure 7.** Full change localization accuracy results (§…)

**Figure 8.** Effect of replacing the fixed irrelevant statement used for Ione in the main text (left column) with a dynamically LLM-generated statement that is relevant to the problem but that is a non-sequitur (Ione - LLM; right column). Detection accuracy plummets across the board. These results are from the partial condition (§4.2).

**Figure 9.** Main task accuracy for all models on all datasets. We consistently obtain best results with…
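
The Figure 2 caption mentions 95% CIs and a two-sided z-test against 50%. The sketch below shows one common way to compute both for a single accuracy figure. It assumes a normal approximation and a Wald interval, which may differ from what the authors used; the function name and the example counts are illustrative only.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def accuracy_vs_chance(correct: int, n: int, p0: float = 0.5, z_crit: float = 1.959964):
    """One-sample z-test of an accuracy against chance, plus a Wald 95% CI.

    Returns (accuracy, (ci_low, ci_high), z, two_sided_p_value).
    """
    p_hat = correct / n
    # Under the null hypothesis the standard error uses p0, not p_hat.
    se_null = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se_null
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    # The interval is centred on the observed accuracy, so it uses p_hat.
    se_hat = math.sqrt(p_hat * (1.0 - p_hat) / n)
    ci = (max(0.0, p_hat - z_crit * se_hat), min(1.0, p_hat + z_crit * se_hat))
    return p_hat, ci, z, p_value


# Illustrative numbers only: 60 correct detections out of 100 trials.
acc, ci, z, p = accuracy_vs_chance(60, 100)
print(f"accuracy={acc:.2f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), z={z:.2f}, p={p:.4f}")
```

With small samples or accuracies near 0 or 1, a Wilson interval or an exact binomial test would be a more careful choice than the Wald approximation used here.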

Original abstract

There are many reasons one may want to edit a model's chain of thought (CoT) -- e.g., to prefill it with reasoning from a stronger model or to remove steps that may yield unsafe outputs. The success of these interventions plausibly depends on a model's inability to notice them, as the model may alter its behavior if it suspects tampering. In this work, we study whether recent reasoning models are able to detect such interventions on their CoTs under a variety of conditions: both during reasoning and after it, and when prefilled both with their own CoTs and with those of other models. Broadly, we find that (i) models exhibit only very modest detection accuracy; (ii) models struggle to identify *how* their CoT was modified; and (iii) models are about as good at detecting changes to their own CoTs as to those of other models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper finds that reasoning models detect CoT tampering only modestly, with no difference between their own and other models' traces, but methodological details are missing, so the numbers are hard to interpret.

The core result is that current reasoning models catch CoT edits only modestly well, cannot say what changed, and perform about the same on their own traces as on others'. This is a straightforward empirical question with direct relevance to safety edits that rewrite reasoning steps.

The work is new in its focus on recent reasoning models and the specific conditions tested (own vs other CoTs, during vs after reasoning). It frames the safety motivation cleanly and reports consistent patterns across those conditions.

The main limitation is that the abstract supplies no accuracy numbers, sample sizes, statistical tests, or control conditions with zero interventions. Without those, it is impossible to tell whether the modest detection rates reflect genuinely limited awareness or merely how the prompts were worded. The base-rate concern (how often a model reports a change when none was made) applies here because no such controls are described.

This is useful for alignment researchers who need to know whether CoT edits will stay hidden. It will not be strong enough for broad citation until the methods and raw results are visible.

A serious editor should send it to review so the experimental design and any controls can be checked properly.

Referee Report

2 major / 1 minor

Summary. The paper investigates whether recent reasoning models can detect edits to their chains of thought (CoTs), both during and after generation and when the CoT is their own or from another model. It reports three main empirical findings: models achieve only modest accuracy in detecting such interventions, struggle to identify the specific type of modification, and show comparable detection performance on their own CoTs versus those of other models.

Significance. If the empirical results are robust, the work provides evidence that CoT interventions (e.g., for safety or capability enhancement) are unlikely to be noticed by the model itself, which bears on the feasibility of such techniques without triggering behavioral changes. The study contributes behavioral data on model introspection and self-monitoring in the context of reasoning traces.

major comments (2)

[Methods] Methods (experimental design): The study does not report control conditions using unmodified CoTs to measure the base rate at which models report 'changes' in the absence of any intervention. Without these controls, the modest detection accuracy cannot be distinguished from generic prompt-induced affirmative responses, undermining interpretation of findings (i) and (iii).
[Results] Results (detection accuracy): The abstract and reported findings provide no sample sizes, statistical tests, confidence intervals, or error bars, making it impossible to assess whether the 'modest accuracy' and 'no difference between own/other' claims are statistically supported or merely descriptive.
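
The own-versus-other comparison raised in the second comment could be checked with a two-proportion z-test. The following is a sketch under assumed inputs (counts of correct detections per condition); the function name and the choice of a pooled standard error are illustrative and are not taken from the paper.

```python
import math


def two_proportion_z(correct_a: int, n_a: int, correct_b: int, n_b: int):
    """Two-sided z-test for a difference between two accuracies (pooled SE).

    Returns (z, p_value). Raises ValueError when the pooled accuracy is 0 or 1,
    because the standard error is then zero and the statistic is undefined.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("pooled accuracy is 0 or 1; z statistic is undefined")
    z = (p_a - p_b) / se
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Illustrative counts only: own-CoT condition vs. other-model-CoT condition.
z, p = two_proportion_z(70, 100, 50, 100)
print(f"z={z:.3f}, p={p:.4f}")
```

A non-significant result from this test would not by itself show that the two conditions are equivalent; supporting a "no difference" claim would need an equivalence test or a confidence interval on the difference.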

minor comments (1)

[Abstract] The abstract refers to 'a variety of conditions' but does not enumerate them; a brief enumeration in the abstract or introduction would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important aspects of experimental design and reporting. We address each major point below and commit to revisions that strengthen the empirical claims.

Referee: [Methods] The study does not report control conditions using unmodified CoTs to measure the base rate at which models report 'changes' in the absence of any intervention. Without these controls, the modest detection accuracy cannot be distinguished from generic prompt-induced affirmative responses, undermining interpretation of findings (i) and (iii).

Authors: We agree this is a valid concern. The current experiments focus on modified CoTs but do not explicitly include unmodified controls to quantify false-positive rates. In the revised manuscript we will add these control conditions (prompting models to detect changes on their original, unmodified CoTs) and report the resulting base rates alongside the main results. This will allow clearer interpretation of the reported detection accuracies.

Revision: yes

Referee: [Results] Results (detection accuracy): The abstract and reported findings provide no sample sizes, statistical tests, confidence intervals, or error bars, making it impossible to assess whether the 'modest accuracy' and 'no difference between own/other' claims are statistically supported or merely descriptive.

Authors: We acknowledge the omission. The full manuscript contains the underlying trial counts but does not present sample sizes, statistical tests, confidence intervals, or error bars in the abstract or main result summaries. We will revise to include these details (e.g., N per condition, appropriate tests for accuracy differences, and error bars on figures) so that the strength of the 'modest accuracy' and 'own vs. other' comparisons can be evaluated rigorously.

Revision: yes
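
The unmodified-CoT control discussed in the first exchange amounts to measuring a false-alarm rate alongside the hit rate. A minimal sketch of that bookkeeping follows, assuming each trial is recorded as a pair of booleans; the record format and the function name `detection_rates` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def detection_rates(records: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Summarise detection trials given (was_modified, model_said_modified) pairs.

    hit_rate:          P(model says "modified" | CoT was modified)
    false_alarm_rate:  P(model says "modified" | CoT was NOT modified), the base rate
    balanced_accuracy: mean of the hit rate and the correct-rejection rate
    """
    hits = misses = false_alarms = correct_rejections = 0
    for was_modified, said_modified in records:
        if was_modified and said_modified:
            hits += 1
        elif was_modified:
            misses += 1
        elif said_modified:
            false_alarms += 1
        else:
            correct_rejections += 1

    n_modified = hits + misses
    n_unmodified = false_alarms + correct_rejections
    if n_modified == 0 or n_unmodified == 0:
        raise ValueError("need at least one modified and one unmodified trial")

    hit_rate = hits / n_modified
    false_alarm_rate = false_alarms / n_unmodified
    return {
        "hit_rate": hit_rate,
        "false_alarm_rate": false_alarm_rate,
        "balanced_accuracy": 0.5 * (hit_rate + (1.0 - false_alarm_rate)),
    }
```

A model that answers "modified" on most trials regardless of the input would show a high hit rate but also a high false-alarm rate, which is the pattern the referee's control condition is meant to expose.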

Circularity Check

0 steps flagged

No circularity: purely empirical behavioral study with no derivations or self-referential structure

The paper is an empirical investigation of model behavior under CoT interventions. It reports experimental results on detection accuracy, identification of modification type, and own-vs-other CoT performance. No equations, fitted parameters, uniqueness theorems, ansatzes, or derivation chains appear in the provided text. All claims rest on direct measurement of prompted responses rather than any reduction of outputs to inputs by construction. This is a standard non-circular empirical design.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities are present; the work is an empirical measurement study.

discussion (0)

Reference graph

Works this paper leans on

300 extracted references · 170 canonical work pages

[1]

Proceedings of the 21st Workshop on Biomedical Language Processing. 2022

2022
[2]

Explainable Assessment of Healthcare Articles with QA

Boissonnet, Alodie and Saeidi, Marzieh and Plachouras, Vassilis and Vlachos, Andreas. Explainable Assessment of Healthcare Articles with QA. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.1

work page doi:10.18653/v1/2022.bionlp-1.1 2022
[3]

A sequence-to-sequence approach for document-level relation extraction

Giorgi, John and Bader, Gary and Wang, Bo. A sequence-to-sequence approach for document-level relation extraction. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.2

work page doi:10.18653/v1/2022.bionlp-1.2 2022
[4]

Position-based Prompting for Health Outcome Generation

Abaho, Micheal and Bollegala, Danushka and Williamson, Paula and Dodd, Susanna. Position-based Prompting for Health Outcome Generation. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.3

work page doi:10.18653/v1/2022.bionlp-1.3 2022
[5]

How You Say It Matters: Measuring the Impact of Verbal Disfluency Tags on Automated Dementia Detection

Farzana, Shahla and Deshpande, Ashwin and Parde, Natalie. How You Say It Matters: Measuring the Impact of Verbal Disfluency Tags on Automated Dementia Detection. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.4

work page doi:10.18653/v1/2022.bionlp-1.4 2022
[6]

Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training

Soleimani, Amir and Nikoulina, Vassilina and Favre, Benoit and Ait Mokhtar, Salah. Zero-Shot Aspect-Based Scientific Document Summarization using Self-Supervised Pre-training. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.5

work page doi:10.18653/v1/2022.bionlp-1.5 2022
[7]

Data Augmentation for Biomedical Factoid Question Answering

Pappas, Dimitris and Malakasiotis, Prodromos and Androutsopoulos, Ion. Data Augmentation for Biomedical Factoid Question Answering. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.6

work page doi:10.18653/v1/2022.bionlp-1.6 2022
[8]

Slot Filling for Biomedical Information Extraction

Papanikolaou, Yannis and Staib, Marlene and Grace, Justin Joshua and Bennett, Francine. Slot Filling for Biomedical Information Extraction. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.7

work page doi:10.18653/v1/2022.bionlp-1.7 2022
[9]

Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations

Zeng, Sihang and Yuan, Zheng and Yu, Sheng. Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.8

work page doi:10.18653/v1/2022.bionlp-1.8 2022
[10]

B io BART : Pretraining and Evaluation of A Biomedical Generative Language Model

Yuan, Hongyi and Yuan, Zheng and Gan, Ruyi and Zhang, Jiaxing and Xie, Yutao and Yu, Sheng. B io BART : Pretraining and Evaluation of A Biomedical Generative Language Model. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.9

work page doi:10.18653/v1/2022.bionlp-1.9 2022
[11]

Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation

Naseem, Usman and Bandi, Ajay and Raza, Shaina and Rashid, Junaid and Chakravarthi, Bharathi Raja. Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.10

work page doi:10.18653/v1/2022.bionlp-1.10 2022
[12]

Memory-aligned Knowledge Graph for Clinically Accurate Radiology Image Report Generation

Yan, Sixing. Memory-aligned Knowledge Graph for Clinically Accurate Radiology Image Report Generation. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.11

work page doi:10.18653/v1/2022.bionlp-1.11 2022
[13]

Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts

Phan, Uyen and Nguyen, Nhung. Simple Semantic-based Data Augmentation for Named Entity Recognition in Biomedical Texts. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.12

work page doi:10.18653/v1/2022.bionlp-1.12 2022
[14]

Auxiliary Learning for Named Entity Recognition with Multiple Auxiliary Biomedical Training Data

Watanabe, Taiki and Ichikawa, Tomoya and Tamura, Akihiro and Iwakura, Tomoya and Ma, Chunpeng and Kato, Tsuneo. Auxiliary Learning for Named Entity Recognition with Multiple Auxiliary Biomedical Training Data. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.13

work page doi:10.18653/v1/2022.bionlp-1.13 2022
[15]

SNP 2 V ec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Cahyawijaya, Samuel and Yu, Tiezheng and Liu, Zihan and Zhou, Xiaopu and Mak, Tze Wing Tiffany and Ip, Yuk Yu Nancy and Fung, Pascale. SNP 2 V ec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.14

work page doi:10.18653/v1/2022.bionlp-1.14 2022
[16]

Biomedical NER using Novel Schema and Distant Supervision

Khandelwal, Anshita and Kar, Alok and Chikka, Veera Raghavendra and Karlapalem, Kamalakar. Biomedical NER using Novel Schema and Distant Supervision. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.15

work page doi:10.18653/v1/2022.bionlp-1.15 2022
[17]

Improving Supervised Drug-Protein Relation Extraction with Distantly Supervised Models

Iinuma, Naoki and Miwa, Makoto and Sasaki, Yutaka. Improving Supervised Drug-Protein Relation Extraction with Distantly Supervised Models. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.16

work page doi:10.18653/v1/2022.bionlp-1.16 2022
[18]

Named Entity Recognition for Cancer Immunology Research Using Distant Supervision

Trieu, Hai-Long and Miwa, Makoto and Ananiadou, Sophia. Named Entity Recognition for Cancer Immunology Research Using Distant Supervision. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.17

work page doi:10.18653/v1/2022.bionlp-1.17 2022
[19]

Intra-Template Entity Compatibility based Slot-Filling for Clinical Trial Information Extraction

Witte, Christian and Cimiano, Philipp. Intra-Template Entity Compatibility based Slot-Filling for Clinical Trial Information Extraction. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.18

work page doi:10.18653/v1/2022.bionlp-1.18 2022
[20]

Pretrained Biomedical Language Models for Clinical NLP in S panish

Carrino, Casimiro Pio and Llop, Joan and P \`a mies, Marc and Guti \'e rrez-Fandi \ n o, Asier and Armengol-Estap \'e , Jordi and Silveira-Ocampo, Joaqu \'i n and Valencia, Alfonso and Gonzalez-Agirre, Aitor and Villegas, Marta. Pretrained Biomedical Language Models for Clinical NLP in S panish. Proceedings of the 21st Workshop on Biomedical Language Proc...

work page doi:10.18653/v1/2022.bionlp-1.19 2022
[21]

Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts

Amin, Saadullah and Pokaratsiri Goldstein, Noon and Wixted, Morgan and Garcia-Rudolph, Alejandro and Mart \'i nez-Costa, Catalina and Neumann, Guenter. Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.20

work page doi:10.18653/v1/2022.bionlp-1.20 2022
[22]

VPAI \_ L ab at M ed V id QA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification

Li, Bin and Weng, Yixuan and Xia, Fei and Sun, Bin and Li, Shutao. VPAI \_ L ab at M ed V id QA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.21

work page doi:10.18653/v1/2022.bionlp-1.21 2022
[23]

G en C ompare S um: a hybrid unsupervised summarization method using salience

Bishop, Jennifer and Xie, Qianqian and Ananiadou, Sophia. G en C ompare S um: a hybrid unsupervised summarization method using salience. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.22

work page doi:10.18653/v1/2022.bionlp-1.22 2022
[24]

B io C ite: A Deep Learning-based Citation Linkage Framework for Biomedical Research Articles

Singha Roy, Sudipta and Mercer, Robert E. B io C ite: A Deep Learning-based Citation Linkage Framework for Biomedical Research Articles. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.23

work page doi:10.18653/v1/2022.bionlp-1.23 2022
[25]

Low Resource Causal Event Detection from Biomedical Literature

Liang, Zhengzhong and Noriega-Atala, Enrique and Morrison, Clayton and Surdeanu, Mihai. Low Resource Causal Event Detection from Biomedical Literature. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.24

work page doi:10.18653/v1/2022.bionlp-1.24 2022
[26]

Overview of the M ed V id QA 2022 Shared Task on Medical Video Question-Answering

Gupta, Deepak and Demner-Fushman, Dina. Overview of the M ed V id QA 2022 Shared Task on Medical Video Question-Answering. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.25

work page doi:10.18653/v1/2022.bionlp-1.25 2022
[27]

Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations

Richie, Russell and Grover, Sachin and Tsui, Fuchiang (Rich). Inter-annotator agreement is not the ceiling of machine learning performance: Evidence from a comprehensive set of simulations. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.26

work page doi:10.18653/v1/2022.bionlp-1.26 2022
[28]

and Zuo, Xu and Hu, Yan and Kuttichi Keloth, Vipina and Li, Jianfu and Zheng, W

Das, Avisha and Selek, Salih and Warner, Alia R. and Zuo, Xu and Hu, Yan and Kuttichi Keloth, Vipina and Li, Jianfu and Zheng, W. Jim and Xu, Hua. Conversational Bots for Psychotherapy: A Study of Generative Transformer Models Using Domain-specific Dialogues. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bi...

work page doi:10.18653/v1/2022.bionlp-1.27 2022
[29]

BEEDS : Large-Scale Biomedical Event Extraction using Distant Supervision and Question Answering

Wang, Xing David and Leser, Ulf and Weber, Leon. BEEDS : Large-Scale Biomedical Event Extraction using Distant Supervision and Question Answering. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.28

work page doi:10.18653/v1/2022.bionlp-1.28 2022
[30]

Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection

Kim, Bosung and Nakashole, Ndapa. Data Augmentation for Rare Symptoms in Vaccine Side-Effect Detection. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.29

work page doi:10.18653/v1/2022.bionlp-1.29 2022
[31]

Improving R omanian B io NER Using a Biologically Inspired System

Mitrofan, Maria and Pais, Vasile. Improving R omanian B io NER Using a Biologically Inspired System. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.30

work page doi:10.18653/v1/2022.bionlp-1.30 2022
[32]

B angla B io M ed: A Biomedical Named-Entity Annotated Corpus for B angla ( B engali)

Sazzed, Salim. B angla B io M ed: A Biomedical Named-Entity Annotated Corpus for B angla ( B engali). Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.31

work page doi:10.18653/v1/2022.bionlp-1.31 2022
[33]

ICDB ig B ird: A Contextual Embedding Model for ICD Code Classification

Michalopoulos, George and Malyska, Michal and Sahar, Nicola and Wong, Alexander and Chen, Helen. ICDB ig B ird: A Contextual Embedding Model for ICD Code Classification. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.32

work page doi:10.18653/v1/2022.bionlp-1.32 2022
[34]

Doctor XA v I er: Explainable Diagnosis on Physician-Patient Dialogues and XAI Evaluation

Ngai, Hillary and Rudzicz, Frank. Doctor XA v I er: Explainable Diagnosis on Physician-Patient Dialogues and XAI Evaluation. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.33

work page doi:10.18653/v1/2022.bionlp-1.33 2022
[35]

DISTANT - CTO : A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature

Dhrangadhariya, Anjani and M. DISTANT - CTO : A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.34

work page doi:10.18653/v1/2022.bionlp-1.34 2022
[36]

and Peng, Yifan

Tang, Liyan and Kooragayalu, Shravan and Wang, Yanshan and Ding, Ying and Durrett, Greg and Rousseau, Justin F. and Peng, Yifan. E cho G en: Generating Conclusions from Echocardiogram Notes. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.35

work page doi:10.18653/v1/2022.bionlp-1.35 2022
[37]

Quantifying Clinical Outcome Measures in Patients with Epilepsy Using the Electronic Health Record

Xie, Kevin and Litt, Brian and Roth, Dan and Ellis, Colin A. Quantifying Clinical Outcome Measures in Patients with Epilepsy Using the Electronic Health Record. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.36

work page doi:10.18653/v1/2022.bionlp-1.36 2022
[38]

Comparing Encoder-Only and Encoder-Decoder Transformers for Relation Extraction from Biomedical Texts: An Empirical Study on Ten Benchmark Datasets

Sarrouti, Mourad and Tao, Carson and Mamy Randriamihaja, Yoann. Comparing Encoder-Only and Encoder-Decoder Transformers for Relation Extraction from Biomedical Texts: An Empirical Study on Ten Benchmark Datasets. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.37

work page doi:10.18653/v1/2022.bionlp-1.37 2022
[39]

Utility Preservation of Clinical Text After De-Identification

Vakili, Thomas and Dalianis, Hercules. Utility Preservation of Clinical Text After De-Identification. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.38

work page doi:10.18653/v1/2022.bionlp-1.38 2022
[40]

Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD -9 Coding

Falis, Mat \'u s and Dong, Hang and Birch, Alexandra and Alex, Beatrice. Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD -9 Coding. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.39

work page doi:10.18653/v1/2022.bionlp-1.39 2022
[41]

Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models

Chandak, Sidhant and Zhang, Liqing and Brown, Connor and Huang, Lifu. Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.40

work page doi:10.18653/v1/2022.bionlp-1.40 2022
[42]

Model Distillation for Faithful Explanations of Medical Code Predictions

Wood-Doughty, Zach and Cachola, Isabel and Dredze, Mark. Model Distillation for Faithful Explanations of Medical Code Predictions. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.41

work page doi:10.18653/v1/2022.bionlp-1.41 2022
[43]

and Szolovits, Peter

Liang, Jennifer J and Lehman, Eric and Iyengar, Ananya and Mahajan, Diwakar and Raghavan, Preethi and Chang, Cindy Y. and Szolovits, Peter. Towards Generalizable Methods for Automating Risk Score Calculation. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.42

work page doi:10.18653/v1/2022.bionlp-1.42 2022
[44]

D o SSIER at M ed V id QA 2022: Text-based Approaches to Medical Video Answer Localization Problem

Kusa, Wojciech and Peikos, Georgios and Espitia, \'O scar and Hanbury, Allan and Pasi, Gabriella. D o SSIER at M ed V id QA 2022: Text-based Approaches to Medical Video Answer Localization Problem. Proceedings of the 21st Workshop on Biomedical Language Processing. 2022. doi:10.18653/v1/2022.bionlp-1.43

work page doi:10.18653/v1/2022.bionlp-1.43 2022
[45]

Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022

2022
[46]

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Jin, Xisen and Zhang, Dejiao and Zhu, Henghui and Xiao, Wei and Li, Shang-Wen and Wei, Xiaokai and Arnold, Andrew and Ren, Xiang. Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigsc...

work page doi:10.18653/v1/2022.bigscience-1.1 2022
[47]

Using ASR -Generated Text for Spoken Language Modeling

Herv \'e , Nicolas and Pelloin, Valentin and Favre, Benoit and Dary, Franck and Laurent, Antoine and Meignier, Sylvain and Besacier, Laurent. Using ASR -Generated Text for Spoken Language Modeling. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.2

work page doi:10.18653/v1/2022.bigscience-1.2 2022
[48]

You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings

Talat, Zeerak and N \'e v \'e ol, Aur \'e lie and Biderman, Stella and Clinciu, Miruna and Dey, Manan and Longpre, Shayne and Luccioni, Sasha and Masoud, Maraim and Mitchell, Margaret and Radev, Dragomir and Sharma, Shanya and Subramonian, Arjun and Tae, Jaesung and Tan, Samson and Tunuguntla, Deepak and Van Der Wal, Oskar. You reap what you sow: On the C...

work page doi:10.18653/v1/2022.bigscience-1.3 2022
[49]

Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model

Kobayashi, Sosuke and Kiyono, Shun and Suzuki, Jun and Inui, Kentaro. Diverse Lottery Tickets Boost Ensemble from a Single Pretrained Model. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.4

work page doi:10.18653/v1/2022.bigscience-1.4 2022
[50]

UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

Chan, Aaron and Sanjabi, Maziar and Mathias, Lambert and Tan, Liang and Nie, Shaoliang and Peng, Xiaochang and Ren, Xiang and Firooz, Hamed. UNIREX: A Unified Learning Framework for Language Model Rationale Extraction. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/...

work page doi:10.18653/v1/2022.bigscience-1.5 2022
[51]

Pipelines for Social Bias Testing of Large Language Models

Nozza, Debora and Bianchi, Federico and Hovy, Dirk. Pipelines for Social Bias Testing of Large Language Models. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.6

work page doi:10.18653/v1/2022.bigscience-1.6 2022
[52]

Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

De Toni, Francesco and Akiki, Christopher and De La Rosa, Javier and Fourrier, Clémentine and Manjavacas, Enrique and Schweter, Stefan and Van Strien, Daniel. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. do...

work page doi:10.18653/v1/2022.bigscience-1.7 2022
[53]

A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model

Lakim, Imad and Almazrouei, Ebtesam and Abualhaol, Ibrahim and Debbah, Merouane and Launay, Julien. A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.8

work page doi:10.18653/v1/2022.bigscience-1.8 2022
[54]

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

Black, Sidney and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, Usvsn Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel. GPT-NeoX-20B: An Open-Source ...

work page doi:10.18653/v1/2022.bigscience-1.9 2022
[55]

Dataset Debt in Biomedical Language Modeling

Fries, Jason and Seelam, Natasha and Altay, Gabriel and Weber, Leon and Kang, Myungsun and Datta, Debajyoti and Su, Ruisi and Garda, Samuele and Wang, Bo and Ott, Simon and Samwald, Matthias and Kusa, Wojciech. Dataset Debt in Biomedical Language Modeling. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large La...

work page doi:10.18653/v1/2022.bigscience-1.10 2022
[56]

Emergent Structures and Training Dynamics in Large Language Models

Teehan, Ryan and Clinciu, Miruna and Serikov, Oleg and Szczechla, Eliza and Seelam, Natasha and Mirkin, Shachar and Gokaslan, Aaron. Emergent Structures and Training Dynamics in Large Language Models. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.11

work page doi:10.18653/v1/2022.bigscience-1.11 2022
[57]

Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned

Horawalavithana, Sameera and Ayton, Ellyn and Sharma, Shivam and Howland, Scott and Subramanian, Megha and Vasquez, Scott and Cosbey, Robin and Glenski, Maria and Volkova, Svitlana. Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Pers...

work page doi:10.18653/v1/2022.bigscience-1.12 2022
[58]

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022

2022
[59]

Using Item Response Theory to Measure Gender and Racial Bias of a BERT-based Automated English Speech Assessment System

Kwako, Alexander and Wan, Yixin and Zhao, Jieyu and Chang, Kai-Wei and Cai, Li and Hansen, Mark. Using Item Response Theory to Measure Gender and Racial Bias of a BERT-based Automated English Speech Assessment System. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.1

work page doi:10.18653/v1/2022.bea-1.1 2022
[60]

Automatic scoring of short answers using justification cues estimated by BERT

Takano, Shunya and Ichikawa, Osamu. Automatic scoring of short answers using justification cues estimated by BERT. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.2

work page doi:10.18653/v1/2022.bea-1.2 2022
[61]

Mitigating Learnerese Effects for CEFR Classification

Jalota, Rricha and Bourgonje, Peter and Van Sas, Jan and Huang, Huiyan. Mitigating Learnerese Effects for CEFR Classification. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.3

work page doi:10.18653/v1/2022.bea-1.3 2022
[62]

Automatically Detecting Reduced-formed English Pronunciations by Using Deep Learning

Chen, Lei and Jiang, Chenglin and Gu, Yiwei and Liu, Yang and Yuan, Jiahong. Automatically Detecting Reduced-formed English Pronunciations by Using Deep Learning. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.4

work page doi:10.18653/v1/2022.bea-1.4 2022
[63]

A Baseline Readability Model for Cebuano

Imperial, Joseph Marvin and Reyes, Lloyd Lois Antonie and Ibanez, Michael Antonio and Sapinit, Ranz and Hussien, Mohammed. A Baseline Readability Model for Cebuano. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.5

work page doi:10.18653/v1/2022.bea-1.5 2022
[64]

Generation of Synthetic Error Data of Verb Order Errors for Swedish

Casademont Moner, Judit and Volodina, Elena. Generation of Synthetic Error Data of Verb Order Errors for Swedish. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.6

work page doi:10.18653/v1/2022.bea-1.6 2022
[65]

A Dependency Treebank of Spoken Second Language English

Kyle, Kristopher and Eguchi, Masaki and Miller, Aaron and Sither, Theodore. A Dependency Treebank of Spoken Second Language English. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.7

work page doi:10.18653/v1/2022.bea-1.7 2022
[66]

Starting from "Zero": An Incremental Zero-shot Learning Approach for Assessing Peer Feedback Comments

Jia, Qinjin and Cao, Yupeng and Gehringer, Edward. Starting from "Zero": An Incremental Zero-shot Learning Approach for Assessing Peer Feedback Comments. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.8

work page doi:10.18653/v1/2022.bea-1.8 2022
[67]

On Assessing and Developing Spoken 'Grammatical Error Correction' Systems

Lu, Yiting and Bannò, Stefano and Gales, Mark. On Assessing and Developing Spoken 'Grammatical Error Correction' Systems. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.9

work page doi:10.18653/v1/2022.bea-1.9 2022
[68]

Automatic True/False Question Generation for Educational Purpose

Zou, Bowei and Li, Pengfei and Pan, Liangming and Aw, Ai Ti. Automatic True/False Question Generation for Educational Purpose. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.10

work page doi:10.18653/v1/2022.bea-1.10 2022
[69]

Fine-tuning Transformers with Additional Context to Classify Discursive Moves in Mathematics Classrooms

Suresh, Abhijit and Jacobs, Jennifer and Perkoff, Margaret and Martin, James H. and Sumner, Tamara. Fine-tuning Transformers with Additional Context to Classify Discursive Moves in Mathematics Classrooms. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.11

work page doi:10.18653/v1/2022.bea-1.11 2022
[70]

Cross-corpora experiments of automatic proficiency assessment and error detection for spoken English

Bannò, Stefano and Matassoni, Marco. Cross-corpora experiments of automatic proficiency assessment and error detection for spoken English. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.12

work page doi:10.18653/v1/2022.bea-1.12 2022
[71]

Activity focused Speech Recognition of Preschool Children in Early Childhood Classrooms

Dutta, Satwik and Irvin, Dwight and Buzhardt, Jay and Hansen, John H.L. Activity focused Speech Recognition of Preschool Children in Early Childhood Classrooms. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.13

work page doi:10.18653/v1/2022.bea-1.13 2022
[72]

Structural information in mathematical formulas for exercise difficulty prediction: a comparison of NLP representations

Loginova, Ekaterina and Benoit, Dries. Structural information in mathematical formulas for exercise difficulty prediction: a comparison of NLP representations. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.14

work page doi:10.18653/v1/2022.bea-1.14 2022
[73]

The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education

Rietsche, Roman and Caines, Andrew and Schramm, Cornelius and Pf. The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.15

work page doi:10.18653/v1/2022.bea-1.15 2022
[74]

Similarity-Based Content Scoring - How to Make S-BERT Keep Up With BERT

Bexte, Marie and Horbach, Andrea and Zesch, Torsten. Similarity-Based Content Scoring - How to Make S-BERT Keep Up With BERT. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.16

work page doi:10.18653/v1/2022.bea-1.16 2022
[75]

Don't Drop the Topic - The Role of the Prompt in Argument Identification in Student Writing

Ding, Yuning and Bexte, Marie and Horbach, Andrea. Don't Drop the Topic - The Role of the Prompt in Argument Identification in Student Writing. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.17

work page doi:10.18653/v1/2022.bea-1.17 2022
[76]

ALEN App: Argumentative Writing Support To Foster English Language Learning

Wambsganss, Thiemo and Caines, Andrew and Buttery, Paula. ALEN App: Argumentative Writing Support To Foster English Language Learning. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.18

work page doi:10.18653/v1/2022.bea-1.18 2022
[77]

Assessing sentence readability for German language learners with broad linguistic modeling or readability formulas: When do linguistic insights make a difference?

Weiss, Zarah and Meurers, Detmar. Assessing sentence readability for German language learners with broad linguistic modeling or readability formulas: When do linguistic insights make a difference? Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.19

work page doi:10.18653/v1/2022.bea-1.19 2022
[78]

Parametrizable exercise generation from authentic texts: Effectively targeting the language means on the curriculum

Heck, Tanja and Meurers, Detmar. Parametrizable exercise generation from authentic texts: Effectively targeting the language means on the curriculum. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.20

work page doi:10.18653/v1/2022.bea-1.20 2022
[79]

Selecting Context Clozes for Lightweight Reading Compliance

Keim, Greg and Littman, Michael. Selecting Context Clozes for Lightweight Reading Compliance. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.21

work page doi:10.18653/v1/2022.bea-1.21 2022
[80]

'Meet me at the ribary' -- Acceptability of spelling variants in free-text answers to listening comprehension prompts

Laarmann-Quante, Ronja and Schwarz, Leska and Horbach, Andrea and Zesch, Torsten. 'Meet me at the ribary' -- Acceptability of spelling variants in free-text answers to listening comprehension prompts. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.22

work page doi:10.18653/v1/2022.bea-1.22 2022


Nozza, Debora and Bianchi, Federico and Hovy, Dirk. Pipelines for Social Bias Testing of Large Language Models. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.6

work page doi:10.18653/v1/2022.bigscience-1.6 2022

[52] [52]

Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

De Toni, Francesco and Akiki, Christopher and De La Rosa, Javier and Fourrier, Cl \'e mentine and Manjavacas, Enrique and Schweter, Stefan and Van Strien, Daniel. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. do...

work page doi:10.18653/v1/2022.bigscience-1.7 2022

[53] [53]

A Holistic Assessment of the Carbon Footprint of Noor, a Very Large A rabic Language Model

Lakim, Imad and Almazrouei, Ebtesam and Abualhaol, Ibrahim and Debbah, Merouane and Launay, Julien. A Holistic Assessment of the Carbon Footprint of Noor, a Very Large A rabic Language Model. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.8

work page doi:10.18653/v1/2022.bigscience-1.8 2022

[54] [54]

GPT - N eo X -20 B : An Open-Source Autoregressive Language Model

Black, Sidney and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, Usvsn Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel. GPT - N eo X -20 B : An Open-Source ...

work page doi:10.18653/v1/2022.bigscience-1.9 2022

[55] [55]

Dataset Debt in Biomedical Language Modeling

Fries, Jason and Seelam, Natasha and Altay, Gabriel and Weber, Leon and Kang, Myungsun and Datta, Debajyoti and Su, Ruisi and Garda, Samuele and Wang, Bo and Ott, Simon and Samwald, Matthias and Kusa, Wojciech. Dataset Debt in Biomedical Language Modeling. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large La...

work page doi:10.18653/v1/2022.bigscience-1.10 2022

[56] [56]

Emergent Structures and Training Dynamics in Large Language Models

Teehan, Ryan and Clinciu, Miruna and Serikov, Oleg and Szczechla, Eliza and Seelam, Natasha and Mirkin, Shachar and Gokaslan, Aaron. Emergent Structures and Training Dynamics in Large Language Models. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022. doi:10.18653/v1/2022.bigscience-1.11

work page doi:10.18653/v1/2022.bigscience-1.11 2022

[57] [57]

Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned

Horawalavithana, Sameera and Ayton, Ellyn and Sharma, Shivam and Howland, Scott and Subramanian, Megha and Vasquez, Scott and Cosbey, Robin and Glenski, Maria and Volkova, Svitlana. Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned. Proceedings of BigScience Episode \# 5 -- Workshop on Challenges & Pers...

work page doi:10.18653/v1/2022.bigscience-1.12 2022

[58] [58]

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022

2022

[59] [59]

Using Item Response Theory to Measure Gender and Racial Bias of a BERT -based Automated E nglish Speech Assessment System

Kwako, Alexander and Wan, Yixin and Zhao, Jieyu and Chang, Kai-Wei and Cai, Li and Hansen, Mark. Using Item Response Theory to Measure Gender and Racial Bias of a BERT -based Automated E nglish Speech Assessment System. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.1

work page doi:10.18653/v1/2022.bea-1.1 2022

[60] [60]

Automatic scoring of short answers using justification cues estimated by BERT

Takano, Shunya and Ichikawa, Osamu. Automatic scoring of short answers using justification cues estimated by BERT. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.2

work page doi:10.18653/v1/2022.bea-1.2 2022

[61] [61]

Mitigating Learnerese Effects for CEFR Classification

Jalota, Rricha and Bourgonje, Peter and Van Sas, Jan and Huang, Huiyan. Mitigating Learnerese Effects for CEFR Classification. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.3

work page doi:10.18653/v1/2022.bea-1.3 2022

[62] [62]

Automatically Detecting Reduced-formed E nglish Pronunciations by Using Deep Learning

Chen, Lei and Jiang, Chenglin and Gu, Yiwei and Liu, Yang and Yuan, Jiahong. Automatically Detecting Reduced-formed E nglish Pronunciations by Using Deep Learning. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.4

work page doi:10.18653/v1/2022.bea-1.4 2022

[63] [63]

A Baseline Readability Model for C ebuano

Imperial, Joseph Marvin and Reyes, Lloyd Lois Antonie and Ibanez, Michael Antonio and Sapinit, Ranz and Hussien, Mohammed. A Baseline Readability Model for C ebuano. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.5

work page doi:10.18653/v1/2022.bea-1.5 2022

[64] [64]

Generation of Synthetic Error Data of Verb Order Errors for S wedish

Casademont Moner, Judit and Volodina, Elena. Generation of Synthetic Error Data of Verb Order Errors for S wedish. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.6

work page doi:10.18653/v1/2022.bea-1.6 2022

[65] [65]

A Dependency Treebank of Spoken Second Language E nglish

Kyle, Kristopher and Eguchi, Masaki and Miller, Aaron and Sither, Theodore. A Dependency Treebank of Spoken Second Language E nglish. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.7

work page doi:10.18653/v1/2022.bea-1.7 2022

[66] [66]

Starting from ``Zero'': An Incremental Zero-shot Learning Approach for Assessing Peer Feedback Comments

Jia, Qinjin and Cao, Yupeng and Gehringer, Edward. Starting from ``Zero'': An Incremental Zero-shot Learning Approach for Assessing Peer Feedback Comments. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.8

work page doi:10.18653/v1/2022.bea-1.8 2022

[67] [67]

On Assessing and Developing Spoken `Grammatical Error Correction' Systems

Lu, Yiting and Bann \`o , Stefano and Gales, Mark. On Assessing and Developing Spoken `Grammatical Error Correction' Systems. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.9

work page doi:10.18653/v1/2022.bea-1.9 2022

[68] [68]

Automatic True/False Question Generation for Educational Purpose

Zou, Bowei and Li, Pengfei and Pan, Liangming and Aw, Ai Ti. Automatic True/False Question Generation for Educational Purpose. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.10

work page doi:10.18653/v1/2022.bea-1.10 2022

[69] [69]

and Sumner, Tamara

Suresh, Abhijit and Jacobs, Jennifer and Perkoff, Margaret and Martin, James H. and Sumner, Tamara. Fine-tuning Transformers with Additional Context to Classify Discursive Moves in Mathematics Classrooms. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.11

work page doi:10.18653/v1/2022.bea-1.11 2022

[70] [70]

Cross-corpora experiments of automatic proficiency assessment and error detection for spoken E nglish

Bann \`o , Stefano and Matassoni, Marco. Cross-corpora experiments of automatic proficiency assessment and error detection for spoken E nglish. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.12

work page doi:10.18653/v1/2022.bea-1.12 2022

[71] [71]

Activity focused Speech Recognition of Preschool Children in Early Childhood Classrooms

Dutta, Satwik and Irvin, Dwight and Buzhardt, Jay and Hansen, John H.L. Activity focused Speech Recognition of Preschool Children in Early Childhood Classrooms. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.13

work page doi:10.18653/v1/2022.bea-1.13 2022

[72] [72]

Structural information in mathematical formulas for exercise difficulty prediction: a comparison of NLP representations

Loginova, Ekaterina and Benoit, Dries. Structural information in mathematical formulas for exercise difficulty prediction: a comparison of NLP representations. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.14

work page doi:10.18653/v1/2022.bea-1.14 2022

[73] [73]

The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education

Rietsche, Roman and Caines, Andrew and Schramm, Cornelius and Pf. The Specificity and Helpfulness of Peer-to-Peer Feedback in Higher Education. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.15

work page doi:10.18653/v1/2022.bea-1.15 2022

[74] [74]

Similarity-Based Content Scoring - How to Make S - BERT Keep Up With BERT

Bexte, Marie and Horbach, Andrea and Zesch, Torsten. Similarity-Based Content Scoring - How to Make S - BERT Keep Up With BERT. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.16

work page doi:10.18653/v1/2022.bea-1.16 2022

[75] [75]

Don ' t Drop the Topic - The Role of the Prompt in Argument Identification in Student Writing

Ding, Yuning and Bexte, Marie and Horbach, Andrea. Don ' t Drop the Topic - The Role of the Prompt in Argument Identification in Student Writing. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.17

work page doi:10.18653/v1/2022.bea-1.17 2022

[76] [76]

ALEN App: Argumentative Writing Support To Foster E nglish Language Learning

Wambsganss, Thiemo and Caines, Andrew and Buttery, Paula. ALEN App: Argumentative Writing Support To Foster E nglish Language Learning. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.18

work page doi:10.18653/v1/2022.bea-1.18 2022

[77] [77]

Assessing sentence readability for G erman language learners with broad linguistic modeling or readability formulas: When do linguistic insights make a difference?

Weiss, Zarah and Meurers, Detmar. Assessing sentence readability for G erman language learners with broad linguistic modeling or readability formulas: When do linguistic insights make a difference?. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.19

work page doi:10.18653/v1/2022.bea-1.19 2022

[78] [78]

Parametrizable exercise generation from authentic texts: Effectively targeting the language means on the curriculum

Heck, Tanja and Meurers, Detmar. Parametrizable exercise generation from authentic texts: Effectively targeting the language means on the curriculum. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.20

work page doi:10.18653/v1/2022.bea-1.20 2022

[79] [79]

Selecting Context Clozes for Lightweight Reading Compliance

Keim, Greg and Littman, Michael. Selecting Context Clozes for Lightweight Reading Compliance. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.21

work page doi:10.18653/v1/2022.bea-1.21 2022

[80] [80]

`Meet me at the ribary' -- Acceptability of spelling variants in free-text answers to listening comprehension prompts

Laarmann-Quante, Ronja and Schwarz, Leska and Horbach, Andrea and Zesch, Torsten. `Meet me at the ribary' -- Acceptability of spelling variants in free-text answers to listening comprehension prompts. Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022). 2022. doi:10.18653/v1/2022.bea-1.22

work page doi:10.18653/v1/2022.bea-1.22 2022