From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

Guangnan Ye; Yixin Cao; Yuhang Zhou

arxiv: 2606.07190 · v1 · pith:A3IPCBGAnew · submitted 2026-06-05 · 💻 cs.CL

Pith reviewed 2026-06-27 21:44 UTC · model grok-4.3

keywords: prefix utility, process reward model, LLM reasoning, mathematical reasoning, pairwise ranking, best-of-n selection, beam search, reinforcement learning

The pith

Prefix gain measured via student solve-rate improvements trains a utility model that outperforms correctness-based rewards for guiding LLM reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that local step correctness is only an indirect signal for whether a reasoning prefix helps an LLM reach a correct final answer. Instead it measures prefix gain directly as the lift in solve rate when a group of lightweight student models is conditioned on that prefix. A Prefix Utility Model is then trained with a simple pairwise ranking loss to predict this gain, allowing it to score both full trajectories and partial prefixes. Experiments on mathematical reasoning show that PUM supplies stronger supervision than rule-based or correctness-based signals, particularly when candidate sets grow large, search budgets rise, or outcome rewards become sparse.
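As a minimal sketch of the measurement described above, assume each student model's rollouts are recorded as lists of booleans (solved or not), once from the bare question and once conditioned on the prefix. The function names, the data layout, and the uniform average over students are illustrative assumptions; the paper's exact aggregation is not stated on this page.

```python
from statistics import mean
from typing import Dict, List


def solve_rate(outcomes: List[bool]) -> float:
    """Fraction of rollouts that reached the correct final answer."""
    if not outcomes:
        raise ValueError("need at least one rollout")
    return sum(outcomes) / len(outcomes)


def prefix_gain(base: Dict[str, List[bool]],
                conditioned: Dict[str, List[bool]]) -> float:
    """Mean over students of (solve rate with prefix - solve rate without).

    Assumption: students are weighted equally.
    """
    if base.keys() != conditioned.keys():
        raise ValueError("both runs must cover the same students")
    return mean(solve_rate(conditioned[s]) - solve_rate(base[s]) for s in base)


# Toy rollout outcomes (made up for illustration, not data from the paper).
base = {
    "student_a": [True, False, False, False],
    "student_b": [False, False, False, False],
}
cond = {
    "student_a": [True, True, True, False],
    "student_b": [True, False, False, False],
}
gain = prefix_gain(base, cond)
print(gain)  # 0.375
```

A positive value means the prefix raised the students' average solve rate; a negative value means it made the problem harder for them.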

Core claim

Prefix gain is defined as the solve-rate improvement obtained by conditioning a lightweight student model group on a given prefix; a Prefix Utility Model trained with pairwise ranking on these gains learns to score both complete and partial reasoning steps and supplies a stronger prefix-level signal than local correctness in Best-of-N selection, beam search, and reinforcement learning on mathematical reasoning tasks.

What carries the argument

Prefix gain, the solve-rate improvement induced by conditioning a lightweight student model group on a prefix, used as the training target for a Prefix Utility Model via pairwise ranking.
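To make the pairwise ranking objective concrete, here is a hedged sketch using a Bradley-Terry style logistic loss, a common choice for pairwise ranking. Whether the paper uses exactly this form is an assumption, and the function names are hypothetical. Each pair holds the model's score for the higher-gain prefix followed by its score for the lower-gain prefix.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean of -log sigmoid(s_better - s_worse) over (s_better, s_worse) pairs."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(softplus(-(hi - lo)) for hi, lo in pairs) / len(pairs)


# Scores that agree with the gain ordering give a low loss,
# scores that contradict it give a high loss, and a tie gives log(2).
well_ordered = pairwise_ranking_loss([(2.0, -1.0), (0.5, 0.0)])
mis_ordered = pairwise_ranking_loss([(-1.0, 2.0), (0.0, 0.5)])
tied = pairwise_ranking_loss([(0.0, 0.0)])
print(well_ordered < tied < mis_ordered)  # True
```

Because only score differences enter the loss, the learned utility is meaningful for ranking prefixes against each other rather than as an absolute probability.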

If this is right

PUM can be used to rank or prune partial trajectories during search without waiting for final answers.
Performance gains increase as the candidate pool or search budget grows, and when rule-based rewards are sparse.
The same model works for both complete trajectories and incomplete prefixes without retraining.
Reinforcement learning with PUM rewards improves sample efficiency when rule-based rewards are sparse.
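The ranking and pruning uses listed above can be sketched with any function that maps a prefix to a scalar score. In this illustration a lookup table of made-up scores stands in for a trained utility model; `best_of_n`, `prune_beam`, and `toy_scores` are hypothetical names, not part of the released code.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[str], float]


def best_of_n(candidates: Sequence[str], score: Scorer) -> str:
    """Return the candidate with the highest score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def prune_beam(prefixes: Sequence[str], score: Scorer, width: int) -> List[str]:
    """Keep the `width` highest-scoring partial trajectories."""
    if width < 1:
        raise ValueError("beam width must be positive")
    return sorted(prefixes, key=score, reverse=True)[:width]


# Stand-in scores; a real pipeline would call the utility model here.
toy_scores = {
    "expand the square": 0.1,
    "factor the quadratic": 0.7,
    "guess x = 3": -0.2,
}
score = toy_scores.__getitem__

best = best_of_n(list(toy_scores), score)
kept = prune_beam(list(toy_scores), score, width=2)
print(best)  # factor the quadratic
print(kept)  # ['factor the quadratic', 'expand the square']
```

The same scorer serves both calls, which mirrors the claim that one model handles complete trajectories and partial prefixes.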

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may reduce the need for expensive human or rule-based process annotations by substituting cheap student-model rollouts.
If student models are chosen from the same family as the target LLM, the proxy may become tighter but could also introduce distribution shift issues at scale.
The method suggests a general recipe for turning any cheap proxy solver into a utility labeler for more expensive target models.

Load-bearing premise

That the solve-rate improvement observed when conditioning student models on a prefix is a faithful proxy for how useful that prefix is to the target LLM.

What would settle it

Measure the correlation between PUM scores and the target LLM's actual solve rates on a large held-out set of prefixes; if the correlation is near zero or negative while correctness-based scores remain positively correlated, the utility claim is falsified.
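One way to run such a check is a Spearman rank correlation between model scores and measured solve rates for the same prefixes. The sketch below is plain Python with average ranks for ties; the numbers are invented for illustration and are not results from the paper.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical held-out prefixes: utility scores vs. measured solve rates.
pum_scores = [0.9, 0.2, 0.5, 0.7]
target_solve_rates = [0.80, 0.10, 0.40, 0.35]
rho = spearman(pum_scores, target_solve_rates)
print(round(rho, 3))  # 0.8
```

Running the same function on correctness-based scores for the same prefixes would give the comparison the test calls for.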

Figures

Figures reproduced from arXiv: 2606.07190 by Guangnan Ye, Yixin Cao, Yuhang Zhou.

**Figure 2.** Prefix Utility Model: prefix sampling, gain-based preference construction, and pairwise utility learning.

**Figure 3.** Best-of-N selection and ranking robustness. Left: Best-of-N selection accuracy. Right: ranking robustness, …

**Figure 4.** RL training curves. Adjacent text from the paper: "… reaches 57.21% average accuracy. We find that PRM and PQM exhibit stronger length preferences: the policy learns to generate longer responses that receive high process scores, while validation accuracy drops. This indicates reward hacking, where local process rewards are exploited without improving final-answer correctness. The training dynamics also explain why PUM+GRPO is effective. …"

**Figure 5.** Further analyses on weak-to-strong, policy dependence, scaling, and hard-data RL.

**Figure 6.** Gain trends across prefix positions for correct and wrong trajectories.

**Figure 7.** Relationship between student capability gaps and pairwise gain agreement.

Original abstract

Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect we ultimately care about: whether a prefix increases the probability of successful completion. We define this effect as prefix gain, the solve-rate improvement induced by conditioning a group of lightweight student models on a prefix, and use it to train a Prefix Utility Model (PUM) with a simple pairwise ranking objective. PUM learns outcome-grounded prefix utility and can score both complete trajectories and partial reasoning prefixes. Across Best-of-$N$ selection, beam search, and reinforcement learning on mathematical reasoning, PUM provides a strong prefix-level supervision signal, especially when candidate pools are large, search budgets increase, or rule-based rewards are sparse. We release all data, models, and code at https://zhiqix.github.io/pum-project-page.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Prefix gain measured on student models is the new supervision signal, but the paper does not show it predicts utility for the target LLM.

The main point is that they define prefix gain as the solve-rate delta you get by feeding a prefix to a group of lightweight student models, then train a Prefix Utility Model on pairwise rankings of those deltas. This is meant to give outcome-grounded scores for both full trajectories and partial prefixes instead of local correctness labels.
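The step from per-prefix deltas to pairwise training data can be sketched as below. The margin filter that drops near-tied pairs is an assumption added for illustration, since this page does not say how the paper filters pairs, and `build_preference_pairs` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_preference_pairs(gains: Dict[str, float],
                           margin: float = 0.1) -> List[Tuple[str, str]]:
    """Return (preferred, rejected) prefix pairs whose gains differ by >= margin."""
    pairs: List[Tuple[str, str]] = []
    for a, b in combinations(gains, 2):
        diff = gains[a] - gains[b]
        if abs(diff) < margin:
            continue  # gains too close to call; skip this pair
        pairs.append((a, b) if diff > 0 else (b, a))
    return pairs


# Made-up gains for three prefixes of the same question.
gains = {"prefix_1": 0.40, "prefix_2": 0.35, "prefix_3": -0.10}
pairs = build_preference_pairs(gains, margin=0.1)
print(pairs)  # [('prefix_1', 'prefix_3'), ('prefix_2', 'prefix_3')]
```

Dropping near-ties is one way to keep sampling noise in the solve-rate estimates from producing contradictory labels.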

What they actually do is apply the resulting PUM to Best-of-N, beam search, and RL on math reasoning tasks. The claim is that it helps most when pools are large, search budgets grow, or rule-based rewards are thin. Releasing the data, models, and code is straightforward and useful for anyone who wants to test the signal themselves.

The soft spot is the proxy step. The paper measures gain only on the student models and then uses the PUM on the target LLM without reporting a direct comparison, such as rank correlation or calibration between student-induced deltas and the actual solve-rate change the target model sees on the same prefixes. Downstream task gains are consistent with the story but do not isolate whether the student proxy tracks the target’s behavior. If the student group and target reason differently, the learned scores could be misaligned.
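A matched-prefix comparison of the kind described could be as simple as checking how often the student group and the target model order two prefixes the same way. The sketch below assumes both gain vectors are already measured on the same prefixes; the numbers are invented and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_agreement(student_gain: Sequence[float],
                       target_gain: Sequence[float]) -> float:
    """Fraction of prefix pairs ordered the same way by both gain vectors.

    Pairs tied under either measurement are skipped.
    """
    if len(student_gain) != len(target_gain):
        raise ValueError("gain vectors must cover the same prefixes")
    agree = 0
    total = 0
    for i, j in combinations(range(len(student_gain)), 2):
        ds = student_gain[i] - student_gain[j]
        dt = target_gain[i] - target_gain[j]
        if ds == 0 or dt == 0:
            continue
        total += 1
        agree += (ds > 0) == (dt > 0)
    if total == 0:
        raise ValueError("no comparable pairs")
    return agree / total


# Hypothetical gains for four prefixes, measured on students and on the target.
student = [0.30, 0.10, -0.05, 0.20]
target = [0.25, 0.15, 0.00, 0.05]
agreement = pairwise_agreement(student, target)
print(round(agreement, 3))  # 0.833
```

Agreement near 0.5 would indicate that student-derived preferences carry little information about the target model, which is the failure mode the letter worries about.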

This is for people building search or RL pipelines for LLM reasoning on math problems. Readers who want a new prefix-level signal and are willing to run their own validation will get something concrete from the definition and the released artifacts. It deserves a serious referee because the supervision idea is distinct from standard process-reward work and the use cases are practical, though the missing proxy check is the item that needs attention in review.

Referee Report

2 major / 1 minor

Summary. The paper argues that process reward models relying on local step correctness are indirect proxies for the desired outcome of increased solve probability. It defines prefix gain as the solve-rate delta induced by conditioning a group of lightweight student models on a given prefix, trains a Prefix Utility Model (PUM) via pairwise ranking on these gains, and reports that the resulting PUM supplies effective prefix-level supervision for Best-of-N, beam search, and RL on mathematical reasoning tasks, with larger gains under big candidate pools, higher search budgets, or sparse rule-based rewards. All data, models, and code are released.

Significance. If the student-model proxy is shown to correlate with target-LLM utility, the work supplies an outcome-grounded alternative to correctness-based process rewards and could improve search and RL in sparse-reward reasoning settings. The public release of artifacts is a clear strength that supports reproducibility.

major comments (2)

[Abstract / Method description] The central claim that PUM supplies useful supervision for the target LLM rests on the untested assumption that prefix gain measured on the student-model group correlates with the change in solve probability the same prefix would induce in the target model. No section reports a direct validation (rank correlation, calibration plot, or matched-prefix experiment) of student gain versus target gain.
[Experiments] Downstream improvements in Best-of-N, beam search, and RL are presented as evidence for PUM utility, yet these results are consistent with but do not isolate the proxy assumption; without an ablation that replaces the student-derived labels with target-derived labels or measures proxy fidelity, it remains unclear whether the reported gains stem from the gain-based formulation or from other factors.

minor comments (1)

[Method] Notation for the student-model group and the exact definition of solve-rate delta should be formalized with an equation early in the method section to avoid ambiguity when the same prefix is scored by PUM versus the original student ensemble.
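One way such a formalization might read, offered as assumed notation rather than the paper's own: let $\mathcal{S}$ be the student group, $q$ a question, $p$ a prefix, and $\mathrm{SR}_s(q \mid p)$ the empirical solve rate of student $s$ over $K$ rollouts continued from $p$, with $p = \varnothing$ denoting no prefix. The uniform average over students and the fixed rollout count $K$ are both assumptions.

```latex
\[
  G(p; q) \;=\; \frac{1}{|\mathcal{S}|} \sum_{s \in \mathcal{S}}
  \Bigl[ \mathrm{SR}_s(q \mid p) - \mathrm{SR}_s(q \mid \varnothing) \Bigr],
  \qquad
  \mathrm{SR}_s(q \mid p) \;=\; \frac{1}{K} \sum_{k=1}^{K}
  \mathbb{1}\bigl[ \text{rollout } k \text{ of } s \text{ from } (q, p) \text{ is correct} \bigr].
\]
```

Under this notation the score a trained PUM assigns to $p$ is a learned prediction, distinct from $G(p; q)$, which is measured directly on the student ensemble.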

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the proxy assumption and experimental isolation. We respond to each major comment below.

Referee: [Abstract / Method description] The central claim that PUM supplies useful supervision for the target LLM rests on the untested assumption that prefix gain measured on the student-model group correlates with the change in solve probability the same prefix would induce in the target model. No section reports a direct validation (rank correlation, calibration plot, or matched-prefix experiment) of student gain versus target gain.

Authors: We acknowledge that the manuscript does not include a direct validation (such as rank correlation or matched-prefix experiments) of how well student-model prefix gains predict target-model gains. The student-model group was selected specifically to make large-scale prefix-gain computation tractable; repeating the same measurements on the target model would be substantially more expensive. Downstream gains on the target LLM offer indirect support for transfer, yet we agree these do not constitute a direct test of the proxy. In revision we will add an explicit limitations paragraph discussing the untested correlation and will report a small-scale correlation study on a held-out subset of prefixes if compute permits.

Revision: partial

Referee: [Experiments] Downstream improvements in Best-of-N, beam search, and RL are presented as evidence for PUM utility, yet these results are consistent with but do not isolate the proxy assumption; without an ablation that replaces the student-derived labels with target-derived labels or measures proxy fidelity, it remains unclear whether the reported gains stem from the gain-based formulation or from other factors.

Authors: The referee is correct that the reported improvements do not isolate the student-proxy contribution from other design choices. An ablation that substitutes target-derived labels would directly address this but is currently infeasible at the scale of our experiments due to the computational cost of evaluating thousands of prefixes on the target model. The existing results show consistent gains across search and RL settings, especially under large candidate pools and sparse rewards, but we will revise the text to state the claims more narrowly and to flag the missing proxy-fidelity ablation as an important direction for follow-up work.

Revision: partial

Circularity Check

0 steps flagged

No circularity: prefix gain defined via independent student-model proxy

The paper defines prefix gain explicitly as the solve-rate improvement measured on a separate lightweight student model group, then trains PUM via pairwise ranking on those externally computed deltas. This construction uses an independent measurement process rather than fitting or deriving the quantity from the target LLM's own outputs or trajectories. No equations, self-citations, or uniqueness claims in the provided text reduce the central claim to a definitional loop or fitted input renamed as prediction. Downstream evaluations on Best-of-N, beam search, and RL for the target model provide external, though indirect, evidence for the proxy assumption rather than a self-referential derivation. The method is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests on the domain assumption that student-model solve-rate improvement is a valid proxy for prefix utility; no free parameters or invented physical entities are mentioned.

axioms (1)

Domain assumption: Solve-rate improvement induced by conditioning lightweight student models on a prefix is a reliable measure of that prefix's utility for successful completion.
This definition is the load-bearing premise used to train PUM via pairwise ranking.

invented entities (1)

Prefix Utility Model (PUM): no independent evidence
Purpose: Model trained to predict prefix gain from reasoning prefixes.
New model introduced to operationalize the gain-based evaluation.


NTULM : Enriching Social Media Text Representations with Non-Textual Units

Li, Jinning and Mishra, Shubhanshu and El-Kishky, Ahmed and Mehta, Sneha and Kulkarni, Vivek. NTULM : Enriching Social Media Text Representations with Non-Textual Units. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[69]

Robust Candidate Generation for Entity Linking on Short Social Media Texts

Hebert, Liam and Makki, Raheleh and Mishra, Shubhanshu and Saghir, Hamidreza and Kamath, Anusha and Merhav, Yuval. Robust Candidate Generation for Entity Linking on Short Social Media Texts. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[70]

T rans POS : Transformers for Consolidating Different POS Tagset Datasets

Li, Alex and Bankole-Hameed, Ilyas and Singh, Ranadeep and Ng, Gabriel and Gupta, Akshat. T rans POS : Transformers for Consolidating Different POS Tagset Datasets. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[71]

An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts

Fu, Xue-Yong and Chen, Cheng and Laskar, Md Tahmid Rahman and Tn, Shashi Bhushan and Corston-Oliver, Simon. An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[72]

Leveraging Semantic and Sentiment Knowledge for User-Generated Text Sentiment Classification

Khan, Jawad and Ahmad, Niaz and Alam, Aftab and Lee, Youngmoon. Leveraging Semantic and Sentiment Knowledge for User-Generated Text Sentiment Classification. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[73]

An Emotional Journey: Detecting Emotion Trajectories in D utch Customer Service Dialogues

Labat, Sofie and Hadifar, Amir and Demeester, Thomas and Hoste, Veronique. An Emotional Journey: Detecting Emotion Trajectories in D utch Customer Service Dialogues. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[74]

Supervised and Unsupervised Evaluation of Synthetic Code-Switching

Orlov, Evgeny and Artemova, Ekaterina. Supervised and Unsupervised Evaluation of Synthetic Code-Switching. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[75]

A rab G end: Gender Analysis and Inference on A rabic T witter

Mubarak, Hamdy and Chowdhury, Shammur Absar and Alam, Firoj. A rab G end: Gender Analysis and Inference on A rabic T witter. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[76]

Automatic Identification of 5 C Vaccine Behaviour on Social Media

Sampath Kumar, Ajay Hemanth and Shausan, Aminath and Demartini, Gianluca and Rahimi, Afshin. Automatic Identification of 5 C Vaccine Behaviour on Social Media. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[77]

Automatic Extraction of Structured Mineral Drillhole Results from Unstructured Mining Company Reports

Dimeski, Adam and Rahimi, Afshin. Automatic Extraction of Structured Mineral Drillhole Results from Unstructured Mining Company Reports. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[78]

`` Kanglish alli names! '' Named Entity Recognition for K annada- E nglish Code-Mixed Social Media Data

S, Sumukh and Shrivastava, Manish. `` Kanglish alli names! '' Named Entity Recognition for K annada- E nglish Code-Mixed Social Media Data. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[79]

Span Extraction Aided Improved Code-mixed Sentiment Classification

S, Ramaneswaran and Benhur, Sean and Ghosh, Sreyan. Span Extraction Aided Improved Code-mixed Sentiment Classification. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
[80]

A d BERT : An Effective Few Shot Learning Framework for Aligning Tweets to Superbowl Advertisements

Das, Debarati and Chenchu, Roopana and Abdollahi, Maral and Huh, Jisu and Srivastava, Jaideep. A d BERT : An Effective Few Shot Learning Framework for Aligning Tweets to Superbowl Advertisements. Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022). 2022

2022
