From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning
Pith reviewed 2026-06-27 21:44 UTC · model grok-4.3
The pith
Prefix gain measured via student solve-rate improvements trains a utility model that outperforms correctness-based rewards for guiding LLM reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Prefix gain is defined as the solve-rate improvement obtained by conditioning a lightweight student model group on a given prefix; a Prefix Utility Model trained with pairwise ranking on these gains learns to score both complete and partial reasoning steps and supplies a stronger prefix-level signal than local correctness in Best-of-N selection, beam search, and reinforcement learning on mathematical reasoning tasks.
What carries the argument
Prefix gain, the solve-rate improvement induced by conditioning a lightweight student model group on a prefix, used as the training target for a Prefix Utility Model via pairwise ranking.
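As a rough illustration of the quantity described above, the sketch below computes a prefix gain from recorded rollout outcomes. The source does not say how the student group is aggregated or what the baseline is, so averaging per-student deltas and using prefix-free rollouts as the baseline are assumptions. The function names and the toy outcomes are hypothetical.

```python
from statistics import mean
from typing import Sequence


def solve_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of rollouts that reached the correct final answer."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def prefix_gain(
    with_prefix: Sequence[Sequence[bool]],
    without_prefix: Sequence[Sequence[bool]],
) -> float:
    """Mean per-student solve-rate improvement when conditioning on a prefix.

    Each inner sequence holds the rollout outcomes of one student model.
    Assumption: the group is aggregated by an unweighted mean over students.
    """
    if len(with_prefix) != len(without_prefix):
        raise ValueError("need outcomes for the same students in both settings")
    deltas = [
        solve_rate(conditioned) - solve_rate(baseline)
        for conditioned, baseline in zip(with_prefix, without_prefix)
    ]
    return mean(deltas)


# Hypothetical outcomes: two students, four rollouts each.
with_prefix = [[True, True, False, True], [True, False, False, True]]
without_prefix = [[True, False, False, False], [False, False, False, True]]
gain = prefix_gain(with_prefix, without_prefix)
print(gain)
```

A positive value means the prefix raised the students' solve rate; a negative value means it hurt.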
If this is right
- PUM can be used to rank or prune partial trajectories during search without waiting for final answers.
- Performance gains increase as the number of candidate prefixes grows or when outcome-based rewards are unavailable.
- The same model works for both complete trajectories and incomplete prefixes without retraining.
- Reinforcement learning with PUM rewards improves sample efficiency when rule-based rewards are sparse.
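The first two uses listed above (ranking complete candidates and pruning partial trajectories) can be sketched with a generic scoring callable. This is a minimal sketch, not the paper's implementation: `Scorer`, `best_of_n`, `prune_beam`, and `toy_score` are hypothetical names, and `toy_score` is a placeholder that merely stands in for a trained PUM.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: (problem, prefix_or_trajectory) -> utility score.
Scorer = Callable[[str, str], float]


def best_of_n(problem: str, candidates: Sequence[str], score: Scorer) -> str:
    """Return the complete candidate with the highest utility score."""
    return max(candidates, key=lambda c: score(problem, c))


def prune_beam(
    problem: str, prefixes: Sequence[str], score: Scorer, beam_width: int
) -> List[str]:
    """Keep the top `beam_width` partial trajectories, with no final answer needed."""
    ranked = sorted(prefixes, key=lambda p: score(problem, p), reverse=True)
    return ranked[:beam_width]


def toy_score(problem: str, text: str) -> float:
    """Placeholder scorer (counts '=' signs); NOT a real utility estimate."""
    return float(text.count("="))


candidates = ["x = 2", "x = 2, y = 3 = z", "guess"]
print(best_of_n("toy problem", candidates, toy_score))
print(prune_beam("toy problem", candidates, toy_score, beam_width=2))
```

Because the same callable scores both complete trajectories and prefixes, selection and pruning share one code path; only the inputs differ.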
Where Pith is reading between the lines
- The approach may reduce the need for expensive human or rule-based process annotations by substituting cheap student-model rollouts.
- If student models are chosen from the same family as the target LLM, the proxy may become tighter but could also introduce distribution shift issues at scale.
- The method suggests a general recipe for turning any cheap proxy solver into a utility labeler for more expensive target models.
Load-bearing premise
That the solve-rate improvement observed when conditioning student models on a prefix is a faithful proxy for how useful that prefix is to the target LLM.
What would settle it
Measure the correlation between PUM scores and the target LLM's actual solve rates on a large held-out set of prefixes; if the correlation is near zero or negative while correctness-based scores remain positively correlated, the utility claim is falsified.
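A check of this kind could be run as a rank correlation between model scores and measured solve rates. The sketch below implements Spearman's coefficient in plain Python as the Pearson correlation of average ranks. The choice of Spearman is an assumption, since the text only says "correlation", and the numbers in the example are made up for illustration.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores and solve rates for four held-out prefixes.
model_scores = [0.10, 0.40, 0.35, 0.80]
solve_rates = [0.20, 0.50, 0.45, 0.90]
print(spearman(model_scores, solve_rates))
```

Running the same function on correctness-based scores would give the comparison the falsification test calls for.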
Original abstract
Reasoning prefixes shape the future trajectory of LLM problem solving, yet existing process reward models usually evaluate them through local step correctness. We argue that correctness is a useful but indirect proxy for the effect we ultimately care about: whether a prefix increases the probability of successful completion. We define this effect as prefix gain, the solve-rate improvement induced by conditioning a lightweight student model group on a prefix, and use it to train a Prefix Utility Model (PUM) with a simple pairwise ranking objective. PUM learns outcome-grounded prefix utility and can score both complete trajectories and partial reasoning prefixes. Across Best-of-$N$ selection, beam search, and reinforcement learning on mathematical reasoning, PUM provides a strong prefix-level supervision signal, especially when candidate pools are large, search budgets increase, or rule-based rewards are sparse. We release all data, models, and code at https://zhiqix.github.io/pum-project-page.
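The abstract does not spell out the "simple pairwise ranking objective". One common choice is a logistic (Bradley-Terry style) loss on score differences, sketched below. Treat the loss form, the `min_gap` filter, and both function names as assumptions rather than the authors' recipe.

```python
import math
from typing import List, Sequence, Tuple


def pairwise_ranking_loss(score_preferred: float, score_other: float) -> float:
    """Logistic ranking loss: -log(sigmoid(score_preferred - score_other)).

    Computed as a numerically stable softplus of the negated margin.
    """
    margin = score_preferred - score_other
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def make_pairs(gains: Sequence[float], min_gap: float = 0.0) -> List[Tuple[int, int]]:
    """Index pairs (i, j) where prefix i has a higher measured gain than prefix j.

    `min_gap` is a hypothetical knob for dropping near-ties in noisy gain estimates.
    """
    n = len(gains)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if gains[i] - gains[j] > min_gap
    ]


# Made-up gains for three prefixes of the same problem.
pairs = make_pairs([0.5, 0.1, 0.3])
print(pairs)
print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))
```

The loss is small when the model scores the higher-gain prefix above the lower-gain one and grows roughly linearly when the order is inverted, so only the relative order of gains, not their absolute values, drives training.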
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that process reward models relying on local step correctness are indirect proxies for the desired outcome of increased solve probability. It defines prefix gain as the solve-rate delta induced by conditioning a group of lightweight student models on a given prefix, trains a Prefix Utility Model (PUM) via pairwise ranking on these gains, and reports that the resulting PUM supplies effective prefix-level supervision for Best-of-N, beam search, and RL on mathematical reasoning tasks, with larger gains under big candidate pools, higher search budgets, or sparse rule-based rewards. All data, models, and code are released.
Significance. If the student-model proxy is shown to correlate with target-LLM utility, the work supplies an outcome-grounded alternative to correctness-based process rewards and could improve search and RL in sparse-reward reasoning settings. The public release of artifacts is a clear strength that supports reproducibility.
major comments (2)
- [Abstract / Method description] The central claim that PUM supplies useful supervision for the target LLM rests on the untested assumption that prefix gain measured on the student-model group correlates with the change in solve probability the same prefix would induce in the target model. No section reports a direct validation (rank correlation, calibration plot, or matched-prefix experiment) of student gain versus target gain.
- [Experiments] Downstream improvements in Best-of-N, beam search, and RL are presented as evidence for PUM utility, yet these results are consistent with but do not isolate the proxy assumption; without an ablation that replaces the student-derived labels with target-derived labels or measures proxy fidelity, it remains unclear whether the reported gains stem from the gain-based formulation or from other factors.
minor comments (1)
- [Method] Notation for the student-model group and the exact definition of solve-rate delta should be formalized with an equation early in the method section to avoid ambiguity when the same prefix is scored by PUM versus the original student ensemble.
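One possible shape for the formalization requested above is sketched here. Every symbol is an assumption introduced for illustration: $\mathcal{S}$ is the student-model group, $K$ the number of rollouts per student, $x$ the problem, $p$ the prefix, $\varnothing$ the empty prefix, $y^\star$ the reference answer, and $\hat{y}^{(k)}_m$ the final answer of the $k$-th rollout of student $m$. The unweighted average over students is also an assumption.

```latex
\[
\mathrm{SR}_m(x \mid p) = \frac{1}{K} \sum_{k=1}^{K}
  \mathbb{1}\!\left[\hat{y}^{(k)}_m(x, p) = y^\star\right],
\qquad
G(p; x) = \frac{1}{|\mathcal{S}|} \sum_{m \in \mathcal{S}}
  \Bigl( \mathrm{SR}_m(x \mid p) - \mathrm{SR}_m(x \mid \varnothing) \Bigr).
\]
```

Under this notation, $G(p; x)$ is the label measured on the student ensemble, while the PUM output is a learned score trained to preserve the ordering of $G$ across prefixes. Giving the two quantities distinct symbols is what removes the ambiguity the comment points to.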
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the proxy assumption and experimental isolation. We respond to each major comment below.
- Referee: [Abstract / Method description] The central claim that PUM supplies useful supervision for the target LLM rests on the untested assumption that prefix gain measured on the student-model group correlates with the change in solve probability the same prefix would induce in the target model. No section reports a direct validation (rank correlation, calibration plot, or matched-prefix experiment) of student gain versus target gain.
  Authors: We acknowledge that the manuscript does not include a direct validation (such as rank correlation or matched-prefix experiments) of how well student-model prefix gains predict target-model gains. The student-model group was selected specifically to make large-scale prefix-gain computation tractable; repeating the same measurements on the target model would be substantially more expensive. Downstream gains on the target LLM offer indirect support for transfer, yet we agree these do not constitute a direct test of the proxy. In revision we will add an explicit limitations paragraph discussing the untested correlation and will report a small-scale correlation study on a held-out subset of prefixes if compute permits.
  Revision: partial
- Referee: [Experiments] Downstream improvements in Best-of-N, beam search, and RL are presented as evidence for PUM utility, yet these results are consistent with but do not isolate the proxy assumption; without an ablation that replaces the student-derived labels with target-derived labels or measures proxy fidelity, it remains unclear whether the reported gains stem from the gain-based formulation or from other factors.
  Authors: The referee is correct that the reported improvements do not isolate the student-proxy contribution from other design choices. An ablation that substitutes target-derived labels would directly address this but is currently infeasible at the scale of our experiments due to the computational cost of evaluating thousands of prefixes on the target model. The existing results show consistent gains across search and RL settings, especially under large candidate pools and sparse rewards, but we will revise the text to state the claims more narrowly and to flag the missing proxy-fidelity ablation as an important direction for follow-up work.
  Revision: partial
Circularity Check
No circularity: prefix gain defined via independent student-model proxy
The paper defines prefix gain explicitly as the solve-rate improvement measured on a separate lightweight student model group, then trains PUM via pairwise ranking on those externally computed deltas. This construction uses an independent measurement process rather than fitting or deriving the quantity from the target LLM's own outputs or trajectories. No equations, self-citations, or uniqueness claims in the provided text reduce the central claim to a definitional loop or to a fitted input renamed as a prediction. Downstream evaluations on Best-of-N, beam search, and RL for the target model supply external, though indirect, evidence for the proxy assumption rather than a self-referential derivation. The method is therefore not circular, although the proxy itself remains to be validated directly.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Solve-rate improvement induced by conditioning lightweight student models on a prefix is a reliable measure of that prefix's utility for successful completion.
invented entities (1)
- Prefix Utility Model (PUM): no independent evidence