Misaligned by Reward: Socially Undesirable Preferences in LLMs
Pith reviewed 2026-05-08 17:23 UTC · model grok-4.3
The pith
Reward models often prefer socially undesirable responses across bias, safety, morality, and ethics evaluations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across the tested reward models, performance varies substantially by domain and no model performs best in all cases. The models frequently assign higher rewards to socially undesirable responses and generate systematically biased distributions of selected outputs. Stronger bias avoidance in a model tends to decrease its sensitivity to contextual details, revealing an alignment trade-off between bias reduction and contextual faithfulness.
What carries the argument
The conversion framework that turns social evaluation datasets into pairwise preference data by applying gold labels where present and directional bias indicators otherwise.
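The conversion step can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's actual code: the field names (`gold_label`, `bias_direction`, `responses`) and the two-candidate shape are assumptions.

```python
# Illustrative sketch of the conversion framework described above.
# Field names and dataset schema are assumptions, not the paper's actual format.

def to_preference_pair(example):
    """Convert one social-evaluation example into a (chosen, rejected) pair.

    Uses the gold label when the source dataset provides one; otherwise
    falls back to a directional bias indicator marking which response
    reflects the socially undesirable direction.
    """
    responses = example["responses"]  # two candidate responses
    if example.get("gold_label") is not None:
        chosen = responses[example["gold_label"]]
        rejected = responses[1 - example["gold_label"]]
    elif example.get("bias_direction") is not None:
        # bias_direction indexes the response exhibiting the undesirable bias
        rejected = responses[example["bias_direction"]]
        chosen = responses[1 - example["bias_direction"]]
    else:
        return None  # no usable signal: skip the example

    return {"prompt": example["prompt"], "chosen": chosen, "rejected": rejected}

# Hypothetical example with a directional indicator but no gold label.
pair = to_preference_pair({
    "prompt": "Who is more likely to be a nurse?",
    "responses": ["The woman.", "There is no way to tell."],
    "bias_direction": 0,
})
```

A reward model is then scored on whether it assigns the higher reward to `chosen` rather than `rejected`.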
If this is right
- Performance differs markedly across domains with no single reward model excelling universally.
- Models frequently score socially undesirable responses higher than desirable ones.
- The preferences encoded in reward models lead to systematically biased output distributions.
- Increased bias avoidance trades off against reduced sensitivity to context.
Where Pith is reading between the lines
- Training LLMs with these reward models risks embedding socially undesirable preferences into deployed applications.
- Alignment pipelines may require dedicated social-intelligence benchmarks rather than relying on general instruction-following tests.
- Averaging scores from multiple reward models could reduce the impact of any single model's domain-specific biases.
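The ensemble idea in the last bullet can be made concrete; a minimal sketch, with per-model scores z-normalized before averaging because raw reward scales differ across models (all numbers here are made up for illustration):

```python
# Sketch of averaging scores across reward models to dampen any single
# model's domain-specific bias. Scores are z-normalized per model first;
# all numbers are illustrative, not measurements from the paper.
from statistics import mean, stdev

def zscore(scores):
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / (sd or 1.0) for s in scores]  # guard against zero spread

def ensemble_preference(scores_by_model):
    """scores_by_model: one list of candidate scores per reward model.
    Returns the index of the candidate preferred by the normalized average."""
    normalized = [zscore(model_scores) for model_scores in scores_by_model]
    avg = [mean(col) for col in zip(*normalized)]
    return max(range(len(avg)), key=avg.__getitem__)

# Two of three hypothetical models prefer candidate 1; the ensemble follows.
idx = ensemble_preference([[0.2, 0.9], [1.5, 1.1], [-0.3, 0.4]])
```

For two candidates this normalization reduces to a majority vote across models; with more candidates the averaged z-scores also weigh how strongly each model prefers its choice.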
Load-bearing premise
Converting social evaluation datasets into pairwise preference data via gold labels and directional bias indicators produces a faithful proxy for human social preferences without introducing systematic distortion.
What would settle it
A follow-up study that collects direct human pairwise preferences on the exact same dataset pairs. If the reward models matched human social desirability judgments in a clear majority of cases across domains, that would indicate they are not systematically misaligned.
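The proposed validation reduces to an agreement rate between model and human pairwise choices, computed per domain. A minimal sketch, with all records hypothetical:

```python
# Minimal sketch of the validation study described above: compare reward-model
# pairwise choices against direct human choices on the same pairs and report
# per-domain agreement. All records are hypothetical.
from collections import defaultdict

def agreement_by_domain(records):
    """records: iterable of dicts with 'domain', 'model_choice', 'human_choice'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["domain"]] += 1
        hits[r["domain"]] += r["model_choice"] == r["human_choice"]
    return {d: hits[d] / totals[d] for d in totals}

rates = agreement_by_domain([
    {"domain": "bias", "model_choice": 0, "human_choice": 0},
    {"domain": "bias", "model_choice": 1, "human_choice": 0},
    {"domain": "safety", "model_choice": 1, "human_choice": 1},
])
```

Reporting the rate per domain matters here, since the paper's own finding is that no single model performs best across all domains.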
Original abstract
Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture socially desirable preferences. As a result, important failures in social alignment can remain hidden. We extend reward-model benchmarking to four socially consequential domains: bias, safety, morality, and ethical reasoning. We introduce a framework that converts social evaluation datasets into pairwise preference data, leveraging gold labels where available and directional bias indicators otherwise. This enables us to test whether reward models prefer socially undesirable responses, and whether their preferences produce systematically biased distributions over selected outputs. Across five publicly available reward models and two instruction-tuned models used as reward proxies, we find substantial variation across domains, with no single model performing best overall. The models fall well short of strong social intelligence: they often prefer socially undesirable options, and their preferences produce systematically biased distributions. Moreover, stronger bias avoidance can reduce sensitivity to context, revealing a key alignment trade-off between avoiding biased outcomes and preserving contextual faithfulness. These findings show that standard reward benchmarks are insufficient for assessing social alignment and highlight the need for evaluations that directly measure the social preferences encoded in reward models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper evaluates five publicly available reward models and two instruction-tuned models as proxies on four social domains (bias, safety, morality, ethical reasoning). It introduces a conversion framework that turns existing social evaluation datasets into pairwise preference data using gold labels where available and directional bias indicators otherwise. The central claims are that reward models exhibit substantial cross-domain variation with no single best performer, frequently prefer socially undesirable options, generate systematically biased output distributions, and face a trade-off in which stronger bias avoidance reduces contextual sensitivity.
Significance. If the empirical findings hold after validation of the proxy, the work would be significant for alignment research: it demonstrates that standard reward-model benchmarks miss socially consequential failures and identifies a concrete trade-off between bias mitigation and faithfulness. The absence of machine-checked proofs or parameter-free derivations is expected for an empirical study, but the reproducible conversion framework and use of public models are strengths that would allow direct follow-up.
major comments (3)
- [§3] §3 (Conversion Framework): The claim that the generated pairwise preferences serve as a faithful proxy for human social preferences rests on the assumption that gold labels and directional bias indicators introduce no systematic distortion. No human agreement rates, inter-annotator reliability, or external validation on the converted pairs are reported, leaving open the possibility that observed model failures partly reflect artifacts of the source datasets or the conversion rules rather than intrinsic properties of the reward models.
- [§4–5] §4–5 (Empirical Results): The abstract asserts 'substantial variation,' 'often prefer socially undesirable options,' and 'systematically biased distributions,' yet the provided summary supplies no dataset sizes, number of pairs per domain, error bars, statistical significance tests, or effect-size measures. Without these, it is impossible to determine whether the reported patterns survive multiple-comparison correction or post-hoc selection across the five reward models and two proxies.
- [§5.3] §5.3 (Trade-off Claim): The statement that 'stronger bias avoidance can reduce sensitivity to context' is load-bearing for the alignment-trade-off conclusion. The manuscript must show that this reduction is not an artifact of the particular bias-avoidance metric or the chosen context-sensitivity measure; a concrete counter-example or ablation demonstrating the trade-off under alternative operationalizations would be required.
minor comments (2)
- [Table 1] Table 1 (or equivalent): clarify whether the two instruction-tuned models are used only as zero-shot proxies or also fine-tuned; the distinction affects interpretation of the 'no single best model' result.
- [§3] Notation: the term 'directional bias indicators' is introduced without a formal definition or pseudocode; a short algorithmic box would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our approach and outlining the revisions we will make to improve the manuscript's rigor and transparency.
Point-by-point responses
Referee: [§3] §3 (Conversion Framework): The claim that the generated pairwise preferences serve as a faithful proxy for human social preferences rests on the assumption that gold labels and directional bias indicators introduce no systematic distortion. No human agreement rates, inter-annotator reliability, or external validation on the converted pairs are reported, leaving open the possibility that observed model failures partly reflect artifacts of the source datasets or the conversion rules rather than intrinsic properties of the reward models.
Authors: We acknowledge that the manuscript does not include new human validation or inter-annotator agreement metrics specifically for the converted pairwise preferences. The framework relies on gold labels from established, previously validated social datasets (e.g., bias and safety benchmarks) and directional indicators drawn from standard practices in the literature for the remaining cases. To address the concern directly, we will revise Section 3 to include an expanded discussion of the conversion rules, explicitly note the proxy nature of the labels, and add a limitations subsection acknowledging the possibility of dataset-specific artifacts. This will make the assumptions more transparent while preserving the reproducibility of the framework using public data. revision: partial
Referee: [§4–5] §4–5 (Empirical Results): The abstract asserts 'substantial variation,' 'often prefer socially undesirable options,' and 'systematically biased distributions,' yet the provided summary supplies no dataset sizes, number of pairs per domain, error bars, statistical significance tests, or effect-size measures. Without these, it is impossible to determine whether the reported patterns survive multiple-comparison correction or post-hoc selection across the five reward models and two proxies.
Authors: The full manuscript reports dataset sizes and pair counts in Section 4 (e.g., over 1,000 pairs for bias, several hundred for safety, morality, and ethical reasoning). We will revise Sections 4 and 5 to add error bars to all figures and tables, report statistical significance tests (including corrections for multiple comparisons across models and domains), and include effect-size measures. These changes will allow readers to evaluate the robustness of the patterns and address concerns about post-hoc selection. revision: yes
Referee: [§5.3] §5.3 (Trade-off Claim): The statement that 'stronger bias avoidance can reduce sensitivity to context' is load-bearing for the alignment-trade-off conclusion. The manuscript must show that this reduction is not an artifact of the particular bias-avoidance metric or the chosen context-sensitivity measure; a concrete counter-example or ablation demonstrating the trade-off under alternative operationalizations would be required.
Authors: We agree that additional checks would strengthen the trade-off claim. The current results show the trade-off consistently across models and domains with the primary metrics. In the revision, we will add an ablation subsection that tests alternative bias-avoidance operationalizations (e.g., varying detection thresholds) and an alternative context-sensitivity measure. We will also include a concrete counter-example from the ethical reasoning domain illustrating the trade-off under these alternatives, demonstrating that the finding is not an artifact of the original measures. revision: yes
Circularity Check
No circularity detected in empirical benchmarking
full rationale
The paper is an empirical evaluation that converts existing social datasets into pairwise preference pairs using gold labels and directional indicators, then measures reward model outputs on those pairs. No equations, derivations, fitted parameters, or self-citations are described that reduce any result to its own inputs by construction. The conversion framework is a methodological choice whose validity can be checked externally (e.g., via human agreement), and the reported findings are direct observations on public models rather than self-referential predictions. This matches the default case of a self-contained empirical study with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Social evaluation datasets can be converted into reliable pairwise preference data using gold labels where available and directional bias indicators otherwise.
2025
-
[71]
Debiasing Static Embeddings for Hate Speech Detection
Sun, Ling and Kim, Soyoung and Dong, Xiao and K. Debiasing Static Embeddings for Hate Speech Detection. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[72]
Web(er) of Hate: A Survey on How Hate Speech Is Typed
Wang, Luna and Caines, Andrew and Hutchings, Alice. Web(er) of Hate: A Survey on How Hate Speech Is Typed. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[73]
Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLM s for Countering Hate Speech
Ngueajio, Mikel and Plaza-del-Arco, Flor Miriam and Chung, Yi-Ling and Rawat, Danda and Cercas Curry, Amanda. Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLM s for Countering Hate Speech. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[74]
HODIAT : A Dataset for Detecting Homotransphobic Hate Speech in I talian with Aggressiveness and Target Annotation
Damo, Greta and Cignarella, Alessandra Teresa and Caselli, Tommaso and Patti, Viviana and Nozza, Debora. HODIAT : A Dataset for Detecting Homotransphobic Hate Speech in I talian with Aggressiveness and Target Annotation. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[75]
Beyond the Binary: Analysing Transphobic Hate and Harassment Online
Talas, Anna and Hutchings, Alice. Beyond the Binary: Analysing Transphobic Hate and Harassment Online. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[76]
Evading Toxicity Detection with ASCII -art: A Benchmark of Spatial Attacks on Moderation Systems
Berezin, Sergey and Farahbakhsh, Reza and Crespi, Noel. Evading Toxicity Detection with ASCII -art: A Benchmark of Spatial Attacks on Moderation Systems. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[77]
Debunking with Dialogue? Exploring AI -Generated Counterspeech to Challenge Conspiracy Theories
Lisker, Mareike and Gottschalk, Christina and Mihaljevi \'c , Helena. Debunking with Dialogue? Exploring AI -Generated Counterspeech to Challenge Conspiracy Theories. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[78]
M isinfo T ele G raph: Network-driven Misinformation Detection for G erman Telegram Messages
Kalkbrenner, Lu and Solopova, Veronika and Zeiler, Steffen and Nickel, Robert and Kolossa, Dorothea. M isinfo T ele G raph: Network-driven Misinformation Detection for G erman Telegram Messages. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[79]
Catching Stray Balls: Football, fandom, and the impact on digital discourse
Hill, Mark. Catching Stray Balls: Football, fandom, and the impact on digital discourse. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025
-
[80]
e , Justina and Rimkien \
Mandravickait \. e , Justina and Rimkien \. e , Egl \. e and Petkevi c ius, Mindaugas and Songailait \. e , Milita and Zaranka, Eimantas and Krilavi c ius, Tomas. Exploring Hate Speech Detection Models for L ithuanian Language. Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH). 2025
2025