ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.
arXiv:2410.21272 (2024)
7 Pith papers cite this work.
Citation summary
Years: 2026 (7)
Roles: background (2)
Polarities: background (2)
citing papers explorer
- ToxiREX: A Dataset on Toxic REasoning in ConteXt
  ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.
- Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer
  Transformer represents but does not causally transmit staged algorithmic intermediates for base-digit extraction, diverging from probe predictions.
- Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering
  A prompting method that forces GPAI models to state SE best practices before deciding reduces prompt-induced cognitive biases by 51% on average across eight tested biases.
- Generalization in LLM Problem Solving: The Case of the Shortest Path
  LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail length scaling due to recursive instability, with data coverage setting hard limits.
- Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics
  Case study applies SAE probing with enstrophy triage to a continuum-dynamics foundation model and reports intermittent feature consistency that does not align with standard physics while linking some output discrepancies to specific feature changes.
- Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning