arXiv:2410.21272 (2024)

Yaniv Nikankin, Anja Reusch, Aaron Mueller, Yonatan Belinkov · 2024 · arXiv 2410.21272

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.

Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

A Transformer represents staged algorithmic intermediates for base-digit extraction but does not causally transmit them, diverging from probe predictions.

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

cs.LG · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

MechaRule localizes sparse agonist neurons via contrastive hierarchical ablation and adaptive group testing to ground rule extraction, recalling 97% of high-effect activations at 2.14% cost while enabling near-total elimination of target behaviors.

Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

cs.SE · 2026-04-18 · unverdicted · novelty 6.0

A prompting method that forces GPAI models to state SE best practices before deciding reduces prompt-induced cognitive biases by 51% on average across eight tested biases.

Generalization in LLM Problem Solving: The Case of the Shortest Path

cs.AI · 2026-04-16 · unverdicted · novelty 6.0

LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail length scaling due to recursive instability, with data coverage setting hard limits.

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

cs.LG · 2026-06-10 · unverdicted · novelty 5.0

Case study applies SAE probing with enstrophy triage to a continuum-dynamics foundation model and reports intermittent feature consistency that does not align with standard physics while linking some output discrepancies to specific feature changes.

The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

cs.CL · 2026-03-30

