arXiv:2410.21272 (2024)

Nikankin, Yaniv; Reusch, Anja; Mueller, Aaron; Belinkov, Yonatan · 2024 · arXiv 2410.21272

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.

Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Transformer represents but does not causally transmit staged algorithmic intermediates for base-digit extraction, diverging from probe predictions.

Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

cs.SE · 2026-04-18 · unverdicted · novelty 6.0

A prompting method that forces GPAI models to state SE best practices before deciding reduces prompt-induced cognitive biases by 51% on average across eight tested biases.

Generalization in LLM Problem Solving: The Case of the Shortest Path

cs.AI · 2026-04-16 · unverdicted · novelty 6.0

LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail length scaling due to recursive instability, with data coverage setting hard limits.

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

cs.LG · 2026-06-10 · unverdicted · novelty 5.0

Case study applies SAE probing with enstrophy triage to a continuum-dynamics foundation model and reports intermittent feature consistency that does not align with standard physics while linking some output discrepancies to specific feature changes.

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

cs.LG · 2026-05-04

The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

cs.CL · 2026-03-30

citing papers explorer

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · none · ref 28

ToxiREX is a new dataset of 128k Reddit comments in six languages with hierarchical annotations for implicit toxicity in conversational context based on an existing reasoning schema.

Represented Is Not Computed: A Causal Test of Candidate Algorithmic Intermediates in a Transformer

cs.LG · 2026-05-21 · unverdicted · none · ref 18

Transformer represents but does not causally transmit staged algorithmic intermediates for base-digit extraction, diverging from probe predictions.

Mitigating Prompt-Induced Cognitive Biases in General-Purpose AI for Software Engineering

cs.SE · 2026-04-18 · unverdicted · none · ref 37

A prompting method that forces GPAI models to state SE best practices before deciding reduces prompt-induced cognitive biases by 51% on average across eight tested biases.

Generalization in LLM Problem Solving: The Case of the Shortest Path

cs.AI · 2026-04-16 · unverdicted · none · ref 36

LLMs show strong spatial generalization to unseen maps in shortest-path tasks but fail length scaling due to recursive instability, with data coverage setting hard limits.

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

cs.LG · 2026-06-10 · unverdicted · none · ref 28

Case study applies SAE probing with enstrophy triage to a continuum-dynamics foundation model and reports intermittent feature consistency that does not align with standard physics while linking some output discrepancies to specific feature changes.

Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation

cs.LG · 2026-05-04 · unreviewed · ref 40

The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

cs.CL · 2026-03-30 · unreviewed · ref 12
