Presents a new expert-curated dataset of multi-turn counterspeech dialogues in five languages targeting hate against seven groups, with span annotations linking to verified external knowledge for RAG applications.
hub
A ttention is not E xplanation
16 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
polarities
background 3representative citing papers
GPT-2 small solves indirect object identification via a circuit of 26 attention heads organized into seven functional classes discovered through causal interventions.
An argument paper reframes LLM explainability as an embodied, situated practice based on Dourish and enactivist cognition, identifying ontological obstacles in internal explanations and advocating affordance-based designs.
A clustering-based pre-training step transfers semantic knowledge from language models into Tsetlin Machines, yielding competitive accuracy with BERT while preserving clause-level interpretability.
Evaluation of two latent reasoning models against controls shows observable latent patterns appear without the proposed mechanisms, have graded causal effects on behavior, and concentrate in structured low-rank directions, arguing that patterns are insufficient evidence for reasoning.
Behavior Forecasters trained on LRM trajectories outperform larger models in predicting repeatability and input sensitivity at low cost.
QCFuse achieves full-prefill quality in RAG with 1.7x average prefill speedup over full prefill and 1.5x over ProphetKV via compressed query-aware cache fusion.
SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.
MASPrism attributes failures in multi-agent systems by ranking candidates from prefill-stage NLL and attention signals of a 0.6B SLM, beating baselines by up to 33.41% Top-1 accuracy and proprietary LLMs by up to 89.5% relative improvement while processing traces in 2.66 seconds.
Multimodal contrastive learning using multilinear products is fragile to single bad modalities, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
RETRO matches GPT-3 and Jurassic-1 performance on the Pile benchmark using 25 times fewer parameters by conditioning on retrieved chunks from a 2-trillion-token database.
G-IdiomAlign is a gloss-pivoted benchmark with multiple-choice and generation protocols for evaluating cross-lingual idiom alignment in LLMs.
LLMs generate adequate counterspeech for co-occurring hate and misinformation in 40% of cases, with a mixed knowledge strategy from fact-checkers and NGOs proving most effective after expert revision.
ORBIT uses an intervention-consistent self-supervised objective in a transformer to infer asymmetric gene program influences from observational scRNA-seq data, recovering Alzheimer's vulnerability patterns and achieving 0.984 macro F1 cell-type classification from 220 pathway scores.
Profy uses take-level expert-amateur labels on 1083 piano recordings to produce time-aligned highlight scores that correlate with expert review points (r=0.61) on held-out amateur clips.
Matched learning-rate experiments show LoRA retains substantially higher zero-shot transfer (45% vs 11% on EuroSAT, 58% vs 9% on Pets) than Full FT in CLIP adaptation.
citing papers explorer
-
MASPrism: Lightweight Failure Attribution for Multi-Agent Systems Using Prefill-Stage Signals
MASPrism attributes failures in multi-agent systems by ranking candidates from prefill-stage NLL and attention signals of a 0.6B SLM, beating baselines by up to 33.41% Top-1 accuracy and proprietary LLMs by up to 89.5% relative improvement while processing traces in 2.66 seconds.