3 Pith papers cite this work.
citing papers explorer
- ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
  The ReFACT benchmark reveals a persistent salient-distractor failure mode in LLMs: 61% of incorrect error-span predictions are semantically unrelated to the actual errors. The failure persists across model sizes, and comparative judgment yields lower F1 than independent detection.
- AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling
  AROMA combines text, graph topology, and protein sequences with augmented reasoning and two-stage optimization to predict genetic perturbation effects in virtual cells more accurately and interpretably, outperforming baselines even in zero-shot and long-tail settings.
- ZoFia: Zero-Shot Fake News Detection with Entity-Guided Retrieval and Multi-LLM Interaction
  ZoFia is a zero-shot fake news detection framework that uses hierarchical entity-salience retrieval followed by multi-LLM adversarial debate to improve robustness over single-model approaches.