Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
Reasoning traces in large reasoning models expose safety failures missed by final-answer checks, and adaptive multi-principle steering reduces unsafe content in both traces and answers while preserving task performance.
citing papers explorer
-
Hallucination as an Anomaly: Dynamic Intervention via Probabilistic Circuits
Probabilistic circuits detect LLM hallucinations as residual-stream anomalies with up to 99% AUROC and enable dynamic correction that raises truthfulness scores while cutting unnecessary output corruption.
-
Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering
Reasoning traces in large reasoning models expose safety failures missed by final-answer checks, and adaptive multi-principle steering reduces unsafe content in both traces and answers while preserving task performance.