Treble Counterfactual VLMs: A Causal Approach to Hallucination
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- FORTIS: Benchmarking Over-Privilege in Agent Skills
  FORTIS benchmark shows over-privilege is the norm in LLM agent skill selection and execution, with models reaching for higher-privilege skills and tools than required across ten frontier models and three domains.
- Geometry over Density: Few-Shot Cross-Domain OOD Detection
  UFCOD extracts Path Energy and Dynamics Energy from diffusion trajectories to perform few-shot OOD detection across unrelated domains with one fixed model.