Dynamic Boundary Evaluation adaptively identifies each LLM's performance boundary on a shared difficulty scale using a calibrated item bank and a search algorithm.
arXiv preprint arXiv:2510.21910 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3roles
other 1polarities
unclear 1representative citing papers
SafeRedirect reduces average unsafe generation rates in frontier LLMs from 71.2% to 8.0% on Internal Safety Collapse tasks by redirecting task completion with failure permission and deterministic hard stops.
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.
citing papers explorer
-
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models
Dynamic Boundary Evaluation adaptively identifies each LLM's performance boundary on a shared difficulty scale using a calibrated item bank and a search algorithm.
-
SafeRedirect: Defeating Internal Safety Collapse via Task-Completion Redirection in Frontier LLMs
SafeRedirect reduces average unsafe generation rates in frontier LLMs from 71.2% to 8.0% on Internal Safety Collapse tasks by redirecting task completion with failure permission and deterministic hard stops.
-
Characterizing Model-Native Skills
Recovering an orthogonal basis from model activations yields a model-native skill characterization that improves reasoning Pass@1 by up to 41% via targeted data selection and supports inference steering, outperforming human-characterized alternatives.