Science , volume=
7 Pith papers cite this work.
citing papers explorer
- Interpretability Can Be Actionable
  Interpretability research should be judged by actionability (the degree to which its insights support concrete decisions and interventions) rather than explanatory power alone.
- Regulation Zero 2: A Flow-Centric Sequential Regulation Planning Framework to Counter Regulation Cascading in Pre-tactical Air Traffic Flow Management
  Regulation Zero 2 applies hierarchical MCTS with a local proposal engine and FPFS reward estimation to optimize sequences of flow regulations in ATFM, outperforming flight-centric baselines while limiting network impact.
- Towards an AI co-scientist
  A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.
- Towards Expert-Level Medical Question Answering with Large Language Models
  Med-PaLM 2 achieves 86.5% accuracy on MedQA and approaches or exceeds prior state-of-the-art on other medical QA benchmarks while receiving higher physician preference ratings than human answers on consumer questions.
- For How Long Should We Be Punching? Learning Action Duration in Fighting Games
  RL agents in fighting games learn to jointly predict actions and their durations, matching fixed frame-skip performance while favoring repeatable exploitative patterns against scripted bots.
- Computational Methods towards Ultrastable Glasses
  The paper reviews key computational methods for ultrastable glasses, discusses their efficiency and limitations, and compares the stability levels achieved.
- Lessons from the Trenches on Reproducible Evaluation of Language Models