Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
- Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility
  Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.
- Human Agency, Causality, and the Human Computer Interface in High-Stakes Artificial Intelligence
  The paper proposes a Causal-Agency Framework to restore human causal control at AI interfaces by combining causal models, uncertainty quantification, and human-centered evaluation.