Towards a Science of AI Evaluations

As of January 13 · 2026 · DOI 10.1038/s42256-025-00985-0

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.

Open Weight AI Models Require Proportional Evaluation Approaches

cs.CY · 2026-06-18 · unverdicted · novelty 5.0

Open-weight AI models mostly fail four proposed proportional evaluation criteria (PE1-4) designed to address risks from public weights that closed models do not face.

citing papers explorer

Showing 2 of 2 citing papers.

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior cs.LG · 2026-06-22 · unverdicted · none · ref 71
Open Weight AI Models Require Proportional Evaluation Approaches cs.CY · 2026-06-18 · unverdicted · none · ref 4
