A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, 1960
3 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- VISTAQA: Benchmarking Joint Visual Question Answering and Pixel-Level Evidence
  VISTAQA is a new benchmark for joint visual question answering correctness and pixel-level grounding, evaluated with the GROVE metric that uses per-sample geometric mean to require both dimensions to succeed.
- Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
  LFD discovers predictive text features via LLM contrastive proposals, cross-LLM Cohen's kappa screening, and residual held-out gain selection, matching baseline accuracy while achieving higher human agreement and lower label leakage on ten tasks.
- The CTI Echo Chamber: Fragmentation, Overlap, and Vendor Specificity in Twenty Years of Cyber Threat Reporting
  Large-scale LLM analysis of 16k CTI reports over 20 years shows a fragmented vendor ecosystem with low overlap and reporting biases.
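
For readers unfamiliar with the coefficient these papers cite, the following is a minimal sketch of Cohen's kappa for two raters on a nominal scale, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name `cohen_kappa` and the example labels are illustrative assumptions and are not taken from any of the citing papers.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' nominal labels over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies,
    # summed over labels (labels unused by rater B contribute zero).
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels from two raters on four items.
rater_a = ["y", "y", "n", "n"]
rater_b = ["y", "n", "n", "n"]
kappa = cohen_kappa(rater_a, rater_b)  # p_o = 0.75, p_e = 0.5
```

In this example the raters agree on three of four items, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate.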