BARISTA introduces a densely annotated egocentric coffee-preparation video dataset and multi-task benchmark that reveals performance variation across models on compositional visual tasks.
GPT-4 Technical Report
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
baseline 1
citation-polarity summary
roles
baseline 1
polarities
baseline 1
representative citing papers
BEAVER is the first practical deterministic verifier that maintains sound probability bounds on LLM safety properties using token tries and frontier data structures, finding 2-3x more violations than sampling at 1/10 the compute.
A lightweight max-pooling network with MLP detects LLM hallucinations competitively without semantic consistency computations by adaptively aggregating internal token features.
citing papers explorer
BEAVER: An Efficient Deterministic LLM Verifier