Kyle Mahowald, Anna A

Are emergent abilities in large language models just in-context learning? CoRR, abs/2309 · 2024 · arXiv 2309.01809

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

cs.CL · 2024-04-29 · unverdicted · novelty 7.0

Holmes is a probing benchmark compiling over 200 datasets from 270 studies to evaluate linguistic competence across syntax, morphology, semantics, reasoning, and discourse in more than 50 language models.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

citing papers explorer

Showing 2 of 2 citing papers.

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models cs.CL · 2024-04-29 · unverdicted · none · ref 6
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code cs.SE · 2024-03-12 · unverdicted · none · ref 187
