Evaluation metrics in the era of GPT-4: Reliably evaluating large language models on sequence to sequence tasks. arXiv preprint arXiv:2310.13800
3 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
- The Prompt Report: A Systematic Survey of Prompt Engineering Techniques
  This systematic survey organizes prompt engineering into a taxonomy of 58 LLM techniques and 40 others, supplies a shared vocabulary, and offers guidelines for state-of-the-art models.
- AInstein: Can LLMs Solve Research Problems From Parametric Memory Alone?
  LLMs generate valid solutions to over 70% of AI research problems from parametric memory alone but rediscover the exact published approach less than 19% of the time, with performance limited by cross-domain analogical transfer.
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
  A survey that organizes LLMs-as-judges research into functionality, methodology, applications, meta-evaluation, and limitations.