PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.
Which prompting technique should I use? An empirical investigation of prompting tech- niques for software engineering tasks,
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
An AI-native TDD framework operationalizes classical TDD principles as prompt-level and workflow-level governance mechanisms in a layered multi-agent architecture to improve stability and reproducibility of LLM code generation.
citing papers explorer
-
PEEM: Prompt Engineering Evaluation Metrics for Interpretable Joint Evaluation of Prompts and Responses
PEEM is a multi-criteria LLM-based evaluator for prompts and responses that aligns with standard accuracy while enabling zero-shot prompt optimization via feedback.
-
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
-
TDD Governance for Multi-Agent Code Generation via Prompt Engineering
An AI-native TDD framework operationalizes classical TDD principles as prompt-level and workflow-level governance mechanisms in a layered multi-agent architecture to improve stability and reproducibility of LLM code generation.