APPS benchmark shows models like GPT-Neo pass roughly 20% of test cases on introductory problems, indicating machine learning is beginning to learn basic coding.
arXiv preprint arXiv:1911.04942 , year=
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Post-generation grammar and schema filtering on top of confidence scoring raises syntactic validity and execution success for Text2Cypher but increases empty outputs and lowers coverage.
STEF is a schema-agnostic evaluation framework that scores SQL generation accuracy from natural language inputs using semantic feature alignment and a composite metric.
CHESS deploys four LLM agents to retrieve information, prune schemas, generate refined SQL candidates, and validate via unit tests, reporting up to 71.10% accuracy on BIRD with 83% fewer calls than leading proprietary baselines.
citing papers explorer
-
Measuring Coding Challenge Competence With APPS
APPS benchmark shows models like GPT-Neo pass roughly 20% of test cases on introductory problems, indicating machine learning is beginning to learn basic coding.
-
Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering
Post-generation grammar and schema filtering on top of confidence scoring raises syntactic validity and execution success for Text2Cypher but increases empty outputs and lowers coverage.
-
Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems
STEF is a schema-agnostic evaluation framework that scores SQL generation accuracy from natural language inputs using semantic feature alignment and a composite metric.
-
CHESS: Contextual Harnessing for Efficient SQL Synthesis
CHESS deploys four LLM agents to retrieve information, prune schemas, generate refined SQL candidates, and validate via unit tests, reporting up to 71.10% accuracy on BIRD with 83% fewer calls than leading proprietary baselines.