Introduces a unified framework integrating uncertainty estimation, calibration, and tool-based abstention for reliable code predictions in language models.
arXiv preprint arXiv:2402.02047 (2024)
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.SE 3
verdicts
UNVERDICTED 3
representative citing papers
A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.
Smaller LLMs produce functional but limited Python code with variable quantization effects and quality/maintainability concerns that require validation before use.
citing papers explorer
- When to Answer and When to Defer: A Decision Framework for Reliable Code Predictions
  Introduces a unified framework integrating uncertainty estimation, calibration, and tool-based abstention for reliable code predictions in language models.
- From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
  A semi-structured thematic synthesis identifies core challenges in FM selection, alignment, prompting, orchestration, testing, deployment, and cross-cutting concerns like observability for production-ready FMware.
- Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models
  Smaller LLMs produce functional but limited Python code with variable quantization effects and quality/maintainability concerns that require validation before use.