DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
Llms will always hallucinate, and we need to live with this,
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
MOSAIC generates executable scientific code without I/O test cases by combining student-teacher distillation with a consolidated context window to reduce hallucinations across subproblems.
Hallucinations are the space-optimal behavior for limited-capacity models performing membership testing on sparse facts, as shown by a rate-distortion theorem that equates optimal memory use to minimum KL divergence between fact and non-fact score distributions.
MOSAIC is a training-free multi-agent LLM framework with rationale, coding, reflection, and debugging agents plus a consolidated context window that outperforms prior methods on scientific coding benchmarks.
Hallucinations are inevitable on an infinite set of inputs but can be made statistically negligible with sufficient training data quality and quantity.
LLMs show accuracy drops of 0.3% to 5.9% on GSM8K math problems when culturally adapted to six countries while keeping math operations identical, with statistical significance confirmed by McNemar tests.
citing papers explorer
-
Estimating the Black-box LLM Uncertainty with Distribution-Aligned Adversarial Distillation
DisAAD trains a 1%-sized proxy model via adversarial distillation to quantify uncertainty in black-box LLMs by aligning with their output distributions.
-
No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows
MOSAIC generates executable scientific code without I/O test cases by combining student-teacher distillation with a consolidated context window to reduce hallucinations across subproblems.
-
Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing
Hallucinations are the space-optimal behavior for limited-capacity models performing membership testing on sparse facts, as shown by a rate-distortion theorem that equates optimal memory use to minimum KL divergence between fact and non-fact score distributions.
-
MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding
MOSAIC is a training-free multi-agent LLM framework with rationale, coding, reflection, and debugging agents plus a consolidated context window that outperforms prior methods on scientific coding benchmarks.
-
Hallucinations are inevitable but can be made statistically negligible
Hallucinations are inevitable on an infinite set of inputs but can be made statistically negligible with sufficient training data quality and quantity.
-
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
LLMs show accuracy drops of 0.3% to 5.9% on GSM8K math problems when culturally adapted to six countries while keeping math operations identical, with statistical significance confirmed by McNemar tests.