A new multi-domain benchmark shows scientifically fine-tuned LLMs have degraded factual reliability and are less confident yet more assertive than their base models.
Metaphysics and Epistemology, 3
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
Finetuning with Scientific Data Increases Hallucinations: A Multi-domain Factuality Evaluation of LLMs