Constrained Paraphrase Consistency for LLM Hallucination Detection
Pith reviewed 2026-06-27 19:55 UTC · model grok-4.3
The pith
Training hallucination detectors with paraphrase-consistency constraints is reported to outperform strong baselines (FactCG, MiniCheck, AlignScore) on standard factuality benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Consistency-Constrained Hallucination Detector (CCHD) formulates training as a constrained optimization problem. The standard cross-entropy loss on document-claim pairs is complemented by paraphrase-consistency constraints, which bound divergence across paraphrased views, and label-preservation constraints, which tie paraphrases to ground truth. The problem is solved by gradient descent-ascent over model parameters and per-view Lagrange multipliers, adding only scalar dual variables and no inference-time overhead.
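One way to write this problem down is sketched below. The paper's exact equations are not reproduced on this page, so every symbol here is an assumption introduced for illustration: \theta for model parameters, f_\theta for the detector's output distribution, x^{(k)} for the k-th paraphrased view of a document-claim pair x, D for an unspecified divergence, \epsilon and \delta for tolerances, and \lambda_k, \mu_k for the per-view multipliers.

```latex
% Constrained training problem (illustrative notation, not the paper's)
\begin{aligned}
\min_{\theta}\quad
  & \mathbb{E}_{(x,y)}\big[\ell_{\mathrm{CE}}(f_\theta(x),\, y)\big] \\
\text{s.t.}\quad
  & \mathbb{E}_{x}\big[D\big(f_\theta(x)\,\|\,f_\theta(x^{(k)})\big)\big] \le \epsilon,
  && k = 1,\dots,K \quad \text{(paraphrase consistency)} \\
  & \mathbb{E}_{(x,y)}\big[\ell_{\mathrm{CE}}(f_\theta(x^{(k)}),\, y)\big] \le \delta,
  && k = 1,\dots,K \quad \text{(label preservation)}
\end{aligned}

% Lagrangian and the saddle-point problem targeted by gradient descent-ascent
\mathcal{L}(\theta,\lambda,\mu)
  = \mathbb{E}\big[\ell_{\mathrm{CE}}(f_\theta(x), y)\big]
  + \sum_{k=1}^{K} \lambda_k \Big(\mathbb{E}\big[D\big(f_\theta(x)\,\|\,f_\theta(x^{(k)})\big)\big] - \epsilon\Big)
  + \sum_{k=1}^{K} \mu_k \Big(\mathbb{E}\big[\ell_{\mathrm{CE}}(f_\theta(x^{(k)}), y)\big] - \delta\Big),
\qquad
\min_{\theta}\ \max_{\lambda \ge 0,\ \mu \ge 0}\ \mathcal{L}(\theta,\lambda,\mu)
```

Under this reading, the multipliers are the only added training state (2K scalars), and they drop out at inference because the detector f_\theta is used unchanged.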
What carries the argument
Constrained optimization solved by gradient descent-ascent on model parameters and per-view Lagrange multipliers that enforce paraphrase-consistency and label-preservation constraints.
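The mechanics of gradient descent-ascent with a Lagrange multiplier can be seen on a toy problem. The sketch below is not the paper's training loop: it minimizes (x - 2)^2 subject to x <= 1, a hypothetical stand-in chosen because its solution is known (x = 1 with multiplier 2). The function name `gda` and the learning-rate defaults are assumptions made for illustration.

```python
def gda(steps=5000, lr_primal=0.01, lr_dual=0.01):
    """Gradient descent-ascent on L(x, lam) = (x - 2)^2 + lam * (x - 1).

    Toy problem (an illustrative assumption, not from the paper):
    minimize (x - 2)^2 subject to x <= 1.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Both gradients are taken at the current iterate (simultaneous update).
        grad_x = 2.0 * (x - 2.0) + lam   # dL/dx
        grad_lam = x - 1.0               # dL/dlam, i.e. the constraint slack
        x -= lr_primal * grad_x          # descent step on the primal variable
        # Ascent step on the dual variable, projected onto lam >= 0.
        lam = max(0.0, lam + lr_dual * grad_lam)
    return x, lam


x, lam = gda()
print(f"x = {x:.4f}, lam = {lam:.4f}")
```

With these step sizes the iterates settle near x = 1 and lam = 2. The multiplier stays at zero while the constraint is satisfied and only grows once x exceeds the bound, which is the behavior the per-view multipliers would need to show for the constraints to matter. Exposing `lr_primal` and `lr_dual` separately makes it easy to try different primal and dual step sizes.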
If this is right
- Detectors can be improved without synthesizing additional training data or new human annotations.
- Only a few scalar dual variables are added, preserving training and inference efficiency.
- The same constrained formulation applies to different backbone models such as DeBERTa and Flan-T5.
- The approach yields consistent gains over FactCG, MiniCheck, and AlignScore on standard factuality benchmarks.
Where Pith is reading between the lines
- The same constrained-optimization pattern could be tested on other tasks that rely on semantic equivalence, such as textual entailment or summarization evaluation.
- If the Lagrange-multiplier method stabilizes well, it might replace some forms of data-augmentation pipelines in detector training.
- Extending the per-view multipliers to additional consistency signals, such as entailment or contradiction relations, is a direct next experiment.
Load-bearing premise
Paraphrase-consistency constraints and label-preservation constraints can be effectively enforced via gradient descent-ascent on per-view Lagrange multipliers without introducing optimization instability or unintended bias in the learned detector.
What would settle it
A controlled replication on the same benchmarks in which the constrained model fails to outperform the unconstrained baseline or the listed strong baselines would falsify the superiority claim.
Original abstract
Large language models (LLMs) can generate factually inconsistent claims, motivating accurate and scalable hallucination detectors. Prior work largely enlarges training sets via synthesis or new annotations, introducing increasing cost and potential bias while underusing the consistency implied by semantically equivalent paraphrases. We propose Consistency-Constrained Hallucination Detector (CCHD), which formulates training as a constrained optimization problem. The standard cross-entropy on original document-claim pairs is complemented by (i) paraphrase-consistency constraints bounding divergence across paraphrased views, and (ii) label-preservation constraints tying paraphrases to ground truth. We solve the problem by gradient descent-ascent over model parameters and per-view Lagrange multipliers, adding only a few scalar dual variables and no inference-time overhead. With DeBERTa and Flan-T5 backbones, CCHD consistently outperforms strong baselines (FactCG, MiniCheck, and AlignScore) on standard factuality benchmarks, demonstrating its superiority on hallucination detection.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes the Consistency-Constrained Hallucination Detector (CCHD), which augments cross-entropy loss on document-claim pairs with two sets of constraints: paraphrase-consistency bounds on divergence across paraphrased views, and label-preservation constraints linking paraphrases to ground truth. The resulting problem is solved via gradient descent-ascent on model parameters plus per-view Lagrange multipliers. The paper reports that DeBERTa and Flan-T5 backbones trained this way outperform FactCG, MiniCheck, and AlignScore on standard factuality benchmarks.
Significance. If the constrained optimization is verifiably stable and the constraints are shown to be active, the approach would offer a low-overhead way to exploit paraphrase consistency for hallucination detection, avoiding the cost and bias of large-scale synthetic data augmentation while adding only scalar dual variables at training time.
major comments (2)
- [Abstract] Abstract: the central claim of consistent outperformance is attributed to the paraphrase-consistency and label-preservation constraints, yet the abstract supplies no constraint-violation statistics, dual-variable trajectories, or post-training divergence measurements. Without such diagnostics it is impossible to confirm that the reported gains arise from active constraint enforcement rather than from other training details or baseline differences.
- [Abstract] Abstract (method description): the use of gradient descent-ascent on per-view Lagrange multipliers for the min-max problem is presented without discussion of convergence behavior or safeguards against oscillation. Standard analyses of such saddle-point problems show that dual variables can remain near zero or fail to enforce bounds, which would render the superiority over FactCG, MiniCheck, and AlignScore unexplained by the proposed mechanism.
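The diagnostics requested in the first comment (constraint-violation statistics and post-training divergence measurements) could be computed along the following lines. This is a sketch under stated assumptions: the paper's divergence is not given on this page, so the Jeffreys (symmetric KL) divergence is used as a stand-in, and the function names, the `bound` parameter, and the example probabilities are all hypothetical.

```python
import math


def jeffreys_divergence(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) for discrete distributions.

    Uses the identity sum_i (p_i - q_i) * (log p_i - log q_i).
    eps guards against log(0); it is an implementation convenience.
    """
    return sum((pi - qi) * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def violation_stats(orig_probs, para_probs, bound):
    """Summarize how often paraphrase predictions drift beyond `bound`.

    orig_probs[i] and para_probs[i] are the detector's output distributions
    for example i and for its paraphrased view, respectively.
    """
    divs = [jeffreys_divergence(p, q) for p, q in zip(orig_probs, para_probs)]
    return {
        "mean_divergence": sum(divs) / len(divs),
        "max_divergence": max(divs),
        "violation_rate": sum(d > bound for d in divs) / len(divs),
    }


# Hypothetical predictions: the first pair agrees, the second flips.
stats = violation_stats(
    orig_probs=[[0.9, 0.1], [0.2, 0.8]],
    para_probs=[[0.9, 0.1], [0.8, 0.2]],
    bound=0.1,
)
print(stats)
```

Reporting these numbers for both the constrained model and an unconstrained baseline would show whether the constraints change the detector's behavior on paraphrases, or whether the baseline already satisfies them.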
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the abstract and method description. We will revise the manuscript to include the requested diagnostics and discussion to better substantiate the contribution of the constraints.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim of consistent outperformance is attributed to the paraphrase-consistency and label-preservation constraints, yet the abstract supplies no constraint-violation statistics, dual-variable trajectories, or post-training divergence measurements. Without such diagnostics it is impossible to confirm that the reported gains arise from active constraint enforcement rather than from other training details or baseline differences.
  Authors: We agree that providing these diagnostics would help confirm the mechanism. In the revised manuscript, we will add an analysis section with constraint violation statistics, dual variable trajectories, and divergence measurements across paraphrases, showing that the constraints are enforced during training. We will also update the abstract to reference these results.
  Revision: yes
- Referee: [Abstract] Abstract (method description): the use of gradient descent-ascent on per-view Lagrange multipliers for the min-max problem is presented without discussion of convergence behavior or safeguards against oscillation. Standard analyses of such saddle-point problems show that dual variables can remain near zero or fail to enforce bounds, which would render the superiority over FactCG, MiniCheck, and AlignScore unexplained by the proposed mechanism.
  Authors: We acknowledge the importance of addressing convergence. The revised version will include a discussion of the observed training dynamics, including plots of dual variable evolution and any stabilization techniques employed (e.g., separate learning rates for primal and dual variables). While a full theoretical analysis is beyond the current scope, the empirical evidence will demonstrate active constraint enforcement.
  Revision: yes
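The two failure modes raised in the second comment, multipliers that stay near zero and multipliers that oscillate, can be checked from a logged dual-variable trajectory. The helper below is a hypothetical sketch: the function name, the `tail` window, and the `active_tol` threshold are assumptions chosen for illustration, not quantities from the paper.

```python
def summarize_dual_trajectory(lams, tail=100, active_tol=1e-3):
    """Summarize one multiplier's logged values over training.

    lams: non-empty list of multiplier values, one per logged step.
    tail: number of final steps inspected for oscillation.
    active_tol: threshold above which the multiplier counts as active.
    """
    tail_vals = lams[-tail:]
    # Step-to-step changes over the tail of training.
    diffs = [b - a for a, b in zip(tail_vals, tail_vals[1:])]
    # A sign flip between consecutive changes indicates a direction reversal.
    sign_flips = sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)
    return {
        "ever_active": max(lams) > active_tol,
        "final": lams[-1],
        "tail_range": max(tail_vals) - min(tail_vals),
        "tail_sign_flips": sign_flips,
    }


# Two synthetic trajectories, made up for illustration.
inactive = summarize_dual_trajectory([0.0] * 200)
oscillating = summarize_dual_trajectory([1 + 0.5 * (-1) ** i for i in range(200)])
print(inactive)
print(oscillating)
```

A multiplier with `ever_active` false never enforced its constraint, so gains could not be attributed to it. A large `tail_range` together with many `tail_sign_flips` suggests the dual variable has not settled by the end of training.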
Circularity Check
No significant circularity; empirical claims rest on independent benchmark comparisons
Full rationale
The paper presents CCHD as a constrained optimization that augments standard cross-entropy loss with paraphrase-consistency and label-preservation constraints, solved via gradient descent-ascent on model parameters plus per-view Lagrange multipliers. All reported results are empirical outperformance numbers against external baselines (FactCG, MiniCheck, AlignScore) on standard factuality benchmarks. No derivation chain reduces a claimed prediction to its own fitted inputs by construction, no self-citation is load-bearing for the central claim, and no ansatz or uniqueness theorem is smuggled in. The method is self-contained against external benchmarks, yielding a normal non-finding.
Axiom & Free-Parameter Ledger
free parameters (1)
- per-view Lagrange multipliers
axioms (1)
- domain assumption: Semantically equivalent paraphrases of a claim must receive identical hallucination labels
Reference graph
Works this paper leans on
- [1] Constrained Paraphrase Consistency for LLM Hallucination Detection, arXiv, 2026. Introduction: Large language models (LLMs) generate fluent text for tasks such as abstractive summarization [1]. However, they are prone to factual hallucinations, where outputs appear grammatical but contain unsupported or incorrect information [2]. Prior work [3, 4] reports that up to 30% of generated summaries contain factual inconsistencies, motivat...
- [2] Method: Fig. 2 shows the key components of the CCHD framework: (i) a constrained training objective that enforces agreement across semantically equivalent paraphrase views (Sec. 2.1); and (ii) a Lagrangian optimization scheme to solve the constrained problem (Sec. 2.2). Finally, we introduce a practical instantiation of view generation via back-translation in Sec. ...
- [3] pred”, Eq. (2)) that penalizes divergence between output distributions, and an embedding-level constraint (“embd
  Experiments: 3.1. Experiment Settings. Model setup. We instantiate the detector with DeBERTa [18] (default) or Flan-T5 [19] without architectural changes, denoted with suffix -DBT and -FT5, respectively. A single pivot language (French by default) is used for back-translation to construct one paraphrase view per example. Analyses of backbone choice, pivot lang...
- [4] Conclusion: We presented the Consistency-Constrained Hallucination Detector, which enforces paraphrase consistency and label-preservation constraints. Instantiated via back-translation, CCHD delivers significant F1 gains across 11 factuality datasets without extra inference time. To further enhance the generalization and reliability of CCHD, we will explore...
- [5] Hassan Shakil, Ahmad Farooq, and Jugal Kalita, “Abstractive text summarization: State of the art, challenges, and improvements,” Neurocomputing, vol. 603, pp. 128255, 2024.
- [6] Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald, “On faithfulness and factuality in abstractive summarization,” arXiv preprint arXiv:2005.00661, 2020.
- [7] Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher, “Evaluating the factual consistency of abstractive text summarization,” arXiv preprint arXiv:1910.12840, 2019.
- [8] Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander Richard Fabbri, Caiming Xiong, Shafiq Joty, and Chien-Sheng Wu, “Summedits: Measuring llm ability at factual reasoning through the lens of summarization,” in Proceedings of the 2023 conference on empirical methods in natural language processing, 2023, pp. 9662–9676.
- [9] Potsawee Manakul, Adian Liusie, and Mark JF Gales, “Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models,” arXiv preprint arXiv:2303.08896, 2023.
- [10] Diyana Muhammed, Gollam Rabby, and Sören Auer, “Selfcheckagent: Zero-resource hallucination detection in generative large language models,” arXiv preprint arXiv:2502.01812, 2025.
- [11] Borui Yang, Md Afif Al Mamun, Jie M Zhang, and Gias Uddin, “Hallucination detection in large language models with metamorphic relations,” Proceedings of the ACM on Software Engineering, vol. 2, no. FSE, pp. 425–445, 2025.
- [12] Philippe Laban, Tobias Schnabel, Paul N Bennett, and Marti A Hearst, “Summac: Re-visiting nli-based models for inconsistency detection in summarization,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 163–177, 2022.
- [13] Liyan Tang, Philippe Laban, and Greg Durrett, “Minicheck: Efficient fact-checking of llms on grounding documents,” arXiv preprint arXiv:2404.10774, 2024.
- [14] Deren Lei, Yaxi Li, Siyao Li, Mengya Hu, Rui Xu, Ken Archer, Mingyu Wang, Emily Ching, and Alex Deng, “Factcg: Enhancing fact checkers with graph-based multi-hop data,” arXiv preprint arXiv:2501.17144, 2025.
- [15] Tianyi Lin, Chi Jin, and Michael Jordan, “On gradient descent ascent for nonconvex-concave minimax problems,” in International conference on machine learning. PMLR, 2020, pp. 6083–6093.
- [16] Chao Chen, Yifan Shen, Guixiang Ma, Xiangnan Kong, Srinivas Rangarajan, Xi Zhang, and Sihong Xie, “Self-learn to explain siamese networks robustly,” in 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 2021, pp. 1018–1023.
- [17] Yuheng Zha, Yichi Yang, Ruichen Li, and Zhiting Hu, “Alignscore: Evaluating factual consistency with a unified alignment function,” arXiv preprint arXiv:2305.16739, 2023.
- [18] Pierre-Michel Bousquet and Mickael Rouvier, “Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
- [19] Anmol Agarwal, Shrey Gupta, Vamshi Bonagiri, Manas Gaur, Joseph Reagle, and Ponnurangam Kumaraguru, “Towards effective paraphrasing for information disguise,” in European Conference on Information Retrieval. Springer, 2023, pp. 331–340.
- [20] Vikas Yadav, Zheng Tang, and Vijay Srinivasan, “Pag-llm: Paraphrase and aggregate with large language models for minimizing intent classification errors,” in Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, 2024, pp. 2569–2573.
- [21] Felix Stahlberg, “Neural machine translation: A review,” Journal of Artificial Intelligence Research, vol. 69, pp. 343–418, 2020.
- [22] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen, “Deberta: Decoding-enhanced bert with disentangled attention,” arXiv preprint arXiv:2006.03654, 2020.
- [23] Hyung Won Chung, Le Hou, Shayne Longpre, et al., “Scaling instruction-finetuned language models,” arXiv:2210.11416, 2022.
- [24] Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou, “Hallucination of multimodal large language models: A survey,” arXiv preprint arXiv:2404.18930, 2024.