
arxiv: 2606.08158 · v1 · pith:MKXOCB72new · submitted 2026-06-06 · 💻 cs.CL · cs.AI

Constrained Paraphrase Consistency for LLM Hallucination Detection

Pith reviewed 2026-06-27 19:55 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords hallucination detection · constrained optimization · paraphrase consistency · LLM factuality · Lagrange multipliers · factuality benchmarks

The pith

Training hallucination detectors with paraphrase-consistency constraints outperforms standard methods on factuality benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes training hallucination detectors by adding constraints that force the model to produce consistent outputs on semantically equivalent paraphrases of a claim. It frames this as a constrained optimization task that augments the usual cross-entropy loss with bounds on divergence between paraphrased views and ties those views to the original ground-truth labels. The constraints are enforced through gradient descent-ascent on the model weights plus a small number of per-view Lagrange multipliers. With DeBERTa and Flan-T5 backbones the resulting detector beats FactCG, MiniCheck, and AlignScore on standard factuality benchmarks while adding no inference cost. A reader would care because the approach improves detection without requiring new data synthesis or annotations.

Core claim

The Consistency-Constrained Hallucination Detector (CCHD) formulates training as a constrained optimization problem whose standard cross-entropy loss on document-claim pairs is complemented by paraphrase-consistency constraints that bound divergence across paraphrased views and label-preservation constraints that tie paraphrases to ground truth; the problem is solved by gradient descent-ascent over model parameters and per-view Lagrange multipliers, adding only scalar dual variables and no inference-time overhead.
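
One way to write down a problem of this shape is sketched here. The notation is assumed for illustration and is not taken from the paper: $\theta$ are the model parameters, $(d, c, y)$ is a document, claim, and label, $c^{(k)}$ is the $k$-th paraphrased view of the claim, $D$ is some divergence between predictive distributions, and $\epsilon_k$, $\delta_k$ are tolerance thresholds. The page does not state which divergence or thresholds the paper uses.

```latex
% Constrained training problem (assumed notation)
\begin{aligned}
\min_{\theta}\quad
  & \mathbb{E}_{(d,c,y)}\big[\ell_{\mathrm{CE}}\big(p_\theta(\cdot \mid d, c),\, y\big)\big] \\
\text{s.t.}\quad
  & \mathbb{E}_{(d,c,y)}\big[D\big(p_\theta(\cdot \mid d, c),\, p_\theta(\cdot \mid d, c^{(k)})\big)\big] \le \epsilon_k
  && \text{(paraphrase consistency)} \\
  & \mathbb{E}_{(d,c,y)}\big[\ell_{\mathrm{CE}}\big(p_\theta(\cdot \mid d, c^{(k)}),\, y\big)\big] \le \delta_k
  && \text{(label preservation)}, \qquad k = 1, \dots, K.
\end{aligned}

% Lagrangian with per-view multipliers \lambda_k, \mu_k \ge 0,
% where g_k(\theta) and h_k(\theta) denote the left-hand sides of the two constraints
\mathcal{L}(\theta, \lambda, \mu)
  = \mathbb{E}\big[\ell_{\mathrm{CE}}\big]
  + \sum_{k=1}^{K} \lambda_k \big(g_k(\theta) - \epsilon_k\big)
  + \sum_{k=1}^{K} \mu_k \big(h_k(\theta) - \delta_k\big)

% Gradient descent-ascent: descend in \theta, ascend in the multipliers, project onto \ge 0
\theta \leftarrow \theta - \eta_\theta \nabla_\theta \mathcal{L}, \qquad
\lambda_k \leftarrow \big[\lambda_k + \eta_\lambda \big(g_k(\theta) - \epsilon_k\big)\big]_+, \qquad
\mu_k \leftarrow \big[\mu_k + \eta_\mu \big(h_k(\theta) - \delta_k\big)\big]_+
```

Under this reading, the "few scalar dual variables" are the $\lambda_k$ and $\mu_k$: two scalars per paraphrase view, used only during training.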

What carries the argument

Constrained optimization solved by gradient descent-ascent on model parameters and per-view Lagrange multipliers that enforce paraphrase-consistency and label-preservation constraints.
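
The mechanics of gradient descent-ascent can be illustrated on a toy problem. The sketch below is not the paper's training loop: a single scalar `w` stands in for the model weights, one multiplier `lam` stands in for the per-view dual variables, and the objective, constraint, and step sizes are all chosen here for illustration.

```python
def gradient_descent_ascent(steps=2000, lr_primal=0.05, lr_dual=0.05):
    """Toy problem: minimise (w - 3)^2 subject to w - 1 <= 0."""
    w = 0.0    # primal variable (stand-in for model weights)
    lam = 0.0  # Lagrange multiplier (dual variable), kept non-negative
    for _ in range(steps):
        violation = w - 1.0             # g(w); positive means the constraint is violated
        grad_w = 2.0 * (w - 3.0) + lam  # d/dw of (w - 3)^2 + lam * (w - 1)
        w -= lr_primal * grad_w         # descent step on the primal variable
        # Projected ascent step on the dual variable: grows while violated,
        # shrinks toward zero while the constraint holds with slack.
        lam = max(0.0, lam + lr_dual * violation)
    return w, lam


w, lam = gradient_descent_ascent()
print(f"w = {w:.3f}, lambda = {lam:.3f}")
```

The unconstrained minimiser is `w = 3`, but the constraint caps `w` at 1, so the iterates settle near the KKT point `w = 1`, `lam = 4`. A multiplier that ends well above zero indicates a constraint that is actually binding, which is the kind of evidence the referee report asks for.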

If this is right

  • Detectors can be improved without synthesizing additional training data or new human annotations.
  • Only a few scalar dual variables are added, preserving training and inference efficiency.
  • The same constrained formulation applies to different backbone models such as DeBERTa and Flan-T5.
  • The approach yields consistent gains over FactCG, MiniCheck, and AlignScore on standard factuality benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constrained-optimization pattern could be tested on other tasks that rely on semantic equivalence, such as textual entailment or summarization evaluation.
  • If the Lagrange-multiplier method stabilizes well, it might replace some forms of data-augmentation pipelines in detector training.
  • Extending the per-view multipliers to additional consistency signals, such as entailment or contradiction relations, is a direct next experiment.

Load-bearing premise

Paraphrase-consistency constraints and label-preservation constraints can be effectively enforced via gradient descent-ascent on per-view Lagrange multipliers without introducing optimization instability or unintended bias in the learned detector.

What would settle it

A controlled replication on the same benchmarks in which the constrained model fails to outperform the unconstrained baseline or the listed strong baselines would falsify the superiority claim.

Original abstract

Large language models (LLMs) can generate factually inconsistent claims, motivating accurate and scalable hallucination detectors. Prior work largely enlarges training sets via synthesis or new annotations, introducing increasing cost and potential bias while underusing the consistency implied by semantically equivalent paraphrases. We propose Consistency-Constrained Hallucination Detector (CCHD), which formulates training as a constrained optimization problem. The standard cross-entropy on original document-claim pairs is complemented by (i) paraphrase-consistency constraints bounding divergence across paraphrased views, and (ii) label-preservation constraints tying paraphrases to ground truth. We solve the problem by gradient descent-ascent over model parameters and per-view Lagrange multipliers, adding only a few scalar dual variables and no inference-time overhead. With DeBERTa and Flan-T5 backbones, CCHD consistently outperforms strong baselines (FactCG, MiniCheck, and AlignScore) on standard factuality benchmarks, demonstrating its superiority on hallucination detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Consistency-Constrained Hallucination Detector (CCHD), which augments cross-entropy loss on document-claim pairs with two sets of constraints—paraphrase-consistency bounds on divergence across paraphrased views and label-preservation constraints linking paraphrases to ground truth—solved via gradient descent-ascent on model parameters plus per-view Lagrange multipliers. It reports that DeBERTa and Flan-T5 backbones trained this way outperform FactCG, MiniCheck, and AlignScore on standard factuality benchmarks.

Significance. If the constrained optimization is verifiably stable and the constraints are shown to be active, the approach would offer a low-overhead way to exploit paraphrase consistency for hallucination detection, avoiding the cost and bias of large-scale synthetic data augmentation while adding only scalar dual variables at training time.

major comments (2)
  1. [Abstract] Abstract: the central claim of consistent outperformance is attributed to the paraphrase-consistency and label-preservation constraints, yet the abstract supplies no constraint-violation statistics, dual-variable trajectories, or post-training divergence measurements. Without such diagnostics it is impossible to confirm that the reported gains arise from active constraint enforcement rather than from other training details or baseline differences.
  2. [Abstract] Abstract (method description): the use of gradient descent-ascent on per-view Lagrange multipliers for the min-max problem is presented without discussion of convergence behavior or safeguards against oscillation. Standard analyses of such saddle-point problems show that dual variables can remain near zero or fail to enforce bounds, which would render the superiority over FactCG, MiniCheck, and AlignScore unexplained by the proposed mechanism.
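
The diagnostics requested in the first comment could be computed along the following lines. This is a sketch under stated assumptions: the page does not say which divergence the paper bounds, so a symmetric KL (Jeffreys) divergence is used here as a stand-in, and the function names, the `bound` value, and the example probabilities are all hypothetical.

```python
import math


def jeffreys(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between discrete distributions."""
    return sum(
        (pi - qi) * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
    )


def violation_report(orig_probs, para_probs, bound):
    """Summarise how often original/paraphrase predictions diverge beyond `bound`."""
    divs = [jeffreys(p, q) for p, q in zip(orig_probs, para_probs)]
    mean_div = sum(divs) / len(divs)
    return {
        "mean_divergence": mean_div,
        "fraction_violating": sum(d > bound for d in divs) / len(divs),
        "mean_slack": bound - mean_div,  # negative means violated on average
    }


# Hypothetical detector outputs: [P(supported), P(hallucinated)] per example.
orig = [[0.9, 0.1], [0.2, 0.8]]
para = [[0.9, 0.1], [0.6, 0.4]]
report = violation_report(orig, para, bound=0.1)
print(report)
```

Reporting such a summary for the constrained and unconstrained models, before and after training, would show whether the consistency constraint is active or trivially satisfied.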

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract and method description. We will revise the manuscript to include the requested diagnostics and discussion to better substantiate the contribution of the constraints.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim of consistent outperformance is attributed to the paraphrase-consistency and label-preservation constraints, yet the abstract supplies no constraint-violation statistics, dual-variable trajectories, or post-training divergence measurements. Without such diagnostics it is impossible to confirm that the reported gains arise from active constraint enforcement rather than from other training details or baseline differences.

    Authors: We agree that providing these diagnostics would help confirm the mechanism. In the revised manuscript, we will add an analysis section with constraint violation statistics, dual variable trajectories, and divergence measurements across paraphrases, showing that the constraints are enforced during training. We will also update the abstract to reference these results. revision: yes

  2. Referee: [Abstract] Abstract (method description): the use of gradient descent-ascent on per-view Lagrange multipliers for the min-max problem is presented without discussion of convergence behavior or safeguards against oscillation. Standard analyses of such saddle-point problems show that dual variables can remain near zero or fail to enforce bounds, which would render the superiority over FactCG, MiniCheck, and AlignScore unexplained by the proposed mechanism.

    Authors: We acknowledge the importance of addressing convergence. The revised version will include a discussion of the observed training dynamics, including plots of dual variable evolution and any stabilization techniques employed (e.g., separate learning rates for primal and dual variables). While a full theoretical analysis is beyond the current scope, the empirical evidence will demonstrate active constraint enforcement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on independent benchmark comparisons

Full rationale

The paper presents CCHD as a constrained optimization that augments standard cross-entropy loss with paraphrase-consistency and label-preservation constraints, solved via gradient descent-ascent on model parameters plus per-view Lagrange multipliers. All reported results are empirical outperformance numbers against external baselines (FactCG, MiniCheck, AlignScore) on standard factuality benchmarks. No derivation chain reduces a claimed prediction to its own fitted inputs by construction, no self-citation is load-bearing for the central claim, and no ansatz or uniqueness theorem is smuggled in. The method is self-contained against external benchmarks, yielding a normal non-finding.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that paraphrases preserve ground-truth labels and that enforcing consistency across them improves detection performance.

free parameters (1)
  • per-view Lagrange multipliers
    Scalar dual variables introduced for each paraphrase view to enforce the consistency constraints during gradient descent-ascent.
axioms (1)
  • domain assumption: semantically equivalent paraphrases of a claim must receive identical hallucination labels
    Invoked to justify the label-preservation constraints that tie paraphrases to ground truth.
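
The label-preservation assumption can be made concrete with a sketch of view construction. The extracted paper text further down this page mentions back-translation through a single pivot language; the code below assumes that setup but is not the authors' pipeline. The function name, the dictionary fields, and the two translator callables are hypothetical, and the translators are stubs, not a real translation API.

```python
def build_paraphrase_views(examples, to_pivot, from_pivot):
    """Create one paraphrase view per example by back-translating the claim.

    `to_pivot` and `from_pivot` are caller-supplied callables (str -> str).
    The label is copied unchanged: this is exactly the domain assumption
    that a paraphrase keeps the ground-truth label of the original claim.
    """
    views = []
    for ex in examples:
        paraphrase = from_pivot(to_pivot(ex["claim"]))
        views.append(
            {"document": ex["document"], "claim": paraphrase, "label": ex["label"]}
        )
    return views


# Stub translators for demonstration only; a real setup would call an MT system.
def fake_to_pivot(text):
    return text.upper()


def fake_from_pivot(text):
    return text.lower()


examples = [{"document": "The meeting was on Monday.",
             "claim": "The meeting took place Monday.",
             "label": 1}]
views = build_paraphrase_views(examples, fake_to_pivot, fake_from_pivot)
print(views[0])
```

If back-translation alters a claim's meaning (for example by dropping a negation), the copied label becomes wrong, which is where this assumption could fail in practice.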



Reference graph

Works this paper leans on

24 extracted references · 11 canonical work pages · 5 internal anchors

  1. [1]

    Constrained Paraphrase Consistency for LLM Hallucination Detection

    INTRODUCTION Large language models (LLMs) generate fluent text for tasks such as abstractive summarization [1]. However, they are prone to factual hallucinations, where outputs appear grammatical but contain unsupported or incorrect information [2]. Prior work [3, 4] reports that up to 30% of generated summaries contain factual inconsistencies, motivat...

  2. [2]

    2 shows the key of CCHD framework: (i) a constrained training objective that enforces agreement across semantically equivalent paraphrase views (Sec

    METHOD Fig. 2 shows the key of CCHD framework: (i) a constrained training objective that enforces agreement across semantically equivalent paraphrase views (Sec. 2.1); and (ii) a Lagrangian optimization scheme to solve the constrained problem (Sec. 2.2). Finally, we introduce a practical instantiation of view generation via back-translation in Sec. ...

  3. [3]

    pred”, Eq. (2)) that penalizes divergence between output distributions, and an embedding-level constraint (“embd

    EXPERIMENTS 3.1. Experiment Settings Model setup. We instantiate the detector with DeBERTa [18] (default) or Flan-T5 [19] without architectural changes, denoted with suffix -DBT and -FT5, respectively. A single pivot language (French by default) is used for back-translation to construct one paraphrase view per example. Analyses of backbone choice, pivot lang...

  4. [4]

    Instantiated via back-translation, CCHD delivers significant F1 gains across 11 factuality datasets without extra inference time

    CONCLUSION We presented the Consistency-Constrained Hallucination Detector, which enforces paraphrase consistency and label-preservation constraints. Instantiated via back-translation, CCHD delivers significant F1 gains across 11 factuality datasets without extra inference time. To further enhance the generalization and reliability of CCHD, we will explore...

  5. [5]

    Abstractive text summarization: State of the art, challenges, and improvements,

    Hassan Shakil, Ahmad Farooq, and Jugal Kalita, “Abstractive text summarization: State of the art, challenges, and improvements,” Neurocomputing, vol. 603, pp. 128255, 2024

  6. [6]

    On faithfulness and factuality in abstractive summarization,

    Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald, “On faithfulness and factuality in abstractive summarization,” arXiv preprint arXiv:2005.00661, 2020

  7. [7]

    Evaluating the factual consistency of abstractive text summarization,

    Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher, “Evaluating the factual consistency of abstractive text summarization,” arXiv preprint arXiv:1910.12840, 2019

  8. [8]

    Summedits: Measuring llm ability at factual reasoning through the lens of summarization,

    Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander Richard Fabbri, Caiming Xiong, Shafiq Joty, and Chien-Sheng Wu, “Summedits: Measuring llm ability at factual reasoning through the lens of summarization,” in Proceedings of the 2023 conference on empirical methods in natural language processing, 2023, pp. 9662–9676

  9. [9]

    SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

    Potsawee Manakul, Adian Liusie, and Mark JF Gales, “Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models,” arXiv preprint arXiv:2303.08896, 2023

  10. [10]

    Selfcheckagent: Zero-resource hallucination detection in generative large language models,

    Diyana Muhammed, Gollam Rabby, and Sören Auer, “Selfcheckagent: Zero-resource hallucination detection in generative large language models,” arXiv preprint arXiv:2502.01812, 2025

  11. [11]

    Hallucination detection in large language models with metamorphic relations,

    Borui Yang, Md Afif Al Mamun, Jie M Zhang, and Gias Uddin, “Hallucination detection in large language models with metamorphic relations,” Proceedings of the ACM on Software Engineering, vol. 2, no. FSE, pp. 425–445, 2025

  12. [12]

    Summac: Re-visiting nli-based models for inconsistency detection in summarization,

    Philippe Laban, Tobias Schnabel, Paul N Bennett, and Marti A Hearst, “Summac: Re-visiting nli-based models for inconsistency detection in summarization,” Transactions of the Association for Computational Linguistics, vol. 10, pp. 163–177, 2022

  13. [13]

    Minicheck: Efficient fact-checking of llms on grounding documents,

    Liyan Tang, Philippe Laban, and Greg Durrett, “Minicheck: Efficient fact-checking of llms on grounding documents,” arXiv preprint arXiv:2404.10774, 2024

  14. [14]

    Factcg: Enhancing fact checkers with graph-based multi-hop data,

    Deren Lei, Yaxi Li, Siyao Li, Mengya Hu, Rui Xu, Ken Archer, Mingyu Wang, Emily Ching, and Alex Deng, “Factcg: Enhancing fact checkers with graph-based multi-hop data,” arXiv preprint arXiv:2501.17144, 2025

  15. [15]

    On gradient descent ascent for nonconvex-concave minimax problems,

    Tianyi Lin, Chi Jin, and Michael Jordan, “On gradient descent ascent for nonconvex-concave minimax problems,” in International conference on machine learning. PMLR, 2020, pp. 6083–6093

  16. [16]

    Self-learn to explain siamese networks robustly,

    Chao Chen, Yifan Shen, Guixiang Ma, Xiangnan Kong, Srinivas Rangarajan, Xi Zhang, and Sihong Xie, “Self-learn to explain siamese networks robustly,” in 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 2021, pp. 1018–1023

  17. [17]

    Alignscore: Evaluating factual consistency with a unified alignment function,

    Yuheng Zha, Yichi Yang, Ruichen Li, and Zhiting Hu, “Alignscore: Evaluating factual consistency with a unified alignment function,” arXiv preprint arXiv:2305.16739, 2023

  18. [18]

    Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition,

    Pierre-Michel Bousquet and Mickael Rouvier, “Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition,” in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5

  19. [19]

    Towards effective paraphrasing for information disguise,

    Anmol Agarwal, Shrey Gupta, Vamshi Bonagiri, Manas Gaur, Joseph Reagle, and Ponnurangam Kumaraguru, “Towards effective paraphrasing for information disguise,” in European Conference on Information Retrieval. Springer, 2023, pp. 331–340

  20. [20]

    Pag-llm: Paraphrase and aggregate with large language models for minimizing intent classification errors,

    Vikas Yadav, Zheng Tang, and Vijay Srinivasan, “Pag-llm: Paraphrase and aggregate with large language models for minimizing intent classification errors,” in Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, 2024, pp. 2569–2573

  21. [21]

    Neural machine translation: A review,

    Felix Stahlberg, “Neural machine translation: A review,” Journal of Artificial Intelligence Research, vol. 69, pp. 343–418, 2020

  22. [22]

    DeBERTa: Decoding-enhanced BERT with Disentangled Attention

    Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen, “Deberta: Decoding-enhanced bert with disentangled attention,” arXiv preprint arXiv:2006.03654, 2020

  23. [23]

    Scaling Instruction-Finetuned Language Models

    Hyung Won Chung, Le Hou, Shayne Longpre, et al., “Scaling instruction-finetuned language models,” arXiv:2210.11416, 2022

  24. [24]

    Hallucination of Multimodal Large Language Models: A Survey

    Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou, “Hallucination of multimodal large language models: A survey,” arXiv preprint arXiv:2404.18930, 2024