Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping

Hanze Li; Jinhao You; Kai Tang; Shuangyang Xie; Xiande Huang; Yichen Guo

arxiv: 2606.00819 · v1 · pith:MKROMYYLnew · submitted 2026-05-30 · 💻 cs.AI

Pith reviewed 2026-06-28 18:29 UTC · model grok-4.3

classification 💻 cs.AI

keywords hallucinations · large language models · decoder layers · layer skipping · gradient descent · driftance · decoding framework

The pith

Decoder layer skipping via gradient driftance reduces hallucinations in LLMs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims hallucinations in LLMs tend to originate in deeper decoder layers. It introduces DeLask, which treats the transformer forward pass as gradient descent steps and measures driftance as the cosine similarity between gradients from consecutive layers to flag reversals. Problematic layers then contribute only a partial aggregate of their hidden states rather than full output. Experiments across models and benchmarks show reduced hallucinations and improved reliability with this lightweight change to decoding. Readers would care because the method requires no retraining and applies at inference time to existing models.

Core claim

The forward computation of an L-layer Transformer is conditionally equivalent to L steps of gradient descent; driftance, defined as the cosine similarity between gradients from consecutive decoder steps, identifies layers where the descent direction reverses and which therefore tend to produce hallucinations. DeLask partially aggregates the hidden states of such layers with preceding ones instead of discarding them, thereby preserving consistency while suppressing erroneous signals.
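A minimal sketch of how a driftance score could be computed, under stated assumptions. The abstract does not say how the per-layer gradients are obtained, so this example takes the gradient-descent reading literally: if a layer applies h_l = h_{l-1} - eta * g_l with eta > 0, then the hidden-state update h_l - h_{l-1} points opposite to g_l, and the cosine similarity between consecutive updates equals the one between consecutive gradients. The function names `driftance` and `flag_reversals`, the zero threshold, and the use of hidden-state differences as a gradient proxy are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def driftance(hidden_states, eps=1e-12):
    """Cosine similarity between consecutive layer updates.

    hidden_states: array of shape (L + 1, d) holding one token's hidden
    state before the first layer and after each of the L layers.
    Returns an array of length L - 1; entry i compares the update of
    layer i + 1 with the update of layer i + 2 (1-indexed layers).
    """
    h = np.asarray(hidden_states, dtype=float)
    # Assumption: the layer update stands in for the (negated, scaled) gradient.
    updates = h[1:] - h[:-1]
    a, b = updates[:-1], updates[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def flag_reversals(drift, threshold=0.0):
    """Positions in `drift` whose similarity falls below the threshold."""
    return np.flatnonzero(np.asarray(drift) < threshold)


# Toy trajectory: two steps in the same direction, then a step straight back.
h = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
print(flag_reversals(driftance(h)))  # [1]
```

In the toy trajectory the third update reverses the second, so the second driftance entry is negative and is the only one flagged. A real threshold would have to come from the paper's method section.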

What carries the argument

Driftance value computed from cosine similarity of gradients derived from consecutive decoder steps, used to select layers for partial hidden-state aggregation.
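The partial-aggregation step can be pictured as a convex blend between a flagged layer's output and the hidden state that entered it. The sketch below is only one plausible reading: the weight `alpha`, its default value, and the function name `partially_aggregate` are hypothetical, since the abstract gives no aggregation weights.

```python
import numpy as np


def partially_aggregate(h_prev, h_curr, flagged, alpha=0.5):
    """Blend a flagged layer's output with the preceding hidden state.

    alpha = 1.0 keeps the layer's full output; alpha = 0.0 skips the layer
    entirely. Layers that are not flagged pass through unchanged.
    `alpha` is an assumed hyperparameter, not a value from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    h_prev = np.asarray(h_prev, dtype=float)
    h_curr = np.asarray(h_curr, dtype=float)
    if not flagged:
        return h_curr
    return alpha * h_curr + (1.0 - alpha) * h_prev


# A flagged layer contributes only a quarter of its update here.
print(partially_aggregate([0.0, 0.0], [2.0, 4.0], flagged=True, alpha=0.25))  # [0.5 1. ]
```

Keeping `alpha` strictly between 0 and 1 matches the summary's description of a layer that contributes a partial aggregate rather than being dropped outright.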

If this is right

Hallucinations are mitigated across diverse LLMs and benchmarks.
Overall output reliability is enhanced without model changes.
The method supplies a lightweight decoding framework applicable at inference.
The framework generalizes across different model scales and tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The partial-aggregation step could be tuned per layer depth for further gains on specific tasks.
DeLask might combine with retrieval or post-processing methods to address remaining error sources.
Measuring driftance could serve as a diagnostic tool for locating other failure modes beyond hallucinations.
The approach may extend to non-language transformer architectures that share the same layer-wise computation structure.

Load-bearing premise

The forward computation of a transformer decoder is conditionally equivalent to steps of gradient descent so that reversal of descent direction marks hallucination-prone layers.
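One way to write this premise down, offered as an assumed formalization rather than the paper's derivation. The objective \mathcal{L}, the step sizes \eta_\ell, and the zero threshold are not given in the abstract and are placeholders here.

```latex
% Assumed form of the premise: layer \ell acts as one descent step on an
% unspecified objective \mathcal{L}, with step size \eta_\ell > 0.
h^{(\ell)} = h^{(\ell-1)} - \eta_\ell \, g_\ell,
\qquad g_\ell = \nabla_h \mathcal{L}\bigl(h^{(\ell-1)}\bigr),
\qquad \ell = 1, \dots, L.

% Driftance between consecutive steps, as a cosine similarity.
\delta_\ell = \frac{\langle g_\ell,\, g_{\ell+1} \rangle}
                   {\lVert g_\ell \rVert \, \lVert g_{\ell+1} \rVert},
\qquad \ell = 1, \dots, L-1.

% Reversal criterion (the zero threshold is an assumption).
\delta_\ell < 0 \;\Longrightarrow\; \text{layer } \ell + 1 \text{ is flagged.}
```

Under this reading, a negative \delta_\ell means two consecutive layers push the hidden state in opposing directions. Whether that opposition tracks hallucination is exactly what the premise leaves to be shown.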

What would settle it

Applying DeLask to standard hallucination benchmarks and observing no reduction in hallucination rates relative to the unmodified baseline model would falsify the central effectiveness claim.
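A hedged sketch of how such a comparison could be scored. It assumes each benchmark item has already been labeled 1 (hallucinated) or 0 (not) for both the baseline and the modified decoder, and uses a paired percentile bootstrap on the per-item difference. The function name `paired_bootstrap_diff` and the labeling scheme are illustrative and do not come from the paper.

```python
import random


def paired_bootstrap_diff(baseline, treated, n_resamples=2000, seed=0):
    """Change in hallucination rate (treated minus baseline) with a 95% CI.

    baseline, treated: equal-length lists of 0/1 flags for the same items.
    Returns (observed_difference, (ci_low, ci_high)).
    """
    if len(baseline) != len(treated) or not baseline:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(baseline)
    diffs = [t - b for b, t in zip(baseline, treated)]
    observed = sum(diffs) / n
    means = []
    for _ in range(n_resamples):
        # Resample items with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    low = means[int(0.025 * (n_resamples - 1))]
    high = means[int(0.975 * (n_resamples - 1))]
    return observed, (low, high)
```

If the resulting interval contains zero or lies above it, the data would not support a reduction in hallucination rate on that benchmark. An interval entirely below zero would be consistent with the paper's effectiveness claim.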

Original abstract

Large Language Models (LLMs) have achieved strong performance across diverse natural language tasks, yet their outputs often suffer from hallucinations -- content that is misaligned with factual information. In this work, we conduct a comprehensive layer-wise analysis of the decoding process and reveal that hallucinations tend to originate from deeper decoder layers. To address this issue, we introduce DeLask (Decoder Layer Skipping), a novel decoding framework that dynamically skips layers prone to producing hallucinations. DeLask leverages the theoretical insight that the forward computation of an L-layer Transformer is conditionally equivalent to L steps of gradient descent. We define a driftance value by computing the cosine similarity between gradients derived from consecutive decoder steps, identifying problematic layers when the descent direction reverses. Rather than discarding such layers entirely, DeLask partially aggregates their hidden states with preceding layers, thereby preserving consistency while suppressing erroneous signals. Extensive experiments across diverse LLMs and benchmarks demonstrate that DeLask consistently mitigates hallucinations and enhances overall reliability, providing a lightweight and generalizable decoding framework for improving the robustness of large-scale language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The method's layer-skipping rule depends on an equivalence between forward passes and gradient descent that the abstract never derives, leaving driftance as an ungrounded heuristic.

The paper introduces DeLask, a decoding tweak that skips or partially aggregates deeper decoder layers when a new driftance metric (the cosine similarity between gradients from consecutive decoder steps) signals a reversal. The authors claim this cuts hallucinations because the forward computation of an L-layer transformer is conditionally equivalent to L gradient descent steps.

The new pieces are the framework itself and the partial-aggregation rule instead of outright dropping layers. Targeting deeper layers after a layer-wise analysis is a reasonable starting point, and an inference-only change is the right direction for deployment.

The central problem is that the equivalence is stated without derivation, conditioning details, or proof. Residual streams, attention, and FFN blocks do not obviously map to explicit loss gradients on a shared objective, so the reversal signal has no clear link to hallucinations. Without that link, driftance reduces to an ad-hoc threshold.

The abstract also claims extensive experiments across LLMs and benchmarks but supplies no numbers, error bars, layer-selection rules, or aggregation parameters. That makes it impossible to judge whether the gains are real or consistent.

This is for groups working on inference-time reliability fixes. A reader already experimenting with decoding modifications might extract a usable heuristic, but the theoretical claim needs independent support before the results can be trusted.

Send it to peer review so the authors can supply the missing derivation and the actual data.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DeLask, a decoding framework for LLMs that dynamically skips or aggregates decoder layers to mitigate hallucinations. It performs a layer-wise analysis suggesting hallucinations originate in deeper layers and defines 'driftance' as the cosine similarity between gradients from consecutive decoder steps. This is motivated by the claim that the forward pass through an L-layer Transformer is conditionally equivalent to L steps of gradient descent. Layers where the descent direction reverses are considered problematic, and their hidden states are partially aggregated with those from preceding layers. The paper reports that extensive experiments on various LLMs and benchmarks show consistent improvements in reducing hallucinations.

Significance. If the gradient-descent equivalence can be rigorously justified and the experimental results hold with proper controls, DeLask would represent a lightweight, training-free method to enhance LLM reliability by intervening at the decoding stage. This could be significant for practical deployment of LLMs where hallucinations are a concern, offering a generalizable approach without the need for model retraining or fine-tuning.

major comments (2)

[Abstract] The central claim that 'the forward computation of an L-layer Transformer is conditionally equivalent to L steps of gradient descent' is stated without derivation, conditioning details, or proof. This equivalence is load-bearing for the definition of driftance (via cosine similarity of consecutive gradients) and for interpreting reversal as identifying hallucination-prone layers; without it, the skipping/aggregation rule reduces to an ad-hoc heuristic.
[Method] No explicit description is given of how gradients are obtained for the driftance computation (with respect to which objective or loss), nor of the precise thresholds, aggregation weights, or layer-selection criteria. These parameters are required to assess whether the intervention is reproducible and whether it specifically targets the claimed hallucination mechanism.

minor comments (2)

[Abstract] The abstract asserts 'extensive experiments across diverse LLMs and benchmarks' but supplies no quantitative metrics, error bars, or baseline comparisons, which hinders immediate evaluation of the strength of the empirical claims.
The term 'driftance value' is introduced without relating it to existing similarity measures or providing a formal definition before its use in the layer-identification rule.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the theoretical justification and improve methodological clarity and reproducibility.

Referee: [Abstract] The central claim that 'the forward computation of an L-layer Transformer is conditionally equivalent to L steps of gradient descent' is stated without derivation, conditioning details, or proof. This equivalence is load-bearing for the definition of driftance (via cosine similarity of consecutive gradients) and for interpreting reversal as identifying hallucination-prone layers; without it, the skipping/aggregation rule reduces to an ad-hoc heuristic.

Authors: We agree that the equivalence is presented as a motivating insight without an explicit derivation or conditioning details in the current version. This leaves the driftance definition and layer intervention rule less rigorously grounded than ideal. In revision we will add a dedicated subsection (or appendix) providing the derivation under the relevant assumptions on residual connections and local loss landscapes, along with the precise conditioning required for the equivalence to hold. This will directly support the subsequent definitions and interpretations.

revision: yes

Referee: [Method] No explicit description is given of how gradients are obtained for the driftance computation (with respect to which objective or loss), nor of the precise thresholds, aggregation weights, or layer-selection criteria. These parameters are required to assess whether the intervention is reproducible and whether it specifically targets the claimed hallucination mechanism.

Authors: We acknowledge the omission of these implementation specifics, which are necessary for reproducibility. In the revised Method section we will explicitly state the loss used for gradient computation, the exact threshold and decision rule for detecting reversals via cosine similarity, the aggregation weighting scheme, and the layer-selection logic. We will also include pseudocode to make the full procedure transparent and reproducible.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; central derivation does not reduce to its inputs by construction

The paper states an equivalence between L-layer forward passes and gradient descent steps as a 'theoretical insight,' then defines driftance from cosine similarity of consecutive gradients and uses it to guide layer skipping. No quoted equations or definitions exhibit self-referential reduction (e.g., a fitted parameter renamed as a prediction, or a result derived solely from a self-citation chain that itself assumes the target claim). The logic proceeds from the stated assumption outward without the prediction equaling the input by construction, satisfying the criteria for a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Central claim depends on the unverified (within abstract) equivalence between transformer forward passes and gradient descent steps plus a newly introduced driftance metric whose link to hallucinations lacks independent evidence.

axioms (1)

domain assumption: The forward computation of an L-layer Transformer is conditionally equivalent to L steps of gradient descent.
Invoked in the abstract as the theoretical insight that enables defining driftance from consecutive decoder steps.

invented entities (1)

driftance value (no independent evidence)
purpose: Quantify reversal in descent direction via cosine similarity of gradients between consecutive decoder layers to flag hallucination-prone layers.
Newly defined quantity introduced to operationalize layer skipping; no independent falsifiable handle provided in abstract.
