SCOPE: Sequential Conformal Probing for Reliable OOD Rejection in LLM Services

Boxuan Wang; Changshun Wu; Xiaowei Huang; Yi Dong; Zhuoyun Li

arXiv: 2606.21255 · v1 · pith:34HIIUSU · submitted 2026-06-19 · cs.CL

Pith reviewed 2026-06-26 14:35 UTC · model grok-4.3

classification: cs.CL

keywords: OOD rejection · conformal prediction · LLM services · hidden layers · supermartingale e-process · out-of-distribution detection · conformal gate

The pith

SCOPE selects a readable hidden layer and applies sequential conformal probing with a supermartingale e-process to reject out-of-distribution inputs more reliably than final-layer detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SCOPE to improve out-of-distribution (OOD) rejection for LLM services by probing inside the model rather than at its output. SCOPE selects a readable hidden layer where service-boundary signals are clearest, builds a conformal gate calibrated on in-distribution (IND) data, and accumulates the gate's rejections with a supermartingale e-process to obtain a theoretical guarantee for held-out inputs. Experiments on multiple backbones and six boundary conditions show better gate-level rejection than standard final-layer detectors. The analysis also shows that different types of OOD boundary appear as distinct geometric patterns in hidden space. This matters because a deployed LLM service should filter unsupported requests reliably before committing to full generation.

Core claim

SCOPE is a framework that selects a readable hidden layer, constructs a conformal gate with IND calibration, and uses a supermartingale e-process to certify persistent service-boundary evidence, leading to improved rejection over standard final-layer detectors across multiple LLM backbones and six boundary conditions, while revealing geometric forms of OOD boundaries in hidden space.

What carries the argument

The conformal gate at a selected hidden layer combined with a supermartingale e-process for certifying boundary evidence.
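A minimal split-conformal gate can be sketched as follows. It assumes a scalar OOD score per input, with larger values meaning "more OOD". The score function and the helper names `conformal_threshold` and `gate` are hypothetical and are not the paper's code. The threshold is the ⌈(n+1)(1−ϵ)⌉-th smallest IND calibration score. If calibration and test scores are exchangeable, this keeps the IND false-rejection probability at most ϵ.

```python
import math
from typing import Sequence


def conformal_threshold(cal_scores: Sequence[float], eps: float) -> float:
    """Split-conformal threshold from IND calibration scores.

    Returns the ceil((n + 1) * (1 - eps))-th smallest score, so that an
    exchangeable IND test score exceeds it with probability at most eps.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - eps))  # 1-based rank of the threshold
    if k > n:
        # Too few calibration points for this eps: the gate never rejects.
        return math.inf
    return sorted(cal_scores)[k - 1]


def gate(score: float, tau: float) -> int:
    """Return 1 (reject as OOD) if the score exceeds the threshold, else 0."""
    return int(score > tau)


# Hypothetical IND calibration scores 1..19 (n = 19); eps = 0.1 gives k = 18.
cal = [float(i) for i in range(1, 20)]
tau = conformal_threshold(cal, eps=0.1)
print(tau)                              # 18.0
print(gate(18.5, tau), gate(3.0, tau))  # 1 0
```

The gate only ever sees a scalar score. Which hidden layer and which readout produce that score is a separate design choice.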

If this is right

Gate-level rejection improves compared to final-layer detectors.
Different OOD boundaries manifest as distinct geometric forms in hidden space.
The supermartingale e-process provides a theoretical guarantee for held-out inputs.
Service-boundary signals are most clearly encoded at a selectable hidden layer.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

SCOPE could be adapted to other sequential models beyond LLMs for OOD detection.
Selecting different layers might allow tuning for specific types of boundary conditions.
Integrating this into production LLM services might reduce unnecessary computation on invalid inputs.

Load-bearing premise

Service-boundary signals are most clearly encoded at a readable hidden layer that can be selected in advance, and the supermartingale e-process supplies a valid theoretical guarantee for held-out inputs under the chosen calibration.

What would settle it

A new set of OOD boundary conditions where the selected hidden layer does not show clearer signals than the final layer, or where the e-process fails to certify evidence on held-out data.

Figures

Figures reproduced from arXiv: 2606.21255 by Boxuan Wang, Changshun Wu, Xiaowei Huang, Yi Dong, Zhuoyun Li.

**Figure 1.** Examples of service-scope OOD rejection before generation. A task-specific LLM service should answer inputs within its IND scope and reject or route unsupported inputs before full generation. SCOPE provides a reliable gate for pre-generation rejection. … for frozen LLM backbones, which are attractive in deployment because they offer cost-efficiency, controllability, and easy integration into existing servi…

**Figure 2.** Overview of SCOPE. The framework first selects a readable hidden layer from frozen LLM representations, then calibrates the selected score into an IND-controlled rejection gate, and finally accumulates the held-out rejection stream with an e-process to obtain an anytime-valid service-boundary certificate. … conformal pipelines (Gibbs et al., 2025; Kato et al., 2023). In SCOPE, conformal calibration serves a…
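The caption does not state the layer-selection criterion. As one plausible illustration only, the sketch below picks the layer whose per-layer OOD score best separates a labeled development split. Separation is measured by AUROC, computed by pairwise comparison (the Mann–Whitney form). The criterion, the `select_layer` and `auroc` helpers, and the toy scores are all assumptions. Following the spirit of the Figure 4 caption, the chosen layer would then be fixed before calibration and held-out evaluation.

```python
from typing import Dict, List


def auroc(ind_scores: List[float], ood_scores: List[float]) -> float:
    """AUROC of 'OOD score is higher' via pairwise comparison (ties count 1/2)."""
    wins = 0.0
    for o in ood_scores:
        for i in ind_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(ind_scores))


def select_layer(dev_scores: Dict[int, Dict[str, List[float]]]) -> int:
    """Pick the layer with the highest development AUROC (hypothetical criterion)."""
    return max(dev_scores,
               key=lambda layer: auroc(dev_scores[layer]["ind"], dev_scores[layer]["ood"]))


# Toy development scores for three layers; layer 16 separates IND from OOD best.
dev = {
    8: {"ind": [0.2, 0.5, 0.6], "ood": [0.4, 0.5, 0.7]},
    16: {"ind": [0.1, 0.2, 0.3], "ood": [0.7, 0.8, 0.9]},
    31: {"ind": [0.3, 0.4, 0.6], "ood": [0.5, 0.6, 0.8]},
}
print(select_layer(dev))  # 16
```

The pairwise loop is quadratic and is only meant to make the definition explicit. A rank-based implementation would be used at realistic sizes.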

**Figure 3.** The same gate after conformal calibration. The score is converted into binary rejection decisions and accumulated by the e-process. Far-OOD traffic accumulates evidence rapidly, near-OOD traffic grows more gradually, and the IND-only stream stays below the certification threshold. Thus, the certificate is driven by repeated IND-calibrated rejections rather than by an uncalibrated confidence score.…
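The accumulation step in this caption can be sketched with a simple betting e-process. This is a standard construction from the e-value literature and not necessarily the paper's exact choice. Each gate decision R_t ∈ {0, 1} multiplies the running wealth by 1 + λ(R_t − ϵ). If each rejection has conditional probability at most ϵ under the IND null, and 0 ≤ λ ≤ 1/ϵ, the wealth is a nonnegative supermartingale, and reaching 1/α is the certificate. The values of ϵ, λ, and α below are hypothetical, and the all-reject stream is an idealized far-OOD case.

```python
from typing import List, Optional, Sequence


def e_process(rejections: Sequence[int], eps: float, lam: float) -> List[float]:
    """Betting e-process over binary gate decisions.

    Under the null (each rejection has conditional probability <= eps), each
    factor 1 + lam * (r - eps) has conditional mean <= 1, so the running
    product is a nonnegative supermartingale when 0 <= lam <= 1 / eps.
    """
    if not 0.0 <= lam <= 1.0 / eps:
        raise ValueError("need 0 <= lam <= 1/eps for a nonnegative e-process")
    wealth = 1.0
    path: List[float] = []
    for r in rejections:
        wealth *= 1.0 + lam * (r - eps)
        path.append(wealth)
    return path


def first_crossing(path: Sequence[float], alpha: float) -> Optional[int]:
    """First (1-based) step at which the e-process reaches 1/alpha, if any."""
    for t, value in enumerate(path, start=1):
        if value >= 1.0 / alpha:
            return t
    return None


eps, lam, alpha = 0.05, 2.0, 0.05   # hypothetical operating point
far_ood = [1] * 10                  # every input rejected
ind_only = [0] * 10                 # no input rejected
print(first_crossing(e_process(far_ood, eps, lam), alpha))   # 3
print(first_crossing(e_process(ind_only, eps, lam), alpha))  # None
```

With these values each rejection multiplies wealth by 2.9 and each acceptance by 0.9. The all-reject stream therefore passes 1/α = 20 at step 3 (2.9³ ≈ 24.4), while the all-accept stream only decays.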

**Figure 4.** Cross-model service-boundary certification. Each cell reports CLG on one backbone and one boundary. Color shows AUROC; markers show the e-process certificate rate over shuffled held-out streams. … threshold, and e-process parameters are fixed before the held-out stream is evaluated. Thus, the heatmap does not merely report offline OOD ranking; it tests whether a development-selected gate continues to produce …
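One reading of "certificate rate over shuffled held-out streams" is the fraction of random orderings of a fixed set of gate decisions for which the e-process reaches 1/α somewhere within the stream. The sketch below assumes that reading and uses a betting e-process with hypothetical parameters. The `certified` and `certificate_rate` helpers are illustrative and are not the paper's evaluation code.

```python
import random
from typing import Sequence


def certified(rejections: Sequence[int], eps: float, lam: float, alpha: float) -> bool:
    """True if the betting e-process reaches 1/alpha at some point in the stream."""
    wealth = 1.0
    for r in rejections:
        wealth *= 1.0 + lam * (r - eps)
        if wealth >= 1.0 / alpha:
            return True
    return False


def certificate_rate(rejections: Sequence[int], eps: float, lam: float,
                     alpha: float, n_shuffles: int = 200, seed: int = 0) -> float:
    """Fraction of shuffled orderings of the same decisions that get certified."""
    rng = random.Random(seed)
    stream = list(rejections)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(stream)
        hits += int(certified(stream, eps, lam, alpha))
    return hits / n_shuffles


# Hypothetical held-out decisions: 30 rejections out of 100 inputs.
decisions = [1] * 30 + [0] * 70
print(certificate_rate(decisions, eps=0.05, lam=2.0, alpha=0.05))  # 1.0
```

Here the final wealth 2.9³⁰ · 0.9⁷⁰ is far above 20, so every ordering is certified by the end at the latest. Shuffling matters for streams whose final wealth sits near or below the threshold, where certification depends on whether rejections arrive early or late.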

**Figure 5.** Selected-layer representation geometry. PCA projections illustrate separable and overlapping IND/OOD hidden-state patterns at the selected layer. … shows that service-boundary definition is not a secondary detail: fine-grained intent splits and intent-preserving rewrites induce different hidden-state signals. This motivates the readout-geometry analysis in Sec. 4.4. 4.4 Geometry of Selected-Layer OOD Signa…
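PCA projections like those in this caption can be produced by centering the selected-layer hidden states and projecting them onto the top two right singular vectors. The sketch assumes the hidden states have already been extracted into a NumPy array with one row per input. The Gaussian data here are random placeholders, not model activations, and `pca_2d` is a hypothetical helper.

```python
import numpy as np


def pca_2d(hidden: np.ndarray) -> np.ndarray:
    """Project rows of `hidden` (n_inputs x d) onto their top two principal components."""
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
ind = rng.normal(0.0, 1.0, size=(50, 64))   # placeholder IND hidden states
ood = rng.normal(3.0, 1.0, size=(50, 64))   # placeholder shifted OOD states
coords = pca_2d(np.vstack([ind, ood]))
print(coords.shape)  # (100, 2)
```

Fitting PCA on the pooled IND and OOD states, as here, shows whether the dominant variance direction separates them. Fitting on IND only would instead show how OOD points sit relative to the IND subspace.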

**Figure 6.** … provides two useful checks. First, AUROC generally improves and Tα decreases as model size increases, suggesting that larger backbones tend to encode more readable service-boundary signals. Second, the relative depth of the selected layer remains qualitatively stable across the Qwen2.5 family for the same boundary type. This supports the use of a one-time layer-selection step and reinforces the main claim …

**Figure 7.** Sensitivity to the conformal threshold. Smaller ϵ gives a stricter threshold, reducing IND false rejections but also lowering OOD-TPR and certificate rate. Larger ϵ increases OOD rejection and certification by spending more IND false-rejection budget. The main experiments use ϵ = 0.05 as a balanced operating point.

[Plot: panels P1–P6; y-axis "IND false rejection" (0.0–1.0); labels "CP tau" and "fixed 0.5".]
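The trade-off in the Figure 7 caption follows from how the conformal threshold depends on ϵ. A smaller ϵ selects a higher rank among the IND calibration scores, so fewer inputs exceed it. The sketch sweeps ϵ and reports the threshold and the OOD true-positive rate. The calibration and OOD scores are hypothetical placeholders, not the paper's data, and `sweep` is an illustrative helper.

```python
import math
from typing import List, Sequence, Tuple


def conformal_threshold(cal_scores: Sequence[float], eps: float) -> float:
    """ceil((n + 1) * (1 - eps))-th smallest calibration score, or inf if undefined."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - eps))
    return math.inf if k > n else sorted(cal_scores)[k - 1]


def sweep(cal: Sequence[float], ood: Sequence[float],
          eps_values: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (eps, threshold, OOD-TPR) for each eps."""
    rows = []
    for eps in eps_values:
        tau = conformal_threshold(cal, eps)
        tpr = sum(s > tau for s in ood) / len(ood)
        rows.append((eps, tau, tpr))
    return rows


cal = [float(i) for i in range(100)]      # hypothetical IND scores 0..99
ood = [80.5 + i for i in range(20)]       # hypothetical OOD scores 80.5..99.5
for eps, tau, tpr in sweep(cal, ood, [0.01, 0.05, 0.10]):
    print(f"eps={eps:.2f}  tau={tau:.2f}  OOD-TPR={tpr:.2f}")
# eps=0.01  tau=99.00  OOD-TPR=0.05
# eps=0.05  tau=95.00  OOD-TPR=0.25
# eps=0.10  tau=90.00  OOD-TPR=0.50
```

The threshold can only fall as ϵ grows, so OOD-TPR can only rise. This is the same budget trade-off the caption describes.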

**Figure 9.** Sensitivity to the e-process parameters. Vary…

**Figure 11.** Additional PCA examples of selected-layer IND/OOD geometry. Broad OOD boundaries show clearer separation in the selected representation, while tight near-intent boundaries remain more entangled. These examples complement the main geometry analysis by illustrating why calibrated rejection is easier for broad service shifts than for fine-grained intent splits. C.3 Directional gate e-process We also examin…

**Figure 12.** E-process trace for the conformal directional gate. The directional gate accumulates strong evidence on broad far-OOD streams but produces weaker evidence on tighter near-OOD streams. This supports the use of CLG as the main gate: a one-dimensional direction can capture coarse shifts, while a richer linear boundary is more effective for finer service-boundary signals.
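A one-dimensional directional score, of the kind this caption contrasts with CLG's richer linear boundary, can be sketched as a projection onto the difference between the mean OOD and mean IND hidden states. How the paper defines its direction is not stated here. The mean-difference direction, the `fit_direction` and `direction_score` helpers, and the toy vectors are therefore assumptions. The resulting scalar could be fed to a conformal threshold and e-process like any other score.

```python
import numpy as np


def fit_direction(ind: np.ndarray, ood: np.ndarray) -> np.ndarray:
    """Unit vector from the IND mean toward the OOD mean (hypothetical choice)."""
    w = ood.mean(axis=0) - ind.mean(axis=0)
    return w / np.linalg.norm(w)


def direction_score(hidden: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Scalar OOD score per row: projection of the hidden state onto w."""
    return hidden @ w


ind_dev = np.array([[0.0, 0.0], [1.0, 0.0]])
ood_dev = np.array([[0.0, 2.0], [1.0, 2.0]])
w = fit_direction(ind_dev, ood_dev)
print(w)                                           # [0. 1.]
print(direction_score(np.array([[5.0, 0.5]]), w))  # [0.5]
```

The test point differs strongly from the IND data along the first coordinate, yet it receives a low score. Any shift orthogonal to w is invisible to a single direction, which is consistent with the caption's observation that finer boundaries need a richer readout.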

Original abstract

Rejecting inputs outside the defined in-distribution (IND) service scope is critical for large language model (LLM) services, where unsupported requests should be filtered before full generation. Existing out-of-distribution (OOD) detectors often rely on final outputs or final-layer representations, leaving unclear where service-boundary signals are most clearly encoded inside the model; they also lack a theoretical guarantee for held-out inputs. In this paper, we introduce SCOPE (Sequential Conformal OOD Probing and Evaluation), a framework that selects a readable hidden layer, constructs a conformal gate with IND calibration, and uses a supermartingale e-process to certify persistent service-boundary evidence. Experiments across multiple LLM backbones and six carefully designed boundary conditions show that SCOPE improves gate-level rejection over standard final-layer detectors, while revealing how different OOD boundaries take different geometric forms in hidden space.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

SCOPE combines hidden-layer conformal gating with supermartingale e-process certification for LLM OOD rejection, but the guarantee claim rests on unverified filtration and exchangeability assumptions for those representations.

The paper's core move is to shift OOD detection inside the model: it picks a readable hidden layer, calibrates a conformal gate on in-distribution data, and then runs a supermartingale e-process to certify that the boundary signal persists on new inputs. Experiments on multiple backbones and six boundary conditions report better gate-level rejection than final-layer baselines, along with some geometric observations about how different OOD types sit in hidden space.

That combination of sequential probing plus e-process certification is the new piece relative to standard final-layer detectors. The practical framing for deployed LLM services is also useful; service operators need something that can reject unsupported requests before generation with some form of guarantee.

The soft spot is exactly the one flagged in the stress test. Hidden-layer activations are high-dimensional, token-sequence dependent, and conditioned on prior context, so it is not obvious that they satisfy the filtration and exchangeability conditions required for the supermartingale property under the null. The abstract does not show the derivation that layer selection preserves these conditions, and without that step the certification for held-out inputs is not yet supported. If the full paper supplies a clear argument on this point, the concern shrinks; otherwise it is load-bearing.
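The condition at stake can be written out for one standard construction, a betting e-process over the gate's binary decisions. This is an illustrative assumption, since the construction in the full paper is not reproduced here. Let R_t ∈ {0, 1} be the rejection of the t-th held-out input, 𝓕_{t−1} the information from earlier inputs, ϵ the conformal level, and λ a fixed bet. The bound λ ≤ 1/ϵ keeps every factor nonnegative:

```latex
\begin{align*}
E_0 &= 1, \qquad E_t = \prod_{i=1}^{t} \bigl(1 + \lambda (R_i - \epsilon)\bigr), \qquad 0 \le \lambda \le 1/\epsilon, \\
\mathbb{E}[E_t \mid \mathcal{F}_{t-1}] &= E_{t-1}\bigl(1 + \lambda(\mathbb{E}[R_t \mid \mathcal{F}_{t-1}] - \epsilon)\bigr) \le E_{t-1}
  \quad \text{if } \mathbb{E}[R_t \mid \mathcal{F}_{t-1}] \le \epsilon, \\
\Pr\bigl(\exists\, t \ge 1 : E_t \ge 1/\alpha\bigr) &\le \alpha \quad \text{(Ville's inequality)}.
\end{align*}
```

The load-bearing step is the conditional bound E[R_t | 𝓕_{t−1}] ≤ ϵ. Split-conformal calibration guarantees a marginal false-rejection rate of at most ϵ, averaged over the calibration draw. That marginal guarantee does not by itself imply the conditional bound at every step of a stream scored against one fixed calibration set.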

This is for researchers working on reliable LLM services and conformal methods applied to internal states. A reader who wants to see how e-process ideas might transfer to representation-level detection would get value. It deserves a serious referee to check the theoretical step and the experimental controls, even if revisions are needed on the guarantee.

Referee Report

1 major / 0 minor

Summary. The paper introduces SCOPE (Sequential Conformal OOD Probing and Evaluation), a framework that selects a readable hidden layer in LLMs, constructs a conformal gate using IND calibration data, and applies a supermartingale e-process to certify persistent service-boundary evidence for OOD rejection. Experiments on multiple LLM backbones under six boundary conditions demonstrate improved gate-level rejection compared to standard final-layer detectors and provide insights into the geometric forms of different OOD boundaries in hidden space.

Significance. If the theoretical guarantees hold, the work could advance reliable OOD rejection for LLM services by moving beyond final-layer heuristics to layer-selected conformal gates with e-process certification. The multi-backbone experiments and six boundary conditions constitute a strength in empirical coverage.

major comments (1)

[the section describing the supermartingale e-process construction and conformal gate] The central claim that SCOPE certifies persistent service-boundary evidence rests on the supermartingale e-process supplying a valid theoretical guarantee after conformal calibration at a selected hidden layer. No explicit derivation is provided showing that the filtration and the layer-selection step preserve the supermartingale property when hidden-layer scores are high-dimensional and token-sequence dependent (which can violate exchangeability or filtration conditions even under exchangeable calibration data). This is load-bearing for the certification claim for arbitrary held-out inputs.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential impact of the work on reliable OOD rejection in LLM services. We address the major comment below.

Referee: [the section describing the supermartingale e-process construction and conformal gate] The central claim that SCOPE certifies persistent service-boundary evidence rests on the supermartingale e-process supplying a valid theoretical guarantee after conformal calibration at a selected hidden layer. No explicit derivation is provided showing that the filtration and the layer-selection step preserve the supermartingale property when hidden-layer scores are high-dimensional and token-sequence dependent (which can violate exchangeability or filtration conditions even under exchangeable calibration data). This is load-bearing for the certification claim for arbitrary held-out inputs.

Authors: We appreciate the referee pointing out the need for a more rigorous justification of the theoretical guarantees. The construction relies on the fact that the calibration data is exchangeable, and the conformal scores at the selected layer are used to form the e-process. The layer selection is performed using a validation set from the IND data, which maintains exchangeability. However, we agree that an explicit derivation showing preservation of the supermartingale property under layer selection and for high-dimensional, sequence-dependent scores is missing from the manuscript. In the revised version, we will add a new subsection or appendix providing this derivation, including how the filtration is defined over the sequence of tokens and why the property holds for held-out inputs under the standard assumptions of conformal prediction. We will also discuss potential limitations when strong dependencies violate exchangeability. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external conformal and e-process methods

full rationale

The paper introduces SCOPE by selecting a hidden layer, building a conformal gate via IND calibration, and applying a supermartingale e-process for certification. These steps invoke standard techniques from conformal prediction literature rather than reducing any prediction or guarantee to fitted quantities by construction within the paper. No equations are exhibited that equate outputs to inputs via self-definition, renaming, or self-citation chains. The abstract and description present the framework as an application of existing tools to LLM hidden representations, with experimental validation on multiple backbones. This is self-contained against external benchmarks and does not meet the criteria for any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Abstract only; ledger is necessarily incomplete. Standard conformal prediction assumptions and the validity of supermartingale e-processes for boundary certification are invoked without further detail.

axioms (2)

standard math: Conformal prediction yields valid coverage under exchangeability of calibration and test points.
Core to the conformal gate construction mentioned in the abstract.
domain assumption: A supermartingale e-process can certify persistent service-boundary evidence for held-out inputs.
Invoked to provide the theoretical guarantee stated in the abstract.
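For reference, the first axiom in its usual split-conformal form is stated below. Take IND calibration scores s_1, …, s_n and a test score s_{n+1} that are exchangeable (and, for the upper bound, almost surely distinct). If k = ⌈(n+1)(1−ϵ)⌉ ≤ n, the threshold τ̂ = s_(k), the k-th smallest calibration score, satisfies:

```latex
1 - \epsilon \;\le\; \Pr\bigl(s_{n+1} \le \hat{\tau}\bigr) \;\le\; 1 - \epsilon + \frac{1}{n+1},
\qquad \hat{\tau} = s_{(\lceil (n+1)(1-\epsilon) \rceil)} .
```

The IND false-rejection probability Pr(s_{n+1} > τ̂) is therefore at most ϵ. This is a marginal statement about one test point, averaged over the calibration draw.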

pith-pipeline@v0.9.1-grok · 5685 in / 1260 out tokens · 23220 ms · 2026-06-26T14:35:08.064569+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 11 canonical work pages · 6 internal anchors

[1] Potsawee Manakul, Adian Liusie, and Mark J. F. Gales. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
[2] Xiaohua Wang, Yuliang Yan, Longtao Huang, Xiaoqing Zheng, and Xuanjing Huang. Hallucination Detection for Generative Large Language Models by Bayesian Sequential Estimation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023. doi:10.18653/v1/2023.emnlp-main.949
[3] Yi Dai, Hao Lang, Kaisheng Zeng, Fei Huang, and Yongbin Li. Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
[4] Qing Li, Jiahui Geng, Chenyang Lyu, Derui Zhu, Maxim Panov, and Fakhri Karray. Findings of the Association for Computational Linguistics: EMNLP 2024, 2024.
[5] Yang Deng, Yong Zhao, Moxin Li, See…. Don't Just Say …
[6] Yusuf Sali and Sıtkı Can Toraman. Navigating the Unknown: Intent Classification and Out-of-Distribution Detection Using Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025, 2025. doi:10.18653/v1/2025.findings-emnlp.791
[7] Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, and Benjamin L. Edelman. Proceedings of the 41st International Conference on Machine Learning, volume 235.
[8] Farima Fatahi Bayat, Xin Liu, H. V. Jagadish, and Lu Wang. Findings of the Association for Computational Linguistics: ACL 2024, 2024.
[9] Guillaume Alain and Yoshua Bengio.
[10] Amos Azaria and Tom Mitchell. Findings of the Association for Computational Linguistics: EMNLP 2023, 2023.
[11] The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets. First Conference on Language Modeling.
[12] Language Models Represent Space and Time. The Twelfth International Conference on Learning Representations.
[13] Linear socio-demographic representations emerge in Large Language Models from indirect cues. Preprint, 2025.
[14] Iv…. No Answer Needed: Predicting…. ICLR 2026 Workshop on Principled Design for Trustworthy AI - Interpretability, Robustness, and Safety across Modalities, 2026.
[15] Language Models (Mostly) Know What They Know. Preprint, 2022.
[16] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. The Eleventh International Conference on Learning Representations.
[17] Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017).
[18] On Large Language Models' Hallucination with Regard to Known Facts. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024.
[19] The Linear Representation Hypothesis and the Geometry of Large Language Models. Proceedings of the 41st International Conference on Machine Learning.
[20] Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Zhengwen Feng, Hao Peng, and Jianwei Yin. Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in …
[21] Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Journal of the American Statistical Association, 113(523).
[22] Conformal prediction with conditional guarantees. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(4).
[23] A Review of Nonconformity Measures for Conformal Prediction in Regression. Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications.
[24] Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, …
[25] Steven Howard, Aaditya Ramdas, Jon McAuliffe, and Jasjeet Sekhon. Time-uniform …. 2020.
[26] Beyond Neyman–Pearson: E-values enable hypothesis testing with a data-driven alpha. Proceedings of the National Academy of Sciences, 121(39).
[27] A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks.
[28] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin.
[29] Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li.
[30] Llama 2: Open Foundation and Fine-Tuned Chat Models. Preprint, 2023.
[31] Qwen2.5 Technical Report. Preprint, 2025.
[32] Out-of-Distribution Detection Should Use Conformal Prediction (and Vice-versa?). Preprint, 2024.
[33] Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, and Susmit Jha. Polysemantic Dropout: Conformal …
[34] Bo Liu, Li-Ming Zhan, Zexin Lu, Yujie Feng, Lei Xue, and Xiao-Ming Wu. How Good Are …
[35] Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models. Preprint, 2026.
[36] Detecting operational adversarial examples for reliable deep learning. 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S), 2021.
[37] Reliability assessment and safety arguments for machine learning components in system assurance. ACM Transactions on Embedded Computing Systems, 2023.
[38] Zhihan Jiang, Rui Ren, Guangba Yu, Yulun Wu, Wenwei Gu, Yichen Li, Yujie Huang, Cong Feng, Zengyin Yang, Yongqiang Yang, and Michael R. Lyu. LLMPrism: Black-box Performance Diagnosis for Production LLM Training Platforms.
[39] A robust framework for adaptive selection of filter ensembles to detect adversarial inputs. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), 2022.
[40] Multi-version Machine Learning and Rejuvenation for Resilient Perception in Safety-critical Systems. 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2025.
[41] FragileFlow: Spectral Control of Correct-but-Fragile Predictions for Foundation Model Robustness. arXiv preprint arXiv:2605.08896.
[42] Llama 2: Open Foundation and Fine-Tuned Chat Models. 2023. doi:10.48550/arXiv.2307.09288
[43] Qwen2.5 Technical Report. 2025. doi:10.48550/arXiv.2412.15115
[44] Mistral 7B. 2023. doi:10.48550/arXiv.2310.06825
[45] 2 OLMo 2 Furious. 2025. doi:10.48550/arXiv.2501.00656
[46] The Falcon Series of Open Language Models. 2023. doi:10.48550/arXiv.2311.16867
[47] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013.
[48] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. 2018. doi:10.18653/v1/W18-5446
[49] NewsWeeder: Learning to Filter Netnews. Machine Learning Proceedings 1995, 1995.
[50] An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019. doi:10.18653/v1/D19-1131
[51] Efficient Intent Detection with Dual Sentence Encoders. Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, 2020. doi:10.18653/v1/2020.nlp4convai-1.5
[52] Ido Dagan, Oren Glickman, and Bernardo Magnini. The …. 2006.
[53] Findings of the 2014 Workshop on Statistical Machine Translation. Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014.
[54] Learning Question Classifiers. 2002.
[55] A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2018. doi:10.18653/v1/N18-1101

[1] [1]

Potsawee Manakul and Adian Liusie and Mark J. F. Gales , title =. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , year = 2023, pages =

2023

[2] [2]

Hallucination Detection for Generative Large Language Models by B ayesian Sequential Estimation

Wang, Xiaohua and Yan, Yuliang and Huang, Longtao and Zheng, Xiaoqing and Huang, Xuanjing. Hallucination Detection for Generative Large Language Models by B ayesian Sequential Estimation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023. doi:10.18653/v1/2023.emnlp-main.949

work page doi:10.18653/v1/2023.emnlp-main.949 2023

[3] [3]

Findings of the Association for Computational Linguistics: EMNLP 2023 , year = 2023, pages =

Yi Dai and Hao Lang and Kaisheng Zeng and Fei Huang and Yongbin Li , title =. Findings of the Association for Computational Linguistics: EMNLP 2023 , year = 2023, pages =

2023

[4] [4]

Findings of the Association for Computational Linguistics: EMNLP 2024 , year = 2024, pages =

Qing Li and Jiahui Geng and Chenyang Lyu and Derui Zhu and Maxim Panov and Fakhri Karray , title =. Findings of the Association for Computational Linguistics: EMNLP 2024 , year = 2024, pages =

2024

[5] [5]

Don't Just Say ``

Yang Deng and Yong Zhao and Moxin Li and See. Don't Just Say ``

[6] [6]

Navigating the Unknown: Intent Classification and Out-of-Distribution Detection Using Large Language Models

Sali, Yusuf and Toraman, S tk Can. Navigating the Unknown: Intent Classification and Out-of-Distribution Detection Using Large Language Models. Findings of the Association for Computational Linguistics: EMNLP 2025. 2025. doi:10.18653/v1/2025.findings-emnlp.791

work page doi:10.18653/v1/2025.findings-emnlp.791 2025

[7] [7]

Edelman , title =

Gustaf Ahdritz and Tian Qin and Nikhil Vyas and Boaz Barak and Benjamin L. Edelman , title =. Proceedings of the 41st International Conference on Machine Learning , volume = 235, pages =

[8] [8]

Farima Fatahi Bayat and Xin Liu and H. V. Jagadish and Lu Wang , title =. Findings of the Association for Computational Linguistics: ACL 2024 , year = 2024, pages =

2024

[9] [9]

Guillaume Alain and Yoshua Bengio , title =

[10] [10]

Findings of the Association for Computational Linguistics: EMNLP 2023 , year = 2023, pages =

Amos Azaria and Tom Mitchell , title =. Findings of the Association for Computational Linguistics: EMNLP 2023 , year = 2023, pages =

2023

[11] [11]

First Conference on Language Modeling , year=

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets , author=. First Conference on Language Modeling , year=

[12] [12]

The Twelfth International Conference on Learning Representations , year=

Language Models Represent Space and Time , author=. The Twelfth International Conference on Learning Representations , year=

[13] Linear socio-demographic representations emerge in Large Language Models from indirect cues. Preprint, 2025.

[14] Iv. No Answer Needed: Predicting […]. ICLR 2026 Workshop on Principled Design for Trustworthy AI - Interpretability, Robustness, and Safety across Modalities, 2026.

[15] Language Models (Mostly) Know What They Know. Preprint, 2022.

[16] Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. The Eleventh International Conference on Learning Representations.

[17] Detecting hallucinations in large language models using semantic entropy. Nature, 630(8017).

[18] On Large Language Models' Hallucination with Regard to Known Facts. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024.

[19] The Linear Representation Hypothesis and the Geometry of Large Language Models. Proceedings of the 41st International Conference on Machine Learning.

[20] Yuntai Bao, Xuhong Zhang, Tianyu Du, Xinkui Zhao, Zhengwen Feng, Hao Peng, and Jianwei Yin. Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in […].

[21] Jing Lei, Max G'Sell, Alessandro Rinaldo, Ryan J. Tibshirani, and Larry Wasserman. Journal of the American Statistical Association, 113(523).

[22] Conformal prediction with conditional guarantees. Journal of the Royal Statistical Society Series B: Statistical Methodology, 87(4).

[23] A Review of Nonconformity Measures for Conformal Prediction in Regression. Proceedings of the Twelfth Symposium on Conformal and Probabilistic Prediction with Applications.

[24] Glenn Shafer, Alexander Shen, Nikolai Vereshchagin, and Vladimir Vovk. Test martingales, […].

[25] Steven Howard, Aaditya Ramdas, Jon McAuliffe, and Jagmohan Sekhon. Time-uniform […]. 2020.

[26] Beyond Neyman–Pearson: E-values enable hypothesis testing with a data-driven alpha. Proceedings of the National Academy of Sciences, 121(39).

[27] A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks.

[28] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin.

[29] Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li.

[30] Llama 2: Open Foundation and Fine-Tuned Chat Models. Preprint, 2023.

[31] Qwen2.5 Technical Report. Preprint, 2025.

[32] Out-of-Distribution Detection Should Use Conformal Prediction (and Vice-versa?). Preprint, 2024.

[33] Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, and Susmit Jha. Polysemantic Dropout: Conformal […].

[34] Bo Liu, Li-Ming Zhan, Zexin Lu, Yujie Feng, Lei Xue, and Xiao-Ming Wu. How Good Are […].

[35] Chain-of-Thought as a Lens: Evaluating Structured Reasoning Alignment between Human Preferences and Large Language Models. Preprint, 2026.

[36] Detecting operational adversarial examples for reliable deep learning. 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S), 2021.

[37] Reliability assessment and safety arguments for machine learning components in system assurance. ACM Transactions on Embedded Computing Systems, 2023.

[38] Zhihan Jiang, Rui Ren, Guangba Yu, Yulun Wu, Wenwei Gu, Yichen Li, Yujie Huang, Cong Feng, Zengyin Yang, Yongqiang Yang, and Michael R. Lyu. LLMPrism: Black-box Performance Diagnosis for Production LLM Training Platforms.

[39] A robust framework for adaptive selection of filter ensembles to detect adversarial inputs. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), 2022.

[40] Multi-version Machine Learning and Rejuvenation for Resilient Perception in Safety-critical Systems. 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2025.

[41] FragileFlow: Spectral Control of Correct-but-Fragile Predictions for Foundation Model Robustness. arXiv preprint arXiv:2605.08896.

[42] Llama 2: Open Foundation and Fine-Tuned Chat Models. Preprint, 2023. doi:10.48550/arXiv.2307.09288

[43] Qwen2.5 Technical Report. Preprint, 2025. doi:10.48550/arXiv.2412.15115

[44] Mistral 7B. Preprint, 2023. doi:10.48550/arXiv.2310.06825

[45] 2 OLMo 2 Furious. Preprint, 2025. doi:10.48550/arXiv.2501.00656

[46] The Falcon Series of Open Language Models. Preprint, 2023. doi:10.48550/arXiv.2311.16867

[47] Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013.

[48] Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. 2018. doi:10.18653/v1/W18-5446

[49] NewsWeeder: Learning to Filter Netnews. Machine Learning Proceedings 1995, 1995.

[50] An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019. doi:10.18653/v1/D19-1131

[51] Efficient Intent Detection with Dual Sentence Encoders. Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI, 2020. doi:10.18653/v1/2020.nlp4convai-1.5

[52] Ido Dagan, Oren Glickman, and Bernardo Magnini. The […]. 2006.

[53] Findings of the 2014 Workshop on Statistical Machine Translation. Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014.

[54] Learning Question Classifiers. 2002.

[55] A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2018. doi:10.18653/v1/N18-1101