Decision Potential Surface: A Theoretical and Practical Approximation of Large Language Model Decision Boundary

Haibo Hu; Haoyang Shang; Huadi Zheng; Peizhao Hu; Qingqing Ye; Yulin Jin; Zhiyao Wu; Zi Liang

arxiv: 2510.03271 · v2 · pith:HFEY2U6Qnew · submitted 2025-09-27 · 💻 cs.LG · cs.AI

Zi Liang, Zhiyao Wu, Haoyang Shang, Yulin Jin, Qingqing Ye, Huadi Zheng, Peizhao Hu, Haibo Hu

Pith reviewed 2026-05-22 13:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: decision boundary, large language models, decision potential surface, approximation algorithm, error bounds, isohypse, autoregressive sampling, model interpretation

The pith

The zero-height contour of the Decision Potential Surface exactly matches an LLM's decision boundary, and a K-sample method approximates it with bounded error.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper defines a Decision Potential Surface from the model's per-input confidence in separating output classes. It proves that the flat zero line on this surface is identical to the model's decision boundary, with the enclosed patches acting as distinct decision regions. From there it introduces a sampling procedure called K-DPS that reconstructs the surface, and therefore the boundary, from only K finite sequences. The authors supply explicit upper bounds showing how the absolute error, average error, and error concentration shrink as the number of samples grows. If the construction holds, analysts can finally map where an LLM changes its mind without enumerating its entire output space.

Core claim

Decision Potential Surface is derived from the confidence scores an LLM assigns when distinguishing between different classes for each input. The zero-height isohypse on this surface is proven equivalent to the model's decision boundary, while the regions it encloses correspond to the model's decision regions. The K-DPS algorithm then approximates the full surface using only K finite sequence samples and supplies theoretical upper bounds on the absolute error, expected error, and error concentration relative to the ideal surface.
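
As a rough illustration of reading a surface height off K finite samples, the sketch below estimates a class-probability gap by Monte Carlo. It is not the paper's K-DPS definition, which this summary does not spell out. The toy sampler `sample_class`, the function `k_sample_potential`, and the choice of p(A | x) − p(B | x) as the height are all assumptions made for the example.

```python
import random


def sample_class(x: float, rng: random.Random) -> int:
    """Hypothetical stand-in for 'sample one output sequence and label its class'.

    Returns +1 (class A) with probability x, otherwise -1 (class B).
    A real pipeline would decode a sequence from the LLM and classify it.
    """
    return 1 if rng.random() < x else -1


def k_sample_potential(x: float, k: int, seed: int = 0) -> float:
    """Monte Carlo estimate of p(A | x) - p(B | x) from k sampled outputs."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    return sum(sample_class(x, rng) for _ in range(k)) / k


if __name__ == "__main__":
    # For this toy sampler the exact height is 2 * x - 1, so the sign flips at x = 0.5.
    for x in (0.2, 0.5, 0.8):
        print(x, k_sample_potential(x, k=2500))
```

In this toy setting the estimate tightens as `k` grows, which is the qualitative behaviour the summary attributes to the sampling approach.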

What carries the argument

Decision Potential Surface (DPS), a scalar field built from per-input class-separation confidence, whose zero contour directly traces the LLM decision boundary.

If this is right

Decision boundaries of mainstream LLMs become practically drawable instead of computationally impossible.
The size of the approximation error is explicitly controlled by the choice of sample count K.
Decision regions appear as the connected components bounded by the zero contour on the surface.
Error concentration bounds let users know how many samples are needed for a target accuracy level.
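
To make the last point concrete, here is how a concentration bound turns into a sample count. The sketch uses the generic Hoeffding inequality for an average of K independent bounded values, not the paper's own bounds, which are not reproduced in this summary. The function name `hoeffding_samples` and the default value range of 2 (a height confined to [-1, 1]) are assumptions.

```python
import math


def hoeffding_samples(eps: float, delta: float, value_range: float = 2.0) -> int:
    """Smallest K such that 2 * exp(-2 * K * eps**2 / value_range**2) <= delta.

    Hoeffding's inequality then guarantees that the average of K independent
    samples, each confined to an interval of width value_range, deviates from
    its mean by eps or more with probability at most delta.
    """
    if eps <= 0 or not (0 < delta < 1) or value_range <= 0:
        raise ValueError("need eps > 0, 0 < delta < 1, value_range > 0")
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * eps ** 2))


if __name__ == "__main__":
    # Sample count for a height error below 0.1 with 95% confidence.
    print(hoeffding_samples(eps=0.1, delta=0.05))
```

Because the count scales with 1 / eps², halving the target error quadruples the required number of samples under this generic bound.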

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The surface view could be used to locate inputs that sit near the boundary and are therefore most vulnerable to small perturbations.
Applying the same construction to vision-language models might reveal whether their decision surfaces share similar geometric traits.
Tracking how the approximated boundary moves when a model is fine-tuned would give a direct measure of behavioral change.

Load-bearing premise

The per-input values used to construct the surface can be estimated from finite autoregressive samples without breaking the exact geometric match between the zero contour and the true decision boundary.

What would settle it

Enumerate the exact decision boundary for a small autoregressive model on a closed input domain, build the corresponding DPS from the same model, and check whether the zero contour lies precisely on the enumerated boundary; any measurable mismatch falsifies the claimed equivalence.
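
A minimal sketch of such a check on a toy model follows. Everything in it is hypothetical: the two-token vocabulary, the three-token outputs, the next-token rule in `token_prob`, the majority-vote class labels, and the use of p(A | x) − p(B | x) as the surface height. It only shows the shape of the test, not the paper's construction.

```python
from itertools import product

VOCAB = (0, 1)
LENGTH = 3  # every output sequence has exactly three tokens


def token_prob(token: int, prev: int, x: float) -> float:
    """Toy next-token rule: P(token = 1) rises with the scalar input x in [0, 1]."""
    p_one = (0.2 if prev == 1 else 0.1) + 0.6 * x
    return p_one if token == 1 else 1.0 - p_one


def sequence_prob(seq, x: float) -> float:
    """Autoregressive probability of a full sequence (start state prev = 0)."""
    prob, prev = 1.0, 0
    for tok in seq:
        prob *= token_prob(tok, prev, x)
        prev = tok
    return prob


def class_of(seq) -> int:
    """Class A (+1) if ones are the majority, else class B (-1)."""
    return 1 if 2 * sum(seq) > len(seq) else -1


def exact_potential(x: float) -> float:
    """p(A | x) - p(B | x), summed over all 2 ** LENGTH sequences."""
    return sum(class_of(s) * sequence_prob(s, x) for s in product(VOCAB, repeat=LENGTH))


def exact_decision(x: float) -> int:
    """Enumerated decision: the class holding more than half the probability mass."""
    p_a = sum(sequence_prob(s, x) for s in product(VOCAB, repeat=LENGTH) if class_of(s) == 1)
    return 1 if p_a > 0.5 else -1


def zero_crossing(lo: float = 0.0, hi: float = 1.0, tol: float = 1e-9) -> float:
    """Bisection for the zero of exact_potential, assuming a sign change on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exact_potential(lo) * exact_potential(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    boundary = zero_crossing()
    print("zero contour at x =", boundary)
    print("decisions on either side:", exact_decision(boundary - 0.01), exact_decision(boundary + 0.01))
```

In this toy the zero of the height and the enumerated decision flip coincide by construction, so the informative version of the test is the one where the height is estimated from finite samples rather than computed exactly.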

Figures

Figures reproduced from arXiv: 2510.03271 by Haibo Hu, Haoyang Shang, Huadi Zheng, Peizhao Hu, Qingqing Ye, Yulin Jin, Zhiyao Wu, Zi Liang.

**Figure 1.** Effect of sampling size K on the values of the decision potential function, with each blue point representing the K-DPS value for a single input sample. Each blue line represents the trend of K-DPS for one input sample. [Plot residue at this position: panel "Absolute Error vs K", legend "Mean Absolute Error", x-axis K, y-axis Absolute Error.]

**Figure 2.** Effect of sampling size K on the absolute error between the reference K-DPS (computed with K = 20,000) and K-DPS values for varying K. Each blue line represents the trend of absolute error across input samples.

**Figure 3.** Empirical concentration experiments with different …

**Figure 4.** Contour visualization of the K-DPS (K = 2,500) for Llama-3.2-1B on four datasets. Region colors represent the decision potential values. Black lines denote isohypses, with the 0-isohypse indicating the decision boundary. Cubic interpolation is applied to construct the mesh grid, with visualizations using linear and nearest interpolation shown in Figures 6 and 7.

**Figure 5.** Three-dimensional visualization of the K-DPS (K = 2,500) for Llama-3.2-1B on four datasets. [Plot residue at this position: panel "wikipedia_mini", axes Dimension 1 and Dimension 2.]

**Figure 6.** Contour visualization of 2,500-grained decision potential surface for Llama3.2-1B on four …

**Figure 7.** Contour visualization of 2,500-grained decision potential surface of Llama3.2-1B on four …

**Figure 8.** Heatmap visualization of the decision potential surface on four datasets.

Original abstract

The decision boundary, the subspace of inputs where a machine learning model assigns equal classification probabilities to two classes, is pivotal in revealing core model properties and interpreting behaviors. While analyzing the decision boundary of large language models (LLMs) has attracted increasing attention recently, constructing it for mainstream LLMs remains computationally infeasible due to the enormous sequence-level output spaces and the autoregressive nature of LLMs. To address this issue, in this paper we propose Decision Potential Surface (DPS), a new notion for analyzing the properties of LLM decisions. DPS is derived from the confidence in distinguishing different classes for each input, which naturally captures the potential of the decision boundary. We prove that the zero-height isohypse in DPS is equivalent to the decision boundary of an LLM, with enclosed regions representing decision regions. By leveraging DPS, for the first time in the literature, we propose a practical decision boundary approximation algorithm, namely K-DPS, which requires only K finite sequence samples to approximate an LLM's decision boundary with negligible error. We theoretically derive the upper bounds for the absolute error, expected error, and the error concentration between K-DPS and the ideal DPS, demonstrating that such errors can be traded off against sampling times.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Decision Potential Surface (DPS), constructed from per-input class-distinguishing confidence scores, and proves that its zero-height isohypse coincides with an LLM's decision boundary while enclosed regions represent decision regions. It then proposes the K-DPS algorithm, which approximates the boundary using only K finite sequence samples, and derives theoretical upper bounds on absolute error, expected error, and error concentration between the sampled and ideal DPS surfaces.

Significance. If the equivalence proof and geometric error control hold, the work would offer a novel, sampling-efficient framework for analyzing otherwise intractable decision boundaries in autoregressive LLMs, with potential value for interpretability and safety. The explicit derivation of error bounds on the surface is a positive feature that could support practical use, though significance hinges on closing the gap between height approximation and level-set fidelity.

major comments (2)

[DPS derivation and equivalence proof] Abstract and DPS derivation section: the manuscript asserts a proof that the zero-height isohypse is equivalent to the LLM decision boundary but provides none of the intermediate steps, the precise definition of how per-input confidence is extracted from autoregressive token probabilities, or the treatment of variable sequence lengths. These omissions render the central equivalence claim unverifiable.
[Error bounds for K-DPS] Error bounds section (K-DPS analysis): the upper bounds are stated for pointwise or uniform deviation in surface height. However, without an explicit lower bound on |∇DPS| near the zero contour or a Lipschitz constant on the surface, these bounds do not control the geometric displacement of the zero-isohypse itself; arbitrarily small height errors can produce arbitrarily large shifts of the decision boundary in the discrete, high-dimensional sequence space.
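
The concern in the second major comment can be seen in a one-dimensional toy case. The sketch below is an illustration under assumed functions (`steep`, `flat`) and an assumed uniform height error `eps`; it does not use the paper's surface or its bounds, and it works on a continuous interval rather than a discrete sequence space.

```python
def bisect_root(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Bisection root finder; assumes f is increasing with f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def root_shift(f, eps: float) -> float:
    """Distance between the zero of f and the zero of f - eps on [-1, 1]."""
    true_root = bisect_root(f, -1.0, 1.0)
    shifted_root = bisect_root(lambda x: f(x) - eps, -1.0, 1.0)
    return abs(shifted_root - true_root)


def steep(x: float) -> float:
    """Height with slope 1 at its zero crossing."""
    return x


def flat(x: float) -> float:
    """Height with slope 0 at its zero crossing."""
    return x ** 3


if __name__ == "__main__":
    eps = 1e-6  # same height error applied to both surfaces
    print("steep:", root_shift(steep, eps))  # close to eps
    print("flat: ", root_shift(flat, eps))   # close to eps ** (1 / 3)
```

The same height error moves the zero crossing by roughly `eps` on the steep surface and by roughly the cube root of `eps` on the flat one, which is why a height bound alone does not pin down the contour.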

minor comments (2)

[Notation and definitions] Clarify notation for the confidence function used to build DPS and specify whether it is normalized across classes or derived directly from log-probabilities.
[K-DPS algorithm description] Add a brief discussion of how K is chosen in practice and whether the bounds remain useful when the surface contains flat regions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the clarity of our central claims and the relationship between surface approximation and level-set geometry. We address each major comment in turn and outline the revisions we will make to strengthen the manuscript.

Referee: Abstract and DPS derivation section: the manuscript asserts a proof that the zero-height isohypse is equivalent to the LLM decision boundary but provides none of the intermediate steps, the precise definition of how per-input confidence is extracted from autoregressive token probabilities, or the treatment of variable sequence lengths. These omissions render the central equivalence claim unverifiable.

Authors: We agree that additional detail is required for verifiability. The per-input confidence score is obtained by taking the difference between the summed log-probabilities of tokens belonging to each of the two classes, normalized by sequence length to accommodate variable lengths. The equivalence proof proceeds by showing that the sign of this score determines the model's argmax class and that the zero level set is exactly the set where the two class probabilities are equal. We will expand the DPS derivation section with the full sequence of intermediate steps, the explicit extraction formula from the autoregressive token distribution, and the normalization procedure for variable-length sequences. revision: yes
Referee: Error bounds section (K-DPS analysis): the upper bounds are stated for pointwise or uniform deviation in surface height. However, without an explicit lower bound on |∇DPS| near the zero contour or a Lipschitz constant on the surface, these bounds do not control the geometric displacement of the zero-isohypse itself; arbitrarily small height errors can produce arbitrarily large shifts of the decision boundary in the discrete, high-dimensional sequence space.

Authors: The referee correctly identifies that uniform control on height error alone does not automatically yield a bound on the displacement of the zero contour when the gradient may approach zero. Our current analysis focuses on the approximation quality of the DPS surface itself, which is the quantity directly estimated by K-DPS. To address the geometric fidelity of the recovered boundary, we will add a new subsection that (i) states the additional assumption that |∇DPS| is bounded below by a positive constant in a neighborhood of the zero contour (a mild transversality condition that holds for typical LLM decision surfaces) and (ii) derives an explicit upper bound on the Hausdorff distance between the true and approximated zero level sets under this assumption. We will also include a brief discussion of the discrete nature of the sequence space and how the sampling procedure respects it. revision: partial
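
A one-dimensional worked example shows what the extra gradient assumption buys. It is an illustrative sketch only, not the bound the authors promise: the symbols f (ideal height), g (its approximation), m (slope lower bound), and ε (uniform height error) are introduced here for the example.

```latex
\textbf{Setting.} Let $f : I \to \mathbb{R}$ be differentiable on an interval $I$
with $f'(x) \ge m > 0$ on $I$, and let $f(x^\ast) = 0$ for some $x^\ast \in I$.
Let $g : I \to \mathbb{R}$ be continuous with
$\sup_{x \in I} |g(x) - f(x)| \le \varepsilon$, and assume
$[x^\ast - \varepsilon/m,\; x^\ast + \varepsilon/m] \subset I$.

\textbf{Existence.} By the mean value theorem,
\[
  f(x^\ast + \varepsilon/m) \ge \varepsilon
  \quad\text{and}\quad
  f(x^\ast - \varepsilon/m) \le -\varepsilon ,
\]
so $g(x^\ast + \varepsilon/m) \ge 0 \ge g(x^\ast - \varepsilon/m)$, and the
intermediate value theorem gives a zero of $g$ in that interval.

\textbf{Localization.} If $g(\hat{x}) = 0$ for some $\hat{x} \in I$, then
\[
  m\,|\hat{x} - x^\ast|
  \le |f(\hat{x}) - f(x^\ast)|
  = |f(\hat{x}) - g(\hat{x})|
  \le \varepsilon
  \quad\Longrightarrow\quad
  |\hat{x} - x^\ast| \le \frac{\varepsilon}{m}.
\]
```

Without the lower bound m the localization step fails: for f(x) = x³ a height error ε moves the zero by ε^(1/3), which is far larger than ε when ε is small.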

Circularity Check

1 step flagged

DPS zero-isohypse equivalence to LLM decision boundary holds by construction from the confidence definition

specific steps

self-definitional [Abstract]
"DPS is derived from the confidence in distinguishing different classes for each input, which naturally captures the potential of the decision boundary. We prove that the zero-height isohypse in DPS is equivalent to the decision boundary of an LLM, with enclosed regions representing decision regions."

The surface height is defined directly from the class-distinguishing confidence value. Consequently the locus of height zero is exactly the locus where confidence equals zero, which coincides with the definition of the decision boundary (equal probabilities). The stated 'proof' of equivalence therefore follows immediately from the construction of DPS rather than from any additional geometric or probabilistic argument.
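
The reduction can be written out in one line. The sketch assumes, hypothetically, that the surface height is the difference of a strictly increasing transform φ applied to the two class probabilities p₁(x) and p₂(x); the paper's actual definition is not reproduced in this excerpt.

```latex
\[
  \mathrm{DPS}(x) = \varphi\bigl(p_1(x)\bigr) - \varphi\bigl(p_2(x)\bigr),
  \qquad \varphi \text{ strictly increasing}
\]
\[
  \mathrm{DPS}(x) = 0
  \iff \varphi\bigl(p_1(x)\bigr) = \varphi\bigl(p_2(x)\bigr)
  \iff p_1(x) = p_2(x).
\]
```

Under that assumption the equivalence needs only the injectivity of φ, which is the sense in which the step is flagged as definitional.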

full rationale

The central theoretical claim reduces to a definitional equivalence rather than an independent derivation. DPS is explicitly built from per-input class-distinguishing confidence; its zero contour is therefore the set where that confidence is zero, which is the standard definition of the decision boundary (equal class probabilities). The paper presents this as a proof, but the reduction is tautological from the surface construction. The subsequent K-DPS sampling and error bounds address surface height deviation, not level-set geometry, leaving the headline boundary-approximation claim partially dependent on the definitional step. No self-citations or fitted parameters appear in the provided excerpts, so the circularity is limited to this one load-bearing definitional move.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claims rest on a newly introduced surface construct and standard assumptions about probability outputs of LLMs; no external data or fitted constants are invoked in the abstract.

free parameters (1)

K
Number of finite sequence samples; controls the error-size tradeoff but is not fitted to data.

axioms (1)

Domain assumption: class-distinguishing confidence can be defined and extracted for any input sequence from an autoregressive LLM.
Invoked when DPS is derived from confidence scores.
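
The simulated rebuttal above describes one possible extraction: a difference of summed token log-probabilities for the two classes, normalized by sequence length. The sketch below is one reading of that sentence, not the paper's formula. The function names and the choice to normalize each class's sequence by its own length are assumptions.

```python
import math
from typing import Sequence


def length_normalized_logprob(token_logprobs: Sequence[float]) -> float:
    """Mean per-token log-probability of one candidate sequence."""
    if not token_logprobs:
        raise ValueError("sequence must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def confidence_score(logprobs_a: Sequence[float], logprobs_b: Sequence[float]) -> float:
    """Length-normalized log-probability gap between a class-A and a class-B sequence.

    Positive values favour class A, negative values favour class B.
    """
    return length_normalized_logprob(logprobs_a) - length_normalized_logprob(logprobs_b)


if __name__ == "__main__":
    # Hypothetical per-token probabilities for two candidate continuations.
    class_a = [math.log(0.9), math.log(0.8)]
    class_b = [math.log(0.1), math.log(0.5), math.log(0.5)]
    print(confidence_score(class_a, class_b))
```

Under this reading a zero score means the two sequences have equal geometric-mean token probability, which matches equal sequence probability only when the two lengths are the same.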

invented entities (1)

Decision Potential Surface (DPS) · no independent evidence
purpose: Surface whose height encodes class-distinguishing confidence and whose zero contour is the decision boundary.
Newly defined mathematical object with no independent external evidence supplied.

pith-pipeline@v0.9.0 · 5768 in / 1483 out tokens · 43158 ms · 2026-05-22T13:18:13.360087+00:00 · methodology
