Graph Interpolating Activation Improves Both Natural and Robust Accuracies in Data-Efficient Deep Learning

Bao Wang; Stanley J. Osher

arXiv: 1907.06800 · v1 · pith:WVUTI2QMnew · submitted 2019-07-16 · cs.LG · cs.NA · math.NA · stat.ML

Pith reviewed 2026-05-24 21:10 UTC · model grok-4.3

classification cs.LG · cs.NA · math.NA · stat.ML

keywords graph interpolating activation · data-efficient learning · adversarial robustness · semi-supervised learning · Laplace-Beltrami equation · manifold learning · deep neural networks

The pith

Replacing softmax with a graph Laplacian interpolator raises both natural accuracy and adversarial robustness for DNNs trained on limited data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper replaces the usual softmax output layer in deep neural networks with a high-dimensional interpolating function built from the graph Laplacian. In the continuum this function solves a Laplace-Beltrami equation on the data manifold. The resulting networks are shown to train effectively with far fewer labeled examples than standard architectures. They also record higher accuracy on clean test images and higher accuracy against both white-box and black-box adversarial examples. The same change supplies a direct route to semi-supervised learning.

Core claim

The central claim is that a DNN whose final activation is the graph Laplacian interpolator, rather than softmax, integrates manifold geometry into the output layer and thereby improves both natural accuracy on clean images and robust accuracy on adversarially perturbed images, with the gains being largest when the training set is small.

What carries the argument

The graph Laplacian-based high-dimensional interpolating function that replaces softmax and converges to the solution of a Laplace-Beltrami equation on the data manifold.
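To make the mechanism concrete, here is a minimal NumPy sketch of graph-Laplacian label interpolation. It is an illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the bandwidth `sigma`, the dense fully connected graph, and the function name `harmonic_interpolate` are all hypothetical choices, and the WNLL variant named in the figure captions is not reproduced here. Labeled feature vectors act as boundary values, and class scores at the unlabeled points come from solving the linear system that makes the scores harmonic on the graph.

```python
import numpy as np

def harmonic_interpolate(feats_labeled, labels_onehot, feats_unlabeled, sigma=1.0):
    """Extend one-hot labels to unlabeled points by a graph-Laplacian solve.

    Assumption: dense Gaussian-weighted graph over all points (hypothetical choice).
    """
    X = np.vstack([feats_labeled, feats_unlabeled])
    n_l = feats_labeled.shape[0]

    # Pairwise squared Euclidean distances, clipped to guard against round-off.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian edge weights with no self-loops.
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Harmonic condition on unlabeled nodes: L_uu U + L_ul Y = 0.
    L_uu = L[n_l:, n_l:]
    L_ul = L[n_l:, :n_l]
    return np.linalg.solve(L_uu, -L_ul @ labels_onehot)

# Toy data: one labeled point per class and one unlabeled point near each.
feats_labeled = np.array([[0.0, 0.0], [5.0, 5.0]])
labels_onehot = np.eye(2)
feats_unlabeled = np.array([[0.1, 0.0], [5.0, 4.9]])
scores = harmonic_interpolate(feats_labeled, labels_onehot, feats_unlabeled)
predicted = scores.argmax(axis=1)
```

With one-hot labels, each row of `scores` sums to one and is non-negative up to round-off, so it can be read like a softmax output. Unlike softmax, it depends on the labeled data and on the geometry of the feature cloud rather than on the logits alone.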

If this is right

High-capacity networks become usable with training sets an order of magnitude smaller than current practice.
Robustness to both white-box and black-box attacks improves without extra adversarial training.
The architecture supplies a built-in mechanism for incorporating unlabeled data in semi-supervised regimes.
End-to-end training and inference algorithms remain essentially unchanged from standard DNN pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method effectively embeds a discrete manifold-learning step inside the final layer, offering a tighter coupling than typical manifold-regularization add-ons.
Because the interpolator is data-dependent, it may adapt automatically to distribution shift between training and test sets.
The same construction could be applied to intermediate layers to propagate geometric information deeper into the network.

Load-bearing premise

The graph Laplacian interpolator can be inserted as the output activation of a standard DNN and trained end-to-end without introducing instabilities or prohibitive extra cost.

What would settle it

Train identical DNNs on a small labeled subset of CIFAR-10 or SVHN, once with the new activation and once with softmax, then compare clean test accuracy and accuracy under FGSM or PGD attacks; if the graph version shows no consistent gain the claim fails.
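The attack half of that comparison can be sketched in a few lines. The example below applies one-step FGSM to a binary logistic model and measures accuracy before and after. It is a stand-in only: the linear model, the toy data, the step size `eps=0.6`, and the helper names `accuracy` and `fgsm` are hypothetical, and nothing here reproduces the paper's networks, datasets, or attack settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy(x, y, w, b):
    """Fraction of rows of x that the linear model classifies correctly."""
    pred = (x @ w + b > 0).astype(float)
    return float(np.mean(pred == y))

def fgsm(x, y, w, b, eps):
    """One-step FGSM: shift each input by eps along the sign of the loss gradient."""
    p = sigmoid(x @ w + b)
    # Gradient of the binary cross-entropy loss with respect to the input x.
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

# Hypothetical toy model and data, chosen only to make the effect visible.
w = np.array([1.0, -1.0])
b = 0.0
x = np.array([[0.5, -0.5], [-0.5, 0.5]])
y = np.array([1.0, 0.0])

clean_acc = accuracy(x, y, w, b)
robust_acc = accuracy(fgsm(x, y, w, b, eps=0.6), y, w, b)
```

In the experiment described above, the same two numbers would be computed once for the softmax network and once for the graph-interpolating network, with the attack run against each model's own gradients in the white-box case.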

Figures

Figures reproduced from arXiv: 1907.06800 by Bao Wang, Stanley J. Osher.

**Figure 2.** Illustration of training and testing procedures of the DNN with the WNLL inter...

**Figure 3.** Plots of test errors when 1K (a) and 10K (b) training data are used to train the...

**Figure 4.** Evolution of the generation accuracy over the training procedure. Charts (a) and...

**Figure 5.** Adversarial images (left panel) selected from the MNIST dataset and the corre...

**Figure 6.** Adversarial images (left panel) selected from the CIFAR10 dataset and the corre...

**Figure 7.** Epochs vs. accuracy in training ResNet56 on the CIFAR10. (a): without the...

**Figure 8.** Visualization of the features learned by ResNet56 with the softmax ((a), (b)) and...

**Figure 9.** Visualization of the first two principal components of the adversarial images’...

**Figure 10.** A randomly selected adversarial image and their top five nearest neighbors in...

Original abstract

Improving the accuracy and robustness of deep neural nets (DNNs) and adapting them to small training data are primary tasks in deep learning research. In this paper, we replace the output activation function of DNNs, typically the data-agnostic softmax function, with a graph Laplacian-based high dimensional interpolating function which, in the continuum limit, converges to the solution of a Laplace-Beltrami equation on a high dimensional manifold. Furthermore, we propose end-to-end training and testing algorithms for this new architecture. The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning. Compared to the conventional DNNs with the softmax function as output activation, the new framework demonstrates the following major advantages: First, it is better applicable to data-efficient learning in which we train high capacity DNNs without using a large number of training data. Second, it remarkably improves both natural accuracy on the clean images and robust accuracy on the adversarial images crafted by both white-box and black-box adversarial attacks. Third, it is a natural choice for semi-supervised learning. For reproducibility, the code is available at https://github.com/BaoWangMath/DNN-DataDependentActivation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Graph Laplacian output activation is a fresh architectural move that could help low-data robustness, but the abstract leaves the training mechanics and actual gains unverified.

The core idea is to drop softmax and put a graph-Laplacian interpolator in its place, so that the output layer solves something close to a Laplace-Beltrami problem on the data manifold. That is the actual novelty here, and it is a direct attempt to bake manifold geometry into the network rather than add it as a separate regularizer. The authors also sketch end-to-end training and testing procedures and release the code, which is useful for anyone who wants to test the claim. Those are the concrete positives. The paper targets exactly the settings where labeled data is scarce and adversarial robustness matters, so the motivation lines up with real needs in the field.

The soft spots are more substantial. The abstract supplies no numbers, no baselines, and no error bars, so the statement that the method “remarkably improves” both natural and robust accuracy cannot be checked. More importantly, the question of whether the graph solve is stably differentiable, and what it costs, is not addressed in the provided text. If the Laplacian is rebuilt from features at each step, or if the linear solve is done naively, the method could introduce optimization artifacts or scale poorly; nothing in the abstract rules that out. The semi-supervised angle is mentioned but not developed enough to judge either.

This is the kind of paper that would interest people working on activations, semi-supervised vision, or manifold-regularized networks. A reader who already experiments with custom output layers could get practical value from the code and the geometric framing. The paper is coherent on its own terms and shows honest engagement with the literature, so it deserves a serious referee who can examine the experiments and the implementation details. I would send it to review rather than desk-reject.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes replacing the standard softmax output activation in DNNs with a graph Laplacian-based high-dimensional interpolating function that, in the continuum limit, solves a Laplace-Beltrami equation on the data manifold. It introduces end-to-end training and testing procedures for this architecture and claims three advantages over conventional DNNs: improved applicability to data-efficient regimes, higher natural accuracy on clean data and robust accuracy under white- and black-box adversarial attacks, and natural suitability for semi-supervised learning. Reproducible code is provided.

Significance. If the claimed accuracy and robustness gains are shown to be statistically significant, reproducible across architectures, and not artifacts of altered optimization dynamics, the work would provide a concrete mechanism for injecting manifold geometry into the output layer of deep networks. The explicit provision of code strengthens the contribution by enabling direct verification of the end-to-end differentiability claim.

major comments (3)

[Section 3 (training algorithm)] The central claim that the graph interpolant can be stably integrated into the output layer and trained end-to-end with SGD-style optimizers rests on unverified assumptions about differentiability. The manuscript must supply the explicit back-propagation rule through the graph-Laplacian solve (or pseudoinverse) and demonstrate that the resulting gradients remain well-conditioned for standard mini-batch sizes; without this, reported gains could arise from an incidental change in the loss landscape rather than the manifold property itself.
[Section 4 (experiments)] No quantitative results, error bars, or baseline comparisons appear in the abstract, and the full text must include tables that report natural and robust accuracy (with standard deviations over multiple runs) against at least ResNet- and VGG-style softmax baselines on CIFAR-10/100 and ImageNet subsets for the data-efficient regime. The absence of these numbers makes it impossible to assess whether the claimed improvements are load-bearing or marginal.
[Section 2 (graph interpolating activation)] The construction of the graph Laplacian from high-dimensional features is described only at a high level; the paper must specify whether the Laplacian is recomputed every epoch from the current mini-batch embeddings or held fixed, and must quantify the additional per-iteration cost relative to standard softmax. If the cost scales with batch size squared, the data-efficiency advantage may be offset by computational overhead.
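For readers unfamiliar with what a back-propagation rule through a linear solve looks like, the sketch below states the textbook adjoint identity for a generic system `A u = b` and checks it against central finite differences. The source does not say how the paper differentiates through its interpolant, so this is not the authors' rule; the function names, the test matrix, and the linear loss `c . u` are hypothetical.

```python
import numpy as np

def solve_backward(A, u, grad_u):
    """Given u = solve(A, b) and grad_u = dLoss/du, return (dLoss/dA, dLoss/db)."""
    grad_b = np.linalg.solve(A.T, grad_u)   # dLoss/db = A^{-T} grad_u
    grad_A = -np.outer(grad_b, u)           # dLoss/dA = -(dLoss/db) u^T
    return grad_A, grad_b

def finite_difference_gap(n=4, h=1e-6, seed=0):
    """Largest absolute gap between adjoint gradients and central differences."""
    rng = np.random.default_rng(seed)
    # Strictly diagonally dominant, hence invertible and well conditioned.
    A = rng.uniform(-1.0, 1.0, size=(n, n)) + 2.0 * n * np.eye(n)
    b = rng.normal(size=n)
    c = rng.normal(size=n)                  # loss(A, b) = c . solve(A, b)

    def loss(A_, b_):
        return float(c @ np.linalg.solve(A_, b_))

    u = np.linalg.solve(A, b)
    grad_A, grad_b = solve_backward(A, u, c)

    gap = 0.0
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        fd = (loss(A, b + e) - loss(A, b - e)) / (2.0 * h)
        gap = max(gap, abs(fd - grad_b[i]))
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = h
            fd = (loss(A + E, b) - loss(A - E, b)) / (2.0 * h)
            gap = max(gap, abs(fd - grad_A[i, j]))
    return gap
```

The backward pass costs one extra solve with the transposed matrix, so its conditioning matches that of the forward solve. A check of this kind, run on the actual graph matrices at the batch sizes used, is one way the conditioning request in the first major comment could be addressed.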

minor comments (2)

Notation for the graph Laplacian matrix and its pseudoinverse should be introduced with an explicit equation number and kept consistent between the theoretical derivation and the algorithmic pseudocode.
The abstract states that the method 'remarkably improves' both accuracies; the results section should replace this phrasing with precise percentage-point gains relative to the softmax baseline.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and will revise the manuscript accordingly to improve clarity, rigor, and completeness.

Referee: [Section 3 (training algorithm)] The central claim that the graph interpolant can be stably integrated into the output layer and trained end-to-end with SGD-style optimizers rests on unverified assumptions about differentiability. The manuscript must supply the explicit back-propagation rule through the graph-Laplacian solve (or pseudoinverse) and demonstrate that the resulting gradients remain well-conditioned for standard mini-batch sizes; without this, reported gains could arise from an incidental change in the loss landscape rather than the manifold property itself.

Authors: We agree that explicit details on differentiability are required. The accompanying code implements the graph-Laplacian solve (via pseudoinverse) and its backward pass using automatic differentiation. In the revised manuscript we will add an explicit derivation of the back-propagation rule through the linear solve and include numerical verification that gradient norms remain well-conditioned for the batch sizes employed in the experiments. This will confirm that the reported gains stem from the manifold geometry rather than incidental optimization effects. revision: yes
Referee: [Section 4 (experiments)] No quantitative results, error bars, or baseline comparisons appear in the abstract, and the full text must include tables that report natural and robust accuracy (with standard deviations over multiple runs) against at least ResNet- and VGG-style softmax baselines on CIFAR-10/100 and ImageNet subsets for the data-efficient regime. The absence of these numbers makes it impossible to assess whether the claimed improvements are load-bearing or marginal.

Authors: We will revise the abstract to include key quantitative highlights. In Section 4 we will add tables that report natural and robust accuracies together with standard deviations computed over multiple independent runs, and we will include direct comparisons against ResNet- and VGG-style softmax baselines on CIFAR-10/100 and ImageNet subsets in the data-efficient regime. These additions will enable a clear statistical assessment of the improvements. revision: yes
Referee: [Section 2 (graph interpolating activation)] The construction of the graph Laplacian from high-dimensional features is described only at a high level; the paper must specify whether the Laplacian is recomputed every epoch from the current mini-batch embeddings or held fixed, and must quantify the additional per-iteration cost relative to standard softmax. If the cost scales with batch size squared, the data-efficiency advantage may be offset by computational overhead.

Authors: We will expand Section 2 to state explicitly that the graph Laplacian is built from the current mini-batch embeddings and is recomputed at every training iteration. We will also add a complexity analysis together with empirical timing measurements that quantify the additional per-iteration cost relative to softmax; the dominant term is the linear solve whose size equals the batch size. These details will allow readers to evaluate the computational trade-off against the observed data-efficiency gains. revision: yes
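The cost question in the last exchange can be made concrete with a rough operation count. The sketch below assumes a dense graph over the mini-batch and a direct LU solve, neither of which is confirmed by the source; the function name `per_iteration_flops` and the leading-order constants are illustrative only, and a sparse or nearest-neighbor graph would change the picture.

```python
def per_iteration_flops(batch_size, feat_dim, n_classes):
    """Rough leading-order flop counts for one forward pass of the output layer.

    Assumptions (hypothetical): dense graph over the batch, direct LU solve.
    """
    B, d, C = batch_size, feat_dim, n_classes
    graph_build = 2 * B * B * d                     # Gram matrix for pairwise distances
    graph_solve = (2 * B**3) // 3 + 2 * B * B * C   # LU factorization plus C right-hand sides
    softmax = 3 * B * C                             # exponentiate, sum, divide
    return {"graph_build": graph_build, "graph_solve": graph_solve, "softmax": softmax}

# Example with hypothetical sizes: compare two batch sizes for a 10-class problem.
small = per_iteration_flops(batch_size=128, feat_dim=64, n_classes=10)
large = per_iteration_flops(batch_size=256, feat_dim=64, n_classes=10)
solve_growth = large["graph_solve"] / small["graph_solve"]
```

Under these assumptions, doubling the batch size doubles the softmax cost but multiplies the solve cost by a factor between four and eight, which is why the referee asks for measured timings rather than an asymptotic statement.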

Circularity Check

0 steps flagged

No circularity; architectural proposal with empirical validation

The paper introduces a graph-Laplacian interpolating activation as a direct replacement for softmax, justified by its continuum limit to the Laplace-Beltrami equation and supported by proposed end-to-end algorithms. No derivation step equates a claimed prediction or result to its own fitted inputs or self-citations by construction. Advantages in accuracy and data efficiency are framed as empirical outcomes rather than tautological identities. The central claims rest on experimental comparisons, not on re-deriving inputs from outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated.

