Why CNN Features Are not Gaussian: A Statistical Anatomy of Deep Representations

David Chapman; Parniyan Farvardin

arxiv: 2411.05183 · v4 · submitted 2024-11-07 · 💻 cs.CV · cs.LG

Pith reviewed 2026-05-23 17:02 UTC · model grok-4.3

keywords CNN features · statistical distribution · Weibull distribution · tail dependence · deep representations · non-Gaussian · copula modeling · Matthew process

The pith

CNN feature activations deviate substantially from Gaussian and follow long-tailed Weibull distributions instead.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the common assumption that internal activations in convolutional neural networks follow Gaussian distributions. Systematic measurements across multiple architectures and datasets show that the activations instead exhibit long tails best matched by Weibull and related families. Tail length grows with network depth while upper-tail dependence appears between pairs of features. These patterns contradict expectations from the central limit theorem and point instead to a process that concentrates semantic information in the extremes. The results indicate that networks built this way reduce noise effectively but handle outliers poorly, so density models should use long-tailed priors with upper-tail dependence rather than Gaussians.

Core claim

Deep convolutional neural networks produce internal feature activations whose distributions are substantially non-Gaussian and instead follow long-tailed families such as the Weibull. A new Discretized Characteristic Function Copula method reveals increasing tail length with depth and the emergence of upper-tail dependence between feature pairs. These patterns indicate a Matthew process that concentrates semantic signal in the tails, making the networks effective at noise reduction but less so at handling outliers.

What carries the argument

The Discretized Characteristic Function Copula (DCF-Copula) method, which models multivariate feature dependencies and exposes upper-tail dependence not captured by Gaussian assumptions.
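
For context, the standard copula definitions behind this statement can be written out. This is textbook background (Sklar's theorem and the upper-tail dependence coefficient), not the paper's DCF-Copula construction; the symbols $F_i$, $C$, and $\lambda_U$ are notation assumed here rather than taken from the paper.

```latex
% Sklar's theorem: a joint CDF factors into its marginals and a copula C.
F(x_1, x_2) = C\bigl(F_1(x_1),\, F_2(x_2)\bigr)

% Upper-tail dependence coefficient of the pair (X_1, X_2).
\lambda_U = \lim_{q \to 1^-} \Pr\bigl[F_1(X_1) > q \mid F_2(X_2) > q\bigr]
          = \lim_{q \to 1^-} \frac{1 - 2q + C(q, q)}{1 - q}

% A bivariate Gaussian copula with correlation \rho < 1 has no upper-tail dependence.
\lambda_U^{\mathrm{Gauss}} = 0 \quad \text{for } \rho < 1
```

Since a Gaussian copula gives $\lambda_U = 0$ whenever its correlation is below one, a Gaussian dependence model cannot reproduce a nonzero limiting upper-tail dependence between features, however strong the fitted correlation.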

If this is right

CNNs reduce noise effectively yet perform poorly on outlier removal tasks.
Long-tailed upper-tail-dependent priors should replace Gaussian priors when modeling deep feature densities.
Tail length increases with network depth.
Upper-tail dependence emerges between feature pairs as depth grows.
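
The upper-tail dependence mentioned in this list can be estimated at a finite quantile level with a simple exceedance count. The sketch below is a generic empirical estimator, not the paper's DCF-Copula; the function names, the quantile level `q=0.95`, and the synthetic Pareto "shared shock" data are all assumptions chosen for illustration.

```python
import random

def upper_tail_dependence(xs, ys, q=0.95):
    """Empirical P(y above its q-quantile | x above its q-quantile)."""
    n = len(xs)
    x_thresh = sorted(xs)[int(q * n)]
    y_thresh = sorted(ys)[int(q * n)]
    n_x = sum(1 for x in xs if x > x_thresh)
    joint = sum(1 for x, y in zip(xs, ys) if x > x_thresh and y > y_thresh)
    return joint / n_x if n_x else 0.0

def synthetic_pairs(n, shared_shock, seed=0):
    """Hypothetical feature pairs: heavy-tailed, with or without a common driver."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        if shared_shock:
            z = rng.paretovariate(2.0)          # common heavy-tailed component
            xs.append(z + rng.gauss(0.0, 0.5))
            ys.append(z + rng.gauss(0.0, 0.5))
        else:
            xs.append(rng.paretovariate(2.0))   # independent heavy-tailed features
            ys.append(rng.paretovariate(2.0))
    return xs, ys

if __name__ == "__main__":
    for shared in (True, False):
        xs, ys = synthetic_pairs(20000, shared)
        print("shared shock:", shared, "estimate:", round(upper_tail_dependence(xs, ys), 3))
```

For independent features the estimate should sit near `1 - q`, while pairs driven by a common heavy-tailed component stay well above it. On real activations, `xs` and `ys` would be two channels of the same layer evaluated on the same inputs.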

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar non-Gaussian tail behavior may appear in transformer or other non-convolutional deep networks.
Feature-based density estimation or generative models could gain accuracy by adopting these tail-dependent priors.
Outlier-sensitive applications that rely on deep features may require revised statistical assumptions.

Load-bearing premise

The empirical fits to Weibull and related families on the chosen architectures and datasets generalize beyond the tested cases and the observed tail behavior is not an artifact of the specific activation functions or normalization layers used.

What would settle it

Observing that feature activations across layers in a new deep CNN fit a Gaussian distribution closely on multiple standard datasets would contradict the central claim.
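
One way to probe this condition is to measure how far a set of activations sits from its best-fitting Gaussian. The sketch below computes a Kolmogorov-Smirnov distance against a Gaussian whose mean and standard deviation are estimated from the same sample. The function name and the synthetic inputs are assumptions; and because the parameters are fitted, the distance is only a descriptive score, not a calibrated p-value.

```python
import math
import random

def gaussian_ks_distance(xs):
    """Largest gap between the empirical CDF and the fitted Gaussian CDF."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    distance = 0.0
    for i, x in enumerate(sorted(xs)):
        cdf = 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
        # Compare against the empirical CDF just before and just after x.
        distance = max(distance, cdf - i / n, (i + 1) / n - cdf)
    return distance

if __name__ == "__main__":
    rng = random.Random(0)
    gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    long_tailed = [rng.weibullvariate(1.0, 0.7) for _ in range(5000)]  # scale 1.0, shape 0.7
    print("Gaussian sample:", round(gaussian_ks_distance(gaussian_like), 4))
    print("Weibull sample: ", round(gaussian_ks_distance(long_tailed), 4))
```

A distance close to zero across layers and datasets would be the kind of evidence that contradicts the central claim; a large distance on the nonzero activations is consistent with it.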

Figures

Figures reproduced from arXiv: 2411.05183 by David Chapman, Parniyan Farvardin.

**Figure 2.** Illustration of ResNet-18 (left) and VGG-19 (right) deep feature layers selected for …

**Figure 3.** Percent of nonzero features per layer. We now quantitatively compare the goodness of fit of five parametric distributions to the marginals of the non-zero features. The distributions that we compare are the uniform distribution, the Gaussian distribution, the gamma distribution, and the Weibull distribution. The optimal parameters of these distributions are determined using the method of stochastic hill c…

**Figure 4.** Histogram of marginal density for pre-trained ResNet-18 on Imagenette2 for features …

**Figure 5.** Quantitative goodness-of-fit of five standard distributions to the feature marginals for …

**Figure 6.** Tail parameter analysis for CIFAR-10, CIFAR-100, Imagenette2, and MNIST, across …

**Figure 7.** Empirical optimal Weibull θ tail parameters per layer (solid) versus theoretical estimates (dashed). We also observe that VGG-19 has significantly longer tails than comparable ResNet models for the deep intermediate layers. Across all datasets, all of the deep intermediate layers for VGG-19 exhibit long tails. For ResNet, only the deepest intermediate layer is long-tailed, with the exception of MNIST ResNe…

**Figure 8.** Select copula interdependence for pairwise features for 5 layers of ResNet-18 over Ima…

**Figure 9.** Select copula interdependence for pairwise features for 5 layers of ResNet-18 over Ima…

Abstract

Deep convolutional neural networks (CNNs) are commonly analyzed through geometric and linear-algebraic perspectives, yet the statistical distribution of their internal feature activations remains poorly understood. In many applications, deep features are implicitly treated as Gaussian when modeling densities. In this work, we empirically examine this assumption and show that it does not accurately describe the distribution of CNN feature activations. Through a systematic study across multiple architectures and datasets, we find that the feature activations deviate substantially from Gaussian and are better characterized by Weibull and related long-tailed distributions. We further introduce a novel Discretized Characteristic Function Copula (DCF-Copula) method to model multivariate feature dependencies. We find that tail length increases with network depth and that upper-tail dependence emerges between feature pairs. These statistical findings are not consistent with the Central Limit Theorem, and are instead indicative of a Matthew process that progressively concentrates semantic signal within the tails. These statistical findings suggest that CNNs are excellent at noise reduction, yet poor at outlier removal tasks. We recommend the use of long-tailed upper-tail-dependent priors as opposed to Gaussian priors for accurately modeling CNN deep feature density. Code available at https://github.com/dchapman-prof/DCF-Copula

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper documents long-tailed CNN features with depth-dependent tails and a new copula for dependence, but its dismissal of the CLT rests on shaky ground.

The main thing here is that CNN activations across several architectures are long-tailed rather than Gaussian, with tails getting heavier deeper in the network and clear upper-tail dependence between channels. They back this with fits to Weibull and similar families and introduce DCF-Copula to handle the multivariate part without assuming normality. That empirical pattern and the copula construction are the actual new pieces; prior work had noted non-Gaussianity in spots but not at this scale or with the depth trend tied to a modeling tool. The work is observational and cleanly separates the fitting from any derived claim, which is a plus. Code release helps too.

The soft spot is the CLT contrast. The paper states the tails are inconsistent with the Central Limit Theorem and point instead to a Matthew process. But CNN layers apply ReLUs and normalizations to spatially and channel-dependent inputs, so the usual CLT conditions for asymptotic normality are not met to begin with. Without a derivation or simulation showing what the CLT would actually predict under the real generative process, the contrast does not land as strongly as claimed. The Matthew-process reading is interpretive rather than required by the data.

Methodological details on sample sizes per layer, zero-handling, and multiple-testing correction are also thin in the abstract, though the full text may clarify them.

This is useful for people doing density estimation or uncertainty work on deep features who need better priors than Gaussians. Readers already skeptical of normality assumptions in vision models will find the empirical maps helpful. It is coherent on its own terms and engages the relevant literature, so it deserves a serious referee. I would send it for review and ask the authors to tighten the CLT section and test generalization on more recent architectures.
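
The gap the letter points to, what a CLT-style process would predict versus a rich-get-richer one, can be illustrated with a toy simulation. This is not the paper's experiment and not a model of a real CNN: it simply contrasts adding independent finite-variance factors with multiplying the same factors, which stands in here (as an assumption) for a Matthew-type process. The function names, the uniform factor range, and the depth of 20 are illustrative choices.

```python
import random

def skew_and_excess_kurtosis(xs):
    """Sample skewness and excess kurtosis (both near 0 for Gaussian data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def simulate(n_samples=20000, depth=20, seed=0):
    """Return (additive, multiplicative) outputs built from the same random factors."""
    rng = random.Random(seed)
    additive, multiplicative = [], []
    for _ in range(n_samples):
        factors = [rng.uniform(0.5, 1.5) for _ in range(depth)]
        additive.append(sum(factors))       # CLT regime: sum of independent terms
        product = 1.0
        for f in factors:                   # compounding regime: gains multiply
            product *= f
        multiplicative.append(product)
    return additive, multiplicative

if __name__ == "__main__":
    additive, multiplicative = simulate()
    print("additive       (skew, excess kurtosis):", skew_and_excess_kurtosis(additive))
    print("multiplicative (skew, excess kurtosis):", skew_and_excess_kurtosis(multiplicative))
```

The additive outputs come out close to symmetric with near-zero excess kurtosis, while the multiplicative outputs are strongly right-skewed. Whether real CNN layers behave more like the second case is exactly what a simulation of the actual generative process would need to establish.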

Referee Report

2 major / 1 minor

Summary. The paper empirically analyzes the statistical distributions of internal feature activations in CNNs across architectures and datasets. It claims these activations deviate substantially from Gaussianity and are better characterized by Weibull and other long-tailed families. The work introduces the Discretized Characteristic Function Copula (DCF-Copula) to capture multivariate dependencies, reports increasing tail length with depth and emerging upper-tail dependence between features, interprets the results as inconsistent with the Central Limit Theorem and instead indicative of a Matthew process that concentrates semantic signal in tails, and recommends long-tailed upper-tail-dependent priors over Gaussian ones for feature density modeling. Reproducible code is provided.

Significance. If the empirical distribution findings and tail-dependence results hold under scrutiny, the work supplies a useful statistical characterization of deep representations that questions the routine Gaussian assumption in density estimation and feature modeling tasks. The DCF-Copula is presented as a novel modeling tool for tail dependencies. Explicit code release supports reproducibility and verification of the reported fits.

major comments (2)

[Abstract / CLT discussion] The interpretive claim (Abstract and the section contrasting findings with the CLT) that the long-tailed behavior 'is not consistent with the Central Limit Theorem' is load-bearing for the Matthew-process interpretation, yet the manuscript provides no derivation or simulation establishing that CLT conditions (independent or weakly dependent summands with finite variance) would be expected to produce Gaussian activations given the actual generative process: convolutions over spatially/channel-dependent inputs, pointwise nonlinearities (ReLU), and normalization layers.
[Empirical methodology] § on empirical distribution fitting: the reported superiority of Weibull and related families rests on distribution fitting whose details (per-layer and per-feature sample sizes, goodness-of-fit tests employed, handling of zero activations from ReLU, and multiple-testing correction across channels and layers) are not reported, undermining assessment of whether the tail-length and dependence claims are robust or artifacts of the chosen architectures/normalizations.

minor comments (1)

[Methods] The formal definition and discretization procedure for the DCF-Copula could be stated more explicitly with pseudocode or equations to aid implementation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and robustness of our empirical findings on CNN feature distributions. We respond to each major comment below and will revise the manuscript to incorporate additional details and discussion as outlined.

Referee: [Abstract / CLT discussion] The interpretive claim (Abstract and the section contrasting findings with the CLT) that the long-tailed behavior 'is not consistent with the Central Limit Theorem' is load-bearing for the Matthew-process interpretation, yet the manuscript provides no derivation or simulation establishing that CLT conditions (independent or weakly dependent summands with finite variance) would be expected to produce Gaussian activations given the actual generative process: convolutions over spatially/channel-dependent inputs, pointwise nonlinearities (ReLU), and normalization layers.

Authors: We agree that the manuscript would benefit from a more explicit justification of why the CLT does not apply here. Our core claim remains empirical: the observed activations exhibit long tails inconsistent with Gaussianity, which we contrast with the CLT's typical prediction under standard assumptions of independent or weakly dependent summands with finite variance. The CNN generative process (convolutions inducing spatial/channel dependence, ReLU introducing asymmetry and potential infinite moments, and normalizations) violates these conditions, supporting the Matthew-process reading. To strengthen this, the revised version will include a short discussion of the relevant CLT conditions alongside a minimal simulation contrasting summed independent finite-variance variables (yielding approximate Gaussianity) with a simplified ReLU-convolution process (reproducing heavy tails). This addition addresses the load-bearing nature of the claim without altering the empirical results. revision: yes
Referee: [Empirical methodology] § on empirical distribution fitting: the reported superiority of Weibull and related families rests on distribution fitting whose details (per-layer and per-feature sample sizes, goodness-of-fit tests employed, handling of zero activations from ReLU, and multiple-testing correction across channels and layers) are not reported, undermining assessment of whether the tail-length and dependence claims are robust or artifacts of the chosen architectures/normalizations.

Authors: We acknowledge that these methodological details were insufficiently reported and will expand the relevant section in revision. Per-feature sample sizes are determined by aggregating over spatial dimensions and batch size, yielding approximately 10^4–10^5 observations per channel (varying by layer depth and input resolution). Fitting used maximum-likelihood estimation for candidate distributions (Gaussian, Weibull, log-normal, etc.), with model selection based on AIC/BIC and visual Q-Q plot inspection focused on tails; Kolmogorov-Smirnov tests were applied for quantitative comparison where sample sizes permitted. ReLU-induced zeros were handled by separately modeling the point mass at zero and fitting the continuous positive support to the nonzero activations. No formal multiple-testing correction was applied, as the analysis emphasizes qualitative trends (increasing tail length and dependence with depth) across architectures rather than per-channel hypothesis tests. The revision will add an explicit subsection with these specifications, sample-size tables, and code references to allow independent verification of robustness. revision: yes
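
The zero-handling and model-selection steps described in this response can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' code: the function names (`fit_weibull_mle`, `summarize_activations`), the bisection bracket, and the synthetic activations are assumptions made here. It separates the point mass at zero, fits a Weibull to the positive part by maximum likelihood, and compares AIC against a Gaussian fitted to the same positive values.

```python
import math
import random

def fit_weibull_mle(xs, lo=1e-3, hi=100.0, iters=100):
    """Maximum-likelihood Weibull (shape, scale) for strictly positive samples."""
    top = max(xs)
    ys = [x / top for x in xs]            # rescale to (0, 1] so y ** k cannot overflow
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Likelihood equation for the shape; increasing in k, its root is the MLE.
        weights = [y ** k for y in ys]
        weighted = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted - 1.0 / k - mean_log

    for _ in range(iters):                # plain bisection on the assumed bracket
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = top * (sum(y ** shape for y in ys) / len(ys)) ** (1.0 / shape)
    return shape, scale

def weibull_loglik(xs, shape, scale):
    return sum(math.log(shape / scale) + (shape - 1.0) * math.log(x / scale)
               - (x / scale) ** shape for x in xs)

def gaussian_loglik(xs):
    """Log-likelihood at the Gaussian MLE (sample mean and variance)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def summarize_activations(acts):
    positive = [a for a in acts if a > 0.0]   # ReLU zeros form a separate point mass
    shape, scale = fit_weibull_mle(positive)
    return {
        "zero_fraction": 1.0 - len(positive) / len(acts),
        "shape": shape,
        "scale": scale,
        "aic_weibull": 2 * 2 - 2.0 * weibull_loglik(positive, shape, scale),
        "aic_gaussian": 2 * 2 - 2.0 * gaussian_loglik(positive),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    acts = [0.0 if rng.random() < 0.4 else rng.weibullvariate(1.0, 0.7)
            for _ in range(10000)]
    print(summarize_activations(acts))
```

Both candidates have two parameters, so the AIC comparison here reduces to a log-likelihood comparison; with more families (gamma, log-normal) the penalty term would start to matter.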

Circularity Check

0 steps flagged

No significant circularity; purely observational with independent modeling contribution

The paper conducts an empirical statistical analysis of CNN feature activations across architectures and datasets, fitting distributions (Weibull etc.) and introducing the DCF-Copula method for dependencies. No derivation chain exists that reduces a claimed prediction or first-principles result to its own inputs by construction. The interpretive contrast with CLT and reference to a Matthew process are post-hoc characterizations of observed data rather than load-bearing derivations. Self-citations are absent from the provided text, and the central claims rest on direct empirical measurements rather than fitted parameters renamed as predictions or ansatzes smuggled via prior work.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on standard statistical assumptions about i.i.d. sampling of activations and on the choice of parametric families (Weibull, etc.) whose parameters are fitted to data; the new DCF-Copula is an invented modeling entity whose independent evidence is the empirical fit itself.

free parameters (1)

Weibull shape and scale per layer
Fitted to empirical activation histograms; central to the claim that Weibull outperforms Gaussian.
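
For reference, the two-parameter Weibull family and its tail can be written as follows. This uses the common shape/scale convention with symbols $k$ and $\lambda$, which is an assumption here; the paper's own tail parameter (written θ in its figures) may be defined differently.

```latex
% Weibull density and survival function for x >= 0, shape k > 0, scale \lambda > 0.
f(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}
                   \exp\!\left[-\left(\frac{x}{\lambda}\right)^{k}\right]

\Pr[X > x] = \exp\!\left[-\left(\frac{x}{\lambda}\right)^{k}\right]

% Gaussian upper tail for comparison, up to polynomial factors.
\Pr[X > x] \sim \exp\!\left[-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right]
```

Under this convention the tail decays more slowly than any Gaussian tail when $k < 2$, and more slowly than an exponential tail when $k < 1$, so a fitted shape is a direct readout of how long-tailed a layer's activations are.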

axioms (1)

domain assumption · Activations within a layer are treated as i.i.d. samples from a common marginal distribution
Invoked when fitting univariate distributions and when constructing the copula.

invented entities (1)

DCF-Copula · no independent evidence
purpose: Model multivariate upper-tail dependence among feature activations
New construction introduced to capture observed tail dependence; independent evidence is the reported empirical improvement over Gaussian copulas.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

feature activations deviate substantially from Gaussian and are better characterized by Weibull and related long-tailed distributions... tail-length increases with network depth and that upper-tail dependence emerges between feature pairs... indicative of a Matthew process that progressively concentrates semantic signal within the tails

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks
cs.LG 2026-05 unverdicted novelty 7.0

CAWI replaces standard random initialization of input-to-hidden weights in randomized neural networks with samples drawn from a data-fitted copula that preserves observed feature dependencies, yielding consistent accu...

Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1]

A class of bivariate distributions including the bivariate logistic

Mir M Ali, NN Mikhail, and M Safiul Haq. A class of bivariate distributions including the bivariate logistic. Journal of multivariate analysis , 8(3):405–412, 1978

work page 1978
[2]

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Zeyuan Allen-Zhu and Yuanzhi Li. Towards understanding ensemble, knowledge distillation and self-distillation in deep learning. In The Eleventh International Conference on Learning Representations, 2023

work page 2023
[3]

A characteristic function approach to deep implicit generative modeling

Abdul Fatir Ansari, Jonathan Scarlett, and Harold Soh. A characteristic function approach to deep implicit generative modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , June 2020

work page 2020
[4]

Portfolio optimization through hybrid deep learning and genetic algorithms vine copula-garch-evt-cvar model

Rihab Bedoui, Ramzi Benkraiem, Khaled Guesmi, and Islem Kedidi. Portfolio optimization through hybrid deep learning and genetic algorithms vine copula-garch-evt-cvar model. Tech- nological Forecasting and Social Change, 197:122887, 2023

work page 2023
[5]

Recent development in copula and its applications to the energy, forestry and environmental sciences

M Ishaq Bhatti and Hung Quang Do. Recent development in copula and its applications to the energy, forestry and environmental sciences. International Journal of Hydrogen Energy , 44(36):19453–19473, 2019

work page 2019
[6]

Novelty detection and neural network validation

Christopher M Bishop. Novelty detection and neural network validation. In ICANN’93: Proceedings of the International Conference on Artificial Neural Networks Amsterdam, The Netherlands 13–16 September 1993 3 , pages 789–794. Springer, 1993

work page 1993
[7]

Variational inference with continuously-indexed normalizing flows

Anthony Caterini, Rob Cornish, Dino Sejdinovic, and Arnaud Doucet. Variational inference with continuously-indexed normalizing flows. In Uncertainty in Artificial Intelligence , pages 44–53. PMLR, 2021

work page 2021
[8]

Anomaly detection: A survey

Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR) , 41(3):1–58, 2009

work page 2009
[9]

Under- standing and improving feature learning for out-of-distribution generalization

Yongqiang Chen, Wei Huang, Kaiwen Zhou, Yatao Bian, Bo Han, and James Cheng. Under- standing and improving feature learning for out-of-distribution generalization. Advances in Neural Information Processing Systems, 36, 2024

work page 2024
[10]

Probabilistic circuits: A unifying framework for tractable probabilistic models, 2020

YooJung Choi, Antonio Vergari, and Guy Van den Broec. Probabilistic circuits: A unifying framework for tractable probabilistic models, 2020

work page 2020
[11]

A model for association in bivariate life tables and its application in epidemi- ological studies of familial tendency in chronic disease incidence

David G Clayton. A model for association in bivariate life tables and its application in epidemi- ological studies of familial tendency in chronic disease incidence. Biometrika, 65(1):141–151, 1978

work page 1978
[12]

Feature density estimation for out-of-distribution detection via normalizing flows

Evan D Cook, Marc-Antoine Lavoie, and Steven L Waslander. Feature density estimation for out-of-distribution detection via normalizing flows. arXiv preprint arXiv:2402.06537 , 2024

work page arXiv 2024
[13]

Archimedean copula and contagion modeling in epidemiology

Jacques Demongeot, Mohamad Ghassani, Mustapha Rachdi, Idir Ouassou, and Carla Taram- asco. Archimedean copula and contagion modeling in epidemiology. Networks and Heteroge- neous Media, 8(1):149–170, 2013

work page 2013
[14]

Imagenet: A large- scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large- scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. 32

work page 2009
[15]

The mnist database of handwritten digit images for machine learning research

Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012

work page 2012
[16]

Orthogonal gradient descent for continual learning

Mehrdad Farajtabar, Navid Azizan, Alex Mott, and Ang Li. Orthogonal gradient descent for continual learning. In International conference on artificial intelligence and statistics , pages 3762–3773. PMLR, 2020

work page 2020
[17]

Does learning require memorization? a short tale about a long tail

Vitaly Feldman. Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing , pages 954–959, 2020

work page 2020
[18]

What neural networks memorize and why: Discovering the long tail via influence estimation

Vitaly Feldman and Chiyuan Zhang. What neural networks memorize and why: Discovering the long tail via influence estimation. Advances in Neural Information Processing Systems , 33:2881–2891, 2020

work page 2020
[19]

The empirical characteristic function and its applications

Andrey Feuerverger and Roman A Mureika. The empirical characteristic function and its applications. The annals of Statistics , pages 88–97, 1977

work page 1977
[20]

On the simultaneous associativity of f (x, y) and x+ y- f (x, y)

Maurice J Frank. On the simultaneous associativity of f (x, y) and x+ y- f (x, y). Aequationes mathematicae, 19:194–226, 1979

work page 1979
[21]

A low effort approach to structured cnn design using pca

Isha Garg, Priyadarshini Panda, and Kaushik Roy. A low effort approach to structured cnn design using pca. IEEE Access, 8:1347–1360, 2019

work page 2019
[22]

Integrating flexible normalization into mi- dlevel representations of deep convolutional neural networks.Neural computation, 31(11):2138– 2176, 2019

Luis Gonzalo S´ anchez Giraldo and Odelia Schwartz. Integrating flexible normalization into mi- dlevel representations of deep convolutional neural networks.Neural computation, 31(11):2138– 2176, 2019

work page 2019
[23]

Generative adversarial networks.Communications of the ACM , 63(11):139–144, 2020

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Communications of the ACM , 63(11):139–144, 2020

work page 2020
[24]

Out-of-distribution de- tection is not all you need

Joris Gu´ erin, Kevin Delmas, Raul Ferreira, and J´ er´ emie Guiochet. Out-of-distribution de- tection is not all you need. In Proceedings of the AAAI conference on artificial intelligence , volume 37, pages 14829–14837, 2023

work page 2023
[25]

Bivariate exponential distributions

Emil J Gumbel. Bivariate exponential distributions. Journal of the American Statistical Association, 55(292):698–707, 1960

work page 1960
[26]

Large sample properties of generalized method of moments estimators

Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica: Journal of the econometric society , pages 1029–1054, 1982

work page 1982
[27]

A brief survey on semantic segmentation with deep learning

Shijie Hao, Yuan Zhou, and Yanrong Guo. A brief survey on semantic segmentation with deep learning. Neurocomputing, 406:302–321, 2020

work page 2020
[28]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[29]

What shapes feature representations? exploring datasets, architectures, and training

Katherine Hermann and Andrew Lampinen. What shapes feature representations? exploring datasets, architectures, and training. Advances in Neural Information Processing Systems , 33:9995–10006, 2020. 33

work page 2020
[30]

Imagenette: A smaller subset of 10 easily classified classes from imagenet, March 2019

Jeremy Howard. Imagenette: A smaller subset of 10 easily classified classes from imagenet, March 2019

work page 2019
[31]

Spatio-temporal wind speed prediction based on clayton copula function with deep learning fusion

Yu Huang, Bingzhe Zhang, Huizhen Pang, Biao Wang, Kwang Y Lee, Jiale Xie, and Yupeng Jin. Spatio-temporal wind speed prediction based on clayton copula function with deep learning fusion. Renewable energy, 192:526–536, 2022

work page 2022
[32]

Detect- ing out-of-distribution data through in-distribution class prior

Xue Jiang, Feng Liu, Zhen Fang, Hong Chen, Tongliang Liu, Feng Zheng, and Bo Han. Detect- ing out-of-distribution data through in-distribution class prior. In International Conference on Machine Learning, pages 15067–15088. PMLR, 2023

work page 2023
[33]

Multivariate extreme-value distributions with applications to environmental data

Harry Joe. Multivariate extreme-value distributions with applications to environmental data. Canadian Journal of Statistics , 22(1):47–64, 1994

work page 1994
[34]

A review of copula methods for measuring uncertainty in finance and eco- nomics

Jong-Min Kim. A review of copula methods for measuring uncertainty in finance and eco- nomics. Quantitative Bio-Science, 39(2):81–90, 2020

work page 2020
[35]

Auto-Encoding Variational Bayes

Diederik P Kingma. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 , 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[36]

Improved variational inference with inverse autoregressive flow.Advances in neural information processing systems, 29, 2016

Durk P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. Improved variational inference with inverse autoregressive flow.Advances in neural information processing systems, 29, 2016

work page 2016
[37]

Explaining distributed neural acti- vations via unsupervised learning

Soheil Kolouri, Charles E Martin, and Heiko Hoffmann. Explaining distributed neural acti- vations via unsupervised learning. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 20–28, 2017

work page 2017
[38]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[39]

Pytorch-cifar: optimized cnn aarchitectures for cifar10, 2017

Liu Kuang. Pytorch-cifar: optimized cnn aarchitectures for cifar10, 2017

work page 2017
[40]

Perfect density models cannot guarantee anomaly detec- tion

Charline Le Lan and Laurent Dinh. Perfect density models cannot guarantee anomaly detec- tion. Entropy, 23(12):1690, 2021

work page 2021
[41]

A simple unified framework for detecting out-of-distribution samples and adversarial attacks

Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. Advances in neural information processing systems, 31, 2018

work page 2018
[42]

Mmd gan: Towards deeper understanding of moment matching network

Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabas Poczos. Mmd gan: Towards deeper understanding of moment matching network. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017

work page 2017
[43]

Align before fuse: Vision and language representation learning with momentum distillation

Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu Hong Hoi. Align before fuse: Vision and language representation learning with momentum distillation. Advances in neural information processing systems , 34:9694–9705, 2021

work page 2021
[44]

Generative moment matching networks

Yujia Li, Kevin Swersky, and Rich Zemel. Generative moment matching networks. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research , pages 1718–1727, Lille, France, 07–09 Jul 2015. PMLR. 34

work page 2015
[45]

Chun Kai Ling, Fei Fang, and J Zico Kolter. Deep archimedean copulas. Advances in Neural Information Processing Systems, 33:1535–1545, 2020
[46]

Boyang Liu, Pang-Ning Tan, and Jiayu Zhou. Unsupervised anomaly detection by robust density estimation. In Proceedings of the AAAI conference on artificial intelligence, volume 36, pages 4101–4108, 2022
[47]

Weitang Liu, Xiaoyun Wang, John Owens, and Yixuan Li. Energy-based out-of-distribution detection. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 21464–21475. Curran Associates, Inc., 2020
[48]

Hanhua Long. Hybrid design of cnn and vision transformer: A review. In Proceedings of the 2024 7th International Conference on Computer Information Science and Artificial Intelligence, pages 121–127, 2024
[49]

Michael Majurski, Sumeet Menon, Parniyan Farvardin, and David Chapman. A method of moments embedding constraint and its application to semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7809–7818, 2024
[50]

J Huston McCulloch. 13 financial applications of stable distributions. Handbook of statistics, 14:393–425, 1996
[51]

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do deep generative models know what they don’t know? arXiv preprint arXiv:1810.09136, 2018
[52]

Roger B Nelsen. An introduction to copulas. Springer, 2006
[53]

Hyeonwoo Noh, Seunghoon Hong, and Bohyung Han. Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1520–1528, 2015
[54]

John Nolan. Multivariate elliptically contoured stable distributions: theory and estimation. Computational Statistics, 28(5):2067–2089, 2013
[55]

Tinghui Ouyang, Yusen He, Huajin Li, Zhiyu Sun, and Stephen Baek. Modeling and forecasting short-term power load with copula model and deep belief network. IEEE Transactions on Emerging Topics in Computational Intelligence, 3(2):127–136, 2019
[56]

GuanWen Qiu, Da Kuang, and Surbhi Goel. Complexity matters: Dynamics of feature learning in the presence of spurious correlations. arXiv preprint arXiv:2403.03375, 2024
[57]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021
[58]

Danilo Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International conference on machine learning, pages 1530–1538. PMLR, 2015.
[59]

Oliver Rippel, Patrick Mertens, and Dorit Merhof. Modeling the distribution of normal data in pre-trained deep features for anomaly detection. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 6726–6733. IEEE, 2021
[60]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022
[61]

Gobinda Saha, Isha Garg, and Kaushik Roy. Gradient projection memory for continual learning. In International Conference on Learning Representations, 2021
[62]

Ruslan Salakhutdinov, Antonio Torralba, and Josh Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR 2011, pages 1481–1488, 2011
[63]

Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810, 2017
[64]

Diogo Silva, Steffen Leonhardt, and Christoph Hoog Antink. Copula-based data augmentation on a deep learning architecture for cardiac sensor fusion. IEEE journal of biomedical and health informatics, 25(7):2521–2532, 2020
[65]

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
[66]

M Sklar. Fonctions de répartition à n dimensions et leurs marges. In Annales de l’ISUP, volume 8, pages 229–231, 1959
[67]

Yuwei Sun, Ng Chong, and Hideya Ochiai. Feature distribution matching for federated domain generalization. In Asian Conference on Machine Learning, pages 942–957. PMLR, 2023
[68]

Mariia Vladimirova, Jakob Verbeek, Pablo Mesejo, and Julyan Arbel. Understanding priors in bayesian neural networks at the unit level. In International Conference on Machine Learning, pages 6458–6467. PMLR, 2019
[69]

Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, and Yu-Gang Jiang. A survey on video diffusion models. ACM Computing Surveys, 2023
[70]

Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys, 56(4):1–39, 2023
[71]

Jun Yu. Empirical characteristic function estimation and its applications. Econometric reviews, 23(2):93–123, 2004
[72]

Zhongjie Yu, Martin Trapp, and Kristian Kersting. Characteristic circuits. Advances in Neural Information Processing Systems, 36:34074–34086, 2023
[73]

Zheng-Wu Yuan and Jun Zhang. Feature extraction and image retrieval based on alexnet. In Eighth International Conference on Digital Image Processing (ICDIP 2016), volume 10033, pages 65–69. SPIE, 2016
[74]

Yudell L Luke. Mathematical functions and their approximations. Academic Press, New York, 1975.
[75]

Chongsheng Zhang, George Almpanidis, Gaojuan Fan, Binquan Deng, Yanbo Zhang, Ji Liu, Aouaidjia Kamel, Paolo Soda, and João Gama. A systematic review on long-tailed learning. IEEE Transactions on Neural Networks and Learning Systems, 2025
[76]

Lily Zhang, Mark Goldstein, and Rajesh Ranganath. Understanding failures in out-of-distribution detection with deep generative models. In International Conference on Machine Learning, pages 12427–12436. PMLR, 2021
[77]

Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu. Interpretable convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8827–8836, 2018
[78]

Xiangxin Zhu, Dragomir Anguelov, and Deva Ramanan. Capturing long-tail distributions of object subcategories. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pages 915–922, 2014
[79]

Yao Zhu, YueFeng Chen, Chuanlong Xie, Xiaodan Li, Rong Zhang, Hui Xue ', Xiang Tian, Bolun Zheng, and Yaowu Chen. Boosting out-of-distribution detection with typical features. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 20758–20769. Curran Associates, In...
[80]

Compute the KL divergence for the nonzero samples of each filter d within the target layer of D filters. We denote this KL divergence as KL_d. This value measures how well the trained parametric model explains the test histogram of filter d.
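The per-filter KL step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-parameter Weibull as the parametric model, treats its shape and scale as already fitted on training data, bins the nonzero test activations into equal-width bins, and uses the direction KL(test histogram || model), which the text does not specify. The names `weibull_cdf` and `kl_filter`, the bin count, and the `eps` floor are hypothetical choices.

```python
import math
import random


def weibull_cdf(x, shape, scale):
    """CDF of a two-parameter Weibull distribution; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def kl_filter(test_activations, shape, scale, n_bins=50, eps=1e-12):
    """KL(test histogram || binned Weibull model) over the nonzero samples of one filter.

    shape and scale are assumed to come from a fit on training data (not shown here).
    """
    nonzero = [a for a in test_activations if a > 0.0]
    if not nonzero:
        return 0.0  # hypothetical convention for a filter that never fires
    hi = max(nonzero)
    width = hi / n_bins
    counts = [0] * n_bins
    for a in nonzero:
        counts[min(int(a / width), n_bins - 1)] += 1
    total = len(nonzero)
    # Restrict the model to [0, hi] and renormalize so its bin masses sum to 1.
    mass = max(weibull_cdf(hi, shape, scale), eps)
    kl = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue  # empty histogram bins contribute nothing
        p = c / total
        q = (weibull_cdf((i + 1) * width, shape, scale)
             - weibull_cdf(i * width, shape, scale)) / mass
        kl += p * math.log(p / max(q, eps))
    return kl


# Synthetic stand-in for one filter's test activations: Weibull draws plus ReLU zeros.
random.seed(0)
activations = [random.weibullvariate(1.0, 0.8) for _ in range(20000)] + [0.0] * 5000

kl_good = kl_filter(activations, shape=0.8, scale=1.0)  # matching model
kl_bad = kl_filter(activations, shape=3.0, scale=1.0)   # short-tailed, mismatched model
print(kl_good, kl_bad)
```

Repeating `kl_filter` over the D filters of a layer would give one KL_d value per filter; how those values are aggregated is left open here.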
