Why CNN Features Are not Gaussian: A Statistical Anatomy of Deep Representations
Pith reviewed 2026-05-23 17:02 UTC · model grok-4.3
The pith
CNN feature activations deviate substantially from Gaussian and follow long-tailed Weibull distributions instead.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep convolutional neural networks produce internal feature activations whose distributions are substantially non-Gaussian and instead follow long-tailed families such as the Weibull. A new Discretized Characteristic Function Copula method reveals increasing tail length with depth and the emergence of upper-tail dependence between feature pairs. These patterns indicate a Matthew process that concentrates semantic signal in the tails, making the networks effective at noise reduction but less so at handling outliers.
What carries the argument
The Discretized Characteristic Function Copula (DCF-Copula) method, which models multivariate feature dependencies and exposes upper-tail dependence not captured by Gaussian assumptions.
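For orientation, the two textbook notions this summary leans on can be written down directly. These are the standard statements of Sklar's theorem and of the upper-tail dependence coefficient, not the paper's definition of the DCF-Copula, which this page does not reproduce; the symbols H, F_i, C, U_i, and q are the usual notation and are assumed here.

```latex
% Sklar's theorem: a joint CDF H with continuous marginals F_1, F_2
% factors through a unique copula C on [0,1]^2.
H(x_1, x_2) = C\bigl(F_1(x_1),\, F_2(x_2)\bigr)

% Upper-tail dependence coefficient of the pair (X_1, X_2),
% with U_i = F_i(X_i):
\lambda_U = \lim_{q \to 1^-} \Pr\bigl(U_2 > q \mid U_1 > q\bigr)
          = \lim_{q \to 1^-} \frac{1 - 2q + C(q, q)}{1 - q}
```

A Gaussian copula with correlation |rho| < 1 has lambda_U = 0, which is why a Gaussian model cannot represent the upper-tail dependence described above.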
If this is right
- CNNs reduce noise effectively yet perform poorly on outlier removal tasks.
- Long-tailed upper-tail-dependent priors should replace Gaussian priors when modeling deep feature densities.
- Tail length increases with network depth.
- Upper-tail dependence emerges between feature pairs as depth grows.
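The last two bullets can be probed on any pair of feature channels with a simple rank-based estimate. The sketch below is a minimal illustration, not the paper's DCF-Copula: it estimates P(V > q | U > q) at a fixed threshold on synthetic data. The helper names, the threshold q = 0.95, and the shared Pareto factor are all assumptions made for the example.

```python
import random


def ranks_to_uniform(values):
    """Map samples to pseudo-observations in (0, 1) via their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def upper_tail_dependence(x, y, q=0.95):
    """Empirical estimate of P(V > q | U > q) at a finite threshold q."""
    u, v = ranks_to_uniform(x), ranks_to_uniform(y)
    exceed_u = sum(1 for a in u if a > q)
    joint = sum(1 for a, b in zip(u, v) if a > q and b > q)
    return joint / exceed_u if exceed_u else float("nan")


rng = random.Random(0)
n = 20000

# Independent pair: the estimate should sit near 1 - q.
ind_x = [rng.gauss(0, 1) for _ in range(n)]
ind_y = [rng.gauss(0, 1) for _ in range(n)]

# Toy dependent pair (assumption): a shared heavy-tailed factor drives both extremes.
shared = [rng.paretovariate(2.0) for _ in range(n)]
dep_x = [s + rng.gauss(0, 1) for s in shared]
dep_y = [s + rng.gauss(0, 1) for s in shared]

lam_independent = upper_tail_dependence(ind_x, ind_y)
lam_dependent = upper_tail_dependence(dep_x, dep_y)
print(f"independent: {lam_independent:.3f}, shared factor: {lam_dependent:.3f}")
```

To apply this to real features, x and y would be the activations of two channels collected over the same inputs. A finite-threshold estimate like this only approximates the limiting coefficient, so it is best read across several values of q.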
Where Pith is reading between the lines
- Similar non-Gaussian tail behavior may appear in transformer or other non-convolutional deep networks.
- Feature-based density estimation or generative models could gain accuracy by adopting these tail-dependent priors.
- Outlier-sensitive applications that rely on deep features may require revised statistical assumptions.
Load-bearing premise
The empirical fits to Weibull and related families on the chosen architectures and datasets generalize beyond the tested cases and the observed tail behavior is not an artifact of the specific activation functions or normalization layers used.
What would settle it
Observing that feature activations across layers in a new deep CNN fit a Gaussian distribution closely on multiple standard datasets would contradict the central claim.
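One rough way to run such a check on a single channel is a moment-based normality diagnostic. The sketch below uses the Jarque-Bera statistic, which is a swapped-in standard test and not something the paper is stated to use; the synthetic Weibull samples merely stand in for long-tailed activations, and the sample size and shape parameter are assumptions.

```python
import random


def jarque_bera(samples):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return skew, excess_kurt, jb


# 95th percentile of a chi-square distribution with 2 degrees of freedom,
# the asymptotic null distribution of the statistic.
CHI2_2DOF_95 = 5.991

rng = random.Random(1)
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Stand-in for long-tailed activations (assumption): Weibull with shape < 1.
long_tailed = [rng.weibullvariate(1.0, 0.7) for _ in range(5000)]

for name, data in [("gaussian", gaussian_like), ("weibull k=0.7", long_tailed)]:
    skew, kurt, jb = jarque_bera(data)
    verdict = "reject Gaussian" if jb > CHI2_2DOF_95 else "cannot reject"
    print(f"{name}: skew={skew:.2f} excess_kurt={kurt:.2f} JB={jb:.1f} -> {verdict}")
```

With tens of thousands of observations per channel, almost any small deviation becomes statistically significant, so the skewness and kurtosis values themselves are more informative than the accept/reject verdict.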
Original abstract
Deep convolutional neural networks (CNNs) are commonly analyzed through geometric and linear-algebraic perspectives, yet the statistical distribution of their internal feature activations remains poorly understood. In many applications, deep features are implicitly treated as Gaussian when modeling densities. In this work, we empirically examine this assumption and show that it does not accurately describe the distribution of CNN feature activations. Through a systematic study across multiple architectures and datasets, we find that the feature activations deviate substantially from Gaussian and are better characterized by Weibull and related long-tailed distributions. We further introduce a novel Discretized Characteristic Function Copula (DCF-Copula) method to model multivariate feature dependencies. We find that tail length increases with network depth and that upper-tail dependence emerges between feature pairs. These statistical findings are not consistent with the Central Limit Theorem, and are instead indicative of a Matthew process that progressively concentrates semantic signal within the tails. These statistical findings suggest that CNNs are excellent at noise reduction, yet poor at outlier removal tasks. We recommend the use of long-tailed upper-tail-dependent priors as opposed to Gaussian priors for accurately modeling CNN deep feature density. Code available at https://github.com/dchapman-prof/DCF-Copula
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper empirically analyzes the statistical distributions of internal feature activations in CNNs across architectures and datasets. It claims these activations deviate substantially from Gaussianity and are better characterized by Weibull and other long-tailed families. The work introduces the Discretized Characteristic Function Copula (DCF-Copula) to capture multivariate dependencies, reports increasing tail length with depth and emerging upper-tail dependence between features, interprets the results as inconsistent with the Central Limit Theorem and instead indicative of a Matthew process that concentrates semantic signal in tails, and recommends long-tailed upper-tail-dependent priors over Gaussian ones for feature density modeling. Reproducible code is provided.
Significance. If the empirical distribution findings and tail-dependence results hold under scrutiny, the work supplies a useful statistical characterization of deep representations that questions the routine Gaussian assumption in density estimation and feature modeling tasks. The DCF-Copula is presented as a novel modeling tool for tail dependencies. Explicit code release supports reproducibility and verification of the reported fits.
major comments (2)
- [Abstract / CLT discussion] The interpretive claim (Abstract and the section contrasting findings with the CLT) that the long-tailed behavior 'is not consistent with the Central Limit Theorem' is load-bearing for the Matthew-process interpretation, yet the manuscript provides no derivation or simulation establishing that CLT conditions (independent or weakly dependent summands with finite variance) would be expected to produce Gaussian activations given the actual generative process: convolutions over spatially/channel-dependent inputs, pointwise nonlinearities (ReLU), and normalization layers.
- [Empirical methodology] § on empirical distribution fitting: the reported superiority of Weibull and related families rests on distribution fitting whose details (per-layer and per-feature sample sizes, goodness-of-fit tests employed, handling of zero activations from ReLU, and multiple-testing correction across channels and layers) are not reported, undermining assessment of whether the tail-length and dependence claims are robust or artifacts of the chosen architectures/normalizations.
minor comments (1)
- [Methods] The formal definition and discretization procedure for the DCF-Copula could be stated more explicitly with pseudocode or equations to aid implementation.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the scope and robustness of our empirical findings on CNN feature distributions. We respond to each major comment below and will revise the manuscript to incorporate additional details and discussion as outlined.
Referee: [Abstract / CLT discussion] The interpretive claim (Abstract and the section contrasting findings with the CLT) that the long-tailed behavior 'is not consistent with the Central Limit Theorem' is load-bearing for the Matthew-process interpretation, yet the manuscript provides no derivation or simulation establishing that CLT conditions (independent or weakly dependent summands with finite variance) would be expected to produce Gaussian activations given the actual generative process: convolutions over spatially/channel-dependent inputs, pointwise nonlinearities (ReLU), and normalization layers.
Authors: We agree that the manuscript would benefit from a more explicit justification of why the CLT does not apply here. Our core claim remains empirical: the observed activations exhibit long tails inconsistent with Gaussianity, which we contrast with the CLT's typical prediction under standard assumptions of independent or weakly dependent summands with finite variance. The CNN generative process (convolutions inducing spatial/channel dependence, ReLU introducing asymmetry and potential infinite moments, and normalizations) violates these conditions, supporting the Matthew-process reading. To strengthen this, the revised version will include a short discussion of the relevant CLT conditions alongside a minimal simulation contrasting summed independent finite-variance variables (yielding approximate Gaussianity) with a simplified ReLU-convolution process (reproducing heavy tails). This addition addresses the load-bearing nature of the claim without altering the empirical results. revision: yes
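The contrast the authors describe can be illustrated with a toy comparison. This is not the simulation promised for the revision, and it contains neither ReLU nor convolution: it simply sets an additive process (the classical CLT setting) against a multiplicative "rich-get-richer" process, which is an assumed stand-in for the Matthew-process reading. All function names and parameters are hypothetical.

```python
import math
import random


def excess_kurtosis(samples):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / m2 ** 2 - 3.0


def additive_sample(rng, terms=30):
    """Sum of independent finite-variance terms: the classical CLT setting."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(terms))


def multiplicative_sample(rng, depth=8, sigma=0.5):
    """Toy process (assumption): a positive signal rescaled by a random gain per layer."""
    value = 1.0
    for _ in range(depth):
        value *= math.exp(rng.gauss(0.0, sigma))
    return value


rng = random.Random(2)
n = 20000
additive = [additive_sample(rng) for _ in range(n)]
multiplicative = [multiplicative_sample(rng) for _ in range(n)]

print(f"additive excess kurtosis:       {excess_kurtosis(additive):.2f}")
print(f"multiplicative excess kurtosis: {excess_kurtosis(multiplicative):.2f}")
```

In the multiplicative case the CLT still applies, but to the logarithm of the signal, so the signal itself is approximately log-normal and long-tailed. Whether trained CNN layers behave more like the additive or the multiplicative case is exactly what the referee asks the authors to establish.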
Referee: [Empirical methodology] § on empirical distribution fitting: the reported superiority of Weibull and related families rests on distribution fitting whose details (per-layer and per-feature sample sizes, goodness-of-fit tests employed, handling of zero activations from ReLU, and multiple-testing correction across channels and layers) are not reported, undermining assessment of whether the tail-length and dependence claims are robust or artifacts of the chosen architectures/normalizations.
Authors: We acknowledge that these methodological details were insufficiently reported and will expand the relevant section in revision. Per-feature sample sizes are determined by aggregating over spatial dimensions and batch size, yielding approximately 10^4–10^5 observations per channel (varying by layer depth and input resolution). Fitting used maximum-likelihood estimation for candidate distributions (Gaussian, Weibull, log-normal, etc.), with model selection based on AIC/BIC and visual Q-Q plot inspection focused on tails; Kolmogorov-Smirnov tests were applied for quantitative comparison where sample sizes permitted. ReLU-induced zeros were handled by separately modeling the point mass at zero and fitting the continuous positive support to the nonzero activations. No formal multiple-testing correction was applied, as the analysis emphasizes qualitative trends (increasing tail length and dependence with depth) across architectures rather than per-channel hypothesis tests. The revision will add an explicit subsection with these specifications, sample-size tables, and code references to allow independent verification of robustness. revision: yes
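The fitting recipe in this response (separate the point mass at zero, fit the positive part by maximum likelihood, compare candidates by AIC) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' released code: the data are synthetic, only Weibull and Gaussian candidates are compared, and the function names and the bisection solver are choices made for this example.

```python
import math
import random


def fit_weibull(xs, lo=0.05, hi=20.0, iters=80):
    """Maximum-likelihood Weibull (shape k, scale lam) for strictly positive data."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        # Profile-likelihood equation for the shape; increasing in k.
        pw = [x ** k for x in xs]
        return sum(p * l for p, l in zip(pw, logs)) / sum(pw) - 1.0 / k - mean_log

    for _ in range(iters):  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = (sum(x ** k for x in xs) / len(xs)) ** (1.0 / k)
    return k, lam


def weibull_loglik(xs, k, lam):
    return sum(math.log(k / lam) + (k - 1.0) * math.log(x / lam) - (x / lam) ** k for x in xs)


def gaussian_loglik(xs):
    """Gaussian log-likelihood evaluated at its MLE (sample mean, biased variance)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


# Synthetic post-ReLU channel (assumption): ~40% exact zeros,
# Weibull(shape=0.8, scale=1.5) otherwise.
rng = random.Random(3)
activations = [0.0 if rng.random() < 0.4 else rng.weibullvariate(1.5, 0.8) for _ in range(10000)]

positives = [a for a in activations if a > 0.0]
zero_mass = 1.0 - len(positives) / len(activations)
shape, scale = fit_weibull(positives)
aic_weibull = aic(weibull_loglik(positives, shape, scale), 2)
aic_gaussian = aic(gaussian_loglik(positives), 2)
print(f"P(zero)={zero_mass:.2f} shape={shape:.2f} scale={scale:.2f}")
print(f"AIC Weibull={aic_weibull:.0f} AIC Gaussian={aic_gaussian:.0f}")
```

A fitted shape below 1 indicates a tail heavier than exponential, so tracking the fitted shape layer by layer is one plausible way to quantify the "tail length increases with depth" claim. Both candidates here have two parameters, so the AIC comparison reduces to a log-likelihood comparison on the positive part.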
Circularity Check
No significant circularity; purely observational with independent modeling contribution
The paper conducts an empirical statistical analysis of CNN feature activations across architectures and datasets, fitting distributions (Weibull etc.) and introducing the DCF-Copula method for dependencies. No derivation chain exists that reduces a claimed prediction or first-principles result to its own inputs by construction. The interpretive contrast with CLT and reference to a Matthew process are post-hoc characterizations of observed data rather than load-bearing derivations. Self-citations are absent from the provided text, and the central claims rest on direct empirical measurements rather than fitted parameters renamed as predictions or ansatzes smuggled via prior work.
Axiom & Free-Parameter Ledger
free parameters (1)
- Weibull shape and scale per layer
axioms (1)
- domain assumption: Activations within a layer are treated as i.i.d. samples from a common marginal distribution
invented entities (1)
- DCF-Copula: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (echoes)
  Echoes: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "feature activations deviate substantially from Gaussian and are better characterized by Weibull and related long-tailed distributions... tail-length increases with network depth and that upper-tail dependence emerges between feature pairs... indicative of a Matthew process that progressively concentrates semantic signal within the tails"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- CAWI: Copula-Aligned Weight Initialization for Randomized Neural Networks
CAWI replaces standard random initialization of input-to-hidden weights in randomized neural networks with samples drawn from a data-fitted copula that preserves observed feature dependencies, yielding consistent accu...