Densely Connected Convolutional Networks
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet .
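The connectivity pattern described in the abstract can be made concrete with a small sketch. The Python below counts and enumerates the direct connections for L layers, treating the network input as source 0. It also assumes, beyond what the abstract states, that preceding feature-maps are combined by channel-wise concatenation and that every layer emits a fixed number of feature-maps; the names `growth_rate`, `in_channels`, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def dense_connection_count(num_layers: int) -> int:
    """Number of direct connections for L densely connected layers: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def enumerate_dense_connections(num_layers: int) -> List[Tuple[int, int]]:
    """Return (source, target) pairs; source 0 stands for the network input.

    Layer `dst` receives the outputs of every earlier source 0..dst-1.
    """
    return [(src, dst) for dst in range(1, num_layers + 1) for src in range(dst)]


def input_channels_per_layer(
    num_layers: int, in_channels: int, growth_rate: int
) -> List[int]:
    """Channels seen by each layer, ASSUMING concatenation of all earlier outputs.

    `in_channels` is the channel count of the input and `growth_rate` is the
    (assumed constant) number of feature-maps each layer adds.
    """
    return [in_channels + growth_rate * i for i in range(num_layers)]


if __name__ == "__main__":
    L = 5
    print(dense_connection_count(L))               # closed form L(L+1)/2
    print(len(enumerate_dense_connections(L)))     # same count by enumeration
    print(input_channels_per_layer(4, 16, 12))     # per-layer input widths
```

A traditional chain of the same depth would have only L connections, one per consecutive pair of layers, so the enumeration above also shows how quickly the dense pattern grows with depth.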
Forward citations
Cited by 19 Pith papers
- DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
  DPQuant uses epoch-wise probabilistic layer rotation and DP loss sensitivity to quantize only a changing subset of layers, reducing accuracy degradation from quantization noise in DP-SGD and delivering up to 2.21x thr...
- Unsupervised Source-Free Ranking of Biomedical Segmentation Models Under Distribution Shift
  Presents the first unsupervised source-free framework for ranking semantic and instance segmentation models via prediction consistency under perturbations, with rankings correlating to target-domain performance across...
- Mistake gating leads to energy and memory efficient continual learning
  Mistake-gated plasticity reduces neural network updates by 50-80% by gating changes on classification errors, improving efficiency for continual learning without added hyperparameters.
- A general framework for knowledge integration in machine learning for electromagnetic scattering using quasinormal modes
  A universal physics-informed neural network framework for electromagnetic scattering based on quasinormal mode expansion that guarantees compliance with energy conservation and causality and shows improved data effici...
- Towards Robust Voice Pathology Detection
  Exploratory experiments combining four voice databases to evaluate XGBoost, DenseNet, and Isolation Forest on raw waveforms, spectrograms, MFCCs, and acoustic features for pathology detection, with peak F1 of 0.733.
- A Multitask Network for Localization and Recognition of Text in Images
  Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.
- SGDR: Stochastic Gradient Descent with Warm Restarts
  SGDR uses periodic warm restarts of the learning rate in SGD to reach new state-of-the-art error rates of 3.14% on CIFAR-10 and 16.21% on CIFAR-100.
- Physics-informed convolutional neural networks for fluid flow through porous media
  A physics-informed CNN predicts pore-scale velocity fields from geometry and serves as a warm-start to accelerate Lattice-Boltzmann solvers in over 90% of tested cases.
- Non-identifiability of Explanations from Model Behavior in Deep Networks of Image Authenticity Judgments
  Models predicting human authenticity judgments produce inconsistent attribution maps across architectures, showing that explanations are non-identifiable.
- Attention Residuals
  Attention Residuals replaces fixed residual summation with input-dependent softmax attention over preceding layers, and a blocked variant is shown to improve uniformity and downstream performance in a 48B-parameter mo...
- HLGFA: High-Low Resolution Guided Feature Alignment for Unsupervised Anomaly Detection
  HLGFA detects anomalies by identifying breakdowns in cross-resolution feature consistency between high- and low-resolution views of normal samples, guided by structure and detail priors, and reports 97.9% pixel AUROC ...
- Detection of Lensed Gravitational Waves in the Millihertz Band Using Frequency-Domain Lensing Feature Extraction Network
  DCL-xLSTM neural network detects lensed GW events with AUC over 0.99 using training on PM and SIS lens models in the millihertz band.
- Learning Multimodal Fixed-Point Weights using Gradient Descent
  Gradient-based optimization learns symmetric Gaussian mixture modes for 2-bit fixed-point weight quantization, claiming state-of-the-art performance and self-adaptive weights.
- Training Neural Networks with Optimal Double-Bayesian Learning
  A double-Bayesian framework derives an optimal learning rate for neural network training via two antagonistic Bayesian processes.
- PR3DICTR: A modular AI framework for medical 3D image-based detection and outcome prediction
  PR3DICTR is a new open-access modular framework for 3D medical image classification and outcome prediction that works with as little as two lines of code.
- Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
  FractalNet automatically generates and tests over 1,200 CNN architectures based on recursive fractal templates, achieving up to 80.18% accuracy on CIFAR-10 after five training epochs.
- AMD Severity Prediction And Explainability Using Image Registration And Deep Embedded Clustering
  A method using deep image registration and embedded clustering predicts AMD severity from OCT images with classification performance matching state-of-the-art and improved explainability via registration outputs.
- Multi-Gate Residuals
  Multi-Gate Residuals stabilizes activation scales in deep residual networks via multi-stream gating and attention pooling without added communication overhead.
- Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
  Fractal templates enable systematic creation of more than 1,200 neural network variants that show strong performance and computational efficiency when trained on CIFAR-10 for five epochs.