FractalNet: Ultra-Deep Neural Networks without Residuals

Gustav Larsson; Michael Maire; Gregory Shakhnarovich

arXiv: 1605.07648 · v4 · pith:4ZX2FFKH · submitted 2016-05-24 · 💻 cs.CV

classification 💻 cs.CV

keywords: networks, deep, fractal, neural, residual, subnetworks, answer, shallow

Abstract

We introduce a design strategy for neural network macro-architecture based on self-similarity. Repeated application of a simple expansion rule generates deep networks whose structural layouts are precisely truncated fractals. These networks contain interacting subpaths of different lengths, but do not include any pass-through or residual connections; every internal signal is transformed by a filter and nonlinearity before being seen by subsequent layers. In experiments, fractal networks match the excellent performance of standard residual networks on both CIFAR and ImageNet classification tasks, thereby demonstrating that residual representations may not be fundamental to the success of extremely deep convolutional neural networks. Rather, the key may be the ability to transition, during training, from effectively shallow to deep. We note similarities with student-teacher behavior and develop drop-path, a natural extension of dropout, to regularize co-adaptation of subpaths in fractal architectures. Such regularization allows extraction of high-performance fixed-depth subnetworks. Additionally, fractal networks exhibit an anytime property: shallow subnetworks provide a quick answer, while deeper subnetworks, with higher latency, provide a more accurate answer.
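To make the expansion rule and drop-path concrete, here is a minimal, framework-free sketch. It follows the rule given in the paper, f_1(z) = conv(z) and f_{C+1}(z) = join((f_C ∘ f_C)(z), conv(z)), with the join taking an element-wise mean. Everything else is an illustrative assumption, not the authors' code: the names `Conv`, `Join`, `fractal`, `forward` and `forward_column` are hypothetical, and each "conv" is a hypothetical scalar stand-in (affine map plus ReLU). The recursion is applied literally, so every join averages exactly two inputs; a real implementation may merge consecutive joins into one wider join. Local drop-path drops each join input independently but keeps at least one. When all inputs are dropped, this sketch picks one survivor uniformly at random, which is also an assumption. Global drop-path routes the whole block through a single column:

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Conv:
    """Placeholder for one conv layer (filter + nonlinearity)."""
    name: str


@dataclass
class Join:
    """Join of a deep branch (two stacked sub-blocks) and a shallow single conv."""
    deep: List["Node"]
    shallow: Conv


Node = Union[Conv, Join]


def fractal(c: int, name: str = "z") -> Node:
    """Expansion rule: f_1 = conv, f_{C+1} = join(f_C o f_C, conv)."""
    if c == 1:
        return Conv(name)
    return Join(deep=[fractal(c - 1, name + "a"), fractal(c - 1, name + "b")],
                shallow=Conv(name + "s"))


def num_convs(node: Node) -> int:
    """Total conv layers in the block: 2^C - 1 for f_C."""
    if isinstance(node, Conv):
        return 1
    return num_convs(node.deep[0]) + num_convs(node.deep[1]) + 1


def depth(node: Node) -> int:
    """Conv layers on the longest path: 2^(C-1) for f_C."""
    if isinstance(node, Conv):
        return 1
    # The deep branch is always at least as long as the single shallow conv.
    return depth(node.deep[0]) + depth(node.deep[1])


def num_columns(node: Node) -> int:
    return 1 if isinstance(node, Conv) else num_columns(node.deep[0]) + 1


def conv(x: float) -> float:
    # Hypothetical scalar stand-in for a learned conv + ReLU.
    return max(0.0, 0.9 * x + 0.1)


def forward(node: Node, x: float, drop_prob: float = 0.0,
            rng: Optional[random.Random] = None) -> float:
    """Forward pass; drop_prob > 0 applies local drop-path at every join."""
    if drop_prob > 0.0 and rng is None:
        rng = random.Random()
    if isinstance(node, Conv):
        return conv(x)
    # For simplicity both branches are always computed, even if later dropped.
    hidden = forward(node.deep[0], x, drop_prob, rng)
    outs = [forward(node.deep[1], hidden, drop_prob, rng), conv(x)]
    if drop_prob > 0.0:
        # Drop each join input independently, but always keep at least one.
        kept = [o for o in outs if rng.random() >= drop_prob]
        outs = kept or [rng.choice(outs)]
    return sum(outs) / len(outs)  # join = element-wise mean of surviving inputs


def forward_column(node: Node, x: float, column: int) -> float:
    """Global drop-path: route through one column only (numbered here from 1 = shallowest)."""
    if not 1 <= column <= num_columns(node):
        raise ValueError("column out of range")
    if column == 1:
        return conv(x)  # f_1 itself, or the shallow conv of this join
    hidden = forward_column(node.deep[0], x, column - 1)
    return forward_column(node.deep[1], hidden, column - 1)
```

In this sketch, `forward_column(fractal(4), x, 1)` costs one conv while column 4 costs eight. This mirrors the anytime trade-off described in the abstract. Calling `forward` with `drop_prob=0.0` runs every path and joins them, which corresponds to using the full network at test time.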

Forward citations

Cited by 9 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Massive Activations in Large Language Models
cs.CL 2024-02 unverdicted novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
Understanding intermediate layers using linear classifier probes
stat.ML 2016-10 accept novelty 7.0

Linear probes demonstrate that feature separability for classification increases monotonically with network depth in Inception v3 and ResNet-50.
mHC: Manifold-Constrained Hyper-Connections
cs.CL 2025-12 unverdicted novelty 6.0

mHC projects hyper-connection residual spaces onto a manifold to restore identity mapping, enabling stable large-scale training with performance gains over standard HC.
DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
cs.CL 2019-07 unverdicted novelty 6.0

DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.
YOLOv4: Optimal Speed and Accuracy of Object Detection
cs.CV 2020-04 unverdicted novelty 5.0

YOLOv4 achieves 43.5% AP (65.7% AP50) on MS COCO at ~65 FPS on Tesla V100 by integrating WRC, CSP, CmBN, SAT, Mish activation, Mosaic augmentation, DropBlock, and CIoU loss.
Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
cs.LG 2025-11 unverdicted novelty 4.0

FractalNet automatically generates and tests over 1,200 CNN architectures based on recursive fractal templates, achieving up to 80.18% accuracy on CIFAR-10 after five training epochs.
Flemme: A Flexible and Modular Learning Platform for Medical Images
eess.IV 2024-08 unverdicted novelty 4.0

Flemme is a modular platform separating encoders (conv/transformer/SSM) from encoder-decoder architectures for medical images, with a hierarchical pyramid loss yielding reported average gains of 5.6% Dice and 5.57% PSNR.
Preparation of Fractal-Inspired Computational Architectures for Advanced Large Language Model Analysis
cs.LG 2025-11 unverdicted novelty 3.0

Fractal templates enable systematic creation of more than 1,200 neural network variants that show strong performance and computational efficiency when trained on CIFAR-10 for five epochs.
Genetic Network Architecture Search
cs.NE 2019-07 unverdicted novelty 3.0

Genetic algorithm searches convolution cell architectures with weight sharing via SGD, reporting 96% accuracy on CIFAR10 and 80.1% on CIFAR100.