SwiftNet: Using Graph Propagation as Meta-knowledge to Search Highly Representative Neural Architectures
Pith reviewed 2026-05-25 20:08 UTC · model grok-4.3
The pith
Graph propagation as meta-knowledge enables flexible node-wise neural architecture search without predefined cells, yielding SwiftNet models with higher accuracy density and lower search costs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GRAM adopts a fine-grained, node-wise search and accumulates the knowledge learned during updates into a meta-graph. As a result, GRAM enables a more flexible search space and higher search efficiency. Without the constraints of predefined cells or blocks, a new structure-level pruning method removes redundant operations from neural architectures. SwiftNet, the set of models discovered by GRAM, achieves 2.15x higher accuracy density and 2.42x faster inference than MobileNet-V2 at similar accuracy. Compared with FBNet, it reduces search cost by 26x while achieving 2.35x higher accuracy density and a 1.47x speedup at similar accuracy.
What carries the argument
GRAM, the graph propagation mechanism that accumulates search knowledge into a meta-graph to guide architecture choices at the node level.
Load-bearing premise
The meta-graph propagation transfers useful search knowledge across architectures rather than simply memorizing high-scoring candidates from the current run.
What would settle it
An experiment that disables the graph propagation while keeping the node-wise search space and measures whether the discovered models have lower accuracy density or require more search time.
Original abstract
Designing neural architectures for edge devices is subject to constraints of accuracy, inference latency, and computational cost. Traditionally, researchers manually craft deep neural networks to meet the needs of mobile devices. Neural Architecture Search (NAS) was proposed to automate neural architecture design without requiring extensive domain expertise and significant manual effort. Recent works have used NAS to design mobile models by taking hardware constraints into account, achieving state-of-the-art accuracy with fewer parameters and less computational cost measured in multiply-accumulates (MACs). To find highly compact neural architectures, existing works rely on predefined cells and directly apply a width multiplier, which may limit model flexibility, reduce useful feature map information, and cause an accuracy drop. To address this issue, we propose GRAM (GRAph propagation as Meta-knowledge), which adopts a fine-grained (node-wise) search method and accumulates the knowledge learned in updates into a meta-graph. As a result, GRAM enables a more flexible search space and achieves higher search efficiency. Without the constraints of predefined cells or blocks, we propose a new structure-level pruning method to remove redundant operations in neural architectures. SwiftNet, a set of models discovered by GRAM, achieves 2.15x higher accuracy density and 2.42x faster inference than MobileNet-V2 at similar accuracy. Compared with FBNet, SwiftNet reduces the search cost by 26x and achieves 2.35x higher accuracy density and a 1.47x speedup while preserving similar accuracy. SwiftNet can obtain 63.28% top-1 accuracy on ImageNet-1K with only 53M MACs and 2.07M parameters. The corresponding inference latency is only 19.09 ms on Google Pixel 1.
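A small worked calculation may help readers gauge what "accuracy density" means for the figures quoted in the abstract. The abstract does not define the term, so the sketch below assumes it is top-1 accuracy divided by a cost measure, and shows both candidate costs (MACs and parameters). The function name `density` and the constant names are hypothetical and chosen only for illustration. The 2.15x and 2.35x ratios cannot be reproduced here, because the baseline figures for MobileNet-V2 and FBNet are not given on this page.

```python
# Figures quoted in the abstract for SwiftNet on ImageNet-1K.
TOP1_ACCURACY_PCT = 63.28   # top-1 accuracy, in percent
MACS_MILLIONS = 53.0        # multiply-accumulates, in millions
PARAMS_MILLIONS = 2.07      # parameters, in millions


def density(accuracy_pct: float, cost_millions: float) -> float:
    """Accuracy per million units of cost.

    ASSUMPTION: this ratio is only one plausible reading of
    "accuracy density"; the abstract does not state the definition.
    """
    if cost_millions <= 0:
        raise ValueError("cost must be positive")
    return accuracy_pct / cost_millions


per_mmac = density(TOP1_ACCURACY_PCT, MACS_MILLIONS)
per_mparam = density(TOP1_ACCURACY_PCT, PARAMS_MILLIONS)

print(f"accuracy points per million MACs:       {per_mmac:.3f}")
print(f"accuracy points per million parameters: {per_mparam:.2f}")
```

Whichever denominator the paper uses, a density ratio between two models is only meaningful when both are measured at comparable accuracy, which is why the abstract qualifies its comparisons with "similar accuracy".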
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GRAM (GRAph propagation as Meta-knowledge), a NAS method that accumulates updates into a meta-graph to support fine-grained node-wise search and structure-level pruning without relying on predefined cells or blocks. It presents SwiftNet models achieving 63.28% top-1 accuracy on ImageNet-1K with 53M MACs and 2.07M parameters. Relative to MobileNet-V2, it claims 2.15× higher accuracy density and 2.42× faster inference at similar accuracy. Relative to FBNet, it claims 26× lower search cost, 2.35× higher accuracy density, and a 1.47× speedup at similar accuracy.
Significance. If the results are shown to be robust, the meta-graph approach could advance hardware-aware NAS by enabling more flexible search spaces and knowledge reuse across architectures, potentially reducing the reliance on cell-based designs for compact mobile models.
major comments (2)
- [Abstract] Abstract: the headline claims (63.28% top-1 accuracy at 53M MACs / 2.07M params, 2.15× accuracy density vs. MobileNet-V2, 26× search-cost reduction vs. FBNet) are stated without error bars, number of independent runs, or search hyper-parameters, preventing verification of statistical reliability.
- [Abstract] Abstract / Method description: no ablation is reported that disables the meta-graph propagation while retaining the node-wise search space and structure-level pruning; this is required to establish that the claimed efficiency gains arise from cross-architecture knowledge transfer rather than the expanded search space or pruning heuristic alone.
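The two requests above (a spread over independent runs, and an ablation arm without meta-graph propagation) could be reported together in a form like the following. This is a minimal sketch, not the authors' evaluation code: the helper names `summarize` and `ablation_gap`, the arm labels, and every number in `results` are hypothetical placeholders, not results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(runs: List[float]) -> str:
    """Format repeated measurements as 'mean ± sample std (n=...)'."""
    if len(runs) < 2:
        raise ValueError("need at least two independent runs for a spread")
    return f"{mean(runs):.2f} ± {stdev(runs):.2f} (n={len(runs)})"


def ablation_gap(full: List[float], ablated: List[float]) -> float:
    """Mean metric of the full method minus mean metric of the ablated arm."""
    return mean(full) - mean(ablated)


# PLACEHOLDER values for illustration only; NOT taken from the paper.
# Each list stands for one metric (e.g. top-1 accuracy) over three seeds.
results: Dict[str, List[float]] = {
    "node-wise search + pruning + propagation": [70.0, 71.0, 72.0],
    "node-wise search + pruning only": [68.0, 69.0, 70.0],
}

for arm, values in results.items():
    print(f"{arm}: {summarize(values)}")

full, ablated = results.values()
print(f"gap attributable to propagation: {ablation_gap(full, ablated):.2f}")
```

Reporting the gap alongside each arm's spread lets a reader judge whether the difference exceeds run-to-run variation. The same table would be repeated for search cost, since the efficiency claim is the one the ablation is meant to test.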
Simulated Author's Rebuttal
Thank you for your thorough review and constructive comments. We have carefully considered the points raised regarding the statistical reliability of our results and the need for an ablation study on the meta-graph component. Our responses are as follows.
- Referee: [Abstract] Abstract: the headline claims (63.28% top-1 accuracy at 53M MACs / 2.07M params, 2.15× accuracy density vs. MobileNet-V2, 26× search-cost reduction vs. FBNet) are stated without error bars, number of independent runs, or search hyper-parameters, preventing verification of statistical reliability.
  Authors: The search hyperparameters are detailed in the experimental setup section of the manuscript. We acknowledge the absence of error bars and multiple independent runs in the abstract, which stems from the high computational cost of performing multiple full NAS searches on ImageNet. We will revise the abstract to include the search hyperparameters and a note on the single-run nature of the results to improve verifiability.
  Revision: yes
- Referee: [Abstract] Abstract / Method description: no ablation is reported that disables the meta-graph propagation while retaining the node-wise search space and structure-level pruning; this is required to establish that the claimed efficiency gains arise from cross-architecture knowledge transfer rather than the expanded search space or pruning heuristic alone.
  Authors: We agree that an ablation isolating the meta-graph propagation (while retaining node-wise search and pruning) would help confirm its contribution to efficiency gains. We will add this ablation study to the revised manuscript.
  Revision: yes
Circularity Check
No circularity: purely empirical NAS claims with external benchmarks
The paper offers no derivation chain, equations, or first-principles predictions. All claims consist of empirical accuracy, MACs, parameter counts, and latency numbers benchmarked against independent external models (MobileNet-V2, FBNet). The meta-graph propagation and pruning are presented as algorithmic choices whose value is asserted via direct comparison rather than any self-referential reduction or fitted-parameter prediction. No self-citations are invoked as load-bearing uniqueness theorems. The result is therefore self-contained against external benchmarks and receives the default non-circularity finding.