
arXiv: 1906.08305 · v2 · pith:RNJDGRGZnew · submitted 2019-06-19 · 💻 cs.LG · cs.AI · cs.CV

SwiftNet: Using Graph Propagation as Meta-knowledge to Search Highly Representative Neural Architectures

Pith reviewed 2026-05-25 20:08 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CV
keywords neural architecture search · graph propagation · meta-knowledge · edge computing · ImageNet · mobile neural networks · accuracy density · architecture pruning

The pith

Graph propagation as meta-knowledge enables flexible node-wise neural architecture search without predefined cells, yielding SwiftNet models with higher accuracy density and lower search costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes GRAM to use graph propagation for accumulating meta-knowledge in neural architecture search. This supports a fine-grained node-wise search space and a structure-level pruning method to remove redundant operations. The resulting SwiftNet models achieve 2.15 times higher accuracy density than MobileNet-V2 and reduce search cost by 26 times compared to FBNet, while reaching 63.28 percent top-1 accuracy on ImageNet with few parameters and MACs. This approach matters for automating the design of efficient networks for edge devices under accuracy and cost constraints.
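
This page does not define accuracy density. A common convention, assumed here, is top-1 accuracy in percent divided by parameter count in millions. Under that assumption, the SwiftNet figures quoted above can be worked through as a short sketch (the function name `accuracy_density` is hypothetical):

```python
def accuracy_density(top1_percent: float, params_millions: float) -> float:
    """Top-1 accuracy (percent) per million parameters.

    Assumed definition; the page itself does not state the formula.
    """
    if params_millions <= 0:
        raise ValueError("params_millions must be positive")
    return top1_percent / params_millions


# Figures quoted on this page for SwiftNet on ImageNet-1K.
swiftnet_density = accuracy_density(top1_percent=63.28, params_millions=2.07)
print(f"{swiftnet_density:.2f} percent per million parameters")
```

The same function applied to a baseline model would give the ratio behind a claim such as "2.15x higher accuracy density", provided both models are measured under the same definition.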

Core claim

GRAM adopts fine-grained node-wise search and accumulates the knowledge learned in updates into a meta-graph. As a result, GRAM can enable a more flexible search space and achieve higher search efficiency. Without the constraints of predefined cells or blocks, a new structure-level pruning method removes redundant operations in neural architectures. SwiftNet, discovered by GRAM, achieves 2.15x higher accuracy density and 2.42x faster inference than MobileNet-V2 at similar accuracy, and reduces the search cost by 26x compared with FBNet while achieving 2.35x higher accuracy density and a 1.47x speedup.
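
To make the idea of structure-level pruning concrete, here is a minimal sketch under stated assumptions. It treats an architecture as a DAG of operations (edges), drops edges whose importance score falls below a threshold, and then discards anything no longer on a path from input to output. The scoring scheme, the threshold, and the function name `prune_structure` are illustrative assumptions, not the paper's pruning criterion.

```python
def prune_structure(edges, scores, source, sink, threshold):
    """Drop low-score edges, then edges that are off every source->sink path.

    edges: set of (u, v) pairs; scores: dict mapping each edge to a float.
    The score threshold is an assumed stand-in for a redundancy test.
    """
    kept = {e for e in edges if scores[e] >= threshold}

    forward, backward = {}, {}
    for u, v in kept:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)

    def reachable(start, adjacency):
        # Iterative depth-first search over the given adjacency map.
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # A node is live if the input reaches it and it reaches the output.
    live = reachable(source, forward) & reachable(sink, backward)
    return {(u, v) for u, v in kept if u in live and v in live}


# Toy graph: two parallel branches from node 0 to node 3.
toy_edges = {(0, 1), (1, 3), (0, 2), (2, 3)}
toy_scores = {(0, 1): 0.8, (1, 3): 0.8, (0, 2): 0.9, (2, 3): 0.1}
pruned = prune_structure(toy_edges, toy_scores, source=0, sink=3, threshold=0.5)
print(sorted(pruned))
```

In the toy graph, removing the weak edge (2, 3) leaves node 2 as a dead end, so the edge (0, 2) is removed as well even though its own score is high. This is the sense in which pruning acts on structure rather than on individual weights.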

What carries the argument

GRAM, the graph propagation mechanism that accumulates search knowledge into a meta-graph to guide architecture choices at the node level.
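
The following is a toy sketch of the general idea of accumulating search feedback in a shared graph, not GRAM's actual update rule, which this page does not specify. It assumes one score per edge of a complete DAG, samples a sub-DAG by keeping each edge with probability given by a sigmoid of its score, and adds a baseline-corrected reward onto the sampled edges. The class `MetaGraph`, the reward function, and all hyperparameters are hypothetical.

```python
import math
import random
from itertools import combinations


class MetaGraph:
    """Toy meta-graph: one learnable score per edge of a complete DAG."""

    def __init__(self, num_nodes: int, lr: float = 0.1):
        self.lr = lr
        # Edges (i, j) with i < j, so every sampled subgraph is acyclic.
        self.scores = {e: 0.0 for e in combinations(range(num_nodes), 2)}

    def sample(self, rng: random.Random) -> set:
        """Keep each edge with probability sigmoid(score)."""
        return {e for e, s in self.scores.items()
                if rng.random() < 1.0 / (1.0 + math.exp(-s))}

    def update(self, edges: set, reward: float, baseline: float) -> None:
        """Accumulate the reward signal onto the edges that were sampled."""
        for e in edges:
            self.scores[e] += self.lr * (reward - baseline)


def toy_reward(edges: set) -> float:
    # Hypothetical stand-in for "train and evaluate the sampled network":
    # edge (0, 2) is useful, and every edge carries a small cost.
    return (1.0 if (0, 2) in edges else 0.0) - 0.1 * len(edges)


rng = random.Random(0)
graph = MetaGraph(num_nodes=4)
baseline = 0.0
for _ in range(200):
    arch = graph.sample(rng)
    reward = toy_reward(arch)
    graph.update(arch, reward, baseline)
    baseline = 0.9 * baseline + 0.1 * reward  # running average of rewards

for edge, score in sorted(graph.scores.items()):
    print(edge, round(score, 3))
```

Because the scores persist across samples, later architectures are drawn from a distribution shaped by earlier evaluations. Whether that shared state transfers knowledge or only memorizes good candidates is exactly the question raised under the load-bearing premise.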

Load-bearing premise

The meta-graph propagation transfers useful search knowledge across architectures rather than simply memorizing high-scoring candidates from the current run.

What would settle it

An experiment that disables the graph propagation while keeping the node-wise search space and measures whether the discovered models have lower accuracy density or require more search time.

Figures

Figures reproduced from arXiv: 1906.08305 by Feng Yan, Hai Li, Harris Teague, Hsin-Pai Cheng, Shiyu Li, Tunhou Zhang, Yiran Chen, Yukun Yang.

Figure 1: ImageNet-1K top-1 accuracy density vs. model MACs.
Figure 2: Overview diagram of the search process. To form a sampled DNN, we subsample multiple DAGs from the complete DAG. After …
Figure 3: A representative architecture discovered by GRAM.
Figure 4: ImageNet-1K top-1 accuracy density comparison be…
Figure 5: Model scale vs. structure pruning level.
Figure 6: Trade-off between latency and accuracy on ImageNet-1K.
Original abstract

Designing neural architectures for edge devices is subject to constraints of accuracy, inference latency, and computational cost. Traditionally, researchers manually craft deep neural networks to meet the needs of mobile devices. Neural Architecture Search (NAS) was proposed to automate the neural architecture design without requiring extensive domain expertise and significant manual efforts. Recent works utilized NAS to design mobile models by taking into account hardware constraints and achieved state-of-the-art accuracy with fewer parameters and less computational cost measured in Multiply-accumulates (MACs). To find highly compact neural architectures, existing works rely on predefined cells and directly apply a width multiplier, which may potentially limit the model flexibility, reduce the useful feature map information, and cause accuracy drop. To conquer this issue, we propose GRAM (GRAph propagation as Meta-knowledge) that adopts a fine-grained (node-wise) search method and accumulates the knowledge learned in updates into a meta-graph. As a result, GRAM can enable more flexible search space and achieve higher search efficiency. Without the constraints of predefined cells or blocks, we propose a new structure-level pruning method to remove redundant operations in neural architectures. SwiftNet, which is a set of models discovered by GRAM, outperforms MobileNet-V2 by 2.15x higher accuracy density and 2.42x faster with similar accuracy. Compared with FBNet, SwiftNet reduces the search cost by 26x and achieves 2.35x higher accuracy density and 1.47x speedup while preserving similar accuracy. SwiftNet can obtain 63.28% top-1 accuracy on ImageNet-1K with only 53M MACs and 2.07M parameters. The corresponding inference latency is only 19.09 ms on Google Pixel 1.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes GRAM (GRAph propagation as Meta-knowledge), a NAS method that accumulates updates into a meta-graph to support fine-grained node-wise search and structure-level pruning without relying on predefined cells or blocks. It presents SwiftNet models achieving 63.28% top-1 accuracy on ImageNet-1K with 53M MACs and 2.07M parameters, claiming 2.15× higher accuracy density than MobileNet-V2, 2.42× faster inference at similar accuracy, 26× lower search cost than FBNet, and 2.35× higher accuracy density with 1.47× speedup while preserving accuracy.

Significance. If the results are shown to be robust, the meta-graph approach could advance hardware-aware NAS by enabling more flexible search spaces and knowledge reuse across architectures, potentially reducing the reliance on cell-based designs for compact mobile models.

major comments (2)
  1. [Abstract] Abstract: the headline claims (63.28% top-1 accuracy at 53M MACs / 2.07M params, 2.15× accuracy density vs. MobileNet-V2, 26× search-cost reduction vs. FBNet) are stated without error bars, number of independent runs, or search hyper-parameters, preventing verification of statistical reliability.
  2. [Abstract] Abstract / Method description: no ablation is reported that disables the meta-graph propagation while retaining the node-wise search space and structure-level pruning; this is required to establish that the claimed efficiency gains arise from cross-architecture knowledge transfer rather than the expanded search space or pruning heuristic alone.
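
The first comment asks for error bars over independent runs. A minimal sketch of that summary, assuming each run yields a single scalar metric, is shown here. The helper name `summarize_runs` is hypothetical, and the numbers in the usage line are placeholders, not results from the paper.

```python
from statistics import mean, stdev


def summarize_runs(values):
    """Return (mean, sample standard deviation) over independent runs."""
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(values), stdev(values)


# Placeholder numbers only, NOT measurements from the paper.
run_mean, run_std = summarize_runs([1.0, 2.0, 3.0])
print(f"{run_mean:.2f} ± {run_std:.2f}")
```

Reporting the metric as mean ± standard deviation, together with the number of runs, is one conventional way to make a headline figure checkable.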

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for your thorough review and constructive comments. We have carefully considered the points raised regarding the statistical reliability of our results and the need for an ablation study on the meta-graph component. Our responses are as follows.

  1. Referee: [Abstract] Abstract: the headline claims (63.28% top-1 accuracy at 53M MACs / 2.07M params, 2.15× accuracy density vs. MobileNet-V2, 26× search-cost reduction vs. FBNet) are stated without error bars, number of independent runs, or search hyper-parameters, preventing verification of statistical reliability.

    Authors: The search hyperparameters are detailed in the experimental setup section of the manuscript. We acknowledge the absence of error bars and multiple independent runs in the abstract, which stems from the high computational cost of performing multiple full NAS searches on ImageNet. We will revise the abstract to include the search hyperparameters and a note on the single-run nature of the results to improve verifiability.

    Revision: yes

  2. Referee: [Abstract] Abstract / Method description: no ablation is reported that disables the meta-graph propagation while retaining the node-wise search space and structure-level pruning; this is required to establish that the claimed efficiency gains arise from cross-architecture knowledge transfer rather than the expanded search space or pruning heuristic alone.

    Authors: We agree that an ablation isolating the meta-graph propagation (while retaining node-wise search and pruning) would help confirm its contribution to efficiency gains. We will add this ablation study to the revised manuscript.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical NAS claims with external benchmarks


The paper offers no derivation chain, equations, or first-principles predictions. All claims consist of empirical accuracy, MACs, parameter counts, and latency numbers benchmarked against independent external models (MobileNet-V2, FBNet). The meta-graph propagation and pruning are presented as algorithmic choices whose value is asserted via direct comparison rather than any self-referential reduction or fitted-parameter prediction. No self-citations are invoked as load-bearing uniqueness theorems. The result is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract supplies no explicit free parameters, axioms, or invented entities; all claims rest on the unstated assumption that the reported ImageNet numbers were obtained under standard training protocols and that the meta-graph mechanism improves search without additional hidden tuning.


