pith. sign in

arxiv: 2406.14706 · v2 · submitted 2024-06-20 · 💻 cs.ET · cs.AR

WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy

Pith reviewed 2026-05-23 23:58 UTC · model grok-4.3

classification 💻 cs.ET cs.AR
keywords crossbar arraysweight shufflinginterconnect resistancein-memory computingDNN acceleratorsaccuracy recovery8T-SRAM
0
0 comments X

The pith

Weight shuffling in crossbar arrays counters interconnect resistance and lifts DNN accuracy from 48% to 83.5%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SWANN, a weight-shuffling method for crossbar arrays used in in-memory DNN accelerators, to reduce the accuracy degradation caused by wire resistance in scaled process nodes. Simulations for 128x128 8T-SRAM arrays in 7nm technology show the approach raises ResNet-20 accuracy on CIFAR-10 from 47.78% to 83.5% while adding under 1% energy, 1% latency, and 16% area overhead when one ADC serves each array. The same shuffling can be combined with Partial-Word-Line Activation and extends to ferroelectric-transistor crossbars with similarly low overhead.

Core claim

SWANN is a weight shuffling technique in crossbar arrays which alleviates the detrimental effect of wire resistance on in-memory computing, enhancing accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10 while incurring less than 1% energy increase and about 1% latency and 16% area overhead with one ADC per array.

What carries the argument

Weight shuffling (SWANN) within crossbar arrays that rearranges stored weights to reduce the impact of interconnect resistance on analog current summation during inference.

If this is right

  • Combining SWANN with Partial-Word-Line Activation yields further accuracy gains beyond the 83.5% baseline.
  • The same shuffling approach applies to ferroelectric-transistor crossbar arrays with comparable overhead.
  • Energy consumption rises by less than 1% while latency stays near 1% and area at 16% for one-ADC-per-array designs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If shuffling logic can be made reconfigurable at runtime, it could adapt to process variation or temperature changes without redesign.
  • Designers might prioritize crossbar sizes and ADC counts differently once resistance mitigation is available through mapping rather than circuit changes.

Load-bearing premise

Weight shuffling can be realized in hardware using only the reported overheads and without new accuracy or timing errors that cancel the gains.

What would settle it

Fabricating and testing a 7nm 128x128 8T-SRAM crossbar array with an implemented weight-shuffling controller and measuring whether end-to-end accuracy reaches 83.5% at the stated power and latency cost.

Figures

Figures reproduced from arXiv: 2406.14706 by Chunguang Wang, Dong Eun Kim, Jeffry Victor, Kaushik Roy, Sumeet Gupta.

Figure 8
Figure 8. Figure 8: Effect of variations on baseline design and [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: SAMBA Architecture for DNN accelerators [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗
read the original abstract

Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays which alleviates the detrimental effect of wire resistance on IMC. For 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, SWANN enhances the accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10. We also show that SWANN can be used synergistically with Partial-Word-LineActivation, further boosting the accuracy. Moreover, we evaluate the implications of SWANN for compact ferroelectric-transistorbased crossbar arrays. SWANN incurs minimal hardware overhead, with less than a 1% increase in energy consumption. Additionally, the latency and area overheads of SWANN are ~1% and ~16%, respectively when 1 ADC is utilized per crossbar array.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes SWANN (noted as WAGONN in the title), a weight-shuffling technique for crossbar arrays in in-memory computing (IMC) DNN accelerators to reduce the detrimental effects of interconnect resistance. For 8T-SRAM-based 128x128 arrays in 7nm technology, it reports an accuracy improvement from 47.78% to 83.5% on ResNet-20/CIFAR-10, with further gains when combined with Partial-Word-Line Activation; the approach is also evaluated on ferroelectric-transistor-based arrays. Overheads are stated as <1% energy, ~1% latency, and ~16% area (with 1 ADC per array).

Significance. If the central accuracy claims are substantiated with full implementation details and re-validated simulations, the result would be relevant for practical IMC accelerators in scaled nodes, where wire resistance is a known limiter. The low-overhead claim and potential synergy with existing techniques would strengthen its utility if the hardware mapping does not introduce offsetting parasitics.

major comments (2)
  1. Abstract and method description: The weight-shuffling procedure (including the concrete algorithm, placement rules, and re-extraction of the resistance matrix after shuffling) is not described. This leaves the headline accuracy gain (47.78% to 83.5%) unverifiable against potential new errors from added muxes, control lines, or timing skew.
  2. Abstract and results: No simulation setup, error bars, exclusion criteria, or post-shuffling re-simulation that includes the shuffler hardware are supplied, so the reported accuracy numbers cannot be checked against the claim that net improvement is preserved.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight the need for greater explicitness in the method description and simulation details to ensure verifiability of the accuracy results. We will revise the manuscript to address these points by expanding the relevant sections with additional concrete information, while preserving the core technical contributions.

read point-by-point responses
  1. Referee: Abstract and method description: The weight-shuffling procedure (including the concrete algorithm, placement rules, and re-extraction of the resistance matrix after shuffling) is not described. This leaves the headline accuracy gain (47.78% to 83.5%) unverifiable against potential new errors from added muxes, control lines, or timing skew.

    Authors: The manuscript provides the SWANN algorithm description, placement rules, and resistance matrix handling in Section III, including how weights are agglomerated to mitigate interconnect effects. However, we agree that the abstract is high-level and that explicit pseudocode, a step-by-step placement example, and direct discussion of re-extraction after shuffling would improve clarity. We will also add analysis showing that the introduced muxes and control lines do not offset the accuracy gains, as their parasitics are accounted for in the post-mapping simulations. These additions will be included in the revised version. revision: yes

  2. Referee: Abstract and results: No simulation setup, error bars, exclusion criteria, or post-shuffling re-simulation that includes the shuffler hardware are supplied, so the reported accuracy numbers cannot be checked against the claim that net improvement is preserved.

    Authors: Section IV details the 7nm 8T-SRAM 128x128 array setup, ResNet-20/CIFAR-10 evaluation, and comparison with Partial-Word-Line Activation, along with the ferroelectric transistor case. We acknowledge that error bars, explicit exclusion criteria, and a dedicated post-shuffling re-simulation incorporating shuffler hardware (muxes, timing) are not presented with sufficient prominence. In revision we will add a simulation parameters table, report variability across runs where applicable, and include results from hardware-inclusive re-simulations confirming the accuracy improvement is retained after overheads. revision: yes

Circularity Check

0 steps flagged

No circularity; accuracy gains reported from external simulation of proposed shuffling

full rationale

The manuscript proposes SWANN as a weight-shuffling technique to reduce interconnect resistance impact in SRAM crossbar arrays and reports accuracy numbers (47.78% to 83.5% on ResNet-20/CIFAR-10) obtained via simulation. No equations, fitted parameters presented as predictions, self-definitional relations, or load-bearing self-citations appear in the provided text. The central claim rests on an external simulation benchmark rather than reducing to its own inputs by construction, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no equations, methods section, or modeling assumptions can be inspected, so the ledger cannot be populated beyond noting the absence of information.

pith-pipeline@v0.9.0 · 5748 in / 1073 out tokens · 19048 ms · 2026-05-23T23:58:04.027618+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    GENIEx: A Generalized Approach to Emulating Non-Ideality in Memristive Xbars using Neural Networks,

    I. Chakraborty, M. Fayez Ali, D. Eun Kim, A. Ankit, and K. Roy, “GENIEx: A Generalized Approach to Emulating Non-Ideality in Memristive Xbars using Neural Networks,” in 2020 57th ACM/IEEE Design Automation Conference (DAC) , 2020, pp. 1 –6. doi: 10.1109/DAC1807 2.2020.9218688

  2. [2]

    TiM -DNN: Ternary In -Memory Accelerator for Deep Neural Networks,

    S. Jain, S. K. Gupta, and A. Raghunathan, “TiM -DNN: Ternary In -Memory Accelerator for Deep Neural Networks,” IEEE Trans Very Large Scale Integr VLSI Syst, vol. 28, no. 7, pp. 1567 –1577, 2020, doi: 10.1109/TVLSI.2020.2993045

  3. [3]

    Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

    Y. Wang , G. -Y. Wei, D. Brooks, and J. A. Paulson, “Benchmarking TPU, GPU, and CPU Platforms for Deep Learning.”

  4. [4]

    X-Former: In -Memory Acceleration of Transformers,

    S. Sridharan, J. R. Stevens, K. Roy, and A. Raghunathan, “X-Former: In -Memory Acceleration of Transformers,” IEEE Trans Very Large Scale Integr VLS I Syst, vol. 31, no. 8, pp. 1223 –1233, 2023, doi: 10.1109/TVLSI.2023.3282046

  5. [5]

    Ternary Compute -Enabled Memory using Ferroelectric Transistors for Accelerating Deep Neural Networks,

    S. K. Thirumala, S. Jain, S. K. Gupta, and A. Raghunathan, “Ternary Compute -Enabled Memory using Ferroelectric Transistors for Accelerating Deep Neural Networks,” in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE) , 2020, pp. 31 –36. doi: 10.23919/DATE48585.2020.9116495

  6. [6]

    Simultaneous Local- ization and Actuation Using Electromagnetic Navigation Systems,

    F. H. Meng and W. D. Lu, “Compute -In-Memory Technologies for Deep Learning Acceleration,” IEEE Nanotechnol Mag , vol. 18, no. 1, pp. 44–52, Feb. 2024, doi: 10.1109/MNANO.2023.3340321

  7. [7]

    PUMA: A Programmable Ultra -Efficient Memristor -Based Accelerator for Machine Learning Inference,

    A. Ankit et al., “PUMA: A Programmable Ultra -Efficient Memristor -Based Accelerator for Machine Learning Inference,” in Proceedings of the Twenty -Fourth International Conference on Architectural Support for Programming Languages and Operating Systems , in ASPLOS ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 715 –731. doi: 10.114...

  8. [8]

    Design Space Exploration and Comparative Eva luation of Memory Technologies for Synaptic Crossbar Arrays: Device - Circuit Non -Idealities and System Accuracy

    C. Wang, J. Victor, and S. K. Gupta, “Design Space Exploration and Comparative Eva luation of Memory Technologies for Synaptic Crossbar Arrays: Device - Circuit Non -Idealities and System Accuracy”

  9. [9]

    Modeling and Circuit Analysis of Interconnects with TaS2 Barrier/Liner,

    X. Chen, C. -L. Lo, M. C. Johnson, Z. Chen, and S. K. Gupta, “Modeling and Circuit Analysis of Interconnects with TaS2 Barrier/Liner,” in 2021 Device Research Conference (DRC) , 2021, pp. 1 –2. doi: 10.1109/DRC52342.2021.9467160

  10. [10]

    Interconnect scaling challenges, and opportunities to enable system -level performance beyond 30 nm p itch,

    G. Bonilla, N. Lanzillo, C. -K. Hu, C. J. Penny, and A. Kumar, “Interconnect scaling challenges, and opportunities to enable system -level performance beyond 30 nm p itch,” in 2020 IEEE International Electron Devices Meeting (IEDM) , 2020, pp. 20.4.1 -20.4.4. doi: 10.1109/IEDM13553.2020.9372093

  11. [11]

    Interconnect scaling: Challenges and opportunities,

    R. Brain, “Interconnect scaling: Challenges and opportunities,” in 2016 IEEE International Electron Devices Meeting (IEDM) , 2016, pp. 9.3.1 -9.3.4. doi: 10.1109/IEDM.2016.7838381

  12. [12]

    Vortex: Variation -aware training for memristor X -bar,

    B. Liu, H. Li, Y. Chen, X. Li, Q. Wu, and T. Huang, “Vortex: Variation -aware training for memristor X -bar,” in 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC) , 2015, pp. 1 –6. doi: 10.1145/2744769.2744930

  13. [13]

    Reduction and IR -drop compensations techniques for reliable neuromorphic computing systems,

    B. Liu et al. , “Reduction and IR -drop compensations techniques for reliable neuromorphic computing systems,” in 2014 IEEE/ACM International Conference on Computer -Aided Design (ICCAD) , 2014, pp. 63 –70. doi: 10.1109/ICCAD.2014.7001330

  14. [14]

    NEAT: Nonlinearity Aware Training for Accurate, Energy-Efficient, and Robust Implementation of Neural Networks on 1T -1R Crossbars,

    A. Bhattacharjee, L. Bhatnagar, Y. Kim, and P. Panda, “NEAT: Nonlinearity Aware Training for Accurate, Energy-Efficient, and Robust Implementation of Neural Networks on 1T -1R Crossbars,” Trans. Comp.-Aided Des. Integ. Cir. Sys., vol. 41, no. 8, pp. 2625 –2637, Aug. 2022, doi: 10.1109/TCAD.2021.3109857

  15. [15]

    Examining and Mitigating the Impact of Crossbar Non -Idealities for Accurate Implementation of Sparse Deep Neural Networks,

    A. Bhattacharjee, L. Bhatnagar, and P. Panda, “Examining and Mitigating the Impact of Crossbar Non -Idealities for Accurate Implementation of Sparse Deep Neural Networks,” in Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe , in DATE ’22. Leuven, BEL: European Design and Automation Association, 2022, pp. 1119 –1122

  16. [16]

    Effect of Device Variation on Mapping Binary Neural Network to Memristor Crossbar Array,

    W. Yi, Y. Kim, and J.-J. Kim, “Effect of Device Variation on Mapping Binary Neural Network to Memristor Crossbar Array,” in 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE) , 2019, pp. 320–

  17. [17]

    doi: 10.23919/DATE.2019.8714817

  18. [18]

    X-CHANGR: Changing Memristive Crossbar Mapping for Mitigating Line - Resistance Induced Accuracy Degradation in Deep Neural Networks

    A. Agrawal, C. Lee, and K. Roy, “X-CHANGR: Changing Memristive Crossbar Mapping for Mitigating Line - Resistance Induced Accuracy Degradation in Deep Neural Networks.”

  19. [19]

    Beringer, M

    R. Beringer, M. J. G. Castle, and E. H. Sondheimer, “CON DUCT I V IT Y OF TH I N METALL I C F ILMS I am indebted for prepublication use of a part of their d ata on oxygen. P H YSI CAL The &~6uence of a Transverse Magnetic Field on the Conductivity of Thin Metallic Films,” 1950

  20. [20]

    Electrical -Resistivity Model for Polycrystalline Films: the Case of Arbitrary Re6ection at Extern al Surfaces,

    A. F. Mayadas and M. Shatzkes, “Electrical -Resistivity Model for Polycrystalline Films: the Case of Arbitrary Re6ection at Extern al Surfaces,” 1970

  21. [21]

    Modeling and Benchmarking Back End Of The Line Technologies on Circuit Designs at Advanced Nodes,

    V. Huang, D. E. Shim, J. Kim, S. Pentapati, S. K. Lim, and A. Naeemi, “Modeling and Benchmarking Back End Of The Line Technologies on Circuit Designs at Advanced Nodes,” in 2020 IEEE International Interconnect Technology Conference (IITC) , 2020, pp. 37 –39. doi: 10.1109/IITC47697.2020.9515629

  22. [22]

    ASAP7: A 7 -nm finFET predictive process design kit,

    L. T. Clark et al. , “ASAP7: A 7 -nm finFET predictive process design kit,” Microelectronics J , vol. 53, pp. 105 – 115, Jul. 2016, doi: 10.1016/j.mejo.2016.04.006

  23. [23]

    Modeling and Comparative Analysis of Hysteretic Ferroelectric and Anti -ferroelectric FETs,

    A. K. Saha an d S. K. Gupta, “Modeling and Comparative Analysis of Hysteretic Ferroelectric and Anti -ferroelectric FETs,” in 2018 76th Device Research Conference (DRC) , 2018, pp. 1 –2. doi: 10.1109/DRC.2018.8442136

  24. [24]

    In-Memory Computing Primitive for S ensor Data Fusion in 28 nm HKMG FeFET Technology,

    K. Ni et al., “In-Memory Computing Primitive for S ensor Data Fusion in 28 nm HKMG FeFET Technology,” in 2018 IEEE International Electron Devices Meeting (IEDM), 2018, pp. 16.1.1 -16.1.4. doi: 10.1109/IEDM.2018.8614527. [24] “https://en.wikichip.org/wiki/7_nm_lithography_process .” Accessed: Nov. 27, 2023. [ Online]. Available: https://en.wikichip.org/wik...

  25. [25]

    ISAAC: A Convolutional Neural Network Accelerator with In -Situ Analog Arithmetic in Crossbars,

    A. Shafiee et al. , “ISAAC: A Convolutional Neural Network Accelerator with In -Situ Analog Arithmetic in Crossbars,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) , 2016, pp. 14 –26. doi: 10.1109/ISCA.2016.12

  26. [26]

    SAMBA: Sparsity Aware In -Memory Computing Based Machine Learning Accelerator,

    D. E. Kim, A. Ankit, C . Wang, and K. Roy, “SAMBA: Sparsity Aware In -Memory Computing Based Machine Learning Accelerator,” IEEE Transactions on Computers , vol. 72, no. 9, pp. 2615 –2627, 2023, doi: 10.1109/TC.2023.3257513

  27. [27]

    TAICHI: A Tiled Architecture for In - Memory Computing and Heterogeneous Integration,

    X. Wang et al. , “TAICHI: A Tiled Architecture for In - Memory Computing and Heterogeneous Integration,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 2, pp. 559 –563, 2022, doi: 10.1109/TCSII.2021.3097035. Jeffry Victor is a PhD candidate at the Department of Electrical and Computer Engineering, Purdue University, under the s...