WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy
Pith reviewed 2026-05-23 23:58 UTC · model grok-4.3
The pith
Weight shuffling in crossbar arrays counters interconnect resistance and lifts ResNet-20/CIFAR-10 accuracy from 47.78% to 83.5%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SWANN is a weight shuffling technique in crossbar arrays which alleviates the detrimental effect of wire resistance on in-memory computing, enhancing accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10 while incurring less than 1% energy increase and about 1% latency and 16% area overhead with one ADC per array.
What carries the argument
Weight shuffling (SWANN) within crossbar arrays that rearranges stored weights to reduce the impact of interconnect resistance on analog current summation during inference.
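The effect that shuffling targets can be illustrated with a toy model. The sketch below is not the authors' algorithm, which the abstract does not spell out. It assumes a single bit line modeled as a resistive ladder, ideal word-line drivers, all inputs at logic 1, and a sense node held at 0 V next to the last row. The names `column_current` and `agglomerate_near_sense` and the values of `g_cell`, `r_wire`, and `v_in` are illustrative assumptions.

```python
from typing import List


def column_current(active: List[bool], g_cell: float = 1e-4,
                   r_wire: float = 1.0, v_in: float = 1.0) -> float:
    """Current reaching the sense node of one bit line (toy ladder model).

    Row n-1 is the row nearest the sense node, which is held at 0 V.
    Each active cell is a conductance g_cell from v_in to its bit-line node;
    adjacent bit-line nodes are joined by one wire segment of r_wire ohms.
    """
    n = len(active)
    g_w = 1.0 / r_wire
    cell = [g_cell if a else 0.0 for a in active]
    # KCL at every bit-line node gives a tridiagonal system with
    # off-diagonal entries equal to -g_w.
    diag = [cell[i] + g_w + (g_w if i > 0 else 0.0) for i in range(n)]
    rhs = [cell[i] * v_in for i in range(n)]
    # Thomas algorithm (forward elimination, then back substitution).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -g_w / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] + g_w * c_prime[i - 1]
        c_prime[i] = -g_w / m
        d_prime[i] = (rhs[i] + g_w * d_prime[i - 1]) / m
    v = [0.0] * n
    v[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        v[i] = d_prime[i] - c_prime[i] * v[i + 1]
    # Current through the last wire segment into the 0 V sense node.
    return g_w * v[n - 1]


def agglomerate_near_sense(bits: List[bool]) -> List[bool]:
    """Hypothetical heuristic: move 1-bits to the rows nearest the sense node.

    A real mapping would have to apply the same row permutation to the
    inputs so that the dot product is unchanged.
    """
    return sorted(bits)  # False sorts before True, so 1-bits end up last


# 1-bits initially sit on the rows farthest from the sense node.
bits = [True, True, True, False, False, False, False, False]
ideal = sum(bits) * 1e-4 * 1.0  # current with zero wire resistance
err_before = ideal - column_current(bits)
err_after = ideal - column_current(agglomerate_near_sense(bits))
print(f"error before: {err_before:.3e} A, after: {err_after:.3e} A")
```

In this model the current from rows far from the sense node crosses more wire segments, so moving the conducting cells closer to the sense node shrinks the gap to the ideal sum. Whether SWANN uses this placement rule is not stated in the material summarized here.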
If this is right
- Combining SWANN with Partial-Word-Line Activation yields further accuracy gains beyond the 83.5% that SWANN reaches on its own.
- The same shuffling approach applies to ferroelectric-transistor crossbar arrays with comparable overhead.
- Energy consumption rises by less than 1%, with about 1% latency overhead and about 16% area overhead for one-ADC-per-array designs.
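Why Partial-Word-Line Activation could help alongside shuffling can be sketched with the same kind of toy model. The code below assumes that partial activation means driving only a subset of rows per cycle and summing the per-cycle currents, which is an interpretation and not a detail taken from the paper. The single-bit-line ladder model, the helper names `column_current` and `partial_activation_current`, and all component values are illustrative assumptions.

```python
from typing import List


def column_current(active: List[bool], g_cell: float = 1e-4,
                   r_wire: float = 1.0, v_in: float = 1.0) -> float:
    """Sense-node current of one bit line modeled as a resistive ladder.

    Row n-1 is nearest the sense node (0 V); each active cell is a
    conductance g_cell from v_in to its bit-line node.
    """
    n = len(active)
    g_w = 1.0 / r_wire
    cell = [g_cell if a else 0.0 for a in active]
    diag = [cell[i] + g_w + (g_w if i > 0 else 0.0) for i in range(n)]
    rhs = [cell[i] * v_in for i in range(n)]
    # Thomas algorithm for the tridiagonal KCL system (off-diagonals = -g_w).
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -g_w / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] + g_w * c_prime[i - 1]
        c_prime[i] = -g_w / m
        d_prime[i] = (rhs[i] + g_w * d_prime[i - 1]) / m
    v = [0.0] * n
    v[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        v[i] = d_prime[i] - c_prime[i] * v[i + 1]
    return g_w * v[n - 1]


def partial_activation_current(weights: List[bool], group_size: int) -> float:
    """Sum of sense currents when only group_size rows are driven per cycle."""
    total = 0.0
    for start in range(0, len(weights), group_size):
        # Rows outside the current group see a 0 input and do not conduct.
        mask = [w and (start <= i < start + group_size)
                for i, w in enumerate(weights)]
        total += column_current(mask)
    return total


weights = [True] * 16                # every cell stores a 1-bit
ideal = sum(weights) * 1e-4 * 1.0    # current with zero wire resistance
err_full = ideal - column_current(weights)
err_pwa = ideal - partial_activation_current(weights, group_size=4)
print(f"error, all rows at once: {err_full:.3e} A, four rows per cycle: {err_pwa:.3e} A")
```

Each cycle carries less bit-line current, so the voltage drop along the wire is smaller and the summed result lands closer to the ideal value. The price in this sketch is more cycles per dot product, which is the kind of latency trade-off a full evaluation would need to quantify.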
Where Pith is reading between the lines
- If shuffling logic can be made reconfigurable at runtime, it could adapt to process variation or temperature changes without redesign.
- Designers might prioritize crossbar sizes and ADC counts differently once resistance mitigation is available through mapping rather than circuit changes.
Load-bearing premise
Weight shuffling can be realized in hardware using only the reported overheads and without new accuracy or timing errors that cancel the gains.
What would settle it
Fabricating and testing a 7nm 128x128 8T-SRAM crossbar array with an implemented weight-shuffling controller and measuring whether end-to-end accuracy reaches 83.5% at the stated power and latency cost.
Original abstract
Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays which alleviates the detrimental effect of wire resistance on IMC. For 8T-SRAM-based 128x128 crossbar arrays in 7nm technology, SWANN enhances the accuracy from 47.78% to 83.5% for ResNet-20/CIFAR-10. We also show that SWANN can be used synergistically with Partial-Word-Line Activation, further boosting the accuracy. Moreover, we evaluate the implications of SWANN for compact ferroelectric-transistor-based crossbar arrays. SWANN incurs minimal hardware overhead, with less than a 1% increase in energy consumption. Additionally, the latency and area overheads of SWANN are ~1% and ~16%, respectively, when 1 ADC is utilized per crossbar array.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes SWANN (noted as WAGONN in the title), a weight-shuffling technique for crossbar arrays in in-memory computing (IMC) DNN accelerators to reduce the detrimental effects of interconnect resistance. For 8T-SRAM-based 128x128 arrays in 7nm technology, it reports an accuracy improvement from 47.78% to 83.5% on ResNet-20/CIFAR-10, with further gains when combined with Partial-Word-Line Activation; the approach is also evaluated on ferroelectric-transistor-based arrays. Overheads are stated as <1% energy, ~1% latency, and ~16% area (with 1 ADC per array).
Significance. If the central accuracy claims are substantiated with full implementation details and re-validated simulations, the result would be relevant for practical IMC accelerators in scaled nodes, where wire resistance is a known limiter. The low-overhead claim and potential synergy with existing techniques would strengthen its utility if the hardware mapping does not introduce offsetting parasitics.
major comments (2)
- Abstract and method description: The weight-shuffling procedure (including the concrete algorithm, placement rules, and re-extraction of the resistance matrix after shuffling) is not described. This leaves the headline accuracy gain (47.78% to 83.5%) unverifiable against potential new errors from added muxes, control lines, or timing skew.
- Abstract and results: No simulation setup, error bars, exclusion criteria, or post-shuffling re-simulation that includes the shuffler hardware are supplied, so the reported accuracy numbers cannot be checked against the claim that net improvement is preserved.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight the need for greater explicitness in the method description and simulation details to ensure verifiability of the accuracy results. We will revise the manuscript to address these points by expanding the relevant sections with additional concrete information, while preserving the core technical contributions.
- Referee: Abstract and method description: The weight-shuffling procedure (including the concrete algorithm, placement rules, and re-extraction of the resistance matrix after shuffling) is not described. This leaves the headline accuracy gain (47.78% to 83.5%) unverifiable against potential new errors from added muxes, control lines, or timing skew.
  Authors: The manuscript provides the SWANN algorithm description, placement rules, and resistance matrix handling in Section III, including how weights are agglomerated to mitigate interconnect effects. However, we agree that the abstract is high-level and that explicit pseudocode, a step-by-step placement example, and direct discussion of re-extraction after shuffling would improve clarity. We will also add analysis showing that the introduced muxes and control lines do not offset the accuracy gains, as their parasitics are accounted for in the post-mapping simulations. These additions will be included in the revised version.
  Revision: yes
- Referee: Abstract and results: No simulation setup, error bars, exclusion criteria, or post-shuffling re-simulation that includes the shuffler hardware are supplied, so the reported accuracy numbers cannot be checked against the claim that net improvement is preserved.
  Authors: Section IV details the 7nm 8T-SRAM 128x128 array setup, ResNet-20/CIFAR-10 evaluation, and comparison with Partial-Word-Line Activation, along with the ferroelectric transistor case. We acknowledge that error bars, explicit exclusion criteria, and a dedicated post-shuffling re-simulation incorporating shuffler hardware (muxes, timing) are not presented with sufficient prominence. In revision we will add a simulation parameters table, report variability across runs where applicable, and include results from hardware-inclusive re-simulations confirming the accuracy improvement is retained after overheads.
  Revision: yes
Circularity Check
No circularity; accuracy gains reported from external simulation of proposed shuffling
The manuscript proposes SWANN as a weight-shuffling technique to reduce interconnect resistance impact in SRAM crossbar arrays and reports accuracy numbers (47.78% to 83.5% on ResNet-20/CIFAR-10) obtained via simulation. No equations, fitted parameters presented as predictions, self-definitional relations, or load-bearing self-citations appear in the provided text. The central claim rests on an external simulation benchmark rather than reducing to its own inputs by construction, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
Jeffry Victor is a PhD candidate at the Department of Electrical and Computer Engineering, Purdue University, under the s...