PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations

Alexandra Gallyas-Sanhueza; Christoph Studer; Maria Bobbett; Oscar Castañeda

arxiv: 1907.08641 · v1 · pith:RAX4GFPCnew · submitted 2019-07-19 · 💻 cs.AR

Pith reviewed 2026-05-24 18:46 UTC · model grok-4.3

classification 💻 cs.AR

keywords: processing in memory · in-memory accelerator · content-addressable memory · matrix-vector product · neural network acceleration · cryptography · forward error correction · standard-cell CMOS

The pith

PPAC performs matrix-vector-product operations inside associative memory to accelerate neural networks, hash lookups, cryptography, and error correction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes PPAC, a digital in-memory accelerator built around associative content-addressable memory that supports a family of matrix-vector-product-like computations. It shows that one hardware structure can handle low-precision neural networks, exact and approximate hash lookups, cryptography, and forward error correction without custom analog circuits. A reader would care because the design uses only standard-cell CMOS, which simplifies automated layout and allows the same accelerator to move across technology nodes. Post-layout results in 28 nm demonstrate throughput and energy numbers competitive with both digital and mixed-signal PIM designs while covering a broader set of tasks. The central argument is that moving these specific operations into memory yields efficiency gains without sacrificing versatility or design portability.

Core claim

PPAC integrates parallel processing directly into content-addressable memory arrays so that matrix-vector-product-like operations execute in place, delivering acceleration for low-precision neural networks, exact/approximate hash lookups, cryptography, and forward error correction; because the architecture remains fully digital and standard-cell based, it supports automated design flows and straightforward porting between CMOS nodes.

What carries the argument

The Parallel Processor in Associative Content-addressable memory (PPAC) array, which embeds simple processing elements inside a content-addressable memory so that parallel multiply-accumulate-style operations occur locally without moving data off the memory array.
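As a rough behavioral illustration of the idea (not the paper's circuit or RTL), the sketch below models each memory row as an integer and computes a Hamming similarity per row by combining a bitwise XNOR with a population count. The function names `hamming_similarity` and `row_parallel_similarity` are hypothetical, and the Python loop stands in for work that a hardware array would perform in all rows at once.

```python
from typing import List


def hamming_similarity(stored: int, query: int, n_bits: int) -> int:
    """Count the bit positions in which `stored` and `query` agree."""
    mask = (1 << n_bits) - 1
    xnor = ~(stored ^ query) & mask  # 1 wherever the two words agree
    return bin(xnor).count("1")      # population count of the XNOR result


def row_parallel_similarity(rows: List[int], query: int, n_bits: int) -> List[int]:
    """Software stand-in for an array in which every row sees the same query."""
    return [hamming_similarity(r, query, n_bits) for r in rows]


# Hypothetical 3-word, 4-bit array used only for illustration.
rows = [0b1011, 0b0000, 0b1111]
sims = row_parallel_similarity(rows, query=0b1011, n_bits=4)
print(sims)  # [4, 1, 3]
```

The query never leaves the loop body and the stored words are never copied out, which is the software analogue of keeping the data in the array and moving only the per-row results.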

If this is right

A single PPAC instance can replace several specialized accelerators for the listed workloads.
Design teams can use standard digital tools and libraries rather than custom analog or mixed-signal flows.
The same PPAC layout can be reused when moving to a new process node without redesigning analog components.
Throughput and energy efficiency remain competitive with recent PIM accelerators while supporting more application types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Systems that already contain content-addressable memory could add PPAC-style processing with modest extra area.
The digital nature may simplify verification and testing compared with analog PIM approaches.
Edge devices that must run both inference and lightweight cryptography could share one accelerator block.

Load-bearing premise

That applications can be mapped to the PPAC array with low overhead, and that post-layout simulations in 28 nm CMOS accurately reflect the power, area, and speed of a fabricated chip.

What would settle it

Fabricate a PPAC test chip in 28 nm CMOS, map one of the claimed workloads (for example a small neural-network layer or a cryptographic primitive) onto it, and compare measured throughput and energy per operation against the post-layout predictions; a large gap would falsify the performance claims.

Figures

Figures reproduced from arXiv: 1907.08641 by Alexandra Gallyas-Sanhueza, Christoph Studer, Maria Bobbett, Oscar Castañeda.

**Figure 2.** Parallel Processor in Associative CAM (PPAC) architecture. (a) High-level architecture. (b) Each bit-cell includes an XNOR and an AND gate to …

**Figure 3.** Layout of the 256 × 256 PPAC with B = Bs = 16. All banks but one are colored using different shades of blue. For the gray bank, one row is shown in green, while the row memory and row ALU of another row are shown in orange and red, respectively.

… increasing the number of words M results in a higher area and power consumption than increasing the number of bits per word N by the same factor. This behavior is …

Original abstract

Processing in memory (PIM) moves computation into memories with the goal of improving throughput and energy-efficiency compared to traditional von Neumann-based architectures. Most existing PIM architectures are either general-purpose but only support atomistic operations, or are specialized to accelerate a single task. We propose the Parallel Processor in Associative Content-addressable memory (PPAC), a novel in-memory accelerator that supports a range of matrix-vector-product (MVP)-like operations that find use in traditional and emerging applications. PPAC is, for example, able to accelerate low-precision neural networks, exact/approximate hash lookups, cryptography, and forward error correction. The fully-digital nature of PPAC enables its implementation with standard-cell-based CMOS, which facilitates automated design and portability among technology nodes. To demonstrate the efficacy of PPAC, we provide post-layout implementation results in 28nm CMOS for different array sizes. A comparison with recent digital and mixed-signal PIM accelerators reveals that PPAC is competitive in terms of throughput and energy-efficiency, while accelerating a wide range of applications and simplifying development.
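Two of the MVP-like operations named in the paper, the inner product and the GF(2) matrix-vector product, can be illustrated with the same two bitwise primitives that appear in the bit-cell description (XNOR and AND). The sketch below is a behavioral model under stated assumptions, not the authors' implementation: it assumes a bit value of 1 encodes +1 and 0 encodes -1 for the inner product, and the function names are hypothetical. It relies on two standard identities: for ±1 vectors of length N with Hamming similarity h, the inner product equals 2h - N, and a GF(2) dot product is the parity of the bitwise AND.

```python
from typing import List


def popcount(x: int) -> int:
    return bin(x).count("1")


def pm1_inner_product(a_bits: int, x_bits: int, n_bits: int) -> int:
    """Inner product of two +/-1 vectors packed as bits (1 -> +1, 0 -> -1)."""
    mask = (1 << n_bits) - 1
    h = popcount(~(a_bits ^ x_bits) & mask)  # Hamming similarity via XNOR
    return 2 * h - n_bits                    # agreements minus disagreements


def gf2_mvp(rows: List[int], x_bits: int) -> List[int]:
    """Matrix-vector product over GF(2): AND per bit, XOR-reduce (parity) per row."""
    return [popcount(r & x_bits) & 1 for r in rows]


# Hypothetical 4-bit operands chosen only for illustration.
ip = pm1_inner_product(0b1011, 0b1110, n_bits=4)
y = gf2_mvp([0b1011, 0b0110, 0b0011], x_bits=0b1100)
print(ip, y)  # 0 [1, 1, 0]
```

Both functions reduce a row to a single number with one bitwise operation followed by a count, which is why one array structure can plausibly serve several of the listed workloads.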

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PPAC is a new digital PIM architecture that handles multiple MVP-like tasks with standard cells and posts competitive 28nm layout numbers.

The punchline is that PPAC offers a versatile digital in-memory accelerator for several MVP-like tasks without relying on analog circuits or specialized cells. It supports low-precision neural networks, exact and approximate hash lookups, cryptography, and forward error correction, all while being implementable with standard-cell CMOS. The new contribution is the PPAC architecture itself, which combines content-addressable memory with parallel processing to handle these different operations.

The paper demonstrates this with post-layout results in 28 nm for various array sizes and shows competitive throughput and energy efficiency against recent digital and mixed-signal PIM designs. It does well by emphasizing portability and automated design flow, which is a real advantage over many PIM proposals that require custom layouts. The fully digital nature simplifies things and the comparisons are direct on the metrics that matter for accelerators.

Soft spots are minor but worth noting. All results are from post-layout simulations, not silicon measurements, so correlation to actual chips remains to be seen. The mappings are illustrated for some workloads but not all, and the paper states this explicitly. There are no load-bearing flaws in the central claims or contradictions in the approach.

This work is aimed at researchers and engineers in computer architecture focused on processing-in-memory and hardware acceleration for emerging applications. A reader looking for practical digital PIM options that span multiple domains would get concrete implementation data and comparisons from it.

I would recommend sending it to peer review. The implementation results provide enough substance for referees to evaluate the design choices and performance claims properly.

Referee Report

0 major / 3 minor

Summary. The paper proposes PPAC, a fully digital in-memory accelerator based on associative content-addressable memory that supports a range of matrix-vector-product-like operations. It claims to accelerate low-precision neural networks, exact/approximate hash lookups, cryptography, and forward error correction. The architecture is implemented using standard-cell CMOS, with post-layout results reported in 28 nm for multiple array sizes; these results are positioned as competitive in throughput and energy efficiency against recent digital and mixed-signal PIM accelerators while offering greater application versatility and design portability.

Significance. If the post-layout results and application mappings hold, the work is significant because it demonstrates a standard-cell-based PIM design that avoids mixed-signal complexities while spanning multiple domains. Explicit acknowledgment that only post-layout data are supplied and that mappings are shown for a subset of workloads is a strength; the paper thereby avoids overclaiming fabricated-silicon performance. The approach of using associative memory for MVP-like operations across domains is a concrete contribution to portable PIM research.

minor comments (3)

[§4] §4 (Implementation): the post-layout area and power numbers for the 256×256 array are presented without an accompanying breakdown of the contribution from the associative CAM cells versus peripheral logic; adding this would strengthen the claim that the architecture scales efficiently.
[Table 2] Table 2: the energy-efficiency comparison lists PPAC against prior work but does not state whether the prior-work numbers were obtained under identical activity factors or workload assumptions; a footnote clarifying this would improve fairness.
[§5.2] §5.2 (Application mapping): the hash-lookup and FEC examples are illustrated at a high level; a short pseudocode or dataflow diagram for at least one mapping would make the “low overhead” claim easier to verify.
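To make the last comment concrete, one possible shape for such pseudocode is sketched here for the hash-lookup case. It is an assumption about how an exact or approximate lookup could be expressed on a CAM-style array, not the mapping given in the paper: each stored word is compared against the query, and a row reports a hit when its Hamming similarity reaches a threshold. The function name `cam_lookup` and the parameter `max_mismatches` are hypothetical.

```python
from typing import List


def cam_lookup(rows: List[int], query: int, n_bits: int,
               max_mismatches: int = 0) -> List[int]:
    """Return indices of rows whose stored word is within `max_mismatches`
    bit flips of `query`; max_mismatches=0 gives an exact-match lookup."""
    mask = (1 << n_bits) - 1
    threshold = n_bits - max_mismatches
    hits = []
    for idx, stored in enumerate(rows):  # hardware would evaluate all rows at once
        similarity = bin(~(stored ^ query) & mask).count("1")
        if similarity >= threshold:
            hits.append(idx)
    return hits


# Hypothetical table of 8-bit hash signatures.
table = [0b10110010, 0b10110011, 0b01001101]
exact = cam_lookup(table, 0b10110010, n_bits=8)
approx = cam_lookup(table, 0b10110010, n_bits=8, max_mismatches=1)
print(exact, approx)  # [0] [0, 1]
```

In this form the only per-lookup configuration is the threshold, so the overhead of switching between exact and approximate matching would be a single parameter; whether the paper's mapping is this cheap is exactly what the referee asks the authors to show.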

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of PPAC, the recognition of its versatility across applications, and the recommendation for minor revision. The strengths noted regarding post-layout results and avoidance of overclaiming are appreciated.

Circularity Check

0 steps flagged

No significant circularity detected

The manuscript is an architecture and implementation paper describing a standard-cell digital PIM design (PPAC) for MVP-like operations. It supports its claims via explicit hardware mappings for a subset of workloads and post-layout results in 28 nm CMOS rather than any derivation chain, fitted parameters, or self-referential predictions. No equations, ansatzes, or uniqueness theorems appear that could reduce to inputs by construction; the central thesis rests on the described circuit organization and the reported post-layout metrics, which are externally falsifiable via fabrication. Any self-citations are incidental and non-load-bearing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the feasibility of mapping multiple application classes onto a single digital associative-memory structure and on the accuracy of post-layout area/power estimates in a commercial 28nm process; no free parameters or invented physical entities are introduced.

axioms (1)

domain assumption: Standard-cell library and 28 nm CMOS process models are accurate for post-layout estimation.
Invoked when reporting post-layout results and comparisons.

invented entities (1)

PPAC architecture (no independent evidence)
purpose: Hardware structure enabling versatile MVP-like operations inside memory.
New design introduced by the paper; no independent evidence outside the proposal itself.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

PPAC ... supports a range of matrix-vector-product (MVP)-like operations ... Hamming similarity ... inner-product ... GF(2) MVPs, and programmable logic array (PLA) functionality.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

post-layout implementation results in 28 nm CMOS ... 256×256 PPAC array achieves 92 TOP/s at 4.15 fJ/OP
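The two figures in this passage can be combined into an implied average power, since throughput (operations per second) times energy per operation gives joules per second. The short calculation below is only a consistency check under the assumption that both numbers refer to the same operating point of the 256×256 array; the variable names are illustrative.

```python
# Figures quoted in the passage above.
throughput_ops_per_s = 92e12   # 92 TOP/s
energy_per_op_j = 4.15e-15     # 4.15 fJ/OP

# Power = (operations / second) * (joules / operation).
implied_power_w = throughput_ops_per_s * energy_per_op_j
print(f"{implied_power_w * 1e3:.1f} mW")  # 381.8 mW
```

An implied power of roughly 0.38 W is the quantity a fabricated test chip would have to reproduce, alongside the throughput, for the post-layout numbers to hold.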

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
