One-Cycle Reconfigurable In-Memory Logic for Non-Volatile Memory
Today’s computers are typically based on the von Neumann architecture, in which separate computing and memory units are connected by buses. This configuration leads to high memory access latency, limited memory bandwidth, energy-hungry data transfer, and significant leakage power from holding data in volatile memory. A processing-in-memory (PIM) architecture, in which data is processed entirely within the memory, may address these bottlenecks. Many applications in deep learning, graph processing, and bioinformatics have been shown to rely heavily on bulk bit-wise addition and comparison operations. However, because of the intrinsic complexity of X(N)OR logic, the throughput of PIM platforms drops when handling such bulk bit-wise operations: these functions are constructed over multiple cycles, and writing intermediate data back to memory adds latency and energy consumption. A single-cycle in-memory computing circuit capable of realizing various Boolean logic functions and full-adder outputs may therefore be pivotal to PIM advancement.
Researchers at Arizona State University have developed a PIM design that converts any memory sub-array based on non-volatile resistive bit-cells into a potential processing unit. The data matrix is stored as the resistive states of the memory cells. With modified peripheral circuits, the address decoder receives three addresses and activates three memory rows of resistive bit-cells (i.e., the data operands). Three bit-cells are thus activated on each memory bit-line and sensed simultaneously, producing different parallel resistance levels at the sense amplifier. By selecting different reference resistance levels with a modified sense amplifier, a full set of single-cycle, reconfigurable 1-/2-/3-input complete Boolean logic functions and full-adder outputs can be read out intrinsically from the operand data in the memory array.
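The sensing idea described above can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the researchers' circuit: the resistance values, the choice of high resistance for logic 1, the midpoint reference levels, and the way the three threshold outputs are combined into XOR are all hypothetical choices made for illustration.

```python
# Hypothetical device values (assumptions, not taken from the source).
R_LOW = 5e3    # ohms, assumed to encode logic 0
R_HIGH = 10e3  # ohms, assumed to encode logic 1


def parallel_resistance(bits):
    """Equivalent resistance of the activated cells sharing one bit-line."""
    conductance = sum(1.0 / (R_HIGH if b else R_LOW) for b in bits)
    return 1.0 / conductance


def level(ones, n=3):
    """Parallel resistance when `ones` of the n activated cells store 1."""
    return parallel_resistance([1] * ones + [0] * (n - ones))


# Assumed reference resistances, placed midway between adjacent levels.
REF_OR = (level(0) + level(1)) / 2    # separates "no 1s" from "at least one 1"
REF_MAJ = (level(1) + level(2)) / 2   # separates "one 1" from "two or more 1s"
REF_AND = (level(2) + level(3)) / 2   # separates "two 1s" from "all three 1s"


def sense(bits, ref):
    """Model of one sense-amplifier comparison: 1 if resistance exceeds ref."""
    return int(parallel_resistance(bits) > ref)


def read_logic(a, b, c):
    """All outputs derived from a single simultaneous sensing of three cells."""
    bits = (a, b, c)
    or3 = sense(bits, REF_OR)
    maj = sense(bits, REF_MAJ)
    and3 = sense(bits, REF_AND)
    # Assumed combination: XOR3 is 1 when exactly one or all three inputs are 1.
    xor3 = (or3 & (1 - maj)) | and3
    return {
        "OR3": or3, "NOR3": 1 - or3,
        "AND3": and3, "NAND3": 1 - and3,
        "XOR3": xor3, "XNOR3": 1 - xor3,
        "MAJ": maj,
        "SUM": xor3,    # full-adder sum
        "CARRY": maj,   # full-adder carry-out
    }
```

Calling `read_logic(a, b, c)` for each of the eight input combinations yields the full-adder pair `SUM` and `CARRY` alongside the three-input logic outputs, all from the same three threshold comparisons. In this model the outputs depend only on how many of the three cells store 1, which is why a small set of reference levels suffices.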
Potential Applications
• In-memory processing
• Deep learning
• Graph processing
• Computational biology
Benefits and Advantages
• Small area overhead (<10%)
• Reconfigurable complete Boolean logic—(N)AND, (N)OR, X(N)OR—and majority function are readily achieved in only one memory cycle
• Full-adder function achievable in only one sensing cycle