DARTH-PUM: A Hybrid Processing-Using-Memory Architecture

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Analog processing-using-memory (PUM; a.k.a. in-memory computing) makes use of electrical interactions inside memory arrays to perform bulk matrix-vector multiplication (MVM) operations. However, many popular matrix-based kernels need to execute non-MVM operations, which analog PUM cannot directly perform. To retain its energy efficiency, analog PUM architectures augment memory arrays with CMOS-based domain-specific fixed-function hardware to provide complete kernel functionality, but the difficulty of integrating such specialized CMOS logic with memory arrays has largely limited analog PUM to being an accelerator for machine learning inference, or for closely related kernels. An opportunity exists to harness analog PUM for general-purpose computation: recent works have shown that memory arrays can also perform Boolean PUM operations, albeit with very different supporting hardware and electrical signals from those of analog PUM. We propose DARTH-PUM, a general-purpose hybrid PUM architecture that tackles key hardware and software challenges to integrating analog PUM and digital PUM. We propose optimized peripheral circuitry, coordinating hardware to manage and interface between both types of PUM, an easy-to-use programming interface, and low-cost support for flexible data widths. These design elements allow us to build a practical PUM architecture that can execute kernels fully in memory, and can scale easily to cater to domains ranging from embedded applications to large-scale data-driven computing. We show how three popular applications (AES encryption, convolutional neural networks, large language models) can map to and benefit from DARTH-PUM, with speedups of 59.4x, 14.8x, and 40.8x over an analog+CPU baseline.


💡 Research Summary

DARTH‑PUM introduces a truly hybrid processing‑using‑memory (PUM) architecture that unifies analog and digital in‑memory computing on a single chip. The authors identify two longstanding limitations: analog PUM excels at matrix‑vector multiplication (MVM) but struggles with non‑MVM operations, requiring costly CMOS‑based fixed‑function units; digital PUM, while capable of general‑purpose Boolean logic, delivers far lower throughput for large‑scale linear algebra. DARTH‑PUM resolves this by tightly coupling analog and digital tiles, providing a “best‑of‑both‑worlds” platform where each tile performs the operations it handles most efficiently.

Key architectural contributions include: (1) a physically adjacent tile layout with high‑bandwidth inter‑tile links that minimize data movement; (2) a shared, dynamically rate‑matched peripheral subsystem that combines SAR and ramp ADCs with configurable DACs, reducing the area and power overhead of analog‑digital conversion; (3) a domain‑specific language (DSL) and API that let programmers describe kernels in a hardware‑agnostic way, automatically inserting bit‑slicing, sign handling, and conversion steps; (4) flexible support for a wide range of operand widths (4‑12 bits) via consistent bit‑slicing across both analog and digital domains, eliminating the need for multiple high‑resolution DACs.
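To make the bit-slicing idea in point (4) concrete, here is a minimal, hypothetical sketch (not the paper's implementation) of how an N-bit weight matrix can be split into low-precision slices, each of which an analog tile could handle natively, with the partial MVM results recombined by digital shift-and-add:

```python
import numpy as np

def bit_sliced_mvm(matrix, vector, weight_bits=8, slice_bits=2):
    """Hypothetical bit-slicing sketch: split an unsigned weight matrix into
    (weight_bits // slice_bits) low-precision slices, compute one MVM per
    slice (the part an analog tile would perform), and recombine the
    partial results with digital shifts and adds."""
    num_slices = weight_bits // slice_bits
    base = 1 << slice_bits
    result = np.zeros(matrix.shape[0], dtype=np.int64)
    remaining = matrix.astype(np.int64)
    for s in range(num_slices):
        weight_slice = remaining % base      # lowest slice_bits of each weight
        remaining = remaining // base
        partial = weight_slice @ vector      # per-slice MVM (analog tile)
        result += partial << (s * slice_bits)  # shift-and-add (digital logic)
    return result
```

The same decomposition works for any operand width that is a multiple of the slice width, which is how a single low-resolution array and DAC design can serve the 4-12 bit range described above.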

The paper evaluates three representative workloads. In AES encryption, the key‑schedule matrix operations are offloaded to the analog tile while the bit‑level round functions run on the digital tile, achieving a 59.4× speedup and 39.6× energy reduction versus an analog‑plus‑CPU baseline. For convolutional neural networks, convolutions are performed as MVMs in the analog tile, and activation, pooling, and batch‑norm are handled by digital logic, yielding 14.8× speedup and 51.2× energy savings. In large‑language‑model inference (transformer), attention and feed‑forward matrix multiplications are accelerated in analog, whereas softmax, scaling, and token selection are executed digitally, delivering a 40.8× speedup and 110.7× energy reduction.
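The transformer mapping above follows a consistent pattern: every MVM lands on an analog tile, and everything else lands on a digital tile. A schematic Python sketch (with stand-in functions for the two tile types; function names are illustrative, not the paper's API) of a single attention step under this split might look like:

```python
import numpy as np

def analog_mvm(W, x):
    """Stand-in for an analog-tile operation: bulk matrix-vector multiply."""
    return W @ x

def digital_softmax(z):
    """Stand-in for digital-tile work: elementwise, non-MVM computation."""
    e = np.exp(z - z.max())
    return e / e.sum()

def hybrid_attention(Wq, x, keys, values):
    # Query projection is an MVM -> analog tile.
    q = analog_mvm(Wq, x)
    # Attention scores are again an MVM -> analog tile.
    scores = analog_mvm(keys, q) / np.sqrt(len(q))
    # Softmax and scaling are non-MVM -> digital tile.
    attn = digital_softmax(scores)
    # The weighted sum over values is one more MVM -> analog tile.
    return analog_mvm(values.T, attn)
```

The CNN mapping is the same pattern with convolutions (lowered to MVMs) on the analog side and activation, pooling, and batch-norm on the digital side.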

A detailed analysis of peripheral circuitry shows how the mixed‑ADC approach balances conversion latency against precision, while dynamic rate‑matching aligns the production rate of analog results with the consumption rate of digital logic, preventing bottlenecks. The authors also discuss negative‑number representation (offset subtraction vs. differential pairs) and demonstrate that differential pairs provide better resilience to parasitic effects.
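The differential-pair scheme can be illustrated numerically: a signed weight matrix W is stored as two non-negative conductance matrices with W = W_pos - W_neg, and the signed MVM result is recovered by subtracting the two column currents. The sketch below is a functional model only (the actual mechanism is analog current subtraction, not arithmetic):

```python
import numpy as np

def differential_encode(W):
    """Split a signed weight matrix into two non-negative matrices,
    modeling the positive and negative conductance arrays of a
    differential pair: W = W_pos - W_neg."""
    W_pos = np.maximum(W, 0)
    W_neg = np.maximum(-W, 0)
    return W_pos, W_neg

def differential_mvm(W_pos, W_neg, x):
    # Each array holds only non-negative "conductances"; the signed
    # result comes from subtracting the two column outputs.
    return W_pos @ x - W_neg @ x
```

Because both halves see the same input and similar array conditions, common-mode parasitics tend to cancel in the subtraction, which is the intuition behind the resilience advantage over offset subtraction noted above.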

Although the prototype is built with ReRAM devices, the authors argue that the hybrid concept is technology‑agnostic and can be mapped to SRAM, DRAM, PCM, or MRAM with appropriate adjustments to the logic family. Consequently, DARTH‑PUM can scale from low‑power embedded systems to data‑center accelerators, offering a unified in‑memory computing substrate that eliminates the need for external CPUs or application‑specific ASICs.

In summary, DARTH‑PUM advances the state of the art by integrating analog MVM efficiency with digital Boolean flexibility, delivering order‑of‑magnitude improvements in performance and energy across diverse domains. Its modular peripheral design, flexible programming model, and support for variable bit‑widths make it a compelling candidate for the next generation of memory‑centric computing platforms.

