A Low-Cost Reliable Racetrack Cache Based on Data Compression

📝 Original Info

  • Title: A Low-Cost Reliable Racetrack Cache Based on Data Compression
  • ArXiv ID: 2512.01915
  • Date: 2025-12-01
  • Authors: Elham Cheshmikhani, Fateme Shokouhinia, Hamed Farbeh

📝 Abstract

SRAM-based cache memory faces several scalability limitations in deep nanoscale technologies, e.g., high leakage current, low cell stability, and low density. Emerging Non-Volatile Memory (NVM) technologies have received considerable attention in recent years, and Racetrack Memory (RTM) is among the most promising of them. RTM has the highest density among all NVMs and its access performance is comparable to that of SRAM. Therefore, RTM is a suitable alternative to SRAM in Last-Level Caches (LLCs). Despite all its benefits, RTM faces several reliability challenges due to the stochastic behavior of its storage element and highly error-prone data shifting, leading to a high probability of multiple-bit errors. Conventional Error-Correcting Codes (ECCs) are either incapable of tolerating multiple-bit errors or require a large amount of extra storage for check bits. This paper proposes taking advantage of value locality to compress data blocks and free up a large fraction of cache blocks for storing the redundancy of strong ECCs. With the proposed scheme, a large majority of cache blocks are protected by strong ECCs that tolerate multiple-bit errors without any storage overhead. The evaluation using the gem5 full-system simulator demonstrates that the proposed scheme enhances the mean-time-to-failure of the cache by an average of 11.3x with less than 1% hardware and performance overhead.
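
The idea of spending compression savings on ECC check bits can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 64-byte (512-bit) cache block, uses `zlib` only as a software stand-in for a hardware cache compressor, and relies on the standard bound that a binary BCH code of length at most 2^m - 1 correcting t errors needs at most m·t check bits. The names `BLOCK_BITS`, `M`, `compressed_bits`, and `max_correctable_errors` are hypothetical.

```python
import zlib

BLOCK_BITS = 512  # assumed 64-byte cache block
M = 10            # assumed BCH field degree: codewords up to 2**10 - 1 = 1023 bits


def compressed_bits(block: bytes) -> int:
    """Size of the block after compression, capped at its raw size."""
    packed = zlib.compress(block, 9)  # stand-in for a hardware compressor
    return min(len(packed) * 8, len(block) * 8)


def max_correctable_errors(block: bytes) -> int:
    """Largest t such that the compressed data plus about M*t check bits
    still fit in the original block, i.e. with no extra storage."""
    freed = BLOCK_BITS - compressed_bits(block)
    return freed // M


if __name__ == "__main__":
    zeros = bytes(64)                                      # highly compressible
    noisy = bytes((i * 197 + 13) % 256 for i in range(64))  # no repeated bytes
    print(max_correctable_errors(zeros), max_correctable_errors(noisy))
```

Under these assumptions, a block with high value locality leaves room for a multi-bit-correcting code inside its own footprint, while a block that does not compress leaves no room at all in this sketch and would need some other form of protection.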

📄 Full Content

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS (TCAS-II), VOL. 14, NO. 8, OCTOBER 2023

A Low-Cost Reliable Racetrack Cache Based on Data Compression

Elham Cheshmikhani, Fateme Shokouhinia, and Hamed Farbeh, Member, IEEE

E. Cheshmikhani is with the Department of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran. E-mail: e cheshmikhani@sbu.ac.ir
F. Shokouhinia is with the Department of Computing Science, Simon Fraser University, BC, Canada. E-mail: fateme shokouhinia@sfu.ca
H. Farbeh is with the Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran. E-mail: farbeh@aut.ac.ir
Manuscript received October 25, 2023.

Index Terms: Cache Memory, Racetrack Memory (RTM), Reliability, Error-Correcting Codes (ECCs), Shift Error.

I. INTRODUCTION

In recent years, with technology's downscaling trend and high-performance AI applications, there has been a surge in research and development efforts aimed at improving the performance and efficiency of computing systems. Cache memories, as a key component of modern computing architectures, play a vital role in bridging the gap between fast but small processor registers and slow but large main memory. Traditionally, cache designs have used Static Random-Access Memory (SRAM) cells to store frequently accessed data. However, due to the scaling limitations, increasing power demands, and volatility of SRAM, emerging Non-Volatile Memories (NVMs), including Racetrack Memory (RTM), also known as Domain-Wall Memory (DWM), have gained recognition as potential replacements for SRAM in cache designs.

RTM caches leverage the physics of magnetic domains to store and retrieve data, offering several potential advantages over conventional SRAM caches. These advantages include much higher storage density, lower power consumption, non-volatility, and reduced susceptibility to radiation-induced errors, making them particularly appealing for both low-power and high-performance computing systems [1]. However, despite these promising benefits, RTM faces several reliability challenges, including shift errors and domain-wall tilting [2].

The core principle behind RTM caches is the utilization of nanoscale magnetic domains, separated by domain walls, to represent and store binary data [3].
The movement of these domain walls along a magnetic nanowire creates resistance variations that can be sensed and interpreted as digital states [4]. Each domain contains a data bit accessed via access ports. As there are only a few access ports for many domains, the domains must be shifted until the target domain is aligned with an access port. A wrong number of shifts, called a shift error, results in accessing the wrong data in RTM. A tilting error refers to the misalignment of cells in the domain [2], [5]. In addition to RTM-specific errors, all error sources in Spin-Transfer Torque MRAM (STT-MRAM), the RTM predecessor, including retention failure, write failure, and read disturbance, are also likely to occur in RTMs, which makes RTM the most error-prone memory technology [6].

Several studies have addressed various sources of errors in RTM and tried to reduce their occurrence rate [3], [4], [7]–[12]. However, to make RTM practical, an error protection/recovery mechanism is necessary in addition to efforts to reduce the error rate. Employing Error-Correcting Codes (ECCs) is the most widespread approach to protecting LLCs. However, the conventional Single-Error Correction and Double-Error Detection (SEC-DED) codes do not offer enough pr
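
The shift-error mechanism described in the introduction can be sketched with a toy model. This is an illustration under stated assumptions, not a device model from the paper: it assumes a single track with one access port, treats the track as circular for simplicity (real tracks are not), and injects the error by hand. The class `Racetrack` and its fields `believed` and `actual` are hypothetical names.

```python
class Racetrack:
    """Toy model of one nanowire track with a single access port (assumed)."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.believed = 0  # domain index the controller thinks is at the port
        self.actual = 0    # domain index that is really at the port

    def read(self, index, shift_error=0):
        """Shift so that domain `index` should reach the port, then read it.

        `shift_error` is the number of extra (+) or missing (-) shift steps.
        Wrap-around is a simplification of this sketch.
        """
        steps = index - self.believed  # shifts requested by the controller
        self.believed = index
        self.actual = (self.actual + steps + shift_error) % len(self.bits)
        return self.bits[self.actual]


if __name__ == "__main__":
    track = Racetrack([1, 0, 1, 1, 0, 1, 1, 0])
    print(track.read(3))                 # correct shift: reads domain 3
    print(track.read(4, shift_error=1))  # over-shift: reads domain 5 instead
    print(track.read(0))                 # misalignment persists in this model
```

In this model a single over-shift returns the bit of the neighboring domain, and every later access stays misaligned until the position is corrected, so one shift error can corrupt many bits of a block at once.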
