CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers
Memory controller scheduling is crucial in multicore processors, where DRAM bandwidth is a shared resource. Since the growing number of requests from multiple cores becomes a bottleneck, scheduling requests efficiently is necessary to exploit all the computing power these processors offer. However, current multicore processors use traditional memory controllers designed for single-core processors, which cannot adapt to the changing characteristics of memory workloads running simultaneously on multiple cores. Existing schedulers may disrupt locality and bank parallelism among data requests coming from different cores. Hence, novel memory controllers are needed that observe and adapt to memory access characteristics while sharing memory resources efficiently and fairly. We introduce the Core-Aware Dynamic Scheduler (CADS) for multicore memory controllers. CADS uses Reinforcement Learning (RL) to alter its scheduling strategy dynamically at runtime. Our scheduler exploits locality among data requests from multiple cores as well as parallelism in accessing multiple DRAM banks. CADS also shares the DRAM while guaranteeing fairness to all cores accessing memory. Using the CADS policy, we achieve 20% better cycles per instruction (CPI) when running memory-intensive and compute-intensive PARSEC parallel benchmarks simultaneously, and 16% better CPI with SPEC 2006 benchmarks.
💡 Research Summary
The paper addresses a critical bottleneck in modern multicore processors: the shared DRAM bandwidth managed by the memory controller. Traditional scheduling policies such as First‑Come‑First‑Served (FCFS) and First‑Ready FCFS (FR‑FCFS) were designed for single‑core environments and prioritize the requests that can be served with the lowest latency, which works well when only one workload is active. In a multicore setting, however, multiple applications with heterogeneous memory access patterns run concurrently, leading to unfairness (starvation of memory‑intensive or compute‑intensive cores) and sub‑optimal utilization of bank parallelism. Existing research‑oriented schedulers tend to focus on a single characteristic (row locality, bank‑level parallelism, or request history), making them rigid and effective only for a narrow set of workloads.
To overcome these limitations, the authors propose CADS (Core‑Aware Dynamic Scheduler), a memory‑controller‑level scheduler that leverages Reinforcement Learning (RL) to adapt its policy at runtime based on the observed behavior of each core. The RL agent is the memory controller itself; the environment state is represented by a feature vector composed of both local (per‑core) and global (system‑wide) metrics:
- NumPet – number of pending requests from a core (local).
- RowHitPet – number of pending requests that would result in a row‑hit (local, captures data locality).
- HistPet – recent history of requests from a core (local, distinguishes memory‑intensive from compute‑intensive workloads).
- BPPet – number of requests that can be served in parallel across banks (global, reflects bank‑level parallelism).
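A minimal sketch of how such a per-core state vector might be assembled. The field names mirror the paper's features, but the exact encoding (raw counts versus normalized values) and the bias term are assumptions made here for illustration:

```python
from dataclasses import dataclass

@dataclass
class CoreState:
    """Illustrative per-core feature vector for the RL scheduler."""
    num_pet: int       # NumPet: pending requests from this core (local)
    row_hit_pet: int   # RowHitPet: pending requests that would row-hit (local)
    hist_pet: int      # HistPet: recent request history for this core (local)
    bp_pet: int        # BPPet: requests servable in parallel across banks (global)

    def as_vector(self):
        # Prepend a constant bias term so a linear model
        # theta = (theta_0, ..., theta_n) can include an intercept.
        return [1.0, float(self.num_pet), float(self.row_hit_pet),
                float(self.hist_pet), float(self.bp_pet)]

state = CoreState(num_pet=6, row_hit_pet=4, hist_pet=10, bp_pet=3)
print(state.as_vector())  # prints [1.0, 6.0, 4.0, 10.0, 3.0]
```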
For each core, CADS maintains an independent linear model parameterized by θ = (θ₀, …, θₙ). The model predicts the long‑term reward of raising the priority of that core's ready requests, which is the only action available to the scheduler; when multiple ready requests belong to the selected core, they are issued using the FR‑FCFS rule.
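Under these definitions, the per-core value estimate and the priority-raising action can be sketched as a dot product of θ with the feature vector, followed by a greedy choice across cores. This is a simplification of the described mechanism; the function names, the greedy (non-exploring) selection, and the example numbers are illustrative assumptions:

```python
def q_value(theta, features):
    """Predicted long-term reward of raising the priority of a core's
    ready requests: a linear function of its feature vector."""
    return sum(t * f for t, f in zip(theta, features))

def select_core(thetas, feature_vectors):
    """Pick the core whose ready requests receive the priority boost.

    thetas: per-core parameter vectors (one independent model per core).
    feature_vectors: per-core state features (bias term included).
    Ordering among the chosen core's own ready requests is then
    resolved by the FR-FCFS rule, as in the paper.
    """
    scores = [q_value(th, fv) for th, fv in zip(thetas, feature_vectors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Two-core example with hypothetical learned parameters.
thetas = [[0.1, 0.5, 1.0, -0.2, 0.8], [0.1, 0.4, 0.9, -0.1, 0.7]]
features = [[1.0, 6.0, 4.0, 10.0, 3.0], [1.0, 2.0, 1.0, 3.0, 1.0]]
print(select_core(thetas, features))  # prints 0: core 0 has the higher estimate
```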
The reward function is built around a metric called Memory‑Related Starvation (MRStarvation), defined as the ratio of the number of accesses to their stall time for each core. This metric penalizes cores that experience high stall time relative to their request volume, with a higher weight for non‑memory‑intensive workloads because their stalls are more detrimental to overall throughput. The global reward is a function of the MRStarvation values of all cores: when the values are equal (fair scheduling) the reward is positive; when they diverge (unfairness) the reward becomes negative. Q‑Learning updates the θ parameters to minimize the difference between predicted and observed rewards, allowing the scheduler to converge to a policy that balances performance and fairness.
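The learning step can be illustrated with a standard temporal-difference update on the linear model. The reward shaping follows the description above (balanced MRStarvation values yield a positive reward, divergence a negative one), but the exact functional forms, the per-workload weighting, and the learning-rate and discount constants are assumptions of this sketch:

```python
def mr_starvation(num_accesses, stall_cycles):
    """Memory-Related Starvation as described: the ratio of a core's
    access count to its stall time (guarding against zero stall)."""
    return num_accesses / max(stall_cycles, 1)

def global_reward(starvations, scale=1.0):
    """Positive when per-core MRStarvation values are balanced (fair
    scheduling), negative as they diverge. The spread-based form and
    the threshold of 1.0 are illustrative choices."""
    spread = max(starvations) - min(starvations)
    return scale * (1.0 - spread) if spread < 1.0 else -scale * spread

def q_update(theta, features, reward, next_q, alpha=0.1, gamma=0.9):
    """One Q-learning step on a linear model: nudge theta to shrink the
    temporal-difference error between predicted and observed reward."""
    predicted = sum(t * f for t, f in zip(theta, features))
    td_error = reward + gamma * next_q - predicted
    return [t + alpha * td_error * f for t, f in zip(theta, features)]
```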
To keep hardware overhead low, the authors deliberately limit the number of features and actions, ensuring that the learning logic can be implemented with simple arithmetic (multiplications and additions) and modest storage for the Q‑table. They evaluate CADS using the M5 simulator, extended to model 4‑, 8‑, and 16‑core systems. Workloads consist of mixed PARSEC benchmarks (both memory‑intensive and compute‑intensive) and SPEC CPU2006 suites. Compared against baseline FCFS and FR‑FCFS controllers, CADS achieves a 20% CPI improvement for mixed PARSEC workloads and a 16% improvement for SPEC benchmarks. The gains are consistent across different core counts, demonstrating the scalability of the learning mechanism.
Key contributions of the paper are:
- Problem articulation – a clear analysis of why static, single‑characteristic schedulers fail in heterogeneous multicore environments.
- RL‑based core‑aware design – a novel integration of per‑core linear models with a global reward that captures both locality and bank parallelism while enforcing fairness.
- Reward engineering – the MRStarvation metric elegantly balances throughput and fairness, adapting to the intensity of each workload.
- Hardware‑friendly implementation – a lightweight learning engine that can be realized in contemporary memory‑controller pipelines without excessive area or power cost.
- Comprehensive evaluation – extensive simulation across multiple core counts and benchmark suites, showing consistent CPI improvements.
The authors suggest future work in extending CADS to emerging memory technologies such as DDR5, HBM, and 3D‑stacked DRAM, where bank parallelism and latency characteristics differ markedly. They also propose exploring multi‑objective RL formulations that incorporate power consumption and Quality‑of‑Service constraints, as well as more sophisticated exploration strategies to accelerate convergence at larger core counts. Overall, CADS demonstrates that reinforcement learning can be a practical, scalable tool for dynamic memory‑controller scheduling in modern high‑performance multicore systems.