TDPNavigator-Placer: Thermal- and Wirelength-Aware Chiplet Placement in 2.5D Systems Through Multi-Agent Reinforcement Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

The rapid growth of electronics has accelerated the adoption of 2.5D integrated circuits, where effective automated chiplet placement is essential as systems scale to larger and more heterogeneous chiplet assemblies. Existing placement methods typically focus on minimizing wirelength or transforming multi-objective optimization into a single objective through a weighted sum, which limits their ability to handle competing design requirements. Wirelength reduction and thermal management are inherently conflicting objectives, making prior approaches inadequate for practical deployment. To address this challenge, we propose TDPNavigator-Placer, a novel multi-agent reinforcement learning framework that dynamically optimizes placement based on each chiplet’s thermal design power (TDP). This approach explicitly assigns these inherently conflicting objectives to specialized agents, each operating under distinct reward mechanisms and environmental constraints within a unified placement paradigm. Experimental results demonstrate that TDPNavigator-Placer delivers a significantly improved Pareto front over state-of-the-art methods, enabling more balanced trade-offs between wirelength and thermal performance.

💡 Research Summary

The paper addresses the long‑standing trade‑off between wirelength minimization and thermal management in 2.5D chiplet placement. Traditional methods either focus solely on wirelength or collapse multiple objectives into a single weighted sum, which fails to capture the inherent conflict between dense placement (which reduces interconnect length) and heat concentration (which rises with proximity of high‑power blocks). To overcome this limitation, the authors propose TDPNavigator‑Placer, a novel multi‑agent reinforcement learning (MARL) framework that explicitly separates the two objectives into dedicated agents.
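Because the evaluation is framed in terms of a Pareto front, it helps to see concretely why a weighted sum is a weak substitute. The sketch below is a generic illustration, not code from the paper: `pareto_front` and `weighted_sum_choice` are hypothetical helpers, and the (wirelength, peak temperature) pairs are made-up toy values, not reported results. It shows one well-known limitation of weighted-sum scalarization: a balanced layout can be Pareto-optimal yet never be selected by any choice of weight.

```python
from typing import List, Tuple

# Hypothetical objective vector: (wirelength, peak temperature), both minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_choice(points: List[Point], w: float) -> Point:
    """Pick the point minimizing w * wirelength + (1 - w) * temperature."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


# Toy layouts (illustrative numbers only).
layouts = [(1.0, 10.0), (10.0, 1.0), (6.0, 6.0), (7.0, 8.0)]
front = pareto_front(layouts)  # (7.0, 8.0) is dominated by (6.0, 6.0)

# Sweep the weight from 0 to 1 and record which layouts are ever chosen.
chosen = {weighted_sum_choice(front, k / 100) for k in range(101)}
print("Pareto front:", front)
print("reachable by weighted sum:", sorted(chosen))
```

Sweeping the weight only ever returns the two extreme layouts; the balanced (6, 6) layout sits in a non-convex part of the front, so no weighting reaches it.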

Core Idea – TDP‑Based Task Decomposition
Each chiplet’s thermal design power (TDP) is compared against a predefined threshold (80 W in the experiments). Chiplets with TDP > threshold are classified as high‑power and routed to a thermal agent; those with lower TDP are assigned to a wirelength agent. This “TDP navigator” module determines the placement order (high‑power first) to ensure sufficient spacing among heat‑generating blocks before lower‑power blocks are placed.
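The TDP navigator step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `Chiplet` and `tdp_navigate` are hypothetical names, a chiplet whose TDP exactly equals the threshold is treated as low-power, and because the summary does not say how chiplets are ordered within each group, input order is kept. Only the 80 W threshold and the "high-power first" rule come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Threshold used in the paper's experiments.
TDP_THRESHOLD_W = 80.0


@dataclass
class Chiplet:
    name: str
    tdp_w: float  # thermal design power in watts


def tdp_navigate(chiplets: List[Chiplet],
                 threshold: float = TDP_THRESHOLD_W) -> List[Tuple[str, Chiplet]]:
    """Assign each chiplet to an agent and return the placement order.

    High-power chiplets (TDP > threshold) go to the thermal agent and are
    placed first; the rest go to the wirelength agent afterwards.
    Assumption: order within each group is unspecified, so input order is kept.
    """
    thermal = [c for c in chiplets if c.tdp_w > threshold]
    # Assumption: TDP == threshold counts as low-power.
    wirelength = [c for c in chiplets if c.tdp_w <= threshold]
    return [("thermal", c) for c in thermal] + [("wirelength", c) for c in wirelength]


# Hypothetical 2.5D system; names and TDP values are illustrative only.
system = [
    Chiplet("cpu", 120.0),
    Chiplet("hbm0", 15.0),
    Chiplet("gpu", 150.0),
    Chiplet("io", 10.0),
]
order = tdp_navigate(system)
for agent, chiplet in order:
    print(f"{chiplet.name:>5} ({chiplet.tdp_w:5.1f} W) -> {agent} agent")
```

Placing the hot `cpu` and `gpu` first lets the thermal agent spread them apart while the grid is still empty; the low-power `hbm0` and `io` then fill in wherever they shorten wires.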

State Representation
Both agents share three common masks: a view mask (showing already placed chiplets), a position mask (indicating feasible cells for the next placement), and a rotation‑position mask (allowing 90° rotation). The thermal agent receives an additional thermal mask that encodes a hotspot temperature map generated by a fast thermal simulator. The wirelength agent receives a wire mask that estimates the change in total wirelength for each candidate placement. All masks are normalized to the range
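To make the position, rotation-position, and wire masks concrete, here is a minimal sketch on a square grid. Every detail is an assumption for illustration rather than the paper's definition: chiplets are axis-aligned rectangles `(x, y, w, h)` in grid cells, net pins sit at chiplet centers, wirelength is half-perimeter wirelength (HPWL), infeasible cells in the wire mask are `None`, and no normalization is applied. The rotation-position mask is simply the position mask computed with width and height swapped. The thermal mask is omitted because it depends on the paper's thermal simulator, and the view mask is just the occupancy of already-placed chiplets. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h): lower-left corner and size, in grid cells


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def position_mask(grid: int, placed: Dict[str, Rect], w: int, h: int) -> List[List[int]]:
    """mask[y][x] = 1 if a w x h chiplet with lower-left corner (x, y) fits without overlap."""
    mask = [[0] * grid for _ in range(grid)]
    for y in range(grid - h + 1):
        for x in range(grid - w + 1):
            if not any(overlaps((x, y, w, h), r) for r in placed.values()):
                mask[y][x] = 1
    return mask


def center(r: Rect) -> Tuple[float, float]:
    x, y, w, h = r
    return (x + w / 2, y + h / 2)


def hpwl(pins: Sequence[Tuple[float, float]]) -> float:
    """Half-perimeter wirelength of one net whose pins sit at the given points."""
    if len(pins) < 2:
        return 0.0
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def wire_mask(grid: int, placed: Dict[str, Rect], nets: List[List[str]], name: str,
              w: int, h: int, feasible: List[List[int]]) -> List[List[Optional[float]]]:
    """mask[y][x] = increase in total HPWL if `name` is placed at (x, y); None if infeasible."""
    out: List[List[Optional[float]]] = [[None] * grid for _ in range(grid)]
    relevant = [net for net in nets if name in net]
    for y in range(grid):
        for x in range(grid):
            if not feasible[y][x]:
                continue
            c = center((x, y, w, h))
            delta = 0.0
            for net in relevant:
                pins = [center(placed[m]) for m in net if m in placed]
                delta += hpwl(pins + [c]) - hpwl(pins)
            out[y][x] = delta
    return out


# Hypothetical example: 8x8 grid, a 2x2 CPU already placed at the origin,
# and a connected 2x1 HBM chiplet waiting to be placed.
placed = {"cpu": (0, 0, 2, 2)}
nets = [["cpu", "hbm0"]]
pos = position_mask(8, placed, 2, 1)  # hbm0 in its original 2x1 orientation
rot = position_mask(8, placed, 1, 2)  # hbm0 rotated by 90 degrees (1x2)
wm = wire_mask(8, placed, nets, "hbm0", 2, 1, pos)
best = min((v, (x, y)) for y, row in enumerate(wm) for x, v in enumerate(row) if v is not None)
print("best position (x, y):", best[1], "HPWL increase:", best[0])
```

A wirelength agent could read `wm` directly as a per-cell cost map; the rotated mask shows that rotation opens cells (for example along the right edge) that the unrotated chiplet cannot use.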
