Accelerating Diffusion Models for Generative AI Applications with Silicon Photonics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Diffusion models have revolutionized generative AI thanks to their capacity to generate highly realistic, state-of-the-art synthetic data. However, these models rely on an iterative denoising process built from computationally intensive layers such as UNets and attention mechanisms, which leads to high inference energy on conventional electronic platforms. There is thus an emerging need to accelerate these models in a sustainable manner. To address this challenge, we present a novel silicon photonics-based accelerator for diffusion models. Experimental evaluations demonstrate that our photonic accelerator achieves at least 3x better energy efficiency and a 5.5x throughput improvement compared to state-of-the-art diffusion model accelerators.


💡 Research Summary

The paper addresses the growing energy and latency challenges of diffusion models (DMs), which have become a cornerstone of generative AI for image, video, and text‑to‑image synthesis. DMs require hundreds to thousands of iterative denoising steps, each dominated by multiply‑accumulate (MAC) operations within the matrix‑vector multiplications of UNet encoders/decoders and multi‑head attention (MHA) layers. Conventional electronic accelerators (GPUs, FPGAs, ASICs) struggle to keep up because transistor scaling is slowing, metal interconnects become bandwidth‑limited, and data movement dominates power consumption.
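To make the cost structure concrete, here is a minimal sketch of the DDPM reverse (denoising) process in NumPy. The `toy_unet` stand-in and the specific schedule values are illustrative assumptions, not the paper's implementation; the point is that every one of the `num_steps` iterations invokes the full network once, which is why inference cost scales linearly with step count.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_unet(x, t):
    # Hypothetical stand-in for the UNet noise predictor;
    # a real model would run convolutions and attention here.
    return 0.1 * x

def ddpm_sample(shape, num_steps=50):
    """Sketch of the DDPM reverse (denoising) loop.

    Each iteration calls the noise predictor once, so the MAC
    workload is repeated num_steps times per generated sample.
    """
    betas = np.linspace(1e-4, 0.02, num_steps)      # toy noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)                  # start from pure noise
    for t in reversed(range(num_steps)):
        eps = toy_unet(x, t)                        # predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])   # posterior mean
        if t > 0:                                   # add noise except at t=0
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

sample = ddpm_sample((8, 8))
print(sample.shape)
```

Replacing `toy_unet` with a real UNet makes each of those 50 calls the dominant cost, which is the bottleneck the photonic accelerator targets.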

To overcome these bottlenecks, the authors propose a silicon‑photonic accelerator that performs the bulk of MAC operations in the optical domain. The design relies on non‑coherent wavelength‑division multiplexing (WDM) and micro‑ring resonators (MRs) to encode both activations and weights onto separate optical wavelengths. A single waveguide carries many wavelengths simultaneously, enabling parallel MACs without the need for electronic data shuffling. Optical signals are generated by on‑chip VCSEL arrays, modulated by two MR banks (one for activations, one for weights), and then summed by balanced photodetectors (BPDs) that also handle signed values. The electronic control unit (ECU) manages memory interfacing, buffering, and the mapping of digital tensors to the photonic domain.

A hybrid MR‑tuning scheme combines electro‑optic (EO) tuning for fast, low‑power (≈4 µW/nm) fine adjustments with thermo‑optic (TO) tuning for large wavelength shifts (≈27 mW/FSR). Thermal Eigenmode Decomposition (TED) is used to minimize thermal crosstalk between neighboring resonators, reducing overall tuning power. The accelerator's architecture consists of two main blocks: a Residual Unit (convolution + normalization) and an MHA Unit (multiple attention heads + linear/add). Each block uses two MR arrays of size R × C; the first array imprints activations, the second imprints weights. After modulation, the optical signals are detected, producing analog sums that correspond to weighted sums of the inputs.
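A quick back-of-the-envelope calculation shows why the hybrid scheme matters. The EO (≈4 µW/nm) and TO (≈27 mW/FSR) figures come from the summary above; the 20 nm free spectral range and the 1 nm EO tuning limit are assumptions of this sketch, since the actual values depend on the ring geometry.

```python
EO_POWER_PER_NM = 4e-6    # W/nm, electro-optic fine tuning (from the paper)
TO_POWER_PER_FSR = 27e-3  # W per free spectral range, thermo-optic (from the paper)
FSR_NM = 20.0             # assumed FSR; depends on ring radius
EO_LIMIT_NM = 1.0         # assumed maximum EO tuning range

def tuning_power(shift_nm):
    """Estimate per-MR tuning power for a given wavelength shift.

    Small shifts use cheap, fast EO tuning; shifts beyond the
    assumed EO range fall back to TO tuning, costed pro rata
    against the 27 mW/FSR figure.
    """
    if shift_nm <= EO_LIMIT_NM:
        return EO_POWER_PER_NM * shift_nm
    return TO_POWER_PER_FSR * (shift_nm / FSR_NM)

print(f"{tuning_power(0.5) * 1e6:.1f} uW")  # 0.5 nm fine EO correction -> 2.0 uW
print(f"{tuning_power(10.0) * 1e3:.1f} mW") # 10 nm TO shift -> 13.5 mW
```

The roughly four-orders-of-magnitude gap between the two regimes is why fine drift correction is routed to EO tuning and TO heaters are reserved for coarse wavelength placement.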

The authors evaluate the accelerator on three representative diffusion models: DDPM, Latent Diffusion Models (LDM), and Stable Diffusion (SDM). Compared with state‑of‑the‑art electronic diffusion accelerators, the photonic design achieves at least a 3× improvement in energy efficiency and a 5.5× increase in throughput. The gains are most pronounced for attention‑heavy SDM, where optical MHA provides far greater parallelism than electronic implementations. The system also demonstrates flexibility: the same hardware can be re‑configured (by adjusting MR array dimensions and wavelength count) to accommodate the different computational profiles of pixel‑space, latent‑space, and attention‑rich diffusion variants.

Limitations are acknowledged. TO tuning introduces microsecond‑scale latency, which may be a bottleneck for ultra‑low‑latency applications. Photon shot noise and temperature‑induced wavelength drift can degrade numerical accuracy, requiring careful calibration. The current implementation also still relies on electronic DAC/ADC stages, which contribute non‑trivial power overhead.

Future work outlined includes integrating optical memory (photonic PIM) to eliminate electronic data storage, scaling WDM to support many more wavelengths for higher MAC density, and developing autonomous temperature‑compensation and on‑the‑fly weight updates to make the system robust in data‑center environments. The paper concludes that silicon photonics offers a viable path to sustainable, high‑performance diffusion model inference, potentially reshaping the hardware landscape for next‑generation generative AI.

