Backpropagation in matrix notation


In this note we calculate the gradient of the network function in matrix notation.


💡 Research Summary

The paper presents a systematic derivation of back-propagation formulas for feed-forward neural networks using only matrix operations, aiming to eliminate the cumbersome index-heavy notation typical of coordinate-wise derivations. The author first defines the network as a composition of alternating linear maps (matrices \(W_i\)) and coordinate-wise nonlinear maps \(\Sigma_i\), where each \(\Sigma_i\) applies a set of scalar activation functions element-wise via the Hadamard product. The overall function is written compactly as
\[
f(X;W)=\Sigma_k\circ W_k\cdot\Sigma_{k-1}\circ W_{k-1}\cdots\Sigma_1\circ W_1\cdot X.
\]
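As a sketch, the alternating composition of linear maps and element-wise nonlinearities can be evaluated directly with NumPy. The tanh activation, the layer widths, and the random weights below are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                      # layer widths n_0, n_1, n_2 (hypothetical)
# Weight matrices W_i mapping layer i-1 to layer i
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]

def forward(X, Ws, sigma=np.tanh):
    """Apply alternating linear maps W_i and element-wise maps Sigma_i."""
    for W in Ws:
        X = sigma(W @ X)               # W_i · X, then Sigma_i coordinate-wise
    return X

X = rng.standard_normal((3, 1))        # a single input column vector
y = forward(X, Ws)                     # y has shape (2, 1)
```

Each loop iteration realizes one \(\Sigma_i\circ W_i\) factor of the composition, applied right to left as in the formula.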

For the scalar case (\(n_i=1\) for all layers), the author reduces each weight matrix to a scalar \(w_i\) and derives the gradient \(\nabla w_i\) by straightforward application of the chain rule. A temporary variable \(\Delta_i\) (the error signal) is introduced, satisfying \(\Delta_i = \Delta_{i+1}\, w_{i+1}\, \sigma'_i\) with \(\Delta_{k+1}=1\). The final expression \(\nabla w_i = \Delta_i\, \sigma_{i-1}\) mirrors the classic \(\delta\)-propagation but is expressed purely in matrix-style notation.
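The scalar recursion can be checked numerically against finite differences. In this sketch each layer computes \(x_i=\sigma(w_i x_{i-1})\); the tanh activation and the example weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2   # sigma'(z) for tanh

def scalar_forward(x, ws):
    """Return the list [x_0, x_1, ..., x_k] of layer outputs."""
    xs = [x]
    for w in ws:
        xs.append(sigma(w * xs[-1]))
    return xs

def scalar_grads(x, ws):
    """Gradients via Delta_i = Delta_{i+1} w_{i+1} sigma'_i, Delta_{k+1} = 1,
    and grad w_i = Delta_i * x_{i-1} (x_{i-1} is the previous layer's output)."""
    xs = scalar_forward(x, ws)
    grads = [0.0] * len(ws)
    carry = 1.0                                          # Delta_{i+1} * w_{i+1}
    for i in range(len(ws), 0, -1):
        Delta = carry * dsigma(ws[i - 1] * xs[i - 1])    # Delta_i
        grads[i - 1] = Delta * xs[i - 1]                 # grad w_i
        carry = Delta * ws[i - 1]                        # feeds Delta_{i-1}
    return grads

ws = [0.5, -1.2, 0.8]     # hypothetical scalar weights
grads = scalar_grads(0.3, ws)
```

Running a central-difference check on each \(w_i\) confirms the recursion reproduces the chain-rule gradient.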

In the general multi-dimensional case, three kinds of matrix products are employed: the usual column-by-row product \(A\cdot B\), the element-wise (Hadamard) product \(A\circ B\), and an "inverted" product \(A\bullet B = B\cdot A\). Using these, the gradient with respect to each weight matrix \(W_i\) is written recursively; consistent with the scalar case, the recursion takes the form
\[
\Delta_i = \bigl(W_{i+1}^{\top}\cdot \Delta_{i+1}\bigr)\circ \sigma'_i,
\qquad \Delta_{k+1}=1,
\qquad \nabla W_i = \Delta_i\cdot \sigma_{i-1}^{\top},
\]
where \(\sigma'_i\) denotes the vector of activation derivatives at layer \(i\) and \(\sigma_{i-1}\) the output of layer \(i-1\).
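A minimal NumPy sketch of the multi-dimensional recursion follows. It assumes tanh activations and a scalar objective \(L=\mathbf{1}^{\top}f(X;W)\) (the sum of the outputs) so that each \(\nabla W_i\) is a well-defined matrix; both are my assumptions for illustration, not the paper's setup.

```python
import numpy as np

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2   # element-wise sigma'(z)

def forward(x, Ws):
    """Return layer outputs xs = [x_0, ..., x_k] and pre-activations zs."""
    xs, zs = [x], []
    for W in Ws:
        zs.append(W @ xs[-1])              # linear map W_i · x_{i-1}
        xs.append(sigma(zs[-1]))           # element-wise Sigma_i
    return xs, zs

def backprop(x, Ws):
    """Gradients via Delta_i = (W_{i+1}^T · Delta_{i+1}) ∘ sigma'_i,
    grad W_i = Delta_i · x_{i-1}^T, for L = sum of outputs."""
    xs, zs = forward(x, Ws)
    Delta = np.ones_like(xs[-1])           # dL/dx_k = 1 (Delta_{k+1})
    grads = [None] * len(Ws)
    for i in reversed(range(len(Ws))):
        Delta = Delta * dsigma(zs[i])      # Hadamard with sigma'_i
        grads[i] = Delta @ xs[i].T         # column-by-row product Delta_i · x_{i-1}^T
        Delta = Ws[i].T @ Delta            # propagate through W_i
    return grads

rng = np.random.default_rng(1)
sizes = [3, 4, 2]                          # hypothetical layer widths
Ws = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
x = rng.standard_normal((3, 1))
grads = backprop(x, Ws)
```

Note how the two products from the text appear: the Hadamard product folds in the activation derivatives, while the ordinary column-by-row product forms both the outer product \(\Delta_i\cdot x_{i-1}^{\top}\) and the back-propagated signal \(W_i^{\top}\cdot\Delta_i\).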

