Inheritance Between Feedforward and Convolutional Networks via Model Projection


Techniques for feedforward networks (FFNs) and convolutional networks (CNNs) are frequently reused across families, but the relationship between the underlying model classes is rarely made explicit. We introduce a unified node-level formalization with tensor-valued activations and show that generalized feedforward networks form a strict subset of generalized convolutional networks. Motivated by the mismatch in per-input parameterization between the two families, we propose model projection, a parameter-efficient transfer learning method for CNNs that freezes pretrained per-input-channel filters and learns a single scalar gate for each (output channel, input channel) contribution. Projection keeps all convolutional layers adaptable to downstream tasks while substantially reducing the number of trained parameters in convolutional layers. We prove that projected nodes take the generalized FFN form, enabling projected CNNs to inherit feedforward techniques that do not rely on homogeneous layer inputs. Experiments across multiple ImageNet-pretrained backbones and several downstream image classification datasets show that model projection is a strong transfer learning baseline under simple training recipes.


💡 Research Summary

The paper tackles a fundamental yet under‑explored question: when and how can techniques and theoretical results be transferred between feed‑forward networks (FFNs) and convolutional neural networks (CNNs)? To answer this, the authors first introduce a unified, tensor‑valued node formalism that captures both architectures at the same level of abstraction. In this formalism a generalized feed‑forward network (GFFN) node performs a weighted sum across input channels while preserving the tensor shape of each channel, and a generalized CNN (GCNN) node performs a per‑channel convolution followed by a sum across channels. By showing that a GCNN with a kernel size of one in every spatial dimension reduces exactly to a GFFN, they prove (Theorem 3.5) that the class of GFFNs is a strict subset of the class of GCNNs. Consequently, any property that holds for all GCNNs automatically holds for GFFNs (Corollary 3.6).
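The kernel-size-one reduction behind Theorem 3.5 can be checked numerically. Below is a minimal NumPy sketch (the function names and the choice of valid-mode cross-correlation are illustrative, not taken from the paper): a GCNN node whose per-channel kernels are all 1×1 produces exactly the GFFN node's weighted sum across channels.

```python
import numpy as np

def conv2d_valid(x, k):
    """Valid-mode cross-correlation of one 2-D map x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def gcnn_node(x, kernels):
    """GCNN node: per-channel convolution, then sum across channels.
    x: (C, H, W); kernels: list of C 2-D kernels, one per input channel."""
    return sum(conv2d_valid(x[c], kernels[c]) for c in range(x.shape[0]))

def gffn_node(x, weights):
    """GFFN node: weighted sum across channels, tensor shape preserved.
    x: (C, H, W); weights: (C,)."""
    return np.tensordot(weights, x, axes=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 5, 5))
w = rng.standard_normal(3)
k1 = [np.array([[wc]]) for wc in w]  # 1x1 kernels are just scalars

# With 1x1 kernels the GCNN node collapses to the GFFN node exactly.
assert np.allclose(gcnn_node(x, k1), gffn_node(x, w))
```

With any larger kernel the equality breaks, which is why the containment is strict: GFFNs are the kernel-size-one slice of GCNNs.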

The reverse direction, allowing CNNs to inherit FFN-specific properties, does not hold in general because a CNN node carries many parameters per input channel. To bridge this gap, the authors propose model projection: each pretrained convolutional filter (one per input channel) is frozen, and a single scalar gate γ_{jk} is introduced for every (output channel j, input channel k) pair, while biases remain trainable. The projected node therefore computes

y_j = Σ_k γ_{jk} (W_{jk} * x_k) + b_j,

where W_{jk} is the frozen pretrained filter for the (output channel j, input channel k) pair, * denotes convolution, x_k is input channel k, and b_j is the trainable bias of output channel j. Since each response W_{jk} * x_k is fixed, the projected node is a weighted sum across (transformed) channels and thus takes the generalized FFN form.
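The projected node can be sketched for a single output channel j in NumPy (the helper names, the same-padding convention, and the single-channel framing are illustrative assumptions, not the paper's implementation): only the gates γ and the bias are trainable, and the result coincides with a GFFN weighted sum over the frozen per-channel responses.

```python
import numpy as np

def conv2d_same(x, k):
    """Same-padded cross-correlation of a 2-D map with an odd-sized kernel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def projected_node(x, frozen_kernels, gates, bias):
    """One projected output channel: sum_k gamma_k * (W_k * x_k) + b,
    with the filters W_k frozen and only gates/bias trainable."""
    return sum(g * conv2d_same(x[c], frozen_kernels[c])
               for c, g in enumerate(gates)) + bias

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 4, 4))
W = [rng.standard_normal((3, 3)) for _ in range(3)]  # frozen, pretrained
gamma = rng.standard_normal(3)                       # trainable scalar gates
b = 0.1                                              # trainable bias

y = projected_node(x, W, gamma, b)

# The fixed responses z_k = W_k * x_k act as input channels of a GFFN node:
z = np.stack([conv2d_same(x[c], W[c]) for c in range(3)])
assert np.allclose(y, np.tensordot(gamma, z, axes=1) + b)
```

The assertion makes the inheritance argument concrete: viewed over the frozen responses z_k, the projected node is exactly a generalized FFN node, so FFN techniques that do not require homogeneous layer inputs apply to it.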

