XTransfer: Modality-Agnostic Few-Shot Model Transfer for Human Sensing at the Edge

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Deep learning for human sensing on edge systems presents significant potential for smart applications. However, its training and development are hindered by the limited availability of sensor data and the resource constraints of edge systems. While transferring pre-trained models to different sensing applications is promising, existing methods often require extensive sensor data and computational resources, resulting in high costs and limited transferability. In this paper, we propose XTransfer, a first-of-its-kind method enabling modality-agnostic, few-shot model transfer with resource-efficient design. XTransfer flexibly uses pre-trained models and transfers knowledge across different modalities by (i) model repairing, which safely mitigates modality shift by adapting pre-trained layers with only a few sensor samples, and (ii) layer recombining, which efficiently searches and recombines layers of interest from source models in a layer-wise manner to restructure models. We benchmark various baselines across diverse human sensing datasets spanning different modalities. The results show that XTransfer achieves state-of-the-art performance while significantly reducing the costs of sensor data collection, model training, and edge deployment.


💡 Research Summary

The paper addresses the pressing challenge of deploying deep learning models for human sensing on edge devices, where labeled sensor data are scarce and computational resources are limited. Human‑centric sensor modalities such as IMU, radar, ultrasound, and physiological signals suffer from high noise sensitivity, low SNR, user variability, and hardware heterogeneity, making large‑scale data collection costly and often infeasible. Existing few‑shot learning (FSL), transfer learning, and cross‑domain FSL approaches either rely on massive same‑modality datasets or fail to handle the severe modality shift that occurs when transferring models across fundamentally different data types, leading to performance degradation or negative transfer.

To quantify modality shift, the authors introduce a lightweight, modality‑agnostic metric: Mean Magnitude of Channels (MMC). By measuring MMC at each layer of a pre‑trained source model and comparing it with MMC computed on a few target‑domain sensor samples, they demonstrate a strong correlation between MMC shifts and degraded layer‑wise accuracy and convergence. This observation motivates the central goal of XTransfer: minimizing layer‑wise MMC shifts under cross‑modality few‑shot settings.
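As a concrete illustration, MMC can be read as a per‑channel mean absolute activation magnitude, with the shift between two modalities measured as the distance between their MMC vectors at a given layer. The function names and the exact normalization below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def mean_magnitude_of_channels(activations):
    """Per-channel mean absolute magnitude of a layer's activations.

    `activations` has shape (batch, channels, ...); the paper's exact
    normalization may differ -- this is an illustrative definition.
    """
    batch, channels = activations.shape[:2]
    flat = activations.reshape(batch, channels, -1)
    return np.abs(flat).mean(axis=(0, 2))        # -> (channels,)

def mmc_shift(source_mmc, target_mmc):
    """One plausible per-layer shift score: L2 distance between MMC vectors."""
    return float(np.linalg.norm(source_mmc - target_mmc))

# Toy example: the target modality produces differently scaled activations,
# so the layer's MMC vector shifts.
rng = np.random.default_rng(0)
src_act = rng.normal(size=(8, 4, 16))          # (batch, channels, features)
tgt_act = 3.0 * rng.normal(size=(8, 4, 16))    # modality shift as a scale change
shift = mmc_shift(mean_magnitude_of_channels(src_act),
                  mean_magnitude_of_channels(tgt_act))
```

In this reading, a layer with a small `shift` already transfers well, while a large `shift` flags it for repair.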

XTransfer consists of two synergistic components: the SRR (Splice‑Repair‑Removal) pipeline and the Layer‑Wise Search (LWS) procedure.

  1. Splice – Instead of using fixed up/down‑sampling that discards information, SRR inserts a trainable connector between heterogeneous layers. The connector comprises a pre‑header (adaptive convolution), a Resizer, and an encoder‑decoder pair, ensuring shape compatibility while remaining lightweight.

  2. Repair – Each connector is fine‑tuned by an anchor‑based generative transfer module. The “anchor” is the source‑layer MMC projected into a low‑dimensional orthogonal space via PCA (typically two components). This projection removes redundancy, suppresses noise, and highlights the most discriminative channels. Target‑domain MMCs are projected into the same PCA space, and the module minimizes the distance between the anchor and the centers of the target‑domain features, aligning the latent feature distributions with only a handful of labeled samples.

  3. Removal – After alignment, redundant channels are pruned, further reducing parameters and inference cost without harming the repaired representation.
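The Splice step above can be sketched as a small NumPy module. The class name `Connector`, the layer sizes, and the residual encoder‑decoder are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

class Connector:
    """Minimal sketch of an SRR-style trainable connector.

    It bridges a source layer emitting `in_ch` channels of length `in_len`
    to a target layer expecting `out_ch` channels of length `out_len`:
    a 1x1 'pre-header' mixes channels, a resizer interpolates along the
    feature axis, and a linear encoder/decoder pair adds trainable capacity.
    """

    def __init__(self, in_ch, out_ch, in_len, out_len, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.preheader = rng.normal(scale=0.1, size=(out_ch, in_ch))  # 1x1 conv
        self.enc = rng.normal(scale=0.1, size=(hidden, out_len))      # encoder
        self.dec = rng.normal(scale=0.1, size=(out_len, hidden))      # decoder
        self.out_len = out_len

    @staticmethod
    def _resize(x, new_len):
        # Linear interpolation along the last (feature) axis.
        old = np.linspace(0.0, 1.0, x.shape[-1])
        new = np.linspace(0.0, 1.0, new_len)
        return np.stack([[np.interp(new, old, row) for row in sample]
                         for sample in x])

    def __call__(self, x):                            # x: (batch, in_ch, in_len)
        x = np.einsum("oc,bcl->bol", self.preheader, x)  # channel adaptation
        x = self._resize(x, self.out_len)                # shape compatibility
        z = x @ self.enc.T                               # encode
        return x + z @ self.dec.T                        # decode, residual

x = np.random.default_rng(1).normal(size=(2, 3, 10))
y = Connector(in_ch=3, out_ch=5, in_len=10, out_len=16)(x)
# y.shape == (2, 5, 16)
```

The point of the sketch is the shape contract: unlike fixed up/down‑sampling, every part of the bridge is trainable.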
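The Repair step's anchor construction can be sketched with a plain SVD‑based PCA. The loss form below (distance between projected centers) is a hedged reading of the description above, not the paper's exact objective:

```python
import numpy as np

def pca_basis(X, k=2):
    """Top-k principal directions (orthogonal rows) of the rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k]                                 # (k, n_channels)

def anchor_alignment_loss(source_mmcs, target_mmcs, k=2):
    """Distance between the anchor (projected source center) and the
    projected target center -- an illustrative repair objective."""
    basis = pca_basis(source_mmcs, k)             # fitted on the source only
    anchor = (source_mmcs @ basis.T).mean(axis=0)
    center = (target_mmcs @ basis.T).mean(axis=0)
    return float(np.linalg.norm(anchor - center))

rng = np.random.default_rng(0)
src = rng.normal(size=(20, 32))                   # per-batch source MMC vectors
tgt_near = src + 0.01 * rng.normal(size=(20, 32))
tgt_far = src + 2.0                               # systematic modality shift
loss_near = anchor_alignment_loss(src, tgt_near)
loss_far = anchor_alignment_loss(src, tgt_far)    # larger: misaligned target
```

Fine‑tuning a connector would then push `loss_far`‑like values toward `loss_near`‑like values using only the few labeled target samples.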
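The Removal step can be illustrated as channel pruning ranked by MMC; using MMC as the redundancy criterion is an assumption for illustration, as the paper may score redundancy differently:

```python
import numpy as np

def prune_channels(weight, mmc, keep_ratio=0.5):
    """Keep only the highest-MMC output channels of a linear layer.

    weight: (out_ch, in_ch); mmc: per-output-channel score.
    Returns the pruned weight and the kept channel indices.
    """
    k = max(1, int(round(keep_ratio * len(mmc))))
    keep = np.sort(np.argsort(mmc)[-k:])   # indices of the k strongest channels
    return weight[keep], keep

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
mmc = np.array([0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.01])
W_pruned, kept = prune_channels(W, mmc, keep_ratio=0.5)
# Half the channels are dropped, shrinking parameters and inference cost.
```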

While repairing improves many layers, not all become useful. To address this, LWS performs a resource‑constrained, NAS‑inspired search over the pool of repaired layers. A pre‑search check quickly discards layers that show negligible MMC reduction, and a dynamic search range adapts the exploration breadth based on intermediate performance, keeping the search tractable even with multiple source models. Selected layers are recombined to form the final architecture, preserving beneficial repaired representations while discarding ineffective ones.
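A greedy stand‑in for the selection logic can make the flow above concrete. The dict fields, the score‑per‑cost heuristic, and the omission of the dynamic search range are all simplifying assumptions; the paper's LWS is a NAS‑inspired search:

```python
def layer_wise_search(candidates, budget, min_reduction=0.05):
    """Greedy sketch of LWS-style layer selection.

    Each candidate layer carries an 'mmc_reduction' from repairing (used
    by the pre-search check) plus a quick-evaluation 'score' and resource
    'cost'. The paper's dynamic search range is omitted for brevity.
    """
    # Pre-search check: discard layers with negligible MMC reduction.
    pool = [c for c in candidates if c["mmc_reduction"] >= min_reduction]
    # Recombine the best score-per-cost layers under the resource budget.
    pool.sort(key=lambda c: c["score"] / c["cost"], reverse=True)
    selected, used = [], 0.0
    for c in pool:
        if used + c["cost"] <= budget:
            selected.append(c["name"])
            used += c["cost"]
    return selected

candidates = [
    {"name": "src1.conv3",  "mmc_reduction": 0.30, "score": 0.8, "cost": 2.0},
    {"name": "src2.block1", "mmc_reduction": 0.01, "score": 0.9, "cost": 1.0},
    {"name": "src1.conv5",  "mmc_reduction": 0.20, "score": 0.5, "cost": 1.0},
]
selected = layer_wise_search(candidates, budget=3.0)
# src2.block1 fails the pre-search check; the other two fit the budget.
```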

The authors evaluate XTransfer on six human‑sensing benchmarks spanning different target modalities (e.g., activity recognition from IMU, vital‑sign extraction from radar, emotion detection from ultrasound) and three source modalities (image, text, audio). Using a 5‑shot setting with leave‑one‑out cross‑validation, XTransfer consistently outperforms state‑of‑the‑art baselines such as ProtoNet, MAML, cross‑domain FSL, and CLIP‑based distillation, achieving an average accuracy gain of 7.3 percentage points. Moreover, the resulting models are 42 % smaller in parameter count and 35 % faster in inference, whereas conventional pruning leads to a 50.5 % accuracy drop.

In summary, XTransfer introduces a novel paradigm for modality‑agnostic few‑shot model transfer: (i) layer‑wise MMC‑based diagnostics, (ii) anchor‑guided alignment in a reduced orthogonal feature space, and (iii) efficient layer recombination via LWS. This combination enables high‑performance, low‑cost deployment of human‑sensing models on edge devices, dramatically reducing the need for extensive data collection and heavy computation while maintaining or improving predictive accuracy.

