AI-Based Stroke Rehabilitation Domiciliary Assessment System with ST_GCN Attention

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Effective stroke recovery requires continuous rehabilitation integrated with daily living. To support this need, we propose a home-based rehabilitation exercise and feedback system. The system consists of (1) a hardware setup with an RGB-D camera and wearable sensors to capture stroke patients' movements, (2) a mobile application for exercise guidance, and (3) an AI server for assessment and feedback. When a stroke user exercises following the application guidance, the system records skeleton sequences, which are then assessed by the deep learning model RAST-G@ (Rehabilitation Assessment Spatio-Temporal Graph ATtention). The model employs a spatio-temporal graph convolutional network to extract skeletal features and integrates transformer-based temporal attention to assess action quality. For system implementation, we constructed the NRC dataset, which includes 10 upper-limb activities of daily living (ADL) and 5 range-of-motion (ROM) exercises collected from stroke and non-disabled participants, with score annotations provided by licensed physiotherapists. Results on the KIMORE and NRC datasets show that RAST-G@ improves over baseline methods in terms of MAD, RMSE, and MAPE. Furthermore, the system provides user feedback that combines patient-centered assessment and monitoring. The results demonstrate that the proposed system offers a scalable approach for quantitative and consistent domiciliary rehabilitation assessment.


💡 Research Summary

This paper presents an end‑to‑end AI‑driven system for home‑based stroke rehabilitation assessment. The system integrates three components: (1) a hardware suite consisting of an Intel RealSense D435i RGB‑D camera, Movella Xsens Dot IMU sensors attached to both wrists, and an Android tablet that serves as the user interface; (2) a mobile application that guides patients through a set of 15 upper‑limb exercises (10 activities of daily living and 5 range‑of‑motion tasks), records the session, and displays real‑time feedback; and (3) a cloud‑based AI server that hosts the novel deep‑learning model RAST‑G@ (Rehabilitation Assessment Spatio‑Temporal Graph Attention).

Data collection involved 293 non‑disabled participants and 633 stroke patients, yielding 1,142 recorded sequences. For each exercise, physiotherapists provided a 10‑item, 0‑5 Likert‑scale questionnaire, which was aggregated into a single scalar score serving as the ground‑truth label. Skeletons were extracted from the RGB‑D streams using a 25‑joint configuration compatible with OpenPose and COCO WholeBody standards. The resulting dataset, named NRC, is publicly released together with the code.
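The 10-item questionnaire is collapsed into a single scalar ground-truth label. The exact aggregation rule is not spelled out above, so the sketch below simply averages the items after validating the 0-5 Likert range; both the function name and the averaging choice are hypothetical.

```python
def aggregate_score(item_scores):
    """Collapse a 10-item, 0-5 Likert questionnaire into one scalar label.

    The aggregation rule here (a plain mean) is an assumption for
    illustration; the paper only states that items are aggregated.
    """
    if len(item_scores) != 10:
        raise ValueError("expected 10 questionnaire items")
    if any(not (0 <= s <= 5) for s in item_scores):
        raise ValueError("each item must lie in the 0-5 Likert range")
    return sum(item_scores) / len(item_scores)
```

A mean keeps the label on the same 0-5 scale as the individual items, which makes regression errors directly interpretable in Likert units.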

RAST‑G@ combines a Spatio‑Temporal Graph Convolutional Network (ST‑GCN) with a transformer‑based temporal attention module. In the ST‑GCN, joints are modeled as graph nodes and anatomical connections as edges; spatial graph convolutions capture inter‑joint relationships while temporal convolutions encode motion dynamics across frames. The output feature sequence is fed into a multi‑head self‑attention transformer that learns to weight each time step according to its relevance for quality assessment. This design allows the network to focus on critical sub‑segments (e.g., object grasp, elbow flexion) rather than treating all frames equally. A final fully‑connected layer regresses the weighted representation to the expert score.
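The forward pass described above can be sketched in miniature: a normalized adjacency matrix drives a per-frame spatial graph convolution, and a softmax over time weights each frame before a linear readout regresses the quality score. This is a toy NumPy sketch, not the authors' implementation; all weight matrices (`W_spatial`, `w_attn`, `w_out`) stand in for learned parameters, and the real model stacks multiple ST-GCN layers and multi-head attention.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def st_gcn_attention_score(X, A, W_spatial, w_attn, w_out):
    """Toy forward pass: spatial graph conv -> temporal attention -> score.

    X: (T, V, C) skeleton sequence (T frames, V joints, C channels);
    A: (V, V) anatomical adjacency. Weights are placeholders for
    learned parameters (hypothetical shapes: C x D, D, D).
    """
    A_norm = normalize_adjacency(A)
    # Spatial graph convolution applied frame-by-frame, with ReLU.
    H = np.maximum(0.0, np.einsum("uv,tvc,cd->tud", A_norm, X, W_spatial))
    frame_feats = H.mean(axis=1)          # (T, D): pool over joints
    logits = frame_feats @ w_attn         # (T,): relevance of each frame
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                  # softmax over time
    pooled = alpha @ frame_feats          # attention-weighted summary
    return float(pooled @ w_out)          # regressed quality score
```

The attention weights `alpha` are exactly what lets the model emphasize critical sub-segments such as the grasp phase instead of averaging all frames uniformly.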

Training employed the Adam optimizer (learning rate = 1e‑3) with mean‑squared‑error loss, batch normalization, and dropout (0.5) to mitigate over‑fitting. The data were split 70 %/15 %/15 % for training, validation, and testing, and hyper‑parameters were tuned via cross‑validation. Performance was evaluated using Mean Absolute Deviation (MAD), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). On the NRC dataset, RAST‑G@ achieved MAD = 0.42, RMSE = 0.58, MAPE = 7.3 %, outperforming baseline methods such as DTW‑based scoring, Mahalanobis distance, and a recent GCN‑LSTM approach by 5‑10 % across all metrics. Ablation studies showed that removing the temporal attention module degrades performance by roughly 4 %, confirming its contribution. Similar gains were observed on the public KIMORE dataset, demonstrating generalizability.

The mobile app delivers feedback in two forms: a numerical score and a visual overlay highlighting frames where the model detected low confidence or deviation from the learned “ideal” motion pattern. All data are transmitted over HTTPS, stored in an encrypted cloud database, and anonymized to protect patient privacy.

Limitations include reliance on accurate skeleton extraction, which can be affected by lighting and background clutter, and the current exclusion of IMU data from the model despite its potential to improve joint angle precision. The 0‑5 scoring scale, while clinically convenient, limits granularity, and long‑term clinical validation is still pending. Future work will explore multimodal fusion (combining IMU, RGB‑D, and possibly electromyography), more expressive regression targets, and personalized feedback loops that adapt to individual progress trajectories.

In summary, the proposed system demonstrates that a graph‑based deep learning model with temporal attention can provide objective, scalable, and clinically meaningful assessments of stroke rehabilitation exercises performed at home, potentially reducing the need for frequent in‑person visits and supporting continuous recovery.

