Abstract
This work introduces an integrated framework for comprehensive mobility assessment using low-resolution thermal imaging, aiming to overcome the privacy and cost limitations inherent in RGB and marker-based systems. Focusing on the clinically established Timed Up and Go (TUG) test, the framework consists of two stages. First, human keypoints are detected in low-resolution thermal frames using a MobileNetV3-Small encoder paired with a ViTPose decoder via transfer learning, achieving strong accuracy (average precision of 0.861) with lightweight computation suitable for resource-constrained environments. Second, the extracted keypoint feature maps are processed by an Anatomy-Guided Vision Transformer (AG-ViT), which incorporates anatomical priors to enhance representation learning. The model jointly performs TUG subtask segmentation, bilateral gait phase recognition, and temporal gait parameter estimation. Experiments demonstrate robust performance, with macro-average F1-scores up to 0.971 for subtask classification and high agreement for key gait parameters, including Pearson correlations above 0.95 and ICC(2,1) values above 0.92 during the walking segment. These results highlight the potential of thermal imaging as a scalable, objective, and privacy-preserving solution for clinical mobility and rehabilitation monitoring.